Edge AI is no longer a cloud-only story. Real-world setups show local LLMs and on-device inference maturing, from a $60k GPU-packed rig to 16GB Android phones, and even self-hosted speech stacks that sidestep the data cloud entirely.
Edge Rig for Local LLMs
A local-LLM build pairs a 96-core Threadripper CPU with four RTX Pro 6000 Max-Q GPUs, all claimed under a ~$60k budget, with a Mac Studio M3 Ultra (512GB RAM, 4TB storage) as the point of comparison. Power is real too: a 110V/30A circuit and dual PSUs keep the node humming [1].
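The thread doesn't pin down a serving stack, but a rig like this typically shards one large model across all four GPUs. A minimal sketch, assuming vLLM tensor parallelism and an illustrative quantized model (both assumptions, not details from the post):

```python
# Hypothetical sketch: serving one quantized model across four local GPUs with vLLM.
# The model id and the choice of vLLM are assumptions; the source thread does not
# say which serving stack the rig actually runs.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct-AWQ",  # placeholder model; swap for whatever the rig hosts
    tensor_parallel_size=4,                 # one shard per RTX Pro 6000 Max-Q
    gpu_memory_utilization=0.90,            # leave a little headroom per card
)

params = SamplingParams(temperature=0.7, max_tokens=256)
outputs = llm.generate(
    ["Summarize the trade-offs of running LLMs fully on-prem."], params
)
print(outputs[0].outputs[0].text)
```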
On-Device Android AI
On the mobile front, Gemma 3 12B (QAT, q4_0) runs on a OnePlus 12 (5548 FP16 GFLOPS, 76.8 GB/s memory bandwidth) via MNN-LLM and ChatterUI, delivering around 11 T/s for prompt processing and 9-10 T/s for token generation in practice [2]. Battery heating is a concern; some users discuss passthrough charging to keep the sustained load off the battery [2].
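For anyone trying to reproduce those numbers, the quickest sanity check is to time a generation against whatever local endpoint the device (or a nearby box) exposes. A minimal sketch, assuming an OpenAI-compatible chat endpoint; the URL, port, and model id are placeholders, not details from the thread:

```python
# Rough throughput check against a local OpenAI-compatible endpoint (assumed setup).
import time
import requests

URL = "http://127.0.0.1:8080/v1/chat/completions"  # placeholder local server address

payload = {
    "model": "gemma-3-12b-it-qat-q4_0",  # placeholder model id
    "messages": [{"role": "user", "content": "Explain quantization-aware training in two sentences."}],
    "max_tokens": 128,
}

start = time.perf_counter()
resp = requests.post(URL, json=payload, timeout=300).json()
elapsed = time.perf_counter() - start

usage = resp["usage"]
print(f"prompt tokens:    {usage['prompt_tokens']}")
print(f"generated tokens: {usage['completion_tokens']}")
# Overall rate; note this folds prompt processing into the denominator.
print(f"overall speed:    {usage['completion_tokens'] / elapsed:.1f} T/s")
```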
Local Speech-to-Speech & Self-Hosted Stacks
For fully local, self-hosted speech-to-speech, a growing ecosystem lists:
- Unmute.sh — Linux, cascading
- Ultravox (Fixie) — Windows/Linux, hybrid UIs
- RealtimeVoiceChat — Linux-friendly, pluggable LLM
- Vocalis — macOS/Windows/Linux, tool calling via backend LLM
- LFM2 — Windows/Linux, built-in LLM + tool calling
- Mini-omni2 — cross-platform
- Pipecat — Windows/macOS/Linux/iOS/Android, explicit tool calling [3]
Vocalis also relies on Kokoro-FastAPI for TTS, and Koboldcpp can bridge into Kokoro-based flows [3]; a minimal cascading sketch along those lines follows below.
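To make the cascading pattern concrete, here is a minimal STT → LLM → TTS loop. It assumes openai-whisper for transcription, a Koboldcpp-style OpenAI-compatible chat endpoint for the LLM, and a Kokoro-FastAPI server for TTS; the ports, model ids, and voice name are illustrative, not taken from the list above:

```python
# Minimal cascading speech-to-speech sketch (STT -> LLM -> TTS), all local.
# Endpoints, model names, and voice id are assumptions for illustration.
import requests
import whisper

LLM_URL = "http://127.0.0.1:5001/v1/chat/completions"  # assumed local chat endpoint
TTS_URL = "http://127.0.0.1:8880/v1/audio/speech"      # assumed Kokoro-FastAPI-style endpoint

# 1) Speech -> text
stt = whisper.load_model("base")
user_text = stt.transcribe("mic_capture.wav")["text"]

# 2) Text -> reply via the local LLM
reply = requests.post(LLM_URL, json={
    "model": "local-model",
    "messages": [{"role": "user", "content": user_text}],
    "max_tokens": 200,
}).json()["choices"][0]["message"]["content"]

# 3) Reply -> speech via the TTS server
audio = requests.post(TTS_URL, json={
    "model": "kokoro",
    "input": reply,
    "voice": "af_bella",        # assumed voice id
    "response_format": "wav",
}).content

with open("reply.wav", "wb") as f:
    f.write(audio)
print("Heard:", user_text)
print("Replied:", reply)
```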
The takeaway: edge setups are shifting from curiosity to privacy-preserving, low-latency reality—and the options keep multiplying.
References
[1] New Build for local LLM. User shares a ~$60k local LLM rig with multiple GPUs; discusses models, quantization, latency, power, and comparisons to the Mac Studio.
[2] Anyone running llm on their 16GB android phone? Discusses running Gemma-3-12B and Qwen-3 on 16GB phones, speeds, hardware limits, swap, battery, cooling, and app performance benchmarks.
[3] Awesome Local LLM Speech-to-Speech Models & Frameworks. Curates local speech-to-speech LLMs; compares cascading vs end-to-end designs, tool calling, and self-hosted setups.