Local LLMs are moving from cloud to pocket, and token bills are fading. Open models like SmolLM3 (~3B) and Qwen2-1.5B are turning laptops and phones into AI workstations, while Apple rolls out on-device LLMs in iOS 18. [1]
Hardware is catching up: Apple's M-series Neural Engine hits ~133 TOPS, and consumer GPUs chew through 4-8B models. Tooling is maturing just as fast: Ollama for local runtimes, Cactus / RunLocal for mobile, and ExecuTorch / LiteRT for on-device inference. [1] There is still some pain: iOS memory limits, packaging overhead, and distillation quirks. Quantization helps, but 4-bit isn't magic. The upside is clear: privacy by default, offline by design, no network latency, no token bills. [1]
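To make the local-runtime path concrete, here is a minimal sketch using the Ollama Python client. It assumes an Ollama server is already running locally and that the model tag below (an illustrative 4-bit quantization, not a recommendation from the source) has been pulled.

```python
# Minimal sketch: chat with a locally served, 4-bit-quantized model via Ollama.
# Assumes `ollama serve` is running and the model below has been pulled;
# the tag is a placeholder - pick whatever quantized model your hardware handles.
import ollama

MODEL = "llama3.2:3b-instruct-q4_K_M"  # hypothetical 4-bit quant tag

response = ollama.chat(
    model=MODEL,
    messages=[{"role": "user", "content": "In one sentence, what is quantization?"}],
)
# Everything ran on-device: no network round trip, no tokens billed.
print(response["message"]["content"])
```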
Local-first makes sense for vision and copilots: Gemma 2 2B Vision and Qwen2-VL can caption and reason about images locally. [1] StenoAI, an open-source Mac app, transcribes with Whisper and summarizes with Llama 3.2, all on-device. [4]
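A rough sketch of that transcribe-then-summarize pattern is below, using the open-source whisper package plus a local Llama 3.2 served by Ollama. This is not StenoAI's actual code; the audio path and model names are assumptions.

```python
# Sketch of a StenoAI-style pipeline: Whisper for speech-to-text,
# a local Llama 3.2 (via Ollama) for the summary. All on-device.
# Not StenoAI's code; file path and model choices are placeholders.
import whisper
import ollama

AUDIO_FILE = "meeting.m4a"   # hypothetical recording
SUMMARIZER = "llama3.2"      # assumes the model has been pulled locally

# 1. Transcribe entirely on-device.
stt = whisper.load_model("base")
transcript = stt.transcribe(AUDIO_FILE)["text"]

# 2. Summarize the transcript with the local LLM.
summary = ollama.chat(
    model=SUMMARIZER,
    messages=[{
        "role": "user",
        "content": f"Summarize these meeting notes as bullet points:\n\n{transcript}",
    }],
)
print(summary["message"]["content"])
```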
Beyond chat, people put local models to real work:
- ImageIndexer to catalog and tag a family photo collection [2]
- searxng + Perplexica to replace Googling [2]
- KaraKeep for bookmarking [2]
- LibreTranslate for translations (see the sketch after this list) [2]
- Local coding in VSCodium with local models [2]
- Restoring older photos via ComfyUI and Qwen Image Edit [2]
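As a concrete example of the translation item above, here is a minimal sketch against a self-hosted LibreTranslate instance. It assumes the server is already running on the default local port (5000); the sample sentence and language pair are placeholders.

```python
# Sketch: translate text against a self-hosted LibreTranslate instance.
# Assumes the server is running locally on the default port; no cloud calls.
import requests

resp = requests.post(
    "http://localhost:5000/translate",
    json={
        "q": "Local models keep your data on your own machine.",
        "source": "en",
        "target": "de",
        "format": "text",
    },
    timeout=30,
)
resp.raise_for_status()
print(resp.json()["translatedText"])
```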
Two blockers slow mass adoption: average consumer hardware isn't universally ready, and the Netflix-style simplicity many people want isn't there yet. [3] On the Apple front, Apple Foundation Models already exist: the on-device model is small (~3B parameters, heavily quantized at q2), multimodal, and not meant for online serving. [5]
Expect more local-first tools and hybrid setups, with apps like StenoAI shaping how we balance privacy, latency, and capability. [4]
References
[1] LLMs Are Moving Local – So Why Are We Still Paying for Tokens? Open models run locally on laptops and phones; benefits include privacy and latency; challenges include speed, memory, and packaging; a hybrid approach is suggested as a solution.
[2] What can local LLMs be used for? Covers practical uses for local LLMs: hardware needs, model recommendations, coding, data processing, image tasks, and tooling.
[3] Why aren't more people using local models? Debates the viability of local models vs. APIs (hardware, privacy, latency) and the obstacles to mainstream adoption; examples include SmolLM3, Qwen2-1.5B, and on-device iOS 18.
[4] StenoAI: Open Source Local LLM AI Meeting Notes Taker with Whisper Transcription & Llama 3.2 Summaries. A local Mac app using Whisper for transcription and Llama 3.2 for on-device summarization; open-source, privacy-first, no cloud.
[5] Does Apple have their own language model? Discusses the Apple Foundation Models: an on-device ~3B q2 model plus a larger Private Cloud Compute (PCC) model; compares Gemini, GPT, and Grok; focuses on multimodal, local use, and privacy.