The local-first LLM movement is heating up, with Russet running completely on-device using Apple Intelligence's foundation model. Everything is private and offline: no internet connection needed, no account, no tracking, and conversations stay on your device [1].
On-device privacy wins
- Russet shows privacy-first on-device chat; all processing happens locally [1].
- Ollama enables self-hosted AI agents on your own hardware, with egress controls and audit logging to avoid third-party data sharing [2] (see the sketch after this list).
- Qwen 1.5B runs fully on the Jetson Orin Nano with no cloud, delivering ~30 tokens/sec at under 10 W [4].
- Emu3.5 (Open-Source World Learner) matches Gemini 2.5 Flash performance while operating entirely on local hardware [3].
- Qwen2.5-VL offers real-time, local surgical video understanding on-device [3].
- Distil-expenses shows two Llama 3.2 models you can run locally via Ollama for personal finance tasks [5].
- Granite-nano and related tools highlighted in IBM's local-copilot workflow underscore privacy-by-default on edge setups [6].
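To make "local only" concrete, here is a minimal sketch of querying a model served by Ollama on the same machine over its local HTTP API, so prompts and responses never leave the device. The model name and prompt are illustrative assumptions, not details taken from any of the cited projects.

```python
# Minimal sketch: query a locally served Ollama model over its local HTTP API.
# Assumes Ollama is running on its default port (11434) and that a model such
# as "llama3.2" has already been pulled; both are assumptions for illustration.
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # local endpoint, no cloud calls


def ask_local_model(prompt: str, model: str = "llama3.2") -> str:
    """Send a prompt to the local Ollama server and return the response text."""
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "stream": False,  # request one complete JSON reply instead of a token stream
    }).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]


if __name__ == "__main__":
    # Prompt, model weights, and output all stay on the machine.
    print(ask_local_model("Summarize last month's grocery spending in one sentence."))
```

The same pattern underlies the personal-expenses and local-copilot setups above: swap in whichever model you have pulled locally and keep all traffic on localhost.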
Local-edge setups aren’t just sci-fi; people are testing them on consumer hardware and dedicated devices alike. The chatter spans weekly roundups like Last Week in Multimodal AI - Local Edition, with multimodal models and edge-friendly designs leading the way [3].
Closing thought: the draw of pure-local AI is strong—privacy, latency, and control come first, even as cloud-backed options still push scale. Expect more hands-on, on-device experiments to ripple into mainstream toolchains [3][4].
References
[1] Show HN: Russet uses Apple's on-device foundation model for private, offline AI chat; runs locally with privacy and no tracking.
[2] Self-hosted platform for running third-party AI agents with Ollama support (Apache-2.0). Open-source platform to run third-party AI agents locally with Ollama, focusing on privacy, controlled egress, and auditability.
[3] Last week in Multimodal AI - Local Edition. Highlights edge-friendly multimodal models (Emu3.5, Qwen2.5-VL, ChronoEdit, Wan2GP, LongCat-Flash-Omni, Ming-flash-omni) with local/offline use.
[4] Running Qwen 1.5B Fully On-Device on Jetson Orin Nano - No Cloud, Under 10W Power. User shares a local, on-device Qwen 1.5B on a Jetson Orin Nano with no cloud and low power; discusses speed, tasks, and comparisons.
[5] We trained SLM-powered assistants for personal expenses summaries that you can run locally via Ollama. Compares locally run Llama 3.2 SLMs to GPT-OSS, showing distillation gains and tool-calling limits in a personal expenses demo.
[6] IBM Developer - Setting up local co-pilot using Ollama with VS Code (or VSCodium for no-telemetry, air-gapped use) with the Continue extension. Discusses offline/local coding assistants (Granite-Qwen) with private deployment, comparing performance, hardware requirements, privacy, maintenance, and overall cost against cloud providers.