
The Local-First LLMs Movement: How Enthusiasts Are Pushing Privacy and Low Latency with On-Device and Self-Hosted Solutions

Opinions on the Local-First LLMs Movement:

The local-first LLMs movement is heating up, with Russet running chat entirely on-device on Apple's foundation model. Everything is private and offline: no internet connection needed, no account, no tracking, and conversations stay on your device [1].

On-device privacy wins:
- Russet demonstrates privacy-first on-device chat; all processing happens locally [1].
- A self-hosted, Apache-2.0 platform runs third-party AI agents on your own hardware via Ollama, with controlled egress and audit logging to avoid sharing data with third parties [2] (see the sketch after this list).
- Qwen 1.5B runs fully on-device on the Jetson Orin Nano with no cloud, delivering roughly 30 tokens/sec at under 10 W [4].
- Emu3.5 (an open-source "world learner") matches Gemini 2.5 Flash performance while operating entirely on local hardware [3].
- Qwen2.5-VL delivers real-time surgical video understanding on-device [3].
- Distil-expenses shows two Llama 3.2 models you can run locally via Ollama for personal-finance tasks [5].
- Granite-nano and related tools highlighted in IBM's local-copilot workflow underscore privacy-by-default on edge setups [6].
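
Several of the setups above ([2], [5], [6]) sit on top of Ollama, which serves models behind a local HTTP API, so prompts and responses never leave the machine. As a rough illustration only, here is a minimal Python sketch of a single chat turn against that local API; it assumes the Ollama daemon is running on its default port (11434) and that a model such as llama3.2 has already been pulled, and the prompt is a placeholder.

```python
import json
import urllib.request

# Local Ollama endpoint; requests stay on this machine.
OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"


def chat_locally(prompt: str, model: str = "llama3.2") -> str:
    """Send one chat turn to the local Ollama server and return the reply text."""
    payload = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for a single JSON response instead of a token stream
    }).encode("utf-8")

    req = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.loads(resp.read().decode("utf-8"))

    # Non-streaming /api/chat responses carry the assistant reply under "message".
    return body["message"]["content"]


if __name__ == "__main__":
    # Placeholder prompt in the spirit of the personal-expenses demo in [5].
    print(chat_locally("Summarize my grocery spending for October in one sentence."))
```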

Local-edge setups aren't just sci-fi; people are testing them on consumer hardware and dedicated devices alike. The chatter spans weekly roundups like Last week in Multimodal AI - Local Edition, with multimodal models and edge-friendly designs leading the way [3].

Closing thought: the draw of purely local AI is strong. Privacy, latency, and control come first, even as cloud-backed options still push scale. Expect more hands-on, on-device experiments to ripple into mainstream toolchains [3][4].

References

[1] HackerNews
Show HN: Russet uses Apple's on-device foundation model for private, offline AI chat; runs locally with privacy and no tracking.

[2] Reddit
Self-hosted platform for running third-party AI agents with Ollama support (Apache-2.0). Open-source platform to run third-party AI agents locally with Ollama, focusing on privacy, controlled egress, and auditability.

[3] Reddit
Last week in Multimodal AI - Local Edition. Highlights edge-friendly multimodal models (Emu3.5, Qwen2.5-VL, ChronoEdit, Wan2GP, LongCat-Flash-Omni, Ming-flash-omni) with local/offline use.

[4] Reddit
Running Qwen 1.5B Fully On-Device on Jetson Orin Nano - No Cloud, Under 10W Power. User shares a local on-device Qwen 1.5B setup on the Jetson Orin Nano, no cloud, low power; discusses speed, tasks, and comparisons.

[5] Reddit
We trained SLM-powered assistants for personal expenses summaries that you can run locally via Ollama. Compares locally run Llama 3.2 SLMs to GPT-OSS, showing distillation gains and tool-calling limits in a personal-expenses demo.

[6] Reddit
IBM Developer - Setting up a local co-pilot using Ollama with VS Code (or VSCodium for a no-telemetry, air-gapped setup) with the Continue extension. Discusses offline/local coding assistants (Granite, Qwen) with private deployment, comparing performance, hardware requirements, privacy, maintenance, and overall cost against cloud providers.
