The shift is real: teams are ditching cloud-only playbooks and building on local stacks. Local Runners let you run models from Hugging Face, LM Studio, Ollama, and vLLM on your own machine and still access them via a secure API endpoint; weights and data stay local [1].
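In practice, most of these runners expose an OpenAI-compatible HTTP API, so calling a local model looks just like calling a hosted one with the base URL swapped. A minimal sketch, assuming a runner listening at http://localhost:11434/v1 (Ollama's default) and a locally pulled model named llama3.1:8b; the port, model name, and placeholder API key are assumptions, not details from the source:

```python
# Minimal sketch: query a locally hosted model through an OpenAI-compatible
# endpoint. Assumes the `openai` Python package and a local runner (e.g. Ollama
# or vLLM in OpenAI-compatible mode) listening on the URL below.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",  # assumed local endpoint; adjust per runner
    api_key="not-needed-locally",          # local runners typically ignore the key
)

response = client.chat.completions.create(
    model="llama3.1:8b",  # hypothetical local model name
    messages=[{"role": "user", "content": "Summarize why local inference keeps data on-device."}],
)
print(response.choices[0].message.content)
```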
On laptops, open LLMs are gaining traction. Here's what people are testing:
• A MacBook Pro with 128GB RAM running gpt-oss-120b shows the promise of portable, capable local workflows [2].
• Privacy-first tweaks like zero-data-retention policies and local routing tools are part of the conversation as people sanity-check what belongs on-device vs. in the cloud [2]; a toy routing sketch follows below.
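The on-device vs. cloud question often reduces to a routing rule. The sketch below is a toy illustration, not something described in the sources: prompts flagged as sensitive stay on the local endpoint, everything else may go to a hosted API. The endpoint URLs, model names, and the keyword heuristic are all assumptions.

```python
# Toy privacy-first router: keep sensitive prompts on the local model, allow
# everything else to use a hosted endpoint. URLs and model names are assumptions.
from openai import OpenAI

LOCAL = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
CLOUD = OpenAI()  # reads OPENAI_API_KEY from the environment

SENSITIVE_MARKERS = ("patient", "salary", "api_key", "customer record")

def contains_sensitive_data(prompt: str) -> bool:
    """Naive keyword check; a real deployment would use a classifier or policy engine."""
    lowered = prompt.lower()
    return any(marker in lowered for marker in SENSITIVE_MARKERS)

def route(prompt: str) -> str:
    if contains_sensitive_data(prompt):
        client, model = LOCAL, "llama3.1:8b"   # stays on-device
    else:
        client, model = CLOUD, "gpt-4o-mini"   # may leave the machine
    reply = client.chat.completions.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return reply.choices[0].message.content
```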
Orchestration across machines is getting practical. rbee turns multiple GPUs or machines into a single, OpenAI-compatible API via a queen/hive/worker model, enabling local "cloud-like" queues [3]. And for multi-machine setups, people are experimenting with Proxmox/LXC deployments that chain a front-end like Open WebUI, a router such as LiteLLM, and a per-model vLLM back-end [5].
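Whether it's rbee's queen/hive/worker scheduler or an Open WebUI → LiteLLM → vLLM chain, the client-facing contract is the same: one OpenAI-compatible URL, with the model name deciding which back-end serves the request. A hedged sketch, assuming a router (e.g. a LiteLLM proxy) at http://localhost:4000/v1 and two illustrative model names; both are assumptions rather than details from the posts:

```python
# Sketch: one router URL fronts several per-model back-ends; the `model`
# field selects where the request lands. URL and model names are assumptions.
from openai import OpenAI

router = OpenAI(base_url="http://localhost:4000/v1", api_key="router-key")

# Each name would map to its own vLLM (or other) back-end in the router config.
for model in ("llama3.1-8b-instruct", "qwen2.5-32b-instruct"):
    out = router.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "ping"}],
    )
    print(model, "->", out.choices[0].message.content[:80])
```

The design point is that clients never need to know which machine or container hosts a given model; only the router does.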
Hardware reality is shaping the plan. Posts discuss heavy rigs (2–4 AMD Instinct MI50 cards) and careful RAM budgeting for multi-model stacks, including setups with 8B- and 32B-class models, plus questions about offloading and throughput [4].
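A rough rule of thumb for that RAM budgeting: weight memory is roughly parameter count × bytes per parameter at the chosen quantization, plus headroom for KV cache and runtime overhead. A back-of-the-envelope sketch; the 20% overhead factor is an assumption, not a measured figure:

```python
# Back-of-the-envelope memory estimate for model weights at a given quantization.
# Real usage also depends on KV cache (context length, batch size) and runtime
# overhead, so the fudge factor below is only a rough assumption.
def weight_gb(params_billion: float, bits_per_param: float, overhead: float = 1.2) -> float:
    bytes_per_param = bits_per_param / 8
    return params_billion * 1e9 * bytes_per_param * overhead / 1e9

for label, params, bits in [("8B @ 4-bit", 8, 4), ("8B @ 16-bit", 8, 16),
                            ("32B @ 4-bit", 32, 4), ("32B @ 8-bit", 32, 8)]:
    print(f"{label}: ~{weight_gb(params, bits):.1f} GB")
# e.g. a 32B model at 4-bit lands around 19 GB before KV cache, which is why
# offloading and multi-GPU splits come up so often in these threads.
```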
Bottom line: teams are proving local stacks can be practical, private, and faster for certain workflows—with plenty to learn as cross-model pipelines mature [5].
References
[1] Run Hugging Face, LM Studio, Ollama, and vLLM models locally and call them through an API
Discusses running Hugging Face, LM Studio, Ollama, and vLLM locally with public API endpoints; privacy concerns and alternatives raised.
[2] Ask HN: Who uses open LLMs and coding assistants locally? Share setup and laptop
Solicits real-world setups for open-source LLMs and coding assistants on laptops; hardware, models, tasks, and performance discussed by users.
[3] I'm currently solving a problem I have with Ollama and LM Studio
Describes the rbee system, which pools local GPUs and routes LLM tasks across machines through a single API endpoint, with SSH-based scheduling and a focus on security; open source.
[4] Looking for Advice: Local Inference Setup for Multiple LLMs (vLLM, Embeddings + Chat + Reranking)
Seeks advice on running multiple LLMs locally: hardware choices, software compatibility, and optimizations (vLLM, ROCm, MI50).
[5] Anyone else running their whole AI stack as Proxmox LXC containers? I'm currently using Open WebUI as front-end, LiteLLM as a router, and a vLLM container per model as back-ends
Discusses local LLM deployment with Proxmox LXC, LiteLLM, and per-model vLLM back-ends; pondering RAG, hardware tuning, and local-first workflows for accounting data.