Local LLMs are going from Minecraft-sized chaos to serious GPU farms in 2025, all on a shoestring. The standout is a ~5M-parameter LLM running inside Minecraft thanks to Sammyuri's redstone setup [1]. In vanilla Minecraft a single response would take years, but on MCHPRS (the Minecraft High Performance Redstone Server) the roughly 40,000x speedup brings that down to about two hours; for scale, the CHUNGUS 2 8-bit redstone processor runs at 1 Hz [1].
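A quick sanity check on those figures (a minimal sketch in Python; the two-hour and 40,000x numbers come straight from the report [1]):

```python
# Back-of-the-envelope: how long would one response take in vanilla Minecraft?
# Assumes the reported ~2 hours per response on MCHPRS at a 40,000x speedup [1].
mchprs_hours = 2
speedup = 40_000

vanilla_hours = mchprs_hours * speedup       # 80,000 hours
vanilla_years = vanilla_hours / (24 * 365)   # ~9.1 years

print(f"Vanilla estimate: {vanilla_hours:,} hours (~{vanilla_years:.1f} years)")
```

That back-solves to roughly nine years per response in vanilla, consistent with the "years" claim.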
On the hardware front, the local options stack up: a Mac Mini M4 Pro with 128GB, a Beelink GTR9 with 128GB (Ryzen AI Max 395), or a dedicated GPU rig with at least two RTX 3090 cards [2]. The Mac Mini and Beelink offer 128GB of unified memory and an easier setup, but their memory bandwidth falls well short of a discrete GPU's. For small-to-mid models (roughly 70B-100B MoE), the GPU path wins on speed; for truly larger models, the unified-memory machines can still run them, just slowly [2].
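Why bandwidth is the deciding factor: token generation is typically memory-bound, so a rough ceiling on decode speed is memory bandwidth divided by the bytes of weights read per token. Below is a minimal sketch; the bandwidth figures are approximate spec-sheet values and the 70B 4-bit model is a hypothetical, not a benchmark:

```python
# Rough ceiling on decode speed for memory-bandwidth-bound generation:
# tokens/s <= bandwidth / (bytes of weights read per generated token).

def decode_ceiling_tok_s(bandwidth_gb_s: float, active_params_billions: float,
                         bytes_per_param: float) -> float:
    """Upper bound on tokens/s if every active weight is read once per token."""
    bytes_per_token = active_params_billions * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / bytes_per_token

# Hypothetical 70B dense model quantized to ~4 bits (~0.5 bytes/param).
# Bandwidth numbers are approximate; check vendor spec sheets.
for name, bw_gb_s in [("Mac Mini M4 Pro (unified)", 273),
                      ("Ryzen AI Max 395 (unified)", 256),
                      ("RTX 3090 (GDDR6X)", 936)]:
    print(f"{name}: ~{decode_ceiling_tok_s(bw_gb_s, 70, 0.5):.0f} tok/s ceiling")
```

The same arithmetic explains the two-3090 requirement: a 4-bit 70B model needs ~35GB of weights, which overflows a single card's 24GB but fits across two. MoE models fare better still, since only the active experts' weights are read per token.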
Quantization remains a wild card. One user spent more than two hours failing to get an AWQ quant working on an A100, while the same quant ran in vLLM without problems, sparking debates about documentation and versioning [3]. Projects like SGLang, LMQL, and OME offer alternative routes, underscoring how fragile a setup can be when you chase the latest docs [3].
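For comparison, the path that did work in that report is short. A minimal sketch using vLLM's offline API with an AWQ checkpoint; the model ID is a placeholder for whichever AWQ quant you actually use:

```python
# Minimal sketch: serve an AWQ-quantized model via vLLM's offline API.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Llama-2-13B-AWQ",  # placeholder AWQ checkpoint
    quantization="awq",                # tell vLLM the weights are AWQ-quantized
)
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Why is quantized inference setup so fragile?"], params)
print(outputs[0].outputs[0].text)
```

Given the versioning debates in [3], pinning the vLLM release your checkpoint was tested against is often safer than chasing the newest docs.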
The spectrum, from Minecraft experiments to home-lab GPUs and quantization quirks, defines the 2025 local-LLM scene. Watch for shifts in quantization tooling and hardware prices next year.
References
[1] "Sammyuri built a redstone system to run a small language model (~5M params) in Minecraft!" Discussion of a 5M-param LLM running via redstone in Minecraft; speedups, tooling, and community reactions.
[2] "Torn between GPU, Mini PC for local LLM." Compares the Mac Mini M4 Pro and Beelink 395 vs 2x RTX 3090; discusses MoE vs dense models, RAM, and power.
[3] "Literally me this weekend, after 2+ hours of trying I did not manage to make AWQ quant work on A100, meanwhile the same quant works in vLLM without any problems..." Discusses AWQ vs vLLM and SGLang; docs, config, and GPU compatibility concerns.