Best coding LLM right now? The thread leans into hardware, quantization, and real-world speed. Front-runners in the discussion include qwen3-coder-30b-a3b, gpt-oss-20b, and qwen3-32B [1].
Hardware reality: VRAM matters more than model size. A 24GB card is a frequent ceiling, and the thread jokes that running oss-120b in that much memory simply doesn't happen in practice [1]. Battle-tested setups lean on offloading or tight context: one user runs Qwen3-Coder-30B-A3B-Instruct-GGUF-IQ4_NL.gguf on 24GB at ~23 t/s with a 57344-token context, while others trim context to fit in 23GiB [1].
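For readers who want to reproduce that kind of memory-aware setup, here is a minimal sketch using llama-cpp-python (the Python bindings for llama.cpp). The model path mirrors the filename cited in the thread, and the layer-offload count is an assumption to tune for your card, not a value from the thread:

```python
from llama_cpp import Llama  # pip install llama-cpp-python

# Filename taken from the thread; point this at wherever your GGUF lives.
llm = Llama(
    model_path="Qwen3-Coder-30B-A3B-Instruct-GGUF-IQ4_NL.gguf",
    n_ctx=57344,      # long context inflates the KV cache, which also lives in VRAM
    n_gpu_layers=40,  # assumed split: lower this if model + cache overflow 24GB
)

out = llm("Write a Python function that merges two sorted lists.", max_tokens=128)
print(out["choices"][0]["text"])
```

Lowering n_ctx or n_gpu_layers is the usual first move when the load fails with an out-of-memory error.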
Top coding contenders & notes
• qwen3-coder-30b-a3b: the "flash" coder option of the discussion [1].
• gpt-oss-20b: praised for reasoning; "oss-20b is goated" per one commenter [1].
• qwen3-32B: a popular all-rounder for coding tasks [1].
• Magistral: decent, but users in the thread haven't run long-term tests yet [1].
• KAT-72B: claims strong coding, though some find it not as good as advertised; it may also be slower on some rigs [1].
• Sonnet 4.5: strong coding performance, often cited as the benchmark tier [1].
• KAT-Dev: GGUF variant; speed can be an issue on common hardware [1].
• Kwaipilot.KAT-Dev-GGUF: the DevQuasar upload path; several GGUF flavors referenced [1].
• Devstral: small, with a decent 6-bit quant; can edge out qwen3-coder-30b-a3b on some tests, though raw speed often wins the day (see the throughput sketch below) [1].
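Since nearly every comparison above comes down to tokens per second on your own hardware, a quick way to get a t/s number is to time a generation. A rough sketch, again assuming llama-cpp-python and a hypothetical local GGUF path:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

llm = Llama(
    model_path="model.gguf",  # hypothetical path: any contender above in GGUF form
    n_ctx=8192,               # modest context keeps this a pure generation-speed test
    n_gpu_layers=-1,          # offload every layer that fits
    verbose=False,
)

start = time.perf_counter()
out = llm("Implement binary search in Python.", max_tokens=256)
elapsed = time.perf_counter() - start

gen = out["usage"]["completion_tokens"]
print(f"{gen} tokens in {elapsed:.1f}s -> {gen / elapsed:.1f} t/s")
```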
Practical tips
• Context-length and VRAM trade-offs matter: one 23GiB setup reported 60+ t/s at a 57344-token context, showing the value of memory-aware configs (the sketch below estimates the KV-cache cost) [1].
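To see why trimming context frees VRAM, a back-of-envelope KV-cache estimate helps. The formula below is the standard f16 KV-cache accounting; the layer/head numbers are illustrative assumptions, not any specific model's config:

```python
def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 ctx_len: int, bytes_per_elem: int = 2) -> float:
    """f16 KV cache: K and V tensors, per layer, per cached position."""
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem / 2**30

# Illustrative GQA-style config (assumed numbers, not a real model card):
for ctx in (57344, 32768, 16384):
    print(f"ctx={ctx:>6}: ~{kv_cache_gib(48, 4, 128, ctx):.2f} GiB of KV cache")
```

At those assumed dimensions, halving the context roughly halves the cache, which is exactly the lever thread users pull to squeeze under 23GiB.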
Closing thought: in coding LLMs there is no single winner; hardware, quantization, and how you tune context decide what is best for your setup [1].
References
[1] "best coding LLM right now?" Forum thread discussing coding-LLM picks, VRAM, quantization, performance, speed, and setup tips, with many model comparisons.