Coding assistants aren’t there yet. Copy-paste chaos and context loss still trip up LLM coding agents, and the tooling around them is often too heavyweight or misaligned with how developers work [1].
What’s still hard

- Copy-paste fragility and shell-wrangling: Claude often trips over Windows- vs. bash-style commands, burning many attempts just to list files before landing on one that works [1].
- Context bloat and editing tools: adding too many tools drains the model’s context window, yet effective editing and refactoring tooling is hard to get right [1].
- The up-front question: clarifying a task early prevents costly rework, but as costs trend down, fewer teams invest in bite-sized, task-based workstreams [1].
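The shell-fragility point has a practical workaround: rather than letting an agent guess between `dir` and `ls`, file operations can go through a cross-platform API. A minimal sketch in Python (the helper name is illustrative, not something from the thread):

```python
from pathlib import Path

def list_source_files(root: str, suffix: str = ".py") -> list[str]:
    """Portably list files under root, sidestepping dir-vs-ls shell differences."""
    return sorted(str(p) for p in Path(root).rglob(f"*{suffix}"))

# Behaves identically on Windows and Unix; no shell is involved.
print(list_source_files("."))
```

The same idea generalizes: any tool an agent calls for filesystem work is more reliable as a library function than as a platform-dependent shell string.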
A path forward

- Better questioning and tooling: breaking work into a spec kit and modular tasks can reduce mistakes and speed iteration; this is where the economics of tooling start to tilt in favor of correctness [1].
- UX matters as much as math: Windows GUI habits and Unix-centric training biases shape how people actually code with copilots; embracing broader tooling and workflows helps developers switch modes smoothly [1].
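The "spec kit and modular tasks" idea can be made concrete with a tiny task structure that forces clarifying questions up front. A hypothetical sketch (the class and field names are assumptions, not from the thread):

```python
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    """One bite-sized unit of work, with its clarifications recorded up front."""
    title: str
    acceptance_criteria: list[str]
    open_questions: list[str] = field(default_factory=list)

    def ready(self) -> bool:
        # A task is ready to hand to an agent only once every question is answered.
        return not self.open_questions

spec = TaskSpec(
    title="Rename config loader",
    acceptance_criteria=["all call sites updated", "tests pass"],
    open_questions=["keep the old name as a deprecated alias?"],
)
print(spec.ready())  # an unanswered question blocks handoff
```

The design choice is the point: making "ready" depend on an empty question list turns "ask before you code" from a habit into a gate.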
A counterpoint from the open-source corner

- Open-source, domain-focused models like Playable1-GGUF show that small, tuned models can outperform broad, generic ones on specific tasks. It was fine-tuned on 52,809 lines of Python pygame scripts and ships with one-shot game-generation and debugging capabilities [2].
- The project also provides a ready-made app, Infinity Arcade, and MIT-licensed assets, underscoring how focused tooling and open sharing accelerate practical productivity [2].
- Builders increasingly expect the future to favor dedicated, smaller models (e.g., Qwen3-Coder-30B and other specialized branches) over one-size-fits-all copilots [2].
Closing thought: productivity will hinge on sharper questions, leaner toolchains, and truly domain-aligned UX.
References
[1] Two things LLM coding agents are still bad at — thread debating LLM coding agents' copy-paste limitations and the need for better questioning, contextual understanding, tooling, and UX in coding workflows.
[2] Introducing Playable1-GGUF, by far the world's best open-source 7B model for vibe coding retro arcade games! — fine-tuned 7B coding model for Pygame; claims top performance vs. 8B models; advocates specialized, smaller LLMs and open tooling for devs.