Quantization’s rule of thumb, that a larger model at a lower bitrate usually beats a smaller model at a higher one, gets challenged in a lively discussion thread. The question: is there a breaking point where larger models stop winning as quantization levels drop? The thread highlights the lack of solid empirical data and surfaces task-specific quirks. For example, in the GLM 4.5 vs GLM 4.5 Air comparison, the smaller Air can run at a higher bitrate within the same memory budget, yet a 2-bit quantized GLM 4.5 can still perform reasonably for coding in certain setups. [1]
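The rule of thumb is really a claim about a fixed memory budget: fewer bits per weight shrink the footprint roughly linearly, so a bigger model can be squeezed into the same space as a smaller one kept at higher precision. Below is a minimal sketch of that arithmetic; the parameter counts (roughly 355B for GLM 4.5 and 106B for GLM 4.5 Air) and the bit widths are illustrative assumptions, not figures from the thread, and real GGUF quant types carry some extra per-block overhead.

```python
# Rough weight-memory estimate: params (billions) * bits per weight / 8 = GB of
# weights. Ignores KV cache, activations, and per-block quantization overhead.

def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate weight footprint in GB for a quantized model."""
    return params_billions * bits_per_weight / 8

# Illustrative sizes (assumed, not taken from the thread):
print(f"GLM 4.5     @ ~2 bpw: {weight_gb(355, 2):.1f} GB")  # about 89 GB
print(f"GLM 4.5 Air @ ~5 bpw: {weight_gb(106, 5):.1f} GB")  # about 66 GB
```

Even at 2 bits the larger model still needs more memory than the smaller one at 5 bits, which is why the thread treats the choice as a judgment call rather than a free win for the bigger model.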
Task-specific snapshots
• Coding: many say coding wants more bits; one commenter calls 2-bit quant probably fine for writing or conversation but would not use anything below Q5 for coding, and another anecdote mentions GLM 4.6 offering a tag to disable thinking. [1]
• Writing / conversation: 2-bit quant can be adequate, and some users even welcome the looser ideation in creative writing that comes with a higher tendency to hallucinate. [1]
• Reasoning / math: heavier quantization can weaken instruction-following and make correctness less reliable in some cases, though creative roles can benefit from the looser adherence. [1]
Outliers and anecdotes
• Specialist models (like QwenCoder) sometimes suffer less from quantization than generalist models; QwenCoder seems to hold up for coding because it is tuned for that domain. [1]
• A striking data point: a 32B model at Q4 outperforms a 235B model at Q2, hinting that size is not the only predictor of usefulness at a given quant level (rough memory arithmetic below). [1]
• Hardware notes: setups like 8x P100 reach about 8 t/s, workable for one-off prompts but not ideal for live chat. [1]
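Applying the same rough footprint arithmetic as above (nominal bit widths; real GGUF quants add some overhead), a 32B model at Q4 needs about 32 × 4 / 8 = 16 GB of weights, while a 235B model at Q2 needs about 235 × 2 / 8 ≈ 59 GB. If the smaller model also produces better output, as the anecdote claims, the larger model loses on both quality and memory at that quant level.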
Takeaway: there may not be a universal rule. Quantization interacts with task and target model in nuanced ways, and more empirical data is needed to map the landscape.
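One way to start filling that gap is a simple side-by-side run of two quants on the same prompt. Below is a minimal sketch using llama-cpp-python; the GGUF file names and the prompt are placeholders, and a real comparison would want a larger task set and some scoring beyond eyeballing the output.

```python
# Minimal A/B sketch: run the same prompt through two quantized GGUF builds
# with llama-cpp-python. File names are placeholders; swap in the quants you
# actually want to compare (e.g. a 32B Q4 build vs a 235B Q2 build).
from llama_cpp import Llama

PROMPT = "Write a Python function that merges two sorted lists."

for path in ["model-32b-q4_k_m.gguf", "model-235b-q2_k.gguf"]:
    llm = Llama(model_path=path, n_ctx=4096, n_gpu_layers=-1, verbose=False)
    out = llm(PROMPT, max_tokens=256, temperature=0.2)
    print(f"=== {path} ===")
    print(out["choices"][0]["text"].strip())
    del llm  # free the model before loading the next one
```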
References
[1] “We know the rule of thumb… large quantized models outperform smaller less quantized models, but is there a level where that breaks down?” Discussion thread on large vs small models at various quantization levels: experiences, trade-offs, and task differences (coding, writing, reasoning), without empirical data.