Legal AI in Oct 2025 is a showdown among 20B–120B LLMs for drafting, RAG, and knowledge tasks, with censorship quirks and latency shaping what actually ships [1]. Here’s the lay of the land from hands-on testing.
Model contenders • Qwen3 (30B MOE, 32B) — fast and popular, but struggles with subtler drafting nuances and commentary [1]. • Gemma3-27b — stronger latent legal knowledge, yet wobblers on instruction-following in drafting [1]. • Llama3.3-70b (4-bit) and distills like Cogito — still holds up well on legal knowledge and clause drafting, despite being a bit dated [1]. • Magistral 24b — slightly lousier than Gemma3 for English nuance in practice [1]. • GLM 4.5-Air — 115b, quantized to 4-bit/8-bit; overall performance trails Llama3-70b in knowledge and drafting tasks [1]. • GPT-OSS-20B and GPT-OSS-120B — strongest knowledge and instruction-following when you can bypass censorship; prompting to frame it as assisting a qualified attorney helps [1].
Censorship vs. capability Censorship gates are real: the OSS models shine once you nudge them into assisting a qualified attorney, underscoring the safety vs. performance trade-off [1].
Latency and quantization 4-bit quantization is common for smaller deployments; the 8-bit approach can be on par with the 4-bit setup in some tests, affecting speed and memory use [1].
Bottom line: human in the loop No single model dominates every legal task. The pattern is mix-and-match, with careful prompts and human review for high-stakes drafting, as seen when editors rely on qualified attorneys to verify AI outputs [1].
References
What are the best models for legal work in Oct 2025?
User compares 20B–120B LLMs for legal tasks—drafting, RAG, knowledge; notes censorship, latency; ranks models; cautions on accuracy and human oversight.
View source