AI-generated SQL isn't the slam dunk it once seemed. Sonnet 4.5 ranks #25 among Claude models in a SQL-generation benchmark, a sharp reminder that AI-suggested queries can miss the mark on both accuracy and speed [1].
AI-Generated Queries under the Microscope

The test runs on a 200-million-record dataset from the GitHub Archive and shows that Sonnet 4.5 can produce SQL, but its results trail those of other Claude models [1]. The takeaway: don't trust generated SQL in isolation. Tie it to the actual execution plan, because the plan is what makes a query fast or slow in the real world [1].
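One way to act on that advice is to check the plan programmatically before trusting a generated query. A minimal sketch using Python's built-in sqlite3; the `events` table, index names, and the `has_full_scan` helper are hypothetical stand-ins, not anything from the benchmark itself:

```python
import sqlite3

# In-memory stand-in for a large events table (hypothetical schema).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, actor TEXT, repo TEXT, type TEXT)")
con.execute("CREATE INDEX idx_events_actor ON events(actor)")

def plan_details(sql: str, params=()) -> list:
    """Return the human-readable steps of SQLite's query plan."""
    rows = con.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[3] for row in rows]  # column 3 holds the 'detail' text

def has_full_scan(sql: str, params=()) -> bool:
    """Flag plans that fall back to scanning the whole table."""
    return any(detail.startswith("SCAN") for detail in plan_details(sql, params))

# An AI-suggested query can look plausible yet hide a full table scan,
# because the OR branch on the unindexed 'type' column defeats the index.
generated = "SELECT * FROM events WHERE actor = ? OR type = ?"
print(plan_details(generated, ("torvalds", "PushEvent")))
print(has_full_scan(generated, ("torvalds", "PushEvent")))
```

The same gate works in CI: reject or flag generated queries whose plan contains a scan step before they reach a 200M-row table.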
Real-World Optimizers Push Back

A core theme is that ORs can be expensive. The discussion notes that generated SQL on its own is almost useless without understanding the execution plan [2]. It also flags practical patterns, like the extension pattern and indexing in Elasticsearch, that shape how data is read and searched. When ORM-generated SQL falls short, teams lean on options like Entity Framework with stored procedures for broad searches [2]. The historical quip that query planners were once seen as "AI" still echoes in these debates [2].
Bottom line: AI-assisted data work shines when it sparks human plan tuning, not when it pretends to replace it. The future is hybrid: generate ideas, verify with plans, then optimize the actual data path.
References
[1] Sonnet 4.5 ranks #25 (below other Claude models) in generating SQL. Benchmark of LLMs generating SQL from natural-language prompts on a 200M-record GitHub dataset; Sonnet 4.5 ranks 25th among Claude models.
[2] A SQL Heuristic: ORs Are Expensive. Discusses OR performance, indexing, bitmap-OR, query plans, and optimizer differences across PostgreSQL, MSSQL, and MySQL; evaluates strategies and lessons for performance work.