
Table formats under the lens: CSV, Markdown, YAML, JSON—Which do LLMs actually understand best across models?


How well LLMs understand tabular data varies more than you might expect by encoding: CSV, Markdown tables, KV-Markdown, YAML, JSON, HTML, and XML all land differently. A focused discussion tracks how these tabular encodings compare in accuracy and token cost across models [1].

Format-by-format snapshot:

- KV-Markdown – a dict-like wrapper with high semantic context; a top performer in the thread [1].
- INI – scores similarly high, per the same discussion [1].
- CSV and Markdown tables – among the weakest, described as index-based formats [1].
- JSON – middle ground: high context, but more syntactic noise and fewer clear record labels [1].
- HTML – surprisingly decent: uses th and td tags, yet lands better than JSON in this comparison [1].
- XML – like JSON but with even more noise; places above INI, and short element names can save tokens (a noted tactic) [1].
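To make the trade-offs above concrete, here is a minimal Python sketch rendering the same records in three of the formats: CSV and a Markdown table (the index-based encodings) versus KV-Markdown, taken here to mean one labeled block of `key: value` lines per record, which is how the thread characterizes it. The records and field names are invented for illustration.

```python
import csv
import io

rows = [
    {"name": "Ada", "role": "engineer", "score": 91},
    {"name": "Grace", "role": "admiral", "score": 88},
]

def to_csv(rows):
    # Index-based: column meaning lives only in the header row.
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

def to_markdown(rows):
    # Also index-based: values align to columns by position.
    headers = list(rows[0].keys())
    lines = [
        "| " + " | ".join(headers) + " |",
        "| " + " | ".join("---" for _ in headers) + " |",
    ]
    for r in rows:
        lines.append("| " + " | ".join(str(r[h]) for h in headers) + " |")
    return "\n".join(lines)

def to_kv_markdown(rows):
    # Dict-like: every value is restated next to its key,
    # trading extra tokens for per-record semantic context.
    blocks = []
    for i, r in enumerate(rows, 1):
        lines = [f"## record {i}"] + [f"{k}: {v}" for k, v in r.items()]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)

print(to_csv(rows))
print(to_markdown(rows))
print(to_kv_markdown(rows))
```

The KV-Markdown output repeats every field name per record, which is why it tends to cost more tokens but gives the model an explicit label for each value.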

Practical prompts and tips surfaced in the discussion:

- Short XML element names (for example, f instead of function, c instead of class) can trim token use; top/bottom legends help mapping without overloading context [1].
- The idea of testing with the OpenAI tokenizer is raised, underscoring token-count awareness [1].
- Some notes touch on tools like Tree-sitter for project structure tasks, hinting at how tooling choices intersect with data formatting [1].
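A small sketch of the short-element-name tactic, with invented XML content. Character counts are used here as a rough proxy only; the discussion's suggestion is to measure real token counts with the OpenAI tokenizer (e.g. the tiktoken library), since token boundaries do not map one-to-one to characters.

```python
# Verbose vs. abbreviated XML encodings of the same (hypothetical) record.
verbose = "<function><name>parse</name><class>Lexer</class></function>"
short = "<f><n>parse</n><c>Lexer</c></f>"

# A legend placed at the top (or bottom) of the prompt maps the
# abbreviations back to their meanings without bloating every record:
legend = "legend: f=function, n=name, c=class"

# Rough proxy for savings; exact figures require an actual tokenizer.
savings = len(verbose) - len(short)
print(f"{savings} fewer characters per record")
```

The legend is paid for once per prompt, while the per-record savings compound with every row, so the tactic wins once the table is more than a few records long.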

Model variability and caveats:

- Results are described as highly parameter- and architecture-dependent; the same formats can shift in accuracy and cost across model families [1].
- Tests referenced GPT-4.1 nano; the authors warn that results would differ with other models such as Claude [1].

Closing thought: there’s no one-size-fits-all—experiment with formats per model and keep an eye on token budgets as models evolve [1].

References

[1] HackerNews, "Which table format do LLMs understand best?" Explores tabular formats (CSV, Markdown, KV, YAML, JSON/HTML/XML) for LLM understanding; discusses model variation, accuracy, token cost, and comparisons across models.

