
Cost-efficient LLM interfacing: serving Markdown via Accept header to cut inference costs


A trick gaining traction in LLM interfacing: use the Accept header to serve Markdown instead of HTML to AI models, skipping HTML/CSS processing and cutting inference costs. Humans get HTML; bots get Markdown. [1]

What it is

A client can request text/markdown or text/plain via the Accept header, letting the server skip HTML attributes and CSS when content is sent to an LLM. This keeps model work focused on the plain text that matters. [1]
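To make the negotiation concrete, here is a minimal sketch of the selection logic in Python. The function name and the naive substring matching are illustrative assumptions, not from the original post; a real server would honor q-values and wildcards.

```python
# A minimal sketch of Accept-header negotiation. The naive substring matching
# is an illustrative simplification; production servers should honor q-values
# and wildcards, e.g. via a framework's content-negotiation helpers.
def choose_representation(accept_header: str) -> str:
    """Return the media type to serve for a given Accept header."""
    for media_type in ("text/markdown", "text/plain"):
        if media_type in accept_header:
            return media_type
    return "text/html"  # default: browsers and unknown clients get HTML

assert choose_representation("text/markdown, */*;q=0.8") == "text/markdown"
assert choose_representation("text/html,application/xhtml+xml") == "text/html"
```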

Practical steps

• Send Vary: Accept so caches don’t mix the Markdown and HTML representations of the same URL. [1]
• Expose the alternate with a Link: …; rel="alternate"; type="text/markdown" header so it’s discoverable by clients and tooling (both headers appear in the sketch below). [1]
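As an illustration, a small Flask handler can wire up both headers alongside the negotiation. This is a sketch, assuming Flask; the /page route, the PAGE_MD content, and the stand-in HTML rendering are placeholders, not details from the original discussion.

```python
# Sketch: content negotiation plus the Vary and Link headers in one handler.
# Route, content, and HTML rendering are illustrative assumptions.
from flask import Flask, Response, request

app = Flask(__name__)

PAGE_MD = "# Example page\n\nSame content, two representations.\n"

@app.route("/page")
def page():
    accept = request.headers.get("Accept", "")
    if "text/markdown" in accept or "text/plain" in accept:
        body, mimetype = PAGE_MD, "text/markdown"
    else:
        # Stand-in for a real Markdown-to-HTML render step
        body, mimetype = f"<!doctype html><pre>{PAGE_MD}</pre>", "text/html"
    resp = Response(body, mimetype=mimetype)
    # Vary: Accept tells shared caches the body depends on the Accept header
    resp.headers["Vary"] = "Accept"
    # Link advertises the Markdown alternate so clients/tooling can discover it
    resp.headers["Link"] = '</page>; rel="alternate"; type="text/markdown"'
    return resp
```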

Tradeoffs

• Browser support is spotty; real traction requires broad native browser support, otherwise platforms like WordPress, Wikipedia, and Ghost won’t adopt it. [1]
• Humans get HTML, bots get Markdown: the separation is intentional, but it creates UX differences that teams should plan for. [1]
• Rendering Markdown to DOM on the client adds latency, but keeps deployment simple (no build step involving pandoc or similar). [1]

Comparison to HTML workflows

Compared to serving full HTML, this approach reduces inference compute: the model never has to tokenize HTML attributes or CSS boilerplate, and can focus on content semantics. [1]
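A rough token count of the same content in both forms illustrates where the savings come from. This sketch assumes the tiktoken library (pip install tiktoken); both snippet strings are contrived examples, not measured real pages.

```python
# Rough illustration of the cost argument: the same content as HTML vs.
# Markdown, counted with a GPT-style tokenizer. Example strings are contrived.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

html = ('<div class="post"><h1 class="title">Hello</h1>'
        '<p style="margin:0">Serve <em>Markdown</em> to models.</p></div>')
markdown = "# Hello\n\nServe *Markdown* to models.\n"

print("HTML tokens:    ", len(enc.encode(html)))
print("Markdown tokens:", len(enc.encode(markdown)))
```

The exact ratio depends on how markup-heavy a page is, but attributes, classes, and inline styles are tokens the model pays for without gaining any meaning.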

Closing thought

Watch browser support and caching strategies as this idea matures and gains broader adoption. [1]

References

[1] HackerNews, "Use the Accept Header to Serve Markdown Instead of HTML to LLMs". Discusses serving Markdown via the Accept header to LLMs to reduce inference cost; debates browser support, caching, SEO, and content negotiation.

