
Cost-efficient LLM interfacing: serving Markdown via Accept header to cut inference costs


A trick gaining traction in LLM interfacing: use the Accept header to serve Markdown instead of HTML to AI models, skipping HTML/CSS processing and cutting inference costs. Humans get HTML; bots get Markdown. [1]

What it is

A client can request text/markdown or text/plain via the Accept header, letting the server skip HTML attributes and CSS when content is sent to an LLM. This keeps model work focused on the plain text that matters. [1]
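To make the negotiation concrete, here is a minimal sketch of the selection logic in Python. The function name and the naive substring matching are illustrative assumptions, not from the original post; a real server would honor q-values and wildcards.

```python
# A minimal sketch of Accept-header negotiation. The naive substring matching
# is an illustrative simplification; production servers should honor q-values
# and wildcards, e.g. via a framework's content-negotiation helpers.
def choose_representation(accept_header: str) -> str:
    """Return the media type to serve for a given Accept header."""
    for media_type in ("text/markdown", "text/plain"):
        if media_type in accept_header:
            return media_type
    return "text/html"  # default: browsers and unknown clients get HTML

assert choose_representation("text/markdown, */*;q=0.8") == "text/markdown"
assert choose_representation("text/html,application/xhtml+xml") == "text/html"
```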

Practical steps

• Send Vary: Accept so caches don’t mix the Markdown and HTML representations of the same URL. [1]
• Expose the alternate with a Link: …; rel="alternate"; type="text/markdown" header so it’s discoverable by clients and tooling (both headers appear in the sketch below). [1]
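As an illustration, a small Flask handler can wire up both headers alongside the negotiation. This is a sketch, assuming Flask; the /page route, the PAGE_MD content, and the stand-in HTML rendering are placeholders, not details from the original discussion.

```python
# Sketch: content negotiation plus the Vary and Link headers in one handler.
# Route, content, and HTML rendering are illustrative assumptions.
from flask import Flask, Response, request

app = Flask(__name__)

PAGE_MD = "# Example page\n\nSame content, two representations.\n"

@app.route("/page")
def page():
    accept = request.headers.get("Accept", "")
    if "text/markdown" in accept or "text/plain" in accept:
        body, mimetype = PAGE_MD, "text/markdown"
    else:
        # Stand-in for a real Markdown-to-HTML render step
        body, mimetype = f"<!doctype html><pre>{PAGE_MD}</pre>", "text/html"
    resp = Response(body, mimetype=mimetype)
    # Vary: Accept tells shared caches the body depends on the Accept header
    resp.headers["Vary"] = "Accept"
    # Link advertises the Markdown alternate so clients/tooling can discover it
    resp.headers["Link"] = '</page>; rel="alternate"; type="text/markdown"'
    return resp
```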

Tradeoffs

• Browser support is spotty; real traction requires broad native browser support, otherwise platforms like WordPress, Wikipedia, and Ghost won’t adopt it. [1]
• Humans get HTML, bots get Markdown: the separation is intentional, but it creates UX differences that teams should plan for. [1]
• Rendering Markdown to DOM on the client adds latency, but keeps deployment simple (no build step involving pandoc or similar). [1]

Comparison to HTML workflows

Compared to serving full HTML, this approach reduces inference compute: the model never has to tokenize HTML attributes or CSS boilerplate, and can focus on content semantics. [1]
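A rough token count of the same content in both forms illustrates where the savings come from. This sketch assumes the tiktoken library (pip install tiktoken); both snippet strings are contrived examples, not measured real pages.

```python
# Rough illustration of the cost argument: the same content as HTML vs.
# Markdown, counted with a GPT-style tokenizer. Example strings are contrived.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

html = ('<div class="post"><h1 class="title">Hello</h1>'
        '<p style="margin:0">Serve <em>Markdown</em> to models.</p></div>')
markdown = "# Hello\n\nServe *Markdown* to models.\n"

print("HTML tokens:    ", len(enc.encode(html)))
print("Markdown tokens:", len(enc.encode(markdown)))
```

The exact ratio depends on how markup-heavy a page is, but attributes, classes, and inline styles are tokens the model pays for without gaining any meaning.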

Closing thought

Watch browser support and caching strategies as this idea matures and gains broader adoption. [1]

References

[1] HackerNews, "Use the Accept Header to Serve Markdown Instead of HTML to LLMs". Discusses serving Markdown via the Accept header to LLMs to reduce inference cost; debates browser support, caching, SEO, and content negotiation.

