
Optical Context and OCR: A New Path to Efficient LLM Contexts


Optical context is reshaping how LLMs take input. The chatter around DeepSeek OCR and experimental optical encoders points to fitting the same content into far fewer tokens without losing meaning [1].

DeepSeek OCR in Action

DeepSeek OCR introduces a Contexts Optical Compression module that compresses visual tokens between the vision encoder and the MoE language decoder [1]. In tests, it delivers roughly 97% OCR precision at under 10x compression and about 60% at 20x [1]. This sparks a broader question: could pixels be a denser input for LLMs than text tokens?
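To make the idea concrete, here is a minimal sketch of what such a compression stage could look like: a module that merges patch tokens from a vision encoder into far fewer tokens before they reach the language decoder. The module name, the strided-convolution design, and the 16x pooling factor are illustrative assumptions, not DeepSeek OCR's actual architecture.

```python
# Illustrative sketch only: reduce the number of visual tokens between a
# vision encoder and a language decoder. Not DeepSeek OCR's real module.
import torch
import torch.nn as nn

class OpticalContextCompressor(nn.Module):
    def __init__(self, vis_dim: int, llm_dim: int, pool: int = 4):
        super().__init__()
        # Strided conv merges each pool x pool block of patch tokens into one token.
        self.down = nn.Conv2d(vis_dim, vis_dim, kernel_size=pool, stride=pool)
        self.proj = nn.Linear(vis_dim, llm_dim)  # map into the decoder's hidden width

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: [batch, n_patches, vis_dim] from the vision encoder,
        # where n_patches forms a square h x w patch grid.
        b, n, d = tokens.shape
        h = w = int(n ** 0.5)
        grid = tokens.transpose(1, 2).reshape(b, d, h, w)
        compressed = self.down(grid).flatten(2).transpose(1, 2)  # [b, n/pool^2, d]
        return self.proj(compressed)  # fewer, decoder-ready visual tokens

# A 1024-token page image collapses to 64 tokens at pool=4 (16x fewer tokens).
x = torch.randn(1, 1024, 768)
print(OpticalContextCompressor(768, 2048)(x).shape)  # torch.Size([1, 64, 2048])
```

That 16x-style reduction on the visual side is the kind of ratio the compression claims above refer to.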

Optical Encoders for VLMs

A recent patch experiment attaches an optical encoder to Qwen3-VLM-2B-Instruct, using a custom adapter to bridge the differing input dimensions. Early results on a synthetic LongBench V2 benchmark show slight gains over the original encoder, hinting at scalability beyond small models [2].
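The interesting engineering detail is the adapter: the donor encoder and the target language model disagree on hidden size, so something has to project between them. A hedged sketch of that bridge is below; the 1280-to-2048 dimensions and the two-layer MLP design are assumptions for illustration, not the experiment's actual code.

```python
# Illustrative adapter between a foreign optical encoder and a language model
# whose embedding widths differ. Dimensions are assumed for the example.
import torch
import torch.nn as nn

class EncoderAdapter(nn.Module):
    def __init__(self, enc_dim: int = 1280, lm_dim: int = 2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(enc_dim),      # stabilize features from the donor encoder
            nn.Linear(enc_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),  # final width matches the LM's token embeddings
        )

    def forward(self, visual_tokens: torch.Tensor) -> torch.Tensor:
        # visual_tokens: [batch, n_tokens, enc_dim] -> [batch, n_tokens, lm_dim]
        return self.net(visual_tokens)

# The adapted tokens can then be interleaved with text embeddings at the LM input.
tokens = torch.randn(1, 256, 1280)
print(EncoderAdapter()(tokens).shape)  # torch.Size([1, 256, 2048])
```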

Un-LOCC and the Context Economy

Un-LOCC encodes long text as compact images and lets a vision-language model decompress them. In tests, Gemini 2.5-Flash-Lite reaches 100% accuracy at 1.3:1 compression and ~93.65% at 2.8:1, while Qwen2.5-VL-72B-Instruct hits 99.26% at 1.7:1 and ~75.56% at 2.3:1 [4][5]. Similar work with Qwen3-VL-235B-a22b-Instruct shows 95.24% at 2.2:1 and ~82.22% at 2.8:1 [4][5]. The upshot: cheaper context, no tokenizer tinkering, and easy composition with retrieval and multimodal workflows [4].
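For intuition, here is a minimal sketch of the encode side, assuming a simple PIL-based renderer: long text is drawn into a dense image that gets attached to the VLM request in place of the raw text. The font, resolution, and layout are illustrative choices, not the settings used in the cited experiments.

```python
# Illustrative encode step for image-based context: render long text to a PNG
# that a vision-language model can read back. Parameters are assumptions.
from PIL import Image, ImageDraw, ImageFont
import textwrap

def text_to_image(text: str, width_px: int = 1024,
                  chars_per_line: int = 120, line_height: int = 16) -> Image.Image:
    # Wrap the text and draw it onto a white canvas; a real pipeline would tune
    # the font and DPI to maximize characters per image token.
    font = ImageFont.load_default()
    lines = textwrap.wrap(text, width=chars_per_line) or [""]
    img = Image.new("RGB", (width_px, line_height * len(lines) + 10), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((5, 5 + i * line_height), line, fill="black", font=font)
    return img

# The saved image is then sent to the VLM with a prompt such as
# "Answer using the document shown in the image."
long_context = "example sentence " * 500   # stand-in for a long document
text_to_image(long_context).save("context.png")
```

The compression ratio then comes from the VLM spending fewer image tokens on that picture than it would spend text tokens on the raw document.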

Closing thought: if image-encoded context proves robust, it could become a mainstream path to leaner, faster LLMs.

References

[1] Reddit — "Deepseek OCR : High Compression Focus, But Is the Core Idea New? + A Thought on LLM Context Compression[D]". The paper proposes contexts optical compression for visual tokens in LLMs; the author argues the core idea is not new and compares it to text-token compression.

[2] Reddit — "Experimental Optical Encoder for Qwen3-VLM-2B-Instruct". Proposes transferring the optical encoder from DeepSeek OCR to Qwen-VLM via a patched adapter, reports slight gains on LongBench V2, and asks for validation.

[4] Reddit — "[R] Un-LOCC (Universal Lossy Optical Context Compression), Achieve Up To 3× context compression with 93.65% Accuracy." Proposes encoding long text context as images decoded by VLMs, reporting compression and accuracy metrics across experiments with multiple models.

[5] Reddit — "Un-LOCC (Universal Lossy Optical Context Compression), Achieve Up To 3× context compression with 93.65% Accuracy." Proposes Un-LOCC: encode long text as images; evaluates multiple LLMs and multimodal models; discusses OCR-style decoding and token-compression trade-offs.
