Optical context is reshaping LLM inputs. The chatter around DeepSeek OCR and experimental optical encoders points toward representing long context with far fewer tokens while preserving meaning [1].
DeepSeek OCR in Action
DeepSeek OCR introduces a Contexts Optical Compression module that compresses visual tokens between the vision encoder and the MoE language decoder [1]. In tests, it delivers 97% OCR precision at under 10x compression and ~60% at 20x [1]. This sparks a broader question: could pixels be a denser input than text tokens for LLMs?
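The post doesn't detail the module's internals, but the shape of the idea can be sketched as a learned downsampler sitting between the vision encoder and the decoder. The strided-convolution merge, the 1024-dim tokens, and the 10x ratio below are illustrative assumptions, not DeepSeek's actual design.

```python
# Hypothetical sketch of token-level optical compression (not DeepSeek's actual module).
import torch
import torch.nn as nn

class VisualTokenCompressor(nn.Module):
    """Downsample a sequence of visual tokens before the language decoder.

    A strided 1D convolution merges every `ratio` neighbouring visual tokens
    into one, so a page encoded as N tokens reaches the decoder as ~N/ratio.
    """
    def __init__(self, dim: int = 1024, ratio: int = 10):
        super().__init__()
        self.merge = nn.Conv1d(dim, dim, kernel_size=ratio, stride=ratio)

    def forward(self, vis_tokens: torch.Tensor) -> torch.Tensor:
        # vis_tokens: [batch, num_tokens, dim] from the vision encoder
        x = vis_tokens.transpose(1, 2)   # [batch, dim, num_tokens]
        x = self.merge(x)                # [batch, dim, num_tokens // ratio]
        return x.transpose(1, 2)         # compressed tokens handed to the decoder

tokens = torch.randn(1, 1000, 1024)      # e.g. one dense document page
compressed = VisualTokenCompressor()(tokens)
print(compressed.shape)                  # torch.Size([1, 100, 1024]) -> ~10x fewer tokens
```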
Optical Encoders for VLMs
A recent patch experiment attaches an optical encoder to Qwen3-VLM-2B-Instruct, using a custom adapter to bridge the differing input dimensions. Early results on a synthetic Longbench V2 benchmark show slight gains over the original encoder, hinting at scalability beyond small models [2].
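As a rough illustration of what such an adapter might look like, here is a hypothetical projection module that maps the optical encoder's feature dimension into the language model's embedding dimension. Both dimensions (1280 and 2048) and the two-layer MLP design are assumptions; the post doesn't specify the adapter's architecture.

```python
# Hypothetical adapter sketch: project optical-encoder features into the
# embedding space expected by a small VLM's language model.
import torch
import torch.nn as nn

class OpticalAdapter(nn.Module):
    def __init__(self, encoder_dim: int = 1280, lm_dim: int = 2048):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(encoder_dim, lm_dim),
            nn.GELU(),
            nn.Linear(lm_dim, lm_dim),
        )

    def forward(self, encoder_out: torch.Tensor) -> torch.Tensor:
        # encoder_out: [batch, num_visual_tokens, encoder_dim]
        # returns embeddings the language model can consume alongside text tokens
        return self.proj(encoder_out)

features = torch.randn(1, 256, 1280)    # output of the swapped-in optical encoder
lm_inputs = OpticalAdapter()(features)  # [1, 256, 2048]
```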
Un-LOCC and the context economy
Un-LOCC encodes long text as compact images and lets a vision-language model decompress them. In tests, Gemini 2.5-Flash-Lite reaches 100% accuracy at a 1.3:1 compression ratio and ~93.65% at 2.8:1, while Qwen2.5-VL-72B-Instruct hits 99.26% at 1.7:1 and ~75.56% at 2.3:1 [4][5]. Similar work with Qwen3-VL-235B-a22b-Instruct shows 95.24% at 2.2:1 and ~82.22% at 2.8:1 [4][5]. The upshot: cheaper context, no tokenizer tinkering, and easy composition with retrieval and multimodal workflows [4].
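A minimal sketch of the encoding half of this idea, assuming a plain PIL text-rendering step: long text is wrapped onto a dense white canvas, and the resulting image stands in for the raw text in the VLM prompt. The font, line width, and downstream prompt wording are illustrative choices, not the authors' exact recipe.

```python
# Sketch: render long text as a compact image for a VLM to "decompress".
import textwrap
from PIL import Image, ImageDraw, ImageFont

def text_to_image(text: str, width_px: int = 768, chars_per_line: int = 110) -> Image.Image:
    font = ImageFont.load_default()      # in practice a small truetype font controls density
    lines = textwrap.wrap(text, width=chars_per_line)
    line_height = 14
    img = Image.new("RGB", (width_px, line_height * max(len(lines), 1) + 8), "white")
    draw = ImageDraw.Draw(img)
    for i, line in enumerate(lines):
        draw.text((4, 4 + i * line_height), line, fill="black", font=font)
    return img

context_img = text_to_image("...long retrieved context goes here..." * 50)
context_img.save("context.png")
# The image then replaces the raw text in the VLM prompt, e.g.:
#   "Using only the text shown in the attached image, answer: <question>"
```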
Closing thought: if image-encoded context proves robust, it could become a mainstream path to leaner, faster LLMs.
References
[1] Deepseek OCR: High Compression Focus, But Is the Core Idea New? + A Thought on LLM Context Compression. Paper proposes contexts optical compression for visual tokens in LLMs; the author argues the core idea is not new and compares it to text-token compression.
[2] Experimental Optical Encoder for Qwen3-VLM-2B-Instruct. Proposes transferring the optical encoder from DeepSeek OCR to Qwen-VLM, patches an adapter, reports slight gains on Longbench V2, and seeks validation.
[4] Un-LOCC (Universal Lossy Optical Context Compression): Achieve Up To 3× Context Compression with 93.65% Accuracy. Proposes encoding long text context as images decoded by VLMs, with compression and accuracy metrics across multiple models.
[5] Un-LOCC (Universal Lossy Optical Context Compression): Achieve Up To 3× Context Compression with 93.65% Accuracy. Encodes long text as images; evaluates multiple LLMs and multimodal models; discusses OCR-like decoding, token compression, and trade-offs.