RAG pipelines in practice means doc-heavy workflows that actually ship. Two threads spotlight real-world playbooks, from Extend's mastery of messy documents to an end-to-end invoice RAG that stitches several models together.
Extend's doc-processor toolkit ingests PDFs, images, and Excel files, handling handwriting and large tables. It adds an agentic OCR-correction layer that uses a VLM to fix low-confidence OCR output, plus a semantic chunking engine that carves documents into model-friendly pieces. A prompt-optimization agent automates the endless prompt tuning in the background, and all of it ships today in production across customer use cases [1].
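The thread describes the OCR-correction layer only at a high level, so the following is a minimal sketch of the general pattern rather than Extend's implementation: route OCR spans whose confidence falls below a threshold, together with the page image, to a vision model for a second pass. The OpenAI-compatible client, model name, threshold, and span schema are all assumptions.

```python
# Sketch of VLM-assisted OCR correction (not Extend's actual code).
# Assumes an OpenAI-compatible chat endpoint serving a vision-capable model;
# the model name, confidence threshold, and prompt are placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads API key / base_url from the environment

CONFIDENCE_THRESHOLD = 0.85  # OCR spans below this get a second look


def correct_low_confidence_spans(page_image_path: str, ocr_spans: list[dict]) -> list[dict]:
    """Re-check low-confidence OCR spans against the page image with a VLM."""
    with open(page_image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    corrected = []
    for span in ocr_spans:
        if span["confidence"] >= CONFIDENCE_THRESHOLD:
            corrected.append(span)
            continue
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any vision-capable model
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"The OCR engine read a region of this page as "
                             f"'{span['text']}' with low confidence. "
                             f"Return only the corrected text."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }],
        )
        corrected.append({**span, "text": resp.choices[0].message.content.strip()})
    return corrected
```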
End-to-end invoice RAG leans on a multi-LLM flow that stitches content from diverse sources into one responsive pipeline (a sketch of the assembly step follows the list):
- Parse source PDFs with MinerU2.5-2509-1.2B, Docling Accurate, and PyMuPDF
- Assemble the outputs into a single Markdown file with RFC 5322 metadata
- Overlay a Qwen2.5-VL-7B-Instruct pass to improve character accuracy on images of the PDFs
- Feed the result to GPT-OSS-20B, which calls MCP tools to query SQL reports and enrich the JSON; the PDFs stay in a reference folder [2]
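The thread lists the stages but no code, so here is a minimal sketch of just the assembly step under stated assumptions: extract text with PyMuPDF and wrap it in a single Markdown file with RFC 5322-style headers. The MinerU/Docling parses, the Qwen2.5-VL overlay, and the GPT-OSS-20B/MCP enrichment appear only as comments, and the function and header names are illustrative.

```python
# Assembly step only: PyMuPDF extraction plus RFC 5322-style metadata headers
# in front of the Markdown body. Upstream parsers and downstream LLM stages
# from the thread are noted as comments, not implemented.
from datetime import datetime, timezone
from email.utils import format_datetime
from pathlib import Path

import fitz  # PyMuPDF


def pdf_to_markdown(pdf_path: Path, out_dir: Path) -> Path:
    doc = fitz.open(pdf_path)
    pages = [page.get_text("text") for page in doc]

    # RFC 5322-style "Key: Value" metadata block, as described in the thread.
    headers = "\n".join([
        f"Subject: {pdf_path.stem}",
        f"Date: {format_datetime(datetime.now(timezone.utc))}",
        f"X-Source-File: {pdf_path.name}",
        f"X-Page-Count: {len(pages)}",
    ])

    body = "\n\n".join(f"## Page {i + 1}\n\n{text}" for i, text in enumerate(pages))
    markdown = f"{headers}\n\n{body}\n"

    # Next stages (not implemented here): overlay a Qwen2.5-VL pass for
    # character accuracy, then hand the Markdown to GPT-OSS-20B, which calls
    # MCP tools to query SQL reports and enrich the extracted JSON.
    out_path = out_dir / f"{pdf_path.stem}.md"
    out_path.write_text(markdown, encoding="utf-8")
    return out_path
```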
Benchmarks and domain notes matter (a minimal evaluation sketch follows the list):
- Build a benchmark dataset first; expect lots of tweaking and careful retrieval tuning with rerankers [2]
- Use RagView for retrieval benchmarking [2]
- Domain-specific performance varies: ReasonScape shows mixed results across tasks like arithmetic, dates, and cars, underscoring the need to align model choice with the task domain [3]
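None of the threads share benchmark code, so the sketch below is a minimal, dependency-free take on the "build a benchmark dataset first" advice: score any retriever (with or without a reranker) on a hand-labeled query set using hit rate@k and MRR. RagView itself is a separate tool and is not shown; the retriever callable and labeled-set format are assumptions.

```python
# Minimal retrieval benchmark: hit rate@k and MRR over a hand-labeled query
# set. Pass in different retriever callables (baseline vs. reranked) and
# compare the scores before touching the generation side of the pipeline.
from typing import Callable


def evaluate_retrieval(
    retriever: Callable[[str, int], list[str]],   # (query, k) -> ranked doc ids
    labeled_queries: list[tuple[str, set[str]]],  # (query, relevant doc ids)
    k: int = 5,
) -> dict[str, float]:
    hits, reciprocal_ranks = 0, []
    for query, relevant in labeled_queries:
        ranked = retriever(query, k)
        # Rank of the first relevant document, if any appears in the top k.
        rank = next((i + 1 for i, d in enumerate(ranked) if d in relevant), None)
        if rank is not None:
            hits += 1
            reciprocal_ranks.append(1.0 / rank)
        else:
            reciprocal_ranks.append(0.0)
    n = len(labeled_queries)
    return {"hit_rate@k": hits / n, "mrr@k": sum(reciprocal_ranks) / n}
```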
Bottom line: real-world doc RAG succeeds where CV, LLM orchestration, and disciplined benchmarking all line up.
References
[1] Launch HN: Extend (YC W23) – Turn your messiest documents into data
Extend.ai launches its doc-processor toolkit, emphasizing CV, LLM context engineering, and tooling; it reports challenges with OCR, handwriting, and tables, and promises real-world pipelines.
[2] Document Processing for RAG question and answering, and automatic processing of incoming with Business Metadata
Describes RAG for invoices using several LLMs (Qwen, GPT-OSS), emphasizing benchmarks, retrieval quality, and tool chaining, with RagView.
[3] ReasonScape Evaluation: AI21 Jamba Reasoning vs Qwen3 4B vs Qwen3 4B 2507
ReasonScape evaluates Jamba 3B against Qwen3-4B OG and 2507, highlighting truncation issues and selective domain strengths, with a personal critique of performance.