RAG pipelines in practice means doc-heavy workflows that actually ship. Two threads spotlight real-world playbooks, from Extend's mastery of messy documents to an end-to-end invoice RAG that stitches several models together.
Extend's doc-processor toolkit ingests PDFs, images, and Excel files, handling handwriting and large tables. It adds an agentic OCR-correction layer that uses a VLM to fix low-confidence OCR output, plus a semantic chunking engine that carves documents into model-friendly pieces. A prompt-optimization agent automates the endless prompt tuning in the background, and all of it ships today in production across customer use cases [1].
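The thread describes the OCR-correction layer only at a high level, so the following is a minimal sketch of the general pattern rather than Extend's implementation: route OCR spans whose confidence falls below a threshold, together with the page image, to a vision model for a second pass. The OpenAI-compatible client, model name, threshold, and span schema are all assumptions.

```python
# Sketch of VLM-assisted OCR correction (not Extend's actual code).
# Assumes an OpenAI-compatible chat endpoint serving a vision-capable model;
# the model name, confidence threshold, and prompt are placeholders.
import base64
from openai import OpenAI

client = OpenAI()  # reads API key / base_url from the environment

CONFIDENCE_THRESHOLD = 0.85  # OCR spans below this get a second look


def correct_low_confidence_spans(page_image_path: str, ocr_spans: list[dict]) -> list[dict]:
    """Re-check low-confidence OCR spans against the page image with a VLM."""
    with open(page_image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    corrected = []
    for span in ocr_spans:
        if span["confidence"] >= CONFIDENCE_THRESHOLD:
            corrected.append(span)
            continue
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder; any vision-capable model
            messages=[{
                "role": "user",
                "content": [
                    {"type": "text",
                     "text": f"The OCR engine read a region of this page as "
                             f"'{span['text']}' with low confidence. "
                             f"Return only the corrected text."},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
                ],
            }],
        )
        corrected.append({**span, "text": resp.choices[0].message.content.strip()})
    return corrected
```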
End-to-end invoice RAG leans on a multi-LLM flow that stitches content from diverse sources into one responsive pipeline (a sketch of the assembly step follows the list):
- Parse source PDFs with MinerU2.5-2509-1.2B, Docling Accurate, and PyMuPDF
- Assemble the outputs into a single Markdown file with RFC 5322 metadata
- Overlay a Qwen2.5-VL-7B-Instruct pass to improve character accuracy on images of the PDFs
- Feed the result to GPT-OSS-20B, which calls MCP tools to query SQL reports and enrich the JSON; the PDFs stay in a reference folder [2]
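The thread lists the stages but no code, so here is a minimal sketch of just the assembly step under stated assumptions: extract text with PyMuPDF and wrap it in a single Markdown file with RFC 5322-style headers. The MinerU/Docling parses, the Qwen2.5-VL overlay, and the GPT-OSS-20B/MCP enrichment appear only as comments, and the function and header names are illustrative.

```python
# Assembly step only: PyMuPDF extraction plus RFC 5322-style metadata headers
# in front of the Markdown body. Upstream parsers and downstream LLM stages
# from the thread are noted as comments, not implemented.
from datetime import datetime, timezone
from email.utils import format_datetime
from pathlib import Path

import fitz  # PyMuPDF


def pdf_to_markdown(pdf_path: Path, out_dir: Path) -> Path:
    doc = fitz.open(pdf_path)
    pages = [page.get_text("text") for page in doc]

    # RFC 5322-style "Key: Value" metadata block, as described in the thread.
    headers = "\n".join([
        f"Subject: {pdf_path.stem}",
        f"Date: {format_datetime(datetime.now(timezone.utc))}",
        f"X-Source-File: {pdf_path.name}",
        f"X-Page-Count: {len(pages)}",
    ])

    body = "\n\n".join(f"## Page {i + 1}\n\n{text}" for i, text in enumerate(pages))
    markdown = f"{headers}\n\n{body}\n"

    # Next stages (not implemented here): overlay a Qwen2.5-VL pass for
    # character accuracy, then hand the Markdown to GPT-OSS-20B, which calls
    # MCP tools to query SQL reports and enrich the extracted JSON.
    out_path = out_dir / f"{pdf_path.stem}.md"
    out_path.write_text(markdown, encoding="utf-8")
    return out_path
```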
Benchmarks and domain notes matter (a minimal evaluation sketch follows the list):
- Build a benchmark dataset first; expect lots of tweaking and careful retrieval tuning with rerankers [2]
- Use RagView for retrieval benchmarking [2]
- Domain-specific performance varies: ReasonScape shows mixed results across tasks like arithmetic, dates, and cars, underscoring the need to align model choice with the task domain [3]
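None of the threads share benchmark code, so the sketch below is a minimal, dependency-free take on the "build a benchmark dataset first" advice: score any retriever (with or without a reranker) on a hand-labeled query set using hit rate@k and MRR. RagView itself is a separate tool and is not shown; the retriever callable and labeled-set format are assumptions.

```python
# Minimal retrieval benchmark: hit rate@k and MRR over a hand-labeled query
# set. Pass in different retriever callables (baseline vs. reranked) and
# compare the scores before touching the generation side of the pipeline.
from typing import Callable


def evaluate_retrieval(
    retriever: Callable[[str, int], list[str]],   # (query, k) -> ranked doc ids
    labeled_queries: list[tuple[str, set[str]]],  # (query, relevant doc ids)
    k: int = 5,
) -> dict[str, float]:
    hits, reciprocal_ranks = 0, []
    for query, relevant in labeled_queries:
        ranked = retriever(query, k)
        # Rank of the first relevant document, if any appears in the top k.
        rank = next((i + 1 for i, d in enumerate(ranked) if d in relevant), None)
        if rank is not None:
            hits += 1
            reciprocal_ranks.append(1.0 / rank)
        else:
            reciprocal_ranks.append(0.0)
    n = len(labeled_queries)
    return {"hit_rate@k": hits / n, "mrr@k": sum(reciprocal_ranks) / n}
```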
Bottom line: real-world doc RAG succeeds where CV, LLM orchestration, and disciplined benchmarking all line up.
References
[1] Launch HN: Extend (YC W23) – Turn your messiest documents into data
Extend.ai launches its doc-processor toolkit, emphasizing CV, LLM context engineering, and tooling; it reports challenges with OCR, handwriting, and tables, and promises real-world pipelines.
[2] Document Processing for RAG question and answering, and automatic processing of incoming with Business Metadata
Describes RAG for invoices using several LLMs (Qwen, GPT-OSS), emphasizing benchmarks, retrieval quality, and tool chaining, with RagView.
[3] ReasonScape Evaluation: AI21 Jamba Reasoning vs Qwen3 4B vs Qwen3 4B 2507
ReasonScape evaluates Jamba 3B against Qwen3-4B OG and 2507, highlighting truncation issues and selective domain strengths, with a personal critique of performance.