In our sector, everyone publishes their successes. We are going to talk about a failure. Our first production RAG deployment was shut down after 6 weeks. Here is what happened, why, and how we rebuilt it properly the second time.

Too much confidence, not enough rigour

The project: an AI assistant for the technical support team of an industrial company, built on 4,000 pages of technical documentation. We had already built RAG systems internally, so we were confident. Too confident.

Key figures: 6 weeks before the first deployment was stopped; 3 errors identified in the post-mortem; V2 deployed in 4 weeks with 87% satisfaction.

Error 1: data quality first

We indexed all 4,000 pages as-is, without any preprocessing. The index ended up containing scanned documents with poor OCR, badly parsed tables, and technical diagrams with no textual context, so the retriever returned incoherent chunks.
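A cheap first line of defence is a quality gate that runs before indexing. The sketch below is an assumption on our part, not the preprocessing we actually shipped: it flags pages whose extracted text is too short (often a diagram or blank scan) or too garbled (typical of poor OCR). The names `assess_page` and `PageReport` and all three thresholds are hypothetical. It does nothing for badly parsed tables, which need a dedicated table extractor.

```python
import re
from dataclasses import dataclass

# Hypothetical thresholds: calibrate them on a hand-labelled sample of your own pages.
MIN_CHARS = 200         # very short pages are often diagrams or near-empty scans
MIN_ALPHA_RATIO = 0.6   # share of non-whitespace characters that are letters
MIN_WORD_RATIO = 0.7    # share of tokens that look like words or plain numbers

# A token is "word-like" if it starts with a letter, or is a plain number such as 25 or 3.5.
WORD_RE = re.compile(r"^[^\W\d_][\w'-]{0,24}$|^\d+(?:[.,]\d+)?$")


@dataclass
class PageReport:
    page_id: str
    keep: bool
    reason: str


def assess_page(page_id: str, text: str) -> PageReport:
    """Decide whether a page's extracted text is clean enough to index."""
    compact = "".join(text.split())
    if len(compact) < MIN_CHARS:
        return PageReport(page_id, False, "too little text (diagram or blank scan?)")
    alpha_ratio = sum(c.isalpha() for c in compact) / len(compact)
    if alpha_ratio < MIN_ALPHA_RATIO:
        return PageReport(page_id, False, f"letter ratio {alpha_ratio:.2f} (garbled OCR?)")
    tokens = [t.strip(".,;:()") for t in text.split()]
    word_ratio = sum(bool(WORD_RE.match(t)) for t in tokens) / len(tokens)
    if word_ratio < MIN_WORD_RATIO:
        return PageReport(page_id, False, f"word ratio {word_ratio:.2f} (garbled OCR?)")
    return PageReport(page_id, True, "ok")
```

Only pages with `keep=True` go to the indexer; the rejected ones, with their `reason`, become a to-do list for re-OCR or manual transcription instead of silently polluting the retriever.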

Error 2: default chunking

We used generic LangChain chunking: fixed windows of 1,000 characters with a 200-character overlap. For technical documentation with 15-step procedures, this was catastrophic: procedures were often cut in the middle of a critical step.
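To see why a size-driven split hurts procedures, here is a naive sliding-window chunker on a synthetic 15-step procedure of about 200 characters per step. This is a simplified stand-in, not LangChain's splitter: `RecursiveCharacterTextSplitter` also tries to break on paragraph and line separators, so its exact boundaries differ, but they are still driven by character count rather than by the procedure's structure. The function names and the sample text are illustrative.

```python
def fixed_size_chunks(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[tuple[int, int]]:
    """Return [start, end) character spans of a naive sliding-window split."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    spans = []
    for start in range(0, len(text), chunk_size - overlap):
        end = min(start + chunk_size, len(text))
        spans.append((start, end))
        if end == len(text):  # the last window reaches the end of the text
            break
    return spans


def steps_cut_by_boundaries(steps: list[str], sep: str = "\n", **kwargs) -> list[int]:
    """1-based numbers of the steps that a chunk's end position slices through."""
    text = sep.join(steps)
    step_spans, pos = [], 0
    for s in steps:
        step_spans.append((pos, pos + len(s)))
        pos += len(s) + len(sep)
    # Every chunk except the last one ends somewhere inside the document.
    chunk_ends = [end for _, end in fixed_size_chunks(text, **kwargs)][:-1]
    return [n for n, (a, b) in enumerate(step_spans, 1) if any(a < e < b for e in chunk_ends)]


# Synthetic 15-step procedure, 204 characters per step (illustrative only).
steps = [f"Step {i:02d}: " + "check valve. " * 15 for i in range(1, 16)]
procedure = "\n".join(steps)
chunks = [procedure[a:b] for a, b in fixed_size_chunks(procedure)]
print(len(chunks), any(procedure in c for c in chunks), steps_cut_by_boundaries(steps))
# → 4 False [5, 9, 13]
```

The overlap means a sliced step can reappear whole in the next window, but no single chunk ever holds the complete procedure, so the retriever can only hand the model fragments of it.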

The rule we learned: chunking must follow the semantic structure of the document, not an arbitrary character count. For technical documentation, that means chunking by section or by complete procedure.
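A minimal structure-aware chunker could look like the sketch below. It assumes preprocessing has already turned each document into text with Markdown-style headings (`# Title`) and numbered steps (`1. `); both patterns, the `max_chars` limit and the function names are hypothetical. Each section stays whole when it fits; an oversized section is split only between steps, and every part repeats the section title so it remains self-explanatory when retrieved alone.

```python
import re

HEADING_RE = re.compile(r"^#{1,3} ", re.MULTILINE)  # assumption: headings like "# Title"
STEP_RE = re.compile(r"^\d+\. ", re.MULTILINE)       # assumption: steps like "1. Do X"


def split_sections(text: str) -> list[str]:
    """Split at headings so each section keeps its title and body together."""
    starts = [m.start() for m in HEADING_RE.finditer(text)]
    if not starts or starts[0] != 0:
        starts.insert(0, 0)  # keep any text before the first heading as its own section
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:]) if text[a:b].strip()]


def chunk_by_procedure(text: str, max_chars: int = 4000) -> list[str]:
    chunks = []
    for section in split_sections(text):
        if len(section) <= max_chars:
            chunks.append(section)  # the whole procedure fits: one chunk
            continue
        title, _, body = section.partition("\n")
        step_starts = [m.start() for m in STEP_RE.finditer(body)]
        if not step_starts:
            chunks.append(section)  # no step structure to split on: keep it whole
            continue
        preamble = body[: step_starts[0]]
        pieces = [body[a:b] for a, b in zip(step_starts, step_starts[1:] + [len(body)])]
        head, group = title + "\n" + preamble, []
        for piece in pieces:
            # Close the current part before it overflows, but never split inside a step.
            if group and len(head) + sum(map(len, group)) + len(piece) > max_chars:
                chunks.append((head + "".join(group)).strip())
                head, group = title + " (continued)\n", []
            group.append(piece)
        chunks.append((head + "".join(group)).strip())
    return chunks
```

A single step longer than `max_chars` is deliberately left whole: an oversized chunk is usually less harmful than a half-instruction.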

Error 3: no systematic evaluation

We evaluated on 20 hand-written questions during development. In production, technicians asked 300 questions a day, far more varied than our test set. Without automatic evaluation, we did not see the degradation coming.
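Even before adopting a framework, a golden dataset plus two retrieval metrics catches many regressions. The sketch below is a plain-Python stand-in, not RAGAS: it only scores retrieval (hit rate and mean reciprocal rank over the top `k` chunks), whereas RAGAS also scores generated answers. The golden-item format, the `Retriever` signature and the metric floors are assumptions.

```python
from typing import Callable, Sequence

# Hypothetical retriever interface: (question, k) -> chunk ids, best match first.
Retriever = Callable[[str, int], Sequence[str]]


def evaluate_retrieval(golden: list[dict], retrieve: Retriever, k: int = 5) -> dict[str, float]:
    """Hit rate@k and mean reciprocal rank (MRR) over a golden question set.

    Each golden item is assumed to be {"question": str, "relevant_ids": set[str]}.
    """
    if not golden:
        raise ValueError("golden dataset is empty")
    hits, rr_sum = 0, 0.0
    for item in golden:
        ranked = list(retrieve(item["question"], k))[:k]
        # 1-based ranks of the retrieved chunks that are actually relevant
        ranks = [i for i, cid in enumerate(ranked, 1) if cid in item["relevant_ids"]]
        if ranks:
            hits += 1
            rr_sum += 1.0 / ranks[0]
    n = len(golden)
    return {"hit_rate": hits / n, "mrr": rr_sum / n}


def passes_gate(scores: dict[str, float], floors: dict[str, float]) -> bool:
    """True if every tracked metric is at or above its (hypothetical) minimum."""
    return all(scores[name] >= floor for name, floor in floors.items())
```

Run it on every re-index, chunking change or prompt change, and block the release when `passes_gate` returns `False`; a drop then shows up in the pipeline rather than in technicians' complaints.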

Version 2: what we changed

We preprocessed every document, chunked semantically by procedure section, evaluated automatically with RAGAS on a 200-question golden dataset, and monitored production with Langfuse. Result: V2 was deployed in 4 weeks and reached an 87% satisfaction rate.
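Offline evaluation has to be paired with a production signal. The sketch below does not use the Langfuse API; it assumes user feedback (thumbs up or down) has already been exported as `(day, is_positive)` pairs, computes a daily satisfaction rate, and flags days when the rolling average of the last `window` days falls below a floor. The export format, the 0.80 floor and the 3-day window are hypothetical choices.

```python
from collections import defaultdict
from datetime import date


def daily_satisfaction(feedback: list[tuple[date, bool]]) -> dict[date, float]:
    """Share of positive ratings per day, returned in date order."""
    counts = defaultdict(lambda: [0, 0])  # day -> [positive, total]
    for day, positive in feedback:
        counts[day][0] += int(positive)
        counts[day][1] += 1
    return {day: pos / total for day, (pos, total) in sorted(counts.items())}


def alerts(rates: dict[date, float], floor: float = 0.80, window: int = 3) -> list[date]:
    """Days on which the mean of the last `window` daily rates drops below `floor`."""
    days = list(rates)  # already sorted by daily_satisfaction
    flagged = []
    for i in range(window - 1, len(days)):
        recent = [rates[d] for d in days[i - window + 1 : i + 1]]
        if sum(recent) / window < floor:
            flagged.append(days[i])
    return flagged
```

A rolling window smooths out a single bad day with few ratings; the trade-off is that a genuine drop is reported a day or two later.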


With care,

Sylvie Wendkuni NITIEMA
Founder & Data Scientist · DataSAI