AI Agents in 2026: DataSAI Blog

Why our first RAG project failed, and what we learned

In our sector, everyone publishes their successes. We will tell you about a failure. Our first RAG deployment in production was stopped after 6 weeks. Here is what happened, why, and how we rebuilt everything correctly the second time.

Too much confidence, not enough rigour

The project: an AI assistant for the technical support team of an industrial company, built on 4,000 pages of technical documentation. We had built RAG systems internally before. We were confident. Too confident.

[Figure: RAG debugging. Identifying failure points in a complex RAG pipeline.]

6 weeks before the first deployment was stopped

3 errors identified post-mortem

V2 deployed in 4 weeks with 87% satisfaction

Error 1: data quality first

We indexed all 4,000 pages directly, without preprocessing. As a result, the index contained scanned documents with poor OCR, badly parsed tables, and technical diagrams without any textual context. The retriever returned incoherent chunks.
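One way to catch this kind of problem before indexing is a cheap quality gate on the extracted text. The sketch below is only an illustration under stated assumptions: the `text_quality` heuristic, the 0.8 threshold and the sample pages are hypothetical, not what we ran in production. It scores each page by the share of tokens that look like ordinary words or numbers and routes low-scoring pages to manual review instead of the index.

```python
import re

# Hypothetical heuristic: a token is "word-like" if it is letters with optional
# trailing punctuation, or a number with an optional unit and punctuation.
WORDLIKE = re.compile(
    r"^[A-Za-z][A-Za-z'\-]*[.,;:!?)]?$|^\d+([.,]\d+)?[A-Za-z%]*[.,;:]?$"
)


def text_quality(page_text: str) -> float:
    """Return the share of whitespace-separated tokens that look word-like."""
    tokens = page_text.split()
    if not tokens:
        return 0.0
    good = sum(1 for token in tokens if WORDLIKE.match(token))
    return good / len(tokens)


def split_by_quality(
    pages: dict[str, str], threshold: float = 0.8
) -> tuple[dict[str, str], dict[str, str]]:
    """Separate pages that are safe to index from pages that need review."""
    keep: dict[str, str] = {}
    review: dict[str, str] = {}
    for page_id, text in pages.items():
        target = keep if text_quality(text) >= threshold else review
        target[page_id] = text
    return keep, review


# Made-up sample pages: one clean extraction, one garbled OCR scan.
pages = {
    "manual-p12": "Tighten the bolt to 45 Nm, then check the seal.",
    "scan-p87": "T1ght#n th3 b@lt t0 ~~ |||| Nm ,,, ch€ck",
}
keep, review = split_by_quality(pages)
```

The threshold would need tuning on real pages. Tables and part-number lists, for example, can score low without being OCR garbage.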

Error 2: default chunking

We used LangChain's off-the-shelf character-based chunking, 1,000 characters with a 200-character overlap, without adapting it to our documents. For technical documentation with 15-step procedures, this was catastrophic: a procedure was often cut in the middle of a critical step.

The rule we learned: chunking must follow the semantic structure of the document, not an arbitrary character count. For technical docs: chunk by section or complete procedure.
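As an illustration of that rule, here is a minimal sketch of structure-based chunking in plain Python. It assumes that every procedure starts with a heading line of the form "Procedure N: title". That heading pattern, the `chunk_by_procedure` helper and the sample document are hypothetical, and real documentation will need its own pattern.

```python
import re

# Assumption: each procedure begins with a line such as
# "Procedure 12: Replace the filter".
HEADING = re.compile(r"^Procedure \d+:.*$", re.MULTILINE)


def chunk_by_procedure(document: str) -> list[str]:
    """Split a document so that each chunk holds one complete procedure."""
    starts = [match.start() for match in HEADING.finditer(document)]
    if not starts:
        # No recognisable structure: keep the document whole rather than cut it blindly.
        stripped = document.strip()
        return [stripped] if stripped else []

    chunks = []
    # Text before the first heading (an introduction, for example) is its own chunk.
    preamble = document[: starts[0]].strip()
    if preamble:
        chunks.append(preamble)
    # Each procedure runs from its heading to the next heading, or to the end.
    for begin, end in zip(starts, starts[1:] + [len(document)]):
        chunks.append(document[begin:end].strip())
    return chunks


doc = """Safety notes apply to all procedures.

Procedure 1: Replace the filter
1. Stop the pump.
2. Open the housing.
3. Swap the cartridge.

Procedure 2: Bleed the circuit
1. Open valve A.
2. Wait for a steady flow.
"""
chunks = chunk_by_procedure(doc)
```

With this approach, chunk length varies with the procedure, so very long procedures may still need a secondary split at step boundaries.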

Error 3: no systematic evaluation

We evaluated on 20 hand-written questions during development. In production, technicians asked 300 very different questions per day. Without automatic evaluation, we did not see the degradation coming.
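To make "automatic evaluation" concrete, the sketch below computes a retrieval hit rate over a small golden dataset. It is a deliberately simple stand-in, not the RAGAS API: the `hit_rate` function, the toy keyword retriever, the chunk ids and the questions are all hypothetical. The point is only that a score like this can be recomputed on every change, so a regression shows up before the users see it.

```python
from typing import Callable

# (question, id of the chunk that contains the answer)
GoldenItem = tuple[str, str]


def hit_rate(
    golden: list[GoldenItem],
    retrieve: Callable[[str, int], list[str]],
    k: int = 3,
) -> float:
    """Share of questions whose expected chunk appears in the top-k results."""
    if not golden:
        raise ValueError("golden dataset is empty")
    hits = sum(
        1 for question, expected_id in golden if expected_id in retrieve(question, k)
    )
    return hits / len(golden)


# Toy stand-in for a real retriever: ranks chunks by shared words.
CHUNKS = {
    "proc-1": "replace the filter cartridge in the pump housing",
    "proc-2": "bleed the hydraulic circuit using valve A",
    "proc-3": "calibrate the pressure sensor after maintenance",
}


def keyword_retrieve(question: str, k: int) -> list[str]:
    words = set(question.lower().split())
    ranked = sorted(
        CHUNKS,
        key=lambda chunk_id: len(words & set(CHUNKS[chunk_id].lower().split())),
        reverse=True,
    )
    return ranked[:k]


golden = [
    ("how do I replace the filter", "proc-1"),
    ("how to bleed the circuit", "proc-2"),
    ("steps to calibrate the sensor", "proc-3"),
    ("what torque for the flange bolts", "proc-4"),  # answer was never indexed
]
score = hit_rate(golden, keyword_retrieve, k=1)
```

The last question points at a chunk that is missing from the index, which is exactly the kind of gap a fixed golden dataset makes visible.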

Version 2: what we changed

Version 2 made four changes: full document preprocessing, semantic chunking by procedure section, automatic evaluation with RAGAS on a 200-question golden dataset, and production monitoring with Langfuse. Result: an 87% satisfaction rate in 4 weeks.

Tags: RAG, LangChain, Chunking, Evaluation, Post-mortem, Experience

With care,

Sylvie Wendkuni NITIEMA

Founder & Data Scientist · DataSAI

Reviews & Comments

24 comments

Average rating

★★★★★

4.8 / 5

James Carter 3 days ago

Excellent article, this matches exactly what we're seeing with our enterprise clients. The section on inference costs is especially valuable. It's a topic most articles gloss over but it's make-or-break at scale.

DataSAI TEAM 2 days ago

Thanks James! Inference cost optimization is often deprioritized during prototyping but becomes critical in production. Feel free to book a session if you'd like to go deeper on this.

Sarah Mitchell 5 days ago

Sharing this with my whole team. The distinction between an impressive demo and robust production is exactly the debate we're having internally right now. The human checkpoint advice is immediately actionable.

David Okonkwo 1 week ago

★★★★☆

Great article. I'd push back slightly on the 18-day deployment estimate. In our experience with enterprise security and GDPR requirements, 4–6 weeks is more realistic for a first production agent.

DataSAI TEAM 6 days ago

Completely fair point David. The 18 days refers to a scoped first agent in a test environment. For full enterprise production with security constraints, your estimate is accurate.


