A 300-person company deployed an internal AI assistant for its 8-person HR team, covering employee questions, interview summaries, and job offer drafting. Four months in, here are the results, surprises, and lessons.
Context and objectives
The HR team received an average of 120 questions per week from employees about leave, health insurance, training, pay, and onboarding. 80% of these questions were already answered in existing documentation, but the documents were scattered and hard to find.
The technical architecture
We chose a RAG (retrieval-augmented generation) architecture: HR documents are indexed in a vector database, and for each question the most relevant passages are retrieved and injected into the LLM context. Stack: LangChain, Chroma, GPT-4o, Microsoft Teams, deployed on Azure.
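To make the retrieve-then-inject flow concrete, here is a minimal sketch in plain Python. It is not our production code: a word-count cosine similarity stands in for the embeddings stored in Chroma, and the sample passages, the function names (retrieve, build_prompt), and the prompt wording are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical sample passages; the real corpus is the indexed HR documentation.
DOCS = [
    "Annual leave: employees accrue paid leave days each month worked.",
    "Health insurance: enrollment opens during the first month after hiring.",
    "Training: requests go through the manager and then the HR team.",
]


def vectorize(text):
    """Bag-of-words vector; a toy stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, docs, k=2):
    """Return the k passages most similar to the question."""
    q = vectorize(question)
    ranked = sorted(docs, key=lambda d: cosine(q, vectorize(d)), reverse=True)
    return ranked[:k]


def build_prompt(question, passages):
    """Inject the retrieved passages into the text sent to the LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer using only the HR documentation excerpts below.\n"
        f"Excerpts:\n{context}\n\n"
        f"Question: {question}"
    )


question = "How many days of paid leave do I get?"
prompt = build_prompt(question, retrieve(question, DOCS, k=1))
```

In a real deployment the similarity search would be delegated to the vector database and the prompt passed to the model; the sketch only shows the order of operations.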
Critical point: we added a systematic note at the bottom of each response: 'This response is based on current HR documentation. In case of doubt, contact your HR manager directly.' Essential for compliance and trust.
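One simple way to guarantee the note appears on every response is to append it in code rather than asking the model to write it. The sketch below assumes a hypothetical helper named with_disclaimer; only the note text itself comes from our deployment.

```python
DISCLAIMER = (
    "This response is based on current HR documentation. "
    "In case of doubt, contact your HR manager directly."
)


def with_disclaimer(answer: str) -> str:
    """Append the note exactly once, even if the answer already ends with it."""
    body = answer.rstrip()
    if body.endswith(DISCLAIMER):
        return body
    return f"{body}\n\n{DISCLAIMER}"
```

Appending after generation means the note cannot be dropped or reworded by the model.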
Surprises (good and bad)
Good surprise: immediate adoption
Contrary to fears, employees adopted the tool very quickly, especially shift workers who had no real-time HR access.
Bad surprise: out-of-scope questions
The assistant received very personal questions we had not anticipated: harassment situations, health problems, manager conflicts. We quickly added detection logic and redirection to a human for these sensitive cases.
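A detection-and-redirection step of this kind can be sketched as a check that runs before the question reaches the assistant. The version below uses keyword patterns purely for illustration; the pattern list, the topic labels, and the route function are assumptions, and a real system would likely need a more robust classifier.

```python
import re

# Hypothetical patterns for the sensitive topics mentioned above.
# "health" alone is deliberately not a trigger, since health insurance
# questions are in scope for the assistant.
SENSITIVE_PATTERNS = {
    "harassment": re.compile(r"\bharass\w*", re.IGNORECASE),
    "health problem": re.compile(r"\b(illness|burnout|diagnos\w*)\b", re.IGNORECASE),
    "manager conflict": re.compile(r"\bconflict\w*", re.IGNORECASE),
}


def route(question: str):
    """Return ("human", topic) for sensitive questions, else ("assistant", None)."""
    for topic, pattern in SENSITIVE_PATTERNS.items():
        if pattern.search(question):
            return ("human", topic)
    return ("assistant", None)
```

For example, route("I have a conflict with my manager") returns ("human", "manager conflict"), while a question about requesting leave is passed on to the assistant.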
Excellent article, this matches exactly what we're seeing with our enterprise clients. The section on inference costs is especially valuable. It's a topic most articles gloss over but it's make-or-break at scale.
Thanks James! Inference cost optimization is often deprioritized during prototyping but becomes critical in production. Feel free to book a session if you'd like to go deeper on this.
Sharing this with my whole team. The distinction between an impressive demo and robust production is exactly the debate we're having internally right now. The human checkpoint advice is immediately actionable.
Great article. I'd push back slightly on the 18-day deployment estimate: in our experience with enterprise security and GDPR requirements, 4–6 weeks is more realistic for a first production agent.
Completely fair point David. The 18 days refers to a scoped first agent in a test environment. For full enterprise production with security constraints, your estimate is accurate.