A French retail group with 80 stores, a costly data infrastructure, fragile pipelines and teams that had lost trust in their dashboards. In 4 months, we cut costs by 40% and restored confidence in the data.
Context: a critical data technical debt
When this client approached us, the situation was typical of fast-growing companies: a legacy on-premise data warehouse, a dozen poorly documented ETL pipelines, data duplicated across multiple silos, and a cloud bill that had tripled in 18 months with no visible gain in value.
The CTO knew something was wrong but didn't know where to start. Our first move was a thorough 2-week audit.
The approach: audit, prioritize, migrate
Weeks 1–2: Audit and mapping
Complete inventory of the existing landscape: data sources, volumes, frequencies, cost per pipeline, data quality, actual vs declared usage patterns.
Weeks 3–4: Target architecture and roadmap
Definition of the modern architecture (lakehouse on Databricks), migration prioritized by business impact and technical risk.
Month 2: Migration of critical pipelines
Rewriting the 5 most expensive pipelines in dbt + Spark, with integrated monitoring. Immediate 25% reduction in compute costs.
Months 3–4: Data governance & tooling
Implementation of the data catalog (DataHub), quality rules (Great Expectations), and end-to-end monitoring (Monte Carlo).
The levers of cost reduction
1. Eliminating zombie pipelines
The audit revealed that 31% of ETL pipelines were producing data nobody consumed. Jobs running every hour, 24/7, feeding dashboards that teams had stopped checking months ago. Immediate shutdown: -12% in costs.
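To make the zombie check concrete, here is a minimal sketch of how such a pipeline could be flagged from audit data. The record fields, the 90-day idle threshold, and the function names are all illustrative assumptions, not the tooling used on this project.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class PipelineUsage:
    name: str
    monthly_cost: float                    # hypothetical cost figure from the audit
    last_output_read: Optional[datetime]   # last time any consumer read the output


def find_zombies(pipelines, now, max_idle_days=90):
    """Return pipelines whose output has not been read within max_idle_days."""
    cutoff = now - timedelta(days=max_idle_days)
    return [
        p for p in pipelines
        if p.last_output_read is None or p.last_output_read < cutoff
    ]


def zombie_cost_share(pipelines, zombies):
    """Fraction of total monthly cost spent on zombie pipelines."""
    total = sum(p.monthly_cost for p in pipelines)
    return sum(p.monthly_cost for p in zombies) / total if total else 0.0
```

In practice the "last read" timestamp would come from warehouse access logs or dashboard view counts, and any flagged pipeline should be confirmed with its owner before shutdown.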
2. Query and partitioning optimization
BigQuery queries were scanning entire tables on every run, with no partitioning or clustering. Redesigning the schema and partitioning by date cut query costs by a factor of 6 on the largest tables.
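The mechanism behind this saving is that on-demand query cost follows the bytes read, and a filter on the partitioning column lets the engine skip partitions outside the requested range. The toy model below illustrates that effect. The table size, partition layout, and function name are invented for illustration and do not reflect the client's data, so the ratio it yields is not the factor reported above.

```python
from datetime import date, timedelta


def bytes_scanned(partition_sizes, start=None, end=None):
    """Bytes read by a query over a date-partitioned table.

    partition_sizes maps each partition date to its size in bytes.
    With no date filter (start and end both None) every partition is read,
    which models a full scan of an unpartitioned table.
    """
    return sum(
        size for day, size in partition_sizes.items()
        if (start is None or day >= start) and (end is None or day <= end)
    )


# Hypothetical table: 365 daily partitions of 1 GB each.
GB = 10**9
first_day = date(2023, 1, 1)
table = {first_day + timedelta(days=i): GB for i in range(365)}

full_scan = bytes_scanned(table)  # every partition is read
last_week = bytes_scanned(table, start=date(2023, 12, 25), end=date(2023, 12, 31))
```

The actual gain depends on how narrow the typical date filter is compared with the table's full history, which is why the largest, longest-lived tables tend to benefit most.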
3. Rationalizing environments
Three dev/staging environments were replicating all production data. We implemented intelligent sampling that reduces their data volume by 95% while maintaining statistical representativeness.
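One common way to shrink a dataset while keeping its shape is stratified sampling: draw the same fraction from each group so that small groups are not lost. The sketch below assumes that technique and a 5% fraction to match the 95% reduction. The article does not specify the sampling method actually used, and the function and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, fraction=0.05, seed=42):
    """Sample about `fraction` of rows from each stratum, at least one per stratum."""
    rng = random.Random(seed)  # fixed seed so dev/staging refreshes are reproducible

    # Group rows by stratum, e.g. by store or product category.
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    sample = []
    for group in strata.values():
        k = max(1, round(len(group) * fraction))
        sample.extend(rng.sample(group, k))
    return sample
```

For example, `stratified_sample(sales_rows, key=lambda r: r["store_id"])` would keep roughly 5% of the rows for every store, where `sales_rows` and `store_id` are placeholder names.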
The most valuable lesson: in every data project, we find 25–35% of resources wasted on unused pipelines or environments. The audit always pays for itself within weeks.
What the teams think today
Four months after the project ended, the CTO shared an unexpected metric: the number of "where does this data come from?" questions in meetings dropped by 80%. Teams trust their dashboards again. That's the real measure of success.
As CDO of a retail group, this article resonates deeply. The 31% zombie pipeline stat is consistent with what we found in our own audit. Glad to know we're not alone!
Thanks Laura! It's a pattern we see consistently across retail clients. If you'd like to discuss your current challenges, feel free to book a session.
Great case study with a realistic timeline. The -12% instant saving just from stopping unused pipelines surprised me the most. Simple but powerful.
Very instructive. Practical question: how do you handle team resistance when migrating to a new architecture? That's often the biggest obstacle in these projects.
Great question, Priya. The key is involving teams from the audit phase, so that they become stakeholders in the change rather than victims. We'll cover change management in an upcoming article!