A French retail group with 80 stores was running a costly data infrastructure on fragile pipelines, and its teams had lost trust in their dashboards. In 4 months, we cut data costs by 40% and restored confidence in the data.

Context: critical technical debt in the data stack

When this client approached us, the situation was typical of fast-growing companies: a legacy on-premises data warehouse, a dozen poorly documented ETL pipelines, data duplicated across multiple silos, and a cloud bill that had tripled in 18 months with no visible gain in value.

The CTO knew something was wrong but didn't know where to start. Our first move was a thorough 2-week audit.

40% reduction in data costs in 4 months
12× improvement in data freshness
3 data silos consolidated into a single source

The approach: audit, prioritize, migrate

Weeks 1–2: Audit and mapping

Complete inventory of the existing landscape: data sources, volumes, refresh frequencies, cost per pipeline, data quality, and actual versus declared usage.
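To make the inventory step concrete, here is a minimal Python sketch that estimates each pipeline's monthly cost and compares the consumers declared in documentation with those actually observed in access logs. The `Pipeline` record, its field names, the cost model (runs per day × cost per run × 30 days) and the sample figures are all hypothetical assumptions, not the client's data or tooling.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    # Hypothetical inventory record; field names are illustrative assumptions.
    name: str
    runs_per_day: int
    cost_per_run_eur: float
    declared_consumers: set[str] = field(default_factory=set)
    observed_consumers: set[str] = field(default_factory=set)

    def monthly_cost(self, days: int = 30) -> float:
        # Simplified cost model: runs per day x cost per run x days in the month.
        return self.runs_per_day * self.cost_per_run_eur * days

    def unused_declared_consumers(self) -> set[str]:
        # Consumers listed in documentation but never seen in access logs.
        return self.declared_consumers - self.observed_consumers


def audit(pipelines: list[Pipeline]) -> list[tuple[str, float, set[str]]]:
    """Return (name, monthly cost, declared-but-unused consumers), most expensive first."""
    rows = [(p.name, p.monthly_cost(), p.unused_declared_consumers()) for p in pipelines]
    return sorted(rows, key=lambda row: row[1], reverse=True)


inventory = [
    Pipeline("sales_daily", 1, 4.0, {"finance"}, {"finance"}),
    Pipeline("stock_hourly", 24, 0.5, {"ops", "buying"}, {"ops"}),
]
for name, cost, unused in audit(inventory):
    print(f"{name}: {cost:.2f} EUR/month, unused consumers: {sorted(unused)}")
```

Sorting by estimated cost surfaces the most expensive pipelines first, and the gap between declared and observed usage is exactly the "actual versus declared" signal the audit looks for.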

Weeks 3–4: Target architecture and roadmap

Definition of the target architecture (a lakehouse on Databricks) and of a migration roadmap prioritized by business impact and technical risk.
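One simple way to turn "business impact and technical risk" into an ordered backlog is sketched below. The 1–5 scales, the `Candidate` type and the pipeline names are hypothetical assumptions; the rule used here (highest impact first, lowest risk first among equal impact) is one possible choice, not necessarily the one used on this project.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    # Hypothetical scoring inputs on a 1-5 scale; the scale is an assumption.
    name: str
    business_impact: int
    technical_risk: int


def prioritize(candidates: list[Candidate]) -> list[str]:
    """Order migration candidates: high impact first, and among equal impact, low risk first."""
    ranked = sorted(candidates, key=lambda c: (-c.business_impact, c.technical_risk))
    return [c.name for c in ranked]


backlog = [
    Candidate("loyalty_export", business_impact=2, technical_risk=1),
    Candidate("sales_daily", business_impact=5, technical_risk=2),
    Candidate("stock_hourly", business_impact=5, technical_risk=4),
]
print(prioritize(backlog))  # ['sales_daily', 'stock_hourly', 'loyalty_export']
```

A lexicographic sort keeps the rule easy to explain to stakeholders; a weighted score (for example impact divided by risk) is an alternative when low-risk, medium-impact quick wins should move up the list.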

Month 2: Migration of critical pipelines

Rewrite of the 5 most expensive pipelines in dbt and Spark, with built-in monitoring, yielding an immediate 25% reduction in compute costs.

Months 3–4: Data governance & tooling

Implementation of the data catalog (DataHub), quality rules (Great Expectations), and end-to-end monitoring (Monte Carlo).
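The kinds of checks that quality and monitoring tools automate can be illustrated in plain Python. This sketch deliberately does not use the Great Expectations or Monte Carlo APIs; the function names, the column names and the two-hour freshness threshold are hypothetical assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone


def check_not_null(rows: list[dict], column: str) -> bool:
    # Rule: every row must have a non-null value in `column`.
    return all(row.get(column) is not None for row in rows)


def check_between(rows: list[dict], column: str, low: float, high: float) -> bool:
    # Rule: every non-null value in `column` must fall within [low, high].
    return all(low <= row[column] <= high for row in rows if row.get(column) is not None)


def check_freshness(last_loaded: datetime, max_age: timedelta, now: datetime) -> bool:
    # Freshness: the table must have been loaded within `max_age`.
    return now - last_loaded <= max_age


rows = [
    {"store_id": 12, "amount_eur": 59.9},
    {"store_id": None, "amount_eur": 18.0},
]
now = datetime(2024, 1, 10, 9, 0, tzinfo=timezone.utc)
results = {
    "store_id_not_null": check_not_null(rows, "store_id"),
    "amount_in_range": check_between(rows, "amount_eur", 0, 10_000),
    "loaded_within_2h": check_freshness(now - timedelta(hours=1), timedelta(hours=2), now),
}
print(results)  # {'store_id_not_null': False, 'amount_in_range': True, 'loaded_within_2h': True}
```

Each rule returns a boolean so results can be aggregated per table and turned into alerts; dedicated tools add scheduling, history and lineage on top of the same idea.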

The levers of cost reduction

1. Eliminating zombie pipelines

The audit revealed that 31% of ETL pipelines were producing data nobody consumed: jobs running every hour, 24/7, feeding dashboards that teams had stopped checking months earlier. Shutting them down immediately cut costs by 12%.
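An hourly job runs 24 × 30 = 720 times a month whether or not anyone reads its output, which is why idle pipelines add up quickly. A minimal sketch for flagging candidates, assuming a hypothetical mapping from each pipeline to the last time its output was read (for instance, extracted from warehouse access logs) and an assumed 90-day idle threshold:

```python
from datetime import datetime, timedelta
from typing import Optional


def find_zombies(
    last_read: dict[str, Optional[datetime]],
    now: datetime,
    idle_days: int = 90,
) -> list[str]:
    """Return pipelines whose output was never read, or not read within `idle_days`."""
    cutoff = now - timedelta(days=idle_days)
    return sorted(name for name, ts in last_read.items() if ts is None or ts < cutoff)


now = datetime(2024, 6, 1)
last_read = {
    "sales_daily": datetime(2024, 5, 31),
    "promo_hourly": datetime(2023, 11, 2),
    "legacy_kpi_hourly": None,  # no read ever recorded
}
print(find_zombies(last_read, now))  # ['legacy_kpi_hourly', 'promo_hourly']
```

The output is a list of candidates to confirm with their owners before shutdown, not an automatic kill list.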

2. Query and partitioning optimization

BigQuery queries were scanning entire tables on every run, since the tables had no partitioning or clustering. Redesigning the schema and partitioning by date cut query costs sixfold on the largest tables.
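The reason partitioning lowers cost is that an engine billed on bytes scanned can skip partitions that a date filter excludes. This toy Python model of partition pruning uses a hypothetical table of 365 daily partitions of 2 GB each; it illustrates the mechanism and is not meant to reproduce the sixfold figure, which depends on the actual query patterns.

```python
from datetime import date, timedelta
from typing import Optional


def bytes_scanned(
    partition_sizes: dict[date, int],
    start: Optional[date] = None,
    end: Optional[date] = None,
) -> int:
    """Bytes read by a query; a date filter lets the engine skip non-matching partitions."""
    if start is None or end is None:
        # No usable filter on the partition column: full table scan.
        return sum(partition_sizes.values())
    return sum(size for day, size in partition_sizes.items() if start <= day <= end)


# Hypothetical table: 365 daily partitions of 2 GB each.
GB = 1024 ** 3
first_day = date(2023, 1, 1)
table = {first_day + timedelta(days=i): 2 * GB for i in range(365)}

full = bytes_scanned(table)
last_week = bytes_scanned(table, date(2023, 12, 25), date(2023, 12, 31))
print(full // GB, last_week // GB)  # 730 14
```

In BigQuery itself, this corresponds to declaring the table with `PARTITION BY` on a date column (optionally with `CLUSTER BY`) and making sure queries filter on that column; a query without such a filter still scans every partition.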

3. Rationalizing environments

Three dev/staging environments each held a full copy of production data. We implemented intelligent sampling that reduces their data volume by 95% while preserving statistical representativeness.
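The article does not detail the sampling method. As one common way to shrink volume while keeping every segment represented, here is a sketch of stratified sampling: it keeps 5% of rows within each stratum (here, each store), giving the 95% reduction. The choice of stratified sampling, the `store_id` key and the synthetic dataset are assumptions for illustration.

```python
import random
from collections import defaultdict


def stratified_sample(rows: list[dict], key: str, fraction: float, seed: int = 42) -> list[dict]:
    """Keep `fraction` of rows within each stratum (at least one row per stratum)."""
    rng = random.Random(seed)  # fixed seed so dev/staging extracts are reproducible
    strata: dict[object, list[dict]] = defaultdict(list)
    for row in rows:
        strata[row[key]].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))  # sampling without replacement
    return sample


# Hypothetical production extract: 80 stores, 1,000 transactions each.
production = [{"store_id": s, "txn_id": t} for s in range(80) for t in range(1000)]
dev = stratified_sample(production, key="store_id", fraction=0.05)
print(len(dev), len({r["store_id"] for r in dev}))  # 4000 80
```

Unlike a uniform 5% sample, sampling per stratum guarantees that small stores still appear in dev data, so tests and dashboards built there see every segment.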

The most valuable lesson: in every data project we audit, we find that 25–35% of resources are wasted on unused pipelines or environments. The audit always pays for itself within weeks.

What the teams think today

Four months after the project ended, the CTO shared an unexpected metric: the number of "where does this data come from?" questions in meetings dropped by 80%. Teams trust their dashboards again. That's the real measure of success.

Data Engineering · Retail · Optimization · dbt · Databricks · Governance · ROI

With care,

Sylvie Wendkuni NITIEMA
Founder & Data Scientist · DataSAI