When Your Data Warehouse Needs a Second Act

Every data warehouse I’ve worked on has had the same origin story. Someone built it quickly to answer urgent business questions. It worked well enough that more teams started using it. More sources were added, more transformations were layered on, more reports were built against it. By the time I arrived, the warehouse was doing a job it was never designed for, and everyone knew it needed to change but nobody wanted to be the one to start.

That pattern isn’t a failure of planning. It’s what happens when a system succeeds beyond its original scope. The first warehouse was right for the problem it solved. The problem grew.

Signals that it’s time

The clearest signal is query performance degradation that can’t be solved by adding compute. If materialised views, index tuning, and warehouse scaling have all been tried and the core reporting queries still take minutes, the problem is usually the data model — not the infrastructure.

Other signals are subtler:

Multiple sources of truth for the same metric. When two dashboards show different revenue numbers because they query different intermediate tables with slightly different business logic, the warehouse has a semantic layer problem. The data model doesn’t enforce a single definition of key metrics.

Fear of change. When engineers are afraid to modify a transformation because they don’t know what downstream dependencies it has, the warehouse has a lineage problem. In a well-designed warehouse, changing a staging model shouldn’t require auditing thirty reports.
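
When lineage is recorded, "what depends on this?" becomes a lookup rather than an audit. Here is a minimal sketch of that lookup, assuming the dependency graph is available as a simple mapping. The model names, the LINEAGE structure, and the downstream_of helper are all hypothetical, and a real warehouse would more likely derive the graph from its transformation tool or query logs.

```python
from collections import deque

# Hypothetical lineage: each key is a model, each value lists the models or
# reports that read from it directly. All names are illustrative.
LINEAGE = {
    "stg_orders": ["fct_orders"],
    "stg_customers": ["dim_customer"],
    "fct_orders": ["revenue_dashboard", "finance_export"],
    "dim_customer": ["revenue_dashboard"],
}


def downstream_of(model, lineage):
    """Return every model or report that depends on `model`, directly or indirectly."""
    seen = set()
    queue = deque([model])
    while queue:
        current = queue.popleft()
        for child in lineage.get(current, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return sorted(seen)


# Everything that could be affected by a change to the staging model.
print(downstream_of("stg_orders", LINEAGE))
```

A breadth-first walk is enough here because the only question is which consumers are reachable, not in what order they are built.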

Ingest-first, model-never. When new data sources are loaded into the warehouse and immediately queried by analysts without any transformation layer, the warehouse is accumulating technical debt. Raw source tables are not analytical models. They’re ingredients, not meals.

The “just add a column” pattern. When every new business requirement is met by adding another column to an existing wide table instead of creating a properly modelled entity, the warehouse is growing horizontally in ways that make it progressively harder to reason about.

The strangler pattern for warehouses

Full warehouse rebuilds tend to fail. I’ve seen three attempted rip-and-replace migrations. None of them shipped on time. Two of them were quietly abandoned.

The approach that works is the strangler pattern, borrowed from application architecture. Build the new models alongside the old ones. Migrate consumers one at a time. Decommission old models only when nothing depends on them.

In practice, this means:

Build the new serving layer using proper dimensional modelling or whatever approach fits the use case. Give it a clear namespace — analytics.* or mart.* — separate from the legacy tables.
Dual-write the most critical data so that both old and new models are populated. Run reconciliation queries that compare the output. When the new model matches the old one for a given metric, you can migrate that metric’s consumers with confidence.
Migrate consumers individually. Start with the reports and dashboards that are most painful on the old model. Each migration is small, verifiable, and reversible. If something breaks, roll back that one consumer, not the entire platform.
Decommission deliberately. When a legacy table has zero active consumers, verified through query logs rather than assumptions, don’t delete it immediately. Archive it for 90 days, then delete it.
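
The reconciliation step in the list above can be sketched roughly as follows. This is an illustration under stated assumptions, not a prescribed implementation: it uses an in-memory SQLite database so it runs anywhere, and the table names (legacy_revenue, mart_revenue), the sample rows, and the tolerance value are all hypothetical. On a real warehouse the same query shape would run against the old and new models directly.

```python
import sqlite3

# Hypothetical tables: legacy_revenue stands in for the old model,
# mart_revenue for the new one. The rows are made-up sample data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE legacy_revenue (order_date TEXT, revenue REAL);
    CREATE TABLE mart_revenue (order_date TEXT, revenue REAL);
    INSERT INTO legacy_revenue VALUES ('2024-01-01', 100.0), ('2024-01-02', 250.0);
    INSERT INTO mart_revenue VALUES ('2024-01-01', 100.0), ('2024-01-02', 245.0);
    """
)

# Report days where the new model is missing or differs by more than the
# tolerance. Days that exist only in the new model are not reported here.
RECONCILE_SQL = """
    SELECT l.order_date, l.revenue AS legacy_revenue, m.revenue AS new_revenue
    FROM legacy_revenue AS l
    LEFT JOIN mart_revenue AS m ON m.order_date = l.order_date
    WHERE m.revenue IS NULL OR ABS(l.revenue - m.revenue) > ?
    ORDER BY l.order_date
"""


def reconcile(connection, tolerance=0.01):
    """Return (date, legacy value, new value) for every day that fails to match."""
    return connection.execute(RECONCILE_SQL, (tolerance,)).fetchall()


mismatches = reconcile(conn)
print(mismatches)
```

An empty result for a metric, repeated over several load cycles, is the evidence that makes migrating that metric’s consumers a low-risk step.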

The technology question

Teams often frame a warehouse redesign as a platform migration: “We should move from Snowflake to Databricks” or “We need to migrate to Microsoft Fabric.” Sometimes that’s the right call. Usually, it’s a distraction from the real problem.

If the warehouse’s issues are modelling and governance — multiple sources of truth, no clear ownership, inconsistent business logic — moving to a different platform will reproduce the same problems in a new environment. The platform isn’t the bottleneck. The data model is.

I’d rather redesign the data model on the existing platform than migrate a broken model to a new one. Fix the architecture first. Evaluate the platform second.

What the second act looks like

A well-redesigned warehouse has a few properties that the original rarely had:

Clear layer boundaries. Raw data in one schema, staged and cleaned data in another, business-logic-applied models in a third, and consumer-ready datasets in a fourth. Each layer has a defined contract with the next.
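
One way to make those boundaries checkable is a small test over the dependency list. The sketch below assumes a specific contract, namely that a model may read only from its own layer or the layer immediately before it. That rule, the schema names, and the boundary_violations helper are my own illustrative assumptions, and other contracts are equally reasonable.

```python
# Hypothetical layer order, from rawest to most consumer-facing.
LAYERS = ["raw", "staging", "core", "mart"]


def layer_of(table):
    """Extract the schema (layer) from a 'schema.table' name."""
    return table.split(".", 1)[0]


def boundary_violations(dependencies):
    """List (model, source) pairs that break the assumed layer contract.

    `dependencies` maps each model to the tables it reads from. A model may
    read from its own layer or from the layer immediately before it in LAYERS.
    """
    violations = []
    for model, sources in dependencies.items():
        model_rank = LAYERS.index(layer_of(model))
        for source in sources:
            source_rank = LAYERS.index(layer_of(source))
            if source_rank > model_rank or model_rank - source_rank > 1:
                violations.append((model, source))
    return violations


deps = {
    "staging.orders": ["raw.orders"],
    "core.fct_orders": ["staging.orders"],
    # This mart model reaches straight into raw, skipping two layers.
    "mart.revenue_daily": ["core.fct_orders", "raw.orders"],
}
print(boundary_violations(deps))
```

Run in continuous integration, a check like this turns a layering convention into something a pull request can fail on.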

Metric definitions in code. Key business metrics are calculated once, in one place, in version-controlled transformation code. Every consumer gets the same number. Disagreements about what “revenue” means are resolved in a pull request, not a Slack argument.
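
As a rough illustration of "calculated once, in one place", here is a sketch in which every consumer calls the same function. The definition shown (net of discounts, excluding refunded lines) is a made-up example rather than a recommended one, and the OrderLine type is hypothetical. In most warehouses the same idea would live in SQL or a transformation framework instead.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class OrderLine:
    amount: Decimal
    discount: Decimal
    is_refunded: bool


def revenue(lines):
    """Hypothetical single definition of revenue: net of discounts, excluding refunds."""
    return sum(
        (line.amount - line.discount for line in lines if not line.is_refunded),
        Decimal("0"),
    )


lines = [
    OrderLine(Decimal("100.00"), Decimal("10.00"), False),
    OrderLine(Decimal("50.00"), Decimal("0.00"), True),  # refunded, so excluded
    OrderLine(Decimal("20.00"), Decimal("0.00"), False),
]
print(revenue(lines))
```

Changing what counts as revenue then means changing this one function, with the diff and the discussion visible in version control.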

Ownership metadata. Every table has an owner, a description, and a freshness SLA. Not because documentation is fun — because when something breaks at 2am, someone needs to know whose problem it is.
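
The three fields named above are simple enough to model directly. This is a minimal sketch, assuming the metadata sits alongside the transformation code. The TableMetadata class, the is_stale helper, the table name, and the contact address are hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class TableMetadata:
    name: str
    owner: str
    description: str
    freshness_sla: timedelta


def is_stale(meta, last_loaded_at, now):
    """True when the table has gone longer than its SLA without a load."""
    return now - last_loaded_at > meta.freshness_sla


orders = TableMetadata(
    name="mart.revenue_daily",
    owner="finance-data@example.com",
    description="Daily revenue by order date.",
    freshness_sla=timedelta(hours=6),
)

# Fixed timestamps keep the example deterministic.
now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
last_load = datetime(2024, 1, 2, 3, 0, tzinfo=timezone.utc)
if is_stale(orders, last_load, now):
    print(f"{orders.name} is stale; contact {orders.owner}")
```

Because the owner is a field on the record, the staleness alert already says who to contact.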

The best warehouse redesign I’ve been part of took eight months and migrated 200+ reports from a legacy model to a properly governed dimensional model. The trick wasn’t technical sophistication. It was patience — migrating one consumer at a time, validating each one, and resisting the urge to declare victory before the old tables were actually gone.

The second act is never as fast as the first. It doesn’t need to be. It needs to be right.