Why Your dbt Project Will Become Unmaintainable

I inherited a dbt project last year with 400 models, no naming convention, and a DAG that looked like a plate of spaghetti someone had thrown at a wall. Every model was a select * from something, joined to something else, with business logic scattered across staging, intermediate, and mart layers with no clear boundary between them.

The team that built it wasn’t incompetent. They were fast. dbt makes it incredibly easy to create a new model — so easy that nobody stops to ask whether they should.

The patterns that rot

Model sprawl without naming conventions

When model names don’t encode their layer or domain, you can’t tell what anything does from the DAG. Is customer_metrics a staging model, an intermediate calculation, or a final mart? Is it safe to change, or do twelve downstream models depend on it?

I use a prefix convention: stg_ for staging, int_ for intermediate, fct_ for facts, dim_ for dimensions, rpt_ for reports. It’s not original. It’s not clever. It means I can look at any model name and know where it sits in the pipeline without opening the file.

Inconsistent grain

The fastest way to create bugs in a dbt project is to join two models at different grains without realising it. A model at the customer level joined to a model at the transaction level will silently duplicate rows. dbt won’t warn you. Your downstream consumers will find out when their numbers don’t match.

Every model should declare its grain in a comment or in the YAML description. Not because comments are documentation — they rot too — but because writing “one row per customer per month” forces you to think about whether that’s actually true after the join you just added.

Business logic in staging

Staging models should do three things: rename columns to a consistent convention, cast types, and filter out obviously invalid records. That’s it. No calculations, no joins, no business rules.

The moment business logic enters the staging layer, you’ve created a hidden dependency. Someone changes the staging model to fix a calculation, and every downstream model that assumed the old behaviour breaks. Worse, they break silently — the numbers just change.

Missing tests on relationships

dbt’s built-in relationship test (relationships) is the most underused feature in the framework. It checks whether every value in a column exists in another model’s column — the equivalent of a foreign key constraint, except your warehouse probably doesn’t enforce those.

I’ve seen projects where a dimension table was rebuilt with a new surrogate key strategy, and every fact table that referenced it continued to build successfully with broken joins. A single relationships test on each foreign key would have caught it immediately.

What sustainable looks like

A maintainable dbt project has a few properties that are boring to implement and invaluable to inherit.

Clear layer boundaries. Sources feed staging. Staging feeds intermediate. Intermediate feeds marts. No skipping layers. No marts referencing sources directly. The DAG should have a clear left-to-right flow with no backward references.

One model, one job. Each model transforms data in one way. If a model does staging and business logic, split it. If a model serves two different downstream use cases with different grains, split it. Small models with clear purposes are easier to test, easier to debug, and easier to replace.

YAML as the source of truth. Every model has a YAML entry with a description, column descriptions for anything non-obvious, and tests. This isn’t documentation for its own sake — it’s the contract that downstream consumers rely on. When the YAML says “one row per customer per day” and someone changes the model to aggregate monthly, the description flags the discrepancy.

Consistent SQL style. CTEs named descriptively. One join per CTE when the logic is complex. No select * in anything beyond staging. Explicit column lists in marts so that adding a column upstream doesn’t silently change the output schema.

The discipline problem

None of this is technically hard. dbt’s contribution to data engineering isn’t the SQL compilation or the DAG — it’s the opinion that transformations should be version-controlled, tested, and documented. But dbt gives you the tools for discipline without enforcing it.

The projects that stay maintainable are the ones where someone cares enough to write a style guide and review pull requests against it. The ones that rot are the ones where speed is always more important than structure.

Every dbt project starts clean. The question is whether anyone is willing to keep it that way.