How to prevent dbt from rebuilding incremental tables when a source view's columns change?

We have the following structure in dbt:

[events (view), with column1, column2, column3]--->[incremental1, using column1 from events]
                                              |
                                              +--->[incremental2, using column2 from events]

In our continuous integration setup, we run dbt run --models state:modified+ to avoid running models that were not changed in a PR. The problem is that if we change column3 in events (e.g. rename it), both incremental tables, incremental1 and incremental2, are fully rebuilt, which can take an excessively long time, sometimes more than a whole day. Note that neither incremental table is actually affected by the change in its source, since neither uses the changed column in any way.

Is there a way to avoid rebuilding these models while also not having to run all the other models (which is what a simple dbt run would do)?

Some additional details:

  • our dbt version is 1.0
  • our data warehouse engine is Snowflake
  • our incremental strategy is 'delete+insert'


Solution 1:[1]

dbt run takes an --exclude option; you could exclude those long-running tables by name, or tag them and exclude the tag. This obviously carries its own risks and maintenance burden.
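As a sketch of the tag-based approach (the tag name here is an invented example), you would tag the slow models in their config block and then exclude that tag in the CI command:

```sql
-- models/incremental1.sql (and likewise incremental2.sql)
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    tags=['slow_incremental']
) }}
```

```shell
# Run modified models and their descendants, skipping the tagged ones.
dbt run --models state:modified+ --exclude tag:slow_incremental
```

The trade-off is that the excluded models will not be rebuilt even when a change genuinely does affect them, so the tag list has to be maintained by hand.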

Another option would be to filter the records in these large tables on a dev or CI run so that they build much more quickly. You could add something like this to the incremental model:

{% if target.name != 'prod' %}
where updated_at > current_timestamp() - interval '30 days'
{% endif %}
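In context, that filter would sit alongside the model's usual is_incremental() logic. A minimal sketch of incremental1, assuming the source has an updated_at column and that column1 serves as the unique key (both assumptions for illustration):

```sql
-- models/incremental1.sql (a sketch; unique_key and updated_at are assumptions)
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='column1'
) }}

select
    column1,
    updated_at
from {{ ref('events') }}
where 1 = 1

{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than what is already loaded
  and updated_at > (select max(updated_at) from {{ this }})
{% endif %}

{% if target.name != 'prod' %}
  -- outside prod, cap the scan window so full rebuilds in CI finish quickly
  and updated_at > current_timestamp() - interval '30 days'
{% endif %}
```

The `where 1 = 1` line lets both optional predicates be appended with `and` regardless of which Jinja branches render.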

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 tconbeer