'DBT join multiple tables

I am learning DBT, specifically dbt-mysql. I am having trouble combining several tables into one table. What I want to do: Group By several columns by the last_updated (timestamp) date of the table and then combine those columns into a single table by the split last_updated field. Here is how I want my data to end up:

enter image description here

Here is my staging model (which I think should be straight selects from the database):

staging/clients/stg_clients_fields.sql

SELECT id, created, last_updated, service, order_count, spent_count, deleted, country
FROM client_database.clients

Then I have intermediate models (which I think should reconstruct data for my needs): intermediate/clients/clients_last_updated_grouped.sql

SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as client_count
FROM {{ ref('stg_clients_fields') }}
GROUP BY YEAR(last_updated), MONTH (last_updated)

intermediate/clients/clients_deleted_grouped.sql

SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as deleted
FROM {{ ref('stg_clients_fields') }}
WHERE deleted = 1
GROUP BY YEAR(last_updated), MONTH (last_updated)

intermediate/clients/clients_service_grouped.sql

SELECT YEAR(last_updated) as year_updated, MONTH(last_updated) as month_updated, COUNT(id) as service
FROM {{ ref('stg_clients_fields') }}
WHERE service IS NOT NULL
GROUP BY YEAR(last_updated), MONTH (last_updated)

And other columns follow the same pattern based on their WHERE clauses.

Now I need to create a marts model that would use all previously created data and put it in one single table.

At this point, I end up with several tables that have the last_updated field separated and the specific column value next to the date.

How can I now combine all these tables that they would join on the last_updated split into to columns field?

Or perhaps there is a better solution to group data by year and month and get individual column values based on conditions?

I am new to DBT so all the help and all advice are welcome!



Solution 1:[1]

since clients_last_updated_grouped doesn't have a where condition, it's guaranteed to have all of the year/month combinations found in the other models. This makes it much easier. You can just select from that model and join the other models on year and month:

with
    updated as (select * from {{ ref('clients_last_updated_grouped') }} ),
    deleted as (select * from  ),
    service as (select * from  ),

    joined as (
        select
            updated.year,
            updated.month,
            updated.client_count,
            coalesce(deleted.deleted, 0) as deleted_count,
            coalesce(service.service, 0) as service_count

        from
            updated
            left join deleted on updated.year = deleted.year and updated.month = deleted.month
            left join service on updated.year = service.year and updated.month = service.month
    )
select *
from joined

If your database doesn't support CTEs (with ...), this becomes:

select
    updated.year,
    updated.month,
    updated.client_count,
    coalesce(deleted.deleted, 0) as deleted_count,
    coalesce(service.service, 0) as service_count

from
    {{ ref('clients_last_updated_grouped') }} as updated
    left join {{ ref('clients_deleted_grouped') }} as deleted on updated.year = deleted.year and updated.month = deleted.month
    left join {{ ref('clients_service_grouped') }} as service on updated.year = service.year and updated.month = service.month

If it's not the case that clients_last_updated_grouped has every month/year combination of the other tables, you would need to first construct a "date spine", and then left join all 3 tables to that date spine.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1