'SQL/Power BI How to expand table according dates
I have a table like below, where a new record is created when there is a change in the status of a task.
| task | status | last update |
|---|---|---|
| A | 1 | 28/04/2022 |
| A | 3 | 01/05/2022 |
| A | 5 | 05/05/2022 |
| B | 1 | 28/04/2022 |
| B | 3 | 03/05/2022 |
| B | 4 | 05/05/2022 |
The problem is that I need to plot a graph within a time range, where I know the status of each item regardless of the date it was changed/created. With that, I think the easiest is to transform to the table below:
| task | status | last update |
|---|---|---|
| A | 1 | 28/04/2022 |
| A | 1 | 29/04/2022 |
| A | 1 | 28/04/2022 |
| A | 1 | 29/04/2022 |
| A | 1 | 30/04/2022 |
| A | 3 | 01/05/2022 |
| A | 3 | 02/05/2022 |
| A | 3 | 03/05/2022 |
| A | 3 | 04/05/2022 |
| A | 5 | 05/05/2022 |
| B | 1 | 28/04/2022 |
| B | 1 | 29/04/2022 |
| B | 1 | 30/04/2022 |
| B | 1 | 01/05/2022 |
| B | 1 | 02/05/2022 |
| B | 3 | 03/05/2022 |
| B | 3 | 04/05/2022 |
| B | 4 | 05/05/2022 |
However, I can't think of a way to do it, either directly in Power BI or even in SQL, since I'm connecting to a redshift database through a sql query. Could you please help me? Thanks
Solution 1:[1]
You can create the below visual using the standard line chart visualization. In the visualization settings, go to the "Shapes" menu and turn the "Stepped" view on.
While not necessary, it may be best practice to create a date dimension table with daily values spanning from the minimum update date to the maximum update date.
Dates = CALENDAR(MIN(Tasks[last update]),MAX(Tasks[last update]))
You can then create a one to many relationship between Dates and Tasks.
Solution 2:[2]
demo
very similar question: How to do forward fill as a PL/PGSQL function
I don't know the actual differences between amazon redshift and postgresql.
The demo is based on postgresql 14. It may not works on redshift.
Basic idea:for every distinct task, get the max, min last_updated date then use the generate_series function to expand the date based on task, task's min & max last_update. key point is first_value(status), because the once you expand the date, then obviously some date the status value is null, then use partition to fill the gap. If you want deep more, you can read manual: https://www.postgresql.org/docs/14/plpgsql.html
CREATE OR REPLACE FUNCTION test_expand ()
RETURNS TABLE (
_date1 date,
_first_ctask text,
_first_cstatus bigint
)
AS $$
DECLARE
distinct_task record;
max_last_update date;
min_last_update date;
_sql text;
BEGIN
FOR distinct_task IN SELECT DISTINCT
task
FROM
test_1
ORDER BY
1 LOOP
min_last_update := (
SELECT
min(last_update)
FROM
test_1
WHERE
task = distinct_task.task
LIMIT 1);
max_last_update := (
SELECT
max(last_update)
FROM
test_1
WHERE
task = distinct_task.task
LIMIT 1);
_sql := format($dml$ WITH cte AS (
SELECT
date1::date, $task$ % s$task $ AS _task, status, count(status) OVER (ORDER BY date1) AS c_s FROM (
SELECT
generate_series($a$ % s$a $::date, $b$ % s$b $::date, interval '1 day')) g (date1)
LEFT JOIN test_1 ON date1 = last_update)
SELECT
date1, _task, first_value(status) OVER (PARTITION BY c_s ORDER BY date1, status)
FROM cte $dml$, distinct_task.task, min_last_update, max_last_update);
RETURN query EXECUTE _sql;
END LOOP;
RETURN;
END;
$$
LANGUAGE plpgsql;
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Strictly Funk |
| Solution 2 | Mark |



