'Create Date Ranges based on sum of record count (KQL, Azure Data Explorer, Kusto)
Given the below table, I would like to create an output that finds the maximum date ranges where the sum of 'RecordCount' is less than or equal to 20,000. Also, If a single row is greater than 20,000 the resulting date range will be the start and end of that single day:
datatable(Date:string, RecordCount:long)
[
"2022-04-24T00:00:00.0000000Z", 825,
"2022-04-25T00:00:00.0000000Z", 14925,
"2022-04-26T00:00:00.0000000Z", 18498,
"2022-04-27T00:00:00.0000000Z", 17558,
"2022-04-28T00:00:00.0000000Z", 15626,
"2022-04-29T00:00:00.0000000Z", 12755,
"2022-04-30T00:00:00.0000000Z", 2203,
"2022-05-01T00:00:00.0000000Z", 48594,
"2022-05-02T00:00:00.0000000Z", 4976,
"2022-05-03T00:00:00.0000000Z", 10835,
"2022-05-04T00:00:00.0000000Z", 27505,
"2022-05-05T00:00:00.0000000Z", 22808,
"2022-05-06T00:00:00.0000000Z", 23119,
"2022-05-07T00:00:00.0000000Z", 5141,
"2022-05-08T00:00:00.0000000Z", 2217,
"2022-05-09T00:00:00.0000000Z", 11334,
"2022-05-10T00:00:00.0000000Z", 58,
]
Expected Result:
datatable(StartDate:datetime, EndDate:datetime, RecordCount:long)
[
"2022-04-24T00:00:00.0000000Z", "2022-04-25T23:59:59.9999999Z", 15750,
"2022-04-26T00:00:00.0000000Z", "2022-04-26T23:59:59.9999999Z", 18498,
"2022-04-27T00:00:00.0000000Z", "2022-04-27T23:59:59.9999999Z", 17558,
"2022-04-28T00:00:00.0000000Z", "2022-04-28T23:59:59.9999999Z", 15626,
"2022-04-29T00:00:00.0000000Z", "2022-04-30T23:59:59.9999999Z", 14958,
"2022-05-01T00:00:00.0000000Z", "2022-05-01T23:59:59.9999999Z", 48594,
"2022-05-02T00:00:00.0000000Z", "2022-05-03T23:59:59.9999999Z", 15811,
"2022-05-04T00:00:00.0000000Z", "2022-05-04T23:59:59.9999999Z", 27505,
"2022-05-05T00:00:00.0000000Z", "2022-05-05T23:59:59.9999999Z", 22808,
"2022-05-06T00:00:00.0000000Z", "2022-05-06T23:59:59.9999999Z", 23119,
"2022-05-07T00:00:00.0000000Z", "2022-05-10T23:59:59.9999999Z", 18750,
]
Solution 1:[1]
Based on the scan operator
datatable(Date:string, RecordCount:long)
[
"2022-04-24T00:00:00.0000000Z", 825,
"2022-04-25T00:00:00.0000000Z", 14925,
"2022-04-26T00:00:00.0000000Z", 18498,
"2022-04-27T00:00:00.0000000Z", 17558,
"2022-04-28T00:00:00.0000000Z", 15626,
"2022-04-29T00:00:00.0000000Z", 12755,
"2022-04-30T00:00:00.0000000Z", 2203,
"2022-05-01T00:00:00.0000000Z", 48594,
"2022-05-02T00:00:00.0000000Z", 4976,
"2022-05-03T00:00:00.0000000Z", 10835,
"2022-05-04T00:00:00.0000000Z", 27505,
"2022-05-05T00:00:00.0000000Z", 22808,
"2022-05-06T00:00:00.0000000Z", 23119,
"2022-05-07T00:00:00.0000000Z", 5141,
"2022-05-08T00:00:00.0000000Z", 2217,
"2022-05-09T00:00:00.0000000Z", 11334,
"2022-05-10T00:00:00.0000000Z", 58,
]
| order by Date asc
| scan declare (acc_sum:long = 0, group_id:int = 0)
with
(
step s1 : true => acc_sum = RecordCount + iff(s1.acc_sum + RecordCount > 20000, 0, s1.acc_sum)
,group_id = s1.group_id + iff(s1.acc_sum + RecordCount > 20000, 1, 0);
)
| summarize StartDate = min(Date), EndDate = max(Date), RecordCount = sum(RecordCount) by group_id
| project-away group_id
| StartDate | EndDate | RecordCount |
|---|---|---|
| 2022-04-24T00:00:00.0000000Z | 2022-04-25T00:00:00.0000000Z | 15750 |
| 2022-04-26T00:00:00.0000000Z | 2022-04-26T00:00:00.0000000Z | 18498 |
| 2022-04-27T00:00:00.0000000Z | 2022-04-27T00:00:00.0000000Z | 17558 |
| 2022-04-28T00:00:00.0000000Z | 2022-04-28T00:00:00.0000000Z | 15626 |
| 2022-04-29T00:00:00.0000000Z | 2022-04-30T00:00:00.0000000Z | 14958 |
| 2022-05-01T00:00:00.0000000Z | 2022-05-01T00:00:00.0000000Z | 48594 |
| 2022-05-02T00:00:00.0000000Z | 2022-05-03T00:00:00.0000000Z | 15811 |
| 2022-05-04T00:00:00.0000000Z | 2022-05-04T00:00:00.0000000Z | 27505 |
| 2022-05-05T00:00:00.0000000Z | 2022-05-05T00:00:00.0000000Z | 22808 |
| 2022-05-06T00:00:00.0000000Z | 2022-05-06T00:00:00.0000000Z | 23119 |
| 2022-05-07T00:00:00.0000000Z | 2022-05-10T00:00:00.0000000Z | 18750 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | David דודו Markovitz |
