'Check if value between a threshold range using sliding time window/bins - KQL query
I would like to write a sliding window query in KQL which would check if the the speed of a car is ALWAYS between a certain speed limit (e.g. b/w 100 and 150 km/h) for a time window of 5 mins.
Following is a sample dataset for it:
Timestamp | Speed | Temperature |
---|---|---|
2022-01-01 00:00:00.0000000 | 142.5 | 25.1 |
2022-01-01 00:01:00.0000000 | 147.4 | 25.5 |
2022-01-01 00:02:00.0000000 | 158.2 | 25.4 |
2022-01-01 00:03:00.0000000 | 134.8 | 25.6 |
2022-01-01 00:04:00.0000000 | 125.3 | 25.5 |
2022-01-01 00:05:00.0000000 | 118.4 | 25.4 |
2022-01-01 00:06:00.0000000 | 106.3 | 26.3 |
2022-01-01 00:07:00.0000000 | 119.6 | 26.5 |
2022-01-01 00:08:00.0000000 | 134.7 | 25.4 |
2022-01-01 00:09:00.0000000 | 153.2 | 26.6 |
2022-01-01 00:10:00.0000000 | 137.5 | 25.5 |
2022-01-01 00:11:00.0000000 | 129.9 | 27.4 |
2022-01-01 00:12:00.0000000 | 118.1 | 26.3 |
2022-01-01 00:13:00.0000000 | 105.4 | 25.7 |
2022-01-01 00:14:00.0000000 | 101.7 | 24.4 |
2022-01-01 00:15:00.0000000 | 100.8 | 25.6 |
2022-01-01 00:16:00.0000000 | 95.4 | 26.2 |
2022-01-01 00:17:00.0000000 | 105.6 | 26.7 |
First the window would check if the speed is in the defined range from 0-4mins, then 1-5mins, then, 2-6mins, then 3-7mins, then, 4-8mins and so on ..... until 10-14mins, then 11-15mins, then 12-16mins and then 13-17mins. If the speed is continuously in the 100-150km/h range the query would return those rows as output.
I would expect the following output:
Timestamp | Speed | Temperature |
---|---|---|
2022-01-01 00:03:00.0000000 | 134.8 | 25.6 |
2022-01-01 00:04:00.0000000 | 125.3 | 25.5 |
2022-01-01 00:05:00.0000000 | 118.4 | 25.4 |
2022-01-01 00:06:00.0000000 | 106.3 | 26.3 |
2022-01-01 00:07:00.0000000 | 119.6 | 26.5 |
2022-01-01 00:08:00.0000000 | 134.7 | 25.4 |
2022-01-01 00:10:00.0000000 | 137.5 | 25.5 |
2022-01-01 00:11:00.0000000 | 129.9 | 27.4 |
2022-01-01 00:12:00.0000000 | 118.1 | 26.3 |
2022-01-01 00:13:00.0000000 | 105.4 | 25.7 |
2022-01-01 00:14:00.0000000 | 101.7 | 24.4 |
2022-01-01 00:15:00.0000000 | 100.8 | 25.6 |
In the output dataset, timestamps from 0-2mins are filtered out because when we check from 0-4mins there is a value (158.2km/h) out of the range (100-150km/h). Similarly, we find this value when checking from 1-5mins and also when checking from 2-6mins.
From 3-7mins all the speed values are within the range constantly for 5 mins. And from 4-8mins as well that's why these rows are kept.
In the end, I would just like to plot the temperature for all the 5 min time windows where the speed was always within the range. The plotting part is clear so I just need help with filtering the rows.
Thanks in Advance!
Solution 1:[1]
Here is a solution without any assumptions on your data.
Additional information, such has the granularity of the timestamps, could help to simplify it.
- Break the rows to groups, treating records with out-of-range (OOR) speed as boundaries. Also add a synthetic record that could later be used as a minimal boundary.
- For each group find its boundaries, which are aligned to the sliding interval (1m). The main challenge here is to find the lower boundry for each group, since for the upper boundry we use
bin
which is a synonym to floor, but we don't have an equivalent for ceiling. - Remove the groups that are smaller than the defined window (5m).
- Join the groups boundaries with the groups' records and drop the OOR records.
let p_sliding_interval = 1m;
let p_window = 5m;
let t =
datatable (Timestamp:datetime ,Speed:real ,Temperature:real)
[
'2022-01-01 00:00:00.0000000' ,142.5 ,25.1
,'2022-01-01 00:01:00.0000000' ,147.4 ,25.5
,'2022-01-01 00:02:00.0000000' ,158.2 ,25.4
,'2022-01-01 00:03:00.0000000' ,134.8 ,25.6
,'2022-01-01 00:04:00.0000000' ,125.3 ,25.5
,'2022-01-01 00:05:00.0000000' ,118.4 ,25.4
,'2022-01-01 00:06:00.0000000' ,106.3 ,26.3
,'2022-01-01 00:07:00.0000000' ,119.6 ,26.5
,'2022-01-01 00:08:00.0000000' ,134.7 ,25.4
,'2022-01-01 00:09:00.0000000' ,153.2 ,26.6
,'2022-01-01 00:10:00.0000000' ,137.5 ,25.5
,'2022-01-01 00:11:00.0000000' ,129.9 ,27.4
,'2022-01-01 00:12:00.0000000' ,118.1 ,26.3
,'2022-01-01 00:13:00.0000000' ,105.4 ,25.7
,'2022-01-01 00:14:00.0000000' ,101.7 ,24.4
,'2022-01-01 00:15:00.0000000' ,100.8 ,25.6
,'2022-01-01 00:16:00.0000000' ,95.4 ,26.2
,'2022-01-01 00:17:00.0000000' ,105.6 ,26.7
];
let min_Timestamp = toscalar(t | summarize min(Timestamp));
let max_Timestamp = toscalar(t | summarize max(Timestamp));
let row_level =
t
| extend out_of_range_record_flag = iff(Speed !between (100 .. 150),1,0)
| union (print Timestamp = datetime(null) , out_of_range_record_flag = 1)
| order by Timestamp asc nulls first
| extend in_range_group_id = row_cumsum(out_of_range_record_flag);
let group_boundries =
row_level
| where out_of_range_record_flag == 1
| project in_range_group_id, from_timestamp = coalesce(Timestamp, min_Timestamp)
| order by in_range_group_id asc
| extend to_timestamp = coalesce(next(from_timestamp), max_Timestamp)
| extend bin_from_timestamp = bin(from_timestamp, p_sliding_interval)
| extend ceil_from_timestamp = bin_from_timestamp + iff(bin_from_timestamp == from_timestamp, 0ms, p_sliding_interval)
| extend in_range_window = bin(to_timestamp, p_sliding_interval) - ceil_from_timestamp
| where in_range_window >= p_window;
group_boundries
| join kind=inner row_level on in_range_group_id
| where out_of_range_record_flag == 0
| project Timestamp, Speed, Temperature
Timestamp | Speed | Temperature |
---|---|---|
2022-01-01T00:03:00Z | 134.8 | 25.6 |
2022-01-01T00:04:00Z | 125.3 | 25.5 |
2022-01-01T00:05:00Z | 118.4 | 25.4 |
2022-01-01T00:06:00Z | 106.3 | 26.3 |
2022-01-01T00:07:00Z | 119.6 | 26.5 |
2022-01-01T00:08:00Z | 134.7 | 25.4 |
2022-01-01T00:10:00Z | 137.5 | 25.5 |
2022-01-01T00:11:00Z | 129.9 | 27.4 |
2022-01-01T00:12:00Z | 118.1 | 26.3 |
2022-01-01T00:13:00Z | 105.4 | 25.7 |
2022-01-01T00:14:00Z | 101.7 | 24.4 |
2022-01-01T00:15:00Z | 100.8 | 25.6 |
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 |