How to use MySQL to get unique start-end pairs without reusing elements from the table below
Suppose we have the table below, which records devices scanning different packages. How can I use MySQL (version 8) to get the duration of each complete scan_package event? For a pair to count as a valid scan_package event, the "start" action must happen before the "end" action, both actions must come from the same device, and each "start-end" pair must be recorded only once (see the table below).
CREATE TABLE record (
device VARCHAR(100),
eventTime datetime,
event VARCHAR(100),
actions VARCHAR(100)
);
INSERT INTO record VALUES
( 'device1' , '2020-02-01 10:05:25', 'scan_package' , 'end'),
( 'device1' , '2020-02-01 10:10:25', 'scan_package' , 'start'),
( 'device1' , '2020-02-01 10:25:25', 'scan_package' , 'end'),
( 'device1' , '2020-02-01 10:30:25', 'scan_package' , 'start'),
( 'device1' , '2020-02-01 10:35:25', 'scan_package' , 'end'),
( 'device2' , '2020-02-01 10:40:25', 'scan_package' , 'end'),
( 'device2' , '2020-02-01 10:42:25', 'scan_package' , 'start'),
( 'device2' , '2020-02-01 10:47:25', 'scan_package' , 'end'),
( 'device2' , '2020-02-01 10:51:25', 'scan_package' , 'start'),
( 'device2' , '2020-02-01 10:53:25', 'scan_package' , 'end'),
( 'device3' , '2020-02-02 08:45:25', 'scan_package' , 'start');
The valid output should look like this:
| device | eventTime1 | event | action1 | eventTime2 | action2 | scan_duration_in_mins |
|---|---|---|---|---|---|---|
| device1 | 2020-02-01 10:10:25 | scan_package | start | 2020-02-01 10:25:25 | end | 15 |
| device1 | 2020-02-01 10:30:25 | scan_package | start | 2020-02-01 10:35:25 | end | 5 |
| device2 | 2020-02-01 10:42:25 | scan_package | start | 2020-02-01 10:47:25 | end | 5 |
| device2 | 2020-02-01 10:51:25 | scan_package | start | 2020-02-01 10:53:25 | end | 2 |
Based on the table above, pairing begins with row 2, because the "end" action in row 1 occurs before any "start" action.
Below is my code. However, it reuses the same "end" action for several "start" actions when calculating the duration:
SELECT
*,
TIMESTAMPDIFF(MINUTE, t1.eventTime, t2.eventTime) duration
FROM record AS t1
JOIN record AS t2
ON t1.device = t2.device
WHERE t1.actions = 'start' AND t2.actions = 'end'
AND
t1.eventTime < t2.eventTime
As a result, my query returns extra rows because the actions are reused.
Basically, the result should contain only unique "start-end" pairs, in which no "start" or "end" timestamp is reused.
Can someone please give me some suggestions or code?
Notes:
We can assume that the following situation does not occur:
device4 start
device4 start
device4 start
device4 end
device4 end
device4 end
That said, I am curious how to handle the case above if it did occur (this could be a separate problem).
Solution 1:[1]
There are two options for this problem:
1. Add a unique key/index for each pair, so the filtering becomes easy and clear. It would also be enough to add an extra column 'start_time', in which you store the original start time on each 'end' record.
2. Add some extra knowledge to the query. This means you have to improve your filter and pick the "end" record with the lowest timestamp that is still greater than the timestamp of the "start" record. Otherwise it depends on the DB which record you get (most likely the one that was inserted first, but this is not guaranteed).
The second option requires a "correlated subquery", which is typically much slower because the subquery is evaluated for each "start" row.
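The first option could be sketched roughly as follows. This is only an illustration under assumptions: the table name `record_v2` is hypothetical, and it presumes the application already knows the matching start time when it writes the 'end' row. Only the `start_time` column name comes from the description above.

```sql
-- Hypothetical variant of the table with an extra start_time column.
CREATE TABLE record_v2 (
    device     VARCHAR(100),
    eventTime  DATETIME,
    event      VARCHAR(100),
    actions    VARCHAR(100),
    start_time DATETIME NULL   -- filled only on 'end' rows
);

-- Assumption: the writer stores the original start time with the 'end' row.
INSERT INTO record_v2 VALUES
('device1', '2020-02-01 10:10:25', 'scan_package', 'start', NULL),
('device1', '2020-02-01 10:25:25', 'scan_package', 'end',   '2020-02-01 10:10:25');

-- Each 'end' row already carries its pair, so no self-join is needed.
SELECT
    device,
    start_time AS eventTime1,
    event,
    eventTime  AS eventTime2,
    TIMESTAMPDIFF(MINUTE, start_time, eventTime) AS scan_duration_in_mins
FROM record_v2
WHERE actions = 'end'
  AND start_time IS NOT NULL;
```

With this layout an 'end' row without a known start (such as the first row of device1 in the question) would simply keep `start_time` as NULL and be filtered out.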
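A minimal sketch of the correlated-subquery idea against the `record` table from the question is shown below. It is not necessarily the exact query used in the fiddle; the aliases `s`, `e` and `x` are chosen here for illustration. For every "start" row it joins only the earliest later "end" row of the same device and event.

```sql
SELECT
    s.device,
    s.eventTime AS eventTime1,
    s.event,
    s.actions   AS action1,
    e.eventTime AS eventTime2,
    e.actions   AS action2,
    TIMESTAMPDIFF(MINUTE, s.eventTime, e.eventTime) AS scan_duration_in_mins
FROM record AS s
JOIN record AS e
  ON  e.device  = s.device
  AND e.event   = s.event
  AND e.actions = 'end'
  -- correlated subquery: the earliest 'end' after this 'start'
  AND e.eventTime = (
        SELECT MIN(x.eventTime)
        FROM record AS x
        WHERE x.device    = s.device
          AND x.event     = s.event
          AND x.actions   = 'end'
          AND x.eventTime > s.eventTime
      )
WHERE s.actions = 'start'
ORDER BY s.device, s.eventTime;
```

On the sample data this should return the four rows of the expected output; the unmatched "start" of device3 drops out because the subquery yields NULL. Note that it relies on the assumption from the question that two "start" rows never occur in a row: if they did, both would be paired with the same "end".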
I have updated your fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=12762f164b72a06849fd66d823d40ebe
This works well on small tables, but with many records it can become really slow.
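As a possible alternative that is not part of the answer above, MySQL 8 window functions can avoid the correlated subquery. The sketch below uses `LEAD()` to look at the next row of the same device and event, and keeps a row only when it is a "start" immediately followed by an "end". The CTE name `ordered` and the columns `next_time` and `next_action` are names chosen for this example.

```sql
WITH ordered AS (
    SELECT
        device,
        eventTime,
        event,
        actions,
        -- timestamp and action of the next row for the same device/event
        LEAD(eventTime) OVER w AS next_time,
        LEAD(actions)   OVER w AS next_action
    FROM record
    WINDOW w AS (PARTITION BY device, event ORDER BY eventTime)
)
SELECT
    device,
    eventTime   AS eventTime1,
    event,
    actions     AS action1,
    next_time   AS eventTime2,
    next_action AS action2,
    TIMESTAMPDIFF(MINUTE, eventTime, next_time) AS scan_duration_in_mins
FROM ordered
WHERE actions = 'start'
  AND next_action = 'end'
ORDER BY device, eventTime1;
```

Because each row has exactly one successor, an "end" can be matched to at most one "start". In the start/start/start/end/end/end case from the question's notes, this sketch would pair only the last "start" with the first "end" and ignore the rest, which may or may not be the desired behavior.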
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Robert |
