How to use MySQL to get unique start/end pairs without reusing rows in the table below

Suppose we have the table below, which records devices scanning packages. How can I use MySQL (version 8) to get the duration of each complete scan_package event? A start-end pair is valid only if the "start" action happens before the "end" action and both actions come from the same device. Each "start-end" pair should be recorded only once (see the table below).

```sql
CREATE TABLE record (
    device    VARCHAR(100),
    eventTime DATETIME,
    event     VARCHAR(100),
    actions   VARCHAR(100)
);

INSERT INTO record VALUES
('device1', '2020-02-01 10:05:25', 'scan_package', 'end'),
('device1', '2020-02-01 10:10:25', 'scan_package', 'start'),
('device1', '2020-02-01 10:25:25', 'scan_package', 'end'),
('device1', '2020-02-01 10:30:25', 'scan_package', 'start'),
('device1', '2020-02-01 10:35:25', 'scan_package', 'end'),
('device2', '2020-02-01 10:40:25', 'scan_package', 'end'),
('device2', '2020-02-01 10:42:25', 'scan_package', 'start'),
('device2', '2020-02-01 10:47:25', 'scan_package', 'end'),
('device2', '2020-02-01 10:51:25', 'scan_package', 'start'),
('device2', '2020-02-01 10:53:25', 'scan_package', 'end'),
('device3', '2020-02-02 08:45:25', 'scan_package', 'start');
```

The expected output is:

| device | eventTime1 | event | action1 | eventTime2 | action2 | scan_duration_in_mins |
|---|---|---|---|---|---|---|
| device1 | 2020-02-01 10:10:25 | scan_package | start | 2020-02-01 10:25:25 | end | 15 |
| device1 | 2020-02-01 10:30:25 | scan_package | start | 2020-02-01 10:35:25 | end | 5 |
| device2 | 2020-02-01 10:42:25 | scan_package | start | 2020-02-01 10:47:25 | end | 5 |
| device2 | 2020-02-01 10:51:25 | scan_package | start | 2020-02-01 10:53:25 | end | 2 |

In the table above, pairing for device1 begins at row 2, because the "end" in row 1 (10:05:25) comes before any "start" and is therefore ignored. Likewise, device2's first "end" (10:40:25) is ignored.

Below is my code. However, it reuses the same "end" action for more than one "start" action when calculating the duration:

```sql
SELECT
    *,
    TIMESTAMPDIFF(MINUTE, t1.eventTime, t2.eventTime) AS duration
FROM record AS t1
JOIN record AS t2
    ON t1.device = t2.device
WHERE t1.actions = 'start'
  AND t2.actions = 'end'
  AND t1.eventTime < t2.eventTime;
```

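As a result, my query returns extra rows because it reuses actions: for example, device1's 10:10:25 "start" is paired with both the 10:25:25 and the 10:35:25 "end", so the query returns 6 rows instead of the expected 4.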

In short, the result should contain unique "start-end" pairs: no "start" or "end" timestamp may be used in more than one pair.
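A minimal sketch of one way to meet this requirement, assuming MySQL 8.0+ (which added window functions) and the guarantee from the Notes that a device never has two "start" rows in a row. LEAD() looks at the next row on the same device, and a row is kept only when it is a "start" immediately followed by an "end". Because each row has exactly one successor, no "end" can be reused; a leading "end" (such as device1's 10:05:25 row) or a trailing "start" (device3) simply produces no pair.

```sql
-- Sketch for MySQL 8.0+, run against the record table created above.
SELECT device,
       eventTime   AS eventTime1,
       event,
       actions     AS action1,
       next_time   AS eventTime2,
       next_action AS action2,
       TIMESTAMPDIFF(MINUTE, eventTime, next_time) AS scan_duration_in_mins
FROM (
    -- attach the time and action of the following row on the same device
    SELECT r.*,
           LEAD(eventTime) OVER (PARTITION BY device ORDER BY eventTime) AS next_time,
           LEAD(actions)   OVER (PARTITION BY device ORDER BY eventTime) AS next_action
    FROM record AS r
    WHERE event = 'scan_package'
) AS t
-- keep only a 'start' whose very next row is an 'end'
WHERE actions = 'start'
  AND next_action = 'end'
ORDER BY device, eventTime;
```

On the sample data this should return the four rows of the expected output (15, 5, 5 and 2 minutes).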

Can someone please give me some suggestions or code?

Notes:

We can assume the following situation never occurs:

  device4  start 
  device4  start
  device4  start
  device4  end
  device4  end
  device4  end

That said, I am curious how to handle that case if it did occur (this could be a separate question).
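For the nested case, a pairing rule has to be chosen first. A minimal sketch, assuming first-in-first-out matching (the n-th "start" on a device closes with the n-th "end" after the device's first "start"; a last-in-first-out rule would need different logic): number the starts and ends separately with ROW_NUMBER() and join on that number. The final eventTime comparison only drops impossible pairs; it does not repair unbalanced data.

```sql
-- Sketch for MySQL 8.0+, ASSUMING first-in-first-out matching per device.
WITH starts AS (
    SELECT device, eventTime, event,
           ROW_NUMBER() OVER (PARTITION BY device ORDER BY eventTime) AS rn
    FROM record
    WHERE actions = 'start'
),
ends AS (
    SELECT r.device, r.eventTime,
           ROW_NUMBER() OVER (PARTITION BY r.device ORDER BY r.eventTime) AS rn
    FROM record AS r
    WHERE r.actions = 'end'
      -- discard 'end' rows that occur before the device's first 'start'
      AND r.eventTime > (SELECT MIN(s.eventTime)
                         FROM record AS s
                         WHERE s.device = r.device AND s.actions = 'start')
)
SELECT s.device,
       s.eventTime AS eventTime1,
       s.event,
       'start'     AS action1,
       e.eventTime AS eventTime2,
       'end'       AS action2,
       TIMESTAMPDIFF(MINUTE, s.eventTime, e.eventTime) AS scan_duration_in_mins
FROM starts AS s
JOIN ends   AS e
  ON  e.device = s.device
  AND e.rn = s.rn                 -- n-th start pairs with n-th end
  AND e.eventTime > s.eventTime   -- safety check: never pair backwards in time
ORDER BY s.device, s.eventTime;
```

For the device4 example (three starts, then three ends) this pairs the first start with the first end, the second with the second, and so on. On the original sample data it should give the same four rows as the expected output.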



Solution 1:[1]

There are two options for this problem:

  1. Add a unique key/index for each pair, so the filtering becomes easy and unambiguous. It would also be enough to add an extra column 'start_time', in which you store the original start time for each 'end' record.

  2. Add some extra logic to the filter: for each "start" record, get the "end" record with the lowest timestamp that is still greater than the "start" timestamp. Otherwise, which "end" record you get depends on the database (most likely the one that was inserted first, but this is not guaranteed).
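A sketch of the extra-column variant of option 1. The column name start_time comes from the answer; the backfill UPDATE statements are only an illustration using the sample data (in practice, the application would presumably fill start_time when it writes the "end" row). Once the link is stored, pairing is a plain equality join, so each "end" matches at most one "start".

```sql
-- Hypothetical schema change: each 'end' row stores the eventTime of the
-- 'start' it closes (NULL for 'start' rows and for unmatched 'end' rows).
ALTER TABLE record ADD COLUMN start_time DATETIME NULL;

-- Illustrative backfill for the sample data only.
UPDATE record SET start_time = '2020-02-01 10:10:25'
WHERE device = 'device1' AND actions = 'end' AND eventTime = '2020-02-01 10:25:25';
UPDATE record SET start_time = '2020-02-01 10:30:25'
WHERE device = 'device1' AND actions = 'end' AND eventTime = '2020-02-01 10:35:25';
UPDATE record SET start_time = '2020-02-01 10:42:25'
WHERE device = 'device2' AND actions = 'end' AND eventTime = '2020-02-01 10:47:25';
UPDATE record SET start_time = '2020-02-01 10:51:25'
WHERE device = 'device2' AND actions = 'end' AND eventTime = '2020-02-01 10:53:25';

-- With the link stored, no 'end' row can be joined to more than one 'start'.
SELECT s.device,
       s.eventTime AS eventTime1,
       s.event,
       s.actions   AS action1,
       e.eventTime AS eventTime2,
       e.actions   AS action2,
       TIMESTAMPDIFF(MINUTE, s.eventTime, e.eventTime) AS scan_duration_in_mins
FROM record AS s
JOIN record AS e
  ON  e.device = s.device
  AND e.start_time = s.eventTime
WHERE s.actions = 'start'
  AND e.actions = 'end'
ORDER BY s.device, s.eventTime;
```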

The second option requires a "correlated subquery", which can be much slower on large tables.
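A sketch of option 2 written from the description above (it is not copied from the linked fiddle): the correlated subquery finds, for each "start", the earliest "end" on the same device that comes after it. Under the assumption in the question's Notes this returns the four expected pairs; if a device had two consecutive "start" rows, both would pick the same "end", so the reuse problem would come back.

```sql
-- Sketch for MySQL 8.0, run against the record table created above.
SELECT s.device,
       s.eventTime AS eventTime1,
       s.event,
       s.actions   AS action1,
       e.eventTime AS eventTime2,
       e.actions   AS action2,
       TIMESTAMPDIFF(MINUTE, s.eventTime, e.eventTime) AS scan_duration_in_mins
FROM record AS s
JOIN record AS e
  ON  e.device  = s.device
  AND e.actions = 'end'
WHERE s.actions = 'start'
  -- correlated subquery: the lowest 'end' timestamp greater than this start
  AND e.eventTime = (SELECT MIN(e2.eventTime)
                     FROM record AS e2
                     WHERE e2.device = s.device
                       AND e2.actions = 'end'
                       AND e2.eventTime > s.eventTime)
ORDER BY s.device, s.eventTime;
```

A "start" with no later "end" (device3) makes the subquery return NULL, so it is dropped from the result.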

I have updated your fiddle: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=12762f164b72a06849fd66d823d40ebe

This works well on small tables, but with many records it can become very slow.
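If the correlated subquery becomes slow, a common first step (an assumption here, not part of the original answer) is a composite index whose column order matches the subquery's filters: equality on device and actions, then a range on eventTime. The index name below is hypothetical; check with EXPLAIN whether MySQL actually uses it on your data.

```sql
-- Hypothetical index supporting "earliest 'end' after time X on device Y".
CREATE INDEX idx_record_device_action_time
    ON record (device, actions, eventTime);

-- Inspect the plan for one instance of the subquery.
EXPLAIN
SELECT MIN(eventTime)
FROM record
WHERE device = 'device1'
  AND actions = 'end'
  AND eventTime > '2020-02-01 10:10:25';
```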

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Robert