'How can I GROUP BY based on time between records, rather than time 'buckets'?

I've been learning MS Access 2016 the hard way (trial/error + search everything), and I've hit something that's got me stumped.

TL;DR: How do I re-bundle my dataset of operations into transactions? A transaction is defined as all records (operations) with the same [Clerk] and [Customer ID] within 10 minutes of each other including across hours.

Cross-posted on Stack Overflow: How can I GROUP BY based on time between records, rather than time 'buckets'?


I have 150k rows (each year) from an external system that I need to break down and organize into Reports. The external system logs "Packages to Customers" and I need to extract individual "Customer" transactions from this dataset for use in my reports. For example, one of these reports is Avg Customers Serviced per Month per Hour of the Day aka Avg Daily Workload.

The biggest stumbling block is that one "Customer" transaction is a set of records where [Clerk] and [Customer ID] are the same and [Date & Time] are all within 10 minutes of each other. This set of records counts as 1 "Customer" transaction, even when the [Date & Time] of each records has different hours.

I can get everything except the "Within 10 minutes" bit to work. Because "within 10 minutes" doesn't mean "Group by 10 minute buckets", but "Group all records between [Date & Time] and [Date & Time]+10 minutes". This would mean that if I have records at [12:59:30] and [13:01:20] (where [Clerk] and [Customer ID] match) both are grouped together as 1 "Customer", but not grouped with a record at [16:02:37] where [Clerk] and [Customer ID] also match.

I'm not sure if I need to create a "Calculated" column that does the legwork to create a Transaction ID, or if I can do this all through Group By and Select.

Additionally, in writing this post, I've struggled to explain this concept, which makes me believe that I either have a flaw in my concept or dealing with time is just a nightmare.

Note: Our data has hard boundaries on time frame. Data will never be before 04:00 or after 20:00 of any given day. This is because it reflects a physical operation and we definitely don't work 24 hours.


Here's a sample of my data set. First line of table data is the Data Type, if that helps.

DB ID Item ID Status Date & Time Clerk Class Customer ID
AutoNum Text Text Date/Time Text Text Text
1 9305589600000000000000 Delivered 31-Mar-2022 16:57:09 nguyenxe Priority Mail 3917
2 9405511100000000000000 Delivered 31-Mar-2022 16:56:48 nguyenxe Priority Mail 3917
3 9405511400000000000000 Delivered 31-Mar-2022 16:54:28 huntxa Priority Mail 1494
4 9305589600000000000001 Delivered 31-Mar-2022 16:53:59 huntxa Priority Mail 4769
5 9405511800000000000000 Delivered 31-Mar-2022 16:53:53 nguyenxe Priority Mail 4494
6 9305589600000000000002 Delivered 31-Mar-2022 16:53:44 nguyenxe Priority Mail 4494
7 9400109200000000000000 Delivered 31-Mar-2022 16:53:22 huntxa First-Class Mail 0499
8 92612999000000000000000000 Delivered 31-Mar-2022 16:52:00 huntxa Parcel Select 2529
9 9261290900000000000000 Delivered 31-Mar-2022 16:51:37 belldx Parcel Select Mail 0308
10 9205590200000000000000 Delivered 31-Mar-2022 16:50:52 belldx Priority Mail 0308


Solution 1:[1]

You could loop the records in VBA:

  1. Open a recordset sorted by Clerk, Customer ID, and Date & Time
  2. Write the first record to a log table and record the time of this
  3. Loop the records. If Clerk and Customer ID is different from the previouse record, or if Date & Time is off by more than 10 minutes from the last written record, write this record to the log table and record the time of this record as "Last saved time".
  4. Continue to no more records

This is plain and easy to maintain contrary to some convoluted query.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Gustav