'How to count transactions over a period of time defined by the individual row data time stamp per transaction
I am using MariaDB 10.2.4 in MySQL syntax and attempting to count the number of transactions that have occurred in the previous 24 hours based on the time stamp (not the calendar day) of the row for each customer for each transaction.
Column Information:
- row_number: I added this for ease of reading its not actually present
- order_no: is a unique value for each row
- customer_id: is unique by customer but repeated in the table
- order_date: is a timestamp that the count must be performed on
- dollar_value: is there for context and is a decimal
Here is a manually made table to show what my desired output will be. The input will be all the same columns, excluding the output (trans_count) and row_number column.
Examples:
Row 3. The time frame I'm interested in is 2022-02-11 22:53:50 (the value in the order_date column) to 2022-02-10 22:53:50 (order_date - interval 1 day) by customer ID 1111171. Row 4 and 6 match this, so the output for row 3 is 3 (including the row 3).
Row 4. The time frame is 2022-02-11 06:49:36 to 2022-02-10 06:49:36 for customer 1111171. Row 6,8,9,10 fall within the timeframe and are the same customer_id, so the trans_count value is 5.
Table
| row_number | order_no | customer_id | order_date | dollar_value | trans_count |
|---|---|---|---|---|---|
| 1 | 8888883 | 1111100 | 2022-02-14 01:10:04 | 2256.0 | 1 |
| 2 | 8888837 | 1111100 | 2022-02-12 05:46:32 | 1457.2 | 1 |
| 3 | 8888812 | 1111171 | 2022-02-11 22:53:50 | 1757.2 | 3 |
| 4 | 8888887 | 1111171 | 2022-02-11 06:49:36 | 1350.2 | 5 |
| 5 | 8888804 | 1111100 | 2022-02-11 03:10:07 | 1853.6 | 1 |
| 6 | 8888866 | 1111171 | 2022-02-11 01:20:26 | 1053.0 | 4 |
| 7 | 8888833 | 1111181 | 2022-02-10 21:09:05 | 253.2 | 1 |
| 8 | 8888874 | 1111171 | 2022-02-10 18:06:55 | 1958.6 | 3 |
| 9 | 8888829 | 1111171 | 2022-02-10 10:11:59 | 1456.2 | 2 |
| 10 | 8888802 | 1111171 | 2022-02-10 09:55:31 | 956.6 | 1 |
| 11 | 8888835 | 1111100 | 2022-02-09 19:40:24 | 756.4 | 2 |
| 12 | 8888810 | 1111123 | 2022-02-09 01:34:56 | 3179.5 | 1 |
| 13 | 8888850 | 1111100 | 2022-02-08 20:00:20 | 629.0 | 1 |
| 14 | 8888821 | 1111171 | 2022-02-08 17:59:05 | 1249.45 | 2 |
| 15 | 8888809 | 1111171 | 2022-02-08 06:25:15 | 1250.0 | 1 |
| 16 | 8888837 | 1111147 | 2022-02-08 06:18:15 | 184.6 | 1 |
| 17 | 8888836 | 1111171 | 2022-02-07 12:01:47 | 88.28 | 1 |
| 18 | 8888808 | 1111147 | 2022-02-05 12:02:49 | 3008.7 | 3 |
| 19 | 8888890 | 1111147 | 2022-02-05 11:48:16 | 1543.31 | 2 |
| 20 | 8888805 | 1111147 | 2022-02-05 11:37:55 | 2617.4 | 1 |
I intend to also apply some other filters, such as only counting dollar_value above 1000, but I expect that shouldn't affect the logic of the count.
I have been able to do this in PySpark, but I couldn't repeat the same logic. Here is a snippet from a function I have incorporated it into.
'''
# Function to calculate number of seconds from number of days
days = lambda i: i * 86400
# Create window by casting timestamp to long (number of seconds) then defining the number of days you wish to review
w = (Window.partitionBy('customer_id').orderBy(F.col(date_column).cast('long')).rangeBetween(-days(monitor_length_days), 0))
# Add total value of transactions undertaken within time frame by customer_id
filteredDF = filteredDF.withColumn(output_name, F.count('customer_id').over(w))
'''
Please let me know if you have any questions,
Solution 1:[1]
Unfortunately MariaDB doesn't allow you to make a window over a time period, so you need a different approach. One way to solve this is to JOIN the table to itself where the customer_id value is the same and the order_date is within the preceding 24 hours. You can then count the rows in the joined table to get the trans_count value:
SELECT o.customer_id, o.order_date,
COUNT(c.customer_id) AS trans_count
FROM orders o
JOIN (SELECT customer_id, order_date
FROM orders
) c ON c.customer_id = o.customer_id AND c.order_date BETWEEN o.order_date - INTERVAL 1 DAY AND o.order_date
GROUP BY o.customer_id, o.order_date
ORDER BY o.order_date DESC
Output (for your sample data):
customer_id order_date trans_count
1111100 2022-02-14 01:10:04 1
1111100 2022-02-12 05:46:32 1
1111171 2022-02-11 22:53:50 3
1111171 2022-02-11 06:49:36 5
1111100 2022-02-11 03:10:07 1
1111171 2022-02-11 01:20:26 4
1111181 2022-02-10 21:09:05 1
1111171 2022-02-10 18:06:55 3
1111171 2022-02-10 10:11:59 2
1111171 2022-02-10 09:55:31 1
1111100 2022-02-09 19:40:24 2
1111123 2022-02-09 01:34:56 1
1111100 2022-02-08 20:00:20 1
1111171 2022-02-08 17:59:05 2
1111171 2022-02-08 06:25:15 2
1111147 2022-02-08 06:18:15 1
1111171 2022-02-07 12:01:47 1
1111147 2022-02-05 12:02:49 3
1111147 2022-02-05 11:48:16 2
1111147 2022-02-05 11:37:55 1
Note that for row 15 the trans_count should be 2 as there is a preceding transaction within 24 hours on row 17.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Nick |
