Django group nearby rows based on datetime
In a Django model that stores an event datetime, I would like to group rows that are close together in time. To clarify, each object has a date_time field, and I can find the gap in date_time between consecutive objects rather easily with:
```python
from datetime import datetime, timedelta, timezone as dt_timezone

from django.db.models import F, Window
from django.db.models.functions import Lead
from django.utils.timezone import make_aware

# We want a minimum date_time as the default for prev_date_time,
# but because Session.date_time is a DateTimeField and TZ-aware,
# Django requires the default to be TZ-aware. datetime.min
# cannot be made TZ-aware (it crashes); adding one day prevents
# that crash and serves our purposes.
# make_aware() expects a tzinfo object, not the string "UTC".
min_date_time = make_aware(datetime.min + timedelta(days=1), dt_timezone.utc)
sessions = Session.objects.all().annotate(
    prev_date_time=Window(
        expression=Lead('date_time', default=min_date_time),
        order_by=F('date_time').desc(),
    ),
    dt_difference=F('date_time') - F('prev_date_time'),
)
```
This works brilliantly.
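To make the window annotation concrete, here is a minimal pure-Python sketch of what it computes, assuming the date_time values fit in memory. It is an illustration, not the ORM query, and the names `annotate_gaps`, `MIN_DEFAULT` and `AEST` are hypothetical. Sorting newest first and looking one element ahead mirrors `Lead` over `order_by=F('date_time').desc()`: each row's `prev_date_time` is the next-older event, and the oldest row falls back to the default.

```python
from datetime import datetime, timedelta, timezone

AEST = timezone(timedelta(hours=10))  # +10 offset seen in the sample table
# Hypothetical aware stand-in for make_aware(datetime.min + timedelta(days=1), UTC)
MIN_DEFAULT = datetime(1, 1, 2, tzinfo=timezone.utc)


def annotate_gaps(date_times, default=MIN_DEFAULT):
    """Hypothetical helper mirroring the Lead/Window annotation in memory.

    Returns (date_time, prev_date_time, dt_difference) tuples, newest first.
    """
    ordered = sorted(date_times, reverse=True)  # like order_by=F('date_time').desc()
    rows = []
    for i, dt in enumerate(ordered):
        # Lead looks one row ahead in this ordering, i.e. at the next-older event;
        # the oldest row has nothing ahead of it and gets the default.
        prev = ordered[i + 1] if i + 1 < len(ordered) else default
        rows.append((dt, prev, dt - prev))
    return rows
```

Because the default sits far in the past, the oldest session always gets a very large dt_difference, so it will always look like the start of a group.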
Now, I would like to group these sessions such that any session with a dt_difference of over 1 day marks a group boundary.
I imagine two new annotations could do the trick, but I am struggling to write them and would ideally like to do so without resorting to raw SQL.
- A new annotation that is equal to the date_time of the session when dt_difference is greater than one day, null otherwise.
- A new annotation that fills each of these nulls with the nearest non-null value, propagated from the boundary session in the direction away from the gap.
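The two proposed annotations amount to a boundary flag followed by a forward fill. Below is a minimal in-memory sketch of that logic, assuming a plain list of aware datetimes rather than a queryset; the function name `group_starts` is hypothetical, and this is not an ORM solution, only a statement of the intended result.

```python
from datetime import datetime, timedelta, timezone

AEST = timezone(timedelta(hours=10))  # +10 offset seen in the sample table
GAP = timedelta(days=1)  # "a dt_difference of over 1 day" marks a boundary


def group_starts(date_times, gap=GAP):
    """Hypothetical helper: return (date_time, start_group, start_dt) tuples, newest first."""
    rows = []
    prev = None
    start_dt = None
    for dt in sorted(date_times):  # oldest first, so the fill runs away from each gap
        # Step 1 (start_group): the session's own date_time if it follows a gap
        # of more than `gap`, else None. The oldest row has no predecessor and
        # is treated as a boundary, as the min_date_time default would make it.
        start_group = dt if prev is None or dt - prev > gap else None
        # Step 2 (start_dt): carry the most recent non-null start_group forward.
        if start_group is not None:
            start_dt = start_group
        rows.append((dt, start_group, start_dt))
        prev = dt
    rows.reverse()  # newest first, like the table in the question
    return rows
```

A gap of exactly one day does not start a new group here, since the question asks for a dt_difference of *over* one day.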
This is best illustrated by example. I extracted some sample data from this table below, showing date_time, prev_date_time and dt_difference as produced by the annotations above, plus the two proposed annotations, start_group and start_dt.
| id | date_time | prev_date_time | dt_difference | start_group | start_dt |
|---|---|---|---|---|---|
| 577 | "2021-07-03 23:59:00+10" | "2021-07-03 22:30:00+10" | "01:29:00" | null | "2021-07-03 19:45:00+10" |
| 574 | "2021-07-03 22:30:00+10" | "2021-07-03 22:30:00+10" | "00:00:00" | null | "2021-07-03 19:45:00+10" |
| 576 | "2021-07-03 22:30:00+10" | "2021-07-03 21:00:00+10" | "01:30:00" | null | "2021-07-03 19:45:00+10" |
| 575 | "2021-07-03 21:00:00+10" | "2021-07-03 19:45:40+10" | "01:14:20" | null | "2021-07-03 19:45:00+10" |
| 572 | "2021-07-03 19:45:40+10" | "2021-07-03 19:45:00+10" | "00:00:40" | null | "2021-07-03 19:45:00+10" |
| 573 | "2021-07-03 19:45:00+10" | "2021-06-19 22:15:00+10" | "13 days 21:30:00" | "2021-07-03 19:45:00+10" | "2021-07-03 19:45:00+10" |
| 580 | "2021-06-19 22:15:00+10" | "2021-06-19 20:45:00+10" | "01:30:00" | null | "2021-06-19 19:15:00+10" |
| 579 | "2021-06-19 20:45:00+10" | "2021-06-19 19:15:00+10" | "01:30:00" | null | "2021-06-19 19:15:00+10" |
| 578 | "2021-06-19 19:15:00+10" | "2021-06-05 23:59:00+10" | "13 days 19:16:00" | "2021-06-19 19:15:00+10" | "2021-06-19 19:15:00+10" |
| 571 | "2021-06-05 23:59:00+10" | "2021-06-05 22:00:00+10" | "01:59:00" | null | "2021-06-05 19:00:00+10" |
| 569 | "2021-06-05 22:00:00+10" | "2021-06-05 21:30:00+10" | "00:30:00" | null | "2021-06-05 19:00:00+10" |
| 570 | "2021-06-05 21:30:00+10" | "2021-06-05 21:10:00+10" | "00:20:00" | null | "2021-06-05 19:00:00+10" |
| 568 | "2021-06-05 21:10:00+10" | "2021-06-05 19:00:00+10" | "02:10:00" | null | "2021-06-05 19:00:00+10" |
| 567 | "2021-06-05 19:00:00+10" | "2021-05-22 18:15:00+10" | "14 days 00:45:00" | "2021-06-05 19:00:00+10" | "2021-06-05 19:00:00+10" |
Now, start_group is not needed per se; I simply envisage it as an intermediary that is conceivably easy to annotate. The goal is start_dt, which we can then GROUP BY to group the sessions as desired: runs of nearby sessions with a gap of more than a day on either side.
I have a feeling this is best done (and likely possible) with annotations, so that a single database query can then return aggregate information on the grouped sessions.
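Once every row carries a start_dt, the desired result is an ordinary GROUP BY on it. The sketch below does that aggregation in memory on already-labelled rows, only to illustrate the expected output shape; `summarise_groups` and the chosen aggregates (count, first, last) are assumptions, and it does not show how to express the grouping in the Django ORM.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

AEST = timezone(timedelta(hours=10))  # +10 offset seen in the sample table


def summarise_groups(rows):
    """Hypothetical helper: in-memory stand-in for GROUP BY start_dt.

    `rows` is an iterable of (id, date_time, start_dt) tuples; returns a dict
    keyed by start_dt with a session count and the first/last date_time.
    """
    groups = defaultdict(list)
    for _row_id, dt, start_dt in rows:
        groups[start_dt].append(dt)
    return {
        start_dt: {"count": len(dts), "first": min(dts), "last": max(dts)}
        for start_dt, dts in groups.items()
    }
```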
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
