Django group nearby rows based on datetime

In a Django model that stores an event datetime, I would like to group rows that are close together in time. To clarify, each object has a date_time field, and I can find the gap between consecutive objects rather easily with:

    from datetime import datetime, timedelta, timezone

    from django.db.models import F, Window
    from django.db.models.functions import Lead
    from django.utils.timezone import make_aware

    # We want a minimum date_time as the default for prev_date_time,
    # but because Session.date_time is a DateTimeField and TZ aware,
    # Django requires the default to be TZ aware. And datetime.min
    # cannot be made TZ aware (it crashes); adding one day prevents
    # that crash and serves our purposes.
    min_date_time = make_aware(datetime.min + timedelta(days=1), timezone.utc)
    # Ordering by date_time descending makes Lead() return the
    # chronologically previous session's date_time.
    sessions = Session.objects.all().annotate(
        prev_date_time=Window(
            expression=Lead('date_time', default=min_date_time),
            order_by=F('date_time').desc()
        ),
        dt_difference=F('date_time') - F('prev_date_time')
    )

This works brilliantly.

Now, I would like to group these sessions such that any session with a dt_difference of over 1 day marks a group boundary.

I imagine two new annotations could do the trick, but I am struggling to write them and would ideally like to do so without resorting to raw SQL.

  1. A new annotation that is equal to the date_time of the session when dt_difference is greater than one day, null otherwise.
  2. A new annotation that fills each of these nulls with the nearest non-null value found by moving back in time towards the gap, i.e. the date_time of the session that starts the group.
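
To pin down the intended result, the two steps can be sketched in plain Python. This only illustrates the logic and is not a Django annotation; the function name annotate_start_dt and the handful of datetimes are illustrative assumptions, and the first row is treated as a group boundary because it has no predecessor.

```python
from datetime import datetime, timedelta, timezone

AEST = timezone(timedelta(hours=10))  # assumed +10:00 offset for the sample values
GAP = timedelta(days=1)               # a gap longer than this starts a new group


def annotate_start_dt(date_times, gap=GAP):
    """Return (date_time, start_group, start_dt) tuples in ascending order.

    Hypothetical helper: start_group is the date_time on a group boundary
    and None otherwise; start_dt carries the latest boundary forward.
    """
    rows = []
    prev = None
    start_dt = None
    for dt in sorted(date_times):
        # Step 1: a row is a boundary if it follows a gap of more than `gap`
        # (or has no predecessor at all).
        is_boundary = prev is None or dt - prev > gap
        start_group = dt if is_boundary else None
        # Step 2: fill the nulls with the most recent boundary value.
        if is_boundary:
            start_dt = dt
        rows.append((dt, start_group, start_dt))
        prev = dt
    return rows


sample = [
    datetime(2021, 6, 19, 19, 15, tzinfo=AEST),
    datetime(2021, 6, 19, 20, 45, tzinfo=AEST),
    datetime(2021, 6, 19, 22, 15, tzinfo=AEST),
    datetime(2021, 7, 3, 19, 45, tzinfo=AEST),
    datetime(2021, 7, 3, 19, 45, 40, tzinfo=AEST),
    datetime(2021, 7, 3, 21, 0, tzinfo=AEST),
]
rows = annotate_start_dt(sample)
for dt, start_group, start_dt in rows:
    print(dt.isoformat(), start_group, start_dt.isoformat())
```

Because the boundary values only increase with time, step 2 is equivalent to a running maximum of start_group in ascending date_time order, which may be easier to express as a window function than a generic "fill nulls" operation.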

This is best illustrated by example. Below is some sample data extracted from this table, showing date_time together with prev_date_time and dt_difference as per the annotations above, plus the two new annotations I would like, start_group and start_dt.

    id   date_time                 prev_date_time            dt_difference       start_group               start_dt
    577  "2021-07-03 23:59:00+10"  "2021-07-03 22:30:00+10"  "01:29:00"          null                      "2021-07-03 19:45:00+10"
    574  "2021-07-03 22:30:00+10"  "2021-07-03 22:30:00+10"  "00:00:00"          null                      "2021-07-03 19:45:00+10"
    576  "2021-07-03 22:30:00+10"  "2021-07-03 21:00:00+10"  "01:30:00"          null                      "2021-07-03 19:45:00+10"
    575  "2021-07-03 21:00:00+10"  "2021-07-03 19:45:40+10"  "01:14:20"          null                      "2021-07-03 19:45:00+10"
    572  "2021-07-03 19:45:40+10"  "2021-07-03 19:45:00+10"  "00:00:40"          null                      "2021-07-03 19:45:00+10"
    573  "2021-07-03 19:45:00+10"  "2021-06-19 22:15:00+10"  "13 days 21:30:00"  "2021-07-03 19:45:00+10"  "2021-07-03 19:45:00+10"
    580  "2021-06-19 22:15:00+10"  "2021-06-19 20:45:00+10"  "01:30:00"          null                      "2021-06-19 19:15:00+10"
    579  "2021-06-19 20:45:00+10"  "2021-06-19 19:15:00+10"  "01:30:00"          null                      "2021-06-19 19:15:00+10"
    578  "2021-06-19 19:15:00+10"  "2021-06-05 23:59:00+10"  "13 days 19:16:00"  "2021-06-19 19:15:00+10"  "2021-06-19 19:15:00+10"
    571  "2021-06-05 23:59:00+10"  "2021-06-05 22:00:00+10"  "01:59:00"          null                      "2021-06-05 19:00:00+10"
    569  "2021-06-05 22:00:00+10"  "2021-06-05 21:30:00+10"  "00:30:00"          null                      "2021-06-05 19:00:00+10"
    570  "2021-06-05 21:30:00+10"  "2021-06-05 21:10:00+10"  "00:20:00"          null                      "2021-06-05 19:00:00+10"
    568  "2021-06-05 21:10:00+10"  "2021-06-05 19:00:00+10"  "02:10:00"          null                      "2021-06-05 19:00:00+10"
    567  "2021-06-05 19:00:00+10"  "2021-05-22 18:15:00+10"  "14 days 00:45:00"  "2021-06-05 19:00:00+10"  "2021-06-05 19:00:00+10"

Now, start_group is not needed per se; I simply envisage it as an intermediate step that is conceivably easy to annotate. The goal is start_dt, which we can then GROUP BY so that nearby sessions end up in the same group, with a gap of more than a day on either side of each group.

I have a feeling this is best done and likely possible with annotations, so that a single query on the database can then return aggregate information on the grouped sessions.
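
For reference, the shape of such a single query can be sketched against an in-memory SQLite database using only the Python standard library. This is raw SQL, which the question hopes to avoid, so treat it purely as an illustration of what the ORM would need to generate. The table name session, the integer column ts (epoch seconds) and the helper epoch are assumptions made for the sketch, and it needs SQLite 3.25 or later for window functions.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

AEST = timezone(timedelta(hours=10))  # +10:00 offset, as in the sample data


def epoch(*args):
    """Hypothetical helper: epoch seconds for a +10:00 wall-clock time."""
    return int(datetime(*args, tzinfo=AEST).timestamp())


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session (id INTEGER PRIMARY KEY, ts INTEGER NOT NULL)")
conn.executemany(
    "INSERT INTO session (id, ts) VALUES (?, ?)",
    [
        (567, epoch(2021, 6, 5, 19, 0)),
        (568, epoch(2021, 6, 5, 21, 10)),
        (570, epoch(2021, 6, 5, 21, 30)),
        (569, epoch(2021, 6, 5, 22, 0)),
        (571, epoch(2021, 6, 5, 23, 59)),
        (578, epoch(2021, 6, 19, 19, 15)),
        (579, epoch(2021, 6, 19, 20, 45)),
        (580, epoch(2021, 6, 19, 22, 15)),
    ],
)

QUERY = """
WITH gaps AS (
    -- previous session's timestamp in chronological order
    SELECT id, ts, LAG(ts) OVER (ORDER BY ts) AS prev_ts
    FROM session
),
flagged AS (
    -- step 1: keep ts only where the gap exceeds one day (86400 seconds)
    SELECT id, ts,
           CASE WHEN prev_ts IS NULL OR ts - prev_ts > 86400 THEN ts END
               AS start_group
    FROM gaps
),
filled AS (
    -- step 2: a running MAX fills the nulls with the latest boundary
    SELECT id, ts, MAX(start_group) OVER (ORDER BY ts) AS start_dt
    FROM flagged
)
SELECT start_dt, COUNT(*) AS n, MAX(ts) AS last_ts
FROM filled
GROUP BY start_dt
ORDER BY start_dt
"""

groups = conn.execute(QUERY).fetchall()
for start_dt, n, last_ts in groups:
    print(datetime.fromtimestamp(start_dt, AEST), n, datetime.fromtimestamp(last_ts, AEST))
```

Each step sits in its own subquery because a window function cannot be applied directly to the result of another window function at the same query level, which is likely also the main obstacle when chaining annotations in the ORM.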



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow