Looking for a faster method for iterative calculations in a DataFrame

My current code works, but I would like help making it cleaner and faster.

Goal: for each numb/shopper combination, keep a row only if its timestamp is at least 10 minutes after the last row that was kept for that same combination. The first row of each combination is always kept. A simple diff() or shift() does not work here, because each row must be compared not with the previous row but with the last kept row of the same numb/shopper combination.

Original dataframe:

timestamp	rand	shopper	rand2	numb
2022-08-06 11:25:00	b	Mark	i	706040
2022-08-06 11:30:00	a	John	h	845843
2022-08-06 11:55:00	c	John	g	845843
2022-08-06 11:57:00	d	John	h	845843
2022-08-06 11:59:00	f	John	j	845843
2022-08-06 12:07:00	d	John	h	845843
2022-08-06 12:10:00	d	John	h	845843
2022-08-06 12:12:00	f	Peter	j	635640

Expected output:

timestamp	rand	shopper	rand2	numb
2022-08-06 11:25:00	b	Mark	i	706040
2022-08-06 11:30:00	a	John	h	845843
2022-08-06 11:55:00	c	John	g	845843
2022-08-06 12:07:00	d	John	h	845843
2022-08-06 12:12:00	f	Peter	j	635640
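For comparison, here is a minimal sketch of the per-group diff() filter ruled out above, assuming the timestamp column is parsed to datetime64 (the DataFrame construction simply re-creates the sample table). diff() measures the gap to the previous row of the same numb/shopper group, so the filter drops the 12:07 row: that row is only 8 minutes after 11:59, even though it is 12 minutes after the last kept row (11:55).

```python
import pandas as pd

# Sample data from the question, with timestamps parsed to datetime64.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-08-06 11:25:00", "2022-08-06 11:30:00", "2022-08-06 11:55:00",
        "2022-08-06 11:57:00", "2022-08-06 11:59:00", "2022-08-06 12:07:00",
        "2022-08-06 12:10:00", "2022-08-06 12:12:00",
    ]),
    "rand": list("bacdfddf"),
    "shopper": ["Mark"] + ["John"] * 6 + ["Peter"],
    "rand2": list("ihghjhhj"),
    "numb": [706040] + [845843] * 6 + [635640],
})

# Gap to the previous row of the same numb/shopper group (NaT for a group's first row).
gap = df.groupby(["numb", "shopper"])["timestamp"].diff()

# Naive filter: keep each group's first row, or rows >= 10 minutes after the previous row.
naive = df[gap.isna() | (gap >= pd.Timedelta(minutes=10))]
print(naive["timestamp"].dt.strftime("%H:%M").tolist())
# ['11:25', '11:30', '11:55', '12:12']  (12:07 is missing)
```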

My code:

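```python
from datetime import timedelta

# df is the DataFrame shown above. Assumes 'timestamp' is datetime64, df has the
# default RangeIndex (0, 1, 2, ...), and rows with the same numb are contiguous.
# Note: only numb is compared here, not shopper.
ref_idx = 0                       # position of the last kept row
next_idx = 1                      # position of the row being examined
numdays = timedelta(minutes=10)   # minimum gap (10 minutes, despite the name)
saved_rows = df.iloc[[0]]         # the first row is always kept
while True:
    try:
        next_time = df['timestamp'].iloc[next_idx]
    except IndexError:            # ran past the last row
        break
    else:
        ref_time = df['timestamp'].iloc[ref_idx]
        ref_numb = df['numb'].iloc[ref_idx]
        next_numb = df['numb'].iloc[next_idx]
        # Keep the row if it starts a new numb, or if it is at least
        # 10 minutes after the last kept row with the same numb.
        if (((next_time - ref_time) >= numdays) and (ref_numb == next_numb) or ref_numb != next_numb):
            saved_rows.loc[len(saved_rows.index)] = df.iloc[next_idx]
            ref_idx = next_idx
        next_idx += 1
print(saved_rows)
```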
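One possible cleaner and faster variant, sketched here under stated assumptions rather than offered as a definitive answer, keeps the same greedy "compare with the last kept row" logic but runs it per group over a plain NumPy array, instead of calling .iloc and growing saved_rows with .loc on every step (row-by-row access and row-by-row DataFrame enlargement are typically the slow parts). The helper name keep_spaced_rows and its parameters are hypothetical. It assumes timestamp is datetime64 and that rows are in time order within each group. Unlike the loop above, it groups on both numb and shopper, as the stated goal describes, and does not require each group's rows to be contiguous.

```python
import numpy as np
import pandas as pd


def keep_spaced_rows(df, min_gap=np.timedelta64(10, "m"), keys=("numb", "shopper")):
    """Hypothetical helper: keep the first row of each key group, then every
    row that is at least `min_gap` after the last kept row of the same group.

    Assumes df["timestamp"] is datetime64 and rows are in time order within
    each group (sort by timestamp first if they are not).
    """
    ts = df["timestamp"].to_numpy()           # plain datetime64 array, no .iloc
    keep = np.zeros(len(df), dtype=bool)      # one keep flag per row position
    # GroupBy.indices maps each group key to the integer positions of its rows.
    for positions in df.groupby(list(keys)).indices.values():
        last_kept = None
        for pos in positions:
            if last_kept is None or ts[pos] - last_kept >= min_gap:
                keep[pos] = True
                last_kept = ts[pos]
    return df[keep]                           # original row order is preserved


# Sample data from the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-08-06 11:25:00", "2022-08-06 11:30:00", "2022-08-06 11:55:00",
        "2022-08-06 11:57:00", "2022-08-06 11:59:00", "2022-08-06 12:07:00",
        "2022-08-06 12:10:00", "2022-08-06 12:12:00",
    ]),
    "rand": list("bacdfddf"),
    "shopper": ["Mark"] + ["John"] * 6 + ["Peter"],
    "rand2": list("ihghjhhj"),
    "numb": [706040] + [845843] * 6 + [635640],
})

result = keep_spaced_rows(df)
print(result.index.tolist())  # [0, 1, 2, 5, 7]
```

On the sample data this keeps the 11:25, 11:30, 11:55, 12:07 and 12:12 rows, matching the expected output. Because whether a row is kept depends on which earlier rows were kept, the rule does not map neatly onto a single vectorized pandas expression, so the sketch keeps a loop but moves it off the DataFrame.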

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
