Looking for a faster method for iterative calculations in a DataFrame

My current code works but I would like help making it cleaner/faster.

Goal: among consecutive rows that share the same 'numb' and 'shopper' values, keep a row only if at least 10 minutes have elapsed since the last kept row for that combination. The first row of each combination is always kept. A simple diff() or shift() would not work here, because each row has to be compared against the last kept row with the same numb/shopper combination, not just against the row immediately before it.
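To illustrate why diff() alone falls short, here is a small sketch that uses only the John timestamps from the sample data below. The variable names are made up for the illustration.

```python
import pandas as pd

# Timestamps for one numb/shopper combination (the John rows from the sample data).
ts = pd.Series(pd.to_datetime([
    "2022-08-06 11:30:00",
    "2022-08-06 11:55:00",
    "2022-08-06 11:57:00",
    "2022-08-06 11:59:00",
    "2022-08-06 12:07:00",
    "2022-08-06 12:10:00",
]))

# Gap to the immediately preceding row; the first row has no predecessor (NaT).
gap_to_previous = ts.diff()

# Naive rule: keep the first row plus any row at least 10 minutes after its predecessor.
naive_keep = gap_to_previous.isna() | (gap_to_previous >= pd.Timedelta(minutes=10))
naive_kept = ts[naive_keep].dt.strftime("%H:%M").tolist()
print(naive_kept)
```

The naive rule keeps only 11:30 and 11:55. The 12:07 row is dropped because it is 8 minutes after 11:59, even though it is 12 minutes after 11:55, the last row that was actually kept.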

Original dataframe:

timestamp            rand  shopper  rand2  numb
2022-08-06 11:25:00  b     Mark     i      706040
2022-08-06 11:30:00  a     John     h      845843
2022-08-06 11:55:00  c     John     g      845843
2022-08-06 11:57:00  d     John     h      845843
2022-08-06 11:59:00  f     John     j      845843
2022-08-06 12:07:00  d     John     h      845843
2022-08-06 12:10:00  d     John     h      845843
2022-08-06 12:12:00  f     Peter    j      635640

Expected output:

timestamp            rand  shopper  rand2  numb
2022-08-06 11:25:00  b     Mark     i      706040
2022-08-06 11:30:00  a     John     h      845843
2022-08-06 11:55:00  c     John     g      845843
2022-08-06 12:07:00  d     John     h      845843
2022-08-06 12:12:00  f     Peter    j      635640

My code:

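from datetime import timedelta

# Assumes df is the DataFrame shown above, with 'timestamp' already a datetime column.
ref_idx = 0   # position of the last kept row
next_idx = 1  # position of the row currently being checked
numdays = timedelta(minutes = 10)  # minimum gap (minutes, despite the name)
saved_rows = df.iloc[[0]].copy()   # copy so the .loc assignment below does not write to a slice of df
while True:
    try:
        next_time = df['timestamp'].iloc[next_idx]
    except IndexError:
        break  # ran past the last row
    else:
        ref_time = df['timestamp'].iloc[ref_idx]
        ref_numb = df['numb'].iloc[ref_idx]
        next_numb = df['numb'].iloc[next_idx]
        # Keep the row if it starts a new numb, or is at least 10 minutes after the last kept row.
        if ref_numb != next_numb or (next_time - ref_time) >= numdays:
            saved_rows.loc[len(saved_rows.index)] = df.iloc[next_idx]
            ref_idx = next_idx
        next_idx += 1
print(saved_rows)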
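
One possible way to tidy this up is sketched below. It is not a definitive answer. Each decision depends on the last kept row, so the sketch keeps a single pass. It reads the columns as NumPy arrays and builds a boolean mask instead of appending rows one at a time. The function name keep_spaced_rows and its min_gap parameter are hypothetical. The sketch also compares 'shopper' as well as 'numb', which is an assumption based on the stated goal (the loop above only compares 'numb').

```python
import numpy as np
import pandas as pd

def keep_spaced_rows(df, min_gap=pd.Timedelta(minutes=10)):
    """Keep the first row of each consecutive numb/shopper run, plus every row
    at least `min_gap` after the last kept row of that run.
    Hypothetical helper; assumes df['timestamp'] is a datetime64 column."""
    if len(df) == 0:
        return df
    times = df["timestamp"].to_numpy()
    numbs = df["numb"].to_numpy()
    shoppers = df["shopper"].to_numpy()
    gap = min_gap.to_timedelta64()

    keep = np.zeros(len(df), dtype=bool)
    keep[0] = True  # the first row is always kept
    ref = 0         # position of the last kept row
    for i in range(1, len(df)):
        same_key = numbs[i] == numbs[ref] and shoppers[i] == shoppers[ref]
        if not same_key or times[i] - times[ref] >= gap:
            keep[i] = True
            ref = i
    return df[keep]

# Sample data from the question.
df = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2022-08-06 11:25:00", "2022-08-06 11:30:00", "2022-08-06 11:55:00",
        "2022-08-06 11:57:00", "2022-08-06 11:59:00", "2022-08-06 12:07:00",
        "2022-08-06 12:10:00", "2022-08-06 12:12:00",
    ]),
    "rand": ["b", "a", "c", "d", "f", "d", "d", "f"],
    "shopper": ["Mark", "John", "John", "John", "John", "John", "John", "Peter"],
    "rand2": ["i", "h", "g", "h", "j", "h", "h", "j"],
    "numb": [706040, 845843, 845843, 845843, 845843, 845843, 845843, 635640],
})

result = keep_spaced_rows(df)
print(result)
```

On the sample data this keeps the rows at 11:25, 11:30, 11:55, 12:07 and 12:12, matching the expected output. Like the original loop, it treats only consecutive rows with the same combination as one run. If the same combination can reappear later in the frame, sorting by 'numb', 'shopper' and 'timestamp' first would be one way to handle that.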


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow