'Find the number of elements in a DataFrame in the last 30 minutes

I have a dataframe that contains for a specific timestamp, the number of items on a specific event.

matrix = pd.DataFrame()
matrix['Timestamp'] = [Timestamp('2019-12-01 11:25:32.060000'),
 Timestamp('2019-12-01 15:24:38.527000'),
 Timestamp('2019-12-02 17:09:57.907000'),
 Timestamp('2019-12-02 18:06:38.883000'),
 Timestamp('2019-12-02 19:16:43.667000'),
 Timestamp('2019-12-03 20:59:10.727000')]
matrix['N° Items'] = [1,2,1,4,5,7]
matrix

enter image description here

I want to calcule for every row, the sum of the number of items in the last 30 minutes. I solved the problem with the following:

def in_Range(delta):
    seconds = delta.total_seconds()
    return (seconds>0) and (seconds<60*30)

def func(ts):
    return matrix[(ts-matrix['Timestamp']).apply(in_Range)]['N° Items'].sum()

matrix['N° Items Last 30 Minutes'] = matrix['Timestamp'].apply(func)

I would like to know if there are other (hopefully better) ways to solve this problem. This solution is very slow (even using multiprocessing) and my real dataframe is big.



Solution 1:[1]

DataFrame.rolling is what you are looking for. The function only works if your dataframe's index is a Timestamp series:

result = (
    matrix.set_index("Timestamp")
    .assign(**{
        "N° Items Last 30 Minutes": lambda x: x["N° Items"].rolling("30T").sum() - x["N° Items"]
    })
)

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Code Different