'Find the number of elements in a DataFrame in the last 30 minutes
I have a dataframe that contains for a specific timestamp, the number of items on a specific event.
matrix = pd.DataFrame()
matrix['Timestamp'] = [Timestamp('2019-12-01 11:25:32.060000'),
Timestamp('2019-12-01 15:24:38.527000'),
Timestamp('2019-12-02 17:09:57.907000'),
Timestamp('2019-12-02 18:06:38.883000'),
Timestamp('2019-12-02 19:16:43.667000'),
Timestamp('2019-12-03 20:59:10.727000')]
matrix['N° Items'] = [1,2,1,4,5,7]
matrix
I want to calcule for every row, the sum of the number of items in the last 30 minutes. I solved the problem with the following:
def in_Range(delta):
seconds = delta.total_seconds()
return (seconds>0) and (seconds<60*30)
def func(ts):
return matrix[(ts-matrix['Timestamp']).apply(in_Range)]['N° Items'].sum()
matrix['N° Items Last 30 Minutes'] = matrix['Timestamp'].apply(func)
I would like to know if there are other (hopefully better) ways to solve this problem. This solution is very slow (even using multiprocessing) and my real dataframe is big.
Solution 1:[1]
DataFrame.rolling is what you are looking for. The function only works if your dataframe's index is a Timestamp series:
result = (
matrix.set_index("Timestamp")
.assign(**{
"N° Items Last 30 Minutes": lambda x: x["N° Items"].rolling("30T").sum() - x["N° Items"]
})
)
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Code Different |

