How to use the pandas resample method?
I want to resample a datetime-indexed pandas DataFrame using the resample method, but I don't understand the output I get. I expected samples every 5 seconds ('5s'), but I get 17460145 rows from an original DataFrame of 100 rows. What is the correct way to use resample?
import numpy as np
import pandas as pd
def random_dates(start, end, n=100):
    start_u = start.value // 10**9
    end_u = end.value // 10**9
    return pd.to_datetime(np.random.randint(start_u, end_u, n), unit='s')
start = pd.to_datetime('2022-01-01')
end = pd.to_datetime('2023-01-01')
rd = random_dates(start, end)
clas = np.random.choice(['A', 'B', 'C'], size=100)
value = np.random.randint(0, 100, size=100)
df = pd.DataFrame.from_dict({'ts': rd, 'cl': clas, 'vl': value}).set_index('ts').sort_index()
df
Out[48]:
                    cl  vl
ts
2022-01-04 17:25:10  B  27
2022-01-06 19:17:35  C  34
2022-01-17 22:55:25  B   1
2022-01-23 00:33:25  A  20
2022-01-27 18:26:56  A  55
..                  ..  ..
2022-12-14 07:46:50  C  22
2022-12-18 02:33:52  C  52
2022-12-22 17:35:10  A  52
2022-12-28 04:55:20  A  57
2022-12-29 03:19:00  A  60
[100 rows x 2 columns]
df.groupby(by='cl').resample('5s').mean()
Out[49]:
                          vl
cl ts
A  2022-01-23 00:33:25  20.0
   2022-01-23 00:33:30   NaN
   2022-01-23 00:33:35   NaN
   2022-01-23 00:33:40   NaN
   2022-01-23 00:33:45   NaN
...
C  2022-12-18 02:33:30   NaN
   2022-12-18 02:33:35   NaN
   2022-12-18 02:33:40   NaN
   2022-12-18 02:33:45   NaN
   2022-12-18 02:33:50  52.0
[17460145 rows x 1 columns]
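A likely explanation for the row count: resample('5s') builds a regular 5-second grid from the first to the last timestamp in each group and emits one row per bin, including empty bins, which show up as NaN. With three classes that each span most of a year, that adds up to millions of rows. If the goal is one row per non-empty 5-second bin, a minimal sketch could look like the following. It assumes seeded synthetic data standing in for the question's df, and it uses DatetimeIndex.floor in place of resample, so it is one possible approach rather than the only correct one.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the question's df (seeded so the run is repeatable).
rng = np.random.default_rng(0)
n = 100
start_u = pd.Timestamp('2022-01-01').value // 10**9
end_u = pd.Timestamp('2023-01-01').value // 10**9
ts = pd.to_datetime(rng.integers(start_u, end_u, n), unit='s')

df = pd.DataFrame(
    {
        'cl': rng.choice(['A', 'B', 'C'], size=n),
        'vl': rng.integers(0, 100, size=n),
    },
    index=pd.DatetimeIndex(ts, name='ts'),
).sort_index()

# Round each timestamp down to the start of its 5-second bin and group on
# (class, bin). Only bins that actually contain data appear in the result,
# so the output can never have more rows than the input.
binned = df.groupby(['cl', df.index.floor('5s')])['vl'].mean()

print(len(df), len(binned))
print(binned.head())
```

Appending .dropna() to the original df.groupby(by='cl').resample('5s').mean() expression should give the same non-empty bins, but it first materializes the full grid of empty bins, which is slow and memory-hungry for sparse data like this.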
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
