'Select minimum values from 2 pandas dataframe columns with 1 column with values constrained by time_stamp
EDITED I have a time series in a Pandas DataFrame for which I want to add a new column with the minimum values. Specifically, imagine I have the following values in my time series:
time_stamp price da_price min hour_of_day day_of_year
0 2021-01-01 00:00:00 64.84 50.87 50.87 0 1
1 2021-01-01 00:15:00 13.96 50.87 13.96 0 1
2 2021-01-01 00:30:00 12.40 50.87 12.40 0 1
3 2021-01-01 00:45:00 7.70 50.87 7.70 0 1
4 2021-01-01 01:00:00 64.25 48.19 48.19 1 1
5 2021-01-01 01:15:00 14.07 48.19 14.07 1 1
6 2021-01-01 01:30:00 13.25 48.19 13.25 1 1
7 2021-01-01 01:45:00 10.47 48.19 10.47 1 1
I would like to find the minimum values so this is ok with pandas function. However the only constraint I have is that da_price is composed of the same value for 1 full hour. So if the average of the price over an hour is smaller than the da_price, then we report the values giving the lower average as MIN. So, here above (64+13+12+7)/4=24 < 50.87, so the values price should be reported.
So in substance:
- if price gives the MIN average, no problem, we report the values as they are as MIN.
- if da_price gives the minimum value, then we report the values as MIN.
Any ideas how I can do this efficiently with Pandas and/or Numpy? Thanks!
Solution 1:[1]
If I understand correctly, this should solve the issue (I added some data to the input data to verify your constraint):
Input data
time_stamp price da_price min hour_of_day day_of_year
0 2021-01-01 00:00:00 64.84 50.87 50.87 0 1
1 2021-01-01 00:15:00 13.96 50.87 13.96 0 1
2 2021-01-01 00:30:00 12.40 50.87 12.40 0 1
3 2021-01-01 00:45:00 7.70 50.87 7.70 0 1
4 2021-01-01 01:00:00 64.25 48.19 48.19 1 1
5 2021-01-01 01:15:00 14.07 48.19 14.07 1 1
6 2021-01-01 01:30:00 13.25 48.19 13.25 1 1
7 2021-01-01 01:45:00 10.47 48.19 10.47 1 1
8 2021-01-01 02:00:00 64.25 22.19 48.19 1 1
9 2021-01-01 02:15:00 14.07 22.19 14.07 1 1
10 2021-01-01 02:30:00 13.25 22.19 13.25 1 1
11 2021-01-01 02:45:00 10.47 22.19 10.47 1 1
df['time_stamp'] = pd.to_datetime(df['time_stamp'])
df = df.set_index('time_stamp', drop=True)
# Means - for the third hour the da_price mean is lower
# print(df.resample('1H').mean()[['price','da_price']])
time_stamp price da_price
2021-01-01 00:00:00 24.725 50.87
2021-01-01 01:00:00 25.510 48.19
2021-01-01 02:00:00 25.510 22.19
def check(x):
if x.da_price.mean() < x.price.mean():
x.loc[:,'min'] = x.da_price.mean()
else:
x.loc[:,'min'] = [min(i,j) for i,j in zip(x.price.values, x.da_price.values)]
return x
df = df.resample('1H').apply(check)
Output:
price da_price hour_of_day day_of_year min
time_stamp
2021-01-01 00:00:00 64.84 50.87 0 1 50.87
2021-01-01 00:15:00 13.96 50.87 0 1 13.96
2021-01-01 00:30:00 12.40 50.87 0 1 12.40
2021-01-01 00:45:00 7.70 50.87 0 1 7.70
2021-01-01 01:00:00 64.25 48.19 1 1 48.19
2021-01-01 01:15:00 14.07 48.19 1 1 14.07
2021-01-01 01:30:00 13.25 48.19 1 1 13.25
2021-01-01 01:45:00 10.47 48.19 1 1 10.47
2021-01-01 02:00:00 64.25 22.19 1 1 22.19
2021-01-01 02:15:00 14.07 22.19 1 1 22.19
2021-01-01 02:30:00 13.25 22.19 1 1 22.19
2021-01-01 02:45:00 10.47 22.19 1 1 22.19
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Yolao_21 |
