'Select minimum values from 2 pandas dataframe columns with 1 column with values constrained by time_stamp

EDITED I have a time series in a Pandas DataFrame for which I want to add a new column with the minimum values. Specifically, imagine I have the following values in my time series:

                time_stamp  price  da_price    min  hour_of_day  day_of_year
0  2021-01-01 00:00:00  64.84     50.87  50.87            0            1
1  2021-01-01 00:15:00  13.96     50.87  13.96            0            1
2  2021-01-01 00:30:00  12.40     50.87  12.40            0            1
3  2021-01-01 00:45:00   7.70     50.87   7.70            0            1
4  2021-01-01 01:00:00  64.25     48.19  48.19            1            1
5  2021-01-01 01:15:00  14.07     48.19  14.07            1            1
6  2021-01-01 01:30:00  13.25     48.19  13.25            1            1
7  2021-01-01 01:45:00  10.47     48.19  10.47            1            1

I would like to find the minimum values so this is ok with pandas function. However the only constraint I have is that da_price is composed of the same value for 1 full hour. So if the average of the price over an hour is smaller than the da_price, then we report the values giving the lower average as MIN. So, here above (64+13+12+7)/4=24 < 50.87, so the values price should be reported.

So in substance:

  • if price gives the MIN average, no problem, we report the values as they are as MIN.
  • if da_price gives the minimum value, then we report the values as MIN.

Any ideas how I can do this efficiently with Pandas and/or Numpy? Thanks!



Solution 1:[1]

If I understand correctly, this should solve the issue (I added some data to the input data to verify your constraint):

Input data

                time_stamp  price  da_price    min  hour_of_day  day_of_year
0  2021-01-01 00:00:00  64.84     50.87  50.87            0            1
1  2021-01-01 00:15:00  13.96     50.87  13.96            0            1
2  2021-01-01 00:30:00  12.40     50.87  12.40            0            1
3  2021-01-01 00:45:00   7.70     50.87   7.70            0            1
4  2021-01-01 01:00:00  64.25     48.19  48.19            1            1
5  2021-01-01 01:15:00  14.07     48.19  14.07            1            1
6  2021-01-01 01:30:00  13.25     48.19  13.25            1            1
7  2021-01-01 01:45:00  10.47     48.19  10.47            1            1
8  2021-01-01 02:00:00  64.25     22.19  48.19            1            1
9  2021-01-01 02:15:00  14.07     22.19  14.07            1            1
10  2021-01-01 02:30:00  13.25     22.19  13.25           1            1
11  2021-01-01 02:45:00  10.47     22.19  10.47           1            1
df['time_stamp'] = pd.to_datetime(df['time_stamp'])
df = df.set_index('time_stamp', drop=True)

# Means - for the third hour the da_price mean is lower 
# print(df.resample('1H').mean()[['price','da_price']])
time_stamp           price  da_price             
2021-01-01 00:00:00  24.725     50.87
2021-01-01 01:00:00  25.510     48.19
2021-01-01 02:00:00  25.510     22.19  


def check(x):
    if x.da_price.mean() < x.price.mean():
        x.loc[:,'min'] = x.da_price.mean()
    else:
        x.loc[:,'min'] = [min(i,j) for i,j in zip(x.price.values, x.da_price.values)]
    return x


df = df.resample('1H').apply(check)


Output:

                     price  da_price  hour_of_day  day_of_year    min
time_stamp                                                           
2021-01-01 00:00:00  64.84     50.87            0            1  50.87
2021-01-01 00:15:00  13.96     50.87            0            1  13.96
2021-01-01 00:30:00  12.40     50.87            0            1  12.40
2021-01-01 00:45:00   7.70     50.87            0            1   7.70
2021-01-01 01:00:00  64.25     48.19            1            1  48.19
2021-01-01 01:15:00  14.07     48.19            1            1  14.07
2021-01-01 01:30:00  13.25     48.19            1            1  13.25
2021-01-01 01:45:00  10.47     48.19            1            1  10.47
2021-01-01 02:00:00  64.25     22.19            1            1  22.19
2021-01-01 02:15:00  14.07     22.19            1            1  22.19
2021-01-01 02:30:00  13.25     22.19            1            1  22.19
2021-01-01 02:45:00  10.47     22.19            1            1  22.19

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Yolao_21