'Performing mathematical operation comparing two datetime dataframes

I have two different data frames. df1=

index Time
0   2009-09-13 01:17:00
1   20009-09-13 02:17:00
2   2009-09-13 03:17:00
3   2009-09-13 04:17:00
4   2009-09-13 05:17:00

............ This is a time-series data with hourly interval spanning in total 10 days. and df2=

    Selling Time          Buying Time                      sell     buy
0   2009-09-13 01:17:00   2009-09-13 04:35:00               1        -1
1   2009-09-13 04:20:00   2009-09-13 04:45:00               27       -3
2   2009-09-14 02:25:00   2009-09-15 01:17:00               1        -2
3   2009-09-15 05:33:00   2009-09-16 01:35:00               100      -5

..............

Here, the sell and buy column is in USD for a particular item. The item is bought and sold at different time.

I want to iterate the "df1" hourly time data through "df2" and calculate the what is the current value of money that I have at each particular hour (summing the previous hour value). For example, on 2009-09-13 01:17:00, I have sold 1 USD and nothing bought on that time. So,I would want to make a new dataset where 2009-09-13 01:00:00 would have the value 1. So, It will remain 1 for 02:00:00 and 03:00:00 for the date of 2009-09-13.

For 04 Hour I have sold 27USD and I have previous hour 1 USD. but bought (-1-3)= -4USD. So, for 2009-09-13 04:00:00 the new value would be (27+1)-4= 24USD and so on.

The output should look like this:

index Time                    Present Value
0   2009-09-13 01:00:00       1
1   20009-09-13 02:00:00      1
2   2009-09-13 03:00:00       1
3   2009-09-13 04:00:00       24  

...... and so on till the last date and time of df1.

Note that all of the "sell" and "buy" value should be added together within the same hour for the dataset "df1".

I can understand the for loop calculation but struggling to build a for loop with datetime.



Solution 1:[1]

I'd think about it a different way: your value over time is a step function. We can use a python package called staircase to help. It is built on pandas and numpy.

setup

df1 = pd.DataFrame(
    {
        "time": [
            pd.Timestamp("2009-09-13 01:00:00"),
            pd.Timestamp("2009-09-13 02:00:00"),
            pd.Timestamp("2009-09-13 03:00:00"),
            pd.Timestamp("2009-09-13 04:00:00"),
            pd.Timestamp("2009-09-13 05:00:00"),
        ]
    }
)

df2 = pd.DataFrame(
    {
        "sell_time":[
            pd.Timestamp("2009-09-13 01:17:00"),
            pd.Timestamp("2009-09-13 04:20:00"),
            pd.Timestamp("2009-09-14 02:25:00"),
            pd.Timestamp("2009-09-15 05:33:00"),
        ],
        "buy_time":[
            pd.Timestamp("2009-09-13 04:35:00"),
            pd.Timestamp("2009-09-13 04:45:00"),
            pd.Timestamp("2009-09-15 01:17:00"),
            pd.Timestamp("2009-09-16 01:35:00"),
        ],
        "sell":[1,27,1,100],
        "buy":[-1,-3,-2,-5]
    }
)

solution

import staircase as sc

# setup step function
sf = sc.Stairs().layer(df2["sell_time"], value=df2["sell"]).layer(df2["buy_time"], value=df2["buy"])

# you can plot it
sf.plot()

step function for value

# sample the step function at particular values
sf(df1["time"], include_index=True)

This gives you the following pandas Series which matches up times with the value of the step function

time
2009-09-13 01:00:00     0
2009-09-13 02:00:00     1
2009-09-13 03:00:00     1
2009-09-13 04:00:00     1
2009-09-13 05:00:00    24
dtype: int64

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Riley