'how to perform this operation in less time?
hi i am working on this dataset
date open high low close \
timestamp
1597133400 2020-08-11 08:10:00 11759.0 11761.5 11750.0 11760.0
1597133700 2020-08-11 08:15:00 11760.0 11776.0 11759.5 11773.5
1597134000 2020-08-11 08:20:00 11773.5 11774.0 11751.5 11752.0
1597134300 2020-08-11 08:25:00 11752.0 11752.0 11735.0 11744.5
buy sell \
timestamp
1597133400 [[[11566, 11578]]] [[[11959, 11971]]]
1597133700 [[[11566, 11578]]] [[[11959, 11971]]]
1597134000 [[[11566, 11578]]] [[[12339.6, 12457.12]]]
1597134300 [[[11157.275, 11039.83]]] [[[12331.725, 12449.17]]]
buy2 sell2 price_pool_long \
timestamp
1597133400 [11566, 11578] [11566, 11578] 11578.00
1597133700 [11566, 11578] [11566, 11578] 11578.00
1597134000 [11566, 11578] [11566, 11578] 11578.00
1597134300 [11157.275, 11039.83] [11157.275, 11039.83] 11039.83
price_pool_short buy_perc \
timestamp
1597133400 12348.000 [0.01649659863945574, 0.015476190476190421]
1597133700 12362.175 [0.017624325816452235, 0.016605087696946574]
1597134000 12339.600 [0.015827093260721625, 0.01480599046970732]
1597134300 12331.725 [0.050000000000000044, 0.06000000000000005]
sell_perc liquidation_short \
timestamp
1597133400 [-0.016773301054815892, -0.01571946795646917] 0
1597133700 [-0.017940515303475735, -0.01688547244774563] 0
1597134000 [-0.016081618537091558, -0.01502850233200892] 0
1597134300 [-0.05263157894736836, -0.06382978723404253] 0
liquidation_long
timestamp
1597133400 0
1597133700 0
1597134000 0
1597134300 0
and i need to change the value on the colum liquidation_long to -1 if the value of the colum low for the a row is lower than the value of the column price_pool_long from the previous row, else make it 0
i do this whit the next code,
index = data4.index
for i in range(1,len(index)):
low =data4.low[data4.index==index[i]]
ppl = data4.price_pool_long[data4.index==index[i-1]].values[0]
if low.values[0]<=ppl:
data4.liquidation_long[data4.index==index[i]]=-1
else:
data4.liquidation_long[data4.index==index[i]]=0
but using my dataframe whit 200k rows is not a really fast aproach,
using this row, i should get -1 on the liquidation_short(first the "i-1" and then the "i" row
date open high low close \
timestamp
1597176900 2020-08-11 20:15:00 11335.0 11335.0 11209.5 11215.5
buy \
timestamp
1597176900 [[[11192, 11204], [11204, 11215], [11215, 1122...
sell \
timestamp
1597176900 [[[11566, 11578], [11578, 11590], [11590, 1160...
buy2 sell2 \
timestamp
1597176900 [11192, 11204, 11215, 11227] [11192, 11204, 11215, 11227]
price_pool_long price_pool_short \
timestamp
1597176900 11227.0 11776.275
buy_perc \
timestamp
1597176900 [0.002095314520083824, 0.001025366680041051, 4...
sell_perc \
timestamp
1597176900 [-0.002099714081486681, -0.0010264191360227493...
liquidation_short liquidation_long
timestamp
1597176900 0 0
and the next:
date open high low close \
timestamp
1597176600 2020-08-11 20:10:00 11360.0 11367.0 11334.5 11335.0
buy \
timestamp
1597176600 [[[11192, 11204], [11204, 11215], [11215, 1122...
sell \
timestamp
1597176600 [[[11566, 11578], [11578, 11590], [11590, 1160...
buy2 \
timestamp
1597176600 [11192, 11204, 11215, 11227, 11238]
sell2 price_pool_long \
timestamp
1597176600 [11192, 11204, 11215, 11227, 11238] 11238.0
price_pool_short \
timestamp
1597176600 11901.75
buy_perc \
timestamp
1597176600 [0.012615791795324238, 0.011557123952359971, 0...
sell_perc \
timestamp
1597176600 [-0.012776983559685595, -0.011692252766869027,...
liquidation_short liquidation_long
timestamp
1597176600 0 0
i hope you can helpme to make an apply function or some nice aproach to make the same in less time, thanks in advance!
Solution 1:[1]
I faced a similar issue and solved it with a numpy work around. This solution should do the calculation within a few seconds max. I have built my own minimal example. Maybe you should check (or compare) the results before, but it should lead to your desired result, if I understood you correctly. The first value is of course nonsense, because it uses the newest ppl in my solution.
df = pd.DataFrame({'low': [2, 3, 4, 5, 6, 7],
'price_pool_long': [3, 2, 5, 8, 6, 1],
'liquidation_long': [0, 0, 0, 0, 0, 0]})
low = df['low'].to_numpy()
price_pool_long = df['price_pool_long'].to_numpy()
df['liquidation_long'] = np.where(low <= np.roll(price_pool_long, 1),-1,0)
When you do such calculations you must always avoid iterating the entire dataframe in python at all costs. This is always very slow. Small tip: 'concat' should also only be used with caution.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | PushTheButton |


