Select only records where the cumulative sum of VolumePred is less than 450 [duplicate]
I have a DataFrame:
                    ConversionPred  VolumePred
OSBrowser PageId
(11, 16)  955764           88273.0    125110.0
          955761           78408.0    104703.0
          1184903          57702.0    118085.0
          955767           49224.0     68942.0
          1149586          36405.0     53582.0
...       ...                  ...         ...
(32, 16)  899748               0.0         4.0
(11, 15)  835198               0.0         4.0
(32, 16)  955761               0.0       151.0
For each OSBrowser group, I need to select only the records where the cumulative sum of VolumePred is less than 450.
I tried this code:
subdata.loc[subdata['VolumePred'].cumsum() < 450, :]
But it didn't work. I got this result:
                    ConversionPred  VolumePred
OSBrowser PageId
(11, 11)  789615              15.0        20.0
          923645               8.0        36.0
I don't understand why only these two rows are selected. Strangely, the following rows are not selected:
(32, 16)  899748               0.0         4.0
(11, 15)  835198               0.0         4.0
(32, 16)  955761               0.0       151.0
Solution 1:[1]
IIUC, try:
output = subdata[subdata.groupby(level=0)["VolumePred"].cumsum().lt(450)]
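To see why the ungrouped attempt in the question keeps so few rows, here is a minimal sketch on made-up data. The sample values are invented for illustration, and the OSBrowser labels are simplified to strings ("A", "B", "C") instead of tuples; both are assumptions, not the asker's real data. The point is that a plain `cumsum()` runs across the whole frame, so once the running total passes 450 every later row is excluded, whereas a per-group `cumsum()` restarts at each OSBrowser value.

```python
import pandas as pd

# Hypothetical sample data (invented values, simplified OSBrowser labels).
index = pd.MultiIndex.from_tuples(
    [("A", 1), ("A", 2), ("A", 3), ("B", 1), ("B", 2), ("C", 1)],
    names=["OSBrowser", "PageId"],
)
subdata = pd.DataFrame(
    {
        "ConversionPred": [50.0, 10.0, 5.0, 0.0, 0.0, 0.0],
        "VolumePred": [400.0, 40.0, 30.0, 4.0, 151.0, 500.0],
    },
    index=index,
)

# Ungrouped running total over the whole frame:
# 400, 440, 470, 474, 625, 1125 -> only the first two rows stay below 450.
global_mask = subdata["VolumePred"].cumsum() < 450

# Running total that restarts for every OSBrowser value:
# A: 400, 440, 470 | B: 4, 155 | C: 500
group_cumsum = subdata.groupby(level="OSBrowser")["VolumePred"].cumsum()

# A boolean Series used as a mask drops the rows where it is False.
output = subdata[group_cumsum < 450]

print(subdata[global_mask])
print(output)
```

With this sample, the ungrouped mask keeps only the first two rows of group A, while the grouped mask also keeps both rows of group B. Note that the mask is a Series built from the single VolumePred column; masking with a whole boolean DataFrame would blank out cells with NaN rather than drop rows.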
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | not_speshal |
