'How to create a list of tuples containing pairs of values from a column based on some conditions in Pandas?
Have following pandas dataframe which is sorted like mentioned below:
df = df.sort_values(['A','B','C','D'])
A B C D E F
19 1 400 503 8 628
19 1 400 527 9 628
19 1 400 557 8 628
19 1 400 611 9 628
19 1 400 615 8 628
19 2 500 705 9 800
19 2 500 720 8 800
19 2 500 745 9 800
19 3 600 850 9 1000
I want to create a list of tuples for all groups created by df.groupby(['A','B','C']) where :-
- Each tuple will be a pair of values from column D respective to each consecutive pair of 8 & 9 in column E.
- If there is no 9 after 8, then value from column F will be taken (all values of column F will be same for a group).
- If there is no 8 in column E for a group, the list should be empty.
For example:
Group1: A=19, B=1, C=400
lst = [(503,527), (557,611), (615,628)] #Rule2 applied
Group2: A=19, B=2, C=500
lst = [(720,745)] #Starting from the first 8 in the group, skipping the 9 above
Group3: A=19, B=3, C=600
lst = [] #Rule3 applied
How can this be achieved in the fastest way possible?
EDIT I have tried the following code:
for k, d in df.groupby(['A','B','C'], as_index=False):
lst, attempt = [], True
while attempt is True:
d = d.reset_index(drop=True)
if 8 in d.E.unique().tolist():
to_drop, tup = [], ()
a = d.loc[d['E'] == 8]['D'].min()
aindex = d.loc[d['E'] == 8]['D'].index[0]
if 9 in d.E.unique().tolist():
b = d.loc[d['E'] == 9]['D'].min()
bindex = d.loc[d['E'] == 9]['D'].index[0]
else:
b = d.F.unique().item()
bindex = aindex
tup = (a,b)
lst.append(tup)
to_drop.extend((aindex,bindex))
d = d.drop(d.index[to_drop])
else:
attempt=False
if d.empty:
attempt=False
print(k)
print(lst)
It gives following output:
(19, 1, 400)
[(503, 527), (557, 611), (615, 628)]
(19, 2, 500)
[(720, 705)]
(19, 3, 600)
[]
But the output for group2 is wrong. Also, I feel, if there is a better pythonic way to solve this problem?
Solution 1:[1]
Reset the indexes on each iteration from the 'd' frame. The calculation is performed either in the 'if stop < hist:' block, or when there is only one line in the else block.
for k, d in df.groupby(['A','B','C'], as_index=False):
hist = len(d)
d = d.reset_index()
lst = []
for i in range(0, hist):
stop = i + 1
if stop < hist:
if d.loc[i, 'E'] == 8 and d.loc[i+1, 'E'] == 9:
lst.append((d.loc[i, 'D'], d.loc[i+1, 'D']))
else:
if d.loc[i, 'E'] == 8:
lst.append((d.loc[i, 'D'], d.loc[i, 'F']))
print(k)
print(lst)
Output
(19, 1, 400)
[(503, 527), (557, 611), (615, 628)]
(19, 2, 500)
[(720, 745)]
(19, 3, 600)
[]
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
