'How to create a list of tuples containing pairs of values from a column based on some conditions in Pandas?

Have following pandas dataframe which is sorted like mentioned below:

df = df.sort_values(['A','B','C','D'])
 A   B    C     D     E     F
19   1   400   503    8    628
19   1   400   527    9    628
19   1   400   557    8    628
19   1   400   611    9    628
19   1   400   615    8    628
19   2   500   705    9    800
19   2   500   720    8    800
19   2   500   745    9    800
19   3   600   850    9    1000

I want to create a list of tuples for all groups created by df.groupby(['A','B','C']) where :-

  1. Each tuple will be a pair of values from column D respective to each consecutive pair of 8 & 9 in column E.
  2. If there is no 9 after 8, then value from column F will be taken (all values of column F will be same for a group).
  3. If there is no 8 in column E for a group, the list should be empty.

For example:

Group1: A=19, B=1, C=400
lst = [(503,527), (557,611), (615,628)]    #Rule2 applied
Group2: A=19, B=2, C=500
lst = [(720,745)]                          #Starting from the first 8 in the group, skipping the 9 above
Group3: A=19, B=3, C=600
lst = []                                   #Rule3 applied

How can this be achieved in the fastest way possible?

EDIT I have tried the following code:

for k, d in df.groupby(['A','B','C'], as_index=False):
    lst, attempt = [], True
    while attempt is True:
        d = d.reset_index(drop=True)
        if 8 in d.E.unique().tolist():
            to_drop, tup = [], ()
            a = d.loc[d['E'] == 8]['D'].min()
            aindex = d.loc[d['E'] == 8]['D'].index[0]
            if 9 in d.E.unique().tolist():
                b = d.loc[d['E'] == 9]['D'].min()
                bindex = d.loc[d['E'] == 9]['D'].index[0]
            else:
                b = d.F.unique().item()
                bindex = aindex
            tup = (a,b)
            lst.append(tup)
            to_drop.extend((aindex,bindex))
            d = d.drop(d.index[to_drop])
        else:
            attempt=False
        if d.empty:
            attempt=False            
    print(k)
    print(lst)

It gives following output:

(19, 1, 400)
[(503, 527), (557, 611), (615, 628)]
(19, 2, 500)
[(720, 705)]
(19, 3, 600)
[]

But the output for group2 is wrong. Also, I feel, if there is a better pythonic way to solve this problem?



Solution 1:[1]

Reset the indexes on each iteration from the 'd' frame. The calculation is performed either in the 'if stop < hist:' block, or when there is only one line in the else block.

for k, d in df.groupby(['A','B','C'], as_index=False):
   hist = len(d)
   d = d.reset_index()
   lst = []
   for i in range(0, hist):
      stop = i + 1
      if stop < hist:
         if d.loc[i, 'E'] == 8 and d.loc[i+1, 'E'] == 9:
            lst.append((d.loc[i, 'D'], d.loc[i+1, 'D']))
      else:
         if d.loc[i, 'E'] == 8:
            lst.append((d.loc[i, 'D'], d.loc[i, 'F']))
   print(k)
   print(lst)

Output

(19, 1, 400)
[(503, 527), (557, 611), (615, 628)]
(19, 2, 500)
[(720, 745)]
(19, 3, 600)
[]

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1