'Pandas DataFrame : How to groupby and sort "by blocks"?

I'm working with a DataFrame containing data as follows, and group the data two different ways.

>>> d = {
     "A": [100]*7 + [200]*7,
     "B": ["one"]*4 + ["two"]*3 + ["one"]*3 + ["two"]*4,
     "C": ["foo"]*3 + ["bar"] + ["foo"] + ["bar"]*2 + ["foo"]*2 + ["bar"] + ["foo"]*3 + ["bar"],
     "D": ["yay"] + ["nay"]*2 + ["yay"] + ["nay"]*3 + ["yay"] + ["nay"] + ["yay"]*3 + ["nay"] + ["yay"],
     "X": [2, 8, 3, 5, 1, 4, 3, 2, 6, 5, 1, 2, 4, 7]
    }

>>> df = pd.DataFrame(d)
>>> df

     A    B    C    D    X
0  100  one  foo  yay    2
1  100  one  foo  nay    8
2  100  one  foo  nay    3
3  100  one  bar  yay    5
4  100  two  foo  nay    1
5  100  two  bar  nay    4
6  100  two  bar  nay    3
7  200  one  foo  yay    2
8  200  one  foo  nay    6
9  200  one  bar  yay    5
10 200  two  foo  yay    1
11 200  two  foo  yay    2
12 200  two  foo  nay    4
13 200  two  bar  yay    7

>>> df_grp = df.groupby(['A', 'B'])
>>> df_grp_sorted = df_grp.sum().sort_values('X', ascending = False)
>>> df_grp_long = df.groupby(['A', 'B', 'C', 'D'])
>>> df_grp_sorted_long = df_grp_long.sum().sort_values('X', ascending = False)

This gives us :

>>> df_grp_sorted

            X
100  one   18
200  two   14
     one   13
100  two    8


>>> df_grp_sorted_long

                      X
100  one  foo  nay   11
     two  bar  nay    7
200  two  bar  yay    7
     one  foo  nay    6
100  one  bar  yay    5
200  one  bar  yay    5
     two  foo  nay    4
               yay    3
100  one  foo  yay    2
200  one  foo  yay    2    
100  two  foo  nay    1

Now, I would like to have the detail from df_grp_sorted_long, with the structure of df_grp_sorted. That would be :

>>> df_result

                      X
100  one  foo  nay   11
               yay    5
          foo  yay    2
200  two  bar  yay    7
          foo  nay    4
               yay    3
     one  foo  nay    6
          bar  yay    5
          foo  yay    2    
100  two  bar  nay    7
          foo  nay    1
          

I have done this with the following code (which goes against this post's advice) :

>>> col_names = ['A', 'B', 'C', 'D']
>>> df_result = pd.DataFrame(columns=col_names)
>>> for (i, (a, b)) in enumerate(df_grp_sorted.index):
        df_result = pd.concat(
            (
                df_result,
                (df[(df['A']==a) & (df['B']==b)]
                .groupby(col_names)
                .sum()
                .sort_values('X', ascending=False)
                )
            )
        )
>>> df_result = df_result["X"]

This gives the right answer, but is very slow for big data sets. I'm also wondering if there's a native way to do such a combination of grouping/sorting.

Also, maybe this approach is not the right one and there's a much simpler way to obtain this result of an equivalent one?



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source