'Pandas, group by count and add count to original dataframe?

When trying to count rows with similar 'kind' in data frame:

import pandas as pd

items = [('aaa','aaa text 1'), ('aaa','aaa text 2'), ('aaa','aaa text 3'),
         ('bb', 'bb text 1'), ('bb', 'bb text 2'), ('bb', 'bb text 3'), 
         ('bb', 'bb text 4'),
         ('cccc','cccc text 1'), ('cccc','cccc text 2'),
         ('dd', 'dd text 1'),
         ('e', 'e text 1'),
         ('fff', 'fff text 1'),
        ]

df = pd.DataFrame(items, columns=['kind', 'msg'])
df

    kind    msg
0   aaa     aaa text 1
1   aaa     aaa text 2
2   aaa     aaa text 3
3   bb      bb text 1
4   bb      bb text 2
5   bb      bb text 3
6   bb      bb text 4
7   cccc    cccc text 1
8   cccc    cccc text 2
9   dd      dd text 1
10  e       e text 1
11  fff     fff text 1

This code works:

df = df[['kind']].groupby(['kind'])['kind'] \
                         .count() \
                         .reset_index(name='count') \
                         .sort_values(['count'], ascending=False) \
                         .head(5)

df

Resulting in:

    kind      count
    0   aaa   1
    1   bb    1
    2   cccc  1
    3   dd    1
    4   e     1

Yet, how can one get a data frame with all columns as in original one plus 'count' column? So the result should have columns 'kind', 'msg', 'count' in this order?

Also, how to sort this resulting data frame in descending order of count?



Solution 1:[1]

IIUC

In [247]: df['count'] = df.groupby('kind').transform('count')

In [248]: df
Out[248]:
    kind          msg  count
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1

sorting:

In [249]: df.sort_values('count', ascending=False)
Out[249]:
    kind          msg  count
3     bb    bb text 1      4
4     bb    bb text 2      4
5     bb    bb text 3      4
6     bb    bb text 4      4
0    aaa   aaa text 1      3
1    aaa   aaa text 2      3
2    aaa   aaa text 3      3
7   cccc  cccc text 1      2
8   cccc  cccc text 2      2
9     dd    dd text 1      1
10     e     e text 1      1
11   fff   fff text 1      1

Solution 2:[2]

Here is the simple code to count the frequencies and add a column to the data frame when grouping by the kind column.

df['count'] = df.groupby('kind')['kind'].transform('count')

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 MaxU - stop genocide of UA
Solution 2 Community