'Pandas groupby with delimiter join
I tried to use groupby to group rows with multiple values.
col val
A Cat
A Tiger
B Ball
B Bat
import pandas as pd
df = pd.read_csv("Inputfile.txt", sep='\t')
group = df.groupby(['col'])['val'].sum()
I got
A CatTiger
B BallBat
I want to introduce a delimiter, so that my output looks like
A Cat-Tiger
B Ball-Bat
I tried,
group = df.groupby(['col'])['val'].sum().apply(lambda x: '-'.join(x))
this yielded,
A C-a-t-T-i-g-e-r
B B-a-l-l-B-a-t
What is the issue here ?
Thanks,
AP
Solution 1:[1]
Alternatively you can do it this way:
In [48]: df.groupby('col')['val'].agg('-'.join)
Out[48]:
col
A Cat-Tiger
B Ball-Bat
Name: val, dtype: object
UPDATE: answering question from the comment:
In [2]: df
Out[2]:
col val
0 A Cat
1 A Tiger
2 A Panda
3 B Ball
4 B Bat
5 B Mouse
6 B Egg
In [3]: df.groupby('col')['val'].agg('-'.join)
Out[3]:
col
A Cat-Tiger-Panda
B Ball-Bat-Mouse-Egg
Name: val, dtype: object
Last for convert index or MultiIndex to columns:
df1 = df.groupby('col')['val'].agg('-'.join).reset_index(name='new')
Solution 2:[2]
just try
group = df.groupby(['col'])['val'].apply(lambda x: '-'.join(x))
Solution 3:[3]
You can first aggregate to list
and then join with str.join
:
df = pd.DataFrame({'A': [1, 1, 1, 2, 2, 2], 'B': ['a', 'b', 'c', 'd', 'e', 'f']})
df.groupby('A')['B'].agg(list).str.join('-')
Output:
A
1 a-b-c
2 d-e-f
Name: B, dtype: object
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | jezrael |
Solution 2 | ?????? |
Solution 3 | Mykola Zotko |