'Pandas combine rows into strings separated by slash and aggregating by some other columns
I have the initial df and I want to aggregate the 'combo' column into a unique string, separated by slashes, but respecting the order indicated in the sort. In desired data you can find my final target dataset
raw_data = {'name': ['B','B','A','A','A','A','C'],
'date' : pd.to_datetime(pd.Series(['2017-04-03','2017-04-03','2017-03-31','2017-03-31','2017-03-31','2017-04-04','2017-04-04'])),
'order': [2,1,4,2,1,1,1],
'combo': ['x','y','x','y','z','x','x']}
df = pd.DataFrame(raw_data, columns = ['name','date','order','combo'])
df=df.sort_values(["name","date","order"])
df
desired_raw = {'name': ['A','A','B','C'],
'date' : pd.to_datetime(pd.Series(['2017-03-31','2017-04-04','2017-04-03','2017-04-04'])),
'combined_combo': ["z/y/x","x","y/x","x"]}
desired_data = pd.DataFrame(desired_raw, columns = ['name','date','combined_combo'])
desired_data
#what I did until now
df1 = df.groupby(['name','date'])['combo'].apply(list).reset_index(name='new')
df1
Solution 1:[1]
Here is one way:
combined_combo = df.groupby(['name', 'date'])['combo'].agg('/'.join).rename('combined_combo')
print(combined_combo)
Out:
name date
A 2017-03-31 z/y/x
2017-04-04 x
B 2017-04-03 y/x
C 2017-04-04 x
Name: combined_combo, dtype: object
If you don't want the groups as the index use:
desired_data = combined_combo.reset_index()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
