Pandas: groupby to list
I have data like below:
id value year
1 5 2000
1 6 2000
1 7 2000
1 5 2001
2 3 2000
2 3 2001
2 4 2005
2 5 2005
3 3 2000
3 6 2005
My final goal is to have data in a list like below:
[[5,6,7],[5]] (this is for id 1 grouped by the id and year)
[[3],[3],[4,5]] (this is for id 2 grouped by the id and year)
[[3],[6]] (same logic as above)
I have grouped the data using df.groupby(['id', 'year']), but I can't work out how to access the groups afterwards and get the data into the format above.
Solution 1:[1]
If you want to calculate the lists for multiple columns, you can do the following:
import pandas as pd
df = pd.DataFrame(
    {'A': [1, 1, 2, 2, 2, 2, 3],
     'B': ['a', 'b', 'c', 'd', 'e', 'f', 'g'],
     'C': ['x', 'y', 'z', 'x', 'y', 'z', 'x']})
df.groupby('A').agg({'B': list,'C': list})
This produces a list of the B values and a list of the C values for each value of A:
              B             C
A
1        [a, b]        [x, y]
2  [c, d, e, f]  [z, x, y, z]
3           [g]           [x]
To get lists for all columns:
df.groupby('A').agg(list)
To have the lists be sorted:
df.groupby('A').agg(sorted)
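Applied to the data from the question, the same agg(list) idea can be used twice: once per (id, year) pair and once more per id. This is only a sketch; the names per_year, per_id and result are made up for illustration, and it assumes the third column is called year.

```python
import pandas as pd

# Data from the question; the column name 'year' is an assumption.
df = pd.DataFrame({
    'id':    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    'value': [5, 6, 7, 5, 3, 3, 4, 5, 3, 6],
    'year':  [2000, 2000, 2000, 2001, 2000, 2001, 2005, 2005, 2000, 2005],
})

# Step 1: one list of values per (id, year) pair.
per_year = df.groupby(['id', 'year'])['value'].agg(list)

# Step 2: collect the per-year lists of each id into an outer list.
per_id = per_year.groupby(level='id').agg(list)

# One nested list per id, in ascending id order.
result = per_id.tolist()
for e in result:
    print(e)
```

Because groupby sorts its keys by default, the ids and the years within each id come out in ascending order, while the values inside each inner list keep their original row order.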
Solution 2:[2]
You could do the following:
import pandas as pd
data = [[1, 5, 2000],
        [1, 6, 2000],
        [1, 7, 2000],
        [1, 5, 2001],
        [2, 3, 2000],
        [2, 3, 2001],
        [2, 4, 2005],
        [2, 5, 2005],
        [3, 3, 2000],
        [3, 6, 2005]]
df = pd.DataFrame(data=data, columns=['id', 'value', 'year'])
result = []
# Outer loop: one group per id
for _, group in df.groupby('id'):
    # Inner groupby: one list of values per year within this id
    result.append([g['value'].values.tolist() for _, g in group.groupby('year')])
for e in result:
    print(e)
Output
[[5, 6, 7], [5]]
[[3], [3], [4, 5]]
[[3], [6]]
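If you also need to know which id each nested list belongs to, one possible variation is to group on both columns in a single pass and collect the lists in a dict keyed by id. This is a sketch rather than part of the original answer; by_id is a hypothetical name.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame(
    [[1, 5, 2000], [1, 6, 2000], [1, 7, 2000], [1, 5, 2001],
     [2, 3, 2000], [2, 3, 2001], [2, 4, 2005], [2, 5, 2005],
     [3, 3, 2000], [3, 6, 2005]],
    columns=['id', 'value', 'year'],
)

# Grouping on two columns yields (id, year) tuples as group keys.
by_id = defaultdict(list)
for (id_, _year), group in df.groupby(['id', 'year']):
    # int(...) turns the NumPy integer key into a plain Python int.
    by_id[int(id_)].append(group['value'].tolist())

by_id = dict(by_id)
print(by_id)
```

Here by_id[2] gives the nested list for id 2, and list(by_id.values()) gives the same nested lists as the loop above.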
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Asclepius |
| Solution 2 | Dani Mesejo |
