Pandas: groupby to list

I have data like below:

id  value   year
1   5   2000
1   6   2000
1   7   2000
1   5   2001
2   3   2000
2   3   2001
2   4   2005
2   5   2005
3   3   2000
3   6   2005

My final goal is to have data in a list like below:

[[5,6,7],[5]] (this is for id 1 grouped by the id and year)
[[3],[3],[4,5]] (this is for id 2 grouped by the id and year)
[[3],[6]] (this is for id 3, same logic as above)

I have grouped the data using df.groupby(['id', 'year']), but after that I am not able to access the groups and get the data into the format above.



Solution 1:[1]

If you want to calculate the lists for multiple columns, you can do the following:

import pandas as pd

df = pd.DataFrame(
    {'A': [1,1,2,2,2,2,3],
     'B':['a','b','c','d','e','f','g'],
     'C':['x','y','z','x','y','z','x']})

df.groupby('A').agg({'B': list,'C': list})

This produces one list of B values and one list of C values for each value of A:

              B             C
A                            
1        [a, b]        [x, y]
2  [c, d, e, f]  [z, x, y, z]
3           [g]           [x]

To get lists for all columns:

df.groupby('A').agg(list)

To sort each list (every column's list is sorted independently of the others):

df.groupby('A').agg(sorted)
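
Applied to the data in the question, the same idea can be sketched as below. This is only an illustration, assuming the third column is named year (as in the question's groupby call); the names per_year and nested are made up for the example. Grouping by both keys gives one list per (id, year) pair, and a second grouping on the id level collects those lists per id.

```python
import pandas as pd

# Data from the question; the column name 'year' is an assumption.
df = pd.DataFrame({
    'id':    [1, 1, 1, 1, 2, 2, 2, 2, 3, 3],
    'value': [5, 6, 7, 5, 3, 3, 4, 5, 3, 6],
    'year':  [2000, 2000, 2000, 2001, 2000, 2001, 2005, 2005, 2000, 2005],
})

# One list of values per (id, year) pair; the result is a Series
# indexed by a MultiIndex with levels 'id' and 'year'.
per_year = df.groupby(['id', 'year'])['value'].agg(list)

# Access the lists of a single id through the first index level.
print(per_year.loc[1].tolist())

# Collect the per-year lists of every id into one outer list per id.
nested = per_year.groupby(level='id').apply(list).tolist()
for e in nested:
    print(e)
```

Because groupby sorts the keys by default, the inner lists come out ordered by year within each id.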

Solution 2:[2]

You could do the following:

import pandas as pd

data = [[1, 5, 2000],
        [1, 6, 2000],
        [1, 7, 2000],
        [1, 5, 2001],
        [2, 3, 2000],
        [2, 3, 2001],
        [2, 4, 2005],
        [2, 5, 2005],
        [3, 3, 2000],
        [3, 6, 2005]]

df = pd.DataFrame(data=data, columns=['id', 'value', 'year'])

# Build one outer list per id, holding one inner list of values per year.
result = []
for _, group in df.groupby('id'):
    result.append([g['value'].tolist() for _, g in group.groupby('year')])

for e in result:
    print(e)

Output

[[5, 6, 7], [5]]
[[3], [3], [4, 5]]
[[3], [6]]
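
If you also need to know which id each nested list belongs to, a variant of the loop above can store the lists in a dict keyed by id. This is a minimal sketch rather than part of the original answer; the name by_id is hypothetical, and the data is the same as in the question.

```python
import pandas as pd

data = [[1, 5, 2000], [1, 6, 2000], [1, 7, 2000], [1, 5, 2001],
        [2, 3, 2000], [2, 3, 2001], [2, 4, 2005], [2, 5, 2005],
        [3, 3, 2000], [3, 6, 2005]]
df = pd.DataFrame(data=data, columns=['id', 'value', 'year'])

# Map each id to its list of per-year value lists.
# int(...) converts the group key to a plain Python int for use as a dict key.
by_id = {
    int(id_): [g['value'].tolist() for _, g in group.groupby('year')]
    for id_, group in df.groupby('id')
}

for id_, lists in by_id.items():
    print(id_, lists)
```

Looking up by_id[2] then returns the nested list for id 2 without relying on the position in the result.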

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution    Source
Solution 1  Asclepius
Solution 2  Dani Mesejo