Pandas: Three listed columns to wide format

How can I expand a set of columns, using the first column's values as headers for the other columns?

For example:

x = pd.DataFrame({'id':[11,998,3923], 'count':[7,7,7],
  'attributes':['VIS,TEMP,MIN','MIN,VIS,TEMP','MIN,VIS'],
  'attribute_values':['0,4,2','2,3,0','0,9'],
  'attribute_years':['2000,2001,2002','2001,2002,2003','2008,2009']})

(Edit: note that attributes could be out of order or missing.)

index	id	count	attributes	attribute_values	attribute_years
0	11	7	VIS,TEMP,MIN	0,4,2	2000,2001,2002
1	998	7	MIN,VIS,TEMP	2,3,0	2001,2002,2003
2	3923	7	MIN,VIS	0,9	2008,2009

In this case, the values in the attributes column should be used to make new columns from the attribute_values and attribute_years columns.

Ideal output:

index	id	count	attribute_values_VIS	attribute_values_TEMP	attribute_values_MIN	attribute_years_VIS	attribute_years_TEMP	attribute_years_MIN
0	11	7	0	4	2	2000	2001	2002
1	998	7	3	0	2	2002	2003	2001
2	3923	7	9	NaN	0	2009	NaN	2008

Solution 1:^[1]

As far as I know, pandas has no single built-in function that splits the strings and creates the new columns in one call. However, you can easily write your own little function. Assuming your example

import pandas as pd

x = pd.DataFrame({'id': [1,2,3],
'attributes': ['VIS,TEMP,MIN','MIN,VIS,TEMP','MIN,VIS'],
'attribute_values': ['0,4,2','2,3,0','0,9'],
'attribute_years': ['2000,2001,2002','2001,2002,2003','2008,2009']})

You can just loop over the rows, split the strings in x['attributes'], x['attribute_values'], and x['attribute_years'] on ',' (assuming the three lists in a row always have the same length), and build a new dictionary per row that maps each new column name to its value. Collecting these dictionaries in a list, you can build a new pandas.DataFrame from it and, if you like, join it back onto the original (for example with x.join):

data = []
for _, row in x.iterrows():
    # split the comma-separated strings of this row
    att = row['attributes'].split(',')
    val = list(map(int, row['attribute_values'].split(',')))
    yrs = list(map(int, row['attribute_years'].split(',')))
    # map each new column name to its value for this row
    row_new = {f'attribute_values_{a}': v for a, v in zip(att, val)}
    row_new.update({f'attribute_years_{a}': y for a, y in zip(att, yrs)})
    # collect the new row
    data.append(row_new)
# create table from list of rows; attributes missing in a row become NaN
pd.DataFrame(data, index=x.index)

output:

index	attribute_values_VIS	attribute_values_TEMP	attribute_values_MIN	attribute_years_VIS	attribute_years_TEMP	attribute_years_MIN
0	0	4.0	2	2000	2001.0	2002
1	3	0.0	2	2002	2003.0	2001
2	9	NaN	0	2009	NaN	2008
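
If you would rather avoid the explicit loop and keep id and count in the result, the same reshaping can probably be done by combining a few built-in methods: str.split, DataFrame.explode and DataFrame.pivot. The following is only a sketch, not a tested drop-in replacement. It assumes pandas 1.3 or newer (for exploding several columns at once), that the three lists in each row have the same length, and that each (id, count) pair occurs only once. The names list_cols, long and wide are just illustrative.

```python
import pandas as pd

x = pd.DataFrame({
    'id': [11, 998, 3923],
    'count': [7, 7, 7],
    'attributes': ['VIS,TEMP,MIN', 'MIN,VIS,TEMP', 'MIN,VIS'],
    'attribute_values': ['0,4,2', '2,3,0', '0,9'],
    'attribute_years': ['2000,2001,2002', '2001,2002,2003', '2008,2009'],
})

# columns that hold comma-separated lists (illustrative name)
list_cols = ['attributes', 'attribute_values', 'attribute_years']

# turn every comma-separated string into a Python list
long = x.copy()
for col in list_cols:
    long[col] = long[col].str.split(',')

# one row per (id, attribute); exploding several columns needs pandas >= 1.3
long = long.explode(list_cols, ignore_index=True)

# the exploded entries are still strings, so convert them to integers
long = long.astype({'attribute_values': int, 'attribute_years': int})

# spread the attributes into columns; missing attributes become NaN
wide = long.pivot(index=['id', 'count'], columns='attributes',
                  values=['attribute_values', 'attribute_years'])

# flatten the two-level column index into names like attribute_values_VIS
wide.columns = [f'{kind}_{attr}' for kind, attr in wide.columns]
wide = wide.reset_index()
print(wide)
```

The column order will likely differ from the ideal output in the question, since the attribute names are sorted during the pivot. Reorder the columns afterwards if the order matters.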

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
