Splitting a column into multiple columns with specific names in a pandas DataFrame
I have the following DataFrame:
pri sec
TOM AB,CD,EF
JACK XY,YZ
HARRY FG
NICK KY,NY,SD,EF,FR
I need the following output, with the new columns named as shown (how many there are depends on the number of comma-separated fields in column 'sec'):
pri sec sec0 sec1 sec2 sec3 sec4
TOM AB,CD,EF AB CD EF NaN NaN
JACK XY,YZ XY YZ NaN NaN NaN
HARRY FG FG NaN NaN NaN NaN
NICK KY,NY,SD,EF,FR KY NY SD EF FR
Can I get any suggestions?
Solution 1:[1]
Use join + split + add_prefix:
df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec'))
print(df)
pri sec sec0 sec1 sec2 sec3 sec4
0 TOM AB,CD,EF AB CD EF None None
1 JACK XY,YZ XY YZ None None None
2 HARRY FG FG None None None None
3 NICK KY,NY,SD,EF,FR KY NY SD EF FR
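If you need NaN instead of None, add fillna. Run this on the original df, because joining onto a frame that already has the sec0..sec4 columns raises a column-overlap error:
import numpy as np
df = df.join(df['sec'].str.split(',', expand=True).add_prefix('sec').fillna(np.nan))
print(df)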
pri sec sec0 sec1 sec2 sec3 sec4
0 TOM AB,CD,EF AB CD EF NaN NaN
1 JACK XY,YZ XY YZ NaN NaN NaN
2 HARRY FG FG NaN NaN NaN NaN
3 NICK KY,NY,SD,EF,FR KY NY SD EF FR
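For reference, here is a self-contained sketch of the join + split + add_prefix approach, assuming the sample data from the question. The names `parts` and `result` are only illustrative and do not appear in the original answer. Building the split columns once and joining them once keeps the original `df` untouched, so the snippet can be rerun safely.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "pri": ["TOM", "JACK", "HARRY", "NICK"],
    "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
})

# Split on commas into one column per field (labelled 0, 1, 2, ...),
# then prefix the labels so they become sec0, sec1, ...
# Rows with fewer fields are padded with missing values.
parts = df["sec"].str.split(",", expand=True).add_prefix("sec")

# Join on the index; df itself is not modified
result = df.join(parts)
print(result)
```

Whether the padding shows up as None or NaN may depend on the pandas version and string dtype in use; `pd.isna` treats both as missing.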
Solution 2:[2]
Try the following code (explanations are in the comments). It finds the maximum number of comma-separated items in the "sec" column and creates the column names accordingly:
maxlen = max(list(map(lambda x: len(x.split(",")) ,df.sec))) # find max length in 'sec' column
cols = ["sec"+str(x) for x in range(maxlen)] # create new column names
datalist = list(map(lambda x: x.split(","), df.sec)) # create list from entries in "sec"
newdf = pd.DataFrame(data=datalist, columns=cols) # create dataframe of new columns
newdf = pd.concat([df, newdf], axis=1) # add it to original dataframe
print(newdf)
Output:
pri sec sec0 sec1 sec2 sec3 sec4
0 TOM AB,CD,EF AB CD EF None None
1 JACK XY,YZ XY YZ None None None
2 HARRY FG FG None None None None
3 NICK KY,NY,SD,EF,FR KY NY SD EF FR
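A possible variation on the code above, written with list comprehensions and assuming the same sample data. It is a sketch rather than the answerer's exact code: the name `fields` and the non-default index are introduced here for illustration. Because `pd.concat(..., axis=1)` aligns rows by index label, passing `index=df.index` to the new frame keeps the rows lined up even when `df` does not have the default 0..n-1 index.

```python
import pandas as pd

# Sample data from the question, with a deliberately non-default index
df = pd.DataFrame(
    {
        "pri": ["TOM", "JACK", "HARRY", "NICK"],
        "sec": ["AB,CD,EF", "XY,YZ", "FG", "KY,NY,SD,EF,FR"],
    },
    index=[10, 11, 12, 13],
)

# Split every entry of "sec" into a list of fields
fields = [s.split(",") for s in df["sec"]]

# The longest list determines how many new columns are needed
maxlen = max(len(f) for f in fields)
cols = [f"sec{i}" for i in range(maxlen)]

# Shorter rows are padded with missing values; reuse df.index for alignment
newdf = pd.DataFrame(fields, columns=cols, index=df.index)

# Place the new columns next to the original ones
result = pd.concat([df, newdf], axis=1)
print(result)
```

Without `index=df.index`, the new frame would get the labels 0..3 and the concatenation would produce extra, mostly empty rows instead of matching the original ones.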
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
