Split a column in a pandas DataFrame
I have a file in which one of the columns is a multi-value field, for example:
Col1|Col2
rec1|xyz#tew
rec2|
rec3|jkl#qwer
I need to split Col2 on a delimiter. This is the code I am using:
x = ['Col1', 'Col2']
df[x] = df[x].apply(lambda c: c.str.split('#', expand=True))
With this code I get the following error: "AttributeError: 'Series' object has no attribute 'series'"
I tried using replace and fillna, but with no luck. Can someone please help me correct the code above?
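To reproduce the setup, here is a minimal sketch that loads the sample data. It assumes the file is pipe-delimited and read with pandas.read_csv(sep="|"); io.StringIO stands in for the real file. It shows that the empty Col2 field for rec2 is parsed as NaN, not as an empty string, which is why the answer below fills it before splitting.

```python
import io

import pandas as pd

# Sample data from the question; io.StringIO stands in for the real file.
data = """Col1|Col2
rec1|xyz#tew
rec2|
rec3|jkl#qwer
"""

df = pd.read_csv(io.StringIO(data), sep="|")

# The empty Col2 field for rec2 is parsed as a missing value (NaN).
print(df["Col2"].isna().tolist())  # [False, True, False]
```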
Solution 1:[1]
First, replace the NaN values with a "#" placeholder, so that the empty field still splits into two (empty) values:
>> df["Col2"] = df["Col2"].fillna("#")
Now, split the strings in the "Col2" column:
>> df["Col2"] = df["Col2"].str.split("#", n=1)  # n=1 splits at most once, so no list has more than 2 values
>> df
Col1 Col2
0 rec1 [xyz, tew]
1 rec2 [, ]
2 rec3 [jkl, qwer]
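To illustrate what n=1 does, the sketch below adds a hypothetical value "a#b#c" that is not in the question's data. With n=1, only the first "#" is split on, so no list grows beyond two elements, while the "#" placeholder from fillna splits into two empty strings.

```python
import pandas as pd

# "a#b#c" is a hypothetical value added only to show the effect of n=1.
s = pd.Series(["xyz#tew", "#", "a#b#c"])

print(s.str.split("#").tolist())       # [['xyz', 'tew'], ['', ''], ['a', 'b', 'c']]
print(s.str.split("#", n=1).tolist())  # [['xyz', 'tew'], ['', ''], ['a', 'b#c']]
```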
Now, join your original DataFrame with a new DataFrame built from the lists produced in the previous step:
>> df = df.join(pd.DataFrame(df["Col2"].tolist(), index=df.index))  # index=df.index keeps rows aligned; optionally append .add_prefix('Col_') to the inner DataFrame
If you want named columns, add a prefix to the new DataFrame (for example, append .add_prefix('Col_') to it). Then drop the old "Col2" column:
>> df = df.drop("Col2", axis=1)
>> df
Col1 0 1
0 rec1 xyz tew
1 rec2
2 rec3 jkl qwer
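Putting the steps together, here is a self-contained sketch of the approach above. It assumes the data is read from a pipe-delimited file (io.StringIO stands in for it), and the prefix "Col2_" is an arbitrary choice. It also checks that calling str.split with expand=True on Col2 alone, closer to what the question attempted, produces the same columns in one call.

```python
import io

import pandas as pd

data = """Col1|Col2
rec1|xyz#tew
rec2|
rec3|jkl#qwer
"""
df = pd.read_csv(io.StringIO(data), sep="|")

# Answer's steps: fill NaN with "#", split once, expand the lists, drop Col2.
filled = df["Col2"].fillna("#")
parts = filled.str.split("#", n=1)
# index=df.index keeps rows aligned even if df has a non-default index.
new_cols = pd.DataFrame(parts.tolist(), index=df.index).add_prefix("Col2_")
result = df.join(new_cols).drop(columns="Col2")

# Alternative: expand=True on Col2 alone returns the split DataFrame directly.
expanded = filled.str.split("#", n=1, expand=True).add_prefix("Col2_")
alt = df.drop(columns="Col2").join(expanded)

print(result)
```

Splitting only Col2 matters here: applying the split to both columns, as in the question, also splits Col1, which has no "#" and yields a different number of columns.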
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | aaossa |
