Split column in a pandas dataframe

I have a file in which one of the columns is a multi-value field, for example:

Col1|Col2
rec1|xyz#tew
rec2|
rec3|jkl#qwer

I need to split Col2 on the delimiter. This is the code I am using:

x = ['Col1','Col2']
df[x] = df[x].apply(lambda c: c.str.split('#',expand=True))

With this code I get the following error: "AttributeError: 'Series' object has no attribute 'series'"

I tried using replace and fillna, but had no luck. Can someone please help me correct the above code?



Solution 1:[1]

First, replace the NaN values with the delimiter itself, so that an empty cell later splits into two empty strings:

>> df["Col2"] = df["Col2"].fillna("#")

Now, split the strings in the "Col2" column:

>> df["Col2"] = df["Col2"].str.split("#", n=1)  # n=1: split at most once, so no list has more than 2 values
>> df
   Col1         Col2
0  rec1   [xyz, tew]
1  rec2         [, ]
2  rec3  [jkl, qwer]
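As a side check on what n=1 does, here is a minimal sketch on a made-up Series (the values are hypothetical and not from the question). It suggests that n=1 only caps the number of pieces: a value with no delimiter still yields a single-element list, which is why the fillna("#") step above matters for the empty rows.

```python
import pandas as pd

# Hypothetical values: two delimiters, no delimiter, and the "#" placeholder
s = pd.Series(["a#b#c", "abc", "#"])

# n=1 performs at most one split, counted from the left
parts = s.str.split("#", n=1)
print(parts.tolist())  # [['a', 'b#c'], ['abc'], ['', '']]
```

If your real data can contain values without any "#", the lists will not all have the same length, and the missing positions become None when the lists are turned into a dataframe.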

Now, join your original dataframe with a new dataframe created from the lists of the previous step:

>> df = df.join(pd.DataFrame(df["Col2"].tolist(), index=df.index))  # pd is pandas (import pandas as pd)

To give the new columns names instead of 0 and 1, call .add_prefix('Col_') on the new dataframe, inside the join (placed after the join, it would also rename Col1). Then drop your old "Col2":

>> df = df.drop("Col2", axis=1)
>> df

   Col1    0     1
0  rec1  xyz   tew
1  rec2           
2  rec3  jkl  qwer
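For comparison, a possible shortcut, not part of the answer above: Series.str.split accepts expand=True, which returns the pieces as columns directly. The sketch below rebuilds the sample data from the question; the Col2_ prefix and the variable names are assumptions chosen for illustration. Note that the empty row comes out as missing values here rather than as empty strings.

```python
import io

import pandas as pd

# Sample data from the question, read as a pipe-delimited file
raw = "Col1|Col2\nrec1|xyz#tew\nrec2|\nrec3|jkl#qwer\n"
df = pd.read_csv(io.StringIO(raw), sep="|")

# expand=True returns a DataFrame with one column per piece;
# rows where Col2 is NaN stay missing in every new column
parts = df["Col2"].str.split("#", n=1, expand=True).add_prefix("Col2_")

# Swap the original column for the split columns
result = df.drop(columns="Col2").join(parts)
print(result)
```

If you prefer empty strings over missing values, calling result.fillna("") afterwards should give the same shape as the output above.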

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 aaossa