String-join pandas DataFrame columns and skip NaN values
I'm trying to join column values into a new column, but I want to skip NaN values:
df['col'] = df['col1'].map(str) + ',' + df['col2'].map(str) + ',' + df['col3'].map(str)
For example, if a col2 value is NaN, the corresponding col value becomes:
val1,,val3
...but I want to suppress the extra comma that the NaN column leaves behind:
val1,val3
Sample df:
col1   col2   col3
------------------
val11  nan    val13
nan    val22  val23
nan    nan    val33
Desired output:
col1   col2   col3   col
------------------------------
val11  nan    val13  val11,val13
nan    val22  val23  val22,val23
nan    nan    val33  val33
Solution 1:[1]
Try this:
import numpy as np
import pandas as pd
data = {'col1': {0: 'val11', 1: np.nan, 2: np.nan},
'col2': {0: np.nan, 1: 'val22', 2: np.nan},
'col3': {0: 'val13', 1: 'val23', 2: 'val33'}}
df = pd.DataFrame(data)
print(df)
>>>
    col1   col2   col3
0  val11    NaN  val13
1    NaN  val22  val23
2    NaN    NaN  val33
df['col'] = df.apply(lambda s: s.str.cat(sep=','), axis=1)
print(df)
>>>
    col1   col2   col3          col
0  val11    NaN  val13  val11,val13
1    NaN  val22  val23  val22,val23
2    NaN    NaN  val33        val33
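Solution 1 works because Series.str.cat, when called without others and with the default na_rep=None, omits missing values from the result. A minimal sketch on a single hand-built row, contrasting that default with an explicit na_rep='' (which brings back the doubled comma from the question):

```python
import numpy as np
import pandas as pd

# One row, as df.apply(..., axis=1) would hand it to the lambda
row = pd.Series(['val11', np.nan, 'val13'])

# Default na_rep=None: NaN entries are omitted before joining
skipped = row.str.cat(sep=',')

# na_rep='' keeps a placeholder for NaN, so the separator is doubled
kept = row.str.cat(sep=',', na_rep='')

print(skipped)  # val11,val13
print(kept)     # val11,,val13
```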
Solution 2:[2]
One-liner:
df['col'] = df.agg(lambda x: ','.join(x[~x.isnull()].values), axis=1)
print(df)
Output:
    col1   col2   col3          col
0  val11    NaN  val13  val11,val13
1    NaN  val22  val23  val22,val23
2    NaN    NaN  val33        val33
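With axis=1, df.agg calls the lambda once per row, passing that row as a Series indexed by column name; the boolean mask ~x.isnull() (equivalent to x.notna()) keeps only the non-missing entries before they are joined. A small sketch applying the same expression to one hand-built row (the row values are just the sample data):

```python
import numpy as np
import pandas as pd

# One row as agg(axis=1) would see it: a Series indexed by column name
row = pd.Series({'col1': np.nan, 'col2': 'val22', 'col3': 'val23'})

mask = ~row.isnull()          # same as row.notna()
print(mask.tolist())          # [False, True, True]

# Keep only the non-NaN values, then join them with commas
joined = ','.join(row[mask].values)
print(joined)                 # val22,val23
```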
Solution 3:[3]
Improving on BeRT2me's one-liner, call .dropna() directly on each row's values:
df.agg(lambda cols: ','.join(cols.dropna()), axis=1)
val11,val13
val22,val23
val33
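One caveat: ','.join only accepts strings, so .dropna() alone raises a TypeError if a column holds numbers, and running the lambda on the whole frame after col has been added would also pull col itself into the result. A hedged variant, assuming the columns to combine are known up front (cols is a hypothetical name for that list) and casting the surviving values to str; the numeric col3 is made up purely for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['val11', np.nan, np.nan],
                   'col2': [np.nan, 'val22', np.nan],
                   'col3': [13, 23, 33]})   # numeric column, for illustration only

cols = ['col1', 'col2', 'col3']   # hypothetical: the columns to combine
# Drop NaN per row, cast what is left to str, then join with commas
df['col'] = df[cols].agg(lambda row: ','.join(row.dropna().astype(str)), axis=1)
print(df['col'].tolist())  # ['val11,13', 'val22,23', '33']
```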
Solution 4:[4]
If you read the DataFrame from a CSV file, use:
df = pd.read_csv(path, na_filter=False)
If you already have the DataFrame, you can replace NaN with empty strings like this:
df = df.fillna('')
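Keep in mind that either step on its own only turns NaN into empty strings; a plain ','.join over each row would still produce val11,,val13, so the empty strings also need to be filtered out. A sketch of that extra filter, using an in-memory CSV via io.StringIO as a stand-in for the file at path:

```python
import io
import pandas as pd

csv_text = "col1,col2,col3\nval11,,val13\n,val22,val23\n,,val33\n"

# na_filter=False leaves empty fields as '' instead of NaN
df = pd.read_csv(io.StringIO(csv_text), na_filter=False)

# Joining everything keeps the empty slots, hence the doubled commas
naive = df.agg(lambda row: ','.join(row), axis=1)
# Skipping empty strings gives the desired result
filtered = df.agg(lambda row: ','.join(v for v in row if v != ''), axis=1)

print(naive.tolist())     # ['val11,,val13', ',val22,val23', ',,val33']
print(filtered.tolist())  # ['val11,val13', 'val22,val23', 'val33']
```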
Updated solution:
From what I understand of your question, you want to include only the column values that aren't NaN.
You can check each column value for NaN before appending it to the result column col, row by row:
df['col'] = ""
for index, row in df.iterrows():
    # Append each non-NaN value followed by a space
    if not pd.isnull(row['col1']):
        df.at[index, 'col'] = f"{row['col1']} "
    if not pd.isnull(row['col2']):
        df.at[index, 'col'] += f"{row['col2']} "
    if not pd.isnull(row['col3']):
        df.at[index, 'col'] += f"{row['col3']}"
    # Drop the trailing space, then turn the separating spaces into commas
    # (assumes the values themselves contain no spaces)
    df.at[index, 'col'] = df.at[index, 'col'].rstrip().replace(" ", ",")
Console output:
    col1   col2   col3          col
0  val11    NaN  val13  val11,val13
1    NaN  val22  val23  val22,val23
2    NaN    NaN  val33        val33
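The space-then-replace(" ", ",") step in that loop only works while no value contains a space; a value such as val 11 would pick up an extra comma. A sketch of the same row-by-row idea that collects the non-NaN values in a list and joins them instead; cols is a hypothetical list of the columns to combine, and val 11 is a made-up value chosen to exercise the space case:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': ['val 11', np.nan, np.nan],   # note the space in 'val 11'
                   'col2': [np.nan, 'val22', np.nan],
                   'col3': ['val13', 'val23', 'val33']})

cols = ['col1', 'col2', 'col3']   # hypothetical: the columns to combine
df['col'] = ""
for index, row in df.iterrows():
    # Keep only this row's non-NaN values, in column order
    parts = [str(row[c]) for c in cols if not pd.isnull(row[c])]
    df.at[index, 'col'] = ",".join(parts)

print(df['col'].tolist())  # ['val 11,val13', 'val22,val23', 'val33']
```

For larger frames, the vectorized one-liners above are usually preferable to iterrows, which builds a new Series for every row.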
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | ziying35 |
| Solution 2 | BeRT2me |
| Solution 3 | smci |
| Solution 4 | |
