How to remove all strings from a given DataFrame column?
I need to preprocess a column for machine learning in Python. The column contains a series of 1s and 0s (which is the desired output), but there are some strings in there that need to be removed ['PX7','D1', etc..]
I thought about using df.replace to replace the strings with np.nan and then using df.dropna() to remove them. What is the standard way of doing this, given that this is probably a very common preprocessing task?
Solution 1:[1]
You can use:
df2 = df.where(df.isin([0,1]))
Or, convert to numeric to keep all numbers:
df2 = df.apply(pd.to_numeric, errors='coerce')
Then you can use dropna the way you want (if needed).
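As a minimal sketch of the two options above, assuming a hypothetical one-column frame named df with a column "col" (the sample values are invented for illustration):

```python
import pandas as pd

# Hypothetical sample data: a column mixing 0/1 values with stray strings.
df = pd.DataFrame({"col": [1, 0, "PX7", 1, "D1", 0]})

# Option A: keep only cells equal to 0 or 1; everything else becomes NaN.
masked = df.where(df.isin([0, 1]))

# Option B: coerce each column to numeric; unparseable strings become NaN.
coerced = df.apply(pd.to_numeric, errors="coerce")

# Drop the rows that now hold NaN and restore an integer dtype.
clean = coerced.dropna().astype(int)
```

Note that the two options are not equivalent: pd.to_numeric keeps any value that parses as a number (for example 7 or the string "1"), while isin([0, 1]) keeps only exact 0 and 1 values.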
Solution 2:[2]
Use:
df[df['col'].str.isdigit().fillna(True)]
Input:
Output:
Second approach:
df[df['col'].isin([0,1])]
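Both row filters from this answer can be tried on a small frame. This is only a sketch: the sample values are hypothetical, and the astype(bool) call is an addition here to make sure the mask has a boolean dtype.

```python
import pandas as pd

# Hypothetical sample data; "col" matches the column name used in the answer.
df = pd.DataFrame({"col": [1, 0, "PX7", 1, "D1", 0]})

# .str.isdigit() yields NaN for non-string cells (the real 0/1 integers),
# so fillna(True) keeps those rows while non-digit strings are dropped.
digit_mask = df["col"].str.isdigit().fillna(True).astype(bool)
by_digits = df[digit_mask]

# Second approach: keep only rows whose value is exactly 0 or 1.
by_isin = df[df["col"].isin([0, 1])]
```

The isdigit filter would also keep digit-only strings such as "7" (still stored as strings), so the isin version is the stricter choice when only 0 and 1 are valid.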
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |


