How to remove all strings from a given DataFrame column?

I need to preprocess a column for machine learning in Python. The column contains a series of 1s and 0s (which is the desired output), but there are some strings in there that need to be removed, e.g. ['PX7', 'D1', ...].

I thought about using df.replace to replace the strings with np.nan and then using df.dropna() to remove them. What is the standard way of doing this, given that it is probably a very common preprocessing task?

Solution 1:^[1]

You can mask every value that is not 0 or 1, which replaces it with NaN:

df2 = df.where(df.isin([0,1]))

Or convert to numeric to keep all numbers (strings that cannot be parsed become NaN):

df2 = df.apply(pd.to_numeric, errors='coerce')

Then you can call dropna to remove the resulting NaN rows, if needed.
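A minimal end-to-end sketch of both variants, followed by dropna. The sample data and the column name 'col' are assumptions made up for illustration; they are not from the original question.

```python
import pandas as pd

# Hypothetical sample data: 0/1 values mixed with stray strings.
df = pd.DataFrame({"col": [1, 0, "PX7", 1, "D1", 0]})

# Variant 1: keep only cells equal to 0 or 1; everything else becomes NaN.
masked = df.where(df.isin([0, 1]))

# Variant 2: coerce each column to numeric; unparseable strings become NaN.
numeric = df.apply(pd.to_numeric, errors="coerce")

# Drop the rows that now hold NaN, then restore an integer dtype
# (the coerced column is float because NaN is a float value).
clean = numeric.dropna().astype(int)
print(clean)
```

Note that the two variants differ on other numbers: to_numeric would keep a value such as 7, while the isin mask keeps only 0 and 1.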

Solution 2:^[2]

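Filter with a boolean mask. For non-string values str.isdigit() returns NaN, so fillna(True) keeps them: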

df[df['col'].str.isdigit().fillna(True)]

Second approach:

df[df['col'].isin([0,1])]
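Both filters can be tried on a small frame as sketched below. The sample data is hypothetical, and the explicit astype(bool) is an addition here to guarantee a plain boolean mask; it is not part of the original answer.

```python
import pandas as pd

# Hypothetical sample data: 0/1 values mixed with stray strings.
df = pd.DataFrame({"col": [1, 0, "PX7", 1, "D1", 0]})

# str.isdigit() gives NaN for the integer cells and False for 'PX7' and 'D1'.
is_digit = df["col"].str.isdigit()

# Treat the NaN entries (non-strings) as rows to keep, then filter.
keep = is_digit.fillna(True).astype(bool)
by_str = df[keep]

# Second approach: keep only rows whose value is exactly 0 or 1.
by_isin = df[df["col"].isin([0, 1])]

print(by_str)
print(by_isin)
```

The first filter would also keep digit-only strings such as '7', still stored as strings, whereas the isin filter keeps nothing but the values 0 and 1.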

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1
Solution 2
