Is there a way to detect special characters such as '?' in a column of a huge DataFrame with thousands of records?
INPUT
       A    B    C
0      1    2    3
1      4    ?    6
2      7    8    ?
...  ...  ...  ...
551    4    4    6
552    3    7    9
There might be a '?' hidden somewhere in between that goes undetected. I tried
pd.to_numeric(df['B'], errors='coerce')
but printing the DataFrame only shows the first five and last five rows, so I can't check every row and column for special characters.
So how do I actually find these values and clean the dataset?
Once they are detected I know how to remove them and fill in the respective column mean values, so that's not an issue.
Please bear with me, I'm new to Stack Overflow and switching from a non-IT field.
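Your pd.to_numeric idea already works for detection; the trick is to keep the coercion mask instead of printing the frame. A minimal sketch on a small stand-in frame (the data here is hypothetical):

```python
import pandas as pd

# Small frame standing in for the large one; columns are read in as strings.
df = pd.DataFrame({'A': ['1', '4', '7'],
                   'B': ['2', '?', '8'],
                   'C': ['3', '6', '?']})

# pd.to_numeric turns every non-numeric cell into NaN, so the NaN mask
# marks exactly the cells that held a special character.
numeric = df.apply(pd.to_numeric, errors='coerce')
mask = numeric.isna()

# List every offending (row, column) pair instead of eyeballing the print-out.
bad = mask.stack()[lambda s: s]
print(bad.index.tolist())   # → [(1, 'B'), (2, 'C')]
```

This lists all affected cells regardless of how many rows the frame has, so the truncated display is no longer a problem.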
Solution 1:[1]
The pattern below is in fact a regular-expression character class: `.str.count` interprets its argument as a regex, and inside the square brackets the listed special characters are matched literally. Cast the column to str first if it holds mixed types:
special = r'[@_!#$%^&*()<>?/\\|}{~:]'
df['B'].astype(str).str.count(special)
Please refer to the link below to do it using regex:
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Deepak |
