Pandas Read Excel _x00
I have an Excel file that I read via:
df = pd.read_excel(path)
The problem is how the values are encoded when the file is read with pandas (when I open it in Excel, everything looks fine):
df
id group
_x0034_5109336 _x0020_N12
_x0035_4610785 _x0020_N32
_x0036_1987159 _x0020_N33
_x0034_6506844 _x0020_N41_x0020__x002F__x0020_N42
_x0033_8342845 _x0020_N23
I tried to remove the characters manually:
df[col] = df[col].astype(str).str.replace('_x0020_', ' ')
But that might not be the best option.
Below is the expected output:
df
id group
45109336 N12
54610785 N32
61987159 N33
46506844 N41 / N42
38342845 N23
Solution 1:[1]
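You'd have to use a regex. A greedy pattern like _x.*[0-9]_ would also swallow the text between two escapes (N41 in the fourth row), and replacing the match with '' drops the encoded character itself (the leading digit of each id). Match exactly four hex digits instead and decode them:
df[col] = df[col].astype(str).str.replace(r'_x([0-9A-Fa-f]{4})_', lambda m: chr(int(m.group(1), 16)), regex=True)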
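To check a decoding approach against the sample rows from the question, here is a minimal sketch. It assumes the values use the `_xHHHH_` escape form from Office Open XML, where `HHHH` is the hex code point of one character (so `_x0034_` is `4` and `_x002F_` is `/`). The helper name `decode_ooxml_escapes` is hypothetical and is not part of pandas.

```python
import re

import pandas as pd

# Assumption: each escape is _xHHHH_, with HHHH the hex code point of one character.
ESCAPE_RE = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_ooxml_escapes(value):
    """Replace each _xHHHH_ escape with the character it encodes."""
    if not isinstance(value, str):
        return value  # leave numbers, NaN, etc. untouched
    return ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), value)


# Two sample rows copied from the question.
df = pd.DataFrame({
    "id": ["_x0034_5109336", "_x0034_6506844"],
    "group": ["_x0020_N12", "_x0020_N41_x0020__x002F__x0020_N42"],
})

# Decode every cell, column by column.
for col in df.columns:
    df[col] = df[col].map(decode_ooxml_escapes)

print(df)
```

Decoding keeps the escaped leading space in `group` (`_x0020_` is a space), so a follow-up `df["group"].str.strip()` may be needed to match the expected output exactly.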
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | elcarpo |
