Pandas Read Excel _x00

I have an Excel file that I read via:

df = pd.read_excel(path)

The problem is how the text comes out when I read the file with pandas: the values are full of escape sequences such as _x0020_, while everything looks fine when I open the file in Excel:

   df

id                  group
_x0034_5109336      _x0020_N12
_x0035_4610785      _x0020_N32
_x0036_1987159      _x0020_N33
_x0034_6506844      _x0020_N41_x0020__x002F__x0020_N42 
_x0033_8342845      _x0020_N23

I tried removing the characters manually:

df[col] = df[col].astype(str).str.replace('_x0020_', ' ')

But that handles only one escape sequence at a time, so it is probably not the best option.

Below is the expected output:

df
id          group
45109336    N12
54610785    N32
61987159    N33
46506844    N41 / N42
38342845    N23


Solution 1:[1]

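You'd have to use a regex. Each _xHHHH_ sequence stands for one character given by its four-digit hex code (_x0020_ is a space, _x0034_ is the digit 4), so match _x([0-9A-Fa-f]{4})_ and replace each match with the decoded character instead of deleting it. The trailing .str.strip() drops the leading space that _x0020_ decodes to:

pattern = r'_x([0-9A-Fa-f]{4})_'
df[col] = df[col].astype(str).str.replace(pattern, lambda m: chr(int(m.group(1), 16)), regex=True).str.strip()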
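
For reference, here is a minimal sketch of the decoding step on its own, assuming every _xHHHH_ sequence in the data is an escaped character written as four hex digits. The names decode_escapes and decode_columns are hypothetical helpers made up for this example, not part of pandas, and decode_columns assumes it is given a pandas DataFrame.

```python
import re

# One escape looks like _x0020_: "_x", four hex digits, "_".
ESCAPE_PATTERN = re.compile(r"_x([0-9A-Fa-f]{4})_")


def decode_escapes(text):
    """Replace every _xHHHH_ escape in text with the character it encodes."""
    return ESCAPE_PATTERN.sub(lambda m: chr(int(m.group(1), 16)), text)


def decode_columns(df, columns):
    """Return a copy of df (assumed to be a pandas DataFrame) with columns decoded.

    Values are cast to str first, as in the question, so missing values
    become the literal string 'nan'.
    """
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype(str).map(decode_escapes).str.strip()
    return out


# Sample values taken from the question.
samples = ["_x0034_5109336", "_x0020_N41_x0020__x002F__x0020_N42"]
decoded = [decode_escapes(s).strip() for s in samples]
print(decoded)  # ['45109336', 'N41 / N42']
```

With the question's frame this would be called as decode_columns(df, ['id', 'group']). Decoding the hex code rather than deleting the match is what keeps the first digit of each id and the " / " in the last group value.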

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 elcarpo