CSV file Infinity value issue with AWS Glue job

I have a CSV file that I read with pandas, and I am trying to convert NaN and Infinity values to 0.0. When I run the following code locally, the conversion works as expected:

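import numpy as np
import pandas as pd

df = pd.read_csv('test.csv')
print(df['C1'])  # before cleaning
# Map +/-inf to NaN, then fill every NaN with 0.0
df.replace([np.inf, -np.inf], np.nan, inplace=True)
df = df.fillna(0.00)
print(df['C1'])  # after cleaning

Output:
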
0    NaN
1    inf
2    NaN
Name: C1, dtype: float64
0    0.0
1    0.0
2    0.0
Name: C1, dtype: float64

Here, both the infinity and NaN values are converted to 0.0, as the output shows. But when I run the same logic in an AWS Glue Python Shell job, the infinity value is not converted to 0.0. The code and output for the Glue job are below:

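import numpy as np
import pandas as pd

df = pd.read_csv('s3://bucket/test.csv')
print(df['C1'])  # before cleaning
# np.Infinity is an alias for np.inf (the alias was removed in NumPy 2.0)
df = df.replace([np.Infinity, -np.Infinity], np.nan)
df = df.fillna(0.00)
print(df['C1'])  # after cleaning

Output:
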
0         NaN
1    Infinity
2         NaN
Name: C1, dtype: object
0           0
1    Infinity
2           0
Name: C1, dtype: object

The same file is used locally and on S3, yet only the Glue job leaves the infinity value unconverted. I also noticed that the column is read as float64 locally but as object in Glue. How can I get the same conversion in the Glue job?
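
One detail in the Glue output may explain the difference: with dtype object, the cell holds the text "Infinity" rather than a floating-point infinity, so replacing np.inf never matches it. Why the Glue environment reads the column as text is not shown here; a different (possibly older) pandas version is only a guess. A minimal sketch of a workaround, assuming the column is meant to be numeric, is to coerce it to float before cleaning. The helper name clean_numeric and the sample Series are hypothetical and only stand in for the real data:

```python
import numpy as np
import pandas as pd


def clean_numeric(series):
    """Coerce a column to float, then turn NaN and +/-inf into 0.0.

    Hypothetical helper for illustration; not part of pandas.
    """
    # errors="coerce" turns any value that cannot be parsed as a number into NaN
    numeric = pd.to_numeric(series, errors="coerce")
    # Any value that did parse as +/-inf is mapped to NaN as well
    numeric = numeric.replace([np.inf, -np.inf], np.nan)
    return numeric.fillna(0.0)


# Stand-in for the column as the Glue job printed it: text inside an object column
col = pd.Series([np.nan, "Infinity", np.nan], dtype=object, name="C1")
cleaned = clean_numeric(col)
print(cleaned)  # all three values end up as 0.0 with dtype float64
```

Note that errors="coerce" also turns any other unparseable text into NaN, and therefore into 0.0, so this only fits columns where non-numeric text is never meaningful. In the job it would be applied per column, for example df['C1'] = clean_numeric(df['C1']).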



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
