'Can cast pandas Series to `int64` but not to `Int64`

I am stuck with a weird type conversion issue..

I have a pandas DataFrame pp with a column Value. All values are 'int', of type float. It is possible to convert them all to int64. But I get an error when attempting to conver to nullable Int64. Help!

Some details: pp['Value'] is of dtype float and may contain NaN. For my specific case, all values are integer values (see my debugging attempts below). I want them to become Int64. This has worked for months with thousands of series, but now suddenly 1 series seems to fail - I cannot find any non-int value that would explain this.

What I am trying: Convert the below DataSeries to Int64:

print(f"pp['Value']:\n{pp['Value']}")
pp['Value']:
0      3500000.0
1       600000.0
2       400000.0
3      8300000.0
4      5700000.0
5      4400000.0
6      3000000.0
7      2700000.0
8      2000000.0
9       800000.0
10      300000.0
11      300000.0
12     5300000.0
13     2500000.0
14    11000000.0
15     1000000.0
16    18000000.0
17     6250000.0
18     5000000.0
19     4400000.0
20     4200000.0
21     2000000.0
22     1750000.0
23      900000.0
24     4000000.0
25      800000.0
26     9250000.0
27     5200000.0
28      600000.0
29     5700000.0
30    13500000.0
31    10000000.0
32     3300000.0
33     3200000.0
34     2000000.0
35      750000.0
Name: Value, dtype: float64

For some reason, pp['Value'].astype('Int64') raises the error TypeError: cannot safely cast non-equivalent float64 to int64

I debugged 2 alternative approaches, with both work..

A: Convert the series to 'int64' - works like a charm (the numbers really all are integers):

pp['Value'] = pp['Value'].astype('int64')
print(f"pp['Value']:\n{pp['Value']}")
pp['Value']:
0      3500000
1       600000
2       400000
3      8300000
4      5700000
5      4400000
6      3000000
7      2700000
8      2000000
9       800000
10      300000
11      300000
12     5300000
13     2500000
14    11000000
15     1000000
16    18000000
17     6250000
18     5000000
19     4400000
20     4200000
21     2000000
22     1750000
23      900000
24     4000000
25      800000
26     9250000
27     5200000
28      600000
29     5700000
30    13500000
31    10000000
32     3300000
33     3200000
34     2000000
35      750000
Name: Value, dtype: int64

B: Converted each element individually, and checked whether any single value does have some weird floating-point arithmetic issue.. not the case either. All values can be converted..

for idx, row in pp.iterrows():
   print(f"{idx}: value = {row['Value']}, residual vs. int: {row['Value']%row['Value']}, int value: {int(row['Value'])}")
0: value = 3500000.0, residual vs. int: 0.0, int value: 3500000
1: value = 600000.0, residual vs. int: 0.0, int value: 600000
2: value = 400000.0, residual vs. int: 0.0, int value: 400000
3: value = 8300000.000000001, residual vs. int: 0.0, int value: 8300000
4: value = 5700000.0, residual vs. int: 0.0, int value: 5700000
5: value = 4400000.0, residual vs. int: 0.0, int value: 4400000
6: value = 3000000.0, residual vs. int: 0.0, int value: 3000000
7: value = 2700000.0, residual vs. int: 0.0, int value: 2700000
8: value = 2000000.0, residual vs. int: 0.0, int value: 2000000
9: value = 800000.0, residual vs. int: 0.0, int value: 800000
10: value = 300000.0, residual vs. int: 0.0, int value: 300000
11: value = 300000.0, residual vs. int: 0.0, int value: 300000
12: value = 5300000.0, residual vs. int: 0.0, int value: 5300000
13: value = 2500000.0, residual vs. int: 0.0, int value: 2500000
14: value = 11000000.0, residual vs. int: 0.0, int value: 11000000
15: value = 1000000.0, residual vs. int: 0.0, int value: 1000000
16: value = 18000000.0, residual vs. int: 0.0, int value: 18000000
17: value = 6250000.0, residual vs. int: 0.0, int value: 6250000
18: value = 5000000.0, residual vs. int: 0.0, int value: 5000000
19: value = 4400000.0, residual vs. int: 0.0, int value: 4400000
20: value = 4200000.0, residual vs. int: 0.0, int value: 4200000
21: value = 2000000.0, residual vs. int: 0.0, int value: 2000000
22: value = 1750000.0, residual vs. int: 0.0, int value: 1750000
23: value = 900000.0, residual vs. int: 0.0, int value: 900000
24: value = 4000000.0, residual vs. int: 0.0, int value: 4000000
25: value = 800000.0, residual vs. int: 0.0, int value: 800000
26: value = 9250000.0, residual vs. int: 0.0, int value: 9250000
27: value = 5200000.0, residual vs. int: 0.0, int value: 5200000
28: value = 600000.0, residual vs. int: 0.0, int value: 600000
29: value = 5700000.0, residual vs. int: 0.0, int value: 5700000
30: value = 13500000.0, residual vs. int: 0.0, int value: 13500000
31: value = 10000000.0, residual vs. int: 0.0, int value: 10000000
32: value = 3300000.0, residual vs. int: 0.0, int value: 3300000
33: value = 3200000.0, residual vs. int: 0.0, int value: 3200000
34: value = 2000000.0, residual vs. int: 0.0, int value: 2000000
35: value = 750000.0, residual vs. int: 0.0, int value: 750000

I am lost... All the values are int. I can convert all values to int. I can convert the whole Series to int64. But when converting to Int64, I get an error. Why? What is wrong here?


Edit note:

pp['Value'] = pp['Value'].round().astype('Int64')

solves the problem.. But I would love to understand why. As you can see above, the set is guaranteed to only contain integers; each value is an 'int' down to machine accuracy.. Why on earth would the 'non-safe conversion' error be raised?



Solution 1:[1]

As Jason suggested in his comment, your edit solves the problem because rounding changes 8300000.000000001 to 8300000.0.

This is important as it means that after the type conversion the two values are still equal, and so they meet the "safe" casting rule for numpy conversions. When converting to 'Int64' pandas use the numpy.ndarray.astype function which applies this rule. The details on "safe" casting can be found here.

As far as I am aware, there is no way to request that pandas uses the numpy function with a different type of casting, so rounding the values first is the solution to your problem.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 FluxZA