'Pandas DataFrame subtraction is getting an unexpected result. Concatenating instead?
I have two dataframes of the same size (510x6)
preds
0 1 2 3 4 5
0 2.610270 -4.083780 3.381037 4.174977 2.743785 -0.766932
1 0.049673 0.731330 1.656028 -0.427514 -0.803391 -0.656469
2 -3.579314 3.347611 2.891815 -1.772502 1.505312 -1.852362
3 -0.558046 -1.290783 2.351023 4.669028 3.096437 0.383327
4 -3.215028 0.616974 5.917364 5.275736 7.201042 -0.735897
... ... ... ... ... ... ...
505 -2.178958 3.918007 8.247562 -0.523363 2.936684 -3.153375
506 0.736896 -1.571704 0.831026 2.673974 2.259796 -0.815212
507 -2.687474 -1.268576 -0.603680 5.571290 -3.516223 0.752697
508 0.182165 0.904990 4.690155 6.320494 -2.326415 2.241589
509 -1.675801 -1.602143 7.066843 2.881135 -5.278826 1.831972
510 rows × 6 columns
outputStats
0 1 2 3 4 5
0 2.610270 -4.083780 3.381037 4.174977 2.743785 -0.766932
1 0.049673 0.731330 1.656028 -0.427514 -0.803391 -0.656469
2 -3.579314 3.347611 2.891815 -1.772502 1.505312 -1.852362
3 -0.558046 -1.290783 2.351023 4.669028 3.096437 0.383327
4 -3.215028 0.616974 5.917364 5.275736 7.201042 -0.735897
... ... ... ... ... ... ...
505 -2.178958 3.918007 8.247562 -0.523363 2.936684 -3.153375
506 0.736896 -1.571704 0.831026 2.673974 2.259796 -0.815212
507 -2.687474 -1.268576 -0.603680 5.571290 -3.516223 0.752697
508 0.182165 0.904990 4.690155 6.320494 -2.326415 2.241589
509 -1.675801 -1.602143 7.066843 2.881135 -5.278826 1.831972
510 rows × 6 columns
when I execute:
preds - outputStats
I expect a 510 x 6 dataframe with elementwise subtraction. Instead I get this:
0 1 2 3 4 5 0 1 2 3 4 5
0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ...
505 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
506 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
507 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
508 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
509 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
I've tried dropping columns and the like, and that hasn't helped. I also get the same result with preds.subtract(outputStats). Any Ideas?
Solution 1:[1]
There are many ways that two different values can appear the same when displayed. One of the main ways is if they are different types, but corresponding values for those types. For instance, depending on how you're displaying them, the int 1 and the str '1' may not be easily distinguished. You can also have whitespace characters, such as '1' versus ' 1'.
If the problem is that one set is int while the other is str, you can solve the problem by converting them all to int or all to str. To do the former, do df.columns = [int(col) for col in df.columns]. To do the latter, df.columns = [str(col) for col in df.columns]. Converting to str is somewhat safer, as trying to convert to int can raise an error if the string isn't amenable to conversion (e.g. int('y') will raise an error), but int can be more usual as they have the numerical structure.
You asked in a comment about dropping columns. You can do this with drop and including axis=1 as a parameter to tell it to drop columns rather than rows, or you can use the del keyword. But changing the column names should remove the need to drop columns.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Acccumulation |
