ValueError after saving and loading pandas DataFrame to csv

I am trying to find whether a row exists in a DataFrame based on the values of all columns. I believe I found a solution, but I'm having problems after saving and loading the DataFrame into/from a .csv file.

In the following example, I iterate over each row of the DataFrame and find the index corresponding to each row (i.e. the row where all columns are identical to the row being queried).

NB: In my real code, I iterate over a smaller DataFrame and search for rows in a larger DataFrame. But the issue happens in both cases.

import numpy as np
import pandas as pd

my_filename = "tmp.csv"                     # Placeholder path for this example

df = pd.DataFrame([[1, 2], [3, 4]])         # Create data frame
df.to_csv(my_filename, index=False)         # Save to csv
df1 = pd.read_csv(my_filename)              # Load from csv

# Find original data in loaded data
for row_idx, this_row in df.iterrows():
    print(np.where((df  == this_row).all(axis=1)))    # This returns the correct index

for row_idx, this_row in df.iterrows():
    print(np.where((df1 == this_row).all(axis=1)))    # This returns an empty index, and a FutureWarning

The output is:

(array([0]),)
(array([1]),)
(array([], dtype=int64),)
(array([], dtype=int64),)
tmp.py:25: FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version.  Do `left, right = left.align(right, axis=1, copy=False)` before e.g. `left == right`

After some debugging, I found that the DataFrame loaded from csv is not identical to the original DataFrame:

# The DataFrames look identical, but comparing gives me a ValueError:
df
df1
df == df1

The output is:

   0  1
0  1  2
1  3  4

   0  1
0  1  2
1  3  4

Traceback (most recent call last):

  File "tmp.py", line 30, in <module>
    df == df1

  File "python3.9/site-packages/pandas/core/ops/common.py", line 69, in new_method
    return method(self, other)

  File "python3.9/site-packages/pandas/core/arraylike.py", line 32, in __eq__
    return self._cmp_method(other, operator.eq)

  File "python3.9/site-packages/pandas/core/frame.py", line 6851, in _cmp_method
    self, other = ops.align_method_FRAME(self, other, axis, flex=False, level=None)

  File "python3.9/site-packages/pandas/core/ops/__init__.py", line 288, in align_method_FRAME
    raise ValueError(

ValueError: Can only compare identically-labeled DataFrame objects

Note: This appears to be related to a similar question, but the solution proposed there (specifying the index labels) did not solve my problem.
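
A check that may help narrow this down is to compare the column labels of the two frames rather than their values. The sketch below repeats the round trip with an in-memory buffer (`io.StringIO`) instead of a file on disk, and the names `buffer`, `original_labels`, `loaded_labels`, `df1_fixed` and `matches` are only illustrative. It assumes the mismatch lies in the labels: `to_csv` writes the default integer column labels as a header line, and `read_csv` reads that header back as strings. Converting the labels back to `int` is only reasonable here because the original labels were the default integer positions.

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])

# Round-trip through CSV using an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
buffer.seek(0)
df1 = pd.read_csv(buffer)

# Column labels: integers in the original, strings after the round trip.
original_labels = list(df.columns)
loaded_labels = list(df1.columns)
print(original_labels, loaded_labels)

# Assumption: the loaded labels are numeric strings that can be cast to int.
df1_fixed = df1.copy()
df1_fixed.columns = [int(label) for label in df1.columns]

# Same row lookup as in the question, now against identically labelled columns.
matches = [
    np.where((df1_fixed == this_row).all(axis=1))[0].tolist()
    for _, this_row in df.iterrows()
]
print(matches)
```

If the labels do differ in type, the values can be equal while the labels are not, which would be consistent with both the empty matches and the "identically-labeled" error shown above.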

Thanks in advance.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
