'ValueError after saving and loading pandas DataFrame to csv
I am trying to find whether a row exists in a DataFrame based on the values of all columns. I believe I found a solution, but I'm having problems after saving and loading the DataFrame into/from a .csv file.
In the following example, I iterate over each row of the DataFrame, and find the index corresponding to each row -- i.e. the row where all columns are identical to the row being queried).
NB: In my real code, I iterate over a smaller DataFrame and search for rows in a larger DataFrame. But the issue happens in both cases.
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]]) # Create data frame
df.to_csv(my_filename, index=False) # Save to csv
df1 = pd.read_csv(my_filename) # Load from csv
# Find original data in loaded data
for row_idx, this_row in df.iterrows():
print(np.where((df == this_row).all(axis=1))) # This returns the correct index
for row_idx, this_row in df.iterrows():
print(np.where((df1 == this_row).all(axis=1))) # This returns an empty index, and a FutureWarning
The output is:
(array([0]),)
(array([1]),)
(array([], dtype=int64),)
(array([], dtype=int64),)
tmp.py:25: FutureWarning: Automatic reindexing on DataFrame vs Series comparisons is deprecated and will raise ValueError in a future version. Do `left, right = left.align(right, axis=1, copy=False)` before e.g. `left == right`
After some debugging, I found that the DataFrame loaded from csv is not identical to the original DataFrame:
# The DataFrames look identical, but comparing gives me a ValueError:
df
df1
df == df1
The output is:
0 1
0 1 2
1 3 4
0 1
0 1 2
1 3 4
Traceback (most recent call last):
File "tmp.py", line 30, in <module>
df == df1
File "python3.9/site-packages/pandas/core/ops/common.py", line 69, in new_method
return method(self, other)
File "python3.9/site-packages/pandas/core/arraylike.py", line 32, in __eq__
return self._cmp_method(other, operator.eq)
File "python3.9/site-packages/pandas/core/frame.py", line 6851, in _cmp_method
self, other = ops.align_method_FRAME(self, other, axis, flex=False, level=None)
File "python3.9/site-packages/pandas/core/ops/__init__.py", line 288, in align_method_FRAME
raise ValueError(
ValueError: Can only compare identically-labeled DataFrame objects
- Note: This appears to be related to a similar question, but the proposed solution, namely specifying the index labels, did not solve my problem.
Thanks in advance.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|
