Upgrading to pandas 1.4.0 or 1.4.1 causes .at[idx, "XXX"] to raise an InvalidIndexError

My program fails after upgrading to pandas 1.4.0 or 1.4.1, with the following traceback:

File "XXX.py", line XXX, in XXX
data.at[idx, "False_positives"] = "-"
File "lib/python3.9/site-packages/pandas/core/indexing.py", line 2274, in __setitem__
return super().__setitem__(key, value)
File "/python3.9/site-packages/pandas/core/indexing.py", line 2229, in __setitem__
self.obj._set_value(*key, value=value, takeable=self._takeable)
File "/python3.9/site-packages/pandas/core/frame.py", line 3869, in _set_value
loc = self.index.get_loc(index)
File "/python3.9/site-packages/pandas/core/indexes/range.py", line 388, in get_loc
self._check_indexing_error(key)
File "/python3.9/site-packages/pandas/core/indexes/base.py", line 5637, in _check_indexing_error
raise InvalidIndexError(key)
pandas.errors.InvalidIndexError: Int64Index([0], dtype='int64')

This error does not occur with pandas 1.3.5 on the same DataFrame, and the code does not emit any warning there. The bug happens with my real-life data, but I am unable to reproduce it with mock data modeled on that data, probably because I am not a pandas expert. As a result, I cannot file a useful report with the pandas dev team. I am hoping that someone who knows what changed between these two versions can point me in the right direction from the traceback. Can anyone help?

Since I cannot build mock data that triggers the error, here is the smallest example I have that reproduces it:

conda create -y --name icescreen_env icescreen -c conda-forge -c bioconda
conda activate icescreen_env
mkdir -p ~/tmp/test_icescreen
cd ~/tmp/test_icescreen
mkdir genbank_files
i=NZ_CP026548
curl -s  "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=${i}&rettype=gbwithparts&retmode=txt" > genbank_files/$i.gbk
icescreen -g ~/tmp/test_icescreen/genbank_files -o ~/tmp/test_icescreen/
head ~/tmp/test_icescreen/ICEscreen_results/results/NZ_CP026548/icescreen_detection_ME/NZ_CP026548_detected_ME.summary
rm -rf ~/tmp/test_icescreen/ICEscreen_results
conda install -c conda-forge pandas=1.4.1
icescreen -g ~/tmp/test_icescreen/genbank_files -o ~/tmp/test_icescreen/

The last line fails with the InvalidIndexError.



Solution 1:[1]

As the example from the docs shows, at is intended to get or set a single value in a DataFrame or Series, addressed by one row label and one column label. Passing a collection of row labels such as [4,5], or even the one-element list [4], fails in pandas 1.4 (but not in earlier versions):

import pandas as pd
df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=['A', 'B', 'C'])
df.at[[4,5], 'B']

>InvalidIndexError: [4, 5]
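
The traceback in the question fits this pattern: the key it reports is Int64Index([0]), an index object rather than a scalar label. Below is a minimal sketch of how that can arise, assuming idx is produced by a boolean filter on a frame with a default RangeIndex. The mock frame and the filter are illustrative and are not taken from ICEscreen; only the column name comes from the traceback.

```python
import pandas as pd

# Mock frame with a default RangeIndex (hypothetical data).
data = pd.DataFrame({"False_positives": ["x", "y", "z"]})

# Assumption: idx is the result of a filter, so it is an Index object
# even though exactly one row matches.
idx = data.index[data["False_positives"] == "x"]
print(type(idx).__name__, list(idx))

try:
    # On pandas 1.4.x this is expected to raise InvalidIndexError;
    # other versions may accept the key or raise a different error.
    data.at[idx, "False_positives"] = "-"
    outcome = "accepted"
except Exception as exc:
    outcome = type(exc).__name__

print(pd.__version__, outcome)
```

Printing the type of idx just before the failing line in the real program is a quick way to check whether this assumption holds.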

Instead, use loc to get or set values for a collection of index labels:

df.loc[[4,5], 'B']
4    2
5    4
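
Setting works the same way. Here is a short sketch on the same example frame; the assigned values and the filter on column A are arbitrary choices for illustration:

```python
import pandas as pd

df = pd.DataFrame([[0, 2, 3], [0, 4, 1], [10, 20, 30]],
                  index=[4, 5, 6], columns=['A', 'B', 'C'])

# Setting through .loc with a list of labels updates every listed row.
df.loc[[4, 5], 'B'] = -1

# .loc also accepts an Index object, such as the result of a boolean
# filter, even when that Index holds a single label.
idx = df.index[df['A'] == 10]
df.loc[idx, 'C'] = 0

print(df)
```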

Solution 2:[2]

I hit the same error and could only resolve it by downgrading to pandas 1.3.5. If you're using pip:

pip install pandas==1.3.5
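
If pinning an older version is not an option, one possible alternative is to make sure at always receives a scalar label. The helper below is a hypothetical sketch, not part of pandas: set_cell is an invented name, and it assumes the key is either a scalar or an Index/list holding exactly one label.

```python
import pandas as pd


def set_cell(df, key, column, value):
    """Hypothetical helper: set one cell via .at from a scalar or one-element key."""
    # Unwrap an Index or list holding exactly one label into that label.
    if isinstance(key, (pd.Index, list)):
        if len(key) != 1:
            raise ValueError(f"expected exactly one row label, got {len(key)}")
        key = key[0]
    df.at[key, column] = value


# Mock data for illustration only.
data = pd.DataFrame({"False_positives": ["x", "y", "z"]})
idx = data.index[data["False_positives"] == "y"]  # one-element Index
set_cell(data, idx, "False_positives", "-")
print(data)
```

Raising on keys with more than one label keeps the helper from silently writing to several rows; for that case loc is the better fit.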

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
Solution 1
Solution 2    Piero