'0 is not in range in pandas

So I am trying to load my data from jupyter into mysql workbench using the one "multiple row" insert statement. I will achieve that with a for loop and i am receiving some error messages.

First, a little background:

So I had my csv file which contains data set for preprocessing and I split into 2 here:

Before_handwashing=copy_monthly_df.iloc[:76] 
After_handwashig=copy_monthly_df.iloc[76:]

I have successfully structured and loaded the first data set Before_handwashing into mysql work bench using this for loop below.

for x in range(Before_handwashing.shape[0]):
    insert_query+='('
    for y in range(Before_handwashing.shape[1]):
        insert_query+= str(Before_handwashing[Before_handwashing.columns.values[y]][x])+', '
    insert_query=insert_query[:-2]+'), '

Now I want to structure and load my second part of the dataset which is After_handwashig into mysql workbench using a similar code structure here.

for x in range(After_handwashig.shape[0]):
    insert_query+='('
    for y in range(After_handwashig.shape[1]):
        insert_query+=str(After_handwashig[After_handwashig.columns.values[y]][x])+', '
    insert_query=insert_query[:-2]+'), '

And I am recieving the following error messages

error message: ValueError
Traceback (most recent call last)
~\anaconda3\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
    384                 try:
--> 385                     return self._range.index(new_key)
    386                 except ValueError as err:

ValueError: 0 is not in range

The above exception was the direct cause of the following exception:

KeyError
Traceback (most recent call last) ~\AppData\Local\Temp/ipykernel_4316/2677185951.py in <module>
      2     insert_query+='('
      3     for y in range(After_handwashig.shape[1]):
----> 4         insert_query+=str(After_handwashig[After_handwashig.columns.values[y]][x])+', '
      5     insert_query=insert_query[:-2]+'), '

~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    940 
    941         elif key_is_scalar:
--> 942             return self._get_value(key)
    943 
    944         if is_hashable(key):

~\anaconda3\lib\site-packages\pandas\core\series.py in
_get_value(self, label, takeable)
   1049
   1050         # Similar to Index.get_value, but we do not fall back to positional
-> 1051         loc = self.index.get_loc(label)
   1052         return self.index._get_values_for_loc(self, loc, label)    
   1053 

~\anaconda3\lib\site-packages\pandas\core\indexes\range.py in
get_loc(self, key, method, tolerance)
    385                     return self._range.index(new_key)
    386                 except ValueError as err:
--> 387                     raise KeyError(key) from err
    388             raise KeyError(key)
    389         return super().get_loc(key, method=method, tolerance=tolerance)

KeyError: 0

Can someone help me out in answering this problem?



Solution 1:[1]

OK, it took me a moment to find this. Consider these statements:

Before_handwashing=copy_monthly_df.iloc[:76] 
After_handwashig=copy_monthly_df.iloc[76:]

When these are done, Before contains lines with indexes 0 to 75. After contains lines starting with index 76. There is no line with index 0, so your attempt to access it causes a key error.

There are two solutions. One is to use iloc to reference lines by ordinal instead of by index:

        insert_query+=str(After_handwashig[After_handwashig.columns.values[y]]./iloc(x))+', '

The other is to reset the indexes to start from 0:

After_handwashing.reset_index(drop=True, inplace=True)

There's really no point in splitting the dataframe like that. Just have your first loop do range(76): and the second do range(76,copy_monthly_dy.shape[1]):.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Tim Roberts