Code that works on individual files breaks in a loop on the 3rd or 4th iteration, no matter what the input is

I wrote a script (I can't publish all of it here because it is big) that downloads a CSV file, checks the ranges, and creates a new CSV file containing all the "out of range" info.

The script was checked on all existing CSV files and works without errors.

Now I am trying to loop through all of them to generate the "out of range" data but it errors after the 3rd or 4th iteration no matter what the input file is.

I tried changing the order of the files. The ones that errored before are then processed just fine, but the error still appears on the 3rd or 4th iteration.

What may be the issue with this?

The error I get is

ValueError: cannot reindex on an axis with duplicate labels

and it is raised by the line that assigns the out-of-range values to the column:

dataframe.loc[dataframe['Flagged_measure'] == flags[i][0], ['Flagged_measure']] = dataframe[dataframe['Flagged_measure'] == flags[i][0]]['Flagged_measure'].astype(str)  + ' , ' + csv_report_df.loc[flags[i][1], flags[i][0]].astype(str) 


Solution 1:[1]

The ValueError you mentioned is raised when pandas has to align (reindex) an object whose index contains duplicate labels, for example when you join on it or assign a Series to a column. I can only infer so much from the single line of code you posted, so I'll break it down to see whether your assignment makes sense:

dataframe.loc[dataframe['Flagged_measure'] == flags[i][0], ['Flagged_measure']]

The left-hand side selects the rows of the column Flagged_measure in dataframe that match flags[i][0]. Those rows should be set to some right-hand-side value, preferably a single value per iteration.

dataframe[dataframe['Flagged_measure'] == flags[i][0]]['Flagged_measure'].astype(str)  + ' , ' + csv_report_df.loc[flags[i][1], flags[i][0]].astype(str) 

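This way of assigning doesn't really make sense. The right-hand side is a vectorised operation that builds a whole Series, while the left-hand side is written as if it received a single value. pandas then has to align that Series with the selected rows by index label, and that alignment is where duplicate labels can raise the error.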
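To make that concrete, here is a minimal reproduction of how a Series on the right-hand side can trigger the error. The data and the name rhs are invented for illustration; whether your real right-hand side carries duplicate labels in the same way is an assumption, since the rest of the script isn't shown.

```python
import pandas as pd

# Toy stand-in for `dataframe`; the values are invented for illustration.
dataframe = pd.DataFrame({"Flagged_measure": ["temp", "ph", "temp"]})

# Hypothetical right-hand side whose index repeats the label 0, as could
# happen if a lookup such as csv_report_df.loc[...] matched several rows.
rhs = pd.Series(["temp , 1.5", "temp , 2.5"], index=[0, 0])

mask = dataframe["Flagged_measure"] == "temp"

try:
    # pandas tries to align `rhs` to the selected row labels (0 and 2)
    # and cannot, because the label 0 is ambiguous in `rhs`.
    dataframe.loc[mask, "Flagged_measure"] = rhs
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The exact wording of the message differs between pandas versions, but it is the same reindexing failure.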

Might I suggest you try this?

dataframe['Flagged_measure'] = dataframe['Flagged_measure'].apply(lambda value: " , ".join([str(value), str(csv_report_df[flags[i][0]].iloc[flags[i][1]])]) if value == flags[i][0] else value)
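A self-contained sketch of the same idea, written with a boolean mask and a scalar right-hand side instead of apply. The toy frames, the column names, and the assumed layout of flags as (column name, integer row position) pairs are all invented here, not taken from your script.

```python
import pandas as pd

# Invented stand-ins for the question's objects.
dataframe = pd.DataFrame({"Flagged_measure": ["temp", "ph", "temp"]})
csv_report_df = pd.DataFrame({"temp": [1.5, 99.0], "ph": [7.0, 14.2]})

# Assumed shape of `flags`: (column name, integer row position) pairs.
flags = [("temp", 1), ("ph", 1)]

for column, position in flags:
    # Look up exactly one scalar: column by label, row by position.
    out_of_range = csv_report_df[column].iloc[position]
    # Boolean mask on the left, plain string on the right: no index
    # alignment takes place, so duplicate labels cannot get in the way.
    mask = dataframe["Flagged_measure"] == column
    dataframe.loc[mask, "Flagged_measure"] = f"{column} , {out_of_range}"

print(dataframe["Flagged_measure"].tolist())
# ['temp , 99.0', 'ph , 14.2', 'temp , 99.0']
```

Because the right-hand side is a single string, pandas broadcasts it to every matching row and never has to reindex anything.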

If it still doesn't work, maybe you need to look into csv_report_df as well. As far as I know, loc selects by label while iloc selects by integer position, and position-based selection is what I think you're looking to achieve here.
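One way to look into csv_report_df is to check whether its index is unique and to compare what loc and iloc return. The sketch below uses invented data and assumes the duplicates come from frames being concatenated across loop iterations; the question doesn't show the loop, so treat that as a guess to verify rather than a diagnosis.

```python
import pandas as pd

# Invented example: two per-file frames concatenated without resetting the
# index, which is one common way duplicate labels appear inside a loop.
first = pd.DataFrame({"temp": [1.5, 99.0]})
second = pd.DataFrame({"temp": [2.5, 120.0]})
csv_report_df = pd.concat([first, second])

# Both frames were labelled 0..1, so the combined index is 0, 1, 0, 1.
has_unique_index = csv_report_df.index.is_unique
repeated_labels = csv_report_df.index[csv_report_df.index.duplicated()].tolist()
print(has_unique_index, repeated_labels)

# .loc selects by label, so a repeated label returns several rows...
by_label = csv_report_df.loc[1, "temp"]
# ...while .iloc selects by integer position and returns a single value.
by_position = csv_report_df["temp"].iloc[1]

# Rebuilding a fresh 0..n-1 index makes the labels unique again.
clean = csv_report_df.reset_index(drop=True)
print(clean.index.is_unique)
```

If the check reports a non-unique index only from the 3rd or 4th iteration on, something is probably carrying state over between files.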

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 pp352