The code that works on its own breaks in a loop on the 3rd or 4th iteration, no matter what the input is
I wrote a script (I can't publish all of it here, it is big) that downloads a CSV file, checks the ranges, and creates a new CSV file with all the "out of range" info.
I checked the script on every existing CSV file individually, and it works without errors.
Now I am trying to loop through all of them to generate the "out of range" data, but it fails on the 3rd or 4th iteration no matter what the input file is.
I tried reordering the files: the ones that failed before are now processed just fine, but the error still appears on the 3rd or 4th iteration.
What could be causing this?
The error I get is ValueError: cannot reindex on an axis with duplicate labels, raised when I run the line that assigns the out-of-range values to the column:
dataframe.loc[dataframe['Flagged_measure'] == flags[i][0], ['Flagged_measure']] = dataframe[dataframe['Flagged_measure'] == flags[i][0]]['Flagged_measure'].astype(str) + ' , ' + csv_report_df.loc[flags[i][1], flags[i][0]].astype(str)
Solution 1:[1]
The ValueError you mentioned occurs when pandas has to align (reindex) a Series or DataFrame whose index contains duplicate labels, for example when you join or assign to a column using an object with repeated index values. Based on the single line of code you posted, I'll break it down so it becomes clearer whether your assignment makes sense:
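A minimal, hypothetical reproduction of this error (the frame and Series are made up, not taken from the question): assigning a Series whose index repeats a label forces pandas to reindex it onto the frame's index, and that reindex fails. The exact message text differs between pandas versions, so it is not shown as expected output.

```python
import pandas as pd

# Hypothetical frame standing in for `dataframe` from the question.
df = pd.DataFrame({"Flagged_measure": ["temp", "ph", "temp"]})

# A Series whose index repeats the label 0, as can happen after
# pd.concat() on pieces that each kept their own index.
rhs = pd.Series(["a", "b", "c"], index=[0, 0, 1])
print(rhs.index.is_unique)  # False

message = ""
try:
    # pandas must reindex rhs onto df.index (0, 1, 2) before assigning.
    df["Flagged_measure"] = rhs
except ValueError as err:
    message = str(err)
print(message)

# With unique labels that match df.index, the same assignment succeeds.
df["Flagged_measure"] = rhs.reset_index(drop=True)
print(df["Flagged_measure"].tolist())  # ['a', 'b', 'c']
```

Printing dataframe.index.is_unique (and the index of whatever is being assigned) inside the loop is one quick way to check whether duplicate labels show up on the failing iteration.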
dataframe.loc[dataframe['Flagged_measure'] == flags[i][0], ['Flagged_measure']]
This selects the rows of the Flagged_measure column in dataframe whose value matches flags[i][0] and sets them to the right-hand side, which should ideally be a single value per iteration.
dataframe[dataframe['Flagged_measure'] == flags[i][0]]['Flagged_measure'].astype(str) + ' , ' + csv_report_df.loc[flags[i][1], flags[i][0]].astype(str)
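This assignment is inconsistent. The left operand is a Series with one entry per matching row (a column-wide operation), while csv_report_df.loc[flags[i][1], flags[i][0]] is treated as a single value. If that lookup returns a Series instead of a scalar (which happens when the row label is repeated in csv_report_df), pandas has to align indexes to combine and assign the result, which is one way duplicate labels can trigger this error.
I suggest trying this instead: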
dataframe['Flagged_measure'] = dataframe['Flagged_measure'].apply(lambda value: " , ".join([str(value), str(csv_report_df.iloc[flags[i][1], flags[i][0]])]) if value == flags[i][0] else value)
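Below is a runnable sketch of this element-wise approach on made-up data. The layout of flags is an assumption, since the question doesn't show it: each entry is taken to be (measure name, integer row position in csv_report_df), so the measure name is turned into a column position with columns.get_loc before calling iloc. The report value is looked up once per flag rather than once per row.

```python
import pandas as pd

# Hypothetical data; the real frames are not shown in the question.
# `dataframe` deliberately repeats the index label 0.
dataframe = pd.DataFrame(
    {"Flagged_measure": ["temp", "ph", "temp"]}, index=[0, 0, 1]
)
csv_report_df = pd.DataFrame({"temp": [31.5, 29.0], "ph": [6.1, 9.2]})

# Assumed layout: (measure name, integer row position in csv_report_df).
flags = [("temp", 0), ("ph", 1)]

for measure, row_pos in flags:
    # One scalar per flag: get_loc converts the column name to a position for iloc.
    report_value = csv_report_df.iloc[row_pos, csv_report_df.columns.get_loc(measure)]

    # Element-wise update: only cells equal to `measure` get the value appended.
    dataframe["Flagged_measure"] = dataframe["Flagged_measure"].apply(
        lambda value: " , ".join([str(value), str(report_value)])
        if value == measure
        else value
    )

print(dataframe["Flagged_measure"].tolist())
# ['temp , 31.5', 'ph , 9.2', 'temp , 31.5']
```

Series.apply returns a Series with the same index as its input, so assigning it back to the whole column needs no reindexing, even though dataframe here repeats the label 0.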
If it still doesn't work, you may need to look into csv_report_df as well. loc is meant for label-based indexing, not for integer positions; if you are indexing by position, as I think you're trying to do here, use iloc.
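A small sketch of the label-versus-position difference, using a hypothetical report frame: .loc takes labels, .iloc takes integer positions, and .loc with a repeated row label returns a Series rather than a single value, which is why it is worth checking csv_report_df for duplicate labels.

```python
import pandas as pd

# Hypothetical report frame with unique row labels.
report = pd.DataFrame(
    {"temp": [31.5, 29.0], "ph": [6.1, 9.2]}, index=["r1", "r2"]
)

# .loc takes labels, .iloc takes integer positions; both reach the same cell.
by_label = report.loc["r2", "temp"]
by_position = report.iloc[1, 0]
print(by_label, by_position)  # 29.0 29.0

# Appending a copy of row "r1" with pd.concat keeps its label, so "r1" repeats.
report_dup = pd.concat([report, report.loc[["r1"]]])
print(report_dup.index.is_unique)  # False

# .loc with the repeated label returns a Series, not a single value.
repeated = report_dup.loc["r1", "temp"]
print(type(repeated).__name__, repeated.tolist())  # Series [31.5, 31.5]
```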
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | pp352 |
