Custom method for voting between multiple CSV files
I have 3 (or more) DataFrames with this structure:
| ID | Percentage | 
|---|---|
| 00001 | 3 | 
| 00002 | 15 | 
| 00003 | 73 | 
| 00004 | 90 | 
| ... | ... | 
Each CSV file contains its own predicted percentage values.
Among these CSV files, one has a very good MAE, so I want to give it a bigger weight. Also, if 2 or more files predict the same value, I want that agreement to be taken into account (and if the values are only close to each other, I want to take their average).
Here is my code:
import pandas as pd
# index_col=0 turns the first column (assumed to be ID) into the index
df1 = pd.read_csv("BlahBlahBlah01.csv", index_col=0)
df2 = pd.read_csv("BlahBlahBlah02.csv", index_col=0)
df3 = pd.read_csv("BlahBlahBlah03.csv", index_col=0)
dfGold = pd.read_csv("BlahBlahBlahGold.csv", index_col=0)
# all dataframes have the same shape
lenOfDF = len(df1)
newCSV = pd.DataFrame(index=df1.index, columns=['Percentage'])
for i in range(lenOfDF):
    pred01 = df1['Percentage'].iloc[i]
    pred02 = df2['Percentage'].iloc[i]
    pred03 = df3['Percentage'].iloc[i]
    predGold = dfGold['Percentage'].iloc[i]
    # use the first prediction that exactly matches any other file,
    # otherwise fall back to the gold file
    if pred01 in (pred02, pred03, predGold):
        newCSV.iloc[i, 0] = pred01
    elif pred02 in (pred01, pred03, predGold):
        newCSV.iloc[i, 0] = pred02
    elif pred03 in (pred01, pred02, predGold):
        newCSV.iloc[i, 0] = pred03
    else:
        newCSV.iloc[i, 0] = predGold
I know it's very basic and doesn't give a good prediction, so I need help fixing it.
As I said above, I want to weight the files, and I also want to treat values within ±5 of each other as agreeing.
I know there are ensembling techniques for this, but I only have the CSV files, not the models.
Thank you.
Solution 1:[1]
    import pandas as pd

    csv_list = ['BlahBlahBlah01','BlahBlahBlah02','BlahBlahBlah03','BlahBlahBlahGold']
    preds = []
    for i, name in enumerate(csv_list):
        # read each file with ID as the index and give its prediction column a unique name
        pred = pd.read_csv(f"./{name}.csv", index_col=0)
        pred = pred.rename(columns={"Percentage": i})
        preds.append(pred)
    # place all predictions side by side, aligned on ID
    preds = pd.concat(preds, axis=1)
    # row-wise mode: the most frequent value per ID (the smallest value on ties)
    preds["Percentage"] = preds.mode(axis=1)[0]
    preds["Id"] = preds.index
    preds.to_csv("output.csv", columns=['Id', 'Percentage'], index=False)
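
The row-wise mode above only counts exact matches and treats every file equally, so it does not cover the weighting or the ±5 tolerance asked for in the question. Here is a minimal sketch of one way to add both, not a definitive implementation. For each ID it groups the predictions lying within ±`tol` of each candidate prediction, picks the group with the largest total weight, and returns that group's weighted mean. The function name `weighted_vote`, the column names `m1`, `m2`, `m3` and `gold`, the weights (gold counted double) and the sample values are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def weighted_vote(values, weights, tol=5.0):
    """Weighted mean of the heaviest group of predictions within +-tol of a candidate."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    best_weight, best_value = -1.0, np.nan
    for center in values:
        # group = every prediction within +-tol of this candidate prediction
        mask = np.abs(values - center) <= tol
        group_weight = weights[mask].sum()
        # strict ">" keeps the earliest group when total weights are tied
        if group_weight > best_weight:
            best_weight = group_weight
            best_value = np.average(values[mask], weights=weights[mask])
    return best_value


# Illustrative data (assumed); in practice build this frame with pd.concat(..., axis=1)
preds = pd.DataFrame(
    {
        "m1": [3, 15, 73, 90],
        "m2": [4, 40, 70, 10],
        "m3": [30, 42, 20, 50],
        "gold": [5, 16, 75, 88],
    },
    index=pd.Index([1, 2, 3, 4], name="ID"),
)
weights = [1.0, 1.0, 1.0, 2.0]  # assumed: the file with the best MAE counts double

result = preds.apply(lambda row: weighted_vote(row.to_numpy(), weights), axis=1)
output = result.rename("Percentage").reset_index()
print(output)
```

When no two files agree within the tolerance, the group with the largest weight is the gold prediction on its own, so the result falls back to the gold file, much like the `else` branch in the question. The weights and `tol` can be tuned against a validation set if one is available.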
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source | 
|---|---|
| Solution 1 | Thomas | 
