'Percentage difference every 2 columns of pandas dataframe and generate a new column
I have a large pandas dataframe similar to the one below:
Number A_white A_black B_white B_black ... AUV_white AUV_black
140 0.010 0.567 0.001 0.0075 1.0 1.5
150 100 200 500 1000 ... 500 1000
I am trying to generate a new dataframe with the following data:
Number A B ... AUV
140 193.068 152.941 40
150 66.667 66.667 66.667
In the new dataframe:
new_df['A'] = (old_df.A_white - df.A_black).abs() / ((old_df.A_white + df.A_black)/2) * 100
There is a similar solution: Percentage difference between any two columns of pandas dataframe
But I have a large dataframe and can't really do it with column names. Is there a way I can use, loc or iloc to attain the results?
For col names of the new dataframe I have:
col_names = old_df.columns
col_names = col_names.tolist()
col_names = [x.replace("_white", "") for x in col_names]
del col_names[1::2]
Solution 1:[1]
You can transform your index to MultiIndex and use advanced indexing with xs:
# set non white/black columns aside
df = df.set_index('Number')
# split to MultiIndex
df.columns = pd.MultiIndex.from_tuples(df.columns.str.split('_').to_list())
# select cross-sections
w = df.xs('white', level=1, axis=1)
b = df.xs('black', level=1, axis=1)
# perform the computation
df2 = (abs(w-b)/(w+b)*200).reset_index()
Output:
Number A B AUV
0 140 193.067591 152.941176 40.000000
1 150 66.666667 66.666667 66.666667
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
