'Percentage difference every 2 columns of pandas dataframe and generate a new column

I have a large pandas dataframe similar to the one below:

Number    A_white   A_black   B_white   B_black    ...   AUV_white   AUV_black
140       0.010    0.567     0.001     0.0075            1.0         1.5
150       100      200       500       1000        ...   500         1000

I am trying to generate a new dataframe with the following data:

Number    A          B         ...       AUV
140       193.068    152.941             40     
150       66.667     66.667              66.667

In the new dataframe:

new_df['A'] = (old_df.A_white - df.A_black).abs() / ((old_df.A_white + df.A_black)/2) * 100

There is a similar solution: Percentage difference between any two columns of pandas dataframe

But I have a large dataframe and can't really do it with column names. Is there a way I can use, loc or iloc to attain the results?

For col names of the new dataframe I have:

col_names = old_df.columns
col_names = col_names.tolist()
col_names = [x.replace("_white", "") for x in col_names]
del col_names[1::2]

   


Solution 1:[1]

You can transform your index to MultiIndex and use advanced indexing with xs:

# set non white/black columns aside
df = df.set_index('Number')
# split to MultiIndex
df.columns = pd.MultiIndex.from_tuples(df.columns.str.split('_').to_list())

# select cross-sections
w = df.xs('white', level=1, axis=1)
b = df.xs('black', level=1, axis=1)

# perform the computation
df2 = (abs(w-b)/(w+b)*200).reset_index()

Output:

   Number           A           B        AUV
0     140  193.067591  152.941176  40.000000
1     150   66.666667   66.666667  66.666667

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1