Automatic mean of multiple columns in Python

I have a dataset with multiple variables. I am trying to group these variables by the ending of each variable name and calculate the mean of each group. Here is an example of my dataset:

import pandas as pd

data = {'AST_0-01': [1, 2, 3],
        'AST_0-02': [4, 5, 6],
        'AST_1-01': [7, 8, 9],
        'AST_1-02': [1, 2, 3],
        'AST_2-01': [4, 5, 6],
        'AST_2-02': [7, 8, 9]}

df = pd.DataFrame(data)

What I am trying to do is group the variables that end with the same number, e.g. [AST_0-01, AST_1-01, AST_2-01] and [AST_0-02, AST_1-02, AST_2-02], and then get the mean of each group.



Solution 1:[1]

First, transpose your dataframe and reset the index, so that the variable names end up in a regular column (called "index") that you can group on:

In [3]: df = df.T.reset_index()

In [4]: df
Out[4]:
      index  0  1  2
0  AST_0-01  1  2  3
1  AST_0-02  4  5  6
2  AST_1-01  7  8  9
3  AST_1-02  1  2  3
4  AST_2-01  4  5  6
5  AST_2-02  7  8  9

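Then group the rows by the last two characters of each name and take the mean of the numeric columns:

In [5]: df.groupby(df["index"].str[-2:]).mean(numeric_only=True)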
Out[5]:
         0    1    2
index
01     4.0  5.0  6.0
02     4.0  5.0  6.0
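
As a variation on the same idea, here is a minimal sketch that leaves the original dataframe untouched and returns the group means as new columns. It assumes the group label is whatever follows the last "-" in each column name, which also covers endings longer than two characters. The names `suffix` and `group_means` are illustrative and not part of the original answer.

```python
import pandas as pd

data = {'AST_0-01': [1, 2, 3],
        'AST_0-02': [4, 5, 6],
        'AST_1-01': [7, 8, 9],
        'AST_1-02': [1, 2, 3],
        'AST_2-01': [4, 5, 6],
        'AST_2-02': [7, 8, 9]}
df = pd.DataFrame(data)

# Group label = text after the last "-" in each column name
# (assumption: "AST_0-01" -> "01").
suffix = df.columns.str.rsplit("-", n=1).str[-1]

# Transpose a copy so the column names become row labels, group those rows
# by suffix, average them, and transpose back. The result has one column of
# means per suffix and one row per original row; df itself is not modified.
group_means = df.T.groupby(suffix.to_numpy()).mean().T

print(group_means)
```

With the sample data, both the "01" and "02" columns hold 4.0, 5.0, 6.0, the same values as in Out[5].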

This gives one mean per row of the original dataframe (the columns 0, 1, 2) for each group. If you want a single overall mean per group, average across those columns as well:

In [6]: df.groupby(df["index"].str[-2:]).mean(numeric_only=True).mean(axis=1)
Out[6]:
index
01    5.0
02    5.0
dtype: float64
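
If a single pooled mean per group is the goal, another option is to reshape the data to long format first, so that every value in a group is averaged directly. This matters when the columns in a group do not all contribute the same number of values (for example, if some contain NaN). The sketch below assumes the same sample data and the same "text after the last dash" rule for the group label; `long_df` and `pooled` are hypothetical names.

```python
import pandas as pd

data = {'AST_0-01': [1, 2, 3],
        'AST_0-02': [4, 5, 6],
        'AST_1-01': [7, 8, 9],
        'AST_1-02': [1, 2, 3],
        'AST_2-01': [4, 5, 6],
        'AST_2-02': [7, 8, 9]}
df = pd.DataFrame(data)

# Long format: one row per (column name, value) pair.
long_df = df.melt(var_name="column", value_name="value")

# Group label = text after the last "-" in the column name (assumption).
long_df["suffix"] = long_df["column"].str.rsplit("-", n=1).str[-1]

# Mean over every value that belongs to each suffix.
pooled = long_df.groupby("suffix")["value"].mean()

print(pooled)
```

For the sample data each group contains the values 1 through 9, so both pooled means are 5.0.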

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ddejohn