How can I calculate pct changes between groups of columns efficiently?

I have a set of columns like so:

q1_cash_total, q2_cash_total, q3_cash_total, 
q1_shop_us, q2_shop_us, q3_shop_us,

etc. I have about 40 similarly named columns like this. I wish to calculate the pct changes within each of these groups of 3. E.g. I know that individually I can do:

df[['q1_cash_total', 'q2_cash_total', 'q3_cash_total']].pct_change().add_suffix('_PCT_CHG')

To do this for every group of 3, I do:

q1 =  [col for col in df.columns if 'q1' in col ]
q2 =  [col for col in df.columns if 'q2' in col ]
q3 =  [col for col in df.columns if 'q3' in col ]
q_cols = q1+q2+q3
dflist = []
for col in df[q_cols].columns:
    #col[3:] to just get col name without the q1_/q2_ etc 
    print(col[3:])
    cols = [c for c in df.columns if col[3:] in c]
    pct = df[cols].pct_change().add_suffix('_PCT_CHG')
    dflist.append(pct) 

pcts_df = pd.concat(dflist)

I cannot think of a cleaner way to do this. Does anybody have any ideas? How can I also get the pct change between q1 and q3 directly, instead of only successively?

Solution 1:^[1]

You could create a dataframe containing only the desired columns. To do that, filter for column names starting with q immediately followed by one or more digits and an underscore (^q\d+?_). Remove the prefix and keep only the unique column names using pd.unique. For each unique column name, filter the columns with that specific name and apply the percentage change along the columns axis (.pct_change(axis='columns')) to obtain the changes between q1, q2 and q3.

To get the percentage change between q1 and q3, select those two columns by name from the previously created dataframe (df_q) and apply the same pct_change call used earlier.
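The column-selection step can be tried on the names alone, without any data. The sketch below is only an illustration: it uses the standard `re` module rather than pandas, the column list is copied from the sample input, and stripping the prefix with the regex (instead of a fixed-width slice) is an assumption made here so that a hypothetical two-digit prefix such as `q10_` would also be handled.

```python
import re

# Column names taken from the sample input frame.
columns = [
    "q1_cash_total", "q1_shop_us", "q2_cash_total", "q2_shop_us",
    "q3_cash_total", "q3_shop_us", "another_col", "numCols", "dataCols",
]

# Same pattern as in the answer: "q", one or more digits, an underscore.
prefix = re.compile(r"^q\d+?_")

# Columns whose names start with "q<digits>_".
quarterly = [c for c in columns if prefix.search(c)]

# Remove the matched prefix and keep each base name once, in first-seen order.
base_names = list(dict.fromkeys(prefix.sub("", c) for c in quarterly))

print(quarterly)
print(base_names)
```

Columns such as another_col are left out because the pattern is anchored to the start of the name.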

df used as input

   q1_cash_total  q1_shop_us  q2_cash_total  q2_shop_us  q3_cash_total  q3_shop_us  another_col   numCols  dataCols
0             52          93             15          72             61          21           83        87        75
1             75          88             24           3             22          53            2        88        30
2             38           2             64          60             21          33           76        58        22
3             89          49             91          59             42          92           60        80        15
4             62          62             47          62             51          55           64         3        51

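import pandas as pd

# Keep only the quarterly columns (q1_..., q2_..., q3_...).
df_q = df.filter(regex=r'^q\d+?_')
# Drop the 3-character "qN_" prefix and keep each base name once.
unique_cols = pd.unique(pd.Series([c[3:] for c in df_q.columns]))

dflist = []
for col in unique_cols:
    # Successive changes across the group's columns: q1 -> q2 -> q3.
    q_name = df_q.filter(like=col)
    df_s = q_name.pct_change(axis='columns').add_suffix('_PCT_CHG')
    dflist.append(df_s)
    # Direct change from q1 to q3, skipping q2.
    df_s = df_q[[f'q1_{col}', f'q3_{col}']].pct_change(axis='columns').add_suffix('_Q1-Q3')
    dflist.append(df_s)

# Place all result columns side by side.
pcts_df = pd.concat(dflist, axis=1)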

Output from pcts_df

   q1_cash_total_PCT_CHG  q2_cash_total_PCT_CHG  q3_cash_total_PCT_CHG  ...  q3_shop_us_PCT_CHG  q1_shop_us_Q1-Q3  q3_shop_us_Q1-Q3
0                    NaN              -0.711538               3.066667  ...           -0.708333               NaN         -0.774194
1                    NaN              -0.680000              -0.083333  ...           16.666667               NaN         -0.397727
2                    NaN               0.684211              -0.671875  ...           -0.450000               NaN         15.500000
3                    NaN               0.022472              -0.538462  ...            0.559322               NaN          0.877551
4                    NaN              -0.241935               0.085106  ...           -0.112903               NaN         -0.112903

[5 rows x 10 columns]
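With around 40 groups, the per-group loop could also be replaced by whole-frame arithmetic. The following is a minimal sketch of that alternative, not part of the original answer: it assumes every quarterly column is named `q<N>_<name>`, splits each name once at the first underscore into a two-level column index, and divides the quarter blocks directly. The small frame reuses the first two rows of the sample input, and the result suffixes (`_Q1-Q2`, `_Q2-Q3`, `_Q1-Q3`) are naming choices made here for illustration.

```python
import pandas as pd

# First two rows of the sample input, plus one non-quarterly column.
df = pd.DataFrame({
    "q1_cash_total": [52, 75],
    "q1_shop_us": [93, 88],
    "q2_cash_total": [15, 24],
    "q2_shop_us": [72, 3],
    "q3_cash_total": [61, 22],
    "q3_shop_us": [21, 53],
    "another_col": [83, 2],
})

# Keep the quarterly columns only.
df_q = df.filter(regex=r"^q\d+_").copy()

# Turn "q1_cash_total" into the pair ("q1", "cash_total"), splitting once
# at the first underscore; the columns become a two-level MultiIndex.
df_q.columns = df_q.columns.str.split("_", n=1, expand=True)

# Selecting a top-level key returns that quarter's block, with columns
# labelled by base name, so the divisions below align column by column.
q1, q2, q3 = df_q["q1"], df_q["q2"], df_q["q3"]

q1_to_q2 = (q2 / q1 - 1).add_suffix("_Q1-Q2")
q2_to_q3 = (q3 / q2 - 1).add_suffix("_Q2-Q3")
q1_to_q3 = (q3 / q1 - 1).add_suffix("_Q1-Q3")

pcts_df = pd.concat([q1_to_q2, q2_to_q3, q1_to_q3], axis=1)
print(pcts_df.round(6))
```

Because each block is computed for all groups at once, no all-NaN q1 columns are produced, and the q1 to q3 change needs no separate pass.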

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	n1colas.m
