How can I calculate pct changes between groups of columns efficiently?
I have a set of columns like so:
q1_cash_total, q2_cash_total, q3_cash_total,
q1_shop_us, q2_shop_us, q3_shop_us,
etc. I have about 40 similarly named columns like this. I wish to calculate the pct changes within each of these groups of 3. E.g., I know that individually I can do:
df[['q1_cash_total', 'q2_cash_total', 'q3_cash_total']].pct_change().add_suffix('_PCT_CHG')
To do this for every group of 3, I do:
q1 = [col for col in df.columns if 'q1' in col ]
q2 = [col for col in df.columns if 'q2' in col ]
q3 = [col for col in df.columns if 'q3' in col ]
q_cols = q1+q2+q3
dflist = []
for col in df[q_cols].columns:
    #col[3:] to just get col name without the q1_/q2_ etc
    print(col[3:])
    cols = [c for c in df.columns if col[3:] in c]
    pct = df[cols].pct_change().add_suffix('_PCT_CHG')
    dflist.append(pct)
pcts_df = pd.concat(dflist)
I cannot think of a cleaner way to do this. Does anybody have any ideas? How can I also compute the pct change directly between q1 and q3, instead of only between successive quarters?
Solution 1:[1]
You could create a dataframe containing only the desired columns. To do that, filter for column names that start with q immediately followed by one or more digits and an underscore (^q\d+?_). Remove the prefix and keep only the unique column names using pd.unique. For each unique column name, filter the columns containing that name and apply the percentage change along the columns axis (.pct_change(axis='columns')) to obtain the changes between q1, q2 and q3.
To get the percentage change between q1 and q3 directly, select those two columns by name from the previously created dataframe (df_q) and apply the same pct_change call.
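As a small illustration of why axis='columns' matters here, the sketch below compares the default row-wise pct_change with the column-wise one. The frame toy and its q1_x/q2_x/q3_x columns are made up for this example and are not from the question.

```python
import pandas as pd

# Hypothetical toy data: one metric "x" observed in three quarters.
toy = pd.DataFrame({"q1_x": [10, 20], "q2_x": [15, 10], "q3_x": [30, 5]})

# Default (axis=0): each row is compared with the row above it.
by_rows = toy.pct_change()

# axis='columns': each column is compared with the column to its left,
# which is the quarter-over-quarter change wanted here.
by_cols = toy.pct_change(axis="columns")

print(by_cols)
```

With the default axis, the first row is all NaN and the quarters are never compared with each other, which is likely not what the question intends.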
df used as input
   q1_cash_total  q1_shop_us  q2_cash_total  q2_shop_us  q3_cash_total  q3_shop_us  another_col  numCols  dataCols
0             52          93             15          72             61          21           83       87        75
1             75          88             24           3             22          53            2       88        30
2             38           2             64          60             21          33           76       58        22
3             89          49             91          59             42          92           60       80        15
4             62          62             47          62             51          55           64        3        51
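To reproduce the example, the input frame can be rebuilt roughly as follows. The values are copied from the table above; building it from a dict is just one possible way.

```python
import pandas as pd

# Sample input, column by column, as shown in the table above.
df = pd.DataFrame({
    "q1_cash_total": [52, 75, 38, 89, 62],
    "q1_shop_us":    [93, 88, 2, 49, 62],
    "q2_cash_total": [15, 24, 64, 91, 47],
    "q2_shop_us":    [72, 3, 60, 59, 62],
    "q3_cash_total": [61, 22, 21, 42, 51],
    "q3_shop_us":    [21, 53, 33, 92, 55],
    "another_col":   [83, 2, 76, 60, 64],
    "numCols":       [87, 88, 58, 80, 3],
    "dataCols":      [75, 30, 22, 15, 51],
})
```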
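import pandas as pd

# Keep only the quarter columns (q1_..., q2_..., q3_...).
df_q = df.filter(regex=r'^q\d+?_')
# Strip the 3-character 'qN_' prefix and keep each base name once.
unique_cols = pd.unique(pd.Series([c[3:] for c in df_q.columns]))
dflist = []
for col in unique_cols:
    # Successive changes: q1 -> q2 and q2 -> q3.
    q_name = df_q.filter(like=col)
    df_s = q_name.pct_change(axis='columns').add_suffix('_PCT_CHG')
    dflist.append(df_s)
    # Direct change: q1 -> q3.
    df_s = df_q[[f'q1_{col}', f'q3_{col}']].pct_change(axis='columns').add_suffix('_Q1-Q3')
    dflist.append(df_s)
pcts_df = pd.concat(dflist, axis=1)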
Output from pcts_df
   q1_cash_total_PCT_CHG  q2_cash_total_PCT_CHG  q3_cash_total_PCT_CHG  ...  q3_shop_us_PCT_CHG  q1_shop_us_Q1-Q3  q3_shop_us_Q1-Q3
0                    NaN              -0.711538               3.066667  ...           -0.708333               NaN         -0.774194
1                    NaN              -0.680000              -0.083333  ...           16.666667               NaN         -0.397727
2                    NaN               0.684211              -0.671875  ...           -0.450000               NaN         15.500000
3                    NaN               0.022472              -0.538462  ...            0.559322               NaN          0.877551
4                    NaN              -0.241935               0.085106  ...           -0.112903               NaN         -0.112903
[5 rows x 10 columns]
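A possible variant without the explicit loop is sketched below. It is not part of the answer above. It assumes every quarter has the same set of base names, and the names wide, step_q1_q2, step_q2_q3, direct_q1_q3 and the output labels are hypothetical. The idea is to split each column name into a (quarter, base name) pair and then do the arithmetic on whole blocks of columns.

```python
import pandas as pd

# Quarter columns from the sample input, plus one unrelated column.
df = pd.DataFrame({
    "q1_cash_total": [52, 75, 38, 89, 62],
    "q1_shop_us":    [93, 88, 2, 49, 62],
    "q2_cash_total": [15, 24, 64, 91, 47],
    "q2_shop_us":    [72, 3, 60, 59, 62],
    "q3_cash_total": [61, 22, 21, 42, 51],
    "q3_shop_us":    [21, 53, 33, 92, 55],
    "another_col":   [83, 2, 76, 60, 64],
})

wide = df.filter(regex=r"^q\d+_").copy()
# 'q1_cash_total' -> ('q1', 'cash_total'); split only on the first underscore.
wide.columns = wide.columns.str.split("_", n=1, expand=True)

# Each selection is a frame whose columns are the base names.
q1, q2, q3 = wide["q1"], wide["q2"], wide["q3"]

step_q1_q2 = q2 / q1 - 1    # successive change q1 -> q2
step_q2_q3 = q3 / q2 - 1    # successive change q2 -> q3
direct_q1_q3 = q3 / q1 - 1  # direct change q1 -> q3

out = pd.concat(
    {"q1_to_q2": step_q1_q2, "q2_to_q3": step_q2_q3, "q1_to_q3": direct_q1_q3},
    axis=1,
)
# Flatten the (label, base name) column pairs into single strings.
out.columns = [f"{name}_{label}" for label, name in out.columns]
```

This avoids the all-NaN q1 columns that pct_change produces, at the cost of hard-coding the three quarters.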
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | n1colas.m |
