'Pandas: The most efficient way to subtract some columns from multiple other columns in a dataframe

I have a data frame:

id      P1T1    P1T2    P1T3    P2T1    P2T2    P2T3
1       10      20      20      16      50      10
2       20      10      25      10      52      20
3       20      5       50      2       40      20
4       23      5       6       78      5       65
5       4       8       9       9       7       5

For each person(P1, P2), I want to subtract their T1 value from all other values, so the output is like this:

   id  P1T1  P1T2  P1T3  P2T1  P2T2  P2T3
0   1     0    10    10     0    34    -6
1   2     0   -10     5     0    42    10
2   3     0   -15    30     0    38    18
3   4     0   -18   -17     0   -73   -13
4   5     0     4     5     0    -2    -4

I wrote this code:

import sys
import pandas as pd

df = pd.read_csv('test.csv', sep='\t',header=0)
df1 = df.filter(regex='^P1')
df2 = df.filter(regex='^P2')

df1 = df1.sub(df1['P1T1'], axis=0)
df2 = df2.sub(df2['P2T1'], axis=0)

df1.insert(loc=0,column='id',value=df['id'])
df2.insert(loc=0,column='id',value=df['id'])

merge_df = pd.merge(df1,df2,on='id')
print(merge_df)

It runs (producing the output above), but it's not efficient because it real life, I can have 50 people, so I want to subtract 50 people's T1 values from the rest of their T values (which may be uneven, e.g. person one might have T1 -> T10, person2 might have T1 -> T8).

Could someone recommend how to make this more generalisable so I don't have to make 50 sub-data frames, one for each person? I was wondering if the best way to loop through like:

total_df = pd.DataFrame()
for i in range(1,50):
   reg_pattern = '^P' + str(i)
   df1 = df.filter(regex=reg_pattern)
   col_to_subtract = reg_pattern + 'T1'
   df1 = df1.sub(df1[col_to_subtract], axis=0)
   df1.insert(loc=0,column='id',value=df1['id'])
   total_df.append(df1)
print(total_df)

But I got the error:

Traceback (most recent call last):
  File "prep_samples.py", line 9, in <module>
    df1 = df1.sub(df1[col_to_subtract], axis=0)
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/opt/conda/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '^P1T1'


Solution 1:[1]

Try with apply:

>>> df.apply(lambda x: x.sub(df[x.name[:2]+"T1"]))

   P1T1  P1T2  P1T3  P2T1  P2T2  P2T3
0     0    10    10     0    34    -6
1     0   -10     5     0    42    10
2     0   -15    30     0    38    18
3     0   -18   -17     0   -73   -13
4     0     4     5     0    -2    -4

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 not_speshal