'Pandas: The most efficient way to subtract some columns from multiple other columns in a dataframe
I have a data frame:
id P1T1 P1T2 P1T3 P2T1 P2T2 P2T3
1 10 20 20 16 50 10
2 20 10 25 10 52 20
3 20 5 50 2 40 20
4 23 5 6 78 5 65
5 4 8 9 9 7 5
For each person(P1, P2), I want to subtract their T1 value from all other values, so the output is like this:
id P1T1 P1T2 P1T3 P2T1 P2T2 P2T3
0 1 0 10 10 0 34 -6
1 2 0 -10 5 0 42 10
2 3 0 -15 30 0 38 18
3 4 0 -18 -17 0 -73 -13
4 5 0 4 5 0 -2 -4
I wrote this code:
import sys
import pandas as pd
df = pd.read_csv('test.csv', sep='\t',header=0)
df1 = df.filter(regex='^P1')
df2 = df.filter(regex='^P2')
df1 = df1.sub(df1['P1T1'], axis=0)
df2 = df2.sub(df2['P2T1'], axis=0)
df1.insert(loc=0,column='id',value=df['id'])
df2.insert(loc=0,column='id',value=df['id'])
merge_df = pd.merge(df1,df2,on='id')
print(merge_df)
It runs (producing the output above), but it's not efficient because it real life, I can have 50 people, so I want to subtract 50 people's T1 values from the rest of their T values (which may be uneven, e.g. person one might have T1 -> T10, person2 might have T1 -> T8).
Could someone recommend how to make this more generalisable so I don't have to make 50 sub-data frames, one for each person? I was wondering if the best way to loop through like:
total_df = pd.DataFrame()
for i in range(1,50):
reg_pattern = '^P' + str(i)
df1 = df.filter(regex=reg_pattern)
col_to_subtract = reg_pattern + 'T1'
df1 = df1.sub(df1[col_to_subtract], axis=0)
df1.insert(loc=0,column='id',value=df1['id'])
total_df.append(df1)
print(total_df)
But I got the error:
Traceback (most recent call last):
File "prep_samples.py", line 9, in <module>
df1 = df1.sub(df1[col_to_subtract], axis=0)
File "/opt/conda/miniconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2927, in __getitem__
indexer = self.columns.get_loc(key)
File "/opt/conda/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2659, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 132, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1601, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1608, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '^P1T1'
Solution 1:[1]
Try with apply:
>>> df.apply(lambda x: x.sub(df[x.name[:2]+"T1"]))
P1T1 P1T2 P1T3 P2T1 P2T2 P2T3
0 0 10 10 0 34 -6
1 0 -10 5 0 42 10
2 0 -15 30 0 38 18
3 0 -18 -17 0 -73 -13
4 0 4 5 0 -2 -4
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | not_speshal |
