How to apply a function to selected rows of a dataframe
I want to apply a regex function to selected rows in a dataframe. My solution works, but the code is terribly long, and I wonder whether there is a better, faster and more elegant way to solve this problem.
In words: I want my regex function to be applied to the elements of the source_value column, but only in rows where source_type == 'rhombus' AND rhombus_refer_to_odk_type is either 'integer' OR 'decimal'.
The code:
df_arrows.loc[(df_arrows['source_type']=='rhombus') & ((df_arrows['rhombus_refer_to_odk_type']=='integer') | (df_arrows['rhombus_refer_to_odk_type']=='decimal')),'source_value'] = df_arrows.loc[(df_arrows['source_type']=='rhombus') & ((df_arrows['rhombus_refer_to_odk_type']=='integer') | (df_arrows['rhombus_refer_to_odk_type']=='decimal')),'source_value'].apply(lambda x: re.sub(r'^[^<=>]+','', str(x)))
Solution 1:[1]
Store the condition in a variable m, using Series.isin for the two allowed types, and do the replacement with Series.str.replace:
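# Build the boolean mask once; the outer parentheses allow the line break after &
m = ((df_arrows['source_type']=='rhombus') &
     df_arrows['rhombus_refer_to_odk_type'].isin(['integer','decimal']))
# regex=True is needed in pandas 2.0+, where str.replace treats the pattern as a literal by default
df_arrows.loc[m,'source_value'] = df_arrows.loc[m,'source_value'].astype(str).str.replace(r'^[^<=>]+','', regex=True)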
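For reference, here is a minimal self-contained sketch of the masked update. The sample rows are made up for illustration; only the column names and the regex come from the question.

```python
import pandas as pd

# Hypothetical sample data; only the column names are taken from the question.
df_arrows = pd.DataFrame({
    'source_type': ['rhombus', 'rhombus', 'rectangle', 'rhombus'],
    'rhombus_refer_to_odk_type': ['integer', 'select_one', 'integer', 'decimal'],
    'source_value': ['age>=5', 'age>=5', 'age>=5', 'weight<2.5'],
})

# Boolean mask: rhombus rows whose referenced type is integer or decimal.
m = (
    (df_arrows['source_type'] == 'rhombus')
    & df_arrows['rhombus_refer_to_odk_type'].isin(['integer', 'decimal'])
)

# Strip everything before the first <, = or >, but only in the masked rows.
df_arrows.loc[m, 'source_value'] = (
    df_arrows.loc[m, 'source_value']
    .astype(str)
    .str.replace(r'^[^<=>]+', '', regex=True)
)

print(df_arrows['source_value'].tolist())
# ['>=5', 'age>=5', 'age>=5', '<2.5']
```

Only the first and last rows satisfy both conditions, so only those two values lose their leading text.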
EDIT: If the mask turns out to be 2-dimensional, the likely cause is duplicated column names. You can check by printing each condition separately:
print(df_arrows['source_type']=='rhombus')
print(df_arrows['rhombus_refer_to_odk_type'].isin(['integer','decimal']))
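To see why duplicated column names give a 2-dimensional mask, the sketch below uses a hypothetical frame in which source_type appears twice. Selecting that name returns a DataFrame rather than a Series, so the comparison is 2-dimensional as well. Keeping only the first occurrence of each name is one possible fix, assuming the duplicate is not needed.

```python
import pandas as pd

# Hypothetical frame with a duplicated column name.
df = pd.DataFrame(
    [['rhombus', 'x', 'a'], ['circle', 'y', 'b']],
    columns=['source_type', 'source_type', 'source_value'],
)

# Selecting a duplicated name returns a DataFrame, so the comparison is 2-D.
cond = df['source_type'] == 'rhombus'
print(cond.ndim)

# List the column names that occur more than once.
dup_names = df.columns[df.columns.duplicated()].tolist()
print(dup_names)

# Keep only the first occurrence of each column name, then rebuild the condition.
deduped = df.loc[:, ~df.columns.duplicated()]
cond_fixed = deduped['source_type'] == 'rhombus'
print(cond_fixed.ndim)
```

After dropping the duplicate, the condition is a 1-dimensional Series again and can be used with .loc as in the solution.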
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
