Create a lambda function to apply to selected df columns
I have the following df:
id  header1      header2           diabetes  obesity  hypertension/high blood pressure  ...
1   metabolism   diabetes          no        no       no
2   heart issue  heart disease     None      None     None
3   obesity      diabetes          yes       no       no
4   metabolism   had hypertension  no        no       yes
5   heart issue  heart disease     no        no       yes
6   obesity      diabetes          yes       yes      no
7   obesity      diabetes          no        no       yes
I want to create a lambda function that iterates through header1 and header2 and checks whether either cell is a substring of any of the column names. Depending on whether the matching column contains yes, no, or null, it should return a flag value in a new column.
For each row: if header1 or header2 has a substring match in a column name and that column contains yes, set the new flag column to 2. If any of the category columns contains yes but none of those columns matches header1 or header2, set it to 1. Otherwise, leave it blank.
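Taken literally, "substring of the column name" would not give the desired 2 for row 4, because 'had hypertension' is not a substring of 'hypertension/high blood pressure'. The sketch below therefore assumes a word-level test: a header matches a column if any whitespace-separated word of the header occurs in the column name, ignoring case. `header_matches` and `row_flag` are hypothetical helper names, and a row is represented as a plain dict so the rule can be checked without pandas.

```python
def header_matches(header, column):
    """Hypothetical helper: True if any word of `header` occurs inside `column`.

    Assumption: matching is word by word (case-insensitive), not on the whole
    cell, so 'had hypertension' still matches 'hypertension/high blood pressure'.
    """
    if not header:  # None or empty string never matches
        return False
    column = column.lower()
    return any(word in column for word in header.lower().split())


def row_flag(headers, answers):
    """Hypothetical helper: flag for one row.

    headers: the header1/header2 values of the row.
    answers: mapping of category column name -> 'yes' / 'no' / None.
    Returns 2, 1, or None following the rules in the question.
    """
    yes_cols = [col for col, value in answers.items() if value == 'yes']
    if any(header_matches(h, col) for col in yes_cols for h in headers):
        return 2  # a 'yes' column is named after header1 or header2
    if yes_cols:
        return 1  # some 'yes', but no 'yes' column matches a header
    return None  # no 'yes' at all: leave blank


answers = {'diabetes': 'no', 'obesity': 'no',
           'hypertension/high blood pressure': 'yes'}
print(row_flag(['metabolism', 'had hypertension'], answers))  # 2
print(row_flag(['heart issue', 'heart disease'], answers))    # 1
```

Plain substring matching on short words can produce accidental hits (a header word like "had" could appear inside an unrelated column name), so a real dataset may need a stricter word-boundary check.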
Example:
My attempt (incomplete, and not valid Python as written):
cols = [x for x in df.columns if x not in ['header1', 'header2']]
df['flag'] = df.apply(lambda x: 2 if df['header1'] or df['header2'] in cols and cols == yes, 1 elif df['header1'] not in df['header2'] in cols and cols == yes, None else
Desired result:
id  header1      header2           diabetes  obesity  hypertension/high blood pressure  flag
1   metabolism   diabetes          no        no       no                                None
2   heart issue  heart disease     None      None     None                              None
3   obesity      diabetes          yes       no       no                                2
4   metabolism   had hypertension  no        no       yes                               2
5   heart issue  heart disease     no        no       yes                               1
6   obesity      diabetes          yes       yes      no                                2
7   obesity      diabetes          no        no       yes                               1
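One way to express these rules as a single `df.apply(..., axis=1)` lambda is sketched below. It rebuilds the sample frame so it runs on its own, assumes word-level header matching (any header word found in the column name), and uses a hypothetical helper `matches`. Since a column that mixes `None` and integers would otherwise come back as float or object dtype, the result is cast to pandas' nullable `Int64` dtype so the blanks show as `<NA>` and the flags stay integers.

```python
import pandas as pd

# Sample data from the question ('had hypertension' written without a leading space).
df = pd.DataFrame(
    [('metabolism', 'diabetes', 'no', 'no', 'no'),
     ('heart issue', 'heart disease', None, None, None),
     ('obesity', 'diabetes', 'yes', 'no', 'no'),
     ('metabolism', 'had hypertension', 'no', 'no', 'yes'),
     ('heart issue', 'heart disease', 'no', 'no', 'yes'),
     ('obesity', 'diabetes', 'yes', 'yes', 'no'),
     ('obesity', 'diabetes', 'no', 'no', 'yes')],
    columns=['header1', 'header2', 'diabetes', 'obesity',
             'hypertension/high blood pressure'],
)

# Every column except the two header columns is a yes/no category column.
cols = [c for c in df.columns if c not in ['header1', 'header2']]


def matches(header, column):
    """Hypothetical helper: True if any word of `header` occurs in `column` (case-insensitive)."""
    return isinstance(header, str) and any(
        word in column.lower() for word in header.lower().split())


# 2: a 'yes' column matches header1/header2; 1: 'yes' only in non-matching columns; else blank.
df['flag'] = df.apply(
    lambda row: (
        2 if any(matches(row['header1'], c) or matches(row['header2'], c)
                 for c in cols if row[c] == 'yes')
        else 1 if any(row[c] == 'yes' for c in cols)
        else None
    ),
    axis=1,
).astype('Int64')

print(df)
```

A row-wise `apply` is easy to read, but it runs Python code once per row, so on a large frame a mask-based approach is likely to be faster.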
Constructor
Please note that my actual df has a dynamic number of yes/no columns, but only two header columns.
import numpy as np
import pandas as pd

data = np.array([('metabolism', 'diabetes', 'no', 'no', 'no'),
                 ('heart issue', 'heart disease', None, None, None),
                 ('obesity', 'diabetes', 'yes', 'no', 'no'),
                 ('metabolism', 'had hypertension', 'no', 'no', 'yes'),
                 ('heart issue', 'heart disease', 'no', 'no', 'yes'),
                 ('obesity', 'diabetes', 'yes', 'yes', 'no'),
                 ('obesity', 'diabetes', 'no', 'no', 'yes')])
df = pd.DataFrame(data, columns=['header1', 'header2', 'diabetes', 'obesity', 'hypertension/high blood pressure'])
# Every column except the two header columns is a yes/no category column.
cols = [x for x in df.columns if x not in ['header1', 'header2']]
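Because the real df has a dynamic number of yes/no columns, a sketch that derives everything from `cols` and avoids a row-wise `apply` may help. It rebuilds the frame from the constructor data, builds a boolean `yes` frame with `DataFrame.eq`, a boolean `match` frame from the words of both header cells (again assuming word-level matching rather than whole-cell substrings), and picks the flag with `np.select`. The names `header_words`, `yes`, `match`, and `codes` are illustrative only.

```python
import numpy as np
import pandas as pd

data = np.array([('metabolism', 'diabetes', 'no', 'no', 'no'),
                 ('heart issue', 'heart disease', None, None, None),
                 ('obesity', 'diabetes', 'yes', 'no', 'no'),
                 ('metabolism', 'had hypertension', 'no', 'no', 'yes'),
                 ('heart issue', 'heart disease', 'no', 'no', 'yes'),
                 ('obesity', 'diabetes', 'yes', 'yes', 'no'),
                 ('obesity', 'diabetes', 'no', 'no', 'yes')])
df = pd.DataFrame(data, columns=['header1', 'header2', 'diabetes', 'obesity',
                                 'hypertension/high blood pressure'])
cols = [x for x in df.columns if x not in ['header1', 'header2']]

# Lower-cased words of both header cells for each row (assumption: match on words).
header_words = (df['header1'].fillna('') + ' ' + df['header2'].fillna('')).str.lower().str.split()

# yes: the cell is 'yes'. match: some header word occurs in that column's name.
yes = df[cols].eq('yes')
match = pd.DataFrame(
    {c: header_words.apply(lambda words: any(w in c.lower() for w in words)) for c in cols},
    index=df.index,
)

# 2 = a matching column says 'yes'; 1 = only non-matching columns say 'yes'; 0 = no 'yes'.
codes = np.select([(yes & match).any(axis=1), yes.any(axis=1)], [2, 1], default=0)

# Store as nullable integers and turn the 0 code into a blank (<NA>).
df['flag'] = pd.Series(codes, index=df.index, dtype='Int64').mask(codes == 0)
print(df[['header1', 'header2', 'flag']])
```

Adding or removing yes/no columns only changes `cols`; the masks and `np.select` adapt without further edits.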
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
