Create a lambda function to apply to selected df columns

I have the following df:

id  header1      header2           diabetes  obesity  hypertension/high blood pressure  ...
1   metabolism   diabetes          no        no       no
2   heart issue  heart disease     None      None     None
3   obesity      diabetes          yes       no       no
4   metabolism   had hypertension  no        no       yes
5   heart issue  heart disease     no        no       yes
6   obesity      diabetes          yes       yes      no
7   obesity      diabetes          no        no       yes

I want to create a lambda function that, for each row, checks whether the value in header1 or header2 has a substring match with any of the other column names. Depending on whether those columns hold yes, no, or null, it should return a flag value for a new column.

The rule for each row: if header1 or header2 has a substring match with a column name and that column contains yes, set the flag to 2. If any of the category columns contains yes but none of those columns has a keyword match with header1 or header2, set the flag to 1. Otherwise, leave the flag blank.
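The rule above can be sketched as a named row function called from a lambda. This is only a sketch under one assumption that the question does not state outright: a header "matches" a column when any single word of the header appears inside the column name (case-insensitive), which is what makes "had hypertension" match "hypertension/high blood pressure". The names `flag_row`, `HEADER_COLS`, and `category_cols` are made up for illustration, and the small frame is a subset of the example rows.

```python
import pandas as pd

HEADER_COLS = ["header1", "header2"]  # illustrative name, not from the question


def flag_row(row, category_cols):
    """Return 2, 1, or None for a single row."""
    # Category columns answered "yes" in this row.
    yes_cols = [c for c in category_cols if row[c] == "yes"]
    if not yes_cols:
        return None  # no "yes" anywhere: leave the flag blank

    # Assumption: matching is word-based and case-insensitive.
    words = [
        w.lower()
        for h in HEADER_COLS
        if isinstance(row[h], str)
        for w in row[h].split()
    ]
    if any(w in c.lower() for w in words for c in yes_cols):
        return 2  # a "yes" column also matches a header keyword
    return 1      # a "yes" exists, but no keyword match


df = pd.DataFrame(
    {
        "header1": ["metabolism", "heart issue", "obesity"],
        "header2": ["had hypertension", "heart disease", "diabetes"],
        "diabetes": ["no", "no", "no"],
        "hypertension/high blood pressure": ["yes", "yes", "no"],
    }
)

category_cols = [c for c in df.columns if c not in HEADER_COLS]
df["flag"] = df.apply(lambda row: flag_row(row, category_cols), axis=1)
```

Keeping the logic in a named function and using the lambda only to pass `category_cols` avoids cramming three branches into one expression. Note that pandas may store the blank flag as NaN and the other values as floats when it infers the dtype of the result.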

Example:

Attempt (pseudocode, does not run):

cols = [x for x in df.columns if x not in ['header1', 'header2']]

df['flag'] = df.apply(lambda x: 2 if df['header1'] or df['header2'] in cols and cols == yes, 1 elif df['header1'] not in df['header2'] in cols and cols == yes, None else

Desired result:

id  header1      header2           diabetes  obesity  hypertension/high blood pressure  flag
1   metabolism   diabetes          no        no       no                                None
2   heart issue  heart disease     None      None     None                              None
3   obesity      diabetes          yes       no       no                                2
4   metabolism   had hypertension  no        no       yes                               2
5   heart issue  heart disease     no        no       yes                               1
6   obesity      diabetes          yes       yes      no                                2
7   obesity      diabetes          no        no       yes                               1

Constructor

Please note that my actual df has a variable number of yes/no columns, but always exactly two header columns.

import numpy as np
import pandas as pd

# One tuple per row: header1, header2, then the yes/no category columns.
data = np.array([('metabolism', 'diabetes', 'no', 'no', 'no'),
                 ('heart issue', 'heart disease', None, None, None),
                 ('obesity', 'diabetes', 'yes', 'no', 'no'),
                 ('metabolism', 'had hypertension', 'no', 'no', 'yes'),
                 ('heart issue', 'heart disease', 'no', 'no', 'yes'),
                 ('obesity', 'diabetes', 'yes', 'yes', 'no'),
                 ('obesity', 'diabetes', 'no', 'no', 'yes')])

df = pd.DataFrame(data, columns=['header1', 'header2', 'diabetes', 'obesity', 'hypertension/high blood pressure'])

cols = [x for x in df.columns if x not in ['header1', 'header2']]
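Because `cols` is derived from `df.columns`, the flag can also be computed with boolean masks that do not depend on how many yes/no columns there are. The following is a rough sketch on the example data, not a definitive answer. It assumes that a header matches a column when any word of the header occurs in the column name (case-insensitive). The names `is_yes`, `is_match`, and `words` are illustrative.

```python
import pandas as pd

rows = [
    ("metabolism", "diabetes", "no", "no", "no"),
    ("heart issue", "heart disease", None, None, None),
    ("obesity", "diabetes", "yes", "no", "no"),
    ("metabolism", "had hypertension", "no", "no", "yes"),
    ("heart issue", "heart disease", "no", "no", "yes"),
    ("obesity", "diabetes", "yes", "yes", "no"),
    ("obesity", "diabetes", "no", "no", "yes"),
]
df = pd.DataFrame(
    rows,
    columns=["header1", "header2", "diabetes", "obesity",
             "hypertension/high blood pressure"],
)

header_cols = ["header1", "header2"]
cols = [c for c in df.columns if c not in header_cols]  # any number of columns

# True where a category column holds "yes"; None compares as False.
is_yes = df[cols].eq("yes")

# Lower-cased words of both headers, one list per row.
# Assumption: matching is done word by word.
words = (
    df["header1"].fillna("") + " " + df["header2"].fillna("")
).str.lower().str.split()

# True where any header word of the row occurs inside the column name.
is_match = pd.DataFrame(
    {c: words.apply(lambda ws, c=c: any(w in c.lower() for w in ws)) for c in cols},
    index=df.index,
)

# Start blank, then apply the weaker rule (1) before the stronger rule (2).
flag = pd.Series([None] * len(df), index=df.index, dtype=object)
flag.loc[is_yes.any(axis=1)] = 1
flag.loc[(is_yes & is_match).any(axis=1)] = 2
df["flag"] = flag
```

Assigning 1 first and 2 second means a row that satisfies both conditions ends up with 2, which mirrors the priority in the desired result. Adding more yes/no columns to `df` changes only the contents of `cols`; the rest of the code stays the same.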
      


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source