Is there a more optimized way to write this code?

for col in df.columns:
    if not df[col].astype(str).str.contains('-->').any() and col not in ['ETL_ID','CLAIM_ID']:
        df.drop(col, inplace=True, axis=1)

I want to keep only the columns (with all their rows) in which the substring '-->' appears in at least one row. If the substring doesn't appear in any row of a column, I need to drop that column, except for a few columns like ETL_ID... Is there a more optimized way to write the code above? Alternatively, I want to keep only those records in my dataframe that contain the substring '-->'.



Solution 1:[1]

How about using any?

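mask = df.astype(str).apply(lambda s: s.str.contains('-->', regex=False)).any()
df = df.loc[:, mask | df.columns.isin(['ETL_ID','CLAIM_ID'])]

df.eq('-->') would only match cells that are exactly equal to '-->', so the substring check uses str.contains instead. OR-ing the mask with df.columns.isin(...) keeps the ID columns without building a column list via mask[mask].index.tolist().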
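To check the idea on something concrete, here is a minimal sketch on a made-up DataFrame. The sample data and the names keep_cols, has_arrow, cols_result and rows_result are assumptions for illustration only; they are not from the question. It covers both variants asked about: keeping columns that contain '-->' somewhere (plus the ID columns), and keeping only the rows that contain it.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({
    "ETL_ID": [1, 2, 3],
    "CLAIM_ID": [10, 20, 30],
    "A": ["x --> y", "same", "same"],
    "B": ["same", "same", "same"],
    "C": [None, "old --> new", "same"],
})

# Columns that must always be kept, whether or not they contain '-->'.
keep_cols = ["ETL_ID", "CLAIM_ID"]

# Boolean frame: True wherever a cell, viewed as a string,
# contains the literal substring '-->' (regex=False avoids regex parsing).
has_arrow = df.astype(str).apply(lambda s: s.str.contains("-->", regex=False))

# Columns: keep those with at least one match, plus the always-kept columns.
col_mask = has_arrow.any() | df.columns.isin(keep_cols)
cols_result = df.loc[:, col_mask]

# Rows: keep records where at least one cell contains '-->'.
rows_result = df[has_arrow.any(axis=1)]

print(list(cols_result.columns))
print(rows_result.index.tolist())
```

With this sample data, column B is dropped because none of its rows contain '-->', and only the first two rows survive the row filter. Computing has_arrow once and reusing it for both filters avoids the Python-level loop with repeated df.drop calls, although the string conversion still touches every cell.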

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Emma