Is there a more optimized way to write this code?
for col in df.columns:
    # Drop the column when no cell contains '-->' and it is not one of the protected columns
    if not df[col].astype(str).str.contains('-->').any() and col not in ['ETL_ID','CLAIM_ID']:
        df.drop(col, inplace=True, axis=1)
I want to keep only the columns (across all rows) of the dataframe in which the substring '-->' occurs at least once. If the substring does not occur in any row of a column, I need to drop that column, except for a few columns such as ETL_ID... Is there a more optimized way to write the code above? Alternatively, I just want the records in my dataframe that contain the substring '-->'.
Solution 1:[1]
How about using any?
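# str.contains tests for the substring; df.eq('-->') would match only cells exactly equal to '-->'
mask = df.astype(str).apply(lambda s: s.str.contains('-->', regex=False)).any()
df = df[mask[mask].index.tolist() + ['ETL_ID','CLAIM_ID']]
I don't like mask[mask].index.tolist(), but I cannot figure out how to do the any() within the filtering.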
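One way to avoid building the column list by hand might be boolean indexing with df.loc. The following is only a minimal sketch, assuming every cell can be converted with astype(str) and that '-->' should be matched literally. The function names, parameter names, and sample data are hypothetical and not from the original post.

```python
import pandas as pd

def keep_columns_with_substring(df, needle="-->", always_keep=("ETL_ID", "CLAIM_ID")):
    """Return df restricted to columns where any cell contains `needle`, plus `always_keep`."""
    # One boolean per column: does any cell, viewed as a string, contain the literal substring?
    has_needle = df.apply(
        lambda s: s.astype(str).str.contains(needle, regex=False).any()
    ).astype(bool)
    # Columns that are kept regardless of their content.
    protected = df.columns.isin(always_keep)
    # Combine both conditions and select columns with a boolean mask.
    return df.loc[:, has_needle.to_numpy() | protected]

def keep_rows_with_substring(df, needle="-->"):
    """Return only the rows in which at least one cell contains `needle`."""
    hits = df.apply(lambda s: s.astype(str).str.contains(needle, regex=False))
    return df[hits.any(axis=1)]

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "ETL_ID": [1, 2],
    "CLAIM_ID": [10, 20],
    "A": ["x-->y", "z"],
    "B": ["p", "q"],
})

result = keep_columns_with_substring(df)
rows = keep_rows_with_substring(df)
```

Because the protected columns are combined into the same mask, they are not duplicated when they also happen to contain the substring, and the original column order is preserved. The second function covers the alternative in the question of keeping only the records that contain '-->'.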
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Emma |
