Negation / inversion of Python pandas DataFrame.filter
How do I select only those columns of a data frame whose labels do not contain a given string?
DataFrame.filter can, for example, select all columns of a data frame whose labels contain a given string.
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(([1, 2, 3], [4, 5, 6])),
    columns=['beat', 'meat', 'street']
)
df.filter(like="eat", axis=1)  # yields the columns "beat" and "meat"
Is there a way to invert this logic, so that I keep only those columns not containing "eat"? Alternatively, is there a way to drop the columns containing "eat"?
Solution 1:[1]
Use the regex parameter with a negative lookahead, which matches only labels that do not contain "eat":
print(df.filter(regex=r'^(?!.*eat).*$'))
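For readers who would rather avoid a regular expression, the same selection can probably be expressed with a boolean mask over the column labels, or by dropping whatever `filter(like=...)` selects. This is only a sketch on the question's example data; the names `mask`, `kept_mask` and `kept_drop` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    np.array(([1, 2, 3], [4, 5, 6])),
    columns=["beat", "meat", "street"],
)

# Boolean mask: True for labels that do NOT contain the literal substring "eat".
mask = ~df.columns.str.contains("eat", regex=False)
kept_mask = df.loc[:, mask]

# Alternatively, drop the columns that filter(like=...) would have selected.
kept_drop = df.drop(columns=df.filter(like="eat").columns)

print(list(kept_mask.columns))
print(list(kept_drop.columns))
```

Both variants assume string column labels; the mask approach in particular relies on the `.str` accessor of the column index.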
Solution 2:[2]
Based on @jezrael's answer, one could parameterize the solution like this:
import re

def neg_filter(df, not_like, axis):
    """Only keep labels along `axis` that do not contain `not_like`."""
    pattern = r"^(?!.*" + re.escape(not_like) + r").*$"
    return df.filter(regex=pattern, axis=axis)
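A short usage sketch, repeating the helper so that the snippet runs on its own. The example data is hypothetical and chosen to show why `re.escape` matters: the search string is treated literally, so a dot in `not_like` does not act as a regex wildcard.

```python
import re

import pandas as pd


def neg_filter(df, not_like, axis):
    """Only keep labels along `axis` that do not contain `not_like`."""
    pattern = r"^(?!.*" + re.escape(not_like) + r").*$"
    return df.filter(regex=pattern, axis=axis)


# Hypothetical example data: one label contains a literal dot.
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=["a.b", "axb", "c"])

# Because of re.escape, "a.b" is matched literally, so "axb" is kept.
kept = neg_filter(df, "a.b", axis=1)
print(list(kept.columns))
```

Without the escaping, the pattern would treat `.` as "any character" and would also exclude `axb`.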
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jezrael |
| Solution 2 | user3389669 |
