'Drop pandas column with constant alphanumeric values

I have a dataframe df that contains around 2 million records. Some of the columns contain only alphanumeric values (e.g. "wer345", "gfer34", "123fdst").

Is there a pythonic way to drop those columns (e.g. using isalnum())?



Solution 1:[1]

Apply Series.str.isalnum column-wise to mask all the alphanumeric values of the DataFrame. Then use DataFrame.all to find the columns that only contain alphanumeric values. Invert the resulting boolean Series to select only the columns that contain at least one non-alphanumeric value.

is_alnum_col = df.apply(lambda col: col.str.isalnum()).all()
res = df.loc[:, ~is_alnum_col]

Example

import pandas as pd

df = pd.DataFrame({
    'a': ['aas', 'sd12', '1232'],
    'b': ['sdds', 'nnm!!', 'ab-2'],
    'c': ['sdsd', 'asaas12', '12.34'],
})

is_alnum_col = df.apply(lambda col: col.str.isalnum()).all()
res = df.loc[:, ~is_alnum_col]

Output:

>>> df

      a      b        c
0   aas   sdds     sdsd
1  sd12  nnm!!  asaas12
2  1232   ab-2    12.34

>>> df.apply(lambda col: col.str.isalnum())

      a      b      c
0  True   True   True
1  True  False   True
2  True  False  False

>>> is_alnum_col

a     True
b    False
c    False
dtype: bool

>>> res

       b        c
0   sdds     sdsd
1  nnm!!  asaas12
2   ab-2    12.34

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Rodalm