How to filter a data frame column with list elements

I have a data frame column that holds a list in each cell. I want to filter the data frame based on the elements in those lists.

import pandas as pd

data = pd.DataFrame({'column':[['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],['i', 'j', 'k'],['m', 'n'],['q', 'r'],['s']]})

I have a list of elements:

element_list = ['e','f','g','h']

Now I want to filter the data frame by element_list, so that the result is a data frame containing only the one row that satisfies the condition.

Can anyone help me with how to filter this?



Solution 1:[1]

If order doesn't matter, you can use set operations.

You have several options, depending on whether you want an exact match, every item of element_list present in the row, or every item of the row present in element_list:

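S = set(element_list)

# True when the row holds exactly the items of element_list (order and duplicates ignored)
data['equal'] = data['column'].apply(lambda x: S==set(x))

# True when every item of element_list is in the row
data['subset'] = data['column'].apply(S.issubset)

# True when every item of the row is in element_list
data['superset'] = data['column'].apply(S.issuperset)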

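Output (rows 6 to 8 are not in the question's data; they appear to have been added to show how the three options differ):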

            column  equal  subset  superset
0     [a, b, c, d]  False   False     False
1     [e, f, g, h]   True    True      True
2        [i, j, k]  False   False     False
3           [m, n]  False   False     False
4           [q, r]  False   False     False
5              [s]  False   False     False
6              [e]  False   False      True
7     [e, g, h, f]   True    True      True
8  [e, g, h, f, i]  False    True     False
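
Two related conditions are not covered by the three columns above: a row that shares at least one item with element_list, and a row that matches element_list in the same order. The following is a minimal sketch of both, assuming the data is the question's six rows plus the three extra rows seen in the output. The column names any_common and same_order are made up for illustration.

```python
import pandas as pd

element_list = ['e', 'f', 'g', 'h']
S = set(element_list)

# The question's data plus three extra rows (6 to 8), as in the output above
data = pd.DataFrame({'column': [
    ['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j', 'k'],
    ['m', 'n'], ['q', 'r'], ['s'],
    ['e'], ['e', 'g', 'h', 'f'], ['e', 'g', 'h', 'f', 'i'],
]})

# True when the row shares at least one item with element_list
data['any_common'] = [not S.isdisjoint(x) for x in data['column']]

# True only when the row equals element_list item by item, in the same order
data['same_order'] = [x == element_list for x in data['column']]

print(data)
```

With this data, any_common should be True for rows 1, 6, 7 and 8, while same_order should be True for row 1 only.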

You can use the boolean series to subset the dataframe:

data[data['column'].apply(lambda x: S==set(x))]

Output:

         column
1  [e, f, g, h]
7  [e, g, h, f]
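
If you need this filter in several places, the three set tests can be wrapped in a small function. This is only a sketch: filter_by_elements and its mode argument are hypothetical names, not part of pandas.

```python
import pandas as pd

def filter_by_elements(df, column, elements, mode="equal"):
    """Hypothetical helper: keep the rows whose list in `column` matches `elements`."""
    wanted = set(elements)
    tests = {
        "equal": lambda x: wanted == set(x),  # same items, order ignored
        "subset": wanted.issubset,            # every wanted item is in the row
        "superset": wanted.issuperset,        # every row item is in wanted
    }
    mask = [tests[mode](x) for x in df[column]]
    return df.loc[mask]

# Usage with the data from the question
data = pd.DataFrame({'column': [
    ['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j', 'k'],
    ['m', 'n'], ['q', 'r'], ['s'],
]})
element_list = ['e', 'f', 'g', 'h']

result = filter_by_elements(data, 'column', element_list)
print(result)
```

On the question's original six rows this should return only the row at index 1.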

Performance

If performance is important, you can use list comprehensions instead of apply:

data['equal'] = [S==set(x) for x in data['column']]

data['subset'] = [S.issubset(x) for x in data['column']]

data['superset'] = [S.issuperset(x) for x in data['column']]
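
To check whether the list comprehension is actually faster on your data, you can time both versions. This is a rough sketch using the standard timeit module. The frame size (the question's six rows repeated 2,000 times) and the number of runs are arbitrary assumptions, and the timings will vary by machine and pandas version.

```python
import timeit
import pandas as pd

element_list = ['e', 'f', 'g', 'h']
S = set(element_list)

# Repeat the question's six rows to get a larger frame (size chosen arbitrarily)
rows = [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['i', 'j', 'k'],
        ['m', 'n'], ['q', 'r'], ['s']]
data = pd.DataFrame({'column': rows * 2_000})

def with_apply():
    return data['column'].apply(S.issubset)

def with_comprehension():
    return [S.issubset(x) for x in data['column']]

# Both versions should give the same boolean values
assert with_apply().tolist() == with_comprehension()

print('apply:        ', timeit.timeit(with_apply, number=10))
print('comprehension:', timeit.timeit(with_comprehension, number=10))
```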

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1