How to filter a data frame column with list elements
I have a data frame column in which each cell holds a list. I want to filter the data frame based on the elements in those lists.
data = pd.DataFrame({'column':[['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],['i', 'j', 'k'],['m', 'n'],['q', 'r'],['s']]})
I have a list of elements:
element_list = ['e','f','g','h']
Now I want to filter the data frame by element_list.
The result should be a data frame containing only the one row whose list matches element_list.
How can I filter this?
Solution 1:[1]
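If order doesn't matter, you can use set operations.
You have several options, depending on whether you want an exact match (the row holds the same items as element_list), all items of element_list present in the row (subset), or every item of the row contained in element_list (superset):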
S = set(element_list)
data['equal'] = data['column'].apply(lambda x: S==set(x))
data['subset'] = data['column'].apply(S.issubset)
data['superset'] = data['column'].apply(S.issuperset)
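Output (this example has three extra rows, 6 to 8, appended to the question's data to show how the three checks differ):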
            column  equal  subset  superset
0     [a, b, c, d]  False   False     False
1     [e, f, g, h]   True    True      True
2        [i, j, k]  False   False     False
3           [m, n]  False   False     False
4           [q, r]  False   False     False
5              [s]  False   False     False
6              [e]  False   False      True
7     [e, g, h, f]   True    True      True
8  [e, g, h, f, i]  False    True     False
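If you instead want rows that share at least one item with element_list, a set intersection test can be used. The following is a minimal sketch, not part of the original answer; the column name any_common and the extra row ['e', 'x'] are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({'column': [['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'],
                                ['i', 'j', 'k'], ['m', 'n'], ['q', 'r'], ['s'],
                                ['e', 'x']]})  # last row is a hypothetical addition
element_list = ['e', 'f', 'g', 'h']
S = set(element_list)

# isdisjoint returns True when the two collections share no items,
# so its negation marks rows with at least one item in common.
data['any_common'] = [not S.isdisjoint(x) for x in data['column']]

# Keep only the rows that overlap with element_list.
overlapping = data[data['any_common']]
print(overlapping)
```

Here rows 1 and 6 are kept, because each contains at least one of 'e', 'f', 'g', 'h'.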
You can use the boolean series to subset the dataframe:
data[data['column'].apply(lambda x: S==set(x))]
Output:
column
1 [e, f, g, h]
7 [e, g, h, f]
Performance
If performance is important, you can use list comprehensions instead of apply:
data['equal'] = [S==set(x) for x in data['column']]
data['subset'] = [S.issubset(x) for x in data['column']]
data['superset'] = [S.issuperset(x) for x in data['column']]
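To check whether the list comprehension is actually faster on your own data, you can time both variants. The sketch below uses the standard timeit module on a synthetic frame; the frame size and repetition count are arbitrary assumptions, and the timings will vary by machine and pandas version.

```python
import timeit
import pandas as pd

element_list = ['e', 'f', 'g', 'h']
S = set(element_list)

# Synthetic data: the three example rows repeated many times (size is arbitrary).
data = pd.DataFrame({'column': [['a', 'b', 'c', 'd'],
                                ['e', 'f', 'g', 'h'],
                                ['i', 'j', 'k']] * 10_000})

def with_apply():
    return data['column'].apply(S.issubset).tolist()

def with_comprehension():
    return [S.issubset(x) for x in data['column']]

# Both variants should produce identical results before being compared on speed.
same_result = with_apply() == with_comprehension()

t_apply = timeit.timeit(with_apply, number=5)
t_comp = timeit.timeit(with_comprehension, number=5)
print(f"apply: {t_apply:.3f}s, list comprehension: {t_comp:.3f}s")
```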
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
