Comparing two lists of lists with a DataFrame column in Python
I want to compare two lists of lists with the columns of a DataFrame.
list1 = [[r2, r4, r6], [r6, r7]]
list2 = [[p4, p5, p8], [p86, p21, p0, p94]]
Dataset:
| rid | pid | value |
|---|---|---|
| r2 | p0 | banana |
| r2 | p4 | chocolate |
| r4 | p89 | apple |
| r6 | p5 | milk |
| r7 | p0 | bread |
Output:
[[chocolate,milk],[bread]]
Since r2 occurs in list1[0], p4 occurs in list2[0], and both appear in the same row of the dataset, chocolate must be stored. Similarly, r6 and p5 occur in the two lists at the same position and in the same row of the dataset, so milk must be stored.
Solution 1:[1]
Answer
result = []
for l1, l2 in zip(list1, list2):
    # Keep the rows whose rid is in l1 and whose pid is in l2.
    res = df.loc[df["rid"].isin(l1) & df["pid"].isin(l2), "value"].tolist()
    result.append(res)
print(result)
[['chocolate', 'milk'], ['bread']]
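Explanation
zip pairs up the two lists element by element, which is equivalent to
for i in range(len(list1)):
    l1 = list1[i]
    l2 = list2[i]
df["rid"].isin(l1) & df["pid"].isin(l2) combines the two conditions with the element-wise AND operator &, so a row is kept only when both conditions are true.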
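To make the mask combination concrete, here is a minimal sketch that rebuilds the question's dataset and prints each intermediate mask for the first pair of sub-lists. The variable names rid_mask, pid_mask and both are only illustrative and do not come from the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "rid": ["r2", "r2", "r4", "r6", "r7"],
    "pid": ["p0", "p4", "p89", "p5", "p0"],
    "value": ["banana", "chocolate", "apple", "milk", "bread"],
})

l1 = ["r2", "r4", "r6"]  # list1[0]
l2 = ["p4", "p5", "p8"]  # list2[0]

rid_mask = df["rid"].isin(l1)  # True where rid is one of l1
pid_mask = df["pid"].isin(l2)  # True where pid is one of l2
both = rid_mask & pid_mask     # element-wise AND of the two masks

print(rid_mask.tolist())  # [True, True, True, True, False]
print(pid_mask.tolist())  # [False, True, False, True, False]
print(both.tolist())      # [False, True, False, True, False]
print(df.loc[both, "value"].tolist())  # ['chocolate', 'milk']
```

Note that the two isin checks are independent: any rid from l1 may be matched with any pid from l2, as long as both sit in the same row.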
Attention
- list1 and list2 must have the same length; otherwise, zip will silently ignore the remaining elements of the longer list.
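The truncation can be seen in a small sketch. The third sub-list ["r9"] and the helper checked_pairs are hypothetical additions for illustration only; they are not part of the question or the answer.

```python
list1 = [["r2", "r4", "r6"], ["r6", "r7"], ["r9"]]  # hypothetical extra sub-list
list2 = [["p4", "p5", "p8"], ["p86", "p21", "p0", "p94"]]

# zip stops at the shorter input, so ["r9"] is dropped without any warning.
pairs = list(zip(list1, list2))
print(len(pairs))  # 2


def checked_pairs(a, b):
    """Pair up a and b, raising instead of silently truncating (hypothetical helper)."""
    if len(a) != len(b):
        raise ValueError(f"length mismatch: {len(a)} != {len(b)}")
    return list(zip(a, b))


try:
    checked_pairs(list1, list2)
except ValueError as exc:
    print(exc)  # length mismatch: 3 != 2
```

On Python 3.10 or later, zip(list1, list2, strict=True) raises a ValueError in the same situation, which may be a simpler guard if that version can be assumed.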
Solution 2:[2]
You can do it as follows:
import pandas as pd
from itertools import product

df = pd.DataFrame({'rid': {0: 'r2', 1: 'r2', 2: 'r4', 3: 'r6', 4: 'r7'},
                   'pid': {0: 'p0', 1: 'p4', 2: 'p89', 3: 'p5', 4: 'p0'},
                   'value': {0: 'banana', 1: 'chocolate', 2: 'apple', 3: 'milk', 4: 'bread'}})
list1 = [['r2','r4','r6'],['r6','r7']]
list2 = [['p4','p5','p8'],['p86','p21','p0','p94']]
# Generate all possible associations.
associations = (product(l1, l2) for l1, l2 in zip(list1, list2))
# Index for speed and convenience of the lookup.
df = df.set_index(['rid', 'pid']).sort_index()
output = [[df.loc[assoc, 'value'] for assoc in assoc_list if assoc in df.index]
          for assoc_list in associations]
print(output)
[['chocolate', 'milk'], ['bread']]
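As a rough illustration of what this approach does for a single pair of sub-lists, the sketch below lists the candidate (rid, pid) tuples produced by product and checks two of them against the MultiIndex. The names candidates and indexed are illustrative, and the data is the question's sample.

```python
import pandas as pd
from itertools import product

df = pd.DataFrame({
    "rid": ["r2", "r2", "r4", "r6", "r7"],
    "pid": ["p0", "p4", "p89", "p5", "p0"],
    "value": ["banana", "chocolate", "apple", "milk", "bread"],
})
indexed = df.set_index(["rid", "pid"]).sort_index()

l1 = ["r6", "r7"]                  # list1[1]
l2 = ["p86", "p21", "p0", "p94"]   # list2[1]

# Every (rid, pid) combination of the two sub-lists: 2 * 4 = 8 tuples.
candidates = list(product(l1, l2))
print(len(candidates))  # 8

# Only tuples that exist as a (rid, pid) row in the index are looked up.
print(("r7", "p0") in indexed.index)  # True
print(("r6", "p0") in indexed.index)  # False
print(indexed.loc[("r7", "p0"), "value"])  # bread
```

The number of candidates grows with the product of the sub-list lengths, so for long sub-lists the isin approach of Solution 1 may do less work.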
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | FavorMylikes |
| Solution 2 |
