Subtracting the elements of one list from another with a list comprehension returns an incomplete list?
I have an array l1 of shape (81, 2) and another array l2 of shape (8, 2). Every row of l2 is also a row of l1. I'm trying to build an array l3 of shape (73, 2) that contains all rows of l1 except the ones in l2 (i.e. l3 = l1 - l2), using a list comprehension.
I found many similar questions on here, and almost all agree on a solution like this to generate l3:
import numpy as np
n = 9
index = np.arange(n)
l1 = np.array([(i,j) for i in index for j in index])
l2 = np.array([(0, 3),(0, 5),(2, 4),(4, 4),(4, 2),(4, 6),(8, 3),(8, 5)])
l3 = [(i,j) for (i,j) in l1 if (i,j) not in l2]
print(l3)
However, the code above produces a list l3 that contains only 20 of the expected 73 (81 - 8) elements. I don't understand how the membership test in the list comprehension operates here, or why those particular 20 elements are kept. Can anyone help?
NOTE: many people advise using set() instead of a list comprehension for this problem. I haven't tried that yet, and I'd really like to understand why the list comprehension fails in the code above.
Solution 1:[1]
Let's test the first row of l1:
In [46]: i,j = l1[0]
In [47]: i,j
Out[47]: (0, 0)
In [48]: (i,j) in l2
Out[48]: True
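It's True even though (0, 0) is not a row of l2. For a NumPy array, (i,j) in l2 is evaluated as (l2 == (i,j)).any(): the pair is broadcast against every row, so the test is True if i equals any value in the first column of l2 or j equals any value in the second column. It isn't testing by rows. That is why exactly 20 pairs survive not in: i must avoid 0, 2, 4, 8 (leaving 1, 3, 5, 6, 7) and j must avoid 2, 3, 4, 5, 6 (leaving 0, 1, 7, 8), and 5 x 4 = 20.
There isn't a 7 anywhere in l2, so this is False: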
In [49]: (7,7) in l2
Out[49]: False
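To make the column-wise behaviour concrete, here is a minimal sketch that rebuilds l1 and l2 from the question, checks that the in operator matches (l2 == (i, j)).any() for a few sample pairs, and reproduces the count of 20. The names col0, col1 and survivors are illustrative only.

```python
import numpy as np

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])

# For an ndarray, `x in arr` evaluates (arr == x).any(): the pair (i, j)
# is broadcast against every row, so column 0 is compared with i and
# column 1 with j, independently of each other.
for i, j in [(0, 0), (7, 7), (1, 3)]:
    assert ((i, j) in l2) == bool((l2 == (i, j)).any())

# A pair survives `not in` only if i never appears in column 0 AND
# j never appears in column 1 of l2.
col0, col1 = set(l2[:, 0].tolist()), set(l2[:, 1].tolist())
survivors = [(i, j) for (i, j) in l1.tolist() if i not in col0 and j not in col1]

# The comprehension from the question, for comparison.
l3 = [(i, j) for (i, j) in l1 if (i, j) not in l2]
print(len(survivors), len(l3))  # 20 20
```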
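So before relying on not in inside a list comprehension, make sure the membership test compares whole rows.
One way to test for whole-row matches is to compare every row of l1 with every row of l2 by broadcasting, then require both columns to match (all(axis=2)) and accept a match against any row of l2 (any(axis=0)):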
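If you want to keep the list comprehension, one option is to replace the element-wise in with a row-wise test. A minimal sketch, using a hypothetical helper row_in (not a NumPy function):

```python
import numpy as np

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])

def row_in(row, arr):
    """Return True if `row` equals some whole row of the 2-D array `arr`."""
    # (arr == row) compares element-wise, giving shape (len(arr), 2);
    # .all(axis=1) requires both columns of a row to match;
    # .any() asks whether at least one row matched.
    return bool((arr == row).all(axis=1).any())

# Same shape of comprehension as in the question, but testing whole rows.
l3 = [tuple(row) for row in l1 if not row_in(row, l2)]
print(len(l3))  # 73
```

This does len(l1) * len(l2) comparisons, which is fine at this size but scales poorly for large arrays.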
In [72]: x = (l1==l2[:,None,:]).all(axis=2).any(axis=0)
In [73]: x
Out[73]:
array([False, False, False,  True, False,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False,  True, False,  True, False,  True, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False,  True, False,  True, False, False, False])
This has 8 True values, the ones that exactly match l2:
In [74]: x.sum()
Out[74]: 8
In [75]: l1[x]
Out[75]:
array([[0, 3],
       [0, 5],
       [2, 4],
       [4, 2],
       [4, 4],
       [4, 6],
       [8, 3],
       [8, 5]])
So the remaining 73 rows are selected by negating the mask:
In [76]: l1[~x]
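A short sketch of the shapes involved in the broadcasting test, rebuilt from the question's data (eq and l3 are illustrative names):

```python
import numpy as np

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])  # shape (81, 2)
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])  # (8, 2)

# l2[:, None, :] has shape (8, 1, 2); broadcasting it against l1 (81, 2)
# yields an (8, 81, 2) array comparing every l2 row with every l1 row.
eq = l1 == l2[:, None, :]
print(eq.shape)  # (8, 81, 2)

# all(axis=2): both coordinates match -> shape (8, 81)
# any(axis=0): matched by at least one row of l2 -> shape (81,)
x = eq.all(axis=2).any(axis=0)
print(x.shape, x.sum())  # (81,) 8

l3 = l1[~x]  # the boolean mask keeps only the non-matching rows
print(l3.shape)  # (73, 2)
```

The intermediate array has len(l2) * len(l1) * 2 elements, so for very large inputs the set or structured-array approaches may use less memory.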
To work with sets, we need to convert the arrays to lists of tuples, because array rows are not hashable:
In [85]: s1 = set([tuple(x) for x in l1])
In [86]: s2 = set([tuple(x) for x in l2])
In [87]: len(s1.difference(s2))
Out[87]: 73
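Note that s1.difference(s2) is an unordered set of tuples. If you want l3 back as a (73, 2) array in l1's original order, one sketch is to use the set only for lookups inside a list comprehension, which combines the comprehension from the question with set membership:

```python
import numpy as np

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])

# Rows of an ndarray are not hashable, so convert them to tuples first.
s2 = {tuple(row) for row in l2.tolist()}

# Set membership compares whole tuples (average O(1) per lookup), and the
# comprehension walks l1 in order, so the original row order is kept.
l3 = np.array([row for row in l1.tolist() if tuple(row) not in s2])
print(l3.shape)  # (73, 2)
```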
Another approach is to convert the arrays to structured arrays:
In [88]: import numpy.lib.recfunctions as rf
In [102]: r1 = rf.unstructured_to_structured(l1,dtype=np.dtype('i,i'))
In [103]: r2 = rf.unstructured_to_structured(l2,dtype=np.dtype('i,i'))
In [104]: r2
Out[104]:
array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)],
      dtype=[('f0', '<i4'), ('f1', '<i4')])
Now isin works as intended: each (i, j) row has become a single record, so both arrays are 1d and isin compares whole pairs rather than individual numbers:
In [105]: np.isin(r1,r2)
Out[105]:
array([False, False, False,  True, False,  True, False, False, False,
       False, False, False, False, False, False, False, False, False,
       False, False, False, False,  True, False, False, False, False,
       ...])
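To finish the structured-array route, the isin mask can be negated and used to index the original l1, just like the broadcasting mask. A minimal sketch, assuming a NumPy version that provides numpy.lib.recfunctions.unstructured_to_structured (the alias rfn and the names mask and l3 are illustrative):

```python
import numpy as np
from numpy.lib import recfunctions as rfn

n = 9
index = np.arange(n)
l1 = np.array([(i, j) for i in index for j in index])
l2 = np.array([(0, 3), (0, 5), (2, 4), (4, 4), (4, 2), (4, 6), (8, 3), (8, 5)])

# Each (i, j) row becomes a single record with int32 fields f0 and f1.
r1 = rfn.unstructured_to_structured(l1, dtype=np.dtype('i,i'))  # shape (81,)
r2 = rfn.unstructured_to_structured(l2, dtype=np.dtype('i,i'))  # shape (8,)

# isin now compares whole records, so the mask marks exact row matches.
mask = np.isin(r1, r2)
l3 = l1[~mask]
print(mask.sum(), l3.shape)  # 8 (73, 2)
```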
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
