'sets of numpy 2D array rows having unique values in each row position
Consider array a, holding some of the permutations of 1,2,3,4. (my actual arrays may be larger)
import numpy as np
n = 3
a = np.array([[1, 2, 3, 4],
[1, 2, 4, 3],
[1, 3, 4, 2],
[1, 4, 3, 2],
[2, 3, 4, 1],
[2, 4, 3, 1],
[3, 1, 4, 2],
[4, 1, 2, 3]]))
I want to identify sets of n rows (in this example, n=3) where each row position holds unique values.
In this example, the output would be:
out = [[0, 4, 7],
[2, 5, 7],
[3, 4, 7]]
The 1st row of out indicates that a[0], a[4], and a[7] have unique values in each row position
When n = 2, there are 11 row pairs that match the criteria: [[0,4], [0,6], [0,7], [1,5] ...etc
When n = 4, there are 0 rows that match the criteria.
I'm new enough to python that I can't find a good way to approach this situation.
Solution 1:[1]
You can reduce this problem to finding all cliques of size n in an undirected graph. Nodes in the graph are given by row indices of a. There is an edge between i and j if (a[i] != a[j]).all().
Here is one implementation based on networkx. A function enumerate_all_cliques(g) iterates over cliques in g in order of increasing size. We discard all cliques of size less than n, keep those of size n, and stop once the first clique of size greater than n is found or cliques run out.
from itertools import combinations
import networkx as nx
def f(arr, n):
nodes = np.arange(arr.shape[0])
g = nx.Graph()
g.add_nodes_from(nodes)
for i, j in combinations(nodes, 2):
if (arr[i] != arr[j]).all():
g.add_edge(i, j)
for c in nx.algorithms.clique.enumerate_all_cliques(g):
lc = len(c)
if lc == n:
yield c
elif lc > n:
break
print(list(f(a, 3)))
# [[0, 4, 7], [2, 5, 7], [3, 4, 7]]
Here is another approach: find all maximal cliques and yield all subsets of size n from each clique. This can lead to double-counting, hence set is used before the return statement.
def f(arr, n):
nodes = np.arange(arr.shape[0])
g = nx.Graph()
g.add_nodes_from(nodes)
for i, j in combinations(nodes, 2):
if (arr[i] != arr[j]).all():
g.add_edge(i, j)
cliques = set()
# iterate over maximal cliques
for c in nx.algorithms.clique.find_cliques(g):
# update the clique set subsets of c
cliques.update(map(frozenset, combinations(c, n)))
# return all cliques of size n without doublecounting
return [list(c) for c in cliques]
print(f(a, 3))
# [[2, 5, 7], [3, 4, 7], [0, 4, 7]]
The performance of either approach will vary depending on input values.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
