'transitive closure python tuples

Does anyone know if there's a python builtin for computing transitive closure of tuples?

I have tuples of the form (1,2),(2,3),(3,4) and I'm trying to get (1,2),(2,3),(3,4),(1,3)(2,4)

Thanks.



Solution 1:[1]

There's no builtin for transitive closures.

They're quite simple to implement though.

Here's my take on it:

def transitive_closure(a):
    closure = set(a)
    while True:
        new_relations = set((x,w) for x,y in closure for q,w in closure if q == y)

        closure_until_now = closure | new_relations

        if closure_until_now == closure:
            break

        closure = closure_until_now

    return closure

call: transitive_closure([(1,2),(2,3),(3,4)])

result: set([(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (2, 4)])

call: transitive_closure([(1,2),(2,1)])

result: set([(1, 2), (1, 1), (2, 1), (2, 2)])

Solution 2:[2]

Just a quick attempt:

def transitive_closure(elements):
    elements = set([(x,y) if x < y else (y,x) for x,y in elements])

    relations = {}
    for x,y in elements:
        if x not in relations:
            relations[x] = []
        relations[x].append(y)

    closure = set()
    def build_closure(n):
        def f(k):
            for y in relations.get(k, []):
                closure.add((n, y))
                f(y)
        f(n)

    for k in relations.keys():
        build_closure(k)

    return closure

Executing it, we'll get

In [3]: transitive_closure([(1,2),(2,3),(3,4)])
Out[3]: set([(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)])

Solution 3:[3]

We can perform the "closure" operation from a given "start node" by repeatedly taking a union of "graph edges" from the current "endpoints" until no new endpoints are found. We need to do this at most (number of nodes - 1) times, since this is the maximum length of a path. (Doing things this way avoids getting stuck in infinite recursion if there is a cycle; it will waste iterations in the general case, but avoids the work of checking whether we are done i.e. that no changes were made in a given iteration.)

from collections import defaultdict

def transitive_closure(elements):
    edges = defaultdict(set)
    # map from first element of input tuples to "reachable" second elements
    for x, y in elements: edges[x].add(y)

    for _ in range(len(elements) - 1):
        edges = defaultdict(set, (
            (k, v.union(*(edges[i] for i in v)))
            for (k, v) in edges.items()
        ))

    return set((k, i) for (k, v) in edges.items() for i in v)

(I actually tested it for once ;) )

Solution 4:[4]

Suboptimal, but conceptually simple solution:

def transitive_closure(a):
    closure = set()
    for x, _ in a:
        closure |= set((x, y) for y in dfs(x, a))
    return closure

def dfs(x, a):
    """Yields single elements from a in depth-first order, starting from x"""
    for y in [y for w, y in a if w == x]:
        yield y
        for z in dfs(y, a):
            yield z

This won't work when there's a cycle in the relation, i.e. a reflexive point.

Solution 5:[5]

Here's one essentially the same as the one from @soulcheck that works on adjacency lists rather than edge lists:

def inplace_transitive_closure(g):
    """g is an adjacency list graph implemented as a dict of sets"""
    done = False
    while not done:
        done = True
        for v0, v1s in g.items():
            old_len = len(v1s)
            for v2s in [g[v1] for v1 in v1s]:
                v1s |= v2s
            done = done and len(v1s) == old_len

Solution 6:[6]

If you have a lot of tupels (more than 5000), you might want to consider using the scipy code for matrix powers (see also http://www.ics.uci.edu/~irani/w15-6B/BoardNotes/MatrixMultiplication.pdf)

from scipy.sparse import csr_matrix as csr
def get_closure(tups):
    index2id = list(set([tup[0] for tup in tups]) | set([tup[1] for tup in tups]));
    id2index = {index2id[i]:i for i in xrange(len(index2id))};
    tups_re  = tups + [(index2id[i],index2id[i],) for i in xrange(len(index2id))]; # Unfortunately you have to make the relation reflexive first - you could also add the diagonal to M
    M        = csr( ([True for tup in tups_re],([id2index[tup[0]] for tup in tups_re],[id2index[tup[1]] for tup in tups_re])),shape=(len(index2id),len(index2id)),dtype=bool);
    M_       = M**n; # n is maximum path length of your relation
    temp     = M_.nonzero();
    #TODO: You might want to remove the added reflexivity tupels again
    return [(index2id[temp[0][i]],index2id[temp[1][i]],) for i in xrange(len(temp[0]))];

In the best case, you can choose n wisely if you know a bit about your relation/graph -- that is how long the longest path can be. Otherwise you have to choose M.shape[0], which might blow up in your face.

This detour also has its limits, in particular you should be sure than the closure does not get too large (the connectivity is not too strong), but you would have the same problem in the python implementation.

Solution 7:[7]

You can create a graph from those tuples then use connnected components algorithm from the created graph. Networkx is library that supports connnected components algorithm.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2
Solution 3 Karl Knechtel
Solution 4
Solution 5 Dave Abrahams
Solution 6
Solution 7