How can I quickly identify indirect relations in a dependency matrix?

 |A  |B  |C  |
A|Nan|x  |x  |
B|x  |Nan|Nan|
C|x  |Nan|Nan|

I have this example from a CSV file, and with pandas I managed to replace the x and NaN values with 1 and 0:

 |A|B|C|
A|0|1|1|
B|1|0|0|
C|1|0|0|

My aim is to find and add the indirect relations. For example, if A has a dependency on B and C, then B and C should also be marked as related to each other with the value 1. My table has more than 400 elements, so I cannot pick every element by column name by hand; therefore I plan to use for loops to map the coordinates of the 1 values and then find the indirect relations. For example, if 1,2 and 1,3 have the value 1, then 2,3 and 3,2 should also get the value 1. My result should look like this table:

 |A|B|C|
A|0|1|1|
B|1|0|1|
C|1|1|0|

Does anyone have an idea for an easier way, or has anyone seen something similar? The difficult part for me is setting the new 1 values in the table; I am not sure how that can be done.



Solution 1:[1]

What you have here is a graph problem.

Starting from this input (I added 2 more nodes D/E):

import pandas as pd

df = pd.DataFrame([[0,1,1,0,0],[1,0,0,0,0],[1,0,0,0,0],[0,0,0,0,1],[0,0,0,1,0]],
                  columns=list('ABCDE'), index=list('ABCDE'))

   A  B  C  D  E
A  0  1  1  0  0
B  1  0  0  0  0
C  1  0  0  0  0
D  0  0  0  0  1
E  0  0  0  1  0

You have the following graph:

[Figure: graph with the edges A-B, A-C and D-E]

and want to find all edges within each group of connected nodes:

[Figure: the same graph with the indirect edge B-C added]

For this you can start by constructing a list of edges:

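# keep only the cells equal to 1; dropna() guards against pandas versions whose stack() keeps NaN rows
df2 = df.where(df.eq(1)).stack().dropna().rename_axis(['source', 'target']).reset_index()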

  source target    0
0      A      B  1.0
1      A      C  1.0
2      B      A  1.0
3      C      A  1.0
4      D      E  1.0
5      E      D  1.0
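
As a cross-check on the edge list above, here is a minimal sketch that derives the same source/target pairs with NumPy instead of where/stack. It assumes the 0/1 frame from the example; the names `edges` and `pairs` are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[0, 1, 1, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 1], [0, 0, 0, 1, 0]],
                  columns=list('ABCDE'), index=list('ABCDE'))

# positions (row, column) of every non-zero cell, in row-major order
rows, cols = np.nonzero(df.to_numpy())

# translate the positions back to labels (illustrative names)
edges = pd.DataFrame({'source': df.index[rows], 'target': df.columns[cols]})
pairs = list(zip(edges['source'], edges['target']))
```

Because the matrix is symmetric, each undirected relation appears twice (A to B and B to A), which is harmless when building an undirected graph.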

Then build a graph with networkx and get its connected components (i.e. the groups of nodes that are linked to each other, directly or indirectly):

import networkx as nx

G = nx.from_pandas_edgelist(df2)

groups = nx.connected_components(G)

# NB. the above is a generator which gives
# [{'A', 'B', 'C'}, {'D', 'E'}]
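
If the frame is already a square 0/1 matrix with identical row and column labels, the edge list may not be needed at all. The following is a sketch, assuming that layout, that builds the graph directly with `nx.from_pandas_adjacency`; the names `G_adj` and `groups_list` are illustrative.

```python
import networkx as nx
import pandas as pd

df = pd.DataFrame([[0, 1, 1, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 1], [0, 0, 0, 1, 0]],
                  columns=list('ABCDE'), index=list('ABCDE'))

# non-zero cells become edges; every label becomes a node, even without edges
G_adj = nx.from_pandas_adjacency(df)

# materialize the generator so the groups can be inspected and reused
groups_list = [set(c) for c in nx.connected_components(G_adj)]
```

One practical difference: a node with no relations at all still shows up here as its own single-element group, whereas it never enters a graph built from the edge list.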

Finally, generate the list of all edge pairs within each group with itertools.permutations and create the desired output:

from itertools import permutations, chain

idx = pd.MultiIndex.from_tuples(chain.from_iterable(permutations(l, 2)
                                for l in nx.connected_components(G)))

# every listed pair gets a 1; pairs that are absent are filled with 0
out = pd.Series(1, index=idx).unstack(fill_value=0)

   A  B  C  D  E
A  0  1  1  0  0
B  1  0  1  0  0
C  1  1  0  0  0
D  0  0  0  0  1
E  0  0  0  1  0
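
For a table with several hundred elements, the same result can also be sketched without networkx, assuming SciPy is available: label each node with its component number and mark two nodes as related when their labels match. This is an alternative to the approach above, not the answer's own method; `labels`, `same` and `out_alt` are illustrative names.

```python
import numpy as np
import pandas as pd
from scipy.sparse.csgraph import connected_components

df = pd.DataFrame([[0, 1, 1, 0, 0], [1, 0, 0, 0, 0], [1, 0, 0, 0, 0],
                   [0, 0, 0, 0, 1], [0, 0, 0, 1, 0]],
                  columns=list('ABCDE'), index=list('ABCDE'))

# one integer label per node; nodes in the same component share a label
n_groups, labels = connected_components(df.to_numpy(), directed=False)

# True where the row node and the column node belong to the same component
same = labels[:, None] == labels[None, :]
np.fill_diagonal(same, False)  # a node is not marked as related to itself

out_alt = pd.DataFrame(same.astype(int), index=df.index, columns=df.columns)
```

This keeps the original row and column order and leaves nodes without any relation as all-zero rows and columns.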

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 mozway