How to do a DFS in a PySpark dataframe

I have a PySpark dataframe that relates different IDs to each other, but some IDs are only related at a higher degree: for example, ID A relates to B and B relates to C, but A does not relate to C directly.

ID | Relations
---+----------
X9 | [B9, C9]
B9 | [X9, D9]
C9 | [X9]
D9 | [B9]

I want to aggregate the relations of all degrees, like this:

ID | Relations
---+------------------
X9 | [B9, C9, D9]
B9 | [X9, D9]
C9 | [X9, B9, D9]
D9 | [B9, X9]
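The desired result is essentially the set of all IDs reachable from each ID (the transitive closure of the relation). As a minimal pure-Python sketch of that idea, assuming the adjacency lists from the first table (the function name `reachable` is illustrative, not an existing API), a DFS per ID looks like this. Note that full reachability would also connect B9 to C9 (via X9), one step more than the example table above shows:

```python
def reachable(graph, start):
    """Depth-first search collecting every ID reachable from `start`,
    excluding `start` itself."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, []):
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Adjacency lists copied from the input table.
graph = {"X9": ["B9", "C9"], "B9": ["X9", "D9"],
         "C9": ["X9"], "D9": ["B9"]}

print({node: sorted(reachable(graph, node)) for node in graph})
```

This works if the adjacency data fits on the driver (e.g. after a `collect()`), which is exactly the constraint the question is trying to avoid.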

where each ID's Relations column is the aggregation of all the relations of its related IDs. I thought about using a DFS algorithm, but I don't know how to look up a specific index without loading the dataframe into memory (e.g. with lookup). I also thought about doing multiple joins to aggregate the IDs' relations, but I don't know how to tell when to stop.
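On the "when to stop" question for the join approach: a common stopping rule is a fixed point, i.e. iterate until a self-join adds no new pairs. A hedged plain-Python sketch of that loop over a set of (id, related) pairs, mirroring what a Spark self-join would compute (the name `transitive_closure` is illustrative):

```python
def transitive_closure(pairs):
    """Repeatedly 'join' the pair set with itself until no new
    (a, c) pairs appear, i.e. until a fixed point is reached."""
    closure = set(pairs)
    while True:
        # Join step: (a, b) and (b, c) produce (a, c); drop self-loops.
        new = {(a, c)
               for a, b in closure
               for b2, c in closure
               if b == b2 and a != c}
        if new <= closure:   # nothing new was added: stop iterating
            return closure
        closure |= new

# Pairs derived from the input table above.
pairs = {("X9", "B9"), ("X9", "C9"), ("B9", "X9"),
         ("B9", "D9"), ("C9", "X9"), ("D9", "B9")}
closure = transitive_closure(pairs)
```

In PySpark the analogous loop is a self-join each iteration (e.g. `df.alias("a").join(df.alias("b"), col("a.related") == col("b.id"))`), followed by `union()` and `distinct()`, stopping when `count()` stops growing. For long chains, checkpointing between iterations is usually needed to keep the lineage from blowing up.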



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source