How to detect if any 2 columns in a Pandas data frame (with N columns) have a parent-child relationship?

I need to efficiently detect whether any pair of columns in a data frame has a parent/child relationship. The goal is to automatically detect all such relations in a data frame that is not known beforehand. Efficiency matters because I already know that on the order of N*(N-1) column-to-column comparisons need to be executed (one per ordered pair of columns). Below is a sample dataset for this tricky problem. Ideally there would be a function that returns all column pairs that constitute a parent-child relation, e.g.

resolve_parent_child_dependencies(df)

For the sample below (ignoring all the other columns) this should return

[(col1, col2), (col1, col3), (col1, col4), (col1, colN), (col2, col3), ..., (colN, col4)]

Here's the sample dataset.

col1 col2 col3 col4 ... colN
   A   A1  A11  foo ...    X
   A   A2  A21  bar ...    Y
   A   A2  A22  foo ...    X
   B   B1  B11  baz ...    Z
   B   B2  B21  qux ...    Z
   B   B3  B22  baz ...    Z
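
To make the expected behaviour concrete, here is a minimal sketch of what such a function might look like. It rests on one assumption that is not spelled out above: a column counts as a "parent" of another when every value of the child column maps to exactly one value of the parent column. It also assumes there are no missing values (NaN would need separate handling), and it simply checks every ordered pair, so it is a starting point rather than an efficient solution.

```python
from itertools import permutations

import pandas as pd


def resolve_parent_child_dependencies(df):
    """Return (parent, child) pairs where each child value has exactly one parent value."""
    # Distinct-value count per column (NaN is ignored by nunique).
    n_unique = df.nunique()
    pairs = []
    for parent, child in permutations(df.columns, 2):
        # Cheap pruning: a parent cannot have more distinct values than its child.
        if n_unique[parent] > n_unique[child]:
            continue
        # If each child value maps to a single parent value, the number of
        # distinct (parent, child) combinations equals the number of distinct
        # child values.
        n_combos = len(df[[parent, child]].drop_duplicates())
        if n_combos == n_unique[child]:
            pairs.append((parent, child))
    return pairs


# The sample dataset from above, without the elided columns.
df = pd.DataFrame({
    "col1": ["A", "A", "A", "B", "B", "B"],
    "col2": ["A1", "A2", "A2", "B1", "B2", "B3"],
    "col3": ["A11", "A21", "A22", "B11", "B21", "B22"],
    "col4": ["foo", "bar", "foo", "baz", "qux", "baz"],
    "colN": ["X", "Y", "X", "Z", "Z", "Z"],
})

print(resolve_parent_child_dependencies(df))
# [('col1', 'col2'), ('col1', 'col3'), ('col1', 'col4'), ('col1', 'colN'),
#  ('col2', 'col3'), ('col4', 'col3'), ('colN', 'col3'), ('colN', 'col4')]
```

Note that under this definition a column with a unique value in every row (col3 here) is trivially a child of every other column, and two columns with a one-to-one mapping would be reported in both directions.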

The purpose is to turn any pandas data frame, without much configuration, into a multi-dimensional database (the one I'm currently developing, https://tinyolap.com). The function is needed to auto-detect all the relationships so that the dimensional model can be derived from the data frame.

Any ideas to get me restarted? 100x thanks...



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
