How to detect if any 2 columns in a Pandas data frame (with N columns) have a parent-child relationship?
I need to efficiently detect whether any pair of columns in a data frame has a parent/child relationship. The purpose is to automatically detect all such relations in a data frame whose structure is not known beforehand. Efficiency matters because I already know that N*(N-1) column-to-column investigations need to be executed (every pair of columns, in both directions). Below is a sample dataset for this tricky problem. Ideally there would be a function that returns all column pairs that constitute a parent-child relation, e.g.
resolve_parent_child_dependencies(df)
For the sample below (ignoring all the other columns) this should return
[(col1, col2), (col1, col3), (col1, col4), (col1, colN), (col2, col3), ..., (colN, col4)]
Here's the sample dataset.
col1 col2 col3 col4 ... colN
A A1 A11 foo X
A A2 A21 bar Y
A A2 A22 foo X
B B1 B11 baz Z
B B2 B21 qux Z
B B3 B22 baz Z
The purpose is to turn any pandas data frame, without much configuration, into a multi-dimensional database (the one I'm currently developing, https://tinyolap.com). The function is needed to auto-detect all the relationships in order to derive the dimensional model from the data frame.
Any ideas to get me restarted? 100x thanks...
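As a starting point, here is a minimal sketch of how such a function might look. It rests on an assumption that the question does not state explicitly: a column is treated as the parent of another column when every distinct value of the child occurs together with exactly one value of the parent. One-to-one mappings are skipped (also an assumption), because they would be reported in both directions. The variable names `df` and `result` are only illustrative, and the `...` column of the sample is left out.

```python
from itertools import permutations

import pandas as pd


def resolve_parent_child_dependencies(df):
    """Return (parent, child) pairs where each child value has exactly one parent value."""
    # Count distinct values per column once; NaN is counted as a value of its own.
    n_unique = {col: df[col].nunique(dropna=False) for col in df.columns}

    pairs = []
    for parent, child in permutations(df.columns, 2):
        # Cheap pruning: a parent needs fewer distinct values than its child.
        # (Equal counts would mean a one-to-one mapping, which is skipped here.)
        if n_unique[parent] >= n_unique[child]:
            continue
        # If every child value maps to a single parent value, the number of
        # distinct (parent, child) combinations equals the number of child values.
        n_combinations = len(df[[parent, child]].drop_duplicates())
        if n_combinations == n_unique[child]:
            pairs.append((parent, child))
    return pairs


# The sample dataset from the question.
df = pd.DataFrame(
    {
        "col1": ["A", "A", "A", "B", "B", "B"],
        "col2": ["A1", "A2", "A2", "B1", "B2", "B3"],
        "col3": ["A11", "A21", "A22", "B11", "B21", "B22"],
        "col4": ["foo", "bar", "foo", "baz", "qux", "baz"],
        "colN": ["X", "Y", "X", "Z", "Z", "Z"],
    }
)

result = resolve_parent_child_dependencies(df)
print(result)
```

On the sample this reports (col1, col2), (col1, col3), (col1, col4), (col1, colN), (col2, col3), (col4, col3), (colN, col3) and (colN, col4). Note that col3 is unique per row, so under this definition every lower-cardinality column counts as its parent; whether such pairs are wanted depends on the dimensional model. The cardinality check avoids the more expensive `drop_duplicates` call for pairs that cannot qualify, but the worst case is still N*(N-1) pairwise checks.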
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow