I am new to Python and stuck with building a hierarchy out of a relational dataset.
It would be of immense help if someone has an idea on how to proceed with this.
I have a relational data-set with data like
_currentnode, childnode_ root, child1 child1, leaf2 child1, child3 child1, leaf4 child3, leaf5 child3, leaf6
so-on. I am looking for some python or pyspark code to
build a hierarchy dataframe like below
_level1, level2, level3, level4_ root, child1, leaf2, null root, child1, child3, leaf5 root, child1, child3, leaf6 root, child1, leaf4, null
The data is alpha-numerics and is a huge dataset[~50mil records].
Also, the root of the hierarchy is known and can be hardwired in the code.
So in the example, above, the root of the hierarchy is 'root'.