Channel: Build a hierarchy from a relational data-set using Pyspark - Stack Overflow

Build a hierarchy from a relational data-set using Pyspark


I am new to Python and am stuck building a hierarchy out of a relational dataset.
It would be a great help if someone has an idea of how to proceed with this.

I have a relational data-set with data like

currentnode,  childnode
root,         child1
child1,       leaf2
child1,       child3
child1,       leaf4
child3,       leaf5
child3,       leaf6

and so on. I am looking for some Python or PySpark code to
build a hierarchy DataFrame like the one below:

level1,  level2,  level3,  level4
root,    child1,  leaf2,   null
root,    child1,  child3,  leaf5
root,    child1,  child3,  leaf6
root,    child1,  leaf4,   null

The data is alphanumeric, and the dataset is huge (~50 million records).
Also, the root of the hierarchy is known and can be hard-coded.
In the example above, the root of the hierarchy is 'root'.
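
To make the target transformation concrete, here is a minimal sketch in plain Python of one possible way to expand the (currentnode, childnode) pairs into root-to-leaf rows. The function name `build_hierarchy` and the variables `edges` and `rows` are hypothetical names chosen for illustration. The sketch assumes the data is a tree with no cycles and that the edge list fits in memory, which will not hold for the full dataset.

```python
from collections import defaultdict


def build_hierarchy(edges, root):
    """Expand (currentnode, childnode) pairs into root-to-leaf rows.

    Assumes the edges form a tree (no cycles); shorter paths are
    padded with None so that every row has the same number of levels.
    """
    # Map each parent to its children, preserving input order.
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    paths = []
    stack = [(root,)]  # iterative depth-first walk, avoids recursion limits
    while stack:
        path = stack.pop()
        kids = children.get(path[-1], [])
        if not kids:
            # The last node has no children, so this path is complete.
            paths.append(path)
        else:
            # Push in reverse so children are visited in input order.
            for kid in reversed(kids):
                stack.append(path + (kid,))

    width = max(len(p) for p in paths)
    return [p + (None,) * (width - len(p)) for p in paths]


edges = [
    ("root", "child1"),
    ("child1", "leaf2"),
    ("child1", "child3"),
    ("child1", "leaf4"),
    ("child3", "leaf5"),
    ("child3", "leaf6"),
]
rows = build_hierarchy(edges, "root")
for row in rows:
    print(row)
```

At ~50 million records the same level-by-level idea would probably need to be expressed in PySpark instead, for example as repeated left joins of a path DataFrame against the edge DataFrame on its deepest level column until no new children appear. That variant is not shown here.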

