How can I configure the Databricks display function to visualize a PySpark decision tree with the original feature names?

My objective is to visualize a PySpark regression decision tree in Databricks.

Databricks provides a display function, display(decision_tree), that renders a decision tree (https://docs.databricks.com/notebooks/visualizations/index.html#id15).

[Image: sample decision tree printed by display(decision_tree) in Databricks]

A PySpark decision tree requires all features to be assembled into a single vector column, called "features" by default. As a result, the tree shows the features as feature 1, feature 2, etc.

Let's say the original feature names are in a list called cols. Is there a way to replace the feature names in the sample decision tree with the original feature names when visualizing it with display(decision_tree)?



Solution 1:[1]

I could not find a way to change the feature names shown in the sample decision tree.
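
If a text view of the tree is enough, one option is to rename the features in the model's debug string. The following is a minimal sketch, not a Databricks feature: it assumes the text comes from the model's toDebugString and that each "feature N" index is a zero-based position in the list cols. The helper rename_features, the hand-written sample text, and the column names "age" and "income" are hypothetical and only illustrate the idea.

```python
import re


def rename_features(debug_string, cols):
    """Replace each 'feature N' with cols[N]; unknown indices are left as-is."""
    def swap(match):
        index = int(match.group(1))
        return cols[index] if index < len(cols) else match.group(0)

    return re.sub(r"\bfeature (\d+)\b", swap, debug_string)


# Hand-written text in the shape of toDebugString output (not real model output).
sample = (
    "DecisionTreeRegressionModel: depth=2, numNodes=5, numFeatures=2\n"
    "  If (feature 0 <= 0.5)\n"
    "   If (feature 1 <= 30.0)\n"
    "    Predict: 1.0\n"
    "   Else (feature 1 > 30.0)\n"
    "    Predict: 2.0\n"
    "  Else (feature 0 > 0.5)\n"
    "   Predict: 3.0\n"
)

cols = ["age", "income"]  # hypothetical original feature names
renamed = rename_features(sample, cols)
print(renamed)
```

With a fitted model, the same helper could be called on the model's debug string instead of the sample text.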

As a workaround, you can draw the tree yourself with the graphviz library.

Install the graphviz library (version 0.19.1):

pip install graphviz==0.19.1

Create a graph object:

>>> import graphviz
>>> dot = graphviz.Digraph(comment='The Round Table')
>>> dot
<graphviz.graphs.Digraph object at 0x...>

Add nodes and edges:

>>> dot.node('A', 'King Arthur')
>>> dot.node('B', 'Sir Bedevere the Wise')
>>> dot.node('L', 'Sir Lancelot the Brave')

>>> dot.edges(['AB', 'AL'])
>>> dot.edge('B', 'L', constraint='false')

Check the generated source code:

>>> print(dot.source)
// The Round Table
digraph {
    A [label="King Arthur"]
    B [label="Sir Bedevere the Wise"]
    L [label="Sir Lancelot the Brave"]
    A -> B
    A -> L
    B -> L [constraint=false]
}

Save and render the source code:

>>> dot.render('doctest-output/round-table.gv', view=True)
'doctest-output/round-table.gv.pdf'

Reference: https://pypi.org/project/graphviz/
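
To connect the graphviz steps back to the decision tree, the sketch below turns debug-string text into Graphviz DOT source, with the original feature names in the node labels. It is an illustration under assumptions, not the answer author's code: it assumes the If / Else / Predict layout of the model's toDebugString, zero-based feature indices into cols, and uses a hypothetical helper tree_to_dot with a hand-written sample and made-up column names.

```python
import re


def tree_to_dot(debug_string, cols):
    """Convert toDebugString-style text into DOT source with named features."""
    # Drop the header line and blank lines; keep the If / Else / Predict rows.
    rows = [r.strip() for r in debug_string.splitlines()[1:] if r.strip()]
    out = ["digraph {"]
    counter = 0

    def name(text):
        # Replace 'feature N' with the N-th entry of cols.
        return re.sub(r"\bfeature (\d+)\b", lambda m: cols[int(m.group(1))], text)

    def walk(pos):
        """Emit the subtree starting at rows[pos]; return (node id, next position)."""
        nonlocal counter
        node_id = f"n{counter}"
        counter += 1
        row = rows[pos]
        if row.startswith("Predict"):
            out.append(f'    {node_id} [label="{row}" shape=box]')
            return node_id, pos + 1
        condition = name(row[len("If ("):-1])
        out.append(f'    {node_id} [label="{condition}"]')
        yes_id, pos = walk(pos + 1)
        # rows[pos] is now the matching "Else (...)" row; its subtree follows it.
        no_id, pos = walk(pos + 1)
        out.append(f'    {node_id} -> {yes_id} [label="yes"]')
        out.append(f'    {node_id} -> {no_id} [label="no"]')
        return node_id, pos

    walk(0)
    out.append("}")
    return "\n".join(out)


# Hand-written text in the shape of toDebugString output (not real model output).
sample = (
    "DecisionTreeRegressionModel: depth=2, numNodes=5, numFeatures=2\n"
    "  If (feature 0 <= 0.5)\n"
    "   If (feature 1 <= 30.0)\n"
    "    Predict: 1.0\n"
    "   Else (feature 1 > 30.0)\n"
    "    Predict: 2.0\n"
    "  Else (feature 0 > 0.5)\n"
    "   Predict: 3.0\n"
)

cols = ["age", "income"]  # hypothetical original feature names
dot_source = tree_to_dot(sample, cols)
print(dot_source)
```

The resulting string can be passed to graphviz.Source(dot_source) and rendered like the dot object above. Rendering requires the Graphviz binaries to be available on the cluster, which is an assumption about the environment.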

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 AbhishekKhandave-MT