Best First Tree Builder

In the _tree.pyx file, there is a very interesting implementation of a type of TreeBuilder called the BestFirstTreeBuilder. The __cinit__ is very boilerplate but all the intelligence actually sits in the build method.

In the BestFirstTreeBuilder, they used the data structure PriorityHeap to implement the priority binary tree. There are two key variables, the max_split_nodes and max_leaf_nodes which are being used to declare the initial capacity.

Screen Shot 2019-10-27 at 9.52.43 AM

In a complete binary tree, if root is at the depth 0, then at depth n, you will have 2^(n) total leaves at the bottom of the tree. And all the nodes above will be split nodes. And in total, you will have 1+2+4 … = 2^0 + 2^1 + 2^2 + … + 2^(n-1) = 2^n – 1. That is why max_split_nodes = max_leaf_nodes – 1 and initial capacity is the sum of those two. Screen Shot 2019-10-27 at 9.58.24 AM

Then the rest of the build method will recursively build the tree following the “priority”.

Screen Shot 2019-10-27 at 10.10.30 AM

There are two methods which we will cover here. self._add_split_node will compute the split node and _add_to_frontier will add the split to the frontier.

 

 

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s