In the _tree.pyx file, there is a very interesting implementation of a type of TreeBuilder called the BestFirstTreeBuilder. The __cinit__ is very boilerplate but all the intelligence actually sits in the build method.
In the BestFirstTreeBuilder, they used the data structure PriorityHeap to implement the priority binary tree. There are two key variables, the max_split_nodes and max_leaf_nodes which are being used to declare the initial capacity.
In a complete binary tree, if root is at the depth 0, then at depth n, you will have 2^(n) total leaves at the bottom of the tree. And all the nodes above will be split nodes. And in total, you will have 1+2+4 … = 2^0 + 2^1 + 2^2 + … + 2^(n-1) = 2^n – 1. That is why max_split_nodes = max_leaf_nodes – 1 and initial capacity is the sum of those two.
Then the rest of the build method will recursively build the tree following the “priority”.
There are two methods which we will cover here. self._add_split_node will compute the split node and _add_to_frontier will add the split to the frontier.