[ML] Avoid computing best splits for nodes which we know can't split

We only split nodes for which the `gain > small positive constant`. If there were a cheap upper bound available for `gain` we could avoid the expensive operation of computing the various split derivatives and finding the best split. Handily there is...

The penalty from regularisation is known in advance: `"size penalty" + "depth penalty"`. We also know that an optimal split must come from amongst the `n - 1` possible 2-splits one obtains from sorting the node's values by increasing target. This can be used to cheaply upper bound the maximum reduction in loss. The necessary statistics can be computed once and passed down from the parent node when it is split. 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ML] Avoid computing best splits for nodes which we know can't split #1313

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[ML] Avoid computing best splits for nodes which we know can't split #1313

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions