Skip to content

[ML] Avoid computing best splits for nodes which we know can't split #1313

@tveasey

Description

@tveasey

We only split nodes for which the gain > small positive constant. If there were a cheap upper bound available for gain we could avoid the expensive operation of computing the various split derivatives and finding the best split. Handily there is...

The penalty from regularisation is known in advance: "size penalty" + "depth penalty". We also know that an optimal split must come from amongst the n - 1 possible 2-splits one obtains from sorting the node's values by increasing target. This can be used to cheaply upper bound the maximum reduction in loss. The necessary statistics can be computed once and passed down from the parent node when it is split.

Metadata

Metadata

Assignees

Type

No type
No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions