We only split nodes for which the gain > small positive constant. If there were a cheap upper bound available for gain we could avoid the expensive operation of computing the various split derivatives and finding the best split. Handily there is...
The penalty from regularisation is known in advance: "size penalty" + "depth penalty". We also know that an optimal split must come from amongst the n - 1 possible 2-splits one obtains from sorting the node's values by increasing target. This can be used to cheaply upper bound the maximum reduction in loss. The necessary statistics can be computed once and passed down from the parent node when it is split.