Skip to content

Fix selection direction, scorer handling, and fit kwargs; resolve sktime doctest#182

Merged
SimonBlanke merged 21 commits into
mainfrom
final-fixes-v5
Sep 14, 2025
Merged

Fix selection direction, scorer handling, and fit kwargs; resolve sktime doctest#182
SimonBlanke merged 21 commits into
mainfrom
final-fixes-v5

Conversation

@SimonBlanke

@SimonBlanke SimonBlanke commented Sep 11, 2025

Copy link
Copy Markdown
Collaborator

Problems

  • Grid/Random search picked the wrong best params (argmin on signed scores) and didn’t consistently set best_* attributes.
  • Sklearn fit kwargs were silently dropped by the verify_fit decorator.
  • Doctest failed in sktime classification by accessing private scorer._score_func (not present for _PassthroughScorer).
  • “mixed” experiment objectives had undefined score sign behavior.
  • _score_params returned experiment(**params), which calls score(). score() applies a sign flip based on the experiment tag, so Grid/Random search were selecting on “signed” scores and then also making assumptions about
    direction (previously hardcoded argmin).

Solutions

  • Grid/Random search: select min/max based on experiment tag using raw evaluate() values; set best_params_, best_index_, and compute signed best_score_ via experiment.score(...).
  • Centralized scorer handling: _coerce_to_scorer now attaches a safe ._metric_func fallback (e.g., accuracy/r2) and robust sign inference.
  • Sklearn decorator: verify_fit now preserves *args, **kwargs and marks fit success.
  • BaseExperiment: score() raises on "mixed" to avoid undefined behavior (users should define a concrete direction or override).
  • _score_params now returns the raw evaluate() value (float), not the signed score(). Selecting the best config should use raw objective values and then choose min or max based on the tag (higher/lower). This removes ambiguity, avoids double sign logic, and makes selection correct and explicit. We still compute the public best_score_ via experiment.score(best_params) so external consumers see the standardized “higher-is-better” score_.

Comment thread src/hyperactive/opt/gridsearch/_sk.py Outdated

best_index = np.argmin(scores)
# choose selection direction based on experiment tag
hib = experiment.get_tag("property:higher_or_lower_is_better", "higher")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

something must be wrong, I do not think a case distinction should happen here. Per design, score always returns "higher is better"

Comment thread src/hyperactive/opt/random_search.py Outdated
)

best_index = int(np.argmin(scores)) # lower-is-better convention
hib = experiment.get_tag("property:higher_or_lower_is_better", "higher")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, no case distinction should happen here

@fkiraly fkiraly left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this is correct - are we making an error of sign somewhere? At least, in the optimizers, nothing should change in my opinion, as per the design, score always returns a "higher is better" score.

What I would also suggest: wherever you noticed that the wrong parameters were selected, let's add this as a test case. General principle, if a bug gets fixed, and there was no test failure prior, a test should be added that fails before and runs after. I presume, it would be using a "naive experiment" where we know what the best parameters are? Or one of the toy test functions?

# store public attributes
self.best_index_ = best_index
self.best_score_ = scores[best_index]
self.best_score_ = float(scores[best_index])

@fkiraly fkiraly Sep 11, 2025

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this necessary if the score function internally also does float coercion? I think if we have contracts, we should rely on contracts (instead of anticipating non-conformance)

@SimonBlanke SimonBlanke requested a review from fkiraly September 11, 2025 18:14
metric_func = getattr(scorer, "_score_func", None)
if metric_func is None:
metric_func = _default_metric_for(estimator)
try:

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this feels risky, can we avoid this?

@fkiraly fkiraly left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

One issue that I have with _coerce_to_scorer is that its guarantees are no longer seem to be met, i.e., that it returns an sklearn scorer, is this true? The try/except block strikes me as particularly hacky, what are we trying to "fix"?

One option to avoid this could be to rework the metric and wrap things in a stable scorer interface that is always guaranteed to work, that way the coupling (that you are probably trying to address) that has optimizers reach into _scoring etc is no longer needed.

How about that?

@fkiraly

fkiraly commented Sep 11, 2025

Copy link
Copy Markdown
Collaborator

tried to refactor it - feel free to revert if you do not like it

@SimonBlanke SimonBlanke marked this pull request as ready for review September 13, 2025 07:33
@fkiraly fkiraly self-requested a review September 13, 2025 08:55
@fkiraly

fkiraly commented Sep 13, 2025

Copy link
Copy Markdown
Collaborator

(from my side, this is all fine now)

@fkiraly fkiraly left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest:

  • check what is happening with the jupyter notebook - why is it reformatted?
  • I would recommend we add a test for the failing sign. I think doing grid search on one of the toy datasets and checking explicitly for optimal parameters should ensure the sign is correct.

@SimonBlanke SimonBlanke requested a review from fkiraly September 13, 2025 14:22
f"Optimizer should select argmax of standardized score. "
f"Expected {good}, got {best_params}."
)

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think, you can avoid lots of repetition by using set_params, i.e., inst = object_instance.clone().set_params({"experiment": exp, "param_space": see_below}).

Besides this, can we avoid hard-coding a lot of these parameters per estimator? This is not too extensible. It is fine if we use it primarily for checking the sign, but I wonder whether we can avoid all the hard coding.

@SimonBlanke SimonBlanke requested a review from fkiraly September 14, 2025 07:41

@fkiraly fkiraly left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, great!

There is still a little bit of branching, so I will check (in a separate PR) to make things more extensible.

Also, there are code formatting issues, see below in code-quality, did you see them?

@SimonBlanke SimonBlanke merged commit 63e99cb into main Sep 14, 2025
41 checks passed
@SimonBlanke SimonBlanke deleted the final-fixes-v5 branch December 3, 2025 06:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants