Fix selection direction, scorer handling, and fit kwargs; resolve sktime doctest by SimonBlanke · Pull Request #182 · hyperactive-project/Hyperactive

SimonBlanke · 2025-09-11T16:05:38Z

Problems

Grid/Random search picked the wrong best params (argmin on signed scores) and didn’t consistently set best_* attributes.
Sklearn fit kwargs were silently dropped by the verify_fit decorator.
The sktime classification doctest failed because it accessed the private scorer._score_func attribute, which _PassthroughScorer does not have.
“mixed” experiment objectives had undefined score sign behavior.
_score_params returned experiment(**params), which calls score(). Because score() applies a sign flip based on the experiment tag, Grid/Random search selected on “signed” scores while also assuming a direction (the previously hardcoded argmin).
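To make the selection bug concrete, here is a minimal sketch in plain Python. The score values and candidate parameters are made up for illustration, and argmin/argmax are small stand-ins for the NumPy functions; none of this is Hyperactive's actual code.

```python
def argmin(values):
    """Index of the smallest value (stand-in for np.argmin)."""
    return min(range(len(values)), key=values.__getitem__)


def argmax(values):
    """Index of the largest value (stand-in for np.argmax)."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical candidates and their accuracies (higher is better).
candidates = [{"C": 0.1}, {"C": 1.0}, {"C": 10.0}]
signed_scores = [0.71, 0.93, 0.85]

# A hardcoded argmin on a higher-is-better score picks the worst candidate.
wrong = candidates[argmin(signed_scores)]
# argmax picks the candidate with the best accuracy.
right = candidates[argmax(signed_scores)]

print(wrong, right)
```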

Solutions

Grid/Random search: select min/max based on experiment tag using raw evaluate() values; set best_params_, best_index_, and compute signed best_score_ via experiment.score(...).
Centralized scorer handling: _coerce_to_scorer now attaches a safe ._metric_func fallback (e.g., accuracy/r2) and robust sign inference.
Sklearn decorator: verify_fit now preserves *args, **kwargs and marks fit success.
BaseExperiment: score() raises on "mixed" to avoid undefined behavior (users should define a concrete direction or override).
_score_params now returns the raw evaluate() value (float), not the signed score(). Selecting the best config should use raw objective values and then choose min or max based on the tag (higher/lower). This removes ambiguity, avoids double sign logic, and makes selection correct and explicit. We still compute the public best_score_ via experiment.score(best_params), so external consumers see the standardized “higher-is-better” score.
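The decorator fix from the list above can be sketched as follows. This is an illustrative stand-in, not the real verify_fit: the fit_successful attribute name and the ToyEstimator class are assumptions made up for the example.

```python
import functools


def verify_fit(fit_method):
    """Wrap ``fit`` so all arguments are forwarded and success is recorded."""

    @functools.wraps(fit_method)
    def wrapper(self, *args, **kwargs):
        self.fit_successful = False  # hypothetical attribute name
        result = fit_method(self, *args, **kwargs)  # forward everything
        self.fit_successful = True
        return result

    return wrapper


class ToyEstimator:
    """Hypothetical estimator that records the fit kwargs it received."""

    @verify_fit
    def fit(self, X, y, **fit_params):
        self.received_ = fit_params
        return self


est = ToyEstimator().fit([[0], [1]], [0, 1], sample_weight=[1.0, 2.0])
print(est.received_, est.fit_successful)
```

A wrapper declared as wrapper(self, X, y) without **kwargs would reject or lose sample_weight here, which is the kind of silent drop the PR description refers to.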

fkiraly · 2025-09-11T16:12:57Z


-        best_index = np.argmin(scores)
+        # choose selection direction based on experiment tag
+        hib = experiment.get_tag("property:higher_or_lower_is_better", "higher")


something must be wrong, I do not think a case distinction should happen here. Per design, score always returns "higher is better"
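The design referred to here can be sketched with a toy experiment class. ToyExperiment and its constructor are hypothetical; only the tag name, the evaluate()/score() split, and the error on "mixed" come from the PR description.

```python
class ToyExperiment:
    """Hypothetical experiment: evaluate() is raw, score() is standardized."""

    def __init__(self, func, higher_or_lower_is_better):
        self._func = func
        self._tags = {
            "property:higher_or_lower_is_better": higher_or_lower_is_better
        }

    def get_tag(self, name, default=None):
        return self._tags.get(name, default)

    def evaluate(self, params):
        """Raw objective value, in the objective's own direction."""
        return float(self._func(**params))

    def score(self, params):
        """Standardized value: always higher is better."""
        direction = self.get_tag("property:higher_or_lower_is_better")
        if direction == "higher":
            return self.evaluate(params)
        if direction == "lower":
            return -self.evaluate(params)
        raise ValueError(f"no defined score sign for direction {direction!r}")


loss = ToyExperiment(lambda x: (x - 2) ** 2, "lower")
grid = [{"x": 0}, {"x": 2}, {"x": 5}]
scores = [loss.score(p) for p in grid]

# With a standardized score, the optimizer needs no case distinction:
# it always takes the argmax.
best = grid[max(range(len(grid)), key=scores.__getitem__)]
print(best)
```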

fkiraly · 2025-09-11T16:13:15Z

        )

-        best_index = int(np.argmin(scores))  # lower-is-better convention
+        hib = experiment.get_tag("property:higher_or_lower_is_better", "higher")


same here, no case distinction should happen here

fkiraly

I am not sure if this is correct - are we making an error of sign somewhere? At least, in the optimizers, nothing should change in my opinion, as per the design, score always returns a "higher is better" score.

What I would also suggest: wherever you noticed that the wrong parameters were selected, let's add this as a test case. General principle, if a bug gets fixed, and there was no test failure prior, a test should be added that fails before and runs after. I presume, it would be using a "naive experiment" where we know what the best parameters are? Or one of the toy test functions?
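A regression test along these lines might look like the sketch below. It uses a toy sphere function whose optimum is known; toy_grid_search is a hypothetical stand-in for the real optimizer, not Hyperactive's API.

```python
def sphere_score(params):
    """Standardized (higher-is-better) score of a toy sphere function."""
    return -(params["x"] ** 2 + params["y"] ** 2)


def toy_grid_search(score, grid, pick=max):
    """Hypothetical optimizer: score every candidate and pick one index."""
    scores = [score(p) for p in grid]
    best_index = pick(range(len(grid)), key=scores.__getitem__)
    return grid[best_index]


def test_grid_search_selects_known_optimum():
    grid = [{"x": x, "y": y} for x in (-1, 0, 1) for y in (-1, 0, 1)]
    # The sphere function has its optimum at the origin.
    assert toy_grid_search(sphere_score, grid) == {"x": 0, "y": 0}


test_grid_search_selects_known_optimum()
```

Passing pick=min reproduces the old argmin behavior, and the same assertion then fails, which is the "fails before, passes after" property asked for.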

fkiraly · 2025-09-11T18:02:40Z

+        # store public attributes
        self.best_index_ = best_index
-        self.best_score_ = scores[best_index]
+        self.best_score_ = float(scores[best_index])


is this necessary if the score function internally also does float coercion? I think if we have contracts, we should rely on contracts (instead of anticipating non-conformance)

fkiraly · 2025-09-11T18:17:40Z

+    metric_func = getattr(scorer, "_score_func", None)
+    if metric_func is None:
+        metric_func = _default_metric_for(estimator)
+    try:


this feels risky, can we avoid this?

fkiraly

Looks good!

One issue that I have with _coerce_to_scorer is that its guarantees no longer seem to be met, i.e., that it returns an sklearn scorer. Is this true? The try/except block strikes me as particularly hacky. What are we trying to "fix"?

One option to avoid this could be to rework the metric and wrap things in a stable scorer interface that is always guaranteed to work, that way the coupling (that you are probably trying to address) that has optimizers reach into _scoring etc is no longer needed.

How about that?
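One way to read this suggestion is sketched below. StableScorer, its attribute names, and the toy estimator are assumptions for illustration and are not the refactor that ended up in the PR; the only borrowed convention is that an sklearn scorer is called as scorer(estimator, X, y).

```python
class StableScorer:
    """Hypothetical wrapper with a small, public, always-present interface."""

    def __init__(self, metric_func, greater_is_better=True):
        self.metric_func = metric_func
        self.sign = 1 if greater_is_better else -1

    def __call__(self, estimator, X, y):
        """Return a higher-is-better score, like an sklearn scorer call."""
        return self.sign * self.metric_func(y, estimator.predict(X))


def accuracy(y_true, y_pred):
    """Fraction of matching labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


class AlwaysZero:
    """Toy estimator that predicts 0 for every sample."""

    def predict(self, X):
        return [0 for _ in X]


scorer = StableScorer(accuracy)
value = scorer(AlwaysZero(), [[1], [2], [3], [4]], [0, 0, 1, 0])
print(value)
```

Because metric_func and sign are public and always set, callers would not need getattr fallbacks or a try/except around private attributes.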

fkiraly · 2025-09-11T21:12:09Z

tried to refactor it - feel free to revert if you do not like it

fkiraly · 2025-09-13T08:55:37Z

(from my side, this is all fine now)

fkiraly

I would suggest:

check what is happening with the jupyter notebook - why is it reformatted?
I would recommend we add a test for the failing sign. I think doing grid search on one of the toy datasets and checking explicitly for optimal parameters should ensure the sign is correct.

fkiraly · 2025-09-13T14:56:10Z

+                f"Optimizer should select argmax of standardized score. "
+                f"Expected {good}, got {best_params}."
+            )
+


I think you can avoid lots of repetition by using set_params, i.e., inst = object_instance.clone().set_params(**{"experiment": exp, "param_space": see_below}).

Besides this, can we avoid hard-coding a lot of these parameters per estimator? This is not too extensible. It is fine if we use it primarily for checking the sign, but I wonder whether we can avoid all the hard coding.
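The chaining idea can be sketched with a toy class. ToyOptimizer is hypothetical and only mimics the clone()/set_params() pattern; the config values are placeholders.

```python
import copy


class ToyOptimizer:
    """Hypothetical optimizer with clone/set_params methods."""

    def __init__(self, experiment=None, param_space=None):
        self.experiment = experiment
        self.param_space = param_space

    def clone(self):
        """Return an independent copy so the template stays untouched."""
        return copy.deepcopy(self)

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self  # returning self is what makes chaining possible


# One shared config instead of per-estimator hard-coded constructor calls.
common = {"experiment": "toy-experiment", "param_space": {"x": [0, 1, 2]}}
template = ToyOptimizer()
inst = template.clone().set_params(**common)
print(inst.param_space, template.param_space)
```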

fkiraly

Ok, great!

There is still a little bit of branching, so I will check (in a separate PR) to make things more extensible.

Also, there are code formatting issues, see below in code-quality, did you see them?

SimonBlanke added 7 commits September 10, 2025 19:30

v5rc1

f81bf2c

v5.0.0

a70a1aa

fix string

35e03d3

handle lower and higher is better

e1b2b04

handle mixed and other values

4514a95

integration fixes

b27b32b

fix for when "_score_func" attribute does not exist

b748440

fkiraly reviewed Sep 11, 2025


fkiraly requested changes Sep 11, 2025


SimonBlanke added 2 commits September 11, 2025 18:51

revert changes for sign handling

fb84674

add test

ee73bcd

fkiraly reviewed Sep 11, 2025


pre-commit

c9d63ec

SimonBlanke requested a review from fkiraly September 11, 2025 18:14

fkiraly reviewed Sep 11, 2025


fkiraly requested changes Sep 11, 2025


fkiraly added 3 commits September 11, 2025 22:01

Update test_examples.py

9f357a2

sign logic

e3bcfae

lint

c0d90eb

SimonBlanke marked this pull request as ready for review September 13, 2025 07:33

fkiraly self-requested a review September 13, 2025 08:55

fkiraly requested changes Sep 13, 2025


SimonBlanke added 5 commits September 13, 2025 16:02

add test

cb71317

add test

129846c

fix test

549cbfc

remove tests

047a609

add sklearn to tests

e886dd5

SimonBlanke requested a review from fkiraly September 13, 2025 14:22

fkiraly reviewed Sep 13, 2025


SimonBlanke added 2 commits September 14, 2025 09:23

clone().set_params chaining

c88d848

common config keys to reduce hard-coding

0b302cd

SimonBlanke requested a review from fkiraly September 14, 2025 07:41

fkiraly approved these changes Sep 14, 2025


pre-commit

8cad42c

SimonBlanke merged commit 63e99cb into main Sep 14, 2025
41 checks passed

SimonBlanke deleted the final-fixes-v5 branch December 3, 2025 06:52

SimonBlanke merged 21 commits into main from final-fixes-v5
