chore(weave): Remote scorer class by mscavezze-cw · Pull Request #6688 · wandb/weave

mscavezze-cw · 2026-04-22T23:57:40Z

Description

WB-33547

Add Remote Scorer class

Testing

unit tests

codecov · 2026-04-22T23:58:00Z

Codecov Report

❌ Patch coverage is 63.41463% with 15 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
weave/scorers/remote_scorer.py	55.88%	15 Missing ⚠️

wandbot-3000 · 2026-04-23T00:08:25Z

Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=4ca0f5e809cb9c71ce9373c905b8d9ae6ce4a957

neutralino1 · 2026-04-23T19:57:01Z

+    The Python SDK stores configuration for publication with monitors. Remote
+    scoring is **invoked from the Weave scoring worker**, which performs the
+    outbound ``POST`` and feedback writes; it does not run by calling
+    :meth:`score` in user code.


I wonder if this class should also be the one users use to encapsulate the actual scoring logic that will run inside their server. It could bring two advantages:

- If we eventually do offer a .serve() API, it could be defined on this class.
- The scorer object would be persisted as an object in our backend, and users could select it from a drop-down instead of configuring it by hand in the UI form.

Not a blocker here, just food for thought.
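
For readers following the thread, here is a minimal sketch of the split the docstring above describes: the SDK-side object only carries configuration, and calling `score` locally is not how scoring happens. Everything in it (`RemoteScorerSketch`, `endpoint_url`, `to_publishable`) is hypothetical and is not the code added in this PR; in particular, having `score` raise is an assumption made for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class RemoteScorerSketch:
    """Hypothetical config-only scorer; not the class added in this PR."""

    name: str
    endpoint_url: str  # assumed field: where the worker would send its POST

    def to_publishable(self) -> dict:
        # The SDK side only serializes configuration for a monitor to pick up.
        return asdict(self)

    def score(self, *args, **kwargs):
        # Assumption: local invocation is rejected because the scoring worker,
        # not user code, performs the outbound request and feedback writes.
        raise NotImplementedError(
            "Remote scoring runs in the scoring worker, not in user code."
        )
```

Keeping the object free of any HTTP client code is what lets it be published and persisted as plain configuration.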

neutralino1 · 2026-04-23T19:59:12Z

+        The token string, or None if unset. Whitespace-only values are treated as
+        None.
+    """
+    raw = os.environ.get("WF_SCORING_WORKER_REMOTE_SCORER_BEARER_TOKEN")


These WF_ prefixes really bug me. They don't mean anything anymore. Should we avoid them going forward?

mscavezze-cw

Thanks for the comments!

mscavezze-cw · 2026-04-24T15:51:44Z

+    The Python SDK stores configuration for publication with monitors. Remote
+    scoring is **invoked from the Weave scoring worker**, which performs the
+    outbound ``POST`` and feedback writes; it does not run by calling
+    :meth:`score` in user code.


That seems like a reasonable idea worth exploring. It could be elegant if it works out. I imagine that .serve could live in this class, and then customers could make subclasses with their own custom logic.

I probably wouldn't put the implementation that makes the HTTP call here (some reasons are in this comment, and this spike PR gives a little more context). Some of that reasoning might apply to the .serve() API, but to a lesser extent. I'd like to think about that a little more.

I think it's compatible with my current plan. A couple things might change, but not in a major way.

I do have a drop-down scorer selector in the works, just for switching between LLM and Remote. Adding additional scorers to it could be a natural evolution.

Package dependencies could be an issue, but as we discussed before, those can probably be solved.
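
To make the subclassing idea concrete, here is a rough sketch of what "customers make subclasses with their own custom logic" could look like. It is purely illustrative: no `.serve()` API exists in this PR, `ServableScorerSketch`, `handle`, and `LengthScorer` are invented names, and the `{"output": ...}` request shape is an assumption rather than the worker's real payload.

```python
import json
from abc import ABC, abstractmethod
from typing import Any


class ServableScorerSketch(ABC):
    """Hypothetical base class; `.serve()` does not exist in this PR."""

    @abstractmethod
    def score(self, output: Any) -> dict:
        """Customer-defined scoring logic."""

    def handle(self, body: bytes) -> bytes:
        # Stand-in for what a future `.serve()` might do per request:
        # decode JSON, run the subclass logic, encode the result.
        payload = json.loads(body)
        result = self.score(payload["output"])  # assumed payload key
        return json.dumps(result).encode("utf-8")


class LengthScorer(ServableScorerSketch):
    """Example customer subclass with trivial custom logic."""

    def score(self, output: Any) -> dict:
        return {"length": len(str(output))}
```

A real `.serve()` would wrap something like `handle` in an HTTP server, which is where the package-dependency concern mentioned above would come in.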

mscavezze-cw · 2026-04-24T15:56:55Z

+        The token string, or None if unset. Whitespace-only values are treated as
+        None.
+    """
+    raw = os.environ.get("WF_SCORING_WORKER_REMOTE_SCORER_BEARER_TOKEN")


They don't carry their original meaning any more. However, they may serve to identify these variables as a group, since they all share the WF_ prefix. There can be value in consistency over multiple evolving forms of correctness.

My opinions here are not too strong. I'm curious if there's a particular reason why these bug you so much.
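
For context, the behavior described in the docstring under discussion (unset or whitespace-only values become None) can be sketched as below. Only the environment variable name comes from the diff; the function name `read_bearer_token` is hypothetical, and stripping surrounding whitespace from a non-empty token is an assumption about the real implementation.

```python
import os
from typing import Optional

# Name taken from the diff excerpt above.
TOKEN_ENV_VAR = "WF_SCORING_WORKER_REMOTE_SCORER_BEARER_TOKEN"


def read_bearer_token() -> Optional[str]:
    """Return the configured token, or None if unset or whitespace-only."""
    raw = os.environ.get(TOKEN_ENV_VAR)
    if raw is None:
        return None
    # Assumption: surrounding whitespace is stripped from a real token too.
    stripped = raw.strip()
    return stripped or None
```

If the prefix were ever renamed, a helper like this would be the single place to support both the old and new variable names during a transition.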

mscavezze-cw added 3 commits April 22, 2026 16:01

Add remote scorer class

42f292f

lint

57b53dd

lint

bf96a1d

mscavezze-cw requested a review from a team as a code owner April 22, 2026 23:57

mscavezze-cw added 2 commits April 22, 2026 17:10

improve test coverage

6b91b84

Merge branch 'master' into mike/remote_scorer_110_remote_scorer_class

59993d3

mscavezze-cw changed the title ~~chore(weave): remote scorer class~~ chore(weave): Remote scorer class Apr 23, 2026

neutralino1 reviewed Apr 23, 2026

mscavezze-cw commented Apr 24, 2026

Merge branch 'master' into mike/remote_scorer_110_remote_scorer_class

1d1d89c

mscavezze-cw wants to merge 6 commits into master from mike/remote_scorer_110_remote_scorer_class

2 participants
