chore(weave): Remote scorer class#6688

Open
mscavezze-cw wants to merge 6 commits into master from mike/remote_scorer_110_remote_scorer_class

Conversation

@mscavezze-cw
Contributor

@mscavezze-cw mscavezze-cw commented Apr 22, 2026

Description

WB-33547

Add Remote Scorer class

Testing

unit tests

@mscavezze-cw mscavezze-cw requested a review from a team as a code owner April 22, 2026 23:57
@codecov

codecov Bot commented Apr 22, 2026

Codecov Report

❌ Patch coverage is 63.41463% with 15 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| weave/scorers/remote_scorer.py | 55.88% | 15 Missing ⚠️ |

@wandbot-3000

wandbot-3000 Bot commented Apr 23, 2026

@mscavezze-cw mscavezze-cw changed the title from "chore(weave): remote scorer class" to "chore(weave): Remote scorer class" on Apr 23, 2026
Comment on lines +33 to +36
The Python SDK stores configuration for publication with monitors. Remote
scoring is **invoked from the Weave scoring worker**, which performs the
outbound ``POST`` and feedback writes; it does not run by calling
:meth:`score` in user code.
Collaborator

I wonder if this class should also be the one users use to encapsulate the actual scoring logic that will run inside their server. It could bring two advantages:

  • If we eventually do offer a .serve() API, it could be defined on this class
  • The scorer object would be persisted as an object in our backend, and users could select it from a dropdown instead of configuring it by hand in the UI form

Not a blocker here, just food for thought.

Contributor Author

That seems like a reasonable idea worth exploring. It could be elegant if it works out. I imagine that .serve() could live in this class, and then customers could make subclasses with their own custom logic.

I probably wouldn't put the implementation that makes the HTTP call here (some reasons in this comment, and also this spike PR gives a little more context). Some of that reasoning might apply to the .serve() API, but to a lesser extent. I'd like to think about that a little more.

I think it's compatible with my current plan. A couple things might change, but not in a major way.

I do have a drop-down scorer selector in the works, just for switching between LLM and Remote. Adding additional scorers to it could be a natural evolution.

Package dependencies could be an issue, but as we discussed before, those can probably be solved.
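
To make the idea above concrete, here is a rough sketch of how a configuration-only class and a customer subclass could fit together. Everything in it is an assumption for illustration: `RemoteScorerConfig`, its `endpoint_url` and `headers` fields, the `handle` method, and the choice to raise from `score` are hypothetical names and behaviors, not the API in this PR. It deliberately contains no HTTP call, matching the docstring's point that the scoring worker performs the outbound POST.

```python
from dataclasses import dataclass, field


@dataclass
class RemoteScorerConfig:
    """Hypothetical stand-in for the class in this PR: it only stores config."""

    endpoint_url: str  # assumed field name
    headers: dict[str, str] = field(default_factory=dict)  # assumed field name

    def score(self, *, output):
        # Assumption: calling score() locally is rejected, because remote
        # scoring is invoked from the scoring worker, not from user code.
        raise NotImplementedError(
            "Remote scoring is invoked from the scoring worker, not locally."
        )


class LengthScorer(RemoteScorerConfig):
    """Hypothetical customer subclass holding logic that runs on their server."""

    def handle(self, output: str) -> dict:
        # Custom logic that a future .serve() could route requests to
        # (assumption; no such API exists in this PR).
        return {"length": len(output)}
```

Under this sketch, the same object could be published as configuration and, later, carry the logic that a `.serve()` method would expose, which is the overlap the suggestion above is pointing at.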

The token string, or None if unset. Whitespace-only values are treated as
None.
"""
raw = os.environ.get("WF_SCORING_WORKER_REMOTE_SCORER_BEARER_TOKEN")
Collaborator

These WF_ prefixes really bug me. They don't mean anything anymore. Should we avoid them going forward?

Contributor Author

They don't carry their original meaning anymore. However, they may serve to identify these variables as a group, since they all share the WF_ prefix. There can be value in consistency over multiple evolving forms of correctness.

My opinions here are not too strong. I'm curious if there's a particular reason why these bug you so much.
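
For reference, the behavior the docstring under review describes (return the token, or None if unset or whitespace-only) could look roughly like the sketch below. Only the environment variable name comes from the diff; the function name `read_bearer_token`, the optional `environ` parameter, and the decision to strip surrounding whitespace from a non-empty token are assumptions, not the code in this PR.

```python
import os

# Variable name taken from the diff excerpt above.
TOKEN_ENV_VAR = "WF_SCORING_WORKER_REMOTE_SCORER_BEARER_TOKEN"


def read_bearer_token(environ=None):
    """Return the token string, or None if unset or whitespace-only."""
    # Hypothetical parameter: lets callers pass a mapping instead of os.environ.
    env = os.environ if environ is None else environ
    raw = env.get(TOKEN_ENV_VAR)
    if raw is None:
        return None
    # Assumption: surrounding whitespace is stripped from the returned value.
    token = raw.strip()
    return token or None
```

Keeping the variable name in a single constant means a later rename, such as dropping the WF_ prefix, would touch one line.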

Contributor Author

@mscavezze-cw mscavezze-cw left a comment

Thanks for the comments!


2 participants