Conversation
Codecov Report❌ Patch coverage is
📢 Thoughts on this report? Let us know! |
|
Preview this PR with FeatureBee: https://beta.wandb.ai/?betaVersion=4ca0f5e809cb9c71ce9373c905b8d9ae6ce4a957 |
| The Python SDK stores configuration for publication with monitors. Remote | ||
| scoring is **invoked from the Weave scoring worker**, which performs the | ||
| outbound ``POST`` and feedback writes; it does not run by calling | ||
| :meth:`score` in user code. |
There was a problem hiding this comment.
I wonder if this class should also be the one users use to encapsulate the actual scoring logic that will run inside their server. It could bring two advantages:
- If we eventually do offer a
.serve()API, it could be defined on this class - The scorer object would be persisted as an object in our backend and users could select it from a drop down instead of configuring it by hand in the UI form
Not a blocker here, just food for thought.
There was a problem hiding this comment.
That seems like a reasonable idea worth exploring. It could be elegant if it works out. I imagine that .serve could live in this class, and then customers could make subclasses with their own custom logic.
I probably wouldn't put the implementation that makes the http call here (some reasons in this comment, and also this spike pr gives a little more context). Some of that reasoning might apply to the .serve() API, but to a lesser extent. I'd like to think about that a little more.
I think it's compatible with my current plan. A couple things might change, but not in a major way.
I do have a drop-down scorer selector in the works, just for switching between LLM and Remote. Adding additional scorers to it could be a natural evolution.
Package dependencies could be an issue, but as we discussed before, those can probably be solved.
| The token string, or None if unset. Whitespace-only values are treated as | ||
| None. | ||
| """ | ||
| raw = os.environ.get("WF_SCORING_WORKER_REMOTE_SCORER_BEARER_TOKEN") |
There was a problem hiding this comment.
These WF_ prefixes really bug me. They don't mean anything anymore. Should we avoid them going forward?
There was a problem hiding this comment.
They don't carry their original meaning any more. However, they may serve to identify these variables as a group, since they all share the WF_ prefix. There can be value in consistency over multiple evolving forms of correctness.
My opinions here are not too strong. I'm curious if there's a particular reason why these bug you so much.
mscavezze-cw
left a comment
There was a problem hiding this comment.
Thanks for the comments!
| The Python SDK stores configuration for publication with monitors. Remote | ||
| scoring is **invoked from the Weave scoring worker**, which performs the | ||
| outbound ``POST`` and feedback writes; it does not run by calling | ||
| :meth:`score` in user code. |
There was a problem hiding this comment.
That seems like a reasonable idea worth exploring. It could be elegant if it works out. I imagine that .serve could live in this class, and then customers could make subclasses with their own custom logic.
I probably wouldn't put the implementation that makes the http call here (some reasons in this comment, and also this spike pr gives a little more context). Some of that reasoning might apply to the .serve() API, but to a lesser extent. I'd like to think about that a little more.
I think it's compatible with my current plan. A couple things might change, but not in a major way.
I do have a drop-down scorer selector in the works, just for switching between LLM and Remote. Adding additional scorers to it could be a natural evolution.
Package dependencies could be an issue, but as we discussed before, those can probably be solved.
| The token string, or None if unset. Whitespace-only values are treated as | ||
| None. | ||
| """ | ||
| raw = os.environ.get("WF_SCORING_WORKER_REMOTE_SCORER_BEARER_TOKEN") |
There was a problem hiding this comment.
They don't carry their original meaning any more. However, they may serve to identify these variables as a group, since they all share the WF_ prefix. There can be value in consistency over multiple evolving forms of correctness.
My opinions here are not too strong. I'm curious if there's a particular reason why these bug you so much.
Description
WB-33547
Add Remote Scorer class
Testing
unit tests