Conversation
chance-wnb left a comment
Let's zoom into the `ALTER TABLE ... UPDATE` issue.
# In practice the mutation only fires when a caller mutates
# `pred.output` AFTER logging a score, or overrides via
# `pred.finish(output=...)`. For high-frequency eval workloads that
# rely on that pattern, the right long-term fix is to change
> the right long-term fix is to
I'm afraid that is indeed the correct fix, and it should happen before this ships to production traffic. I think directly attacking the issue by refactoring the AggregatingMergeTree is the better approach; otherwise, sooner or later we will pay for it.
Your mention of the right long-term fix is the bottom line :)
# `call_end` row with the create-time output, and `calls_merged` uses
# `SimpleAggregateFunction(any, ...)` for `output_dump` — so a second
# `call_end` row alone cannot win against the first one. We issue an
# `ALTER TABLE ... UPDATE` mutation on both `call_parts` (source) and
IMHO, `ALTER TABLE ... UPDATE` is very costly. Our `calls_merged` table also contains data unrelated to the eval business, so a single-row update might rewrite a multi-GB part per row, on every replica, while blocking merges. (An `ALTER TABLE ... UPDATE` mutation rewrites parts, not granules.)
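For context, a minimal sketch of the per-call mutation pattern under discussion (table and column names taken from the PR; the parameter values are illustrative, not the actual query):

```sql
-- Illustrative sketch only, not the PR's exact query.
-- A mutation like this rewrites every data part containing a matching
-- row, on every replica, rather than just the affected granules,
-- which is why it is costly when issued per call at high frequency.
ALTER TABLE calls_merged
UPDATE output_dump = {output_dump:String}
WHERE id = {call_id:String};
```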
# output for this call. `prediction_create` already emitted a
# `call_end` row with the create-time output, and `calls_merged` uses
# `SimpleAggregateFunction(any, ...)` for `output_dump` — so a second
# `call_end` row alone cannot win against the first one. We issue an
> so a second `call_end` row alone cannot win
I think this could work with the current AggregatingMergeTree:

- Add `updated_at DateTime64(6)` to `call_parts`, stamped `now64(6)` per insert.
- `calls_merged`: `MODIFY COLUMN output_dump AggregateFunction(argMax, String, DateTime64(6))`
- `calls_merged_view`: change from `anySimpleState(output_dump)` to `argMaxState(output_dump, updated_at)`.
- Query time: `SELECT output_dump` → `argMaxMerge(output_dump)`.
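The steps above, sketched as DDL (names assumed from the proposal; untested, and the `MODIFY COLUMN` step implies rebuilding the existing column data):

```sql
-- 1. Stamp each insert into call_parts with a per-insert timestamp.
ALTER TABLE call_parts
    ADD COLUMN updated_at DateTime64(6) DEFAULT now64(6);

-- 2. Switch output_dump from any() to last-write-wins by timestamp.
--    (Changing the column type requires rebuilding existing parts.)
ALTER TABLE calls_merged
    MODIFY COLUMN output_dump AggregateFunction(argMax, String, DateTime64(6));

-- 3. In calls_merged_view, replace
--      anySimpleState(output_dump)
--    with
--      argMaxState(output_dump, updated_at)

-- 4. At query time, merge the aggregate state back into a value:
SELECT argMaxMerge(output_dump) FROM calls_merged WHERE ...;
```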
This will require rebuilding the `output_dump` column; I am not sure how realistic that is. Let's ask an expert's opinion: @gtarpenning
assert ev._accumulated_predictions[1]._captured_scores == {"s": 0.5}

def test_log_example_after_finalization_raises(client, logger_cls):
Can we collapse these tests into ~4 that exercise real user flows with multiple assertions, instead of a zillion tests each doing something very simple?
id_param_parts = pb_parts.add_param(call_id)
output_dump_param_parts = pb_parts.add_param(output_dump)
output_refs_param_parts = pb_parts.add_param(output_refs)
call_parts_query = f"""
oh man we are going to do two updates? yikes
)

@ddtrace.tracer.wrap(name="clickhouse_trace_server_batched._update_call_output")
def _update_call_output(
Most of this belongs in a `/query_builder/` file.
gtarpenning left a comment
This seems very dangerous
neutralino1 left a comment
The concerns from @gtarpenning and @chance-wnb make me ask: should we migrate the data model first? That is, create new tables dedicated to evals and decouple from the calls table before we do this switchover?
This PR adds a variant of `EvaluationLogger` that uses the server-side APIs instead of our client-side hack.