Skip to content

feat(weave): emit OTEL metrics for ClickHouse client operations#6692

Draft
gtarpenning wants to merge 3 commits intomasterfrom
gtarpenning/clickhouse-otel-metrics
Draft

feat(weave): emit OTEL metrics for ClickHouse client operations#6692
gtarpenning wants to merge 3 commits intomasterfrom
gtarpenning/clickhouse-otel-metrics

Conversation

@gtarpenning
Copy link
Copy Markdown
Member

@gtarpenning gtarpenning commented Apr 23, 2026

Summary

Wraps the hot methods on each thread-local ClickHouse client (query, insert, command, query_rows_stream) with OTEL counters and a duration histogram. No-op until the host service (weave-trace) installs a real MeterProvider, so this has zero effect on SDK consumers of the weave package.

Part of an MVP to stand up a single OTEL metrics path that can fan out to both Datadog (cloud, via the dd-agent OTLP receiver) and Prometheus / Grafana (on-prem, via an otel-collector). Avoids dual-emit from application code.

Weave side: #6692

What this adds

  • weave/trace_server/clickhouse/metrics.py (~100 lines):
    • install_metrics(client) rebinds the four hot methods on a clickhouse_connect client to record three OTEL instruments:
      • weave.clickhouse.query.count (counter, labeled by operation)
      • weave.clickhouse.query.duration (histogram, seconds, labeled by operation)
      • weave.clickhouse.query.errors (counter, labeled by operation)
    • Idempotent per-client via a sentinel attribute.
  • clickhouse_trace_server_batched._mint_client: wraps the client before returning it. Single call site, no churn at the 7300+ line downstream query sites.
  • opentelemetry-api>=1.24.0 added to the trace_server optional-deps (runtime) and opentelemetry-sdk to trace_server_tests (test-only, for InMemoryMetricReader).

Companion PR

Consumed by a weave-trace PR (gtarpenning/weave-trace-otel-mvp) in wandb/core that sets up the MeterProvider and OTLP exporter. Intentionally not bumping the weave-public submodule pointer in that PR until this lands.

Test plan

  • uv run --group test --group trace_server --group trace_server_tests python -m pytest tests/trace_server/test_clickhouse_metrics.py -v -> 2 passed
  • uvx ruff check clean
  • Merge weave-trace companion PR on top once this lands and bump the submodule.

🤖 Generated with Claude Code

Wraps the hot methods on each thread-local ClickHouse client
(`query`, `insert`, `command`, `query_rows_stream`) with OTEL counters
and a duration histogram. No-op until the host service (weave-trace)
installs a real MeterProvider, so this has no effect on SDK consumers
of the `weave` package.

Part of the MVP for making weave-trace, workers, and ClickHouse emit
OTEL metrics so both Datadog (cloud) and Prometheus/Grafana (on-prem
operator v2) can consume them from a single code path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@wandbot-3000
Copy link
Copy Markdown

wandbot-3000 Bot commented Apr 23, 2026

@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 23, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

gtarpenning and others added 2 commits April 23, 2026 10:41
CI runs uv sync --frozen against the committed lockfile, so the SDK
group was not resolvable at test time. Rewrite the ClickHouse metrics
tests to exercise the wrapper against the opentelemetry-api default
no-op meter - the wrapping contract is what matters, and that path is
what most SDK consumers of weave will actually hit. Removes the
spurious trace_server_tests -> opentelemetry-sdk edge.

Also picks up the ruff-format pass that CI flagged on the test module.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The explicit 'opentelemetry-api>=1.24.0' in the trace_server extra and
dependency group forced uv.lock drift in CI. prek's 'files modified by
hook' guard then failed because 'uv run ty check' resynced the lock.

opentelemetry-api is already pulled in transitively by ddtrace and
opentelemetry-proto, so the import in trace_server.clickhouse.metrics
keeps working without the explicit declaration. Lockfile stays
untouched.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant