Track code mode adoption and usage

## User Story

**As a** ToolHive developer, **I want** to know how many people are opting in to code mode and how many tool calls come through it **so that** I can gauge adoption and prioritize investment.

## Background

STORY-001 introduces the code mode feature with an opt-in config flag, and STORY-002 adds operational observability (execution duration, error rate, per-script tool call counts). This story addresses a different question: **adoption tracking**. The team needs to understand how widely the feature is being used across the fleet, not just how individual executions perform.

Specifically, we need two categories of adoption signals:

1. **Configuration adoption**: How many VirtualMCPServer instances have code mode enabled? This tells us what fraction of the fleet has opted in.
2. **Traffic split**: What proportion of tool calls flow through `execute_tool_script` vs regular `tools/call`? This tells us whether agents are actually using code mode once it is enabled.

These metrics complement STORY-002's per-execution observability by providing a fleet-wide adoption view that helps the team decide whether to invest further in code mode.

## Scope

### In Scope

- OTel gauge tracking VirtualMCPServers with code mode enabled vs disabled
- OTel counters distinguishing `execute_tool_script` calls from regular `tools/call` calls
- Metrics available through the existing vMCP telemetry pipeline (Prometheus-compatible export)
- Example PromQL queries or dashboard panel definitions for comparing code mode vs regular call volume

### Out of Scope

- Per-execution observability (covered in STORY-002)
- Grafana dashboard deployment or alerting rules (operators bring their own stack)
- Usage analytics beyond OTel metrics (no external analytics services)
- Tracking which specific agents or users are using code mode (no per-identity attribution)

## Acceptance Criteria

- [ ] unit: OTel gauge `toolhive_vmcp_code_mode_enabled` reports `1` when code mode is enabled and `0` when disabled, updated on startup and configuration change
- [ ] unit: OTel counter `toolhive_vmcp_tool_calls` increments with `method` attribute `execute_tool_script` for script calls and `tool_call` for regular calls
- [ ] unit: Metrics are registered using the existing `instrumentationName` constant and `MeterProvider` from the vMCP server
- [ ] unit: Metrics are exported through the existing Prometheus-compatible `/metrics` endpoint without additional configuration

## Technical Notes

### Metric Design

**Configuration gauge** (`toolhive_vmcp_code_mode_enabled`):
- Type: `Int64UpDownCounter` or `Int64ObservableGauge` (gauge semantics -- value goes up and down)
- Emitted by each vMCP instance on startup and when configuration is reloaded
- Value: `1` when code mode is enabled, `0` when disabled
- Attributes: `vmcp_server_name` (the VirtualMCPServer name, for per-instance breakdown)
- An `Int64ObservableGauge` with a callback is the cleanest approach -- the callback reads the current config state, avoiding stale values

**Traffic split counter** (`toolhive_vmcp_tool_calls`):
- Type: `Int64Counter`
- Incremented on every tool invocation passing through the vMCP middleware chain
- Attributes:
  - `method`: `execute_tool_script` or `tool_call`
  - `status`: `success` or `error` (reuse the pattern from STORY-002)
- Increment the counter at the points where each call type is visible: likely the script middleware for script calls and the existing backend client for regular calls

### Existing Patterns to Follow

The `telemetryBackendClient` in `pkg/vmcp/server/telemetry.go` and the optimizer metrics in `pkg/vmcp/server/sessionmanager/factory.go` demonstrate the project conventions:
- Metric names prefixed with `toolhive_vmcp_`
- Counters created via `meter.Int64Counter()` with descriptive `metric.WithDescription()`
- Histograms use `telemetry.MCPHistogramBuckets`
- Attributes follow both the backward-compatible names and the OTel spec conventions
- Tests use `sdkmetric.NewManualReader` with `findMetric()` helper pattern (see `pkg/vmcp/server/sessionmanager/telemetry_test.go`)

### Example PromQL Queries

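Include these in code comments or documentation. The Prometheus exporter appends `_total` to counter names, so `toolhive_vmcp_tool_calls` is queried as `toolhive_vmcp_tool_calls_total`: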

```promql
# Ratio of code mode calls to total calls (last 5 minutes)
sum(rate(toolhive_vmcp_tool_calls_total{method="execute_tool_script"}[5m]))
/
sum(rate(toolhive_vmcp_tool_calls_total[5m]))

# Count of vMCP instances with code mode enabled
sum(toolhive_vmcp_code_mode_enabled)

# Code mode call rate vs regular call rate (one series per method)
sum by (method) (rate(toolhive_vmcp_tool_calls_total[5m]))
```

### Key Files

- `pkg/vmcp/server/telemetry.go` -- Add new metrics here or in a new `telemetry_codemode.go` file
- `pkg/script/middleware.go` -- Increment `execute_tool_script` counter when handling script calls
- `pkg/vmcp/server/server.go` -- Gauge callback reads config state from here
- `pkg/vmcp/server/sessionmanager/factory.go` -- Existing metric registration patterns
- `docs/operator/virtualmcpserver-observability.md` -- Document new metrics here

### Dependencies

- Depends on STORY-001 (the code mode config flag and `execute_tool_script` path must exist)
- Depends on STORY-002 (shares its telemetry infrastructure; the metrics added here must not duplicate its metric registrations)
- Uses `go.opentelemetry.io/otel/metric` (already in go.mod)

## References

- Prototype PR: https://github.com/stacklok/toolhive/pull/4714 (branch: `jerm-dro/script-middleware-prototype`)
- Existing telemetry patterns: `pkg/vmcp/server/telemetry.go`, `pkg/vmcp/server/sessionmanager/factory.go`
- Existing observability docs: `docs/operator/virtualmcpserver-observability.md`

