feat(bigquery-analytics): otel correlation, custom_metadata allowlist, column projection (#312/#320/#321) by caohy1988 · Pull Request #10 · caohy1988/adk-python

caohy1988 · 2026-06-29T20:26:54Z

Summary

Implements three BQAA plugin observability controls in one change, all additive and off by default. Validated against google/adk-python main and the refined specs in GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK google#312 / google#320 / google#321.

Branch is based on current main (synced from google/adk-python). Only bigquery_agent_analytics_plugin.py + its test file change.

google#312 — span-level Cloud Trace correlation

Captures the ambient OTel span context at row-emission time (trace.get_current_span().get_span_context()), only when is_valid, into attributes.otel.{span_id,trace_id}.
span_id / parent_span_id are unchanged — they remain the BQAA-internal execution tree. The stale "OpenTelemetry span ID" schema descriptions are corrected to say so, pointing consumers at attributes.otel.span_id for span-level joins.
Documented as a best-effort join key (an unsampled valid span is absent from the Cloud Trace export), not a foreign key. otel_parent_span_id is deferred — the OTel SpanContext does not expose a parent id.
No plugin-owned OTel span is created/exported (preserves the google/adk-python#94 no-duplicate-span guarantee).
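
A minimal sketch of the capture rule above, not the plugin's actual code. `otel_attributes` and `FakeSpanContext` are hypothetical names; in the plugin the context would come from `trace.get_current_span().get_span_context()`. A stand-in dataclass is used so the example runs without the OpenTelemetry package. Rendering the ids as 32- and 16-character lowercase hex is an assumption about the stored format, chosen because that is how trace and span ids are conventionally written.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class FakeSpanContext:
  """Stand-in for an OTel SpanContext (illustration only)."""

  trace_id: int
  span_id: int

  @property
  def is_valid(self) -> bool:
    # An all-zero trace id or span id marks an invalid context.
    return self.trace_id != 0 and self.span_id != 0


def otel_attributes(ctx: Any) -> dict[str, str]:
  """Returns the attributes.otel payload, or {} when there is no valid span."""
  if ctx is None or not ctx.is_valid:
    return {}
  return {
      "trace_id": format(ctx.trace_id, "032x"),  # 128-bit id as 32 hex chars
      "span_id": format(ctx.span_id, "016x"),  # 64-bit id as 16 hex chars
  }


def add_otel(attributes: dict[str, Any], ctx: Any) -> dict[str, Any]:
  """Writes attributes['otel'] only when a valid ambient span exists."""
  otel = otel_attributes(ctx)
  if otel:
    attributes["otel"] = otel
  return attributes
```

With an invalid context, `add_otel` leaves `attributes` untouched, which matches the "only when is_valid" rule. The BQAA `span_id` / `parent_span_id` columns are not involved at all.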

google#320 — custom_metadata allowlist

New custom_metadata_allowlist config: exact keys and explicit a2a:*-style prefixes (a plain key is never treated as a prefix).
Allowlisted keys from event.custom_metadata are captured into attributes.custom_metadata.* on every row emitted from the source Event — including AGENT_RESPONSE, which did not read custom_metadata before (the UDR citation case).
Runs through the existing safety pipeline: truncation (max_content_length) + sensitive-key redaction + circular-ref handling; truncation flips is_truncated, redaction does not.
The built-in a2a:* path (A2A_INTERACTION, typed views) is untouched; generic capture lives under a separate namespace.
Query via JSON_QUERY(attributes, '$.custom_metadata."<key>"') (quoted segment handles :/. in keys).
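
The matching and capture rules above can be sketched as follows. This is an illustration under stated assumptions, not the plugin's implementation: `is_allowed`, `capture_custom_metadata`, the `SENSITIVE_MARKERS` list, and the `"[REDACTED]"` placeholder are all hypothetical, and circular-reference handling is omitted for brevity.

```python
from typing import Any, Optional

# Hypothetical markers; the plugin's real sensitive-key list may differ.
SENSITIVE_MARKERS = ("token", "secret", "password")


def is_allowed(key: str, allowlist: list[str]) -> bool:
  """Exact match, or prefix match only for entries that end with '*'."""
  for entry in allowlist:
    if entry.endswith("*"):
      if key.startswith(entry[:-1]):
        return True
    elif key == entry:  # a plain entry is never treated as a prefix
      return True
  return False


def capture_custom_metadata(
    custom_metadata: Optional[dict[str, Any]],
    allowlist: list[str],
    max_content_length: int,
) -> tuple[dict[str, Any], bool]:
  """Returns (captured, is_truncated) for attributes.custom_metadata."""
  captured: dict[str, Any] = {}
  is_truncated = False
  for key, value in (custom_metadata or {}).items():
    if not is_allowed(key, allowlist):
      continue
    if any(marker in key.lower() for marker in SENSITIVE_MARKERS):
      captured[key] = "[REDACTED]"  # redaction does not flip is_truncated
      continue
    if isinstance(value, str) and len(value) > max_content_length:
      value = value[:max_content_length]
      is_truncated = True  # only truncation flips the flag
    captured[key] = value
  return captured, is_truncated
```

An empty allowlist captures nothing, which is the default-noop behavior described in the summary.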

google#321 — physical column projection

New payload_column_denylist (denylist-first), scoped to the projectable payload columns content / content_parts / attributes / latency_ms. Listing an identity/correlation column raises a clear ValueError at construction.
Applied schema-first: the BQ table schema, Arrow schema, row dict, and auto-schema-upgrade all derive from the projected schema, so they never disagree. Auto-upgrade stays additive (never drops existing columns).
Projection-aware views: derived view columns whose SQL references a denied payload column (content, attributes, or latency_ms) are dropped, so view creation never references a missing column.
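
A rough sketch of the denylist validation and schema-first projection described above. The function names and the word-boundary regex used to detect column references in view SQL are assumptions made for illustration; the plugin may detect references differently.

```python
import re

# The only columns that may be denied, per the description above.
PROJECTABLE = ("content", "content_parts", "attributes", "latency_ms")


def validate_denylist(denylist: list[str], all_columns: list[str]) -> frozenset:
  """Raises ValueError for unknown or protected (non-payload) columns."""
  denied = set(denylist)
  unknown = denied - set(all_columns)
  if unknown:
    raise ValueError(f"Unknown columns in denylist: {sorted(unknown)}")
  protected = denied - set(PROJECTABLE)
  if protected:
    raise ValueError(f"Protected columns cannot be denied: {sorted(protected)}")
  return frozenset(denied)


def project_columns(all_columns: list[str], denied: frozenset) -> list[str]:
  """One projected column list that every consumer derives from."""
  return [c for c in all_columns if c not in denied]


def project_view_columns(view_columns: dict[str, str], denied: frozenset) -> dict[str, str]:
  """Drops derived view columns whose SQL references a denied column."""

  def references(sql: str, column: str) -> bool:
    # \b keeps 'content' from matching inside 'content_parts'.
    return re.search(rf"\b{re.escape(column)}\b", sql) is not None

  return {
      name: sql
      for name, sql in view_columns.items()
      if not any(references(sql, column) for column in denied)
  }
```

Deriving the table schema, Arrow schema, and row dict from the single list returned by `project_columns` is what keeps them from disagreeing.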

Schema doc

Broadened the is_truncated description to "content or metadata payload was truncated" (and noted redaction does not set it).

Tests

25 new tests covering: allowlist parse + exact/prefix matching, capture (namespace, redaction-no-flag, truncation-flag, non-allowlisted absent, no-source-event, default-noop), denylist validation (ValueError on protected/unknown), construction rejection, schema projection + Arrow consistency, view degradation for attributes/content/latency_ms, and otel capture present/absent by is_valid.
Full plugin suite: 287 passed, 6 skipped. isort + pyink clean.

Default behavior is unchanged when none of the three configs are set.

…, column projection

Implements three BQAA plugin observability controls (GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK#312/google#320/google#321):

- google#312 span-level Cloud Trace correlation: capture the ambient OTel span context at row-emission time (only when is_valid) into attributes.otel.*; span_id/parent_span_id stay the BQAA-internal execution tree. Corrects the stale "OpenTelemetry span ID" schema descriptions. Best-effort join key (an unsampled valid span is absent from Cloud Trace), not a foreign key; otel_parent_span_id deferred (not derivable from SpanContext alone).
- google#320 custom_metadata allowlist: custom_metadata_allowlist config (exact keys + explicit "a2a:*"-style prefixes) captures event.custom_metadata into attributes.custom_metadata.* on every row emitted from the source Event (including AGENT_RESPONSE, which did not read custom_metadata before), through the existing safety pipeline (truncation, sensitive-key redaction, circular-ref handling, is_truncated). The built-in a2a:* path is unchanged.
- google#321 physical column projection: payload_column_denylist (denylist-first, scoped to content/content_parts/attributes/latency_ms; identity/correlation columns are protected and raise ValueError). Applied schema-first so the BQ schema, Arrow schema, row dict, and views stay consistent; projection-aware views drop derived columns that reference a denied payload column.

Also broadens the is_truncated column description to cover content or metadata payload truncation. Adds 25 unit tests; full plugin suite green (287 passed, 6 skipped).

- File-content compliance: assemble the cloud-platform OAuth scope from parts so this changed file no longer embeds a bare Google APIs host literal (the compliance scan rejects such literals on changed files).
- Schema upgrade vs projection change: _maybe_upgrade_schema now computes the missing-field diff BEFORE the version-label early return. self._schema is projection-dependent (google#321), so relaxing payload_column_denylist on a table whose label still matches must still add the now-desired columns instead of skipping the diff.
- attributes denial interaction: reject custom_metadata_allowlist together with payload_column_denylist=["attributes"] at construction (the captured payload would be silently dropped), skip the attributes.otel write when attributes is denied, and document that denying attributes disables otel/custom_metadata.

Adds 5 tests (denylist-relaxed upgrade, current-and-complete no-op, fail-fast rejection, attributes-denied otel skip). Full plugin suite: 292 passed, 6 skipped. isort + pyink clean.
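
The ordering change in the schema-upgrade path can be illustrated with a small sketch. `plan_schema_upgrade` and the version strings are hypothetical; the point is only that the missing-field diff is computed before the label comparison, so a matching label no longer hides newly desired columns.

```python
def plan_schema_upgrade(
    existing_fields: list[str],
    desired_fields: list[str],
    table_version_label: str,
    current_version: str,
) -> tuple[list[str], bool]:
  """Returns (fields_to_add, label_is_stale).

  The diff is computed first. Returning early on a matching label would miss
  columns that became desired after payload_column_denylist was relaxed.
  """
  existing = set(existing_fields)
  # Additive only: fields present in the table but no longer desired are kept.
  fields_to_add = [f for f in desired_fields if f not in existing]
  label_is_stale = table_version_label != current_version
  return fields_to_add, label_is_stale


def needs_table_update(fields_to_add: list[str], label_is_stale: bool) -> bool:
  """A current label with a complete schema is the only no-op case."""
  return bool(fields_to_add) or label_is_stale
```

For example, a table created while `attributes` was denied still carries the current label, yet `fields_to_add` contains `attributes` once the denylist is relaxed.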

Content parsing/offload ran before row projection, so denying content_parts (which holds the offload object reference) could still upload the payload to GCS with no retained reference -- a payload leak + cost. And denying both content and content_parts still did the full parse/offload for a row that keeps neither payload column.

- When content_parts is denied, do not construct the GCS offloader (large / binary content is kept inline + truncated instead of uploaded); log a warning so the disabled offload is visible.
- When both content and content_parts are denied, skip content parsing entirely (no inline summary, no parts, no offload).

Adds 2 tests asserting the storage upload mock is not called for payload_column_denylist=["content_parts"] and ["content","content_parts"] with gcs_bucket_name set. Full plugin suite: 294 passed, 6 skipped.
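
The two rules above amount to a small decision made before any parsing happens. A sketch, assuming a hypothetical `plan_content_handling` helper and `ContentPlan` result type (neither name comes from the plugin):

```python
import logging
from dataclasses import dataclass
from typing import Optional

logger = logging.getLogger(__name__)


@dataclass(frozen=True)
class ContentPlan:
  parse_content: bool  # run the content parser at all
  use_gcs_offload: bool  # construct and use the GCS offloader


def plan_content_handling(
    payload_column_denylist: list[str], gcs_bucket_name: Optional[str]
) -> ContentPlan:
  """Decides parsing and offload up front, before row projection."""
  denied = set(payload_column_denylist)
  # Skip parsing only when the row keeps neither payload column.
  parse_content = not ({"content", "content_parts"} <= denied)
  # content_parts holds the offload reference; without it an upload is orphaned.
  use_gcs_offload = bool(gcs_bucket_name) and "content_parts" not in denied
  if gcs_bucket_name and not use_gcs_offload:
    logger.warning(
        "content_parts is denied; GCS offload to %s is disabled.",
        gcs_bucket_name,
    )
  return ContentPlan(parse_content=parse_content, use_gcs_offload=use_gcs_offload)
```

Making the decision once at construction is what guarantees the upload path is never reached for a row that cannot retain the object reference.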

The skip-parse branch assigned content_parts from a bare [] inside tuple unpacking, which mypy could not infer (var-annotated error on 3.10-3.13). Annotate content_json/content_parts/parser_truncated before the branch.
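
For illustration, the shape of that fix looks roughly like this. `parse_or_skip` and the concrete types are hypothetical stand-ins, not the plugin's real signatures.

```python
from typing import Any, Optional


def parse_or_skip(
    raw_text: str, skip_parse: bool
) -> tuple[Optional[str], list[dict[str, Any]], bool]:
  # Declare the types before the branch: a bare [] inside tuple unpacking
  # gives mypy nothing to infer the list's element type from.
  content_json: Optional[str]
  content_parts: list[dict[str, Any]]
  parser_truncated: bool

  if skip_parse:
    content_json, content_parts, parser_truncated = None, [], False
  else:
    content_json, content_parts, parser_truncated = (
        raw_text,
        [{"text": raw_text}],
        False,
    )
  return content_json, content_parts, parser_truncated
```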

caohy1988 added 4 commits June 29, 2026 13:26

caohy1988 wants to merge 4 commits into main from feat/bqaa-otel-metadata-projection
