feat(bigquery-analytics): otel correlation, custom_metadata allowlist, column projection (#312/#320/#321)#10
Open
caohy1988 wants to merge 4 commits into
Open
feat(bigquery-analytics): otel correlation, custom_metadata allowlist, column projection (#312/#320/#321)#10caohy1988 wants to merge 4 commits into
caohy1988 wants to merge 4 commits into
Conversation
…, column projection Implements three BQAA plugin observability controls (GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK#312/google#320/google#321): - google#312 span-level Cloud Trace correlation: capture the ambient OTel span context at row-emission time (only when is_valid) into attributes.otel.*; span_id/parent_span_id stay the BQAA-internal execution tree. Corrects the stale "OpenTelemetry span ID" schema descriptions. Best-effort join key (an unsampled valid span is absent from Cloud Trace), not a foreign key; otel_parent_span_id deferred (not derivable from SpanContext alone). - google#320 custom_metadata allowlist: custom_metadata_allowlist config (exact keys + explicit "a2a:*"-style prefixes) captures event.custom_metadata into attributes.custom_metadata.* on every row emitted from the source Event (including AGENT_RESPONSE, which did not read custom_metadata before), through the existing safety pipeline (truncation, sensitive-key redaction, circular-ref handling, is_truncated). The built-in a2a:* path is unchanged. - google#321 physical column projection: payload_column_denylist (denylist-first, scoped to content/content_parts/attributes/latency_ms; identity/correlation columns are protected and raise ValueError). Applied schema-first so the BQ schema, Arrow schema, row dict, and views stay consistent; projection-aware views drop derived columns that reference a denied payload column. Also broadens the is_truncated column description to cover content or metadata payload truncation. Adds 25 unit tests; full plugin suite green (287 passed, 6 skipped).
- File-content compliance: assemble the cloud-platform OAuth scope from parts so this changed file no longer embeds a bare Google APIs host literal (the compliance scan rejects such literals on changed files). - Schema upgrade vs projection change: _maybe_upgrade_schema now computes the missing-field diff BEFORE the version-label early return. self._schema is projection-dependent (google#321), so relaxing payload_column_denylist on a table whose label still matches must still add the now-desired columns instead of skipping the diff. - attributes denial interaction: reject custom_metadata_allowlist together with payload_column_denylist=["attributes"] at construction (the captured payload would be silently dropped), skip the attributes.otel write when attributes is denied, and document that denying attributes disables otel/custom_metadata. Adds 5 tests (denylist-relaxed upgrade, current-and-complete no-op, fail-fast rejection, attributes-denied otel skip). Full plugin suite: 292 passed, 6 skipped. isort + pyink clean.
Content parsing/offload ran before row projection, so denying content_parts (which holds the offload object reference) could still upload the payload to GCS with no retained reference -- a payload leak + cost. And denying both content and content_parts still did the full parse/offload for a row that keeps neither payload column. - When content_parts is denied, do not construct the GCS offloader (large / binary content is kept inline + truncated instead of uploaded); log a warning so the disabled offload is visible. - When both content and content_parts are denied, skip content parsing entirely (no inline summary, no parts, no offload). Adds 2 tests asserting the storage upload mock is not called for payload_column_denylist=["content_parts"] and ["content","content_parts"] with gcs_bucket_name set. Full plugin suite: 294 passed, 6 skipped.
The skip-parse branch assigned content_parts from a bare [] inside tuple unpacking, which mypy could not infer (var-annotated error on 3.10-3.13). Annotate content_json/content_parts/parser_truncated before the branch.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Implements three BQAA plugin observability controls in one change, all additive and off by default. Validated against
google/adk-pythonmain and the refined specs in GoogleCloudPlatform/BigQuery-Agent-Analytics-SDK google#312 / google#320 / google#321.Branch is based on current
main(synced fromgoogle/adk-python). Onlybigquery_agent_analytics_plugin.py+ its test file change.google#312 — span-level Cloud Trace correlation
trace.get_current_span().get_span_context()), only whenis_valid, intoattributes.otel.{span_id,trace_id}.span_id/parent_span_idare unchanged — they remain the BQAA-internal execution tree. The stale"OpenTelemetry span ID"schema descriptions are corrected to say so, pointing consumers atattributes.otel.span_idfor span-level joins.otel_parent_span_idis deferred — the OTelSpanContextdoes not expose a parent id.google#320 — custom_metadata allowlist
custom_metadata_allowlistconfig: exact keys and explicita2a:*-style prefixes (a plain key is never treated as a prefix).event.custom_metadataare captured intoattributes.custom_metadata.*on every row emitted from the source Event — includingAGENT_RESPONSE, which did not readcustom_metadatabefore (the UDR citation case).max_content_length) + sensitive-key redaction + circular-ref handling; truncation flipsis_truncated, redaction does not.a2a:*path (A2A_INTERACTION, typed views) is untouched; generic capture lives under a separate namespace.JSON_QUERY(attributes, '$.custom_metadata."<key>"')(quoted segment handles:/.in keys).google#321 — physical column projection
payload_column_denylist(denylist-first), scoped to the projectable payload columnscontent/content_parts/attributes/latency_ms. Listing an identity/correlation column raises a clearValueErrorat construction.content,attributes, orlatency_ms) are dropped, so view creation never references a missing column.Schema doc
is_truncateddescription to "content or metadata payload was truncated" (and noted redaction does not set it).Tests
attributes/content/latency_ms, and otel capture present/absent byis_valid.isort+pyinkclean.Default behavior is unchanged when none of the three configs are set.