
Metrics: add histogram bucket views per metric-name pattern #66805

Open
1fanwang wants to merge 1 commit into apache:main from 1fanwang:metrics-otel-bucket-views

1fanwang (Contributor) commented May 12, 2026

OTel histogram metrics in Airflow share one bucket layout regardless of what they measure, so a metric whose useful range differs from the default ends up with uninformative tails: task duration fits the default well, but scheduler-loop duration needs finer resolution at the low end, and second-to-hour delays need coarser buckets at the high end. #64207 set an exponential default per instrument type, but non-timer histogram families (*_count, *_duration, *_delay) still resolve their boundaries at each call site, so the same family can end up shaped differently depending on which module created the instrument.
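
To make the "uninformative tails" point concrete, this small stdlib sketch buckets a few values using the OpenTelemetry SDK's default explicit bucket boundaries (0 to 10000). The sample values, and the assumption that they are recorded in seconds, are hypothetical and chosen only for illustration; they are not taken from Airflow's call sites.

```python
from bisect import bisect_left

# Default explicit-bucket boundaries from the OpenTelemetry metrics SDK spec.
DEFAULT_BOUNDARIES = (
    0.0, 5.0, 10.0, 25.0, 50.0, 75.0, 100.0, 250.0,
    500.0, 750.0, 1000.0, 2500.0, 5000.0, 7500.0, 10000.0,
)


def bucket_index(value, boundaries=DEFAULT_BOUNDARIES):
    """Index of the (lower, upper] bucket receiving value.

    Index 0 is (-inf, boundaries[0]]; index len(boundaries) is the overflow bucket.
    """
    return bisect_left(boundaries, value)


def describe(value, boundaries=DEFAULT_BOUNDARIES):
    """Return the (lower, upper] bounds of the bucket that value falls into."""
    i = bucket_index(value, boundaries)
    lower = boundaries[i - 1] if i > 0 else float("-inf")
    upper = boundaries[i] if i < len(boundaries) else float("inf")
    return lower, upper


# Hypothetical scheduler-loop durations (seconds): all land in one bucket.
loop_seconds = [0.02, 0.1, 0.4, 2.0]
# Hypothetical hours-long delays (seconds): both land in the overflow bucket.
delay_seconds = [14400.0, 28800.0]

for v in loop_seconds + delay_seconds:
    print(v, describe(v))
```

Every sub-second loop duration lands in the same (0, 5] bucket, and every delay above 10000 s (about 2.8 hours) lands in the overflow bucket: the kind of lost resolution a per-family layout is meant to avoid.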

This PR adds a declarative map from metric-name pattern to bucket aggregation in shared/observability and layers the resulting OTel Views on top of the existing instrument-type baseline. Latency families get exponential buckets, counts get a small linear range of boundaries, and delays get a wide range; the per-instrument-type default still applies, and the pattern views only refine it for matching names. Deployments that need a different layout can pass an override dict.
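
A minimal, stdlib-only sketch of the pattern-to-aggregation lookup, assuming first-match-wins in insertion order and an override that replaces the default map wholesale. The names DEFAULT_BUCKET_PATTERNS, resolve_bucket_spec, Exponential, and ExplicitBuckets, and all boundary values, are hypothetical; this is not the PR's actual code.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Mapping, Optional, Tuple, Union


@dataclass(frozen=True)
class Exponential:
    """Hypothetical marker for an exponential bucket histogram."""


@dataclass(frozen=True)
class ExplicitBuckets:
    """Hypothetical fixed-boundary histogram spec (ascending upper bounds)."""

    boundaries: Tuple[float, ...]


BucketSpec = Union[Exponential, ExplicitBuckets]

# Hypothetical defaults; boundary values are illustrative, not Airflow's.
DEFAULT_BUCKET_PATTERNS: Mapping[str, BucketSpec] = {
    # Latency families: exponential buckets.
    "*_duration": Exponential(),
    # Counts: small linear range 0, 1, ..., 10.
    "*_count": ExplicitBuckets(tuple(float(i) for i in range(11))),
    # Delays: wide range, from seconds up to an hour.
    "*_delay": ExplicitBuckets((1.0, 5.0, 30.0, 60.0, 300.0, 900.0, 1800.0, 3600.0)),
}


def resolve_bucket_spec(
    metric_name: str,
    mapping: Optional[Mapping[str, BucketSpec]] = None,
) -> Optional[BucketSpec]:
    """Return the spec of the first pattern matching metric_name.

    A custom mapping (assumed here to replace the defaults entirely) takes
    precedence. None means no pattern matched, so the baseline applies.
    """
    patterns = DEFAULT_BUCKET_PATTERNS if mapping is None else mapping
    for pattern, spec in patterns.items():  # dicts preserve insertion order
        if fnmatchcase(metric_name, pattern):
            return spec
    return None
```

In the OTel SDK, each entry would correspond to a View(instrument_name=pattern, aggregation=...), with Exponential standing in for ExponentialBucketHistogramAggregation and ExplicitBuckets for ExplicitBucketHistogramAggregation(boundaries=...); returning None models the fall-through to the instrument-type baseline.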

Closes #66801

Tests

The new test_histogram_buckets.py covers the default pattern map, per-pattern aggregation resolution, the custom-mapping override, and that one View is built per entry. test_otel_logger.py is updated to assert the layered shape (the baseline view followed by the pattern views).

Before/after on the discriminating test

Reverting otel_logger.py to upstream/main (baseline view only, no pattern views):

FAILED ...test_get_otel_logger_uses_exponential_histogram_view
    assert pattern_names == {"*_count", "*_duration", "*_delay"}
E   AssertionError: set() != {'*_count', '*_delay', '*_duration'}

With the change restored: 10 passed (full test_histogram_buckets.py plus the updated otel_logger assertion).

A standalone MeterProvider(views=build_views_for_patterns()) driving an InMemoryMetricReader confirms that the SDK resolves aggregations per metric name end to end: task_duration is collected as an ExponentialHistogramDataPoint, while schedule_delay and retry_count are collected as fixed-boundary HistogramDataPoint values. Without the patch, all three fall through to the exponential default regardless of name.

choo121600 added the ready for maintainer review label May 15, 2026
1fanwang force-pushed the metrics-otel-bucket-views branch from df02c2d to a6931d8 June 14, 2026 05:43
Follow-up to apache#64207, which set ExponentialBucketHistogramAggregation as
the instrument-type default for OTel histograms. Non-timer histogram
families (*_count, *_duration, *_delay) span very different value ranges
yet still inherit one bucket layout chosen at each call site, so their
distributions are poorly resolved. A single value range cannot serve a
millisecond latency and an hours-long delay equally well.

Closes apache#66801

Signed-off-by: 1fanwang <1fannnw@gmail.com>
1fanwang force-pushed the metrics-otel-bucket-views branch from a6931d8 to bc5ec90 June 19, 2026 18:45
1fanwang changed the title "feat(metrics/otel): add histogram bucket views per metric-name pattern" to "Metrics: add histogram bucket views per metric-name pattern" Jun 19, 2026

Labels

ready for maintainer review Set after triaging when all criteria pass.


Development

Successfully merging this pull request may close these issues.

Declarative OTel histogram bucket views per metric-name pattern
