
Span Metrics Connector support for extrapolated metrics from tracestate ot.th #45539

@yuanyuanzhao3


Component(s)

connector/spanmetricsconnector

Is your feature request related to a problem? Please describe.

OTEP 235 describes the new tracestate encoding for sampling probability. A number of built-in samplers will support this, such as TraceIdRatioBased.

An adjusted_count can be derived from the ot.th key in tracestate. Notably, this value can be a floating-point number (fractional rather than an integer). In the current spanmetricsconnector, each span counts as exactly one: its duration contributes one data point to the latency histogram, and each event it carries also counts as one. (Note that span events might be deprecated, so we may not need to support them. However, that is a separate issue and beyond the scope here.)
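To make the derivation concrete, here is a minimal sketch in Go. It assumes my reading of OTEP 235: the `th` sub-key of the `ot` tracestate entry holds 1 to 14 hex digits with trailing zeros removed, it encodes a rejection threshold T out of 2^56, and the adjusted count is 2^56 / (2^56 - T). The function names are hypothetical and are not part of the connector or of any existing package.

```go
package main

import (
	"strconv"
	"strings"
)

// adjustedCountFromTracestate looks for the "th" sub-key inside the "ot"
// entry of a W3C tracestate string and converts it to an adjusted count.
// Hypothetical helper for illustration only.
func adjustedCountFromTracestate(tracestate string) (float64, bool) {
	for _, member := range strings.Split(tracestate, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(member), "=")
		if !ok || key != "ot" {
			continue
		}
		for _, field := range strings.Split(value, ";") {
			name, th, ok := strings.Cut(field, ":")
			if ok && name == "th" {
				return adjustedCountFromThreshold(th)
			}
		}
	}
	return 0, false
}

// adjustedCountFromThreshold assumes th is 1..14 hex digits with trailing
// zeros stripped, as I understand OTEP 235.
func adjustedCountFromThreshold(th string) (float64, bool) {
	if len(th) == 0 || len(th) > 14 {
		return 0, false
	}
	t, err := strconv.ParseUint(th, 16, 64)
	if err != nil {
		return 0, false
	}
	// Restore the stripped trailing zeros: pad on the right to 14 hex digits.
	t <<= uint(4 * (14 - len(th)))
	const total = uint64(1) << 56
	// t < 2^56 always holds here, so the divisor is at least 1.
	return float64(total) / float64(total-t), true
}
```

Under these assumptions, `th:8` corresponds to a sampling probability of 1/2 and an adjusted count of 2, while `th:4` corresponds to a probability of 3/4 and a fractional adjusted count of 4/3.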

The current data structures also support only integral counts. To support fractional counts, we propose stochastic rounding of the adjusted_count. In addition, attributes will indicate whether the metrics are extrapolated. Note that stochastic rounding may introduce discrepancies when the sample size is small. However, it is statistically accurate when the data set is large enough (TODO: define "large enough"). In this regard, its statistical nature is no different from that of the sampled metrics and histograms themselves.
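As an illustration of what stochastic rounding could look like, here is a minimal sketch in Go. The `rounder` type and its methods are hypothetical names, and the use of a seeded `math/rand` source is an assumption made so the example is reproducible. The idea is to round up with probability equal to the fractional part, so the expected result equals the input.

```go
package main

import "math/rand"

// rounder is a hypothetical helper that owns its own random source.
type rounder struct {
	rnd *rand.Rand
}

func newRounder(seed int64) *rounder {
	return &rounder{rnd: rand.New(rand.NewSource(seed))}
}

// round returns floor(x)+1 with probability frac(x) and floor(x) otherwise.
// Values that are not at least 1 (including NaN) are treated as 1, which is
// an assumption mirroring the current "each span counts as one" behavior.
func (r *rounder) round(adjustedCount float64) uint64 {
	if !(adjustedCount >= 1) {
		return 1
	}
	floor := uint64(adjustedCount)
	frac := adjustedCount - float64(floor)
	// For integral inputs frac is 0, so no random draw is needed.
	if frac > 0 && r.rnd.Float64() < frac {
		return floor + 1
	}
	return floor
}
```

With an adjusted count of 4/3, each call returns either 1 or 2, and the average over many calls approaches 4/3.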

Guidance will also be given for users to choose sampling rates that are the reciprocal of an integral adjusted_count (for example, 1/2 or 1/4). In that case, stochastic rounding reduces to a no-op. However, sampling rates whose reciprocal is fractional, such as 3/4 (adjusted_count 4/3), will also be supported and will be statistically correct.
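A short worked derivation may help here. The symbols $a$, $f$, and $R(a)$ are my own notation and do not come from OTEP 235. Let $a$ be the adjusted count, $f = a - \lfloor a \rfloor$ its fractional part, and $R(a)$ the stochastically rounded value:

$$
R(a) =
\begin{cases}
\lfloor a \rfloor + 1 & \text{with probability } f \\
\lfloor a \rfloor & \text{with probability } 1 - f
\end{cases}
\qquad
\mathbb{E}[R(a)] = \lfloor a \rfloor + f = a
$$

When $a$ is an integer, $f = 0$ and $R(a) = a$ with certainty, which is the no-op case. For a sampling rate of 3/4, $a = 4/3$ and $f = 1/3$, so $R(a)$ is 2 with probability 1/3 and 1 with probability 2/3:

$$
\mathbb{E}[R(4/3)] = 1 \cdot \tfrac{2}{3} + 2 \cdot \tfrac{1}{3} = \tfrac{4}{3}
$$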

Estimating the standard error is possible, but it is a secondary priority and comes at the expense of the extra floating-point operations that stochastic rounding aims to avoid. We expect users' sampling choices to result in statistically significant samples. This is also a basic requirement for using sampling, and metrics derived from samples, in the first place.
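For a rough sense of scale, the noise added by the rounding step alone can be bounded as follows. This is a sketch that assumes the rounding draws are independent across spans. It does not cover the sampling error itself, which is a separate and usually larger term. The notation is my own: $a_i$ is the adjusted count of span $i$, $f_i$ its fractional part, and $R(a_i)$ its stochastically rounded value.

$$
\operatorname{Var}[R(a_i)] = f_i (1 - f_i) \le \tfrac{1}{4}
\qquad\Longrightarrow\qquad
\operatorname{Var}\Big[\sum_{i=1}^{N} R(a_i)\Big] = \sum_{i=1}^{N} f_i (1 - f_i) \le \tfrac{N}{4}
$$

Since every $a_i \ge 1$, the total is at least $N$, so the relative standard error contributed by rounding is at most

$$
\frac{\sqrt{N}/2}{\sum_{i=1}^{N} a_i} \le \frac{1}{2\sqrt{N}}
$$

For example, with $N = 10{,}000$ sampled spans this bound is 0.5%.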

Describe the solution you'd like

We want to use stochastic rounding in order to preserve efficient integer operations.
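The sketch below shows how an integer-count histogram could absorb a fractional adjusted_count. It is not the connector's actual data structure: `intHistogram`, its fields, and `observe` are hypothetical names, latencies are assumed to be in milliseconds, and the sum is kept as a float64 here purely for simplicity. The point is that the bucket counts and total count stay `uint64`, and the only extra work per span is at most one random draw.

```go
package main

import (
	"math/rand"
	"sort"
)

// intHistogram is a hypothetical stand-in for the connector's histogram state.
type intHistogram struct {
	bounds       []float64 // ascending upper bounds, in milliseconds
	bucketCounts []uint64  // len(bounds)+1; the last bucket is the overflow
	count        uint64
	sum          float64
	rnd          *rand.Rand
}

func newIntHistogram(bounds []float64, seed int64) *intHistogram {
	return &intHistogram{
		bounds:       bounds,
		bucketCounts: make([]uint64, len(bounds)+1),
		rnd:          rand.New(rand.NewSource(seed)),
	}
}

// observe records one sampled span as n data points, where n is the
// stochastically rounded adjusted count.
func (h *intHistogram) observe(latencyMs, adjustedCount float64) {
	// Assumption: a missing or invalid adjusted count falls back to 1.
	if !(adjustedCount >= 1) {
		adjustedCount = 1
	}
	n := uint64(adjustedCount)
	if frac := adjustedCount - float64(n); frac > 0 && h.rnd.Float64() < frac {
		n++
	}
	// Index of the first bound >= latencyMs, or len(bounds) for overflow.
	idx := sort.SearchFloat64s(h.bounds, latencyMs)
	h.bucketCounts[idx] += n
	h.count += n
	h.sum += latencyMs * float64(n)
}
```

When every adjusted count is integral, the random source is never consulted and the update is fully deterministic.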

Describe alternatives you've considered

We considered using floating-point counts. However, this introduces extra overhead and contention for the FPU.

We also considered keeping the fractional adjusted_count separately, alongside the integer metric counters (sum, count) and histograms. However, this requires extra complexity to support non-uniform or dynamically changing sampling rates.

Additional context

No response
