Description
Component(s)
connector/spanmetricsconnector
Is your feature request related to a problem? Please describe.
OTEP 235 describes the new tracestate encoding for sampling probability. A number of built-in samplers will support this, such as TraceIdRatioBased.
An adjusted_count can be derived from the ot.th key in tracestate. This value can be fractional rather than an integer. In the current spanmetricsconnector, each span counts as exactly one: it contributes one data point to the latency histogram, and each span event it carries also counts as one. (Note that span events might be deprecated, so we may not need to support them. However, that is a separate issue and beyond the scope of this request.)
The current data structures also support only integral counts. To support fractional counts, we propose applying stochastic rounding to the adjusted_count. In addition, attributes will indicate whether the metrics are extrapolated. Note that stochastic rounding may introduce discrepancies when the sample size is small. However, it is statistically accurate when the data set is large enough (TODO: define "large enough"). In this respect, its statistical nature is no different from that of the metrics and histograms themselves.
Guidance will also be given to users to choose a sampling rate that is the reciprocal of an integral adjusted_count. In that case, stochastic rounding reduces to a no-op. However, sampling rates such as 3/4, whose reciprocal (4/3) is not an integer, will also be supported and will be statistically correct.
Estimating the standard error is possible, but it is a secondary priority and would come at the expense of the extra floating-point operations that stochastic rounding aims to avoid. We expect users' sampling choices to result in statistically significant samples. This is also a basic requirement for using sampling, and metrics derived from samples, in the first place.
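To illustrate how the adjusted_count might be derived, here is a minimal sketch in Go. It assumes the OTEP 235 encoding: the `th` value in the `ot` tracestate entry is a rejection threshold of 1 to 14 hex digits, padded with trailing zeros to 56 bits, and the sampling probability is (2^56 - T) / 2^56. The function names `adjustedCountFromTracestate` and `adjustedCountFromThreshold` are hypothetical and not part of the connector or any existing library.

```go
package main

import (
	"strconv"
	"strings"
)

// adjustedCountFromTracestate is a hypothetical helper: it looks for the
// "ot" entry in a W3C tracestate string and reads its "th" field.
// The second return value is false when no usable threshold is present.
func adjustedCountFromTracestate(tracestate string) (float64, bool) {
	for _, member := range strings.Split(tracestate, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(member), "=")
		if !ok || key != "ot" {
			continue
		}
		for _, field := range strings.Split(value, ";") {
			name, th, ok := strings.Cut(field, ":")
			if ok && name == "th" {
				return adjustedCountFromThreshold(th)
			}
		}
	}
	return 0, false
}

// adjustedCountFromThreshold converts a th value (assumed encoding: 1 to 14
// hex digits, trailing zeros omitted) into an adjusted count, which is the
// reciprocal of the sampling probability.
func adjustedCountFromThreshold(th string) (float64, bool) {
	if len(th) == 0 || len(th) > 14 {
		return 0, false
	}
	// Restore the omitted trailing zeros to get the full 56-bit threshold.
	padded := th + strings.Repeat("0", 14-len(th))
	t, err := strconv.ParseUint(padded, 16, 64)
	if err != nil {
		return 0, false
	}
	const maxThreshold = 1 << 56
	// probability = (2^56 - T) / 2^56, so adjusted count = 2^56 / (2^56 - T).
	return float64(maxThreshold) / float64(maxThreshold-t), true
}
```

Under these assumptions, `th:8` corresponds to 50% sampling (adjusted count 2), `th:c` to 25% sampling (adjusted count 4), and `th:4` to 75% sampling, which gives the fractional adjusted count 4/3.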
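The rounding step could look roughly like the following Go sketch. The function `stochasticRound` is a hypothetical name, and it takes the uniform random sample as a parameter (an assumption made here so the behavior is easy to test); in practice the sample might come from something like `rand.Float64()`.

```go
package main

import "math"

// stochasticRound rounds a non-negative adjusted count x to one of its two
// neighboring integers so that the expected result equals x.
// rnd must be a uniform random sample in [0, 1).
func stochasticRound(x float64, rnd float64) uint64 {
	floor := math.Floor(x)
	frac := x - floor // fractional part, in [0, 1)
	n := uint64(floor)
	// Round up with probability frac, down with probability 1 - frac.
	// When x is already an integer, frac is 0 and this never rounds up.
	if rnd < frac {
		n++
	}
	return n
}
```

With a sampling rate of 3/4 the adjusted count is 4/3, so each span would be counted as 2 about one third of the time and as 1 otherwise, which averages to 4/3. With an integral adjusted count such as 4, the result is always 4 and the counters can be incremented with plain integer arithmetic.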
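As a rough guide to the extra error introduced by the rounding step alone (not the sampling error itself), the following worked derivation may help. It assumes each span is rounded independently; the symbols $x$, $n$, $f$, $R$, and $N$ are introduced here for illustration only.

```math
\begin{aligned}
x &= n + f, \quad n = \lfloor x \rfloor, \quad 0 \le f < 1 \\
R &= \begin{cases} n + 1 & \text{with probability } f \\ n & \text{with probability } 1 - f \end{cases} \\
\mathbb{E}[R] &= n + f = x \\
\operatorname{Var}[R] &= f(1 - f) \le \tfrac{1}{4} \\
\operatorname{Var}\Big[\sum_{i=1}^{N} R_i\Big] &= \sum_{i=1}^{N} f_i (1 - f_i) \le \tfrac{N}{4}
\end{aligned}
```

So over $N$ spans the standard deviation contributed by rounding is at most $\sqrt{N}/2$. Since every adjusted count is at least 1, the total is at least $N$, and the relative error from rounding is at most $1/(2\sqrt{N})$.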
Describe the solution you'd like
We want to use stochastic rounding in order to preserve efficient integer operations.
Describe alternatives you've considered
We considered using floating-point counts. However, this introduces extra overhead and contention for the FPU.
We also considered keeping the fractional part of adjusted_count alongside the integer metric counters (sum, count) and histograms. However, extra complexity would be needed to support non-uniform or dynamically changing sampling rates.
Additional context
No response