experimental/metrics/config-service.md
+17 -3 (17 additions & 3 deletions)
@@ -20,8 +20,8 @@
20
20
21
21
<small><i><a href='http://ecotrust-canada.github.io/markdown-toc/'>Table of contents generated with markdown-toc</a></i></small>
22
22
23
-
24
23
## Overview
24
+
25
25
The OpenTelemetry Metric Configuration Service adds the ability to dynamically
26
26
and remotely configure metric collection schedules. A user may specify
27
27
collection periods at runtime, and propagate these changes to instrumented
@@ -35,8 +35,8 @@ third-party metric provider has an existing metric configuration service (or
35
35
would like to implement one in the future), and if it communicates using this
36
36
protocol, it may speak directly with our instrumented applications.
37
37
38
-
39
38
## Service Protocol
39
+
40
40
Configuration data is communicated between an SDK and a backend (either directly
41
41
or indirectly through a Collector) using the following protocol specification.
42
42
The SDK is assumed to be the client, and makes the metric config requests. The
@@ -45,6 +45,7 @@ responses. For more details on this arrangement, see
45
45
[below](#push-vs-pull-metric-model).
46
46
47
47
### Metric Config Request
48
+
48
49
A request consists of two fields: `resource` and an optional
49
50
`last_known_fingerprint`.
50
51
@@ -63,10 +64,12 @@ If unspecified, the configuration backend will send the full schedules with each
63
64
request.
64
65
65
66
### Metric Config Response
67
+
66
68
A response consists of three fields: `schedules`, `fingerprint`, and
67
69
`suggested_wait_time_sec`.
68
70
69
71
#### Schedules
72
+
70
73
`schedules` is a list of metric schedules. Each schedule consists of three
71
74
components: `exclusion_patterns`, `inclusion_patterns`, and `period_sec`.
72
75
@@ -85,6 +88,7 @@ periods that are divisible by the smallest period (see
85
88
collected
86
89
87
90
#### Fingerprint
91
+
88
92
`fingerprint` is a sequence of bytes that corresponds to the set of schedules
89
93
being sent. There are two requirements on computing fingerprints:
90
94
@@ -97,12 +101,14 @@ is the same as the response’s `last_known_fingerprint`, then all other fields
97
101
the response are optional.
98
102
99
103
#### Wait Time
104
+
100
105
`suggested_wait_time_sec` is a duration (in seconds) that the SDK should wait
101
106
before sending the next metric config request. A response MAY have a
102
107
`suggested_wait_time_sec`, but its use is optional, and the SDK need not obey
103
108
it. As the name implies, it is simply a suggestion.
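
The request/response exchange described above can be sketched as a small polling helper. This is a minimal illustration, not the prototype's code: `ConfigPoller`, the `fetch` callable (standing in for whatever transport is used), the dictionary shape of a schedule, and the 30-second fallback wait are all assumptions made for the example. Only the field names `resource`, `last_known_fingerprint`, `schedules`, `fingerprint`, and `suggested_wait_time_sec` come from the protocol text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class MetricConfigRequest:
    resource: Dict[str, str]
    last_known_fingerprint: Optional[bytes] = None  # optional per the protocol


@dataclass
class MetricConfigResponse:
    fingerprint: bytes
    schedules: List[dict] = field(default_factory=list)
    suggested_wait_time_sec: Optional[int] = None


class ConfigPoller:
    """Hypothetical client-side helper; not part of any SDK."""

    def __init__(
        self,
        resource: Dict[str, str],
        fetch: Callable[[MetricConfigRequest], MetricConfigResponse],
        default_wait_sec: int = 30,  # assumed fallback, not from the spec
    ):
        self.resource = resource
        self.fetch = fetch
        self.default_wait_sec = default_wait_sec
        self.fingerprint: Optional[bytes] = None
        self.schedules: List[dict] = []

    def poll_once(self) -> int:
        """Send one request and return the seconds to wait before the next."""
        request = MetricConfigRequest(self.resource, self.fingerprint)
        response = self.fetch(request)
        if response.fingerprint != self.fingerprint:
            # The configuration changed: adopt the new schedules.
            self.schedules = response.schedules
            self.fingerprint = response.fingerprint
        # An unchanged fingerprint means the other fields may be absent,
        # so the previously stored schedules are kept as they are.
        if response.suggested_wait_time_sec is not None:
            # The suggestion is optional; this sketch chooses to follow it.
            return response.suggested_wait_time_sec
        return self.default_wait_sec
```

A caller would invoke `poll_once()` in a loop and sleep for the returned number of seconds between iterations. Following the suggested wait time is a choice made here; an SDK is free to ignore it.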
104
109
105
110
### Push vs Pull Metric Model
111
+
106
112
Note that the configuration service assumes a “push” model of metric export --
107
113
that is, metrics are pushed from the SDK to a receiving backend. The backend
108
114
serves incoming requests that contain metric data. This is in contrast to the
@@ -114,8 +120,8 @@ metrics, and the need for our configuration service is less relevant. We
114
120
therefore assume that all systems using the configuration service deliver
115
121
metrics on a push-based model.
116
122
117
-
118
123
## Implementation Details
124
+
119
125
Because this specification is experimental, and may imply substantial changes to
120
126
the existing system, we provide additional details on the example prototype
121
127
implementations available on the
@@ -125,6 +131,7 @@ actual implementation in an SDK will likely differ. We offer these details not
125
131
as formal specification, but as an example of how this system might look.
126
132
127
133
### Collection Periods
134
+
128
135
Though the protocol does not enforce specific collection periods, the SDK MAY
129
136
assume that all larger collection periods will be divisible by the smallest
130
137
period in a set of schedules, for the sake of optimization. Indeed, it is
@@ -149,6 +156,7 @@ However, the SDK MUST still be able to handle periods of any nonzero integer
149
156
duration, even if they violate the divisibility suggestion.
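
One way to honor both statements above (optimize for divisible periods, yet accept any period) is to derive a single base tick for the collection loop. The sketch below is an assumption about how an SDK might do this; `base_tick` is a hypothetical helper, and treating "nonzero" as "positive" is also an assumption.

```python
from functools import reduce
from math import gcd
from typing import Iterable


def base_tick(periods: Iterable[int]) -> int:
    """Return the interval (in seconds) at which a collector loop could wake."""
    periods = list(periods)
    if not periods or any(p <= 0 for p in periods):
        raise ValueError("periods must be positive integers")
    smallest = min(periods)
    if all(p % smallest == 0 for p in periods):
        # Fast path: every period is a multiple of the smallest one.
        return smallest
    # Fallback for periods that violate the divisibility suggestion:
    # wake at the greatest common divisor so every period is still hit.
    return reduce(gcd, periods)
```

For example, periods of 5, 10, and 30 seconds give a base tick of 5, while periods of 4 and 6 seconds fall back to a base tick of 2.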
150
157
151
158
### Go SDK
159
+
152
160
A prototype implementation of metric configuration is available for the Go SDK,
153
161
currently hosted on the [contrib repo](https://github.com/vmingchen/opentelemetry-go-contrib). It provides an
154
162
alternative push controller component with the ability to configure collection
@@ -166,6 +174,7 @@ controller, in place of OpenTelemetry’s version, to be able to have access to
166
174
this feature.
167
175
168
176
### Collector Extension
177
+
169
178
An example configuration backend is implemented as an extension for the
170
179
Collector, currently hosted on the [contrib repo](https://github.com/vmingchen/opentelemetry-collector-contrib). When this extension is enabled, the Collector
171
180
functions as a potential endpoint for Agent/SDKs to retrieve configuration data.
@@ -184,6 +193,7 @@ The configuration data itself may be specified using one of two sources: a local
184
193
file or a connection to a remote backend.
185
194
186
195
#### Local File
196
+
187
197
Configuration data can be specified in the form of a local file that the
188
198
collector monitors. Changes to the file are immediately reflected in the
189
199
Collector’s in-memory representation of the data, so there is no need to restart
@@ -213,6 +223,7 @@ ConfigBlocks:
213
223
```
214
224
215
225
The following rules govern the file-based configurations:
226
+
216
227
* There MUST be one or more ConfigBlocks in a ConfigBlocks list
217
228
* Each ConfigBlock MAY have a field Resource
218
229
* Resource MAY have one or more strings, each a string-representation of a key-value label in a resource. If no strings are specified, then this ConfigBlock matches with any request
@@ -222,6 +233,7 @@ The following rules govern the file-based configurations:
222
233
* Each Schedule MUST have a field Period, corresponding to the collection period of the metrics matched by this Schedule
223
234
224
235
##### Matching Behavior
236
+
225
237
An incoming request specifies a resource for which configuration data should be
226
238
returned. A ConfigBlock matches a resource if all strings listed under
227
239
ConfigBlock::Resource exactly equal a key-value label in the resource. In the
@@ -236,11 +248,13 @@ across telemetry sources, unless superseded by a more specific ConfigBlock that
236
248
asks for a shorter period.
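
The matching rule (a ConfigBlock matches when every string under its Resource equals a label of the requesting resource, and a block with no strings matches any request) can be expressed as a short predicate. This is a sketch only: `block_matches` is a hypothetical name, and the `key:value` string form of a label is an assumed representation, since the exact format is not shown here.

```python
from typing import Iterable, Set


def block_matches(block_resource: Iterable[str], resource_labels: Set[str]) -> bool:
    """True if every label listed by the block appears in the resource."""
    # all() over an empty list is True, so a block that lists no
    # strings matches any request, as the file rules require.
    return all(label in resource_labels for label in block_resource)


labels = {"service.name:checkout", "env:prod"}
print(block_matches([], labels))
print(block_matches(["env:prod"], labels))
print(block_matches(["env:dev"], labels))
```

How schedules from several matching blocks are combined is not covered by this sketch.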
237
249
238
250
##### Fingerprint Hashing
251
+
239
252
Fingerprints are generated using an FNVa 64-bit hashing scheme. The hash is
240
253
uniquely determined by the contents of a ConfigBlock. The order of patterns and
241
254
the order of schedules do not impact the resulting hash.
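
An order-independent fingerprint of this kind can be sketched as follows, assuming the scheme named above is the standard 64-bit FNV-1a variant. The canonical string layout, the sorting step, and the helper names `schedule_key` and `fingerprint` are all assumptions for illustration; the prototype may canonicalize differently.

```python
from typing import List

FNV64_OFFSET = 0xCBF29CE484222325  # standard FNV 64-bit offset basis
FNV64_PRIME = 0x100000001B3        # standard FNV 64-bit prime


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def schedule_key(schedule: dict) -> str:
    # Sorting the patterns makes the key independent of pattern order.
    inc = ",".join(sorted(schedule.get("inclusion_patterns", [])))
    exc = ",".join(sorted(schedule.get("exclusion_patterns", [])))
    return f"inc={inc};exc={exc};period={schedule['period_sec']}"


def fingerprint(schedules: List[dict]) -> bytes:
    # Sorting the per-schedule keys makes the result independent of
    # schedule order as well.
    canonical = "|".join(sorted(schedule_key(s) for s in schedules))
    return fnv1a_64(canonical.encode("utf-8")).to_bytes(8, "big")
```

Sorting before hashing is one simple way to get order independence; combining per-item hashes with a commutative operation would be another.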
242
255
243
256
#### Remote Backend
257
+
244
258
Alternatively, instead of using a local file, the Collector may use another
245
259
configuration service backend. This remote backend could be another Collector,
246
260
or it could be a third party that implements the configuration service. In the
to output and handle all logs, including errors. Custom handlers and filters can be registered both in code and using the
104
-
Java logging configuration file.
102
+
to output and handle all logs, including errors. Custom handlers and filters can be registered both in code and using the Java logging configuration file.
specification/glossary.md
+1 -6 (1 addition & 6 deletions)
@@ -30,6 +30,7 @@ Some other fundamental terms are documented in the [overview document](overview.
30
30
31
31
<a name="in-band"></a>
32
32
<a name="out-of-band"></a>
33
+
33
34
### In-band and Out-of-band Data
34
35
35
36
> In telecommunications, **in-band signaling** is the sending of control information within the same band or channel used for data such as voice or video. This is in contrast to **out-of-band signaling** which is sent over a different channel, or even over a separate network ([Wikipedia](https://en.wikipedia.org/wiki/In-band_signaling)).
@@ -46,15 +47,13 @@ usually asynchronously by background routines
46
47
rather than from the critical path of the business logic.
47
48
Metrics, logs, and traces exported to telemetry backends are examples of out-of-band data.
48
49
49
-
<a name="telemetry_sdk"></a>
50
50
### Telemetry SDK
51
51
52
52
Denotes the library that implements the *OpenTelemetry API*.
53
53
54
54
See [Library Guidelines](library-guidelines.md#sdk-implementation) and
[Transient errors](#transient-errors) MUST be handled with a retry strategy. This retry strategy MUST implement an exponential back-off with jitter to avoid overwhelming the destination until the network is restored or the destination has recovered.
55
+
[Transient errors](#transient-errors) MUST be handled with a retry strategy. This retry strategy MUST implement an exponential back-off with jitter to avoid overwhelming the destination until the network is restored or the destination has recovered.
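
A retry strategy of the kind required above can be sketched with a capped exponential delay and random jitter. This is an illustrative sketch, not a prescribed implementation: `TransientError`, `backoff_delays`, `export_with_retry`, and the default base, cap, and attempt count are assumptions, and the "full jitter" variant (a random delay between zero and the current ceiling) is one of several reasonable choices.

```python
import random
import time
from typing import Callable, Iterator


class TransientError(Exception):
    """Hypothetical marker for errors the destination may recover from."""


def backoff_delays(
    base_sec: float = 1.0,
    cap_sec: float = 60.0,
    max_attempts: int = 5,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield one jittered delay per retry, doubling the ceiling each time."""
    for attempt in range(max_attempts):
        ceiling = min(cap_sec, base_sec * (2 ** attempt))
        yield rng() * ceiling  # full jitter: uniform in [0, ceiling)


def export_with_retry(send: Callable[[], object], sleep=time.sleep, **backoff_kwargs):
    """Call send(); on TransientError, wait and retry until delays run out."""
    delays = backoff_delays(**backoff_kwargs)
    while True:
        try:
            return send()
        except TransientError:
            delay = next(delays, None)
            if delay is None:
                raise  # retries exhausted: surface the last error
            sleep(delay)
```

The `sleep` and `rng` parameters exist so the behavior can be exercised deterministically, for example by passing a list's `append` method as `sleep` and a constant function as `rng`.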
56
56
57
57
## Transient Errors
58
+
58
59
Transient errors are errors from which the backend is expected to recover. The following status codes are defined as transient errors: