Implement Azure Metrics Exporter #693
Conversation
contrib/opencensus-ext-azure/opencensus/ext/azure/common/protocol.py (resolved review threads)
contrib/opencensus-ext-azure/opencensus/ext/azure/metrics_exporter/__init__.py (resolved review threads)
    # which contains the aggregated value
    for point in time_series.points:
        if point.value is not None:
            data_point = DataPoint(ns=metric.descriptor.name,
Would the name allow something like a namespace? E.g. https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/HTTP.md#measures
If yes, we might consider splitting the name string into a namespace and the actual name.
Interesting solution. Given that the max length of name is 1024 characters, it is possible to encode the namespace and actual name inside the "name" field. Having a namespace field is useful because it categorizes what kind of metric it is, while the name identifies the specific metric. However, the encoding and its corner cases must be documented (e.g. if we do name=:<actual_name>, what happens when either part is blank or the ":" is missing?).
I'd say we leave it as it is for now; I'd rather change the DataPoint API than have to encode something in name.
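For illustration only, since the thread above decided not to adopt name encoding: a split would have to answer exactly the corner cases raised here. The helper `split_metric_name` is hypothetical, not part of the PR.

```python
def split_metric_name(name, sep=":"):
    """Hypothetical helper illustrating the rejected alternative of
    encoding a namespace inside the "name" field.

    Returns (namespace, actual_name); when the separator is missing
    the whole string is treated as the name with an empty namespace.
    """
    namespace, found, actual = name.partition(sep)
    if not found:  # ":" missing: no namespace encoded
        return "", name
    return namespace, actual

# Corner cases raised in the discussion:
# "http:latency" -> ("http", "latency")
# "latency"      -> ("", "latency")   # ":" missing
# ":latency"     -> ("", "latency")   # blank namespace
```

Documenting behavior for a blank part or a missing separator, as the comment notes, is the cost of this encoding.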
reyang left a comment
Looks good in general. Need to have the test cases and pass CI.
    if metric.descriptor.type == MetricDescriptorType.CUMULATIVE_DISTRIBUTION:
        continue
    envelopes.append(self.metric_to_envelope(metric))
    self._transmit(envelopes)
What if transmission failed?
How to control the max batch size (number of items) that we export?
Since we do not have a defined retry logic for Metrics yet, the exporter simply ignores failed transmissions and logs an error message with the returned status code. I do not want to throw an exception and stop the application if a single transmission fails.
We have not yet defined what "number of items" means in terms of metrics (# of metrics, time series, or points?). The ingestion service also only accepts a single metric at a time; the list passed in is simply a convention (to match the stats design). I say we keep it as it is for now (send all time series passed in) until we can define what this means.
_transmit has a dependency on the storage; we need to decouple it (or make storage optional if we don't use it in the metrics exporter).
"Sending all the time series passed in" won't work, since the ingestion would start to reject data once we hit a certain limit.
Yes, you are correct. Leaving the logic to run into the exception is not a good practice. I fixed it so that the exporter only passes data to storage if storage is being used (it is not used for the metrics exporter). This means that, for now, any partial-success response we get back from the ingestion service will be discarded.
The ingestion service only accepts a single data point at a time, so there is no need for batching.
> The ingestion service only accepts a single data point at a time, so there is no need for batching.

It is hard to believe that the ingestion service would be able to serve real customers in production without batching.
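A minimal sketch of the no-retry behavior described in this thread: failed transmissions are logged and dropped rather than raised, and storage is never touched. The standalone function and its `post` callable are simplified stand-ins for the exporter's actual `_transmit_without_retry` and HTTP session, not the PR's code.

```python
import json
import logging

logger = logging.getLogger(__name__)


def transmit_without_retry(envelopes, post):
    """Send each envelope once; log and drop failures instead of
    raising, so a single bad transmission cannot stop the application.

    `post` is any callable(body) -> status_code; in the real exporter
    this would be the HTTP call to the ingestion endpoint (simplified
    here as an assumption for illustration).
    """
    for envelope in envelopes:
        try:
            status = post(json.dumps(envelope))
        except Exception:
            # Network-level failure: log and move on, no retry.
            logger.exception("Transmission failed, dropping envelope.")
            continue
        if status != 200:
            # No retry policy is defined for metrics yet: just log
            # the status code, as described in the thread above.
            logger.error("Transmission failed with status %s.", status)
```

The TODO items mentioned later in the review (consolidating with the shared transport logic and handling failures properly) would replace this dropped-on-failure behavior.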
    return blob.put(data, lease_period=lease_period, silent=silent)


class LocalNoopStorage(object):
This is like a mock class for testing; do we want to hack it this way?
Refactored the exporter to include the _transmit functionality without a retry policy.
    "number of requests",
    "requests")
NUM_REQUESTS_VIEW = view_module.View("Number of Requests",
                                     "View for number of requests made",
view_manager.register_view(NUM_REQUESTS_VIEW)
mmap = stats_recorder.new_measurement_map()
tmap = tag_map_module.TagMap()
tmap.insert("url", "website.com")
    # We assume the ordering is already correct
    for i in range(len(metric_descriptor.label_keys)):
        if time_series.label_values[i].value is None:
            value = "None"
Think about how to make it generic.
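One generic way to pair label keys with values, as the comment suggests: zip the keys with the values and substitute a placeholder for missing ones. The helper `build_properties` is hypothetical, modeled loosely on the snippet above rather than taken from the PR.

```python
def build_properties(label_keys, label_values, missing="None"):
    """Pair each label key with its value by position (the snippet
    above assumes the ordering is already correct), substituting a
    placeholder string when a value is absent.

    Hypothetical helper illustrating the "make it generic"
    suggestion; not the exporter's actual code.
    """
    properties = {}
    for key, value in zip(label_keys, label_values):
        properties[key] = missing if value is None else value
    return properties

# build_properties(["url", "method"], ["website.com", None])
# -> {"url": "website.com", "method": "None"}
```

Using `zip` also silently truncates when the two lists disagree in length, which is one of the edge cases a truly generic version would need to decide on.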
    envelope.data = Data(baseData=data, baseType="MetricData")
    return envelope


def _transmit_without_retry(self, envelopes):
The TODO should cover:
- consolidate with transport logic (instead of duplicating code)
- handle failures properly
    mmap.measure_int_put(CARROTS_MEASURE, 1000)
    mmap.record(tmap)
    time.sleep(10)
    time.sleep(60)
If the application exits without sleeping, what's the expected result?
What's our recommendation to the users?
It is possible that an application will exit before the exporter thread has a chance to export the metrics based on the export interval.
It is recommended for the user to keep their applications running, or at least long enough for the exporter to meet the interval.
For applications that terminate before the default interval (15s), this is fine, because we only have up to one minute of granularity in the ingestion service (soon to be 30s), so data from less than one minute does not have much value.
> It is recommended for the user to keep their applications running, or at least long enough for the exporter to meet the interval.

This is vague; what does "long enough" mean? We probably want to document it.
The minimum requirement; we don't want to tell the user to "pick some lucky number, wait and pray".
Right. Pit of success.
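To make the timing discussion concrete, here is a toy stand-in showing why an application must outlive at least one export interval, or pending values are never sent. `IntervalExporter` is invented for illustration (with a short interval so it runs quickly); it is not the PR's API.

```python
import threading
import time


class IntervalExporter:
    """Toy stand-in for an exporter worker thread that exports
    collected values every `interval` seconds. Hypothetical class,
    only illustrating the "long enough" discussion above."""

    def __init__(self, interval=0.1):
        self.interval = interval
        self.pending = []    # values recorded but not yet exported
        self.exported = []   # values the worker thread has sent
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def record(self, value):
        self.pending.append(value)

    def _run(self):
        # Export once per interval until asked to stop.
        while not self._stop.wait(self.interval):
            self.exported.extend(self.pending)
            del self.pending[:]

    def stop(self):
        self._stop.set()
        self._thread.join()


exporter = IntervalExporter(interval=0.1)
exporter.record(1000)
# Exiting immediately here would lose the value, since the worker
# has not ticked yet. Waiting past one interval (the "long enough"
# of the discussion) lets the export happen.
time.sleep(0.3)
exporter.stop()
```

A daemon thread like this also dies silently with the process, which is exactly why the reviewers want the minimum lifetime documented rather than left to luck.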
            self._transmit_without_retry(envelopes)
            del envelopes[:]
    # If leftover data points in envelopes, send them all
    if envelopes:
Need to optimize the logic flow here.
The logic here is to send a request as soon as "max_batch_size" metrics have been processed, instead of waiting for all of the metrics to be processed.
I think it is a minimal optimization; it might be cleaner to just iterate through all the metrics, convert them to envelopes, and then send the envelopes batch by batch. What do you think?
I was thinking about the latter; del envelopes[:] and having separate logic handling the residue outside several nested scopes could be hard to maintain.
Up to you. I don't have a strong opinion here.
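The "convert everything first, then send batch by batch" alternative discussed above can be sketched as follows. `export_in_batches`, `to_envelope`, and `send` are illustrative stand-ins, not the exporter's actual code.

```python
def export_in_batches(metrics, to_envelope, send, max_batch_size):
    """Convert every metric to an envelope up front, then send the
    envelopes batch by batch. This is the cleaner shape suggested in
    the thread: no `del envelopes[:]` mid-loop and no separate branch
    for leftover residue, because slicing handles the final partial
    batch naturally.
    """
    envelopes = [to_envelope(m) for m in metrics]
    for i in range(0, len(envelopes), max_batch_size):
        send(envelopes[i:i + max_batch_size])


batches = []
export_in_batches(range(5), lambda m: m, batches.append, max_batch_size=2)
# batches is now [[0, 1], [2, 3], [4]]
```

The trade-off the thread weighs: this version holds all envelopes in memory before sending, while the in-loop version starts transmitting as soon as the first batch fills.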
Commits:
- Add skeleton metrics exporter to azure
- add bracket (census-instrumentation#690)
- Implement Azure Metrics Exporter (census-instrumentation#693)
- fix rst doc for Azure exporter
- bump version
- fix comment
This PR implements the Metrics Exporter for Azure Monitor. The underlying logic uses the Stats API.