This repository was archived by the owner on Sep 17, 2025. It is now read-only.

Implement Azure Metrics Exporter#693

Merged
reyang merged 45 commits into census-instrumentation:master from lzchen:metrics2
Jun 28, 2019

Conversation

Contributor

@lzchen lzchen commented Jun 21, 2019

This PR implements the Metrics Exporter for Azure Monitor. The underlying logic uses the Stats API.

  • Metrics are polled asynchronously on a separate thread and periodically exported to Azure Monitor.
  • Recording of measures is done synchronously.
  • Histograms (ValueDistributions) are not supported: to view metrics as histograms in Azure Monitor, you construct the summary view you want with queries.
  • Label keys/values are inserted into Azure Monitor as "properties".
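The async-polling design above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual implementation; `PeriodicMetricsExporterTask`, `get_metrics`, and `export` are hypothetical names.

```python
import threading
import time


class PeriodicMetricsExporterTask(threading.Thread):
    """Daemon thread that polls get_metrics() every `interval` seconds
    and hands the results to `export`, mirroring the asynchronous-polling
    design described above. Recording of measures stays synchronous and
    independent of this thread."""

    def __init__(self, interval, get_metrics, export):
        super().__init__(daemon=True)
        self.interval = interval
        self.get_metrics = get_metrics
        self.export = export
        self._stopped = threading.Event()

    def run(self):
        # wait() returns False on timeout (keep polling) and True once
        # stop() has been called (exit the loop).
        while not self._stopped.wait(self.interval):
            metrics = self.get_metrics()  # pull the aggregated stats
            if metrics:
                self.export(metrics)

    def stop(self):
        self._stopped.set()
```

Because the thread is a daemon, it will not keep a finished application alive; the trade-off (metrics recorded just before exit may never be exported) is discussed further down in this thread.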

@lzchen lzchen requested review from a team, c24t, reyang and songy23 as code owners June 21, 2019 23:21
# which contains the aggregated value
for point in time_series.points:
    if point.value is not None:
        data_point = DataPoint(ns=metric.descriptor.name,
Contributor

Would the name allow something like a namespace? E.g. https://github.com/census-instrumentation/opencensus-specs/blob/master/stats/HTTP.md#measures
If yes, we might consider splitting the name string into the namespace and the actual name?

Contributor Author

Interesting solution. Given that the max length of name is 1024 characters, it is possible to encode the namespace and the actual name inside the "name" field. Having a namespace field is useful because it categorizes what kind of metric it's for, while the name identifies the specific metric. However, there must be some documentation on how to use the encoding, including the corner cases (e.g. if we encode name as <namespace>:<actual_name>, what happens when either part is blank or the ":" is missing?)
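A minimal sketch of one way such an encoding could be parsed, covering the corner cases raised above. `split_metric_name` and the `<namespace>:<actual_name>` scheme are assumptions for illustration only; the PR ultimately does not adopt this encoding.

```python
def split_metric_name(name, sep=":"):
    """Split a hypothetical "<namespace>:<actual_name>" metric name.

    Returns (namespace, actual_name), degrading gracefully for the
    corner cases mentioned above: a missing separator yields an empty
    namespace, and blank parts come back as empty strings."""
    if sep not in name:
        # No separator: treat the whole string as the actual name.
        return ("", name)
    namespace, _, actual = name.partition(sep)  # split on first sep only
    return (namespace, actual)
```

With this scheme, `"http:request_count"` splits into `("http", "request_count")`, while a bare `"request_count"` keeps an empty namespace.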

Contributor Author

I'd say we leave it as it is for now; I'd rather change the DataPoint API than have to encode something in name.

Contributor

@reyang reyang left a comment

Looks good in general. Needs test cases and a passing CI.

if metric.descriptor.type == MetricDescriptorType.CUMULATIVE_DISTRIBUTION:
    continue
envelopes.append(self.metric_to_envelope(metric))
self._transmit(envelopes)
Contributor

What if transmission failed?

Contributor

How to control the max batch size (number of items) that we export?

Contributor Author

Since we do not have defined retry logic for Metrics yet, the exporter simply ignores transmissions that have failed and logs an error message with the given status code. I do not want to throw an exception and stop the application if a single transmission fails.
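This log-and-continue behavior might look roughly like the following. `transmit_ignoring_failures` and the `send` callable are hypothetical stand-ins, not the exporter's real transmission code.

```python
import logging

logger = logging.getLogger(__name__)


def transmit_ignoring_failures(send, envelopes):
    """Send a batch and swallow failures: log the status code (or
    exception) instead of raising, so that a single failed transmission
    cannot stop the application. `send` is assumed to return an HTTP
    status code. Returns True on success, False if the batch was dropped."""
    try:
        status = send(envelopes)
    except Exception:
        logger.exception("Transmission failed with an exception; dropping batch")
        return False
    if status != 200:
        logger.error("Transmission failed with status code %s; dropping batch",
                     status)
        return False
    return True
```

The trade-off is that dropped batches are lost for good until retry logic for metrics is defined.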

Contributor Author

We have not yet defined what "number of items" means for metrics (number of metrics, time series, or points?). The ingestion service also only accepts a single metric at a time; the list passed in is simply a convention (to match the stats design). I say we keep it as it is for now (send all the time series passed in) until we can define what this means.

Contributor

_transmit has dependency on the storage, we need to decouple it (or make storage optional if we don't use it in metrics exporter).

Contributor

"sending all the time series passed in" won't work since the ingestion would start to reject data if we hit a certain limit.

Contributor Author

Yes, you are correct. Leaving the logic to run into the exception is not good practice. I fixed it so that envelopes are only passed to storage if storage is being used (it is not used by the metrics exporter). This means that, for now, any partial-success response we get back from the ingestion service will be discarded.
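The storage-optional behavior described here could be sketched roughly as below. `handle_partial_success` and the `storage` object's `put()` method are illustrative assumptions, not the PR's actual interface.

```python
def handle_partial_success(failed_envelopes, storage=None):
    """Sketch of the fix described above: envelopes that failed to send
    are persisted only when a local storage is configured (as in the
    trace exporter); with no storage configured (the metrics exporter
    case) they are discarded. Returns what happened, for illustration."""
    if storage is not None:
        storage.put(failed_envelopes)  # persist for a later retry
        return "stored"
    # Metrics exporter path: no storage, so partial successes are dropped.
    return "discarded"
```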

Contributor Author

The ingestion service only accepts a single data point at a time, so there is no need for batching.

https://github.com/microsoft/ApplicationInsights-Home/blob/master/EndpointSpecs/Schemas/Bond/MetricData.bond

Contributor

The ingestion service only accepts a single data point at a time, so there is no need for batching.

It is hard to believe that the ingestion service would be able to serve real customers in production without batching.

    return blob.put(data, lease_period=lease_period, silent=silent)


class LocalNoopStorage(object):
Contributor

This is like a mock class for testing; do we want to hack it this way?

Contributor Author

Refactored the exporter to include the _transmit functionality without a retry policy.

"number of requests",
"requests")
NUM_REQUESTS_VIEW = view_module.View("Number of Requests",
"View for number of requests made",
Contributor

Remove "View of".

view_manager.register_view(NUM_REQUESTS_VIEW)
mmap = stats_recorder.new_measurement_map()
tmap = tag_map_module.TagMap()
tmap.insert("url", "website.com")
Contributor

Need to be concise here.

# We assume the ordering is already correct
for i in range(len(metric_descriptor.label_keys)):
    if time_series.label_values[i].value is None:
        value = "None"
Contributor

Think about how to make it generic.

        envelope.data = Data(baseData=data, baseType="MetricData")
        return envelope

    def _transmit_without_retry(self, envelopes):
Contributor

Leave a todo comment here.

Contributor

The TODO should cover:

  1. consolidate with transport logic (instead of duplicating code)
  2. handle failures properly

Contributor Author

Done.

mmap.measure_int_put(CARROTS_MEASURE, 1000)
mmap.record(tmap)
time.sleep(10)
time.sleep(60)
Contributor

If the application exits without sleeping, what's the expected result?
What's our recommendation to the users?

Contributor Author

It is possible that an application will exit before the exporter thread has a chance to export the metrics, depending on the exporter interval.

It is recommended for the user to keep their applications running, or at least long enough for the exporter to meet the interval.

For applications that terminate before the default interval (15s), this is fine because the ingestion service only offers up to one minute of granularity (soon to be 30s), so data covering less than one minute does not have much value.

Contributor

It is recommended for the user to keep their applications running, or at least long enough for the exporter to meet the interval.

This is vague, what does "long enough" mean? We probably want to document it.

Contributor

The minimum requirement, we don't want to tell the user to "pick some lucky number, wait and pray".

Contributor Author

Right. Pit of success.

        self._transmit_without_retry(envelopes)
        del envelopes[:]
# If there are leftover data points in envelopes, send them all
if envelopes:
Contributor

Need to optimize the logic flow here.

Contributor Author

The logic here is to send a request as soon as "max_batch_size" amount of metrics have been processed instead of waiting for all of the metrics to be processed.

I think it is a minimal optimization; it might be cleaner to just iterate through all the metrics, convert them to envelopes, and then send the envelopes batch by batch. What do you think?
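The alternative flow proposed here (convert everything first, then send in fixed-size batches) might be sketched as follows; `export_in_batches`, `to_envelope`, and `transmit` are hypothetical names, and `max_batch_size=100` is an arbitrary placeholder.

```python
def export_in_batches(metrics, to_envelope, transmit, max_batch_size=100):
    """Convert every metric to an envelope up front, then send the
    envelopes in fixed-size batches. This avoids the del envelopes[:]
    pattern and the separate leftover-handling branch, at the cost of
    holding all envelopes in memory at once."""
    envelopes = [to_envelope(m) for m in metrics]
    for i in range(0, len(envelopes), max_batch_size):
        transmit(envelopes[i:i + max_batch_size])  # last slice may be short
```

The final short slice covers the "leftover" case naturally, which is the maintainability point raised in the next comment.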

Contributor

I was thinking about the latter; del envelopes[:] and having separate logic handling the residue outside several nested scopes could be hard to maintain.

Contributor

Up to you. I don't have a strong opinion here.

@reyang reyang merged commit ed28627 into census-instrumentation:master Jun 28, 2019
lzchen added a commit to lzchen/opencensus-python that referenced this pull request Jul 15, 2019
lzchen added a commit to lzchen/opencensus-python that referenced this pull request Jul 18, 2019
lzchen added a commit to lzchen/opencensus-python that referenced this pull request Jul 19, 2019
lzchen added a commit to lzchen/opencensus-python that referenced this pull request Jul 22, 2019
Add skeleton metrics exporter to azure

add bracket (census-instrumentation#690)

Implement Azure Metrics Exporter (census-instrumentation#693)

fix rst doc for Azure exporter

bump version
lzchen added a commit to lzchen/opencensus-python that referenced this pull request Jul 22, 2019
Add skeleton metrics exporter to azure

add bracket (census-instrumentation#690)

Implement Azure Metrics Exporter (census-instrumentation#693)

fix rst doc for Azure exporter

bump version

fix comment
