Description
What's wrong?
We've been migrating from YACE to Alloy with its embedded cloudwatch exporter. We are seeing an issue with the exported metrics: the value of the latest datapoint is not the same as the value shown in Cloudwatch (bug in YACE). In our YACE config, each metric had a delay of a few minutes, which allowed the datapoint value to stabilize in Cloudwatch before we stored it in Mimir.
As such, I do not think setting the delay to 0 in the exporter config is the correct thing to do. The latest datapoint (and often the second-to-last one) is missing values, which makes the averages and sums too low, regardless of the length and delay parameters used. Having a delay is the only way we have found so far that allows us to fetch accurate metrics, so I believe this is a bug in Alloy.
Steps to reproduce
- Fetch a volatile metric from Cloudwatch using Alloy's Cloudwatch exporter. I used Lambda invocations for a function that is invoked ~3k times per second, with a period of 1 minute, a length of 5 minutes and a scrape interval of 1 minute.
- Observe both the actual Cloudwatch metric and the one imported by the exporter, e.g. in the same dashboard. Notice that the imported metric is never quite the same as the one from Cloudwatch. The imported metric in the screenshot is offset by -2 minutes to align more closely with the shape of the Cloudwatch metric.
The latest value from Cloudwatch is low, but it grows for more than a minute, until it reaches a stable value that no longer changes.
This could be solved by changing the delay parameter of the metric to 2 minutes and using Cloudwatch timestamps. For some other metrics, a 5-minute delay could be more suitable.
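To illustrate why a delay helps, here is a minimal sketch of how a delayed query window could be computed. It is a simplified model, not the actual code of YACE or Alloy: the function names `query_window` and `newest_datapoint_start`, and the rounding of the window end down to a period boundary, are assumptions made for this example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def query_window(now, period, length, delay):
    """Return (start, end) of a delayed, period-aligned query window.

    Assumed model: the window ends `delay` before `now`, rounded down
    to a period boundary, and covers `length`.
    """
    end = now - delay
    # Round the end of the window down to a whole number of periods.
    end = EPOCH + ((end - EPOCH) // period) * period
    return end - length, end


def newest_datapoint_start(now, period, delay):
    """Start time of the newest period that lies fully inside the window."""
    _, end = query_window(now, period, period, delay)
    return end - period


now = datetime(2025, 1, 1, 12, 7, 30, tzinfo=timezone.utc)
period = timedelta(minutes=1)
length = timedelta(minutes=5)

# delay = 0: the newest datapoint covers 12:06-12:07 and ended only 30 s ago.
print(newest_datapoint_start(now, period, timedelta(0)))
# delay = 2m: the newest datapoint covers 12:04-12:05 and ended 2.5 min ago.
print(newest_datapoint_start(now, period, timedelta(minutes=2)))
print(query_window(now, period, length, timedelta(minutes=2)))
```

With a delay of 0, the newest datapoint in this model is only seconds old and, as described above, its value is still growing in Cloudwatch. With a delay of 2 minutes, every datapoint in the window has had at least 2 minutes to settle.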
System information
Linux amd64 and arm64
Software version
Grafana Alloy 1.7.1
Configuration
prometheus.exporter.cloudwatch "default" {
  sts_region = "eu-central-1"
  discovery {
    type    = "AWS/Lambda"
    regions = ["eu-central-1"]
    metric {
      name       = "Invocations"
      statistics = ["Sum"]
      period     = "1m"
      length     = "5m"
    }
  }
}
prometheus.scrape "cloudwatch_exporter" {
  clustering {
    enabled = true
  }
  targets         = prometheus.exporter.cloudwatch.default.targets
  scrape_interval = "1m"
  forward_to      = [prometheus.relabel.cloudwatch_exporter_relabel.receiver]
}
Logs
