Cloudwatch metrics are too low #3929

@juissi-t

What's wrong?

We've been migrating from YACE to Alloy with the embedded CloudWatch exporter. We are seeing an issue with the exported metrics: the values for the latest datapoint are not the same as the metrics in CloudWatch (bug in YACE). In our YACE config, we had a delay of a few minutes for each metric, which allowed the datapoint value to stabilize in CloudWatch before we stored the stable value in Mimir.

As such, I do not think setting the delay to 0 in the exporter config is the correct thing to do: the latest datapoint (and often the second-to-last one) is still missing values, which makes the averages and sums too low, regardless of the length and delay parameters used. Having a delay is the only way we have found so far that allows us to fetch accurate metrics, so I believe this is a bug in Alloy.

Steps to reproduce

  1. Fetch some volatile metric from CloudWatch using Alloy's CloudWatch exporter. I've used Lambda invocations for a function that is invoked ~3k times per second, with a period of 1 minute, a length of 5 minutes and a scrape interval of 1 minute.
  2. Observe both the actual CloudWatch metric and the one imported using the exporter, e.g. in the same dashboard. Notice that the imported metric is never quite the same as the one from CloudWatch. The imported metric in the screenshot is offset by -2 minutes to more closely align with the shape of the CloudWatch metric.

Image

  3. The latest value from CloudWatch is low, but it grows for more than a minute until it reaches a stable value that no longer changes.

  4. This could be solved by changing the delay parameter of the metric to 2 minutes and using CloudWatch timestamps. For some other metrics, a 5-minute delay could be more suitable.
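
To make the effect of the delay concrete, here is a minimal Python sketch of how a delay shifts the queried time window. It is only an illustration of the idea, not the actual YACE or Alloy implementation: the function name `query_window` and the rounding of the window end down to a period boundary are assumptions made for this example. The period and length values are the ones from this issue.

```python
from datetime import datetime, timezone


def query_window(now, period_s, length_s, delay_s):
    """Return (start, end) of a hypothetical metric query window.

    The window ends at `now - delay`, rounded down to a period boundary
    (the rounding is an assumption for this illustration), and spans
    `length` seconds backwards from there.
    """
    epoch = int(now.timestamp())
    end_epoch = ((epoch - delay_s) // period_s) * period_s
    start_epoch = end_epoch - length_s
    return (
        datetime.fromtimestamp(start_epoch, timezone.utc),
        datetime.fromtimestamp(end_epoch, timezone.utc),
    )


# A scrape that happens 30 seconds into a minute.
now = datetime(2025, 1, 1, 12, 0, 30, tzinfo=timezone.utc)

# Period 1m and length 5m, as in the configuration in this issue.
no_delay = query_window(now, period_s=60, length_s=300, delay_s=0)
two_min_delay = query_window(now, period_s=60, length_s=300, delay_s=120)

print("delay 0s:  ", no_delay[0].time(), "to", no_delay[1].time())
print("delay 120s:", two_min_delay[0].time(), "to", two_min_delay[1].time())
```

With no delay, the newest datapoint in the window covers a minute that ended only seconds before the scrape, so its value may still be growing in CloudWatch. With a 2-minute delay, the newest datapoint in the window has had at least two more minutes to settle.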

System information

Linux amd64 and arm64

Software version

Grafana Alloy 1.7.1

Configuration

prometheus.exporter.cloudwatch "default" {
	sts_region = "eu-central-1"

	discovery {
		type    = "AWS/Lambda"
		regions = ["eu-central-1"]

		metric {
			name       = "Invocations"
			statistics = ["Sum"]
			period     = "1m"
			length     = "5m"
		}
	}
}

prometheus.scrape "cloudwatch_exporter" {
	clustering {
		enabled = true
	}

	targets         = prometheus.exporter.cloudwatch.default.targets
	scrape_interval = "1m"
	forward_to      = [prometheus.relabel.cloudwatch_exporter_relabel.receiver]
}

Logs

