feat: Add datadog metrics backend by markstory · Pull Request #703 · getsentry/taskbroker

markstory · 2026-06-12T20:09:59Z

We currently require applications to implement their own metrics backend. This has resulted in metrics from the taskworker runtime having inconsistent tags and metric names, which makes building dashboards and alerting for taskworkers tedious.

Having a more opinionated metrics backend will allow us to make metric names and tags that are important to observability structurally required. I've also included a prefixed metric shim which will allow us to switch metrics providers without gaps in observability.

Refs STREAM-816
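One possible reading of the "prefixed metric shim" is a thin wrapper that prepends a fixed prefix to every metric key before delegating to the real backend. The sketch below illustrates that reading only. `MetricsBackend`, `RecordingBackend`, `PrefixedMetrics`, and the `taskbroker.` prefix are hypothetical names for illustration, not the classes added in this PR.

```python
from typing import Optional, Protocol


class MetricsBackend(Protocol):
    """Hypothetical minimal backend interface, for illustration only."""

    def incr(self, key: str, amount: int = 1, tags: Optional[dict] = None) -> None: ...


class RecordingBackend:
    """Stand-in backend that records calls instead of sending them anywhere."""

    def __init__(self) -> None:
        self.calls = []

    def incr(self, key: str, amount: int = 1, tags: Optional[dict] = None) -> None:
        self.calls.append((key, amount, dict(tags or {})))


class PrefixedMetrics:
    """Wrap a backend and prepend a fixed prefix to every metric key."""

    def __init__(self, inner: MetricsBackend, prefix: str) -> None:
        self._inner = inner
        self._prefix = prefix

    def incr(self, key: str, amount: int = 1, tags: Optional[dict] = None) -> None:
        # Only the key changes; amount and tags pass through untouched.
        self._inner.incr(f"{self._prefix}{key}", amount, tags)


backend = RecordingBackend()
metrics = PrefixedMetrics(backend, "taskbroker.")
metrics.incr("task.count", tags={"queue": "default"})
```

Because callers only see the wrapper, the provider behind it could be swapped without touching call sites, which is presumably the property the description is after.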

linear-code · 2026-06-12T20:10:03Z

STREAM-816

cursor · 2026-06-12T20:12:00Z

+        try:
+            yield None
+        finally:
+            self._emit("timing", key, time.monotonic() - start, tags, sample_rate)


Timer sends seconds not milliseconds

High Severity

DatadogMetrics.timer passes the elapsed time measured with time.monotonic(), which is in seconds, directly to DogStatsD timing, which expects values in milliseconds. Reported durations for RPC and task timers will be roughly three orders of magnitude too small, breaking latency dashboards and alerts.

Reviewed by Cursor Bugbot for commit fe820a1.
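The unit mismatch described above could be addressed by converting to milliseconds before emitting. This is a minimal sketch, not the fix that landed in the PR: `timer_ms` and the `emit` callable are hypothetical stand-ins for `DatadogMetrics.timer` and `self._emit`, and tags and sample rate are omitted for brevity.

```python
import time
from collections.abc import Callable, Iterator
from contextlib import contextmanager


@contextmanager
def timer_ms(emit: Callable[[str, float], None], key: str) -> Iterator[None]:
    """Time the wrapped block and report the duration in milliseconds."""
    start = time.monotonic()
    try:
        yield None
    finally:
        elapsed_seconds = time.monotonic() - start
        # Convert seconds to milliseconds before handing the value to the backend.
        emit(key, elapsed_seconds * 1000.0)


# Hypothetical usage: collect emitted values in a list instead of sending them.
recorded = []
with timer_ms(lambda key, value: recorded.append((key, value)), "taskworker.example"):
    time.sleep(0.01)
```

Keeping the conversion in one place means every caller of the timer reports the same unit, whichever backend sits underneath.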

cursor · 2026-06-12T20:12:00Z

+
+
+def _rss_bytes() -> int:
+    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


Memory metric uses peak RSS

Medium Severity

track_memory_usage derives its distribution from the difference in ru_maxrss, which is a process lifetime high-water mark, not current RSS. Memory freed inside the block does not lower the metric, and on Linux ru_maxrss is in kilobytes despite _rss_bytes implying bytes.

Reviewed by Cursor Bugbot for commit fe820a1.
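To make the unit and high-water-mark points concrete, here is a small sketch that normalizes `ru_maxrss` to bytes and shows that the value does not drop after memory is freed. `peak_rss_bytes` is a hypothetical helper name, and the platform check assumes only the documented difference between macOS (bytes) and Linux (kilobytes).

```python
import resource
import sys


def peak_rss_bytes() -> int:
    """Return the process lifetime peak RSS, normalized to bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


before = peak_rss_bytes()
scratch = bytearray(b"x") * (8 * 1024 * 1024)  # allocate and fill 8 MiB
del scratch
after = peak_rss_bytes()

# The peak is a high-water mark: freeing the buffer never lowers it.
print(after >= before)
```

A delta computed from this value is zero whenever the block stays below an earlier peak, which is why it is a poor fit for per-task memory tracking.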

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

There are 3 total unresolved issues (including 2 from previous reviews).

Reviewed by Cursor Bugbot for commit 2078b84.

markstory · 2026-06-12T20:46:26Z

        instance: str | None = None,
        tags: Tags | None = None,
-        sample_rate: float = 1,
+        sample_rate: float | None = None,


This default value has been silly for a while. Now is a good time to change this.

sentry · 2026-06-12T20:48:34Z

+    client.distribution.assert_called_once()
+    args, _ = client.distribution.call_args
+    assert args[0] == "taskworker.mem"
+    assert isinstance(args[1], int)


Bug: The track_memory_usage function incorrectly uses ru_maxrss, a high-water mark for memory, to calculate memory deltas. This can cause test failures and makes the production metric unreliable.
Severity: MEDIUM

Suggested Fix

To accurately measure memory usage within the context, replace resource.getrusage with a method that measures current memory usage, not a historical peak. For example, use a library like psutil to get the current resident set size (RSS) at the beginning and end of the block. The difference between these two values will provide a more accurate measurement of the memory consumed within that specific context.

Prompt for AI Agent

Review the code at the location below. A potential bug has been identified by an AI agent. Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not valid. Location: clients/python/tests/test_metrics.py#L140 Potential issue: The function `track_memory_usage` calculates memory usage deltas using `resource.getrusage(resource.RUSAGE_SELF).ru_maxrss`. This value represents the peak memory usage (a high-water mark) of the process, not the current usage. In the test `test_track_memory_usage`, an assertion `args[1] > 0` expects a positive memory delta after an allocation. However, if previous tests have already set a high memory peak, the new allocation may not increase it, resulting in a delta of zero and a test failure. This also affects production, where the metric will likely report zero for most tasks, making it ineffective for its intended purpose of tracking memory usage per task.
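The suggested fix names psutil. As a stdlib-only alternative on Linux, current RSS can be read from `/proc/self/statm`. The sketch below swaps in that approach and is not the code in this PR: `current_rss_bytes`, `track_rss_delta`, and the `emit` callable are hypothetical names, and the helper assumes a Linux `/proc` filesystem.

```python
import os
from collections.abc import Callable, Iterator
from contextlib import contextmanager


def current_rss_bytes() -> int:
    """Return the current resident set size in bytes (Linux only)."""
    # The second field of /proc/self/statm is the resident page count.
    with open("/proc/self/statm") as statm:
        resident_pages = int(statm.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


@contextmanager
def track_rss_delta(emit: Callable[[str, int], None], key: str) -> Iterator[None]:
    """Emit the change in current RSS across the wrapped block."""
    before = current_rss_bytes()
    try:
        yield None
    finally:
        # Current RSS can shrink, so the delta may be negative.
        emit(key, current_rss_bytes() - before)


# Hypothetical usage: collect the emitted delta in a list.
samples = []
with track_rss_delta(lambda key, value: samples.append((key, value)), "taskworker.mem"):
    scratch = bytearray(b"x") * (8 * 1024 * 1024)
```

Because the delta can be zero or negative, a test built on this approach is safer asserting on the value's type than on its sign.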

markstory requested a review from a team as a code owner June 12, 2026 20:10

cursor Bot reviewed Jun 12, 2026

sentry Bot reviewed Jun 12, 2026

Comment thread clients/python/src/taskbroker_client/metrics.py

Port more configuration from sentry in

2078b84

cursor Bot reviewed Jun 12, 2026

sentry Bot reviewed Jun 12, 2026

Fix up default sample_rate for gauge()

0c255ff

It should be aligned with the other methods.

markstory commented Jun 12, 2026

sentry Bot reviewed Jun 12, 2026

evanh approved these changes Jun 15, 2026

Align another setting with sentry

c3a4dce

markstory merged commit 63fadc8 into main Jun 17, 2026
27 checks passed

markstory deleted the metrics-adapter branch June 17, 2026 17:31

sentry-release-bot Bot mentioned this pull request Jun 18, 2026

publish: getsentry/taskbroker/clients@0.19.2 getsentry/publish#8619

Closed

3 tasks

markstory merged 4 commits into main from metrics-adapter
