ref(config): Introduce multi-topic config, deprecate old fields #663

Merged: untitaker merged 17 commits into main from multi-topic-config on Jun 2, 2026
@untitaker untitaker commented Jun 2, 2026

Member

ref STREAM-1042

We want to eventually support consuming multiple topics from a single taskbroker replica (see https://app.notion.com/p/sentry/Multi-topic-management-for-taskbroker-3728b10e4b5d80adba06e127c5fac612). This is mainly for raw mode, so that we can consume multiple raw topics.

This is the first part that migrates the config to the new format. It is fully backwards compatible and old fields will print deprecation warnings. Multi-topic support is not actually implemented and defining more than one topic will result in a validation error.

Fields are migrated in Config::validate(), and it's renamed to normalize_and_validate. consumable_topic() has been introduced to make migration of the application less miserable, but when the actual support for multi-topic is implemented, that method has to go away.

Config::default() no longer produces a valid config. Instead, one has to run Config::default() (what figment uses), then always call normalize_and_validate afterwards to get a config that the rest of the app code can actually deal with.

What has been migrated is:

  • retry topic config
  • DLQ settings (kafka_deadletter_cluster is now just... a cluster in kafka_clusters)
  • main consumer topic

What has not been migrated:

  • kafka_consume_retry_topic has been removed entirely because the migration is too complicated. It is not currently used and I think until we have proper multi-topic support, we can work without it.
  • Kafka session timeout and commit interval remain global options since there is currently no need to configure them per-topic (or... per-cluster?)
  • The global settings for auto offset reset, session timeout, and commit interval can now also be configured per topic. The old fields are not deprecated, however; they serve as defaults instead.
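
The "global fields serve as defaults" behavior from the last bullet could look roughly like the sketch below. This is only an illustration: the struct names, field names, and the `resolve` function are hypothetical and not taken from the taskbroker code.

```rust
use std::time::Duration;

/// Hypothetical global settings; these stay valid and act as defaults.
#[derive(Debug, Clone)]
pub struct GlobalKafkaDefaults {
    pub auto_offset_reset: String,
    pub session_timeout: Duration,
    pub commit_interval: Duration,
}

/// Hypothetical per-topic overrides; `None` means "use the global value".
#[derive(Debug, Clone, Default)]
pub struct TopicOverrides {
    pub auto_offset_reset: Option<String>,
    pub session_timeout: Option<Duration>,
    pub commit_interval: Option<Duration>,
}

/// Effective settings for one topic after applying the fallback.
#[derive(Debug, Clone, PartialEq)]
pub struct ResolvedTopicSettings {
    pub auto_offset_reset: String,
    pub session_timeout: Duration,
    pub commit_interval: Duration,
}

/// Picks the per-topic value when present, otherwise the global default.
pub fn resolve(global: &GlobalKafkaDefaults, topic: &TopicOverrides) -> ResolvedTopicSettings {
    ResolvedTopicSettings {
        auto_offset_reset: topic
            .auto_offset_reset
            .clone()
            .unwrap_or_else(|| global.auto_offset_reset.clone()),
        session_timeout: topic.session_timeout.unwrap_or(global.session_timeout),
        commit_interval: topic.commit_interval.unwrap_or(global.commit_interval),
    }
}
```

With this shape, a topic that sets only `session_timeout` keeps the global auto offset reset and commit interval, which is why the old fields do not need a deprecation warning.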

Other changes:

While grinding through bugbot comments, I had to add a ton of extra validation just to deal with the fact that we're sharing a producer across many topics. The old config was more restrictive in what kind of values it allowed to set (for example, retry topic didn't have a dedicated retry cluster setting), and the new format structurally allows for too many possibilities that are not actually handled in code.

I think we should follow up and actually create separate producers for DLQ vs retry topic vs demoted namespaces, then this unnecessary restriction also goes away. It's also a footgun anyway since the k8s deployment templates don't validate against this either.

@linear-code

linear-code Bot commented Jun 2, 2026


STREAM-1042

STREAM-1085

@untitaker untitaker changed the title from "multi topic config" to "ref(config): Introduce multi-topic config, deprecate old fields" Jun 2, 2026
untitaker and others added 3 commits June 2, 2026 13:49
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The upkeep tests built Config literals (overriding kafka_deadletter_topic
after normalization, or skipping normalization entirely), so
kafka_producer_config() panicked because the deadletter topic was never
registered in kafka_topics.

Introduce create_integration_config_from_base(base) as the single
normalized-config builder and rebuild the other integration helpers on
top of it. Tests now pass their overrides via the base config so
normalization runs after all fields are set.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@untitaker untitaker marked this pull request as ready for review June 2, 2026 12:13
@untitaker untitaker requested a review from a team as a code owner June 2, 2026 12:13
Comment thread src/config.rs Outdated
untitaker and others added 3 commits June 2, 2026 14:27
The legacy normalization path keys topics and clusters by name. A
deadletter topic configured with the same name as the consumed topic
previously collapsed silently into the main topic's entry (via
entry().or_insert_with()), so the deadletter producer would resolve to
the main cluster instead of the deadletter cluster — misrouting
deadletter messages.

Use the insert return value to reject that collision with a clear error,
and assert the remaining internal inserts (clusters, main topic) never
overwrite an existing entry. The intentional retry-topic alias of the
main topic is left as-is.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
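
The difference between `entry().or_insert_with()` and checking the return value of `insert()`, as described in the commit message above, can be shown with a plain `std::collections::HashMap`. This is a minimal sketch: `TopicEntry`, `register_silently`, and `register_strict` are hypothetical names, and the real normalization code likely differs.

```rust
use std::collections::HashMap;

/// Hypothetical topic entry; only the cluster matters for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct TopicEntry {
    pub cluster: String,
}

/// Silent variant: if `name` already exists, the existing entry wins and
/// the new cluster is dropped without any error.
pub fn register_silently(topics: &mut HashMap<String, TopicEntry>, name: &str, cluster: &str) {
    topics
        .entry(name.to_string())
        .or_insert_with(|| TopicEntry { cluster: cluster.to_string() });
}

/// Strict variant: `HashMap::insert` returns the previous value, so a
/// `Some(..)` result means two roles collided on the same topic name.
pub fn register_strict(
    topics: &mut HashMap<String, TopicEntry>,
    name: &str,
    cluster: &str,
) -> Result<(), String> {
    let previous = topics.insert(name.to_string(), TopicEntry { cluster: cluster.to_string() });
    match previous {
        None => Ok(()),
        Some(old) => {
            // Put the original entry back so the map is unchanged on error.
            topics.insert(name.to_string(), old);
            Err(format!("topic {name} is already registered with a different role"))
        }
    }
}
```

In the silent variant, a deadletter topic named like the main topic ends up pointing at the main cluster; the strict variant turns that same situation into an explicit error.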
Comment thread src/config.rs
Comment thread src/upkeep.rs
Comment thread src/test_utils.rs Outdated
untitaker and others added 2 commits June 2, 2026 14:42
The retry topic may alias the main topic (retries get re-enqueued there),
but aliasing the deadletter topic silently gave the retry topic the
deadletter cluster/role via entry().or_insert_with(). Reject that
collision during normalization.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Comment thread src/upkeep.rs
The forwarding producer reuses the deadletter cluster's credentials and
only overrides bootstrap.servers, so forwarding to a different cluster
with its own auth can fail to publish. Behavior is unchanged for
compatibility, but we now warn when the forward cluster differs from the
deadletter cluster and the latter has auth configured.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
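
The warning condition from the commit message above (forward cluster differs from the deadletter cluster, and the deadletter cluster has auth configured) might be expressed like this. It is a sketch under assumptions: `ClusterSketch`, its fields, and `forwarding_auth_warning` are hypothetical, and the real check presumably inspects the actual auth settings instead of a boolean.

```rust
/// Hypothetical view of a cluster entry; field names are illustrative.
#[derive(Debug, Clone)]
pub struct ClusterSketch {
    pub name: String,
    /// True if any credentials are configured for this cluster.
    pub has_auth: bool,
}

/// Returns a warning message when the forwarding producer would reuse the
/// deadletter cluster's credentials against a different cluster.
/// Behavior is not changed; the caller only logs the message.
pub fn forwarding_auth_warning(
    deadletter: &ClusterSketch,
    forward_cluster_name: &str,
) -> Option<String> {
    if deadletter.name != forward_cluster_name && deadletter.has_auth {
        Some(format!(
            "forwarding to cluster {forward_cluster_name} reuses credentials of cluster {}; publishing may fail",
            deadletter.name
        ))
    } else {
        None
    }
}
```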
Comment thread src/config.rs

@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit fb61fe7. Configure here.

Comment thread src/config.rs
untitaker and others added 3 commits June 2, 2026 14:58
The upkeep producer connects to the deadletter topic's cluster but is
also reused to publish retries to the retry topic (or the consumed topic
when none is configured). A single producer reaches only one cluster, so
a deadletter topic on a different cluster than the retry target silently
misroutes retries to the wrong brokers.

Reject that during normalization. Replaces a bogus test that asserted the
deadletter topic could live on its own cluster, which the shared-producer
implementation cannot actually support.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
In the new format kafka_retry_topic was never registered as a topic, so a
retry topic on a different cluster would silently fall through to the
deadletter producer's cluster with no validation. Require the retry
target to be a declared topic so its cluster is known and checked against
the deadletter cluster.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
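
The two preceding commits add related checks: the retry target must be a declared topic, and it must live on the same cluster as the deadletter topic because one producer serves both. A minimal sketch of such a check follows, assuming a simplified map from topic name to cluster name; `validate_shared_producer` and its parameters are hypothetical and not the actual taskbroker API.

```rust
use std::collections::HashMap;

/// Validates that the deadletter topic and the retry target can share one
/// producer. `topic_clusters` maps topic name to cluster name (a simplified,
/// hypothetical stand-in for the real topic config).
pub fn validate_shared_producer(
    topic_clusters: &HashMap<String, String>,
    deadletter_topic: &str,
    retry_target: &str,
) -> Result<(), String> {
    // Both topics must be declared so that their clusters are known.
    let dlq_cluster = topic_clusters
        .get(deadletter_topic)
        .ok_or_else(|| format!("deadletter topic {deadletter_topic} is not a declared topic"))?;
    let retry_cluster = topic_clusters
        .get(retry_target)
        .ok_or_else(|| format!("retry target {retry_target} is not a declared topic"))?;

    // A single producer reaches only one cluster, so the clusters must match.
    if dlq_cluster != retry_cluster {
        return Err(format!(
            "deadletter topic is on cluster {dlq_cluster} but retry target is on cluster {retry_cluster}"
        ));
    }
    Ok(())
}
```

If separate producers for DLQ and retry are introduced later, as the PR description suggests, the cluster-equality part of this check would no longer be needed.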
kafka_consumer_group is documented as deprecated and participates in the
legacy/new-format mutual-exclusivity check, but unlike the other
deprecated fields it emitted no warning when set on its own. Add the
matching deprecation warning.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@evanh evanh left a comment

Member

Why not have normalize_and_validate be called in the default() function directly? Why make it separate?

@untitaker

untitaker commented Jun 2, 2026

Member Author

> Why not have normalize_and_validate be called in the default() function directly? Why make it separate?

because the flow is like this:

  1. default is created
  2. figment is overwriting based on envvars
  3. we validate (this kind of happens implicitly when we use the config, we have various assertions around the codebase)

that's how it works in main. with this PR this doesn't really change:

  1. default
  2. figment
  3. normalize and validate

if we normalize before figment, or validate before it, we'll just not validate or normalize any user input.

same in tests: we get a default config, overwrite whatever is necessary, then validate/normalize.
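
The ordering described in this reply (default, then overrides, then normalize and validate) can be sketched as follows. Everything here is a simplified stand-in: the `Config` fields, `apply_overrides` (which replaces figment with a plain map), and the error messages are hypothetical, and only the method name `normalize_and_validate` comes from the PR.

```rust
use std::collections::HashMap;

/// Hypothetical, heavily simplified stand-in for the real Config.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Config {
    /// Legacy single-topic field (illustrative name), deprecated in this sketch.
    pub kafka_topic: Option<String>,
    /// New-format topics: topic name mapped to cluster name.
    pub kafka_topics: HashMap<String, String>,
}

impl Config {
    /// Step 2 stand-in: apply user overrides. The real app uses figment here.
    pub fn apply_overrides(mut self, overrides: &HashMap<&str, &str>) -> Self {
        if let Some(topic) = overrides.get("kafka_topic") {
            self.kafka_topic = Some(topic.to_string());
        }
        self
    }

    /// Step 3: migrate legacy fields into the new format, then validate.
    pub fn normalize_and_validate(mut self) -> Result<Self, String> {
        if let Some(topic) = self.kafka_topic.take() {
            eprintln!("warning: kafka_topic is deprecated, use kafka_topics");
            if self.kafka_topics.insert(topic.clone(), "default".to_string()).is_some() {
                return Err(format!("topic {topic} is defined twice"));
            }
        }
        match self.kafka_topics.len() {
            0 => Err("no topic configured".to_string()),
            1 => Ok(self),
            n => Err(format!("{n} topics configured, but multi-topic is not supported yet")),
        }
    }
}

/// Runs the three steps in order: default, overrides, normalize and validate.
pub fn load(overrides: &HashMap<&str, &str>) -> Result<Config, String> {
    Config::default()
        .apply_overrides(overrides)
        .normalize_and_validate()
}
```

Calling `normalize_and_validate` inside `default()` would run it before the overrides in step 2 exist, so user-supplied legacy fields would never be migrated or checked.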

@untitaker untitaker merged commit 0bfb1b6 into main Jun 2, 2026
25 checks passed
@untitaker untitaker deleted the multi-topic-config branch June 2, 2026 14:25
untitaker added a commit to getsentry/sentry that referenced this pull request Jun 3, 2026
…116758)

See getsentry/taskbroker#667 for details.

Same kafka config migration as the self-hosted change, but for the
`ingest-profiles` taskbroker in devservices. Deprecation warnings land
as per getsentry/taskbroker#663

`ingest-profiles` runs in raw mode, and raw mode now requires an
explicit retry topic: raw payloads aren't activations, so retries can't
loop back into the `profiles` topic. Retries now go to the main
`taskworker` topic so the existing taskbroker picks them up instead of
running another broker.

Depends on getsentry/taskbroker#668, which wires
up per-topic raw mode and enforces the retry-topic requirement. Once
that ships, the current devenv config breaks without this change.

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
2 participants