fix(config): legacy retry topic uses deadletter cluster #666

Merged
untitaker merged 2 commits into main from fix/legacy-retry-topic-deadletter-cluster
Jun 2, 2026

@untitaker untitaker commented Jun 2, 2026

Member

ref STREAM-1042

Follow-up to #663, which added the normalize_and_validate check that the retry target and deadletter topic must resolve to the same cluster (they share the upkeep producer). This fixes a false positive in that check for legacy configs.
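A minimal sketch of what such a same-cluster check could look like, for readers who have not seen #663. The `Topic` dataclass and the `check_same_cluster` function are hypothetical names used for illustration, not the actual `normalize_and_validate` implementation; only the wording of the error message is taken from the production error quoted in the commit message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    # Hypothetical model: a topic name plus the cluster it is registered on.
    name: str
    cluster: str


def check_same_cluster(retry: Topic, deadletter: Topic) -> None:
    """Raise if the retry target and the deadletter topic are on different clusters.

    Both topics are written by one shared producer, so a mismatch means
    one of them cannot be reached.
    """
    if retry.cluster != deadletter.cluster:
        raise ValueError(
            f"retry target topic {retry.name!r} is on cluster {retry.cluster!r}, "
            f"but deadletter topic {deadletter.name!r} is on {deadletter.cluster!r}; "
            "they share a single producer and must be on the same cluster"
        )
```

A check of this shape is only as accurate as the cluster recorded on each topic, which is why a mis-registered retry topic produces a false positive.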

In legacy normalization a distinct kafka_retry_topic was registered on DEFAULT_CLUSTER (the main consumer cluster). But retries are produced by the upkeep producer, which is the same producer used for the DLQ and connects to the deadletter topic's cluster (kafka_producer_config -> kafka_producer_cluster). The same-cluster check then compared the retry topic's mis-assigned cluster against the deadletter cluster and rejected configs where the main consumer cluster differs from the deadletter cluster, even though retries would have gone to the deadletter cluster at runtime as intended.

Fix: register the legacy retry topic on DEADLETTER_CLUSTER so the config model matches the cluster the producer actually uses. This only diverges when kafka_deadletter_cluster is explicitly set; when unset it falls back to the main address, so the change is a no-op for those pools.
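The fix and the fallback can be sketched roughly as follows. This is an illustration under stated assumptions, not the real normalization code: `normalize_legacy`, `resolve`, the output shape, and the legacy keys `kafka_cluster`, `kafka_topic`, and `kafka_deadletter_topic` are hypothetical. Only `kafka_retry_topic`, `kafka_deadletter_cluster`, `DEFAULT_CLUSTER`, and `DEADLETTER_CLUSTER` are names from this PR.

```python
DEFAULT_CLUSTER = "default"
DEADLETTER_CLUSTER = "deadletter"


def normalize_legacy(legacy: dict) -> dict:
    """Map a legacy config onto named clusters and topic registrations (sketch)."""
    main_address = legacy["kafka_cluster"]  # hypothetical key
    # When kafka_deadletter_cluster is unset, fall back to the main address.
    deadletter_address = legacy.get("kafka_deadletter_cluster") or main_address
    clusters = {
        DEFAULT_CLUSTER: main_address,
        DEADLETTER_CLUSTER: deadletter_address,
    }
    topics = {
        legacy["kafka_topic"]: DEFAULT_CLUSTER,  # hypothetical key
        legacy["kafka_deadletter_topic"]: DEADLETTER_CLUSTER,  # hypothetical key
    }
    retry_topic = legacy.get("kafka_retry_topic")
    if retry_topic and retry_topic not in topics:
        # The fix: a distinct retry topic goes on the deadletter cluster,
        # because the shared upkeep producer connects there.
        # Previously this was DEFAULT_CLUSTER.
        topics[retry_topic] = DEADLETTER_CLUSTER
    return {"clusters": clusters, "topics": topics}


def resolve(config: dict, topic: str) -> str:
    """Return the address of the cluster a topic is registered on."""
    return config["clusters"][config["topics"][topic]]
```

With `kafka_deadletter_cluster` set, the retry and deadletter topics resolve to the same address, distinct from the main one. With it unset, all three resolve to the main address, so moving the retry topic between the two named clusters changes nothing observable.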

See test_legacy_retry_topic_uses_deadletter_cluster for the repro: without the fix, it fails with the verbatim production error.

🤖 Generated with Claude Code

In legacy-config normalization a distinct kafka_retry_topic was
registered on DEFAULT_CLUSTER (the main consumer cluster). But retries
are produced by the upkeep producer, which is the same producer used for
the DLQ and connects to the deadletter topic's cluster
(kafka_producer_config -> kafka_producer_cluster). The same-cluster
validation then compared the retry topic's mis-assigned DEFAULT_CLUSTER
against the deadletter cluster and falsely rejected configs where the
main consumer cluster differs from the deadletter cluster.

This only diverges when kafka_deadletter_cluster is explicitly set; when
it is unset the deadletter cluster falls back to the main address, so the
change is a no-op for those pools. For the ingest-profiles-raw pool the
main consumer is on kafka-profiles while retry+DLQ are co-located on
kafka-small, which surfaced the bug:

  retry target topic 'taskworker-ingest' is on cluster 'kafka-profiles...',
  but deadletter topic 'taskworker-ingest-dlq' is on 'kafka-small...';
  they share a single producer and must be on the same cluster

Register the legacy retry topic on DEADLETTER_CLUSTER so the config model
matches the cluster the producer actually connects to.

ref STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@untitaker untitaker requested a review from a team as a code owner June 2, 2026 15:14
linear-code Bot commented Jun 2, 2026

STREAM-1042

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@untitaker untitaker merged commit ab4bcd2 into main Jun 2, 2026
25 checks passed
@untitaker untitaker deleted the fix/legacy-retry-topic-deadletter-cluster branch June 2, 2026 15:34