Add dynamic num.stream.threads sizing for Kafka Streams apps #119

Open
suresh-prakash wants to merge 2 commits into main from dynamic-stream-threads

@suresh-prakash

Summary

Introduces a per-instance auto-sizing path for num.stream.threads. When an app sets num.stream.threads = "DYNAMIC", the framework derives the thread count from the topology's partition layout and the deployment's replica count, so that every stream task stays active and no instance runs with idle threads or too few threads for its share of tasks.

The new threading package contains:

  • DynamicStreamThreadsCountCalculator: for each sub-topology, takes the maximum partition count among its source topics (one task per partition per sub-topology), sums those maxima across all sub-topologies to get the total task count, and divides by the replica count to get per-instance threads.
  • StreamThreadsCountResolver — bridges topology + AdminClient to the calculator. Falls back to a safe default of 8 threads on any failure (broker outage, missing replica count, AdminClient errors) so the app can still start.
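
The partition-to-threads math described above can be sketched as follows. This is a minimal illustration, not the PR's actual DynamicStreamThreadsCountCalculator: the class name `ThreadCountSketch`, the method signature, the exception type, and the choice to round up are all assumptions. The absent-topic, non-positive-replica, and zero-task behaviors follow the test plan below.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch; names and signature are not taken from the PR.
final class ThreadCountSketch {
  private ThreadCountSketch() {}

  /**
   * @param subTopologySourceTopics source topic names, grouped by sub-topology
   * @param partitionCounts partition count per topic; absent topics count as zero
   * @param replicas number of app instances; must be positive
   */
  static int threadsPerInstance(
      List<List<String>> subTopologySourceTopics,
      Map<String, Integer> partitionCounts,
      int replicas) {
    if (replicas <= 0) {
      // Assumption: an unchecked exception, so a caller can catch RuntimeException.
      throw new IllegalArgumentException("replicas must be positive, got " + replicas);
    }
    int totalTasks = 0;
    for (List<String> sourceTopics : subTopologySourceTopics) {
      // Task count of a sub-topology = max partition count among its source topics.
      int maxPartitions = 0;
      for (String topic : sourceTopics) {
        maxPartitions = Math.max(maxPartitions, partitionCounts.getOrDefault(topic, 0));
      }
      totalTasks += maxPartitions;
    }
    // Assumption: round up so every task has a thread; never return fewer than 1.
    int perInstance = (totalTasks + replicas - 1) / replicas;
    return Math.max(1, perInstance);
  }
}
```

For example, with source topics a (12 partitions) and b (6) in one sub-topology and c (4) in another, the sketch counts 12 + 4 = 16 tasks, which at 4 replicas gives 4 threads per instance.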

KafkaStreamsApp exposes a new overrideable hook getStreamThreadsCountResolver() returning Optional.empty() by default. Apps that want auto-sizing override it; apps that don't are completely unaffected.
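
The opt-in shape of the hook can be illustrated with stand-in types. Everything in this sketch except the method name getStreamThreadsCountResolver() and its Optional.empty() default is hypothetical: `AppStandIn` and `ResolverStandIn` stand in for KafkaStreamsApp and StreamThreadsCountResolver, whose real constructors and fields may differ, and the replica-count supplier is an assumed way of passing the replica count in.

```java
import java.util.Optional;
import java.util.function.IntSupplier;

// Stand-in for the PR's resolver; the real class likely takes other arguments.
final class ResolverStandIn {
  final IntSupplier replicaCountSource;

  ResolverStandIn(IntSupplier replicaCountSource) {
    this.replicaCountSource = replicaCountSource;
  }
}

// Stand-in for KafkaStreamsApp: the hook is empty unless a subclass overrides it.
abstract class AppStandIn {
  public Optional<ResolverStandIn> getStreamThreadsCountResolver() {
    return Optional.empty();
  }
}

// An app that does not override the hook keeps the default (no resolver).
final class OptedOutApp extends AppStandIn {}

// An app that opts in supplies its own replica-count source.
final class OptedInApp extends AppStandIn {
  private final IntSupplier replicaCountSource;

  OptedInApp(IntSupplier replicaCountSource) {
    this.replicaCountSource = replicaCountSource;
  }

  @Override
  public Optional<ResolverStandIn> getStreamThreadsCountResolver() {
    return Optional.of(new ResolverStandIn(replicaCountSource));
  }
}
```

Returning Optional.empty() by default means an existing subclass compiles and behaves as before without any code change.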

Why

Currently every Kafka Streams app hardcodes num.stream.threads per cluster in its Helm values. As topics gain partitions or replica counts change, the hardcoded value drifts from the optimum, leaving either idle threads or under-provisioned consumption. This change lets ops set DYNAMIC once and have the thread count computed at startup from actual broker state.

Safety

This is a library change consumed by every Hypertrace Kafka Streams app, so the new flow is engineered to be invisible to apps that don't opt in:

  • StreamThreadsCountResolver.isDynamic(...) short-circuits on the very first check unless the configured value is literally the string "DYNAMIC". Numeric configs (the existing default) take the same code path as before.
  • The entire dynamic-resolution call site in doInit() is wrapped in try { ... } catch (Throwable t) { log; }. Any unexpected failure — resolver bug, AdminClient timeout, classloader issue, NPE — is swallowed and the streams config is left untouched, so KafkaStreams starts with whatever value was originally configured.
  • The resolver itself catches RuntimeException from the calculator and falls back to FALLBACK_NUM_STREAM_THREADS = 8.
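
The three guards listed above can be sketched together as follows. This is an illustrative reconstruction under assumptions, not the PR's code: the class `SafetySketch`, the use of a plain Map for the streams config, and the IntSupplier standing in for the calculator are hypothetical. Only the "DYNAMIC" sentinel, the fallback value of 8, and the catch clauses come from the description.

```java
import java.util.Map;
import java.util.function.IntSupplier;

// Hypothetical sketch of the layered guards; not the PR's actual classes.
final class SafetySketch {
  static final String NUM_STREAM_THREADS = "num.stream.threads";
  static final String DYNAMIC = "DYNAMIC";
  static final int FALLBACK_NUM_STREAM_THREADS = 8;

  private SafetySketch() {}

  // Guard 1: only the literal sentinel opts in; numeric or absent values do not.
  static boolean isDynamic(Map<String, Object> streamsConfig) {
    return DYNAMIC.equals(streamsConfig.get(NUM_STREAM_THREADS));
  }

  // Guard 3: a failing calculator yields the fallback instead of an exception.
  static int resolve(IntSupplier calculator) {
    try {
      return calculator.getAsInt();
    } catch (RuntimeException e) {
      return FALLBACK_NUM_STREAM_THREADS;
    }
  }

  // Guard 2: the whole call site is wrapped, so any failure leaves the config as it was.
  static void applyDynamicThreads(Map<String, Object> streamsConfig, IntSupplier calculator) {
    try {
      if (!isDynamic(streamsConfig)) {
        return; // non-DYNAMIC apps never reach the calculator
      }
      streamsConfig.put(NUM_STREAM_THREADS, resolve(calculator));
    } catch (Throwable t) {
      System.err.println("Dynamic thread sizing failed; config left untouched: " + t);
    }
  }
}
```

In this sketch a numeric config never invokes the calculator, which is the property reviewers are asked to validate for non-DYNAMIC apps.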

Other changes

  • Fixes a typo'd import Optional; (missing java.util.) in KafkaStreamsApp.java.
  • Adds mockito-junit-jupiter:5.2.0 as a test dep so @ExtendWith(MockitoExtension.class) resolves; matches the existing mockito-core:5.2.0 version.

Test plan

  • Unit test: single sub-topology partition→threads math
  • Unit test: multi sub-topology summation
  • Unit test: absent topic counts as zero partitions
  • Unit test: zero/negative replicas throws (calculator) and falls back (resolver)
  • Unit test: total tasks zero returns 1
  • Unit test: resolver falls back to 8 on calculator throw
  • Unit test: resolver falls back on missing/zero/negative replica count
  • Unit test: resolver delegates to calculator with configured replicas
  • Unit test: isDynamic true on sentinel, false on numeric/absent
  • All 12 tests pass on JDK 11 (CI's target)
  • Reviewer to validate the safety guarantee for non-DYNAMIC apps (no behavior change expected)

Introduce StreamThreadsCountResolver and DynamicStreamThreadsCountCalculator
under a new `threading` package, and wire them into KafkaStreamsApp via
an overrideable getStreamThreadsCountResolver() hook.

When num.stream.threads is set to the sentinel "DYNAMIC", the framework
sums the maximum partition count across each sub-topology's source topics
and divides by the configured replica count, producing a per-instance
thread count that keeps every task active without idle threads.

Apps that don't opt in are unaffected: isDynamic() short-circuits unless
the value is literally "DYNAMIC", and the resolution path is wrapped in
try/catch so any unexpected failure logs and leaves the streams config
untouched. Bad broker calls or absent topics fall back to a safe default
of 8 threads instead of preventing startup.

Also fixes a typo'd `import Optional;` left from an earlier refactor.

Tests cover the partition math, multi-subtopology summation, absent-topic
handling, replica-count edge cases, sentinel detection, and the resolver's
fallback behavior.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@github-actions

github-actions Bot commented Jun 1, 2026

Test Results

 7 files   -  8   7 suites   - 8   14s ⏱️ -15s
28 tests  - 40  27 ✅  - 41  0 💤 ±0  1 ❌ +1 
28 runs   - 58  27 ✅  - 59  0 💤 ±0  1 ❌ +1 

For more details on these failures, see this check.

Results for commit 9187236. ± Comparison against base commit 0aa8c44.

This pull request removes 41 and adds 1 tests. Note that renamed tests count towards both.
org.hypertrace.core.kafkastreams.framework.ConsolidatedServiceTest ‑ checkDataFlowsThroughBothSubTopologies()
org.hypertrace.core.kafkastreams.framework.SampleAppTest ‑ baseStreamsConfigTest()
org.hypertrace.core.kafkastreams.framework.SampleAppTest ‑ shouldIncludeValueWithLengthGreaterThanFive()
org.hypertrace.core.kafkastreams.framework.SampleAsyncAppTest ‑ asyncTransformationTest()
org.hypertrace.core.kafkastreams.framework.SampleAsyncAppTest ‑ baseStreamsConfigTest()
org.hypertrace.core.kafkastreams.framework.SampleAsyncAppTest ‑ testUnderlyingReturnsNull()
org.hypertrace.core.kafkastreams.framework.exceptionhandlers.IgnoreProductionExceptionHandlerTest ‑ continueWithConfiguredException()
org.hypertrace.core.kafkastreams.framework.exceptionhandlers.IgnoreProductionExceptionHandlerTest ‑ continueWithConfiguredMultipleExceptions()
org.hypertrace.core.kafkastreams.framework.exceptionhandlers.IgnoreProductionExceptionHandlerTest ‑ failWithConfiguredException()
org.hypertrace.core.kafkastreams.framework.exceptionhandlers.IgnoreProductionExceptionHandlerTest ‑ failWithConfiguredMultipleExceptions()
…
Gradle Test Executor 1 ‑ failed to execute tests

♻️ This comment has been updated with latest results.

@suresh-prakash suresh-prakash marked this pull request as ready for review June 1, 2026 06:29
@suresh-prakash suresh-prakash requested a review from a team as a code owner June 1, 2026 06:29
Adds a concrete code snippet showing how a consumer app wires its own
replica-count source through StreamThreadsCountResolver, so adopters
don't have to read the calculator/resolver internals to integrate.
Also clarifies that apps not overriding the hook are unaffected.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>