Skip to content

[GSoC 2026] Kafka Streams runner — testing infra: test runner + Create + first @ValidatesRunner #39192

Description

@junaiddshaukat

Tracking issue: #18479
Follows: #39141 (GroupByKey, merged)

Summary

Build out the testing infrastructure so the runner becomes self-validating instead of relying on hand-rolled TopologyTestDriver tests + manual review. The GroupByKey output-timestamp bug (#39141) is a good example of what an automated @ValidatesRunner suite would have caught. Three parts, agreed with @je-ik; each can land as its own PR.

Scope

a) KafkaStreamsTestRunner

Extract the boilerplate every translation test repeats — PipelineTranslation.toProto
-> KafkaStreamsPipelineTranslator.translate -> TopologyTestDriver — into a single
reusable test helper/runner that takes a Pipeline, builds the topology, and drives
it (including the repartition-topic round-trip GroupByKeyTest does by hand). Cuts
the per-test churn and is the foundation for the @ValidatesRunner harness.

b) Support Create

Enable Create.of(...) so tests can build inputs. Create.Values can expand either
via Impulse + ParDo (which we already support) or via Read(CreateSource) (which
needs SplittableParDo, not yet supported). First step: confirm which expansion
the portable pipeline uses and, if it's the Read/SDF path, force the
Impulse-based expansion. Likely little new translation code — mostly wiring +
a test.

c) First @ValidatesRunner test

Wire the runner into Beam's ValidatesRunner gradle config (a validatesRunner
configuration + createValidatesRunnerTask + includeCategories
'org.apache.beam.sdk.testing.ValidatesRunner', modelled on Flink's
flink_runner.gradle), and get the first test green — Create -> ParDo / GroupByKey
-> PAssert, on a bounded/batch pipeline. Use a sickbay list to exclude categories
we do not support yet (windowing, timers, state, streaming), so the suite is
green from day one and grows as transforms land.

Why this matters

@ValidatesRunner turns "does it work?" from a manual judgment into a passing
suite. Once it's green, every transform added afterwards is automatically checked
against the Beam model.

Out of scope

  • A full @ValidatesRunner pass — most categories need windowing / timers / state
    and are sickbayed for now.
  • Streaming @ValidatesRunner — needs the real (multi-partition, event-time)
    watermark propagation, tracked separately.

Testing

The first @ValidatesRunner test green + ./gradlew :runners:kafka-streams:check.

Metadata

Metadata

Labels

No labels
No labels

Type

No type

Fields

No fields configured for issues without a type.

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions