KAFKA-8410: KTableProcessor migration groundwork by vvcephei · Pull Request #10744 · apache/kafka

vvcephei · 2021-05-21T21:04:51Z

Lay the groundwork for migrating KTable Processors to the new PAPI.
Migrate the KTableFilter processor to prove that the groundwork works.

This is an effort to help break up #10507 into multiple PRs.

Committer Checklist (excluded from commit message)

Verify design and implementation
Verify test coverage and CI build status
Verify documentation (including upgrade notes)

vvcephei · 2021-05-21T21:05:43Z

Hey, @jeqo , can you give this a good, hard look, since you've built up so much context on #10507?

vvcephei · 2021-05-21T21:07:06Z

-class KTableFilter<K, V> implements KTableProcessorSupplier<K, V, V> {
-    private final KTableImpl<K, ?, V> parent;
-    private final Predicate<? super K, ? super V> predicate;
+class KTableFilter<KIn, VIn> implements KTableNewProcessorSupplier<KIn, VIn, KIn, VIn> {


This is the only processor we migrate here. The point is to use this processor to make sure that the groundwork in the rest of these changes is sufficient.

vvcephei · 2021-05-21T21:08:58Z

+        public void init(final org.apache.kafka.streams.processor.ProcessorContext context) {
+            // This is the old processor context for compatibility with the other KTable processors.
+            // Once we migrate them all, we can swap this out.


This particular interface was too much trouble to migrate now, and it's not terribly significant, since the value getter never forwards.

vvcephei · 2021-05-21T21:09:37Z

+    // Temporarily setting the processorSupplier to type Object so that we can transition from the
+    // old ProcessorSupplier to the new api.ProcessorSupplier. This works because all accesses to
+    // this field are guarded by typechecks anyway.
+    private final Object processorSupplier;


Calling this out as well. Hopefully the comment itself is clear.
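The pattern that comment describes can be sketched in a self-contained way. This assumes two hypothetical stand-in interfaces, `OldTableSupplier` and `NewTableSupplier`, which are much simpler than the real Kafka Streams supplier types. The field is declared as `Object`, and every access checks the runtime type before casting, so either supplier generation can be stored during the transition.

```java
import java.util.function.Function;

class SupplierDispatchDemo {
    public static void main(final String[] args) {
        // A lambda can target either stand-in interface; the cast picks which one.
        final TableNode<String, Integer> migrated =
            new TableNode<>((NewTableSupplier<String, Integer>) () -> String::length);
        final TableNode<String, Integer> legacy =
            new TableNode<>((OldTableSupplier<String, Integer>) () -> s -> -1);
        System.out.println(migrated.valueGetter().apply("abc")); // prints 3
        System.out.println(legacy.valueGetter().apply("abc"));   // prints -1
    }
}

// Hypothetical stand-in for the old KTableProcessorSupplier.
interface OldTableSupplier<K, V> {
    Function<K, V> view();
}

// Hypothetical stand-in for the new KTableNewProcessorSupplier.
interface NewTableSupplier<K, V> {
    Function<K, V> view();
}

class TableNode<K, V> {
    // Typed as Object so that either supplier generation can be stored.
    private final Object processorSupplier;

    TableNode(final Object processorSupplier) {
        this.processorSupplier = processorSupplier;
    }

    @SuppressWarnings("unchecked")
    Function<K, V> valueGetter() {
        // Every access is guarded by an instanceof check before the cast.
        if (processorSupplier instanceof OldTableSupplier) {
            return ((OldTableSupplier<K, V>) processorSupplier).view();
        } else if (processorSupplier instanceof NewTableSupplier) {
            return ((NewTableSupplier<K, V>) processorSupplier).view();
        } else {
            throw new IllegalStateException("Unexpected supplier type: " + processorSupplier);
        }
    }
}
```

Adding a new supplier generation then only means adding one more `instanceof` branch at each access site, which is what the following review comments point out.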

vvcephei · 2021-05-21T21:10:17Z

+        } else if (processorSupplier instanceof KTableNewProcessorSupplier) {
+            return ((KTableNewProcessorSupplier<?, ?, K, V>) processorSupplier).view();


We have to add a new typecheck for the new supplier type.

vvcephei · 2021-05-21T21:10:22Z

+            } else if (processorSupplier instanceof KTableNewProcessorSupplier) {
+                final KTableNewProcessorSupplier<?, ?, ?, ?> tableProcessorSupplier =
+                    (KTableNewProcessorSupplier<?, ?, ?, ?>) processorSupplier;
+                if (!tableProcessorSupplier.enableSendingOldValues(forceMaterialization)) {
+                    return false;
+                }


We have to add a new typecheck for the new supplier type.

vvcephei · 2021-05-21T21:25:08Z

 import static org.apache.kafka.streams.kstream.internals.WrappingNullableUtils.prepareValueSerializer;

-public class SinkNode<KIn, VIn, KOut, VOut> extends ProcessorNode<KIn, VIn, KOut, VOut> {
+public class SinkNode<KIn, VIn> extends ProcessorNode<KIn, VIn, Void, Void> {


Here's where we declare the sink node cannot forward and hence only needs input parameters.

That brings up a good question: do we need to override getChild and getChildren in SinkNode to throw, since they should never be called?

We already throw an exception if you try and add a child. I think it would complicate any of our processor graph traversal algorithms if we made it illegal to even call getChildren, as they would have to type-check the nodes before traversing.

vvcephei · 2021-05-21T21:25:33Z

 import static org.apache.kafka.streams.kstream.internals.WrappingNullableUtils.prepareValueDeserializer;

-public class SourceNode<KIn, VIn, KOut, VOut> extends ProcessorNode<KIn, VIn, KOut, VOut> {
+public class SourceNode<KIn, VIn> extends ProcessorNode<KIn, VIn, KIn, VIn> {


Here, we declare that the source node can only forward the same type it receives.
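The two declarations above can be illustrated with a heavily simplified sketch. The `Node`, `Source`, and `Sink` classes here are hypothetical stand-ins, not the real `ProcessorNode` hierarchy. A source's output types equal its input types, a sink's output types are `Void`, and the sink throws from `addChild` but keeps the inherited `getChildren`, so a generic traversal can walk every node without type-checking it first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class NodeTypesDemo {
    // Generic traversal: no need to check whether a node is a sink first.
    static int countNodes(final Node<?, ?, ?, ?> node) {
        int count = 1;
        for (final Node<?, ?, ?, ?> child : node.getChildren()) {
            count += countNodes(child);
        }
        return count;
    }

    public static void main(final String[] args) {
        final Source<String, String> source = new Source<>("source");
        final Sink<String, String> sink = new Sink<>("sink");
        source.addChild(sink); // compiles: the sink's input types match the source's output types
        System.out.println(countNodes(source)); // prints 2
    }
}

// Hypothetical, simplified stand-in for ProcessorNode<KIn, VIn, KOut, VOut>.
class Node<KIn, VIn, KOut, VOut> {
    final String name;
    private final List<Node<KOut, VOut, ?, ?>> children = new ArrayList<>();

    Node(final String name) {
        this.name = name;
    }

    void addChild(final Node<KOut, VOut, ?, ?> child) {
        children.add(child);
    }

    List<Node<KOut, VOut, ?, ?>> getChildren() {
        return Collections.unmodifiableList(children);
    }
}

// Like SourceNode<KIn, VIn>: forwards exactly the types it receives.
class Source<KIn, VIn> extends Node<KIn, VIn, KIn, VIn> {
    Source(final String name) {
        super(name);
    }
}

// Like SinkNode<KIn, VIn>: never forwards, so both output types are Void.
class Sink<KIn, VIn> extends Node<KIn, VIn, Void, Void> {
    Sink(final String name) {
        super(name);
    }

    @Override
    void addChild(final Node<Void, Void, ?, ?> child) {
        throw new UnsupportedOperationException("Sink " + name + " cannot have children");
    }
    // getChildren() is deliberately not overridden: for a sink it returns an empty list.
}
```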

vvcephei · 2021-05-21T21:26:19Z

-                    rawOldValue != null ? serdes.valueFrom(rawOldValue) : null,
-                    timestamp
-                ),
+                new CacheFlushListener<byte[], byte[]>() {


Since the interface has two methods, it can't be a lambda anymore.
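Why a lambda no longer works can be shown with a minimal sketch. It assumes a hypothetical `FlushListener` interface with an old field-based `apply` and a new record-based `apply` (with a hypothetical `FlushedRecord` standing in for `Record<K, Change<V>>`). An interface with two abstract methods is not a functional interface, so it has to be implemented with an anonymous class.

```java
import java.util.ArrayList;
import java.util.List;

class FlushListenerDemo {
    // Builds a listener that records each flush as "key:old->new@timestamp".
    static FlushListener<String, String> recordingListener(final List<String> log) {
        // Two abstract methods, so this must be an anonymous class, not a lambda:
        // FlushListener<String, String> l = (k, n, o, t) -> log.add(k); // does not compile
        return new FlushListener<String, String>() {
            @Override
            public void apply(final String key, final String newValue, final String oldValue, final long timestamp) {
                log.add(key + ":" + oldValue + "->" + newValue + "@" + timestamp);
            }

            @Override
            public void apply(final FlushedRecord<String, String> record) {
                // Delegate the record-based overload to the field-based one.
                apply(record.key, record.newValue, record.oldValue, record.timestamp);
            }
        };
    }

    public static void main(final String[] args) {
        final List<String> log = new ArrayList<>();
        final FlushListener<String, String> listener = recordingListener(log);
        listener.apply("a", "1", null, 5L);
        listener.apply(new FlushedRecord<>("b", "2", "1", 6L));
        System.out.println(log); // prints [a:null->1@5, b:1->2@6]
    }
}

// Hypothetical stand-in for Record<K, Change<V>>: key, new and old value, timestamp.
final class FlushedRecord<K, V> {
    final K key;
    final V newValue;
    final V oldValue;
    final long timestamp;

    FlushedRecord(final K key, final V newValue, final V oldValue, final long timestamp) {
        this.key = key;
        this.newValue = newValue;
        this.oldValue = oldValue;
        this.timestamp = timestamp;
    }
}

// Hypothetical listener with two abstract methods (old and new style).
interface FlushListener<K, V> {
    void apply(K key, V newValue, V oldValue, long timestamp);

    void apply(FlushedRecord<K, V> record);
}
```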

vvcephei · 2021-05-21T21:27:42Z

        replay(context);

-        new TimestampedCacheFlushListener<>(context).apply(
+        new TimestampedCacheFlushListener<>((ProcessorContext<String, Change<String>>) context).apply(


Lines like this are needed because we have to cast to choose between the two constructors. Since the context is an InternalProcessorContext (IPC), it implements both interfaces, so it doesn't matter which one we cast to.
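The ambiguity can be reproduced in plain Java. This sketch assumes hypothetical `OldContext` and `NewContext` interfaces, an `InternalContext` class that implements both (as an IPC does), and a `Listener` class with one constructor per interface. Passing the context unmodified matches both constructors and neither is more specific, so the call is rejected as ambiguous; a cast selects one.

```java
class OverloadCastDemo {
    public static void main(final String[] args) {
        final InternalContext<String, String> context = new InternalContext<>();
        // new Listener(context); // does not compile: reference to Listener is ambiguous
        System.out.println(new Listener((NewContext<String, String>) context).via); // prints new:ipc
        System.out.println(new Listener((OldContext) context).via);                 // prints old:ipc
    }
}

// Hypothetical stand-ins for the old and new ProcessorContext interfaces.
interface OldContext {
    String name();
}

interface NewContext<KForward, VForward> {
    String name();
}

// Like an InternalProcessorContext: implements both interfaces at once.
class InternalContext<K, V> implements OldContext, NewContext<K, V> {
    @Override
    public String name() {
        return "ipc";
    }
}

// Hypothetical listener with one constructor per context type.
class Listener {
    final String via;

    Listener(final OldContext context) {
        this.via = "old:" + context.name();
    }

    Listener(final NewContext<?, ?> context) {
        this.via = "new:" + context.name();
    }
}
```

Because both constructors accept the same object, either cast compiles; the choice only determines which constructor runs.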

vvcephei · 2021-05-21T21:28:43Z

-        final InternalMockProcessorContext context = new InternalMockProcessorContext(streamsMetrics);
-        final ProcessorNode<Object, Object, ?, ?> node = new ProcessorNode<>("name", new NoOpProcessor(), Collections.<String>emptySet());
+        final InternalMockProcessorContext<Object, Object> context = new InternalMockProcessorContext<>(streamsMetrics);
+        final ProcessorNode<Object, Object, Object, Object> node = new ProcessorNode<>("name", new NoOpProcessor(), Collections.<String>emptySet());


For complicated java-type-system reasons, I had to switch from wildcards to Object in some places.
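The thread doesn't spell out the exact constraint, so the following shows only one common reason of this kind, using a hypothetical simplified `Node` type. When the output parameters are wildcards, they are captured as unknown types, so a method that consumes those parameters (here `addChild`) can't be called with any concrete child. Spelling them out as `Object` removes that restriction.

```java
import java.util.ArrayList;
import java.util.List;

class WildcardVsObjectDemo {
    public static void main(final String[] args) {
        final Node<Object, Object, ?, ?> wildcardNode = new Node<>();
        // wildcardNode.addChild(new Node<Object, Object, Object, Object>());
        // ^ does not compile: KOut/VOut are captured wildcards, so no concrete Node matches them.

        final Node<Object, Object, Object, Object> objectNode = new Node<>();
        objectNode.addChild(new Node<Object, Object, Object, Object>()); // compiles

        System.out.println(objectNode.childCount());   // prints 1
        System.out.println(wildcardNode.childCount()); // prints 0
    }
}

// Hypothetical, simplified node with input (KIn, VIn) and output (KOut, VOut) type parameters.
class Node<KIn, VIn, KOut, VOut> {
    private final List<Node<KOut, VOut, ?, ?>> children = new ArrayList<>();

    void addChild(final Node<KOut, VOut, ?, ?> child) {
        children.add(child);
    }

    int childCount() {
        return children.size();
    }
}
```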

vvcephei · 2021-05-24T15:23:50Z

Filed ticket for Connect test: https://issues.apache.org/jira/browse/KAFKA-12842
Commented on ticket for Raft test: https://issues.apache.org/jira/browse/KAFKA-12629

abbccdda · 2021-05-27T16:06:31Z

    private static final String SINK_NAME = "KTABLE-SINK-";

-    private final ProcessorSupplier<?, ?> processorSupplier;
+    // Temporarily setting the processorSupplier to type Object so that we can transition from the


s/transition/transit

abbccdda · 2021-05-27T16:14:56Z

+
+import org.apache.kafka.streams.processor.api.ProcessorSupplier;
+
+public interface KTableNewProcessorSupplier<KIn, VIn, KOut, VOut> extends ProcessorSupplier<KIn, Change<VIn>, KOut, Change<VOut>> {


Could we add a TODO to the old interface as a reminder of the potential removal work? Or do we already have tickets for it?

abbccdda · 2021-05-27T16:16:38Z

+                      final VOut oldValue,
                      final long timestamp) {
-        final ProcessorNode prev = context.currentNode();
+        @SuppressWarnings("rawtypes") final ProcessorNode prev = context.currentNode();


Why do we put the suppression inline instead of at the top of the function?
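The thread doesn't record an answer, but the difference between the two placements is a matter of scope, shown in this small sketch with hypothetical method names. A method-level `@SuppressWarnings("rawtypes")` silences that warning for the whole body, including raw types added later, while an annotation on the local variable declaration silences it only for that one declaration. A common rationale for the inline form is to keep the suppression as narrow as possible.

```java
import java.util.Arrays;
import java.util.List;

class SuppressionScopeDemo {
    // Method-level: silences "rawtypes" warnings for the whole method body,
    // including any raw type someone adds to this method later.
    @SuppressWarnings("rawtypes")
    static int methodLevel(final List<String> input) {
        final List raw = input;
        return raw.size();
    }

    // Declaration-level: silences "rawtypes" only for this one local variable.
    static int declarationLevel(final List<String> input) {
        @SuppressWarnings("rawtypes") final List raw = input;
        return raw.size();
    }

    public static void main(final String[] args) {
        final List<String> input = Arrays.asList("a", "b");
        System.out.println(methodLevel(input) + " " + declarationLevel(input)); // prints 2 2
    }
}
```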

abbccdda · 2021-05-27T16:22:47Z


+    @Override
+    public String toString() {
+        return "To{" +


nit: could we use a format string here so it's easier to read?

I just didn't bother because there's no place it would actually be printed unless a test is failing. We can give more thought to the string format later on as needed.
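For comparison, here are the two styles side by side. The class and its fields (`childName`, `timestamp`) are a hypothetical stand-in; the diff above shows only the first line of the real `toString`. Both methods produce the same text. The format-string version keeps the output shape visible in a single literal, which is what the nit asks for.

```java
class ToStringDemo {
    public static void main(final String[] args) {
        final ChildTarget target = new ChildTarget("child-1", 42L);
        System.out.println(target.toStringConcat()); // prints To{childName='child-1', timestamp=42}
        System.out.println(target);                  // prints To{childName='child-1', timestamp=42}
    }
}

// Hypothetical stand-in; the field names are assumed for illustration.
class ChildTarget {
    private final String childName;
    private final long timestamp;

    ChildTarget(final String childName, final long timestamp) {
        this.childName = childName;
        this.timestamp = timestamp;
    }

    // Concatenation style, similar in shape to the diff under review.
    String toStringConcat() {
        return "To{" +
            "childName='" + childName + '\'' +
            ", timestamp=" + timestamp +
            '}';
    }

    // Format-string style suggested in the nit.
    @Override
    public String toString() {
        return String.format("To{childName='%s', timestamp=%d}", childName, timestamp);
    }
}
```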

abbccdda · 2021-05-27T16:26:10Z

 import static org.apache.kafka.streams.kstream.internals.WrappingNullableUtils.prepareValueSerializer;

-public class SinkNode<KIn, VIn, KOut, VOut> extends ProcessorNode<KIn, VIn, KOut, VOut> {
+public class SinkNode<KIn, VIn> extends ProcessorNode<KIn, VIn, Void, Void> {


That brings up a good question: do we need to override getChild and getChildren in SinkNode to throw, since they should never be called?

abbccdda · 2021-05-27T16:28:05Z

+    /**
+     * Called when records are flushed from the {@link ThreadCache}
+     */
+    void apply(final Record<K, Change<V>> record);


Do we want to deprecate the old apply method and use the new one? If so, could we rename one of them to applyOld or applyNew to differentiate them?

Additionally, we should add a parameter description for record in the Javadoc.

I don't want to deprecate it right now, but as with the rest of the "compatibility mode" changes, the old member should become unused by the time @jeqo is done and we can remove it at that time.


John Roesler added 5 commits May 21, 2021 13:19

wip (8707d5c)
builds (ddd0f5d)
fix processorSupplier (06640c7)
fixing tests (8264273)
fix tests (b4f09ab)

vvcephei added the streams label May 21, 2021

vvcephei commented May 21, 2021


vvcephei self-assigned this May 25, 2021

abbccdda approved these changes May 27, 2021


Update streams/src/main/java/org/apache/kafka/streams/kstream/interna…ls/KTableNewProcessorSupplier.java (599f185)

vvcephei merged commit f207bac into trunk May 28, 2021

vvcephei deleted the poc-478-ktable-1 branch May 28, 2021 19:59

vvcephei mentioned this pull request May 28, 2021

KAFKA-8410: Migrating stateful operators to new Processor API #10507

Closed

