Conversation
77cf530 to
76f267f
Compare
rodesai
approved these changes
Sep 11, 2024
Contributor
rodesai
left a comment
There was a problem hiding this comment.
Mostly LGTM, one structural comment inline.
| public Integer metadataTablePartition(final int kafkaPartition) { | ||
| return kafkaPartition; | ||
| } | ||
| boolean belongs(final Bytes key, final int kafkaPartition); |
Contributor
There was a problem hiding this comment.
seems cleaner to just have a method that returns the partition for a key, or the number of partitions so then we don't have a bunch of impls that throw unsupportedoperationexception than to have those impls understand the context of why it's being called and throw accordingly.
Contributor
Author
There was a problem hiding this comment.
discussed offline, the exceptions are less about the API and more a shortcut in order to avoid needing to pass in the numPartitions to all the other implementations. eventually it would make sense to implement this method in those as well.
rodesai
added a commit
that referenced
this pull request
Sep 15, 2024
This reverts commit 8e9b0d8 which removed the partition from the mongo value schema and used client-side filtering that computed the partition from the mongo key. It turns out that this approach won't actually work because the mongo key may not be the original record key that was used to compute the changelog partition. We therefore cannot use the mongo key to compute the partition.: - some dsl operators include a timestamp in the key - a user writing their own PAPI processor is free to construct their own key, which we cannot predict.
rodesai
added a commit
that referenced
this pull request
Sep 20, 2024
This reverts commit 8e9b0d8 which removed the partition from the mongo value schema and used client-side filtering that computed the partition from the mongo key. It turns out that this approach won't actually work because the mongo key may not be the original record key that was used to compute the changelog partition. We therefore cannot use the mongo key to compute the partition.: - some dsl operators include a timestamp in the key - a user writing their own PAPI processor is free to construct their own key, which we cannot predict.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Turns out this was a bit more of a refactoring nightmare than I wanted it to be! Strategy was to pass in alongside the
TablePartitionera method that allows us to determine whether or not a key belongs to a Kafka partition. We can consider refactoring that to return a specific partition instead of whether or not a key "belongs" in a given one, but this makes it simpler to bypass the method in cases where it doesn't need to be implemented.90% of this PR is just adding generics all over the place to somewhere that were necessary to get the
TablePartitionerpiped into the tables instead of being created in the flush manager