fix(airbyte-cdk): Fix RequestOptionsProvider for PerPartitionWithGlobalCursor#254
Conversation
📝 WalkthroughWalkthroughThe pull request modifies the Changes
Possibly related PRs
Suggested reviewers
Hey there! 👋 I noticed the changes look like they're streamlining the stream slicer type checking. Curious about the motivation behind removing the 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (4)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2377-2377: Consider adding a comment explaining the condition change.The condition was simplified to only check for
DatetimeBasedCursor. Would it be helpful to add a comment explaining whyPerPartitionWithGlobalCursorcheck was removed and how it affects therequest_options_providerassignment? This would help future maintainers understand the reasoning behind this change, wdyt?+ # We only need to check for DatetimeBasedCursor as it has its own request_options_provider. + # All other stream slicers, including PerPartitionWithGlobalCursor, can act as request_options_provider. if ( not isinstance(stream_slicer, DatetimeBasedCursor) or type(stream_slicer) is not DatetimeBasedCursor ):
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
2377-2379: 🛠️ Refactor suggestionConsider simplifying the type check condition.
The current condition uses both
isinstanceandtypechecks with anoroperator. This might be redundant and potentially confusing. Would it be clearer to use justisinstance? The currentorcondition means the block executes if either check passes, which might not be the intended behavior, wdyt?- if ( - not isinstance(stream_slicer, DatetimeBasedCursor) - or type(stream_slicer) is not DatetimeBasedCursor - ): + if not isinstance(stream_slicer, DatetimeBasedCursor):Likely invalid or redundant comment.
Line range hint
2380-2385: Consider documenting the breaking change.The comment indicates that custom components implementing
DatetimeBasedCursorwill need to implement their ownRequestOptionsProvider. This is a breaking change that might affect users. Would it be helpful to:
- Add a migration guide or documentation?
- Expand the comment to provide an example of how to implement a custom
RequestOptionsProvider?- Add a deprecation warning for the old behavior?
Let's check for any custom components that might be affected:
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (2)
356-362: Consider adding error handling for partition key validation?The new
create_cursor_for_partitionmethod nicely centralizes the cursor creation logic! However, should we add validation to ensure the partition key is not empty or None? This could prevent potential issues down the line, wdyt? 🤔def create_cursor_for_partition(self, partition_key: str) -> None: + if not partition_key: + raise ValueError("Partition key cannot be empty") partition_state = ( self._state_to_migrate_from if self._state_to_migrate_from else self._NO_CURSOR_STATE ) cursor = self._create_cursor(partition_state) self._cursor_per_partition[partition_key] = cursor
225-226: Extract common partition check into a helper method?I notice we have the same partition check in all request-related methods. What do you think about extracting this into a helper method to make the code more DRY? Something like this perhaps? 🎯
+ def _ensure_partition_cursor_exists(self, partition: Mapping[str, Any]) -> None: + partition_key = self._to_partition_key(partition) + if partition_key not in self._cursor_per_partition: + self.create_cursor_for_partition(partition_key) def get_request_params(...): if stream_slice: - if self._to_partition_key(stream_slice.partition) not in self._cursor_per_partition: - self.create_cursor_for_partition(self._to_partition_key(stream_slice.partition)) + self._ensure_partition_cursor_exists(stream_slice.partition)Also applies to: 249-250, 273-274, 297-298
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py(5 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Analyze (python)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (1)
352-362: Great refactoring! 🎉The centralization of cursor creation logic and consistent handling across all methods makes the code more maintainable and robust. The changes look good to me!
Maxime Carbonneau-Leclerc (maxi297)
left a comment
There was a problem hiding this comment.
Pre-emptively approve
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (4)
unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py (2)
2329-2602: Consider refactoring manifest definitions to reduce duplicationThe
SUBSTREAM_REQUEST_OPTIONS_MANIFESTshares many similarities with the existingSUBSTREAM_MANIFEST. Would it be helpful to refactor common components into shared definitions or functions to avoid duplication and improve maintainability? Wdyt?
2605-2875: Enhance test coverage by verifying request parametersIn the
test_incremental_substream_request_options_providerfunction, we're verifying records and state transitions. Should we also include assertions to confirm that thepost_idandcomment_idparameters are correctly passed in the requests? This could help ensure that theRequestOptionsProvideris functioning as expected. Wdyt?airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (2)
225-226: Refactor repeated cursor initialization into a helper methodIn the methods
get_request_params,get_request_headers,get_request_body_data, andget_request_body_json, there's repeated logic for checking and creating cursors for partitions. Would it be beneficial to extract this logic into a helper method to reduce duplication and enhance code clarity? Wdyt?Also applies to: 249-250, 273-274, 297-298
356-380: Consider adding unit tests for_create_cursor_for_partitionSince
_create_cursor_for_partitionplays a crucial role in initializing cursors, would adding unit tests for this method help ensure its correctness and prevent future regressions? Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py(5 hunks)unit_tests/sources/declarative/incremental/test_concurrent_perpartitioncursor.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: Analyze (python)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
🔇 Additional comments (1)
airbyte_cdk/sources/declarative/incremental/per_partition_cursor.py (1)
356-380: Add a comment explaining the temporary nature of_create_cursor_for_partitionThe
_create_cursor_for_partitionmethod is introduced to dynamically create cursors for partitions. Could we add a comment to explain why this method is necessary and mention that it's a temporary workaround until the cursor is decoupled from the concurrent cursor implementation? This would help future maintainers understand its purpose. Wdyt?
Summary by CodeRabbit
New Features
Refactor
Tests