fix: Async Retriever change url path for download retriever #192
Signed-off-by: Artem Inzhyyants <artem.inzhyyants@gmail.com>
Async Retriever change url path for download retriever
📝 Walkthrough
This pull request introduces modifications across multiple files in the Airbyte CDK, focusing on enhancing record processing and stream slice handling. The changes primarily involve updating the
Changes
Sequence Diagram
sequenceDiagram
    participant Factory as ModelToComponentFactory
    participant Retriever as AsyncRetriever
    participant Selector as RecordSelector
    Factory->>Retriever: create_async_retriever()
    Retriever->>Selector: Initialize with transformations
    Selector-->>Retriever: Configured selector
Actionable comments posted: 0
🧹 Nitpick comments (2)
airbyte_cdk/sources/types.py (1)
156-157: Should we consider extending the docstring for clarity?
Currently, the method is straightforward, but it might help future readers if we explained that a StreamSlice is considered truthy whenever its main or extra fields are non-empty. wdyt?
unit_tests/sources/declarative/requesters/test_http_job_repository.py (1)
87-87: Could we safeguard against missing 'url' in extra_fields?
When referencing {{stream_slice.extra_fields['url']}}, a KeyError could occur if 'url' is absent. Would it make sense to provide a default or fail gracefully? wdyt?
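The truthiness rule raised in the first nitpick can be sketched as follows. This is a minimal illustration, not the CDK's StreamSlice: the class name SliceSketch and its attribute names are assumptions made for the example.

```python
from typing import Any, Mapping, Optional


class SliceSketch:
    """Hypothetical stand-in for StreamSlice; not the CDK class itself."""

    def __init__(
        self,
        fields: Mapping[str, Any],
        extra_fields: Optional[Mapping[str, Any]] = None,
    ) -> None:
        self.fields = dict(fields)
        self.extra_fields = dict(extra_fields or {})

    def __bool__(self) -> bool:
        # Truthy when either the main fields or the extra fields hold data.
        return bool(self.fields) or bool(self.extra_fields)


empty = SliceSketch({})
only_extra = SliceSketch({}, extra_fields={"url": "https://example.com/file.csv"})
```

Under this rule a slice that carries nothing but an extra field such as url still passes an `if stream_slice:` check, which is the behavior the docstring would need to spell out.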
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1 hunks)
airbyte_cdk/sources/declarative/requesters/http_job_repository.py (1 hunks)
airbyte_cdk/sources/types.py (1 hunks)
unit_tests/sources/declarative/requesters/test_http_job_repository.py (1 hunks)
🔇 Additional comments (2)
airbyte_cdk/sources/declarative/requesters/http_job_repository.py (1)
192-197: Any concerns about overwriting an existing 'url' in extra_fields?
When merging extra_fields with {"url": url}, the new key unconditionally overrides. If the extra_fields dictionary already contained a url entry, it would be lost. Is this desired, or should we handle it differently? wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2257-2257: Are transformations intended for all download operations?
We are passing transformations=transformations into the RecordSelector for the download retriever. Should we allow users to configure a distinct transformations list exclusively for downloads? wdyt?
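The overwrite concern in the http_job_repository.py comment comes down to how Python merges dictionaries: when the same key appears twice, the later value wins. The sketch below shows that behavior plus one possible guard. The helper name with_url and its strict flag are hypothetical and are not part of the CDK.

```python
from typing import Any, Dict, Mapping


def with_url(extra_fields: Mapping[str, Any], url: str, strict: bool = False) -> Dict[str, Any]:
    """Return a copy of extra_fields with 'url' set (hypothetical helper)."""
    if strict and "url" in extra_fields:
        # Optional guard: refuse to silently replace an existing entry.
        raise ValueError("extra_fields already contains a 'url' entry")
    # In a dict display, later keys override earlier ones with the same name.
    return {**extra_fields, "url": url}


merged = with_url({"url": "old", "job_id": 1}, "new")
```

Whether the unconditional override or the guard is preferable depends on whether any caller is expected to populate url beforehand, which the review leaves as an open question.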
📝 Walkthrough
This pull request introduces modifications across multiple files in the Airbyte CDK, focusing on enhancing record processing and stream slice handling. The changes primarily affect the
Changes
Sequence Diagram
sequenceDiagram
    participant Factory as ModelToComponentFactory
    participant Selector as RecordSelector
    participant Job as AsyncHttpJobRepository
    participant Slice as StreamSlice
    Factory->>Selector: Create with transformations
    Job->>Slice: Construct with job parameters
    Slice-->>Job: Provide context for record fetching
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte_cdk/sources/types.py (1)
155-157: Would you consider expanding the docstring to explain the new boolean evaluation?
Currently, __bool__ returns True if either the main slice or extra_fields is non-empty. It might be helpful to clarify this in the docstring or comment, so future maintainers understand why it’s deemed “truthy” if either portion is present. wdyt?
airbyte_cdk/sources/declarative/requesters/http_job_repository.py (1)
192-197: Would you consider verifying whether the “url” key already exists in job_slice.extra_fields?
Merging "url" into extra_fields might silently overwrite an existing entry of the same name. Checking this in advance or documenting the assumption could prevent unexpected behavior. wdyt?
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
2257-2257: Any interest in logging or clarifying transformations usage here?
We’re now passing transformations to the SimpleRetriever’s RecordSelector. It might be good to describe in code comments how these transformations are applied when reading job results. wdyt?
🔇 Additional comments (1)
unit_tests/sources/declarative/requesters/test_http_job_repository.py (1)
87-87: Could we handle missing “url” more gracefully?
In path="{{stream_slice.extra_fields['url']}}", a KeyError could arise if 'url' is absent from extra_fields. Perhaps we could add a default or an assertion? wdyt?
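The two options named in this comment, a default or an assertion-style failure, could look roughly like the sketch below. It works on a plain dict rather than the templated path, and the function name download_path is an assumption made for the example.

```python
from typing import Any, Mapping, Optional


def download_path(extra_fields: Mapping[str, Any], default: Optional[str] = None) -> str:
    """Resolve the download URL from extra_fields (hypothetical helper)."""
    # .get avoids the bare KeyError that extra_fields['url'] would raise.
    url = extra_fields.get("url", default)
    if url is None:
        # Fail with a message that names the missing field.
        raise ValueError("stream slice has no 'url' in extra_fields")
    return url


resolved = download_path({"url": "https://example.com/result.csv"})
fallback = download_path({}, default="https://example.com/default.csv")
```

In a unit test that always builds the slice with a url, the plain lookup may be sufficient; the guard matters mostly for slices constructed elsewhere.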
What
url as extra_field to ignore it in state manager
transformations to download retriever
Caution
changing url path in stream_slice for download retriever is technically a breaking change, but I don't want to bump major version since AsyncRetriever is an ExperimentalClass
Reason
see #192 (comment)
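To illustrate why carrying url as an extra field keeps it out of state handling, here is a minimal sketch. It assumes that state is keyed only on a slice's partition values and that extra fields are left out. JobSliceSketch, state_key, and job_id are hypothetical names, not CDK APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class JobSliceSketch:
    """Hypothetical slice: partition values plus side-channel extra_fields."""

    partition: Dict[str, Any]
    extra_fields: Dict[str, Any] = field(default_factory=dict)

    def state_key(self) -> Dict[str, Any]:
        # Assumption: only partition values identify the slice in state.
        return dict(self.partition)


# Before: url sits among the partition values and leaks into the state key.
before = JobSliceSketch(partition={"job_id": "abc", "url": "https://example.com/part-1"})
# After: url travels as an extra field and the state key stays stable.
after = JobSliceSketch(
    partition={"job_id": "abc"},
    extra_fields={"url": "https://example.com/part-1"},
)
```

The flip side is the breaking change noted under Caution: anything that read the url from the slice itself now has to read it from extra_fields, as the updated test path does.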
Summary by CodeRabbit
Release Notes
New Features
StreamSlice class with boolean evaluation support
Improvements
The changes introduce more dynamic and flexible data processing capabilities within the Airbyte CDK, allowing for more nuanced record transformations and stream handling.