feat(property-chunking): Allow the definition of query_properties on the HttpRequester component#638
Conversation
…nt for more flexible property chunking
📝 WalkthroughWalkthroughThis change deprecates the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant ModelFactory
participant HttpRequester
User->>ModelFactory: Provide manifest with query_properties or fetch_properties_from_endpoint
ModelFactory->>HttpRequester: Initialize with query_properties (preferred) or fetch_properties_from_endpoint (deprecated)
HttpRequester-->>ModelFactory: Returns configured HttpRequester instance
Possibly related PRs
Suggested reviewers
Would you like to add a migration note or documentation update to help users transition from Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 Pylint (3.3.7)airbyte_cdk/sources/declarative/models/declarative_component_schema.pyunit_tests/sources/declarative/parsers/test_model_to_component_factory.pyairbyte_cdk/sources/declarative/parsers/model_to_component_factory.py✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)
3223-3223: Consider enhancing the deprecation comment with timeline details?The TODO comment is helpful for tracking the deprecation path, but could you add more context about when this deprecation might happen? Something like "once source-hubspot migrates" or a target timeline might help future maintainers prioritize this cleanup. wdyt?
- # todo: Deprecate this condition once dependent connectors migrate to query_properties + # TODO: Deprecate this condition once dependent connectors (e.g., source-hubspot) migrate to query_propertiesairbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
2046-2047: Add deprecation-context metadata?Nice to see the field formally deprecated. Would it be useful to also capture when it was deprecated (e.g.
x-airbyte-deprecation-versionor similar), so that downstream tooling can decide when to remove it entirely, wdyt?
2051-2054: Clarify & exemplify the newquery_propertiespropertyLove the addition 🎉. Two small thoughts:
- The description mentions
stream_partition.extra_fields, but doesn’t highlight how the properties are injected (header, query-param, etc.). Would an explicit pointer torequest_optionusage (or a short inline example) help future users?- Since
request_parametersalready accepts an inlineQueryPropertiesobject, having both a dedicated top-level field and the inline option may confuse manifest authors. Perhaps the docs could hint at the recommended approach (top-level vs inline) to avoid ambiguity, wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
airbyte_cdk/sources/declarative/declarative_component_schema.yaml(1 hunks)airbyte_cdk/sources/declarative/models/declarative_component_schema.py(1 hunks)airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py(2 hunks)unit_tests/sources/declarative/parsers/test_model_to_component_factory.py(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (3)
undefined
<retrieved_learning>
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py file, the strict module name checks in _get_class_from_fully_qualified_class_name (requiring module_name to be "components" and module_name_full to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
</retrieved_learning>
<retrieved_learning>
Learnt from: aaronsteers
PR: #174
File: unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py:21-29
Timestamp: 2025-01-13T23:39:15.457Z
Learning: The CustomPageIncrement class in unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py is imported from another connector definition and should not be modified in this context.
</retrieved_learning>
<retrieved_learning>
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource class in airbyte_cdk/sources/declarative/yaml_declarative_source.py, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
</retrieved_learning>
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)
undefined
<retrieved_learning>
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the declarative_component_schema.py file is auto-generated from declarative_component_schema.yaml and should be ignored in the recommended reviewing order.
</retrieved_learning>
<retrieved_learning>
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource class in airbyte_cdk/sources/declarative/yaml_declarative_source.py, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
</retrieved_learning>
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
undefined
<retrieved_learning>
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py file, the strict module name checks in _get_class_from_fully_qualified_class_name (requiring module_name to be "components" and module_name_full to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
</retrieved_learning>
<retrieved_learning>
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource class in airbyte_cdk/sources/declarative/yaml_declarative_source.py, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
</retrieved_learning>
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)
undefined
<retrieved_learning>
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource class in airbyte_cdk/sources/declarative/yaml_declarative_source.py, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
</retrieved_learning>
<retrieved_learning>
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the declarative_component_schema.py file is auto-generated from declarative_component_schema.yaml and should be ignored in the recommended reviewing order.
</retrieved_learning>
<retrieved_learning>
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in airbyte_cdk/cli/source_declarative_manifest/, including _run.py, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
</retrieved_learning>
🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
QueryProperties(14-48)
🔇 Additional comments (6)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)
3235-3239: Great implementation of the new query_properties path!The new code handling
model.requester.query_propertiesfollows the same pattern as the existing logic and maintains consistency by calling the sameself.create_query_properties()method. This looks clean and should work well with the new API requirements you mentioned in the PR description.
3219-3239: The elif chain logic handles the migration gracefullyThe overall flow here is well-structured:
- Check for query properties in request_parameters (with validation)
- Fall back to deprecated
fetch_properties_from_endpoint(with clear deprecation marking)- Handle the new
query_propertiesattributeThis approach ensures backward compatibility while providing a clear migration path. The error handling in the first condition also prevents conflicting configurations, which is good defensive programming.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)
2544-2550: LGTM! Proper deprecation implementation.The deprecation of
fetch_properties_from_endpointfollows best practices with clear migration guidance. The field remains available for backward compatibility while directing users to the newquery_propertiesfield. The deprecation message is concise and actionable - nice work!Just curious though - since this file is auto-generated from
declarative_component_schema.yaml, have you verified that the source YAML file also has the same quality deprecation metadata, wdyt?
2551-2555: Well-implemented new field addition!The
query_propertiesfield is properly typed and has excellent documentation that clearly explains its purpose for APIs requiring explicit property specification. The description effectively communicates the chunking capabilities and integration withstream_partition.extra_fields.From the relevant code snippets, the
QueryPropertiesclass looks robust with good chunking and merging functionality. The field name is also intuitive and descriptive - this should provide a clean replacement for the deprecated field, wdyt?unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)
4407-4423: LGTM! The schema update properly reflects the new nested structure.The change from the flat
fetch_properties_from_endpointto the nestedquery_properties.property_liststructure looks good and aligns well with the PR's goal of enabling property chunking. The new structure provides better organization and extensibility.One quick question though - does this change maintain backward compatibility for existing manifests that might still use the old
fetch_properties_from_endpointfield, or are we expecting users to migrate their manifests as part of this deprecation? Just want to make sure the migration path is clear, wdyt?
4453-4472: Great test coverage for the new schema structure!The test assertions properly validate the new nested structure:
- Correctly checks that
additional_query_propertiesis aQueryPropertiesinstance- Validates that
property_listcontains the expectedPropertiesFromEndpoint- Ensures the nested retriever and requester are properly constructed
This gives good confidence that the schema transformation is working as intended. The test nicely demonstrates how the new structure preserves all the original functionality while enabling the enhanced property chunking capabilities.
Aldo Gonzalez (aldogonzalez8)
left a comment
There was a problem hiding this comment.
APPROVED
What
While migrating the last
source-instagramstreamUserInsightsto low-code, I found a gap in the CDK functionality in how we chunk properties to make multiple requests and merge the records back together.The current implementation only allows us to put all query properties under a single field in the
request_properties. However, for Instagram, we need to load certain query properties into one parameter field and other properties to another. For example:And we still want to be able to merge the record back together for all the chunks based on a target merge key.
How
In reality how we do this quite easy, but relied mostly on an oversight in how I had previously designed the
fetch_properties_from_endpointimprovement from a few months ago. I had originally envisioned that we would not really need to do property chunking here and if it was needed, it would be handled from the definition inrequest_properties. However, we're now seeing a case where we want to load parameters into different fields.By allowing for the
QueryPropertieswhich supports property chunking to be defined the HttpRequester, we can perform chunking and choose where they go.I've marked
fetch_properties_from_endpointas deprecated and would ideally want to do an under the table deprecation of the old field. Onlysource-hubspotuses this and we can swap this out. Once we do, we can silently deprecate.Summary by CodeRabbit
New Features
Deprecation
Bug Fixes