Skip to content

feat(property-chunking): Allow the definition of query_properties on the HttpRequester component#638

Merged
Brian Lai (brianjlai) merged 1 commit into
mainfrom
brian/query_properties_in_http_requester
Jul 7, 2025
Merged

feat(property-chunking): Allow the definition of query_properties on the HttpRequester component#638
Brian Lai (brianjlai) merged 1 commit into
mainfrom
brian/query_properties_in_http_requester

Conversation

@brianjlai
Copy link
Copy Markdown
Contributor

@brianjlai Brian Lai (brianjlai) commented Jul 3, 2025

What

While migrating the last source-instagram stream UserInsights to low-code, I found a gap in the CDK functionality in how we chunk properties to make multiple requests and merge the records back together.

The current implementation only allows us to put all query properties under a single field in the request_properties. However, for Instagram, we need to load certain query properties into one parameter field and other properties to another. For example:

METRICS_BY_PERIOD = {
        "day": [
            "follower_count",
            "reach",
        ],
        "week": ["reach"],
        "days_28": ["reach"],
        "lifetime": ["online_followers"],
    }

We want to create 4 chunks where query params are:
?period=day&metric=follower_count,reach
?period=week&metric=reach
?period=days_28&metric=reach
?period=lifetime&metric=online_followers

And we still want to be able to merge the record back together for all the chunks based on a target merge key.

How

In reality how we do this quite easy, but relied mostly on an oversight in how I had previously designed the fetch_properties_from_endpoint improvement from a few months ago. I had originally envisioned that we would not really need to do property chunking here and if it was needed, it would be handled from the definition in request_properties. However, we're now seeing a case where we want to load parameters into different fields.

By allowing for the QueryProperties which supports property chunking to be defined the HttpRequester, we can perform chunking and choose where they go.

I've marked fetch_properties_from_endpoint as deprecated and would ideally want to do an under the table deprecation of the old field. Only source-hubspot uses this and we can swap this out. Once we do, we can silently deprecate.

Summary by CodeRabbit

  • New Features

    • Introduced a new property for HTTP requesters to explicitly specify query properties for APIs that require them.
  • Deprecation

    • Marked the existing property for fetching properties from endpoints as deprecated, with guidance to use the new query properties feature.
  • Bug Fixes

    • Updated tests to use the new query properties structure, ensuring compatibility and correctness.

@github-actions github-actions Bot added the enhancement New feature or request label Jul 3, 2025
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jul 3, 2025

PyTest Results (Fast)

3 685 tests  ±0   3 674 ✅ ±0   6m 19s ⏱️ +4s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 77c39d2. ± Comparison against base commit a562875.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jul 3, 2025

PyTest Results (Full)

3 688 tests  ±0   3 677 ✅ ±0   17m 53s ⏱️ -29s
    1 suites ±0      11 💤 ±0 
    1 files   ±0       0 ❌ ±0 

Results for commit 77c39d2. ± Comparison against base commit a562875.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Jul 3, 2025

📝 Walkthrough

Walkthrough

This change deprecates the fetch_properties_from_endpoint property in the HttpRequester schema and model, introducing a new query_properties property instead. The parsing logic and related tests are updated to support query_properties while maintaining backward compatibility with the deprecated property.

Changes

File(s) Change Summary
airbyte_cdk/sources/declarative/declarative_component_schema.yaml Marked fetch_properties_from_endpoint as deprecated; added new query_properties property to HttpRequester.
airbyte_cdk/sources/declarative/models/declarative_component_schema.py Deprecated fetch_properties_from_endpoint field in HttpRequester; added new query_properties field.
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py Updated parsing logic to support query_properties in addition to deprecated fetch_properties_from_endpoint.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py Updated test to use query_properties instead of fetch_properties_from_endpoint in test manifest.

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant ModelFactory
    participant HttpRequester

    User->>ModelFactory: Provide manifest with query_properties or fetch_properties_from_endpoint
    ModelFactory->>HttpRequester: Initialize with query_properties (preferred) or fetch_properties_from_endpoint (deprecated)
    HttpRequester-->>ModelFactory: Returns configured HttpRequester instance
Loading

Possibly related PRs

Suggested reviewers

  • darynaishchenko
  • maxi297

Would you like to add a migration note or documentation update to help users transition from fetch_properties_from_endpoint to query_properties, wdyt?

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Pylint (3.3.7)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py
✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (1)

3223-3223: Consider enhancing the deprecation comment with timeline details?

The TODO comment is helpful for tracking the deprecation path, but could you add more context about when this deprecation might happen? Something like "once source-hubspot migrates" or a target timeline might help future maintainers prioritize this cleanup. wdyt?

-            # todo: Deprecate this condition once dependent connectors migrate to query_properties
+            # TODO: Deprecate this condition once dependent connectors (e.g., source-hubspot) migrate to query_properties
airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

2046-2047: Add deprecation-context metadata?

Nice to see the field formally deprecated. Would it be useful to also capture when it was deprecated (e.g. x-airbyte-deprecation-version or similar), so that downstream tooling can decide when to remove it entirely, wdyt?


2051-2054: Clarify & exemplify the new query_properties property

Love the addition 🎉. Two small thoughts:

  1. The description mentions stream_partition.extra_fields, but doesn’t highlight how the properties are injected (header, query-param, etc.). Would an explicit pointer to request_option usage (or a short inline example) help future users?
  2. Since request_parameters already accepts an inline QueryProperties object, having both a dedicated top-level field and the inline option may confuse manifest authors. Perhaps the docs could hint at the recommended approach (top-level vs inline) to avoid ambiguity, wdyt?
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a562875 and 77c39d2.

📒 Files selected for processing (4)
  • airbyte_cdk/sources/declarative/declarative_component_schema.yaml (1 hunks)
  • airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1 hunks)
  • airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2 hunks)
  • unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: ChristoGrab
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the `YamlDeclarativeSource` class in `airbyte_cdk/sources/declarative/yaml_declarative_source.py`, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (3)

undefined

<retrieved_learning>
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py file, the strict module name checks in _get_class_from_fully_qualified_class_name (requiring module_name to be "components" and module_name_full to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
</retrieved_learning>

<retrieved_learning>
Learnt from: aaronsteers
PR: #174
File: unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py:21-29
Timestamp: 2025-01-13T23:39:15.457Z
Learning: The CustomPageIncrement class in unit_tests/source_declarative_manifest/resources/source_the_guardian_api/components.py is imported from another connector definition and should not be modified in this context.
</retrieved_learning>

<retrieved_learning>
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource class in airbyte_cdk/sources/declarative/yaml_declarative_source.py, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
</retrieved_learning>

airbyte_cdk/sources/declarative/declarative_component_schema.yaml (2)

undefined

<retrieved_learning>
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the declarative_component_schema.py file is auto-generated from declarative_component_schema.yaml and should be ignored in the recommended reviewing order.
</retrieved_learning>

<retrieved_learning>
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource class in airbyte_cdk/sources/declarative/yaml_declarative_source.py, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
</retrieved_learning>

airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

undefined

<retrieved_learning>
Learnt from: aaronsteers
PR: #174
File: airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py:1093-1102
Timestamp: 2025-01-14T00:20:32.310Z
Learning: In the airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py file, the strict module name checks in _get_class_from_fully_qualified_class_name (requiring module_name to be "components" and module_name_full to be "source_declarative_manifest.components") are intentionally designed to provide early, clear feedback when class declarations won't be found later in execution. These restrictions may be loosened in the future if the requirements for class definition locations change.
</retrieved_learning>

<retrieved_learning>
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource class in airbyte_cdk/sources/declarative/yaml_declarative_source.py, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
</retrieved_learning>

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (3)

undefined

<retrieved_learning>
Learnt from: ChristoGrab
PR: #58
File: airbyte_cdk/sources/declarative/yaml_declarative_source.py:0-0
Timestamp: 2024-11-18T23:40:06.391Z
Learning: When modifying the YamlDeclarativeSource class in airbyte_cdk/sources/declarative/yaml_declarative_source.py, avoid introducing breaking changes like altering method signatures within the scope of unrelated PRs. Such changes should be addressed separately to minimize impact on existing implementations.
</retrieved_learning>

<retrieved_learning>
Learnt from: pnilan
PR: airbytehq/airbyte-python-cdk#0
File: :0-0
Timestamp: 2024-12-11T16:34:46.319Z
Learning: In the airbytehq/airbyte-python-cdk repository, the declarative_component_schema.py file is auto-generated from declarative_component_schema.yaml and should be ignored in the recommended reviewing order.
</retrieved_learning>

<retrieved_learning>
Learnt from: aaronsteers
PR: #58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in airbyte_cdk/cli/source_declarative_manifest/, including _run.py, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
</retrieved_learning>

🧬 Code Graph Analysis (1)
airbyte_cdk/sources/declarative/models/declarative_component_schema.py (1)
airbyte_cdk/sources/declarative/requesters/query_properties/query_properties.py (1)
  • QueryProperties (14-48)
🔇 Additional comments (6)
airbyte_cdk/sources/declarative/parsers/model_to_component_factory.py (2)

3235-3239: Great implementation of the new query_properties path!

The new code handling model.requester.query_properties follows the same pattern as the existing logic and maintains consistency by calling the same self.create_query_properties() method. This looks clean and should work well with the new API requirements you mentioned in the PR description.


3219-3239: The elif chain logic handles the migration gracefully

The overall flow here is well-structured:

  1. Check for query properties in request_parameters (with validation)
  2. Fall back to deprecated fetch_properties_from_endpoint (with clear deprecation marking)
  3. Handle the new query_properties attribute

This approach ensures backward compatibility while providing a clear migration path. The error handling in the first condition also prevents conflicting configurations, which is good defensive programming.

airbyte_cdk/sources/declarative/models/declarative_component_schema.py (2)

2544-2550: LGTM! Proper deprecation implementation.

The deprecation of fetch_properties_from_endpoint follows best practices with clear migration guidance. The field remains available for backward compatibility while directing users to the new query_properties field. The deprecation message is concise and actionable - nice work!

Just curious though - since this file is auto-generated from declarative_component_schema.yaml, have you verified that the source YAML file also has the same quality deprecation metadata, wdyt?


2551-2555: Well-implemented new field addition!

The query_properties field is properly typed and has excellent documentation that clearly explains its purpose for APIs requiring explicit property specification. The description effectively communicates the chunking capabilities and integration with stream_partition.extra_fields.

From the relevant code snippets, the QueryProperties class looks robust with good chunking and merging functionality. The field name is also intuitive and descriptive - this should provide a clean replacement for the deprecated field, wdyt?

unit_tests/sources/declarative/parsers/test_model_to_component_factory.py (2)

4407-4423: LGTM! The schema update properly reflects the new nested structure.

The change from the flat fetch_properties_from_endpoint to the nested query_properties.property_list structure looks good and aligns well with the PR's goal of enabling property chunking. The new structure provides better organization and extensibility.

One quick question though - does this change maintain backward compatibility for existing manifests that might still use the old fetch_properties_from_endpoint field, or are we expecting users to migrate their manifests as part of this deprecation? Just want to make sure the migration path is clear, wdyt?


4453-4472: Great test coverage for the new schema structure!

The test assertions properly validate the new nested structure:

  • Correctly checks that additional_query_properties is a QueryProperties instance
  • Validates that property_list contains the expected PropertiesFromEndpoint
  • Ensures the nested retriever and requester are properly constructed

This gives good confidence that the schema transformation is working as intended. The test nicely demonstrates how the new structure preserves all the original functionality while enabling the enhanced property chunking capabilities.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

APPROVED

@brianjlai Brian Lai (brianjlai) merged commit 88f8256 into main Jul 7, 2025
30 of 31 checks passed
@brianjlai Brian Lai (brianjlai) deleted the brian/query_properties_in_http_requester branch July 7, 2025 21:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants