Skip to content

feat: Remove long standing connections on sql state and catalog backends#676

Merged
Aaron ("AJ") Steers (aaronsteers) merged 2 commits into
airbytehq:mainfrom
benjaminwilen:remove_long_standing_connections_on_sql_state_and_catalog_backends
May 26, 2025
Merged

feat: Remove long standing connections on sql state and catalog backends#676
Aaron ("AJ") Steers (aaronsteers) merged 2 commits into
airbytehq:mainfrom
benjaminwilen:remove_long_standing_connections_on_sql_state_and_catalog_backends

Conversation

@benjaminwilen

@benjaminwilen Ben Wilen (benjaminwilen) commented May 21, 2025

Copy link
Copy Markdown
Contributor

What does this PR do?

This PR modifies how SqlCatalogBackend and SqlStateBackend connect to an sql engine to do so lazily and per query execution instead of eagerly and in a single connection. This allows long running syncs to complete without receiving the following error:

(snowflake.connector.errors.ProgrammingError) 390114 (08001): None: Authentication token has expired.  The user must authenticate again.

Testing

  • I used a failing PyAirbyte stream (takes >5 hours to run), used my feature branch as the patched airbyte package, and the stream completed despite taking 7 hours to run

Additional Notes

Pros of this PR

  1. Doesn't attempt to allow connection to stay alive for 4+ hours with client_session_keep_alive=True, instead opts to open a connection for each individual query execution.
  2. Reuses the structure of SnowflakeSqlProcessor by accepting sql_config as a param.

Cons of this PR

  1. SqlCatalogBackend and SqlStateBackend are not backwards compatible if a user uses this classes outside of the one internal reference for each within PyAirbyte code.

Summary by CodeRabbit

  • Refactor
    • Updated internal configuration to use a new abstraction for SQL connections, improving maintainability and flexibility. No changes to user-facing features or workflows.

@coderabbitai

coderabbitai Bot commented May 21, 2025

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

The changes refactor backend classes to accept a SqlConfig abstraction instead of a direct SQLAlchemy Engine instance. Constructors and internal references are updated to retrieve the engine dynamically via the SqlConfig object. Instantiations in the CacheBase class are also updated to pass the configuration object rather than the engine.

Changes

File(s) Change Summary
airbyte/caches/_catalog_backend.py Updated SqlCatalogBackend to accept SqlConfig in the constructor and retrieve the engine via get_sql_engine(). Adjusted internal references and imports accordingly.
airbyte/caches/_state_backend.py Modified SqlStateBackend to use SqlConfig instead of a direct engine. Updated engine access in all relevant methods and imports.
airbyte/caches/base.py Changed instantiation of backend classes in CacheBase to pass self as sql_config instead of passing an engine.

Sequence Diagram(s)

sequenceDiagram
    participant CacheBase
    participant SqlCatalogBackend
    participant SqlStateBackend
    participant SqlConfig
    participant Engine

    CacheBase->>SqlCatalogBackend: __init__(sql_config=self, table_prefix)
    CacheBase->>SqlStateBackend: __init__(sql_config=self, table_prefix)
    SqlCatalogBackend->>SqlConfig: get_sql_engine()
    SqlConfig->>Engine: (returns Engine)
    SqlStateBackend->>SqlConfig: get_sql_engine()
    SqlConfig->>Engine: (returns Engine)
Loading

Would you like to see a more detailed breakdown of how the SqlConfig abstraction is now used throughout the backends, or is this high-level overview sufficient for your needs? Wdyt?

Note

⚡️ AI Code Reviews for VS Code, Cursor, Windsurf

CodeRabbit now has a plugin for VS Code, Cursor and Windsurf. This brings AI code reviews directly in the code editor. Each commit is reviewed immediately, finding bugs before the PR is raised. Seamless context handoff to your AI code agent ensures that you can easily incorporate review feedback.
Learn more here.


Note

⚡️ Faster reviews with caching

CodeRabbit now supports caching for code and dependencies, helping speed up reviews. This means quicker feedback, reduced wait times, and a smoother review experience overall. Cached data is encrypted and stored securely. This feature will be automatically enabled for all accounts on May 30th. To opt out, configure Review - Disable Cache at either the organization or repository level. If you prefer to disable all data retention across your organization, simply turn off the Data Retention setting under your Organization Settings.
Enjoy the performance boost—your workflow just got faster.


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 0296295 and 96a76eb.

📒 Files selected for processing (3)
  • airbyte/caches/_catalog_backend.py (5 hunks)
  • airbyte/caches/_state_backend.py (4 hunks)
  • airbyte/caches/base.py (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte/caches/_state_backend.py (4)
airbyte/shared/sql_processor.py (3)
  • SqlConfig (86-151)
  • get_sql_engine (131-140)
  • get_sql_engine (384-386)
airbyte/cloud/sync_results.py (1)
  • get_sql_engine (279-281)
airbyte/_processors/sql/duckdb.py (1)
  • get_sql_engine (78-87)
airbyte/results.py (1)
  • get_sql_engine (80-82)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Pytest (Fast)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (12)
airbyte/caches/base.py (2)

92-95: LGTM! Good refactoring to use SqlConfig.

The change to pass self as sql_config to SqlCatalogBackend is spot on. This modification supports the PR objective of removing long-standing connections by allowing the backend to retrieve a fresh engine for each operation, preventing expired token issues. Nice work!


96-99: LGTM! Consistent SqlConfig pattern applied.

Similarly updating SqlStateBackend to use the configuration object maintains consistency in how connections are managed. This unified approach will make the codebase more maintainable and addresses the authentication token expiration problem effectively.

airbyte/caches/_state_backend.py (5)

28-28: LGTM! Import update matches implementation changes.

Correctly updated the import to use SqlConfig instead of Engine to align with the constructor parameter changes. This reflects the underlying architectural shift to dynamic connection management.


187-194: LGTM! Good constructor refactoring.

The constructor now properly accepts a SqlConfig object instead of a direct Engine instance, which aligns perfectly with the PR objective to create connections lazily. This change will help prevent the Snowflake authentication token expiration errors during long-running operations.


197-200: LGTM! Dynamic engine retrieval implemented.

Great implementation of dynamic engine retrieval via self._sql_config.get_sql_engine(). This ensures a fresh connection for each database operation, addressing the core issue with expired authentication tokens.


224-226: LGTM! Consistent pattern applied throughout.

Consistently applied the same pattern of retrieving the engine dynamically in the get_state_provider method. This ensures that all database operations benefit from the new connection management approach.


137-138: LGTM! State writer updated for consistency.

Updated the _write_state method to use the dynamic engine retrieval pattern as well. This completes the refactoring by ensuring all database interactions use fresh connections.

airbyte/caches/_catalog_backend.py (5)

30-30: LGTM! Import update aligns with implementation.

Updated the import statement to use SqlConfig instead of Engine, matching the constructor parameter changes. Good attention to detail!


111-117: LGTM! Constructor updated to use SqlConfig.

The constructor now properly accepts and stores a SqlConfig object instead of directly storing an engine. This change is fundamental to addressing the expired token issue by enabling lazy connection creation.


127-129: LGTM! Dynamic engine retrieval implemented.

Implemented dynamic engine retrieval in the _ensure_internal_tables method, which ensures fresh connections are used when interacting with the database schema.


183-185: LGTM! Consistent pattern applied.

Applied the same pattern of retrieving the engine dynamically in the _save_catalog_info method, maintaining consistency throughout the codebase.


219-221: LGTM! All database interactions updated.

Completed the refactoring by updating the _fetch_streams_info method to also retrieve the engine dynamically. This ensures all database interactions in this class follow the new pattern.

✨ Finishing Touches
  • 📝 Generate Docstrings

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

‼️ IMPORTANT
Auto-reply has been disabled for this repository in the CodeRabbit settings. The CodeRabbit bot will not respond to your replies unless it is explicitly tagged.

  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai explain this code block.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and explain its main purpose.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai generate sequence diagram to generate a sequence diagram of the changes in this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai or @coderabbitai title anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@benjaminwilen

Copy link
Copy Markdown
Contributor Author

CodeRabbit (@coderabbitai) review

@benjaminwilen Ben Wilen (benjaminwilen) force-pushed the remove_long_standing_connections_on_sql_state_and_catalog_backends branch from da64cb2 to 96a76eb Compare May 21, 2025 18:44
@benjaminwilen Ben Wilen (benjaminwilen) marked this pull request as ready for review May 22, 2025 02:44
@aaronsteers

Aaron ("AJ") Steers (aaronsteers) commented May 22, 2025

Copy link
Copy Markdown
Member

/test-pr

PR test job started... Check job output.

❌ Tests failed.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving - after careful review, and running tests manually with our creds.

Thanks again Ben Wilen (@benjaminwilen) for this contribution. Will merge today and release shortly. 🙏

@aaronsteers Aaron ("AJ") Steers (aaronsteers) merged commit 6cb8e07 into airbytehq:main May 26, 2025
16 checks passed
@benjaminwilen Ben Wilen (benjaminwilen) deleted the remove_long_standing_connections_on_sql_state_and_catalog_backends branch May 27, 2025 04:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants