
Feat: Introducing AIRBYTE_OFFLINE_MODE for air-gapped environments #432

Merged
Aaron ("AJ") Steers (aaronsteers) merged 1 commit into airbytehq:main from niyasrad:offline-pyairbyte on Oct 29, 2024

Conversation

Niyas Hameed (niyasrad) commented Oct 27, 2024
Contributor

Description

  • Introduced a new constant, AIRBYTE_OFFLINE_MODE, that lets PyAirbyte function in offline or air-gapped environments where external connectivity is unavailable.

  • This issue was originally reported in Issue #428, where a user could not use PyAirbyte because requests to the Airbyte Connector Registry failed.

  • When AIRBYTE_OFFLINE_MODE is enabled, exceptions raised while connecting to the registry are handled gracefully rather than surfaced as errors. When the mode is disabled, the resulting error includes a detailed explanation and guidance for users.

  • Additionally, when AIRBYTE_OFFLINE_MODE is enabled, it behaves like the DO_NOT_TRACK environment variable, ensuring that no telemetry data is sent from the environment.
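The connection-error fallback described above can be sketched roughly as follows. This is a hypothetical illustration, not PyAirbyte's code: load_registry, RegistryConnectionError, and the injected fetch callable are invented names, the built-in ConnectionError stands in for requests.exceptions.ConnectionError, and the offline flag is passed in explicitly instead of being read from AIRBYTE_OFFLINE_MODE.

```python
from typing import Callable, Dict


class RegistryConnectionError(Exception):
    """Hypothetical stand-in for PyAirbyte's registry connection error."""


def load_registry(fetch: Callable[[], Dict[str, dict]], *, offline: bool) -> Dict[str, dict]:
    """Fetch registry entries; in offline mode, treat an unreachable registry as empty."""
    try:
        return fetch()
    except ConnectionError as err:  # stand-in for requests.exceptions.ConnectionError
        if offline:
            # Offline mode: no registry data, but no error either.
            return {}
        # Online mode: fail with guidance, keeping the original error as __cause__.
        raise RegistryConnectionError(
            "Failed to connect to the connector registry. "
            "To operate offline, set the AIRBYTE_OFFLINE_MODE environment variable to 1."
        ) from err
```

Per the later review discussion, the merged version also skips the registry request entirely when offline mode is on, so in practice the fetch would not even be attempted in that case.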

Summary by CodeRabbit

  • New Features

    • Introduced AIRBYTE_OFFLINE_MODE to enhance offline operation, allowing users to work without internet access while managing connector metadata.
    • Updated telemetry settings to disable tracking in offline mode.
  • Bug Fixes

    • Improved error handling for connectivity issues, providing clearer guidance when the connector registry is unreachable.
    • Enhanced management of scenarios where the connector registry is disabled or offline, allowing for graceful handling without raising errors.

@coderabbitai

coderabbitai Bot commented Oct 27, 2024

Contributor
📝 Walkthrough

Walkthrough

This pull request introduces enhancements to error handling and control flow in the get_connector_executor function within the airbyte/_executors/util.py file. It adds a new constant, AIRBYTE_OFFLINE_MODE, which influences telemetry tracking in the _setup_analytics and send_telemetry functions in airbyte/_util/telemetry.py. Additionally, the constant is defined in airbyte/constants.py, providing a mechanism for offline operation that affects both error handling and telemetry behavior.

Changes

  • airbyte/_executors/util.py: Enhanced error handling in get_connector_executor to include a new exception block for requests.exceptions.ConnectionError, utilizing AIRBYTE_OFFLINE_MODE. Preserved existing error handling for AirbyteConnectorNotRegisteredError.
  • airbyte/_util/telemetry.py: Introduced AIRBYTE_OFFLINE_MODE to modify telemetry opt-out conditions in the _setup_analytics and send_telemetry functions.
  • airbyte/constants.py: Added new constant AIRBYTE_OFFLINE_MODE, initialized from the environment variable, with a detailed docstring explaining its purpose and functionality.
  • airbyte/sources/registry.py: Enhanced error handling in _get_registry_cache and get_connector_metadata, allowing for graceful handling of offline mode and empty registries. Updated return type of get_connector_metadata to allow None.

Possibly related PRs

Suggested reviewers

  • aaronsteers


@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (6)
airbyte/constants.py (2)

102-107: Minor formatting improvements needed.

There are a few formatting issues in the docstring. Would you like me to fix these? Here's what I'm thinking:

-Airbyte registry but will not raise an error if the registry is unavailable. This can be useful in 
-environments without internet access, and it allows PyAirbyte to function without external dependencies.
+Airbyte registry but will not raise an error if the registry is unavailable. This can be useful in
+environments without internet access, and it allows PyAirbyte to function without external
+dependencies.

-Offline mode also disables telemetry, similar to a `DO_NOT_TRACK` setting, ensuring no usage data 
-is sent from your environment. You may also specify a custom registry URL (via the `_REGISTRY_ENV_VAR` 
+Offline mode also disables telemetry, similar to a `DO_NOT_TRACK` setting, ensuring no usage data
+is sent from your environment. You may also specify a custom registry URL (via the
+`_REGISTRY_ENV_VAR`
🧰 Tools
🪛 Ruff

102-102: Trailing whitespace. Remove trailing whitespace (W291)
103-103: Line too long (104 > 100) (E501)
105-105: Trailing whitespace. Remove trailing whitespace (W291)
106-106: Line too long (103 > 100) (E501)
106-106: Trailing whitespace. Remove trailing whitespace (W291)

99-111: Consider enhancing the docstring with environment variable details?

The docstring is comprehensive, but what do you think about explicitly mentioning the environment variable name? Something like:

"This mode can be enabled by setting the AIRBYTE_OFFLINE_MODE environment variable to 'true'."

This would make it even more user-friendly, wdyt? 🤔


airbyte/_executors/util.py (2)

8-8: Remove unused import

Hey! I noticed we have an unused os import. Would you mind if we remove it to keep the imports clean? 🧹

-import os

Also applies to: 20-20

🧰 Tools
🪛 Ruff

8-8: os imported but unused. Remove unused import: os (F401)


165-179: Enhance error handling robustness

The error handling looks great! 🎯 A few suggestions to make it even better:

  1. Would you consider preserving the original exception using from? This helps with debugging by maintaining the error chain. Note that exc appears to be the exceptions module here (as in exc.AirbyteConnectorRegistryError), so "from exc" would itself raise a TypeError; binding the caught error to another name, e.g. except requests.exceptions.ConnectionError as ex:, and chaining from that avoids the clash:
-            raise exc.AirbyteConnectorRegistryError(
+            raise exc.AirbyteConnectorRegistryError(
                 message="Failed to connect to the connector registry.",
                 context={"connector_name": name},
                 guidance=(
                     "\nThere was a problem connecting to the Airbyte connector registry. "
                     "Please check your internet connection and try again."
                     "\nTo operate offline, set the `AIRBYTE_OFFLINE_MODE` environment variable to "
                     "`1`. This will prevent errors related to registry connectivity and disable telemetry. "
                     "\nIf you have a custom registry, set `_REGISTRY_ENV_VAR` environment variable to "
                     "the URL of your custom registry."
                 ),
-            )
+            ) from ex
  2. The guidance message lines are a bit long. Would you consider breaking them at natural points? wdyt?
                 guidance=(
-                    "\nThere was a problem connecting to the Airbyte connector registry. "
-                    "Please check your internet connection and try again."
-                    "\nTo operate offline, set the `AIRBYTE_OFFLINE_MODE` environment variable to "
-                    "`1`. This will prevent errors related to registry connectivity and disable telemetry. "
-                    "\nIf you have a custom registry, set `_REGISTRY_ENV_VAR` environment variable to "
-                    "the URL of your custom registry."
+                    "\nThere was a problem connecting to the Airbyte connector registry. "
+                    "Please check your internet connection and try again.\n"
+                    "\nTo operate offline, set the `AIRBYTE_OFFLINE_MODE` environment "
+                    "variable to `1`. This will prevent errors related to registry "
+                    "connectivity and disable telemetry.\n"
+                    "\nIf you have a custom registry, set `_REGISTRY_ENV_VAR` environment "
+                    "variable to the URL of your custom registry."
                 ),
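A small, self-contained illustration of why the chaining target matters. It assumes, as the exc.AirbyteConnectorRegistryError call suggests, that exc names an exceptions module; here a types.SimpleNamespace and a locally defined AirbyteConnectorRegistryError stand in for it. Chaining from the caught error sets __cause__, while chaining from the module-like object fails with a TypeError.

```python
import types


class AirbyteConnectorRegistryError(Exception):
    """Local stand-in for the PyAirbyte exception class."""


# Stand-in for an `exceptions as exc` module import (an assumption about the real import).
exc = types.SimpleNamespace(AirbyteConnectorRegistryError=AirbyteConnectorRegistryError)


def raise_chained_correctly() -> None:
    try:
        raise ConnectionError("registry unreachable")
    except ConnectionError as ex:  # bind to `ex` so `exc` keeps meaning the module
        raise exc.AirbyteConnectorRegistryError("Failed to connect.") from ex


def raise_chained_from_module() -> None:
    try:
        raise ConnectionError("registry unreachable")
    except ConnectionError:
        # `exc` is not an exception instance or class, so Python raises TypeError here.
        raise exc.AirbyteConnectorRegistryError("Failed to connect.") from exc
```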
🧰 Tools
🪛 Ruff

168-179: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling (B904)
175-175: Line too long (108 > 100) (E501)
176-176: Line too long (103 > 100) (E501)

airbyte/_util/telemetry.py (2)

93-93: LGTM! Consider adding a debug message for offline mode?

The offline mode check is implemented correctly. Since we already have debug messages for other scenarios (like the issues list at line 144), what do you think about adding a debug message when offline mode is detected? This could help with troubleshooting, wdyt? 🤔

     if os.environ.get(DO_NOT_TRACK) or AIRBYTE_OFFLINE_MODE:
+        if DEBUG:
+            print("Telemetry disabled: Offline mode or DO_NOT_TRACK is set")
         # User has opted out of tracking.
         return False

Line range hint 1-300: Architecture suggestion: Consider consolidating the offline checks.

Since both _setup_analytics and send_telemetry perform the same check, what do you think about consolidating this logic into a single helper function? This could make future modifications easier and reduce the chance of inconsistencies, wdyt? 🤔

Something like this perhaps?

def _is_telemetry_disabled() -> bool:
    """Check if telemetry is disabled via environment variables or offline mode."""
    return bool(os.environ.get(DO_NOT_TRACK) or AIRBYTE_OFFLINE_MODE)

Then you could use it like:

def _setup_analytics() -> str | bool:
    if _is_telemetry_disabled():
        return False
    # ... rest of the function

def send_telemetry(...) -> None:
    if _is_telemetry_disabled():
        return
    # ... rest of the function
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 02f5ede and 005d837.

📒 Files selected for processing (3)
  • airbyte/_executors/util.py (3 hunks)
  • airbyte/_util/telemetry.py (3 hunks)
  • airbyte/constants.py (1 hunks)

🔇 Additional comments (4)
airbyte/constants.py (1)

93-98: LGTM! Clean implementation of the offline mode constant.

The implementation follows the established patterns in the codebase and uses the existing _str_to_bool helper. The default value of "false" makes sense as a safe default.
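The pattern described here, a boolean read once from the environment with a "false" default, can be sketched like this. The str_to_bool helper and its accepted spellings are assumptions for illustration; PyAirbyte's actual _str_to_bool may accept different values.

```python
import os


def str_to_bool(value: str) -> bool:
    """Hypothetical parser: common truthy spellings map to True, everything else to False."""
    return value.strip().lower() in {"1", "true", "t", "yes", "y", "on"}


# Evaluated once, when this module is imported; "false" keeps offline mode off by default.
AIRBYTE_OFFLINE_MODE: bool = str_to_bool(os.getenv("AIRBYTE_OFFLINE_MODE", "false"))
```

Because a constant like this is captured at import time, the environment variable has to be set before the package is imported for it to take effect.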

airbyte/_executors/util.py (1)

165-179: Implementation aligns well with PR objectives! 🎯

The error handling implementation effectively addresses the needs of air-gapped environments by:

  • Gracefully handling connection failures
  • Providing clear guidance on enabling offline mode
  • Maintaining backward compatibility

Great job on implementing this feature! 🚀


airbyte/_util/telemetry.py (2)

53-53: LGTM! Clean import of the new constant.

The import follows the existing style and is placed in the appropriate section.


211-211: LGTM! Consistent implementation.

The offline mode check is applied consistently with the same pattern used in _setup_analytics. Nice work on maintaining consistency! 👍

@aaronsteers Aaron ("AJ") Steers (aaronsteers) left a comment

Member

Niyas Hameed (@niyasrad) - This looks really good! Thanks for creating this. I wanted to also suggest adding upstream handling to skip attempting to load the registry if we're in this mode. GitHub will only let me suggest changes on modified lines so I created a commit with my suggestions here:

3451dec

Wdyt?

@niyasrad

Niyas Hameed (niyasrad) commented Oct 29, 2024

Contributor Author

> Niyas Hameed (@niyasrad) - This looks really good! Thanks for creating this. I wanted to also suggest adding upstream handling to skip attempting to load the registry if we're in this mode. GitHub will only let me suggest changes on modified lines so I created a commit with my suggestions here:
>
> 3451dec
>
> Wdyt?

Aaron ("AJ") Steers (@aaronsteers), that's a wonderful suggestion; I totally missed this!

I merged it in and made a few more changes to ensure that the {} (empty registry) returns don't introduce any new errors. Tested it offline as well! We can squash and merge.

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between d230135 and 2abd55d.

📒 Files selected for processing (1)
  • airbyte/sources/registry.py (3 hunks)
🔇 Additional comments (3)
airbyte/sources/registry.py (3)

19-20: LGTM! Clean import additions.

The new imports are well-organized and follow the existing import style.


268-272: Nice warning implementation!

The warning message is clear and the comment explains why an empty registry isn't fatal. Good call on using with_stack=False here! 👍


238-245: Small typo in comment & verify empty cache handling.

There's a small typo in the comment: "dissalowed" should be "disallowed". Otherwise, the logic looks good!

Quick question: should we add a debug log when returning an empty cache in these cases? It might help with troubleshooting, wdyt? 🤔

Let's verify how empty caches are handled throughout the codebase:

✅ Verification successful

Let me try another approach to find cache usage patterns:

And let's also check where this function is used:


Fix typo in comment and add debug logging for empty cache cases

The empty cache handling looks good! I can see that empty caches are properly handled in the code:

  • Line 292 checks for empty cache and raises a clear error
  • The function is used consistently throughout the codebase for connector listing and metadata access
  • Test fixtures properly handle cache resets

Two suggestions to improve the code:

  1. Fix the typo: "dissalowed" → "disallowed"
  2. Maybe add debug logging when returning empty cache? Something like:
if registry_url.upper() in {"0", "F", "FALSE"}:
    logging.debug("Registry disabled by user configuration")
    return {}
if registry_url.startswith("http") and AIRBYTE_OFFLINE_MODE:
    logging.debug("Registry access blocked due to offline mode")
    return {}

What do you think about adding these debug logs? They could help with troubleshooting without being too noisy! 🤔
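The early-return logic in that snippet can be exercised in isolation with a sketch like the following. get_registry_cache and its fetch parameter are hypothetical names; injecting the fetch function makes it possible to check that no network call is attempted when the registry is disabled or when offline mode blocks an HTTP registry URL, while a local (non-HTTP) registry path is still loaded.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger(__name__)


def get_registry_cache(
    registry_url: str,
    offline_mode: bool,
    fetch: Callable[[str], Dict[str, dict]],
) -> Dict[str, dict]:
    """Return registry entries, or {} when the registry is disabled or blocked."""
    if registry_url.upper() in {"0", "F", "FALSE"}:
        logger.debug("Registry disabled by user configuration")
        return {}
    if registry_url.startswith("http") and offline_mode:
        logger.debug("Registry access blocked due to offline mode")
        return {}
    # Local file paths (or online mode) fall through to a real load.
    return fetch(registry_url)
```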

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for direct cache access or empty cache handling
rg -A 3 "_get_registry_cache|cache\s*=\s*{}" --type py

Length of output: 206


Script:

#!/bin/bash
# Search for direct cache access with fixed regex
rg -A 3 "_get_registry_cache" --type py

# Search for empty cache handling with fixed regex
rg -A 3 'cache\s*=\s*\{\}' --type py

Length of output: 2947


Script:

#!/bin/bash
# Search for registry cache related code
rg -A 5 "registry.*cache" --type py

Length of output: 4587


Script:

#!/bin/bash
# Find usages of _get_registry_cache
ast-grep --pattern '_get_registry_cache'

Length of output: 940

Comment thread airbyte/sources/registry.py
@niyasrad

Contributor Author

CodeRabbit (@coderabbitai) review

@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (5)
airbyte/constants.py (2)

99-111: The docstring is comprehensive, but what do you think about adding usage examples? 🤔

The documentation clearly explains the purpose and implications of offline mode. Would you consider adding a quick example showing how to enable it? Something like:

# Enable offline mode via environment variable
export AIRBYTE_OFFLINE_MODE=true

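# Or programmatically in Python, before airbyte is imported (the constant is read from the environment when airbyte.constants loads)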
import os
os.environ["AIRBYTE_OFFLINE_MODE"] = "true"

This could help users quickly understand how to use this feature in their air-gapped environments. wdyt?


105-107: Consider clarifying the registry URL environment variable name.

The docstring mentions "_REGISTRY_ENV_VAR" but doesn't specify the actual environment variable name. Should we make this more explicit for users?

airbyte/sources/registry.py (3)

185-187: Consider adding a docstring for better documentation.

The function looks great! Would you consider adding a docstring to explain its purpose and parameters? Something like this perhaps? 🤔

 def _is_registry_disabled(url: str) -> bool:
+    """Check if the registry is disabled based on URL value or offline mode.
+
+    Args:
+        url: The registry URL to check
+
+    Returns:
+        bool: True if registry is disabled, False otherwise
+    """
     return url.upper() in {"0", "F", "FALSE"} or AIRBYTE_OFFLINE_MODE

Wdyt? 🎯


269-273: Consider enhancing the warning message for clarity.

The warning message is good, but what do you think about making it more specific when in offline mode? Maybe something like:

-            message=f"Connector registry is empty: {registry_url}",
+            message=f"Connector registry is empty: {registry_url}" +
+                    (" (AIRBYTE_OFFLINE_MODE is enabled)" if AIRBYTE_OFFLINE_MODE else ""),

This would help users understand why they're seeing an empty registry. Wdyt? 🤔


280-289: Consider updating the docstring to reflect offline mode behavior.

The function signature change looks great! Would you consider updating the docstring to document the None return case? Something like:

 def get_connector_metadata(name: str) -> None | ConnectorMetadata:
     """Check the cache for the connector.
 
     If the cache is empty, populate by calling update_cache.
+
+    Returns:
+        None if the registry is disabled or offline mode is enabled
+        ConnectorMetadata if the connector is found in the registry
+
+    Raises:
+        PyAirbyteInternalError: If the registry could not be loaded
+        AirbyteConnectorNotRegisteredError: If the connector is not found in the registry
     """

This would help users understand when to expect a None return. What do you think? 🎯
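With the None | ConnectorMetadata return type, callers need an explicit None branch. A minimal sketch of that contract, using a hypothetical two-field ConnectorMetadata, a local AirbyteConnectorNotRegisteredError, and an in-memory cache rather than PyAirbyte's real classes (the version string in the usage is sample data):

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class ConnectorMetadata:
    """Hypothetical, trimmed-down metadata record."""
    name: str
    latest_available_version: Optional[str] = None


class AirbyteConnectorNotRegisteredError(Exception):
    """Local stand-in for the PyAirbyte exception."""


def get_connector_metadata(
    name: str, cache: Dict[str, ConnectorMetadata], registry_disabled: bool
) -> Optional[ConnectorMetadata]:
    if registry_disabled:
        # Registry off (e.g. offline mode): report "unknown" instead of failing.
        return None
    if name not in cache:
        raise AirbyteConnectorNotRegisteredError(name)
    return cache[name]


def describe(name: str, cache: Dict[str, ConnectorMetadata], registry_disabled: bool) -> str:
    metadata = get_connector_metadata(name, cache, registry_disabled)
    if metadata is None:
        return f"{name}: no registry metadata (registry disabled)"
    return f"{name}: version {metadata.latest_available_version}"
```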

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 2abd55d and 9cac8e8.

📒 Files selected for processing (4)
  • airbyte/_executors/util.py (3 hunks)
  • airbyte/_util/telemetry.py (3 hunks)
  • airbyte/constants.py (1 hunks)
  • airbyte/sources/registry.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • airbyte/_executors/util.py
  • airbyte/_util/telemetry.py
🔇 Additional comments (2)
airbyte/constants.py (1)

93-98: LGTM! Clean implementation of the offline mode constant.

The implementation follows the established pattern in the codebase, using the existing _str_to_bool helper and environment variable pattern. The default value of "false" is a safe choice that maintains backward compatibility.

airbyte/sources/registry.py (1)

19-20: LGTM! Clean import additions.

The new imports are well-organized and follow the project's import structure.

Member

This looks great. Thanks, Niyas Hameed (@niyasrad), for contributing! 🚀

Comment on lines +170 to +177
guidance=(
"\nThere was a problem connecting to the Airbyte connector registry. "
"Please check your internet connection and try again.\nTo operate "
"offline, set the `AIRBYTE_OFFLINE_MODE` environment variable to `1`."
"This will prevent errors related to registry connectivity and disable "
"telemetry. \nIf you have a custom registry, set `_REGISTRY_ENV_VAR` "
"environment variable to the URL of your custom registry."
),
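Adjacent string literals inside the parentheses are joined with no separator, so the literal ending in "`1`." runs straight into "This will prevent...", and the rendered guidance reads "`1`.This". A rough, hypothetical check (a regex for sentence punctuation immediately followed by an uppercase letter) can catch this kind of slip; the sample reuses two of the quoted literals.

```python
import re
from typing import List

# Two adjacent literals copied from the guidance text above.
guidance = (
    "offline, set the `AIRBYTE_OFFLINE_MODE` environment variable to `1`."
    "This will prevent errors related to registry connectivity and disable "
)


def find_missing_spaces(text: str) -> List[str]:
    """Return tokens where '.', '!' or '?' is directly followed by an uppercase letter."""
    return re.findall(r"\S*[.!?][A-Z]\w*", text)
```

Here find_missing_spaces(guidance) flags the "`1`.This" token; adding a trailing space to the first literal clears it.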

Member

Very nicely explained. 👍

@aaronsteers Aaron ("AJ") Steers (aaronsteers) merged commit 7c703ac into airbytehq:main Oct 29, 2024