Feat: Introducing AIRBYTE_OFFLINE_MODE for air-gapped environments#432
Conversation
📝 Walkthrough📝 WalkthroughWalkthroughThis pull request introduces enhancements to error handling and control flow in the Changes
Possibly related PRs
Suggested reviewers
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (6)
airbyte/constants.py (2)
102-107: Minor formatting improvements needed.There are a few formatting issues in the docstring. Would you like me to fix these? Here's what I'm thinking:
-Airbyte registry but will not raise an error if the registry is unavailable. This can be useful in -environments without internet access, and it allows PyAirbyte to function without external dependencies. +Airbyte registry but will not raise an error if the registry is unavailable. This can be useful in +environments without internet access, and it allows PyAirbyte to function without external +dependencies. -Offline mode also disables telemetry, similar to a `DO_NOT_TRACK` setting, ensuring no usage data -is sent from your environment. You may also specify a custom registry URL (via the `_REGISTRY_ENV_VAR` +Offline mode also disables telemetry, similar to a `DO_NOT_TRACK` setting, ensuring no usage data +is sent from your environment. You may also specify a custom registry URL (via the +`_REGISTRY_ENV_VAR`🧰 Tools
🪛 Ruff
102-102: Trailing whitespace
Remove trailing whitespace
(W291)
103-103: Line too long (104 > 100)
(E501)
105-105: Trailing whitespace
Remove trailing whitespace
(W291)
106-106: Line too long (103 > 100)
(E501)
106-106: Trailing whitespace
Remove trailing whitespace
(W291)
99-111: Consider enhancing the docstring with environment variable details?The docstring is comprehensive, but what do you think about explicitly mentioning the environment variable name? Something like:
"This mode can be enabled by setting the
AIRBYTE_OFFLINE_MODEenvironment variable to 'true'."This would make it even more user-friendly, wdyt? 🤔
🧰 Tools
🪛 Ruff
102-102: Trailing whitespace
Remove trailing whitespace
(W291)
103-103: Line too long (104 > 100)
(E501)
105-105: Trailing whitespace
Remove trailing whitespace
(W291)
106-106: Line too long (103 > 100)
(E501)
106-106: Trailing whitespace
Remove trailing whitespace
(W291)
airbyte/_executors/util.py (2)
8-8: Remove unused importHey! I noticed we have an unused
osimport. Would you mind if we remove it to keep the imports clean? 🧹-import osAlso applies to: 20-20
🧰 Tools
🪛 Ruff
8-8:
osimported but unusedRemove unused import:
os(F401)
165-179: Enhance error handling robustnessThe error handling looks great! 🎯 A few suggestions to make it even better:
- Would you consider preserving the original exception using
from? This helps with debugging by maintaining the error chain:- raise exc.AirbyteConnectorRegistryError( + raise exc.AirbyteConnectorRegistryError( message="Failed to connect to the connector registry.", context={"connector_name": name}, guidance=( "\nThere was a problem connecting to the Airbyte connector registry. " "Please check your internet connection and try again." "\nTo operate offline, set the `AIRBYTE_OFFLINE_MODE` environment variable to " "`1`. This will prevent errors related to registry connectivity and disable telemetry. " "\nIf you have a custom registry, set `_REGISTRY_ENV_VAR` environment variable to " "the URL of your custom registry." ), - ) + ) from exc
- The guidance message lines are a bit long. What do you think about breaking them at natural points? wdyt?
guidance=( - "\nThere was a problem connecting to the Airbyte connector registry. " - "Please check your internet connection and try again." - "\nTo operate offline, set the `AIRBYTE_OFFLINE_MODE` environment variable to " - "`1`. This will prevent errors related to registry connectivity and disable telemetry. " - "\nIf you have a custom registry, set `_REGISTRY_ENV_VAR` environment variable to " - "the URL of your custom registry." + "\nThere was a problem connecting to the Airbyte connector registry. " + "Please check your internet connection and try again.\n" + "\nTo operate offline, set the `AIRBYTE_OFFLINE_MODE` environment " + "variable to `1`. This will prevent errors related to registry " + "connectivity and disable telemetry.\n" + "\nIf you have a custom registry, set `_REGISTRY_ENV_VAR` environment " + "variable to the URL of your custom registry." ),🧰 Tools
🪛 Ruff
168-179: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
175-175: Line too long (108 > 100)
(E501)
176-176: Line too long (103 > 100)
(E501)
airbyte/_util/telemetry.py (2)
93-93: LGTM! Consider adding a debug message for offline mode?The offline mode check is implemented correctly. Since we already have debug messages for other scenarios (like the issues list at line 144), what do you think about adding a debug message when offline mode is detected? This could help with troubleshooting, wdyt? 🤔
if os.environ.get(DO_NOT_TRACK) or AIRBYTE_OFFLINE_MODE: + if DEBUG: + print("Telemetry disabled: Offline mode or DO_NOT_TRACK is set") # User has opted out of tracking. return False
Line range hint
1-300: Architecture suggestion: Consider consolidating the offline checks.Since both
_setup_analyticsandsend_telemetryperform the same check, what do you think about consolidating this logic into a single helper function? This could make future modifications easier and reduce the chance of inconsistencies, wdyt? 🤔Something like this perhaps?
def _is_telemetry_disabled() -> bool: """Check if telemetry is disabled via environment variables or offline mode.""" return bool(os.environ.get(DO_NOT_TRACK) or AIRBYTE_OFFLINE_MODE)Then you could use it like:
def _setup_analytics() -> str | bool: if _is_telemetry_disabled(): return False # ... rest of the function def send_telemetry(...) -> None: if _is_telemetry_disabled(): return # ... rest of the function
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (3)
- airbyte/_executors/util.py (3 hunks)
- airbyte/_util/telemetry.py (3 hunks)
- airbyte/constants.py (1 hunks)
🧰 Additional context used
🪛 Ruff
airbyte/_executors/util.py
8-8:
osimported but unusedRemove unused import:
os(F401)
168-179: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
175-175: Line too long (108 > 100)
(E501)
176-176: Line too long (103 > 100)
(E501)
airbyte/constants.py
102-102: Trailing whitespace
Remove trailing whitespace
(W291)
103-103: Line too long (104 > 100)
(E501)
105-105: Trailing whitespace
Remove trailing whitespace
(W291)
106-106: Line too long (103 > 100)
(E501)
106-106: Trailing whitespace
Remove trailing whitespace
(W291)
🔇 Additional comments (4)
airbyte/constants.py (1)
93-98: LGTM! Clean implementation of the offline mode constant.The implementation follows the established patterns in the codebase and uses the existing
_str_to_boolhelper. The default value of "false" makes sense as a safe default.airbyte/_executors/util.py (1)
165-179: Implementation aligns well with PR objectives! 🎯The error handling implementation effectively addresses the needs of air-gapped environments by:
- Gracefully handling connection failures
- Providing clear guidance on enabling offline mode
- Maintaining backward compatibility
Great job on implementing this feature! 🚀
🧰 Tools
🪛 Ruff
168-179: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
175-175: Line too long (108 > 100)
(E501)
176-176: Line too long (103 > 100)
(E501)
airbyte/_util/telemetry.py (2)
53-53: LGTM! Clean import of the new constant.The import follows the existing style and is placed in the appropriate section.
211-211: LGTM! Consistent implementation.The offline mode check is applied consistently with the same pattern used in
_setup_analytics. Nice work on maintaining consistency! 👍
005d837 to
d230135
Compare
There was a problem hiding this comment.
Niyas Hameed (@niyasrad) - This looks really good! Thanks for creating this. I wanted to also suggest adding upstream handling to skip attempting to load the registry if we're in this mode. GitHub will only let me suggest changes on modified lines so I created a commit with my suggestions here:
Wdyt?
Aaron ("AJ") Steers (@aaronsteers) That's a wonderful suggestion, totally missed this! Merged and made some new changes to ensure that the |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
- airbyte/sources/registry.py (3 hunks)
🔇 Additional comments (3)
airbyte/sources/registry.py (3)
19-20: LGTM! Clean import additions.
The new imports are well-organized and follow the existing import style.
268-272: Nice warning implementation!
The warning message is clear and the comment explains why an empty registry isn't fatal. Good call on using with_stack=False here! 👍
238-245: Small typo in comment & verify empty cache handling.
There's a small typo in the comment: "dissalowed" should be "disallowed". Otherwise, the logic looks good!
Quick question: should we add a debug log when returning an empty cache in these cases? It might help with troubleshooting, wdyt? 🤔
Let's verify how empty cache is handled throughout the codebase:
✅ Verification successful
Let me try another approach to find cache usage patterns:
And let's also check where this function is used:
Fix typo in comment and add debug logging for empty cache cases
The empty cache handling looks good! I can see that empty caches are properly handled in the code:
- Line 292 checks for empty cache and raises a clear error
- The function is used consistently throughout the codebase for connector listing and metadata access
- Test fixtures properly handle cache resets
Two suggestions to improve the code:
- Fix the typo: "dissalowed" → "disallowed"
- Maybe add debug logging when returning empty cache? Something like:
if registry_url.upper() in {"0", "F", "FALSE"}:
logging.debug("Registry disabled by user configuration")
return {}
if registry_url.startswith("http") and AIRBYTE_OFFLINE_MODE:
logging.debug("Registry access blocked due to offline mode")
return {}What do you think about adding these debug logs? They could help with troubleshooting without being too noisy! 🤔
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Search for direct cache access or empty cache handling
rg -A 3 "_get_registry_cache|cache\s*=\s*{}" --type py
Length of output: 206
Script:
#!/bin/bash
# Search for direct cache access with fixed regex
rg -A 3 "_get_registry_cache" --type py
# Search for empty cache handling with fixed regex
rg -A 3 'cache\s*=\s*\{\}' --type py
Length of output: 2947
Script:
#!/bin/bash
# Search for registry cache related code
rg -A 5 "registry.*cache" --type py
Length of output: 4587
Script:
#!/bin/bash
# Find usages of _get_registry_cache
ast-grep --pattern '_get_registry_cache'
Length of output: 940
2abd55d to
f68991b
Compare
f68991b to
9cac8e8
Compare
|
CodeRabbit (@coderabbitai) review |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (5)
airbyte/constants.py (2)
99-111: The docstring is comprehensive, but what do you think about adding usage examples? 🤔
The documentation clearly explains the purpose and implications of offline mode. Would you consider adding a quick example showing how to enable it? Something like:
# Enable offline mode via environment variable
export AIRBYTE_OFFLINE_MODE=true
# Or programmatically in Python
import os
os.environ["AIRBYTE_OFFLINE_MODE"] = "true"This could help users quickly understand how to use this feature in their air-gapped environments. wdyt?
105-107: Consider clarifying the registry URL environment variable name.
The docstring mentions "_REGISTRY_ENV_VAR" but doesn't specify the actual environment variable name. Should we make this more explicit for users?
airbyte/sources/registry.py (3)
185-187: Consider adding a docstring for better documentation.
The function looks great! Would you consider adding a docstring to explain its purpose and parameters? Something like this perhaps? 🤔
def _is_registry_disabled(url: str) -> bool:
+ """Check if the registry is disabled based on URL value or offline mode.
+
+ Args:
+ url: The registry URL to check
+
+ Returns:
+ bool: True if registry is disabled, False otherwise
+ """
return url.upper() in {"0", "F", "FALSE"} or AIRBYTE_OFFLINE_MODEWdyt? 🎯
269-273: Consider enhancing the warning message for clarity.
The warning message is good, but what do you think about making it more specific when in offline mode? Maybe something like:
- message=f"Connector registry is empty: {registry_url}",
+ message=f"Connector registry is empty: {registry_url}" +
+ (" (AIRBYTE_OFFLINE_MODE is enabled)" if AIRBYTE_OFFLINE_MODE else ""),This would help users understand why they're seeing an empty registry. Wdyt? 🤔
280-289: Consider updating the docstring to reflect offline mode behavior.
The function signature change looks great! Would you consider updating the docstring to document the None return case? Something like:
def get_connector_metadata(name: str) -> None | ConnectorMetadata:
"""Check the cache for the connector.
If the cache is empty, populate by calling update_cache.
+
+ Returns:
+ None if the registry is disabled or offline mode is enabled
+ ConnectorMetadata if the connector is found in the registry
+
+ Raises:
+ PyAirbyteInternalError: If the registry could not be loaded
+ AirbyteConnectorNotRegisteredError: If the connector is not found in the registry
"""This would help users understand when to expect a None return. What do you think? 🎯
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (4)
- airbyte/_executors/util.py (3 hunks)
- airbyte/_util/telemetry.py (3 hunks)
- airbyte/constants.py (1 hunks)
- airbyte/sources/registry.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- airbyte/_executors/util.py
- airbyte/_util/telemetry.py
🔇 Additional comments (2)
airbyte/constants.py (1)
93-98: LGTM! Clean implementation of the offline mode constant.
The implementation follows the established pattern in the codebase, using the existing _str_to_bool helper and environment variable pattern. The default value of "false" is a safe choice that maintains backward compatibility.
airbyte/sources/registry.py (1)
19-20: LGTM! Clean import additions.
The new imports are well-organized and follow the project's import structure.
Aaron ("AJ") Steers (aaronsteers)
left a comment
There was a problem hiding this comment.
This looks great. Thanks, Niyas Hameed (@niyasrad), for contributing! 🚀
| guidance=( | ||
| "\nThere was a problem connecting to the Airbyte connector registry. " | ||
| "Please check your internet connection and try again.\nTo operate " | ||
| "offline, set the `AIRBYTE_OFFLINE_MODE` environment variable to `1`." | ||
| "This will prevent errors related to registry connectivity and disable " | ||
| "telemetry. \nIf you have a custom registry, set `_REGISTRY_ENV_VAR` " | ||
| "environment variable to the URL of your custom registry." | ||
| ), |
There was a problem hiding this comment.
Very nicely explained. 👍
Description
Introduced a new constant,
AIRBYTE_OFFLINE_MODE, for PyAirbyte to enable functionality in offline or air-gapped environments where external connectivity is unavailable.This issue was initially documented in Issue #428, where a user experienced difficulties using PyAirbyte due to unsuccessful requests to the Airbyte Connector Registry.
The
AIRBYTE_OFFLINE_MODEconstant gracefully handles exceptions when attempting to connect to the registry. If this mode is disabled, the error handling includes a detailed explanation and guidance for users.Additionally, when
AIRBYTE_OFFLINE_MODEis enabled, it acts likewise to theDO_NOT_TRACKenvironment variable, ensuring that no telemetry data is sent from the environment.Summary by CodeRabbit
New Features
AIRBYTE_OFFLINE_MODEto enhance offline operation, allowing users to work without internet access while managing connector metadata.Bug Fixes