feat(cli): use new central Dockerfile locations, add capability to locate airbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commands#530
Conversation
… airbyte-repo-root
There was a problem hiding this comment.
Pull Request Overview
This PR centralizes Dockerfile templates and enhances CLI functionality by introducing a repository-root resolution mechanism and adopting a more concise positional connector argument across commands. Key changes include:
- Removing legacy Dockerfile templates in favor of centralized definitions and downloading logic.
- Adding a new module for resolving Airbyte repository and connector paths.
- Updating CLI commands and tests to use positional "[CONNECTOR]" arguments instead of separate flags.
Reviewed Changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py | Updated import for MANIFEST_YAML to use the new connector paths module. |
| unit_tests/cli/airbyte_cdk/test_util.py | Added tests for the new resolve_airbyte_repo_root functionality. |
| airbyte_cdk/utils/docker_image_templates.py | Removed legacy Dockerfile templates to centralize definitions. |
| airbyte_cdk/utils/docker.py | Modified Docker image build functions to use updated template and repo resolution logic. |
| airbyte_cdk/utils/connector_paths.py | New module for resolving Airbyte repo roots and connector paths. |
| airbyte_cdk/test/standard_tests/util.py, declarative_sources.py, connector_base.py | Updated imports from test_resources to connector_paths. |
| airbyte_cdk/cli/airbyte_cdk/_secrets.py, _image.py, _connector.py | Updated CLI argument handling and removed legacy utility files. |
Comments suppressed due to low confidence (1)
airbyte_cdk/utils/connector_paths.py:140
- The variable 'connector' is not defined in this context; it appears that 'connector_ref' was intended. Updating the error message to reference the correct variable would resolve this issue.
raise ValueError(f"Could not infer connector_name or connector_directory from input ref: {connector}")
|
Warning Rate limit exceededAaron ("AJ") Steers (@aaronsteers) has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 21 minutes and 0 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📒 Files selected for processing (1)
📝 WalkthroughWalkthroughThis change refactors the Airbyte CDK CLI and related utilities by consolidating the logic for resolving connector names and directories into a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant CLI_Command
participant ConnectorPathsUtil
participant DockerUtil
User->>CLI_Command: Invoke CLI with [connector] argument
CLI_Command->>ConnectorPathsUtil: resolve_connector_name_and_directory(connector)
ConnectorPathsUtil-->>CLI_Command: (connector_name, connector_directory)
CLI_Command->>DockerUtil: build_connector_image(connector_directory, dockerfile_override)
DockerUtil->>ConnectorPathsUtil: resolve_airbyte_repo_root(connector_directory)
DockerUtil->>DockerUtil: get_dockerfile_templates() or download from GitHub
DockerUtil-->>CLI_Command: Build image
CLI_Command-->>User: Output result
Possibly related PRs
Suggested reviewers
Would you like to add more test cases for the new path resolution logic, or is the current coverage sufficient for now—wdyt? ✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (11)
airbyte_cdk/utils/docker.py (2)
267-309: Consider adding timeout and retry logic to GitHub requests, wdyt?The HTTP requests to GitHub don't include timeout or retry logic, which could potentially make the build process less reliable in case of network issues.
- dockerfile_resp = requests.get(dockerfile_url) - dockerfile_resp.raise_for_status() + # Add timeout and retry for better reliability + session = requests.Session() + adapter = requests.adapters.HTTPAdapter(max_retries=3) + session.mount('https://', adapter) + dockerfile_resp = session.get(dockerfile_url, timeout=10) + dockerfile_resp.raise_for_status()Also consider adding a user agent header to follow good API etiquette:
- dockerfile_resp = requests.get(dockerfile_url) + headers = {"User-Agent": "Airbyte-CDK/1.0"} + dockerfile_resp = requests.get(dockerfile_url, headers=headers)
296-296: Consider parameterizing the GitHub branch referenceThe hardcoded reference to the
masterbranch might cause issues if the repository structure changes or if specific versions need to be targeted.- base_url = f"https://github.com/airbytehq/airbyte/raw/master/docker-images/" + # Allow specifying a different branch or tag + github_branch = os.environ.get("AIRBYTE_GITHUB_BRANCH", "master") + base_url = f"https://github.com/airbytehq/airbyte/raw/{github_branch}/docker-images/"airbyte_cdk/cli/airbyte_cdk/_connector.py (1)
49-53: Good consolidation of imports.The import statements have been nicely reorganized to use the new centralized location for connector path utilities. I noticed you commented out
pytest_hooks- was that intentional? If it's no longer needed, maybe we should remove it completely? wdyt?unit_tests/cli/airbyte_cdk/test_util.py (2)
1-3: Unused import in test file.The
checkimport fromtabnannyon line 2 doesn't appear to be used anywhere in this file. Should we remove it?from pathlib import Path -from tabnanny import check import pytest
61-84: Consider expanding exception handling.The test only specifically handles
FileNotFoundError, butresolve_airbyte_repo_rootmight raise other exceptions. Would it make sense to add a more general exception handler to provide clearer test failures for unexpected errors? Maybe something like:try: repo_root = resolve_airbyte_repo_root(start_dir_rel) # existing validation code except FileNotFoundError: if expect_success: raise AssertionError(f"Airbyte repo root not found from '{start_dir_rel!s}'.") + except Exception as e: + if expect_success: + raise AssertionError(f"Unexpected error resolving repo root from '{start_dir_rel!s}': {e}") + # If we expected failure, any exception is fineWhat do you think?
airbyte_cdk/cli/airbyte_cdk/_image.py (1)
81-81: Minor redundancy in parameter passing.There's a small redundancy in the parameter passing:
dockerfile_override=dockerfile or None,Since
dockerfileis already defined asPath | None, theor Noneis unnecessary. Consider simplifying to:- dockerfile_override=dockerfile or None, + dockerfile_override=dockerfile,What do you think?
airbyte_cdk/cli/airbyte_cdk/_secrets.py (4)
76-81: Use a richer Click type for paths?The positional
connectorargument is declared astype=str, which works for both a plain connector name and a path expressed as a string.
Would it be useful to accept a real path object as well, by switching totype=click.Path(path_type=Path, exists=False)and then passing it through unchanged?
That would give users tab-completion for directories and automatically normalisePathhandling insideresolve_connector_name_and_directory, while still allowing simple names. Wdyt?
115-117: Minor security wording tweak?The warning now says the option is ignored without
CI, but the implementation below prints an explanatory message after the flag is supplied.
Perhaps “will be ignored and a notice printed” would match the runtime behaviour more precisely. Wdyt?
121-123: Surface resolution errors more prominently
resolve_connector_name_and_directory()can raiseValueError/FileNotFoundError.
Would wrapping the call in a tinytry/exceptthat converts these intoclick.ClickExceptionmake the CLI error output friendlier (no stack trace by default)? For example:- connector_name, connector_directory = resolve_connector_name_and_directory(connector) + try: + connector_name, connector_directory = resolve_connector_name_and_directory(connector) + except (ValueError, FileNotFoundError) as exc: + raise click.ClickException(str(exc)) from excThis keeps internal exceptions out of user-facing terminals. Wdyt?
287-289: Rename shadow variable to underscoreUsing
_ = connector_nameavoids the “unused” linter complaint, but it still reads as a placeholder.
Would dropping the parameter entirely (or renaming the function to_get_secrets_dir_for_connector()with noconnector_namearg) simplify the signature?airbyte_cdk/utils/connector_paths.py (1)
168-170: Consider accepting “speculative” prefixesThe name check currently hard-fails when the directory name does not start with
source-ordestination-.
Would accepting custom prefixes (e.g.,processor-, future categories) or at least providing a clearer hint (“expected 'source-' or 'destination-' prefix”) improve UX? Wdyt?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (13)
airbyte_cdk/cli/airbyte_cdk/_connector.py(4 hunks)airbyte_cdk/cli/airbyte_cdk/_image.py(3 hunks)airbyte_cdk/cli/airbyte_cdk/_secrets.py(5 hunks)airbyte_cdk/cli/airbyte_cdk/_util.py(0 hunks)airbyte_cdk/test/standard_tests/connector_base.py(1 hunks)airbyte_cdk/test/standard_tests/declarative_sources.py(1 hunks)airbyte_cdk/test/standard_tests/test_resources.py(0 hunks)airbyte_cdk/test/standard_tests/util.py(1 hunks)airbyte_cdk/utils/connector_paths.py(1 hunks)airbyte_cdk/utils/docker.py(5 hunks)airbyte_cdk/utils/docker_image_templates.py(0 hunks)unit_tests/cli/airbyte_cdk/test_util.py(1 hunks)unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py(1 hunks)
💤 Files with no reviewable changes (3)
- airbyte_cdk/cli/airbyte_cdk/_util.py
- airbyte_cdk/utils/docker_image_templates.py
- airbyte_cdk/test/standard_tests/test_resources.py
🧰 Additional context used
🧠 Learnings (1)
airbyte_cdk/test/standard_tests/declarative_sources.py (1)
Learnt from: aaronsteers
PR: airbytehq/airbyte-python-cdk#58
File: airbyte_cdk/cli/source_declarative_manifest/_run.py:62-65
Timestamp: 2024-11-15T01:04:21.272Z
Learning: The files in `airbyte_cdk/cli/source_declarative_manifest/`, including `_run.py`, are imported from another repository, and changes to these files should be minimized or avoided when possible to maintain consistency.
🧬 Code Graph Analysis (2)
airbyte_cdk/cli/airbyte_cdk/_image.py (2)
airbyte_cdk/utils/connector_paths.py (1)
resolve_connector_name_and_directory(81-145)airbyte_cdk/utils/docker.py (1)
verify_docker_installation(394-400)
airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
airbyte_cdk/utils/connector_paths.py (1)
resolve_connector_name_and_directory(81-145)
🪛 GitHub Actions: Linters
airbyte_cdk/utils/connector_paths.py
[error] 142-142: mypy error: Name "connector" is not defined [name-defined]
⏰ Context from checks skipped due to timeout of 90000ms (8)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: SDM Docker Image Build
🔇 Additional comments (21)
airbyte_cdk/utils/docker.py (5)
15-15: New dependency addedAdded
requestslibrary for downloading Dockerfile templates from GitHub. This allows fallback template fetching when local templates aren't available.
144-144: Added flexibility with dockerfile_override parameter, nice!This allows users to specify a custom Dockerfile path, which is a good addition for flexibility. This matches the PR objective of making the CLI more intuitive and versatile.
168-196: Enhanced Dockerfile resolution with multiple fallback strategiesThe implementation handles three scenarios gracefully:
- Custom Dockerfile specified via the new parameter
- Local templates from the Airbyte repository
- Downloaded templates from GitHub as a fallback
This robust approach ensures the build process works in various environments, including when the Airbyte repo isn't checked out locally.
213-213: Simplified build argsThe code now only uses the kebab-case connector name instead of both formats, simplifying the build arguments.
312-360: Good implementation of template resolutionThis function correctly uses the new
resolve_airbyte_repo_rootfunction to find Docker templates from the local repository. The error handling is thorough and the validation of file existence is comprehensive.airbyte_cdk/test/standard_tests/declarative_sources.py (1)
15-15: Updated import to use centralized connector paths moduleThe import was updated to use the new
airbyte_cdk.utils.connector_pathsmodule, which aligns with the broader refactoring in this PR. This is consistent with the objective of centralizing repository path resolution.airbyte_cdk/test/standard_tests/connector_base.py (1)
27-30: Updated imports to use centralized connector paths moduleThis change follows the same pattern of centralizing connector path and repository resolution logic into the new
airbyte_cdk.utils.connector_pathsmodule. The consistent import changes across multiple files show good attention to detail.airbyte_cdk/test/standard_tests/util.py (1)
15-18: Updated imports to use centralized connector paths moduleThe change maintains consistency with other files by importing from the new centralized module. This change contributes to the PR objective of making repository root finding more robust.
unit_tests/source_declarative_manifest/test_source_declarative_w_custom_components.py (1)
35-35: Import path change looks good.The import of
MANIFEST_YAMLhas been updated to use the new centralized moduleairbyte_cdk.utils.connector_pathsinstead of the previous location. This aligns well with the PR objective of consolidating connector path resolution logic.airbyte_cdk/cli/airbyte_cdk/_connector.py (4)
104-109: CLI interface simplification looks great!Changing from two separate options to a single positional argument makes the CLI more intuitive and aligns perfectly with the PR objectives.
117-117: Function signature updated correctly.The parameter now accepts either a string or Path object, matching the CLI interface changes.
124-127: Clear and helpful documentation.The updated docstring clearly explains the new behavior, making it easy for users to understand how the command works.
137-137: Implementation update matches new approach.The call to
resolve_connector_name_and_directoryhas been correctly updated to use the simplified parameter.unit_tests/cli/airbyte_cdk/test_util.py (2)
11-25: Helper function looks good.The
check_local_repo_structurehelper function effectively validates the existence of the expected repository layout. This will help ensure the tests only run in the appropriate environment.
27-47: Well-structured test parametrization.The test cases cover a good variety of scenarios for repository root resolution, including both expected successes and failures. The skip condition is also well implemented to avoid running tests in environments without the required directory structure.
airbyte_cdk/cli/airbyte_cdk/_image.py (5)
14-14: Import path update looks good.The import has been correctly updated to use the new centralized location for connector path utilities.
31-36: CLI interface simplification is consistent.Using a single positional argument instead of two separate options is consistent with the changes in
_connector.pyand makes the CLI more intuitive.
39-43: New dockerfile override option is useful.The addition of the
--dockerfileoption provides flexibility for users who need to customize the build process, which is a nice enhancement.
53-54: Clear documentation of the new parameter behavior.The docstring clearly explains how the
CONNECTORargument works, making it easy for users to understand.
62-62: Implementation updated correctly.The call to
resolve_connector_name_and_directoryhas been correctly updated to use the simplified parameter.airbyte_cdk/cli/airbyte_cdk/_secrets.py (1)
105-107: Docstring update keeps CLI behaviour crystal-clear👍 Thanks for expanding the docstring to reflect the new positional argument – this will save users a lot of confusion.
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
airbyte_cdk/utils/connector_paths.py (2)
44-45: Should we use_is_airbyte_repo_rootfor parent dir validation as well?Unlike the sibling and child directory checks that use
_is_airbyte_repo_root, the parent directory check only verifies the directory name. This could lead to false positives if a directory happens to be named "airbyte" but doesn't have the expected structure.Wdyt about using the same validation here?
if parent_dir.name == "airbyte" or parent_dir.name == "airbyte-enterprise": return parent_dirif parent_dir.name == "airbyte" or parent_dir.name == "airbyte-enterprise": + if _is_airbyte_repo_root(parent_dir): return parent_dir
142-142: Undefined variable in error message - should beconnector_refThe variable
connectoris never defined in this scope. I believe you meant to useconnector_refhere.- f"Could not infer connector_name or connector_directory from input ref: {connector}", + f"Could not infer connector_name or connector_directory from input ref: {connector_ref}",
🧹 Nitpick comments (2)
airbyte_cdk/utils/connector_paths.py (2)
136-140: Potential redundancy in connector directory resolution logicThe code flow has already established
connector_directoryearlier, but then conditionally sets it again here. We might be able to simplify this logic.Wdyt about this approach?
- if connector_directory: - connector_directory = connector_directory.resolve().absolute() - elif connector_name: - connector_directory = find_connector_root_from_name(connector_name) - else: + if not connector_directory and connector_name: + connector_directory = find_connector_root_from_name(connector_name) + elif not connector_directory: raise ValueError( f"Could not infer connector_name or connector_directory from input ref: {connector_ref}", ) + connector_directory = connector_directory.resolve().absolute()
105-106: Path detection could be more robust for edge casesDetecting paths by the presence of "/" or "\" works for most cases, but might not handle all valid paths correctly. For example, network paths or URIs.
Wdyt about using
Path.is_absolute()or checking if the string can be converted to a valid path that exists?- if "/" in connector_ref or "\\" in connector_ref: + # Try to convert to Path and check if it exists + potential_path = Path(connector_ref) + if potential_path.exists() or "/" in connector_ref or "\\" in connector_ref: # If the connector name is a path, treat it as a directory connector_directory = Path(connector_ref)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
airbyte_cdk/utils/connector_paths.py(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (9)
- GitHub Check: Check: 'source-pokeapi' (skip=false)
- GitHub Check: Check: 'source-amplitude' (skip=false)
- GitHub Check: Check: 'source-shopify' (skip=false)
- GitHub Check: Check: 'source-hardcoded-records' (skip=false)
- GitHub Check: Pytest (All, Python 3.11, Ubuntu)
- GitHub Check: SDM Docker Image Build
- GitHub Check: Pytest (Fast)
- GitHub Check: Pytest (All, Python 3.10, Ubuntu)
- GitHub Check: Analyze (python)
🔇 Additional comments (4)
airbyte_cdk/utils/connector_paths.py (4)
1-10: Good job on setting up the module structure and constants!The constants defined for marker files provide a clear reference point for identifying connector directories. The imports are minimal and appropriate for the functionality.
80-146: Well-structured resolution logic for connector referencesThe function provides a flexible interface for resolving connector names and directories from various input types. The design allows users to specify either a name or path, with intelligent fallbacks.
174-193: Well-implemented connector root finderThe function efficiently searches for marker files across multiple paths and their parent directories, with a clean stopping condition when reaching "airbyte_cdk". This will make connector discovery reliable.
195-221: Good fallback logic for finding connectorsI like how this function first tries the direct approach, then falls back to locating via the Airbyte repo root with clear error messages at each step.
CONNECTOR positional arg in CLI commands
CONNECTOR positional arg in CLI commandsairbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commands
airbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commandsairbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commands
airbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commandsDockerfile locations, add capability to locate airbyte_repo_root, use multi-purpose CONNECTOR positional arg in CLI commands
David Gold (dbgold17)
left a comment
There was a problem hiding this comment.
Approving, with a discussion question that we can always figure out later.
Did you consider having the Dockerfiles live within the CDK and imported into the airbyte repo as needed instead of now having all of this logic to find the airbyte repo locally? The file path traversing and directory structure logic feels a bit magic / fragile, but is is also nice if it just works. Another alternative that I don't love is that we could have the cdk cli by default pull from Github unless the user specifies a path to the airbyte repo locally.
Aaron ("AJ") Steers (aaronsteers)
left a comment
There was a problem hiding this comment.
David Gold (@dbgold17) - Thanks again for your review here. Responded inline to your questions and we can keep these discussions going after the merge.
Regarding your question about where to house the Dockerfiles, I feel more strongly that putting them in the main repo is at least a good path, and as discussed the past couple days, it allows all teams to have a sense of ownership in those build processes.
I think it's okay for the CDK CLI to get files from that repo - and I don't mind much if we get from local refs vs defaulting to downloading from github. The only reason I didn't default to downloading from github in this iteration is that I wanted our default mode to not need network access. So, preferring local files, then falling back to a remote download seemed like the best first implementation for now, at least.
Thanks again for all your feedback and the very thoughtful comments!
d09c2ec
into
main
A follow-up to:
docker-imagesdirectory airbyte#59193The above PR migrated the Dockerfile and Dockerignore definitions into the
docker-imagesdirectory in theairbytehq/airbyterepo. This PR updates CDK logic to use those newly migrated files during docker image builds. The Dockerfile defs were originally hosted here in the CDK but now that they are migrated, this PR also removes them from the CDK repo. If we cannot locate them in the local airbyte repo checked out, then we download the text file contents frommasterbranch of the repo on GitHub.In addition, this PR does a few other things:
airbyte-enterpriserepo.airbyte-python-cdkandairbyteare sibling directories.--connector-nameand--connector-directorycan be ommitted from the first CLI arg.poetry run airbyte-cdk image build source-s3. The CDK CLI will find the airbyte repo and find the connector directory automatically.--connector-name/--connector-directory).image build,secrets fetch,secrets list, andconnector testSummary by CodeRabbit
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Tests
Chores