Feat: Add support for Snowflake key-pair authentication#681
Feat: Add support for Snowflake key-pair authentication#681nakamichi (nakamichiworks) wants to merge 4 commits into
Conversation
📝 Walkthrough""" WalkthroughSupport for key-pair authentication was added to the Snowflake connector by extending the configuration to accept private key files and related parameters. The connection logic now dynamically handles both password and key-pair authentication. Additionally, a new method for supplying SQLAlchemy connection arguments was introduced and integrated into the shared SQL processor base class. Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant SnowflakeConfig
participant SQLAlchemy
participant SnowflakeClient
User->>SnowflakeConfig: Initialize with config (password or private key)
User->>SnowflakeConfig: get_vendor_client()
alt Using password
SnowflakeConfig->>SnowflakeClient: Connect with password
else Using private key file
SnowflakeConfig->>SnowflakeConfig: Load and serialize private key
SnowflakeConfig->>SnowflakeClient: Connect with private key (JWT)
end
SnowflakeClient-->>User: Connection established
Assessment against linked issues
Assessment against linked issues: Out-of-scope changesNo out-of-scope changes were found. Would you like me to help with a review checklist or some testing suggestions to ensure the new key-pair authentication works smoothly? Wdyt? 📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
⏰ Context from checks skipped due to timeout of 90000ms (2)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
airbyte/_processors/sql/snowflake.py (3)
1-229: Code formatting needs attentionThe pipeline is failing because the file needs to be formatted. Would you mind running
ruff formaton this file to fix the formatting issues? This will ensure consistent code style across the project.🧰 Tools
🪛 Pylint (3.3.7)
[error] 11-11: Unable to import 'sqlalchemy'
(E0401)
[error] 12-12: Unable to import 'cryptography.hazmat.backends'
(E0401)
[error] 13-13: Unable to import 'cryptography.hazmat.primitives'
(E0401)
[error] 14-14: Unable to import 'overrides'
(E0401)
[error] 15-15: Unable to import 'pydantic'
(E0401)
[error] 16-16: Unable to import 'snowflake'
(E0401)
[error] 17-17: Unable to import 'snowflake.sqlalchemy'
(E0401)
[error] 18-18: Unable to import 'sqlalchemy'
(E0401)
🪛 GitHub Actions: Run Linters
[error] 1-1: Ruff formatting check failed. File would be reformatted. Run 'ruff format' to fix code style issues.
66-84: Consider adding error handling for private key operationsThe private key loading implementation looks solid and follows Snowflake's requirements. However, what happens if the private key file doesn't exist, is corrupted, or has incorrect permissions? Would it be helpful to wrap the file operations and cryptographic operations in a try-except block to provide more user-friendly error messages, wdyt?
Here's a suggestion for more robust error handling:
def get_sql_alchemy_connect_args(self) -> dict[str, Any]: """Return the connect_args for key pair authentication.""" if self.private_key_file is None: return {} - with Path(self.private_key_file).open("rb") as f: - private_key = serialization.load_pem_private_key( - f.read(), - password=self.private_key_file_pwd.encode("utf-8") - if self.private_key_file_pwd is not None - else None, - backend=default_backend(), - ) + try: + with Path(self.private_key_file).open("rb") as f: + private_key = serialization.load_pem_private_key( + f.read(), + password=self.private_key_file_pwd.encode("utf-8") + if self.private_key_file_pwd is not None + else None, + backend=default_backend(), + ) + except FileNotFoundError: + raise exc.PyAirbyteInputError( + message=f"Private key file not found: {self.private_key_file}" + ) + except Exception as e: + raise exc.PyAirbyteInputError( + message=f"Failed to load private key: {str(e)}" + ) private_key_bytes = private_key.private_bytes( encoding=serialization.Encoding.DER, format=serialization.PrivateFormat.PKCS8,
103-120: Dynamic configuration building looks good!The refactored
get_vendor_clientmethod cleanly handles both authentication methods. The use of dictionary updates for conditional configuration is elegant. One small observation - should we validate that eitherpasswordorprivate_key_fileis provided (but not neither)? This could help catch configuration errors early, wdyt?Consider adding validation:
def get_vendor_client(self) -> object: """Return the Snowflake connection object.""" + if not self.password and not self.private_key_file: + raise exc.PyAirbyteInputError( + message="Either password or private_key_file must be provided for authentication" + ) connection_config = { "user": self.username,
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
airbyte/_processors/sql/snowflake.py(3 hunks)airbyte/shared/sql_processor.py(3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte/shared/sql_processor.py (1)
airbyte/_processors/sql/snowflake.py (1)
get_sql_alchemy_connect_args(66-83)
🪛 Pylint (3.3.7)
airbyte/_processors/sql/snowflake.py
[error] 11-11: Unable to import 'sqlalchemy'
(E0401)
[error] 12-12: Unable to import 'cryptography.hazmat.backends'
(E0401)
[error] 13-13: Unable to import 'cryptography.hazmat.primitives'
(E0401)
🪛 GitHub Actions: Run Linters
airbyte/_processors/sql/snowflake.py
[error] 1-1: Ruff formatting check failed. File would be reformatted. Run 'ruff format' to fix code style issues.
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (3)
airbyte/shared/sql_processor.py (1)
12-12: Clean extensibility pattern for connection arguments!The addition of
get_sql_alchemy_connect_argsas a base method that returns an empty dict is a solid design choice. It provides a clear extension point for subclasses while maintaining backward compatibility. The integration intoget_sql_enginelooks seamless.Also applies to: 131-134, 144-144
airbyte/_processors/sql/snowflake.py (2)
41-43: Well-structured authentication options!Making
passwordoptional and adding the private key fields creates a clean API for supporting both authentication methods. The use ofSecretStringfor sensitive fields is a nice security touch.
96-98: Helpful inline documentation!The comment explaining when password is absent adds clarity to the code. Nice touch!
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
airbyte/_processors/sql/snowflake.py (1)
12-13: The cryptography dependency issue was already identified.This was flagged in previous reviews - the project needs
cryptographyadded to its dependencies.🧰 Tools
🪛 Pylint (3.3.7)
[error] 12-12: Unable to import 'cryptography.hazmat.backends'
(E0401)
[error] 13-13: Unable to import 'cryptography.hazmat.primitives'
(E0401)
🧹 Nitpick comments (1)
airbyte/_processors/sql/snowflake.py (1)
101-122: Solid implementation of dual authentication support!The connection configuration logic correctly handles both authentication methods. One small suggestion - should we add a comment explaining the
SNOWFLAKE_JWTauthenticator for future maintainers, wdyt?if self.private_key_file: connection_config.update( { + # Use JWT authenticator for key-pair authentication "authenticator": "SNOWFLAKE_JWT", "private_key_file": str(self.private_key_file), } )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
poetry.lockis excluded by!**/*.lock
📒 Files selected for processing (2)
airbyte/_processors/sql/snowflake.py(3 hunks)pyproject.toml(1 hunks)
✅ Files skipped from review due to trivial changes (1)
- pyproject.toml
🧰 Additional context used
🪛 Pylint (3.3.7)
airbyte/_processors/sql/snowflake.py
[error] 11-11: Unable to import 'sqlalchemy'
(E0401)
[error] 12-12: Unable to import 'cryptography.hazmat.backends'
(E0401)
[error] 13-13: Unable to import 'cryptography.hazmat.primitives'
(E0401)
⏰ Context from checks skipped due to timeout of 90000ms (1)
- GitHub Check: Pytest (No Creds)
🔇 Additional comments (1)
airbyte/_processors/sql/snowflake.py (1)
85-99: Clean implementation of conditional password handling!The logic for conditionally including the password in the URL is exactly right for supporting both authentication methods. Nice use of SecretString for security too!
| def get_sql_alchemy_connect_args(self) -> dict[str, Any]: | ||
| """Return the connect_args for key pair authentication.""" | ||
| if self.private_key_file is None: | ||
| return {} | ||
| with Path(self.private_key_file).open("rb") as f: | ||
| private_key = serialization.load_pem_private_key( | ||
| f.read(), | ||
| password=self.private_key_file_pwd.encode("utf-8") | ||
| if self.private_key_file_pwd is not None | ||
| else None, | ||
| backend=default_backend(), | ||
| ) | ||
| private_key_bytes = private_key.private_bytes( | ||
| encoding=serialization.Encoding.DER, | ||
| format=serialization.PrivateFormat.PKCS8, | ||
| encryption_algorithm=serialization.NoEncryption(), | ||
| ) | ||
| return {"private_key": private_key_bytes} | ||
|
|
There was a problem hiding this comment.
🛠️ Refactor suggestion
Improve error handling and robustness for private key processing.
The private key loading logic looks solid! A few suggestions for making it more robust:
- Add error handling - File operations and key parsing could fail, wdyt about wrapping them in try/catch?
- Validate password encoding - The UTF-8 encoding assumption might not always hold
- Consider key format validation - Should we validate the key format before processing?
+ @overrides
+ def get_sql_alchemy_connect_args(self) -> dict[str, Any]:
+ """Return the connect_args for key pair authentication."""
+ if self.private_key_file is None:
+ return {}
+
+ try:
+ with Path(self.private_key_file).open("rb") as f:
+ key_data = f.read()
+
+ # Validate password encoding if provided
+ password_bytes = None
+ if self.private_key_file_pwd is not None:
+ try:
+ password_bytes = self.private_key_file_pwd.encode("utf-8")
+ except UnicodeEncodeError as e:
+ raise ValueError(f"Private key password contains invalid UTF-8 characters: {e}")
+
+ private_key = serialization.load_pem_private_key(
+ key_data,
+ password=password_bytes,
+ backend=default_backend(),
+ )
+
+ private_key_bytes = private_key.private_bytes(
+ encoding=serialization.Encoding.DER,
+ format=serialization.PrivateFormat.PKCS8,
+ encryption_algorithm=serialization.NoEncryption(),
+ )
+ return {"private_key": private_key_bytes}
+
+ except FileNotFoundError:
+ raise ValueError(f"Private key file not found: {self.private_key_file}")
+ except ValueError as e:
+ if "Could not deserialize" in str(e):
+ raise ValueError(f"Invalid private key format or incorrect password: {e}")
+ raise
+ except Exception as e:
+ raise ValueError(f"Failed to load private key: {e}")Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In airbyte/_processors/sql/snowflake.py around lines 66 to 84, improve
robustness by wrapping the file opening and private key loading code in a
try/except block to catch and handle potential IO or parsing errors gracefully.
Validate that the password, if provided, can be safely encoded to UTF-8 before
using it, and consider adding a check to confirm the private key is in the
expected format before proceeding with serialization. This will make the method
more fault-tolerant and provide clearer error feedback.
| password: SecretString | None = None | ||
| private_key_file: str | Path | None = None | ||
| private_key_file_pwd: SecretString | None = None |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Consider adding validation for authentication method requirements.
The optional fields look good for supporting both auth methods! Should we add a model validator to ensure that either password or private_key_file is provided, but not necessarily both? This would catch configuration errors early, wdyt?
+ @pydantic.model_validator(mode='after')
+ def validate_auth_method(self):
+ if not self.password and not self.private_key_file:
+ raise ValueError("Either password or private_key_file must be provided")
+ return self📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| password: SecretString | None = None | |
| private_key_file: str | Path | None = None | |
| private_key_file_pwd: SecretString | None = None | |
| password: SecretString | None = None | |
| private_key_file: str | Path | None = None | |
| private_key_file_pwd: SecretString | None = None | |
| @pydantic.model_validator(mode='after') | |
| def validate_auth_method(self): | |
| if not self.password and not self.private_key_file: | |
| raise ValueError("Either password or private_key_file must be provided") | |
| return self |
🤖 Prompt for AI Agents
In airbyte/_processors/sql/snowflake.py around lines 41 to 43, add a model
validator to ensure that either the password or private_key_file is provided,
but not both or neither. Implement a validation method that checks these fields
after initialization and raises a clear error if the condition is not met, to
catch configuration errors early.
There was a problem hiding this comment.
I think I'll disagree with this suggestion. Probably best to defer deep validation of auth-type inputs until the auth is invoked. So, guard statements around the engine creation/auth would be fine, but I wouldn't necessarily put them into the model validation itself.
|
/test-pr
|
|
nakamichi (@nakamichiworks) - Thank you for this contribution! I've completed my initial prelim review. Within the PR are two important aspects, which I'll review independently...
My experience in the past with SQLAlchemy is that it is very fuzzy line between what needs to be part of the URL and what can be provided as connector args. And often certain inputs can be provided either into the URL or the For the reason above, it is simpler (although not necessarily better) if we keep this change local to the Snowflake implementation, renaming the implementation to (private/internal) That said, I think a case could be made that your implementation is better - and (IIRC) implementations would then more likely implement their solutions with a combination of overriding the URL and the connector_args (overriding these methods) and (hopefully?) not also needing to override If you do choose this path, then I'd suggest to make this as ergonomic and generically applicable as possible. I would suggest expanded docstrings on these three methods; For
For
For
Above is just a first pass. Pardon any typos or mispellings. Also, if I'm misunderstanding how these would be applied, please feel free to fix/improve as needed. Docstrings create our docs, so getting these as intuitive and explanatory as possible will result in more maintainability in the future, and will help us and future contributors to know what is expected. I hope the above is helpful. Feel free to push back or make alternative proposals. Thanks again! |
|
Thanks for detailed review! |
|
nakamichi (@nakamichiworks) - Thanks very much for this contribution. I'm happy to report that this feature has just been released in Thanks for all your work on this! Your PR set the foundation for this feature to be added and we're very grateful. 🙏 |
|
Sorry for leaving this PR unfinished 🙇 |
Fixes #654.
Summary by CodeRabbit