Feat: Add support for Snowflake key-pair authentication#681

Closed
nakamichi (nakamichiworks) wants to merge 4 commits into
airbytehq:mainfrom
nakamichiworks:feature/snowflake-keypair-auth
Conversation

@nakamichiworks nakamichi (nakamichiworks) commented Jun 1, 2025

Contributor

Fixes #654.

Summary by CodeRabbit

  • New Features
    • Added support for Snowflake authentication using private key files alongside password authentication.
  • Enhancements
    • Improved Snowflake connection setup to dynamically handle both password and key pair authentication methods.
    • Enhanced SQL connection configuration to include optional connection arguments.
  • Other Changes
    • Integrated cryptography dependency to support private key processing.
    • Updated Snowflake cache configuration examples to illustrate key-pair authentication usage.

coderabbitai Bot commented Jun 1, 2025

Contributor
📝 Walkthrough

"""

Walkthrough

Support for key-pair authentication was added to the Snowflake connector by extending the configuration to accept private key files and related parameters. The connection logic now dynamically handles both password and key-pair authentication. Additionally, a new method for supplying SQLAlchemy connection arguments was introduced and integrated into the shared SQL processor base class.

Changes

| File(s) | Change Summary |
|---|---|
| airbyte/_processors/sql/snowflake.py | Added key-pair authentication fields and logic to SnowflakeConfig; updated connection methods. |
| airbyte/shared/sql_processor.py | Added get_sql_alchemy_connect_args method to SqlConfig and integrated it into engine creation. |
| pyproject.toml | Added cryptography dependency for private key handling. |
| airbyte/caches/snowflake.py | Added commented examples for key-pair authentication usage in Snowflake cache configuration. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant SnowflakeConfig
    participant SQLAlchemy
    participant SnowflakeClient

    User->>SnowflakeConfig: Initialize with config (password or private key)
    User->>SnowflakeConfig: get_vendor_client()
    alt Using password
        SnowflakeConfig->>SnowflakeClient: Connect with password
    else Using private key file
        SnowflakeConfig->>SnowflakeConfig: Load and serialize private key
        SnowflakeConfig->>SnowflakeClient: Connect with private key (JWT)
    end
    SnowflakeClient-->>User: Connection established

Assessment against linked issues

Objective Addressed Explanation
Support key-pair authentication for Snowflake cache (#654)
Maintain existing password authentication for backward compatibility (#654)
Dynamically construct connection using either password or key-pair (#654)

Assessment against linked issues: Out-of-scope changes

No out-of-scope changes were found.
"""

Would you like me to help with a review checklist or some testing suggestions to ensure the new key-pair authentication works smoothly? Wdyt?


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between df30b43 and 13a3b19.

📒 Files selected for processing (1)
  • airbyte/caches/snowflake.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • airbyte/caches/snowflake.py
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Pytest (No Creds)
  • GitHub Check: Pytest (Fast)

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
airbyte/_processors/sql/snowflake.py (3)

1-229: Code formatting needs attention

The pipeline is failing because the file needs to be formatted. Would you mind running ruff format on this file to fix the formatting issues? This will ensure consistent code style across the project.

🧰 Tools
🪛 Pylint (3.3.7)

[error] 11-11: Unable to import 'sqlalchemy'

(E0401)


[error] 12-12: Unable to import 'cryptography.hazmat.backends'

(E0401)


[error] 13-13: Unable to import 'cryptography.hazmat.primitives'

(E0401)


[error] 14-14: Unable to import 'overrides'

(E0401)


[error] 15-15: Unable to import 'pydantic'

(E0401)


[error] 16-16: Unable to import 'snowflake'

(E0401)


[error] 17-17: Unable to import 'snowflake.sqlalchemy'

(E0401)


[error] 18-18: Unable to import 'sqlalchemy'

(E0401)

🪛 GitHub Actions: Run Linters

[error] 1-1: Ruff formatting check failed. File would be reformatted. Run 'ruff format' to fix code style issues.


66-84: Consider adding error handling for private key operations

The private key loading implementation looks solid and follows Snowflake's requirements. However, what happens if the private key file doesn't exist, is corrupted, or has incorrect permissions? Would it be helpful to wrap the file operations and cryptographic operations in a try-except block to provide more user-friendly error messages, wdyt?

Here's a suggestion for more robust error handling:

 def get_sql_alchemy_connect_args(self) -> dict[str, Any]:
     """Return the connect_args for key pair authentication."""
     if self.private_key_file is None:
         return {}
-    with Path(self.private_key_file).open("rb") as f:
-        private_key = serialization.load_pem_private_key(
-            f.read(),
-            password=self.private_key_file_pwd.encode("utf-8")
-            if self.private_key_file_pwd is not None
-            else None,
-            backend=default_backend(),
-        )
+    try:
+        with Path(self.private_key_file).open("rb") as f:
+            private_key = serialization.load_pem_private_key(
+                f.read(),
+                password=self.private_key_file_pwd.encode("utf-8")
+                if self.private_key_file_pwd is not None
+                else None,
+                backend=default_backend(),
+            )
+    except FileNotFoundError:
+        raise exc.PyAirbyteInputError(
+            message=f"Private key file not found: {self.private_key_file}"
+        )
+    except Exception as e:
+        raise exc.PyAirbyteInputError(
+            message=f"Failed to load private key: {str(e)}"
+        )
     private_key_bytes = private_key.private_bytes(
         encoding=serialization.Encoding.DER,
         format=serialization.PrivateFormat.PKCS8,

103-120: Dynamic configuration building looks good!

The refactored get_vendor_client method cleanly handles both authentication methods. The use of dictionary updates for conditional configuration is elegant. One small observation - should we validate that either password or private_key_file is provided (but not neither)? This could help catch configuration errors early, wdyt?

Consider adding validation:

 def get_vendor_client(self) -> object:
     """Return the Snowflake connection object."""
+    if not self.password and not self.private_key_file:
+        raise exc.PyAirbyteInputError(
+            message="Either password or private_key_file must be provided for authentication"
+        )
     connection_config = {
         "user": self.username,
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 647fd09 and ee6dfa5.

📒 Files selected for processing (2)
  • airbyte/_processors/sql/snowflake.py (3 hunks)
  • airbyte/shared/sql_processor.py (3 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
airbyte/shared/sql_processor.py (1)
airbyte/_processors/sql/snowflake.py (1)
  • get_sql_alchemy_connect_args (66-83)
🪛 Pylint (3.3.7)
airbyte/_processors/sql/snowflake.py

[error] 11-11: Unable to import 'sqlalchemy'

(E0401)


[error] 12-12: Unable to import 'cryptography.hazmat.backends'

(E0401)


[error] 13-13: Unable to import 'cryptography.hazmat.primitives'

(E0401)

🪛 GitHub Actions: Run Linters
airbyte/_processors/sql/snowflake.py

[error] 1-1: Ruff formatting check failed. File would be reformatted. Run 'ruff format' to fix code style issues.

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (3)
airbyte/shared/sql_processor.py (1)

12-12: Clean extensibility pattern for connection arguments!

The addition of get_sql_alchemy_connect_args as a base method that returns an empty dict is a solid design choice. It provides a clear extension point for subclasses while maintaining backward compatibility. The integration into get_sql_engine looks seamless.

Also applies to: 131-134, 144-144

airbyte/_processors/sql/snowflake.py (2)

41-43: Well-structured authentication options!

Making password optional and adding the private key fields creates a clean API for supporting both authentication methods. The use of SecretString for sensitive fields is a nice security touch.


96-98: Helpful inline documentation!

The comment explaining when password is absent adds clarity to the code. Nice touch!

Comment thread airbyte/_processors/sql/snowflake.py

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 2

♻️ Duplicate comments (1)
airbyte/_processors/sql/snowflake.py (1)

12-13: The cryptography dependency issue was already identified.

This was flagged in previous reviews - the project needs cryptography added to its dependencies.

🧰 Tools
🪛 Pylint (3.3.7)

[error] 12-12: Unable to import 'cryptography.hazmat.backends'

(E0401)


[error] 13-13: Unable to import 'cryptography.hazmat.primitives'

(E0401)

🧹 Nitpick comments (1)
airbyte/_processors/sql/snowflake.py (1)

101-122: Solid implementation of dual authentication support!

The connection configuration logic correctly handles both authentication methods. One small suggestion - should we add a comment explaining the SNOWFLAKE_JWT authenticator for future maintainers, wdyt?

         if self.private_key_file:
             connection_config.update(
                 {
+                    # Use JWT authenticator for key-pair authentication
                     "authenticator": "SNOWFLAKE_JWT",
                     "private_key_file": str(self.private_key_file),
                 }
             )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ee6dfa5 and df30b43.

⛔ Files ignored due to path filters (1)
  • poetry.lock is excluded by !**/*.lock
📒 Files selected for processing (2)
  • airbyte/_processors/sql/snowflake.py (3 hunks)
  • pyproject.toml (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • pyproject.toml
🧰 Additional context used
🪛 Pylint (3.3.7)
airbyte/_processors/sql/snowflake.py

[error] 11-11: Unable to import 'sqlalchemy'

(E0401)


[error] 12-12: Unable to import 'cryptography.hazmat.backends'

(E0401)


[error] 13-13: Unable to import 'cryptography.hazmat.primitives'

(E0401)

⏰ Context from checks skipped due to timeout of 90000ms (1)
  • GitHub Check: Pytest (No Creds)
🔇 Additional comments (1)
airbyte/_processors/sql/snowflake.py (1)

85-99: Clean implementation of conditional password handling!

The logic for conditionally including the password in the URL is exactly right for supporting both authentication methods. Nice use of SecretString for security too!
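
The conditional password handling described above can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: the function name `build_snowflake_url` and its parameter list are hypothetical, and the real implementation builds the URL through the Snowflake SQLAlchemy helper rather than by string formatting.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode


def build_snowflake_url(
    account: str,
    username: str,
    database: str,
    schema: str,
    warehouse: str,
    role: str,
    password: str | None = None,
) -> str:
    """Build a Snowflake-style SQLAlchemy URL, omitting the password when absent."""
    # Percent-encode credentials so characters such as "@" or ":" stay unambiguous.
    credentials = quote(username, safe="")
    if password is not None:
        # Password authentication: embed the secret in the URL.
        credentials += ":" + quote(password, safe="")
    # With key-pair authentication the key travels in connect_args, not in the URL.
    query = urlencode({"warehouse": warehouse, "role": role})
    return f"snowflake://{credentials}@{account}/{database}/{schema}?{query}"


print(build_snowflake_url("acct", "me", "db", "sch", "wh", "r"))
```

When no password is given, the URL carries only the username, which is the case the key-pair path relies on.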

Comment on lines +66 to +84
def get_sql_alchemy_connect_args(self) -> dict[str, Any]:
    """Return the connect_args for key pair authentication."""
    if self.private_key_file is None:
        return {}
    with Path(self.private_key_file).open("rb") as f:
        private_key = serialization.load_pem_private_key(
            f.read(),
            password=self.private_key_file_pwd.encode("utf-8")
            if self.private_key_file_pwd is not None
            else None,
            backend=default_backend(),
        )
    private_key_bytes = private_key.private_bytes(
        encoding=serialization.Encoding.DER,
        format=serialization.PrivateFormat.PKCS8,
        encryption_algorithm=serialization.NoEncryption(),
    )
    return {"private_key": private_key_bytes}

🛠️ Refactor suggestion

Improve error handling and robustness for private key processing.

The private key loading logic looks solid! A few suggestions for making it more robust:

  1. Add error handling - File operations and key parsing could fail, wdyt about wrapping them in try/catch?
  2. Validate password encoding - The UTF-8 encoding assumption might not always hold
  3. Consider key format validation - Should we validate the key format before processing?
+    @overrides
+    def get_sql_alchemy_connect_args(self) -> dict[str, Any]:
+        """Return the connect_args for key pair authentication."""
+        if self.private_key_file is None:
+            return {}
+        
+        try:
+            with Path(self.private_key_file).open("rb") as f:
+                key_data = f.read()
+            
+            # Validate password encoding if provided
+            password_bytes = None
+            if self.private_key_file_pwd is not None:
+                try:
+                    password_bytes = self.private_key_file_pwd.encode("utf-8")
+                except UnicodeEncodeError as e:
+                    raise ValueError(f"Private key password contains invalid UTF-8 characters: {e}")
+            
+            private_key = serialization.load_pem_private_key(
+                key_data,
+                password=password_bytes,
+                backend=default_backend(),
+            )
+            
+            private_key_bytes = private_key.private_bytes(
+                encoding=serialization.Encoding.DER,
+                format=serialization.PrivateFormat.PKCS8,
+                encryption_algorithm=serialization.NoEncryption(),
+            )
+            return {"private_key": private_key_bytes}
+            
+        except FileNotFoundError:
+            raise ValueError(f"Private key file not found: {self.private_key_file}")
+        except ValueError as e:
+            if "Could not deserialize" in str(e):
+                raise ValueError(f"Invalid private key format or incorrect password: {e}")
+            raise
+        except Exception as e:
+            raise ValueError(f"Failed to load private key: {e}")

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In airbyte/_processors/sql/snowflake.py around lines 66 to 84, improve
robustness by wrapping the file opening and private key loading code in a
try/except block to catch and handle potential IO or parsing errors gracefully.
Validate that the password, if provided, can be safely encoded to UTF-8 before
using it, and consider adding a check to confirm the private key is in the
expected format before proceeding with serialization. This will make the method
more fault-tolerant and provide clearer error feedback.
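
The file-handling half of this suggestion can be shown in isolation, without the cryptography dependency. The sketch below is an assumption-laden illustration: `KeyFileError` and `read_private_key_file` are hypothetical names (the thread itself suggests `exc.PyAirbyteInputError`), and the PEM check is only a cheap sanity test, with real parsing still left to `load_pem_private_key`.

```python
from __future__ import annotations

from pathlib import Path


class KeyFileError(ValueError):
    """Hypothetical error type standing in for the project's input error."""


def read_private_key_file(path: str | Path) -> bytes:
    """Read raw PEM bytes, translating low-level failures into a clear message."""
    key_path = Path(path)
    try:
        data = key_path.read_bytes()
    except FileNotFoundError as ex:
        raise KeyFileError(f"Private key file not found: {key_path}") from ex
    except OSError as ex:
        # Covers permission problems, directories passed as files, and similar.
        raise KeyFileError(f"Could not read private key file {key_path}: {ex}") from ex
    if b"PRIVATE KEY-----" not in data:
        # Cheap sanity check only; real parsing is left to the cryptography library.
        raise KeyFileError(f"File does not look like a PEM private key: {key_path}")
    return data
```

Chaining with `from ex` keeps the original traceback available, which the suggested diffs above do not do.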

Comment on lines +41 to +43
password: SecretString | None = None
private_key_file: str | Path | None = None
private_key_file_pwd: SecretString | None = None

🛠️ Refactor suggestion

Consider adding validation for authentication method requirements.

The optional fields look good for supporting both auth methods! Should we add a model validator to ensure that either password or private_key_file is provided, but not necessarily both? This would catch configuration errors early, wdyt?

+    @pydantic.model_validator(mode='after')
+    def validate_auth_method(self):
+        if not self.password and not self.private_key_file:
+            raise ValueError("Either password or private_key_file must be provided")
+        return self
🤖 Prompt for AI Agents
In airbyte/_processors/sql/snowflake.py around lines 41 to 43, add a model
validator to ensure that either the password or private_key_file is provided,
but not both or neither. Implement a validation method that checks these fields
after initialization and raises a clear error if the condition is not met, to
catch configuration errors early.

@aaronsteers Aaron ("AJ") Steers (aaronsteers) Jun 1, 2025

Member

I think I'll disagree with this suggestion. Probably best to defer deep validation of auth-type inputs until the auth is invoked. So, guard statements around the engine creation/auth would be fine, but I wouldn't necessarily put them into the model validation itself.
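
A minimal sketch of the deferred-guard approach described in this reply, assuming a simplified stand-in for the config class. `SnowflakeAuthSketch` and `build_connection_config` are hypothetical names, and preferring the key file when both credentials are set is an assumption of this sketch, not something the thread settles.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass
class SnowflakeAuthSketch:
    """Hypothetical stand-in for the auth-related fields of SnowflakeConfig."""

    username: str
    account: str
    password: str | None = None
    private_key_file: str | None = None

    def build_connection_config(self) -> dict[str, Any]:
        """Build connection kwargs, validating auth inputs only at this point."""
        # Guard statement: runs when a connection is requested, not at model creation.
        if not self.password and not self.private_key_file:
            raise ValueError("Either password or private_key_file must be provided")
        config: dict[str, Any] = {"user": self.username, "account": self.account}
        if self.private_key_file:
            # Assumption: the key file wins when both credentials are present.
            config.update(
                {"authenticator": "SNOWFLAKE_JWT", "private_key_file": self.private_key_file}
            )
        else:
            config["password"] = self.password
        return config
```

Constructing the object with neither credential succeeds; the error surfaces only when the connection settings are actually requested.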

Aaron ("AJ") Steers (aaronsteers) commented Jun 1, 2025

Member

/test-pr

PR test job started... Check job output.

✅ Tests passed.

Aaron ("AJ") Steers (aaronsteers) commented Jun 1, 2025

Member

nakamichi (@nakamichiworks) - Thank you for this contribution! I've completed my preliminary review. The PR has two important aspects, which I'll review independently...

  1. The private key implementation - this looks very strong. I didn't find any issues with the implementation on first pass through the code.
  2. The refactoring to accept get_sql_alchemy_connect_args() as a public member of SQLConfig, called by the default get_sql_engine(). This part I need to pause on to understand implications and make sure it is a pattern we can extend...

My experience in the past with SQLAlchemy is that there is a very fuzzy line between what needs to be part of the URL and what can be provided as connect_args. Often a given input can be provided in either the URL or the connect_args, and sometimes both work, with undefined behavior as to which wins if both are provided.

For the reason above, it is simpler (although not necessarily better) to keep this change local to the Snowflake implementation: rename the method to the private/internal _get_sql_alchemy_connect_args(), with a "_" prefix, and leave the base class implementation unmodified. In this approach, the Snowflake implementation changes are hidden from the caller and do not require base class changes. (They could easily be extended later to be global and public.) If you want to go this simpler route, I think we can pretty quickly approve and merge your contribution without needing to land on an ideal generic implementation.
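
The simpler, Snowflake-local route might look roughly like the sketch below. All class names are hypothetical, and `fake_create_engine` is a stand-in for `sqlalchemy.create_engine` so the example runs without a database or extra dependencies.

```python
from __future__ import annotations

from typing import Any


def fake_create_engine(url: str, **kwargs: Any) -> dict[str, Any]:
    """Stand-in for sqlalchemy.create_engine so the sketch runs without a database."""
    return {"url": url, **kwargs}


class BaseConfigSketch:
    """Hypothetical base class, left unchanged in this variant."""

    def get_sql_alchemy_url(self) -> str:
        raise NotImplementedError

    def get_sql_engine(self) -> dict[str, Any]:
        return fake_create_engine(self.get_sql_alchemy_url())


class SnowflakeConfigSketch(BaseConfigSketch):
    """Keeps key-pair handling local by overriding get_sql_engine itself."""

    def __init__(self, private_key: bytes | None = None) -> None:
        self.private_key = private_key

    def get_sql_alchemy_url(self) -> str:
        return "snowflake://user@account/db/schema"

    def _get_sql_alchemy_connect_args(self) -> dict[str, Any]:
        # Private helper: not part of the base-class contract.
        if self.private_key is None:
            return {}
        return {"private_key": self.private_key}

    def get_sql_engine(self) -> dict[str, Any]:
        return fake_create_engine(
            self.get_sql_alchemy_url(),
            connect_args=self._get_sql_alchemy_connect_args(),
        )
```

The cost of this variant is that the subclass must override get_sql_engine(); the benefit is that no other config class sees a new public method.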

That said, I think a case could be made that your implementation is better: (IIRC) implementations would then more likely build their solutions by overriding the URL and connect_args methods in combination, and (hopefully?) would not also need to override get_sql_engine().

If you do choose this path, then I'd suggest making this as ergonomic and generically applicable as possible. I would suggest expanded docstrings on these three methods: get_sql_alchemy_connect_args(), get_sql_engine(), and get_sql_alchemy_url().

For get_sql_alchemy_connect_args():

Gets the connect_args for creating a SQLAlchemy engine.

By default, these arguments will be passed to get_sql_engine(),
combining with the URL path and args specified in get_sql_alchemy_url()
implementation.

For get_sql_alchemy_url():

Creates the SQLAlchemy URL for use when connecting to the database.

The SQLAlchemy URL will be sent to get_sql_engine(). These will
be combined with the inputs from get_sql_alchemy_connect_args()
implementation, if applicable.

For get_sql_engine():

Creates a SQLAlchemy engine object.

The default implementation will combine inputs from get_sql_alchemy_url()
and get_sql_alchemy_connect_args(), if implemented.

Above is just a first pass. Pardon any typos or misspellings. Also, if I'm misunderstanding how these would be applied, please feel free to fix/improve as needed. Docstrings create our docs, so getting these as intuitive and explanatory as possible will result in more maintainability in the future, and will help us and future contributors to know what is expected.

I hope the above is helpful. Feel free to push back or make alternative proposals. Thanks again!
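
For the public, generic route, the three proposed docstrings could sit on a base class along these lines. This is a sketch under stated assumptions: `SqlConfigSketch` and `KeyPairConfigSketch` are hypothetical names, and a plain dict stands in for the SQLAlchemy engine so the example has no dependencies.

```python
from __future__ import annotations

from typing import Any


class SqlConfigSketch:
    """Hypothetical base class showing the three documented extension points."""

    def get_sql_alchemy_url(self) -> str:
        """Create the SQLAlchemy URL for use when connecting to the database.

        The URL is sent to get_sql_engine(), where it is combined with the
        output of get_sql_alchemy_connect_args(), if applicable.
        """
        raise NotImplementedError

    def get_sql_alchemy_connect_args(self) -> dict[str, Any]:
        """Get the connect_args for creating a SQLAlchemy engine.

        By default these are passed to get_sql_engine() alongside the URL
        from get_sql_alchemy_url(). The base implementation returns {}.
        """
        return {}

    def get_sql_engine(self) -> dict[str, Any]:
        """Create the engine by combining the URL and the connect_args.

        A plain dict stands in for the engine so the sketch has no dependencies.
        """
        return {
            "url": self.get_sql_alchemy_url(),
            "connect_args": self.get_sql_alchemy_connect_args(),
        }


class KeyPairConfigSketch(SqlConfigSketch):
    """Subclass that overrides the URL and connect_args, but not the engine."""

    def get_sql_alchemy_url(self) -> str:
        return "snowflake://user@account/db/schema"

    def get_sql_alchemy_connect_args(self) -> dict[str, Any]:
        return {"private_key": b"<DER bytes>"}
```

Here the subclass never touches get_sql_engine(), which is the ergonomic benefit the public hook is meant to buy.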

@nakamichiworks

Contributor Author

Thanks for the detailed review!
I will check it in detail later.

Aaron ("AJ") Steers (aaronsteers) commented Jun 27, 2025

Member

nakamichi (@nakamichiworks) - Thanks very much for this contribution. I'm happy to report that this feature has just been released in v0.27.0:

Thanks for all your work on this! Your PR set the foundation for this feature to be added and we're very grateful. 🙏

@nakamichiworks

Contributor Author

Sorry for leaving this PR unfinished 🙇
Really glad someone picked it up and completed it!

Development

Successfully merging this pull request may close these issues.

Feature Request: Support key-pair authentication for Snowflake cache

2 participants