Azure: Support multiple storage credential prefixes#13241
Azure: Support multiple storage credential prefixes#13241ChaladiMohanVamsi wants to merge 7 commits into
Conversation
|
Yet to implement the test cases. |
|
@nastra @amogh-jahagirdar Can you please help with the review. |
amogh-jahagirdar
left a comment
There was a problem hiding this comment.
Thank you for the change @ChaladiMohanVamsi , I'm Still going through tests a bit more but left some comments
| return client(location).getFileClient(location.path()); | ||
| Preconditions.checkState( | ||
| null != prefixedADLSClient, | ||
| "[BUG] ADLS client-builder for storage path not available: %s", |
There was a problem hiding this comment.
the "[BUG]" callout doesn't really fit in the pattern of error messages in the project, can we just say "Cannot build ADLS cllient for path: %s"
There was a problem hiding this comment.
btw we're using the same logging pattern (which I added) in S3FileIO and GcsFileIO
There was a problem hiding this comment.
Keeping it consistent with other FileIO implementations as of now.
| public void multipleStorageCredentialsConfigured() { | ||
| StorageCredential adlsCredential1 = | ||
| StorageCredential.create( | ||
| abfsPath("custom-container", "account1", "/table1"), |
There was a problem hiding this comment.
Can we move some of these abfsPath values into separate variables; it'll reduce duplication and make it easier to verify these tests are doing what they should be
| testImplementation libs.testcontainers | ||
| testImplementation libs.mockserver.netty | ||
| testImplementation libs.mockserver.client.java | ||
| testImplementation(libs.hadoop3.common) { |
There was a problem hiding this comment.
Why did this need to change? what are we pulling from hadoop for the new tests?
There was a problem hiding this comment.
In few test cases where we initialise ResolvingFileIO, it is failing with missing HaddopConf dependency. Hence had include this in test scope.
| Optional.ofNullable(vendedAdlsCredentialProvider) | ||
| .map(p -> new VendedAzureSasCredentialPolicy(location.host(), p)) | ||
| .ifPresent(clientBuilder::addPolicy); |
There was a problem hiding this comment.
Nit: I know this was just copied over from what already existed but imo this logic would just be a bit clearer with plain old null check and assignment:
if (credentialProvider != null) {
clientBuilder.addPolicy(new VendedAzureSasCredentialPolicy(location.host(), credentialProvider)))
}
There was a problem hiding this comment.
Updated as per the suggestion.
| assertThat(fileIO.clientForStoragePath(abfsPath("custom-container", "account1", "/table1"))) | ||
| .isNotSameAs( | ||
| fileIO.clientForStoragePath(abfsPath("custom-container", "account1", "/table2"))) | ||
| .isNotSameAs( | ||
| fileIO.clientForStoragePath(abfsPath("my-container", "account1", "/path/to/file"))); |
There was a problem hiding this comment.
I feel like this test isn't as strong as it could be; we're asserting that the clients for these paths are all different which is true but we're not asserting the actual client right?
There was a problem hiding this comment.
Unlike S3FileIO and GCSFileIO we are not caching the clients in ADLSFileIO, we are always creating a new client with reused HTTP client during any FileIO operation. I kept the behaviour same even with prefix based clients. Hence we are not validating client as they would be different for each file.
I am not completely sure of overall benefit we can get out of caching the clients as we are already reusing HTTP client. Wanted your inputs if we should go in this route of caching the clients similar to other FileIO implementations.
cc// @nastra
| assertThat( | ||
| fileIO.clientForStoragePath(abfsPath("custom-container", "account1", "/path/to/file"))) | ||
| .isNotSameAs( | ||
| fileIO.clientForStoragePath(abfsPath("my-container", "account1", "/path/to/file"))); | ||
|
|
||
| assertThat(fileIO.clientForStoragePath(abfsPath("my-container", "account1", "/path/to/file"))) | ||
| .isSameAs( | ||
| fileIO.clientForStoragePath(abfsPath("my-container2", "account2", "/path/to/file"))); | ||
|
|
There was a problem hiding this comment.
Same as what I mentioned somewhere else in this test, could we make sure some of these abfsPath(...) are moved to separate values and can we make sure these tests are strong as possible beyond just comparing if it's the same as another path.
| clientByPrefix.values().forEach(PrefixedADLSClient::close); | ||
| this.clientByPrefix = null; | ||
| } | ||
|
|
There was a problem hiding this comment.
we should keep the newline after }
# Conflicts: # azure/src/main/java/org/apache/iceberg/azure/adlsv2/ADLSFileIO.java
| return azureProperties; | ||
| } | ||
|
|
||
| public String storagePrefix() { |
There was a problem hiding this comment.
nit: maybe we can keep this package private until there's a good reason to make it public?
There was a problem hiding this comment.
Agree, made it package private.
|
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions. |
Associated Issue : #13050
Changes
SupportsStorageCredentialsto pass storage credentials from LoadTableResponse to FileIOSimilar PRs for other FileIOs