Search before asking
Version
master (fe-foundation, org.apache.doris.foundation.util.PathUtils).
What's Wrong?
PathUtils.equalsIgnoreSchemeIfOneIsS3(p1, p2) compares two storage-location URIs treating the s3 scheme as interchangeable with other object-store schemes. Its two branches used inconsistent rules:
- Same scheme → p1.equalsIgnoreCase(p2): full-string, case-insensitive, trailing slash significant.
- Cross-scheme (one is s3) → compares normalize(authority)/normalize(path) with Objects.equals: case-sensitive, trailing slash stripped.
Consequences:
- The result for one URI depends on the other URI's scheme. For example, s3://bucket/path/ vs s3://bucket/path are unequal (same-scheme branch), but s3://bucket/path/ vs cos://bucket/path are equal (cross-scheme branch).
- The same-scheme branch ignores case for the whole string, so it can wrongly equate case-sensitive S3 object keys (s3://b/A == s3://b/a).
The only caller is HMSTransaction.prepareInsertExistingTable (fe-core), which uses this to decide whether a Hive commit needs a rename — so inconsistent equality can lead to an incorrect rename decision.
What You Expected?
A single, consistent rule regardless of whether the two schemes match: compare authority + path (scheme ignored when equal or when one side is s3), with trailing slashes insignificant and the comparison case-sensitive (object-storage keys are case-sensitive).
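The expected rule can be sketched roughly as follows. This is only an illustration of the behaviour described above, not the code in the linked PR: the class name LocationEqualitySketch and the helpers sameLocation, schemesCompatible and stripTrailingSlashes are hypothetical, and parsing with java.net.URI is an assumption (URI.create throws IllegalArgumentException on strings it cannot parse).

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical sketch of the expected rule; not the actual PathUtils implementation.
final class LocationEqualitySketch {

    // One rule for every pair: schemes must be compatible, then authority and
    // path are compared case-sensitively with trailing slashes ignored.
    static boolean sameLocation(String p1, String p2) {
        URI u1 = URI.create(p1);
        URI u2 = URI.create(p2);
        if (!schemesCompatible(u1.getScheme(), u2.getScheme())) {
            return false;
        }
        // Raw (undecoded) components are used so that an encoded "%2F" is not
        // silently turned into a path separator.
        return Objects.equals(u1.getRawAuthority(), u2.getRawAuthority())
                && stripTrailingSlashes(u1.getRawPath())
                        .equals(stripTrailingSlashes(u2.getRawPath()));
    }

    // Schemes are compatible when they are equal or when either side is s3.
    static boolean schemesCompatible(String a, String b) {
        if (a == null || b == null) {
            return a == null && b == null;
        }
        return a.equalsIgnoreCase(b)
                || "s3".equalsIgnoreCase(a)
                || "s3".equalsIgnoreCase(b);
    }

    // Removes every trailing '/' so "/path/" and "/path" compare equal.
    static String stripTrailingSlashes(String path) {
        if (path == null) {
            return "";
        }
        int end = path.length();
        while (end > 0 && path.charAt(end - 1) == '/') {
            end--;
        }
        return path.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(sameLocation("s3://bucket/path/", "s3://bucket/path"));
        System.out.println(sameLocation("s3://bucket/path/", "cos://bucket/path"));
        System.out.println(sameLocation("s3://bucket/A", "s3://bucket/a"));
    }
}
```

Under this sketch the answer for s3://bucket/path/ no longer depends on the scheme of the other operand, and s3://bucket/A is never equated with s3://bucket/a.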
How to Reproduce?
// same pair, different "other" scheme -> different answer (inconsistent):
PathUtils.equalsIgnoreSchemeIfOneIsS3("s3://bucket/path/", "s3://bucket/path"); // false
PathUtils.equalsIgnoreSchemeIfOneIsS3("s3://bucket/path/", "cos://bucket/path"); // true
// same-scheme comparison wrongly ignores case:
PathUtils.equalsIgnoreSchemeIfOneIsS3("s3://bucket/A", "s3://bucket/a"); // true (should be false)
Anything Else?
Fix proposed in the linked PR. It also hardens several edge cases surfaced during review (opaque URIs, percent-encoded slashes, triple-slash / network-path forms) by falling back to exact string comparison for inputs that are malformed for object storage.
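To make the fallback idea concrete, here is a minimal sketch of how such a guard could look. It is an assumption about the shape of the check, not the PR's code: MalformedLocationGuardSketch, isComparableLocation and equalsOrFallback are hypothetical names, and the exact set of rejected forms in the PR may differ.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Objects;
import java.util.function.BiPredicate;

// Hypothetical guard; the actual checks in the PR may be different.
final class MalformedLocationGuardSketch {

    // Returns true only for inputs shaped like scheme://authority/path
    // with no percent-encoded slash in the path.
    static boolean isComparableLocation(String location) {
        if (location == null) {
            return false;
        }
        URI uri;
        try {
            uri = new URI(location);
        } catch (URISyntaxException e) {
            return false; // unparsable input
        }
        if (uri.isOpaque()) {
            return false; // opaque form, e.g. "s3:bucket/path"
        }
        if (uri.getScheme() == null) {
            return false; // network-path form, e.g. "//bucket/path"
        }
        if (uri.getRawAuthority() == null) {
            return false; // triple-slash form, e.g. "s3:///bucket/path"
        }
        String rawPath = uri.getRawPath();
        return rawPath == null
                || !rawPath.toLowerCase(Locale.ROOT).contains("%2f");
    }

    // Uses the normalized comparison only when both inputs are well formed;
    // otherwise falls back to exact string equality.
    static boolean equalsOrFallback(String p1, String p2,
                                    BiPredicate<String, String> normalizedEquals) {
        if (!isComparableLocation(p1) || !isComparableLocation(p2)) {
            return Objects.equals(p1, p2);
        }
        return normalizedEquals.test(p1, p2);
    }
}
```

The normalized comparison is passed in as a BiPredicate only to keep the sketch self-contained; in practice it would be whatever authority + path rule the fix settles on.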
Are you willing to submit PR?