
[Bug] PathUtils.equalsIgnoreSchemeIfOneIsS3 path comparison is inconsistent between same-scheme and cross-scheme branches #64767

Description

@LuciferYang

Search before asking

  • I had searched in the issues and found no similar issues.

Version

master (fe-foundation, org.apache.doris.foundation.util.PathUtils).

What's Wrong?

PathUtils.equalsIgnoreSchemeIfOneIsS3(p1, p2) compares two storage-location URIs, treating the s3 scheme as interchangeable with other object-store schemes. Its two branches apply inconsistent rules:

  • Same scheme → p1.equalsIgnoreCase(p2): full-string, case-insensitive, trailing slash significant.
  • Cross-scheme (one is s3) → compares normalize(authority)/normalize(path) with Objects.equals: case-sensitive, trailing slash stripped.

Consequences:

  1. The result for a given URI depends on the scheme of the URI it is compared against. For example, s3://bucket/path/ vs s3://bucket/path are unequal (same-scheme branch), but s3://bucket/path/ vs cos://bucket/path are equal (cross-scheme branch).
  2. The same-scheme branch ignores case for the whole string, so it can wrongly equate case-sensitive S3 object keys (s3://b/A == s3://b/a).
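
To make the two consequences concrete, here is a minimal model of the branch rules as described above. It is a reconstruction from this description only, not the actual PathUtils source: the class name, the `normalize` helper body, and the `false` result when the schemes differ and neither is s3 are all assumptions.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical model of the behaviour described in this issue, NOT the Doris source.
class DescribedBehaviorSketch {

    // Assumed normalization: strip trailing slashes only.
    static String normalize(String s) {
        if (s == null) {
            return null;
        }
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '/') {
            end--;
        }
        return s.substring(0, end);
    }

    static boolean equalsAsDescribed(String p1, String p2) {
        URI u1 = URI.create(p1);
        URI u2 = URI.create(p2);
        String s1 = u1.getScheme();
        String s2 = u2.getScheme();
        if (s1 != null && s1.equalsIgnoreCase(s2)) {
            // Same-scheme branch: whole string, case-insensitive, trailing slash significant.
            return p1.equalsIgnoreCase(p2);
        }
        if ("s3".equalsIgnoreCase(s1) || "s3".equalsIgnoreCase(s2)) {
            // Cross-scheme branch: case-sensitive, trailing slash stripped.
            return Objects.equals(normalize(u1.getAuthority()), normalize(u2.getAuthority()))
                    && Objects.equals(normalize(u1.getPath()), normalize(u2.getPath()));
        }
        return false; // assumption: the issue does not describe this case
    }

    public static void main(String[] args) {
        System.out.println(equalsAsDescribed("s3://bucket/path/", "s3://bucket/path"));  // false
        System.out.println(equalsAsDescribed("s3://bucket/path/", "cos://bucket/path")); // true
        System.out.println(equalsAsDescribed("s3://bucket/A", "s3://bucket/a"));         // true
    }
}
```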

The only caller is HMSTransaction.prepareInsertExistingTable (fe-core), which uses this method to decide whether a Hive commit needs a rename, so inconsistent equality can lead to an incorrect rename decision.

What You Expected?

A single, consistent rule regardless of whether the two schemes match: compare authority + path (scheme ignored when equal or when one side is s3), with trailing slashes insignificant and the comparison case-sensitive (object-storage keys are case-sensitive).
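
One way the expected rule could look, as a sketch rather than the code in the linked PR. The class and method names are hypothetical. Three choices here are assumptions not stated in the issue: returning false when the schemes differ and neither is s3, falling back to exact string comparison when an authority is missing or the input does not parse, and comparing the raw (undecoded) path.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Objects;

// Hypothetical sketch of a single, consistent comparison rule.
class ConsistentCompareSketch {

    // Trailing slashes are insignificant under the expected rule.
    static String stripTrailingSlashes(String s) {
        if (s == null) {
            return null;
        }
        int end = s.length();
        while (end > 0 && s.charAt(end - 1) == '/') {
            end--;
        }
        return s.substring(0, end);
    }

    static boolean locationEquals(String p1, String p2) {
        if (p1 == null || p2 == null) {
            return p1 == null && p2 == null;
        }
        URI u1;
        URI u2;
        try {
            u1 = new URI(p1);
            u2 = new URI(p2);
        } catch (URISyntaxException e) {
            return p1.equals(p2); // assumption: unparseable input is compared exactly
        }
        if (u1.getRawAuthority() == null || u2.getRawAuthority() == null) {
            return p1.equals(p2); // assumption: no bucket/authority, compare exactly
        }
        String s1 = u1.getScheme();
        String s2 = u2.getScheme();
        boolean sameScheme = s1 != null && s1.equalsIgnoreCase(s2);
        boolean oneIsS3 = "s3".equalsIgnoreCase(s1) || "s3".equalsIgnoreCase(s2);
        if (!sameScheme && !oneIsS3) {
            return false; // assumption: different non-s3 schemes never match
        }
        // Same rule for both cases: authority + path, case-sensitive.
        return Objects.equals(u1.getRawAuthority(), u2.getRawAuthority())
                && Objects.equals(stripTrailingSlashes(u1.getRawPath()),
                                  stripTrailingSlashes(u2.getRawPath()));
    }
}
```

With this shape, the result for s3://bucket/path/ no longer depends on whether the other side is s3 or cos, and s3://bucket/A and s3://bucket/a compare as unequal.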

How to Reproduce?

// same pair, different "other" scheme -> different answer (inconsistent):
PathUtils.equalsIgnoreSchemeIfOneIsS3("s3://bucket/path/", "s3://bucket/path");   // false
PathUtils.equalsIgnoreSchemeIfOneIsS3("s3://bucket/path/", "cos://bucket/path");  // true

// same-scheme comparison wrongly ignores case:
PathUtils.equalsIgnoreSchemeIfOneIsS3("s3://bucket/A", "s3://bucket/a");          // true (should be false)

Anything Else?

Fix proposed in the linked PR. It also hardens several edge cases surfaced during review (opaque URIs, percent-encoded slashes, triple-slash / network-path forms) by falling back to exact string comparison for inputs that are malformed for object storage.
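
As an illustration of the edge cases listed above, a guard along these lines could decide when to skip normalization and compare the raw strings instead. This is a guess at the shape of such a check, not the logic in the linked PR; the class name, method name, and exact conditions are assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;

// Hypothetical guard: true means "compare the two inputs as exact strings".
class MalformedInputGuardSketch {

    static boolean needsExactComparison(String location) {
        URI u;
        try {
            u = new URI(location);
        } catch (URISyntaxException e) {
            return true; // not parseable as a URI at all
        }
        if (u.isOpaque()) {
            return true; // opaque form such as "s3:bucket/path"
        }
        if (u.getScheme() == null) {
            return true; // network-path form such as "//bucket/path"
        }
        String authority = u.getRawAuthority();
        if (authority == null || authority.isEmpty()) {
            return true; // triple-slash form such as "s3:///path"
        }
        // A percent-encoded slash must not be treated as a path separator.
        String rawPath = u.getRawPath();
        return rawPath != null && rawPath.toLowerCase(Locale.ROOT).contains("%2f");
    }
}
```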

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
