COG range coalescing can turn safe tile reads into huge over-fetches #2266

@brendancol

Describe the bug

coalesce_ranges in xrspatial/geotiff/_sources.py merges any two adjacent byte ranges whose gap is at most COALESCE_GAP_THRESHOLD_DEFAULT (1 MiB). There is no upper bound on the merged range length and no cap on the total slack accumulated across a chain of merges.

The per-tile cap at xrspatial/geotiff/_cog_http.py:883 rejects individual tiles that declare a TileByteCount larger than MAX_TILE_BYTES_DEFAULT (256 MiB by default). After that check, the full per-tile fetch list is handed to read_ranges_coalesced at xrspatial/geotiff/_cog_http.py:914, which calls coalesce_ranges and issues one GET per merged range.

A malformed or hostile COG can supply a tile table where every individual TileByteCount passes the per-tile cap (say, 1 KiB each), but consecutive tile offsets sit about 1 MiB apart, so the gap between neighbours stays at or just under the threshold. Each adjacent pair satisfies the merge predicate, so the coalescer chains them all into one merged range whose length is roughly num_tiles * 1 MiB. For a 4096-tile COG that is a single ~4 GiB HTTP / cloud GET to retrieve about 4 MiB of tile data, even though each tile individually passed the cap.

This affects both the HTTP path (_HTTPSource.read_ranges_coalesced) and the cloud / fsspec path (_FSSpecSource.read_ranges_coalesced).

Expected behavior

The coalescer should refuse to merge ranges when the resulting merged-range length would exceed a configurable cap (and/or when the accumulated slack between merged members would exceed a cap). When the cap is hit, seal the current merged range and start a new one. The cap should follow the existing tile-cap pattern: a module-level default with an env-var override.
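A minimal sketch of the "module-level default with an env-var override" pattern for the new cap follows. The issue does not show how the existing tile cap resolves its override, so the helper name resolve_max_coalesced_range_bytes, the precedence order (explicit argument, then env var, then default), and the validation rules are all assumptions for illustration. The env var name is the one proposed under "Suggested fix".

```python
import os

# Assumed default for the sketch: same value as the 256 MiB per-tile cap.
MAX_COALESCED_RANGE_BYTES_DEFAULT = 256 << 20
ENV_VAR = "XRSPATIAL_COG_MAX_COALESCED_RANGE_BYTES"


def resolve_max_coalesced_range_bytes(explicit=None, environ=None):
    """Hypothetical helper: pick the cap from argument, env var, or default."""
    environ = os.environ if environ is None else environ
    if explicit is not None:
        value = explicit
    else:
        raw = environ.get(ENV_VAR)
        if raw is None or raw.strip() == "":
            return MAX_COALESCED_RANGE_BYTES_DEFAULT
        try:
            value = int(raw)
        except ValueError:
            raise ValueError(
                f"{ENV_VAR} must be an integer byte count, got {raw!r}"
            ) from None
    if value <= 0:
        raise ValueError("max coalesced range bytes must be positive")
    return value
```

Passing environ explicitly keeps the helper easy to test without mutating the process environment.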

Reproduction

from xrspatial.geotiff._sources import coalesce_ranges

# 4096 tiles, each 1 KiB compressed, with offsets spaced 1 MiB apart.
# Every individual range passes a 256 MiB per-tile cap.
ranges = [(i * (1 << 20), 1024) for i in range(4096)]
merged, mapping = coalesce_ranges(ranges)
assert len(merged) == 1
merged_len = merged[0][1]
print(f'merged length: {merged_len / (1 << 30):.2f} GiB')  # ~4.00 GiB

A single merged range that large triggers an HTTP / cloud read whose size is limited only by what the tile table declares, even though each individual tile is tiny.
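To put a number on the over-fetch without importing xrspatial, the sketch below uses a simplified stand-in for a gap-only merge and compares bytes requested with bytes fetched. gap_only_coalesce is a hypothetical function written for this illustration, not the actual coalesce_ranges implementation, and it returns only the merged ranges.

```python
GAP_THRESHOLD = 1 << 20  # same value the issue cites for COALESCE_GAP_THRESHOLD_DEFAULT


def gap_only_coalesce(ranges, gap_threshold=GAP_THRESHOLD):
    """Simplified stand-in: merge (offset, length) ranges on gap size alone."""
    merged = []
    for offset, length in sorted(ranges):
        if merged:
            cur_start, cur_len = merged[-1]
            cur_end = cur_start + cur_len
            # The only condition is the gap; merged length is never checked.
            if offset - cur_end <= gap_threshold:
                new_end = max(cur_end, offset + length)
                merged[-1] = (cur_start, new_end - cur_start)
                continue
        merged.append((offset, length))
    return merged


ranges = [(i * (1 << 20), 1024) for i in range(4096)]
merged = gap_only_coalesce(ranges)

requested = sum(length for _, length in ranges)  # bytes the tiles need
fetched = sum(length for _, length in merged)    # bytes the GETs would pull
overfetch = fetched / requested

print(f"requested: {requested / (1 << 20):.2f} MiB")
print(f"fetched:   {fetched / (1 << 30):.2f} GiB")
print(f"over-fetch factor: {overfetch:.0f}x")
```

Under these assumptions the merged read is roughly a thousand times larger than the tile data it serves.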

Additional context

  • xrspatial/geotiff/_cog_http.py:883 -- per-tile cap (max_tile_bytes), enforced before coalescing.
  • xrspatial/geotiff/_cog_http.py:914 -- call site for read_ranges_coalesced.
  • xrspatial/geotiff/_sources.py:547 -- COALESCE_GAP_THRESHOLD_DEFAULT = 1 << 20.
  • xrspatial/geotiff/_sources.py:597 -- the merge predicate; only checks gap <= gap_threshold.

Suggested fix

Add a max_coalesced_range_bytes parameter (and matching XRSPATIAL_COG_MAX_COALESCED_RANGE_BYTES env var) to coalesce_ranges. Before extending the current merged range, also check that the resulting new_end - cur_start would not exceed the cap; if it would, seal the current merged range and start a new one. Propagate the parameter through read_ranges_coalesced so the COG HTTP and fsspec call sites pick up the same cap. Default: reuse MAX_TILE_BYTES_DEFAULT (256 MiB).
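The check described above could look roughly like the following sketch. It is not the actual coalesce_ranges code: the function name coalesce_ranges_capped and the mapping shape (merged index, offset within the merged range) are assumptions made for illustration. A single input range that already exceeds the cap is passed through unchanged here, on the assumption that the per-tile cap has already vetted it.

```python
def coalesce_ranges_capped(ranges, gap_threshold=1 << 20,
                           max_coalesced_range_bytes=256 << 20):
    """Hypothetical sketch: gap-based merging with a cap on merged length.

    ranges is a list of (offset, length). Returns (merged, mapping), where
    mapping[i] = (merged_index, offset_within_merged) for input range i.
    """
    order = sorted(range(len(ranges)), key=lambda i: ranges[i][0])
    merged = []
    mapping = [None] * len(ranges)
    for i in order:
        offset, length = ranges[i]
        end = offset + length
        if merged:
            cur_start, cur_len = merged[-1]
            cur_end = cur_start + cur_len
            new_end = max(cur_end, end)
            # Extend only if the gap is small AND the merged length stays capped.
            if (offset - cur_end <= gap_threshold
                    and new_end - cur_start <= max_coalesced_range_bytes):
                merged[-1] = (cur_start, new_end - cur_start)
                mapping[i] = (len(merged) - 1, offset - cur_start)
                continue
        # Seal the current merged range and start a new one at this tile.
        merged.append((offset, length))
        mapping[i] = (len(merged) - 1, 0)
    return merged, mapping


# Same input as the reproduction: 4096 small tiles spaced 1 MiB apart.
ranges = [(i * (1 << 20), 1024) for i in range(4096)]
merged, mapping = coalesce_ranges_capped(ranges)
print(len(merged), max(length for _, length in merged))
```

With the reproduction input and a 256 MiB cap, this sketch splits the chain into multiple GETs, none longer than the cap, instead of one ~4 GiB read.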

Labels: bug (Something isn't working), infrastructure (CI, benchmarks, and tooling)
