Describe the bug
coalesce_ranges in xrspatial/geotiff/_sources.py merges any two adjacent byte ranges whose gap is at most COALESCE_GAP_THRESHOLD_DEFAULT (1 MiB). There is no upper bound on the merged range length and no cap on the total slack accumulated across a chain of merges.
The per-tile cap at xrspatial/geotiff/_cog_http.py:883 rejects individual tiles that declare a TileByteCount larger than MAX_TILE_BYTES_DEFAULT (256 MiB by default). After that check, the full per-tile fetch list is handed to read_ranges_coalesced at xrspatial/geotiff/_cog_http.py:914, which calls coalesce_ranges and issues one GET per merged range.
A malformed or hostile COG can supply a tile table in which every individual TileByteCount passes the per-tile cap (say, 1 KB each) but the gap between consecutive tiles sits just under 1 MiB. Each adjacent pair satisfies the merge predicate, so the coalescer chains them all into one merged range whose length is roughly num_tiles * 1 MiB. For a 4096-tile COG that is a single ~4 GiB HTTP / cloud GET, even though every tile individually passed the cap.
This affects both the HTTP path (_HTTPSource.read_ranges_coalesced) and the cloud / fsspec path (_FSSpecSource.read_ranges_coalesced).
Expected behavior
The coalescer should refuse to merge ranges when the resulting merged-range length would exceed a configurable cap (and/or when the accumulated slack between merged members would exceed a cap). When the cap is hit, seal the current merged range and start a new one. The cap should follow the existing tile-cap pattern: a module-level default with an env-var override.
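A rough sketch of the "module-level default with an env-var override" part, for discussion. The issue does not show how the existing tile cap resolves its value, so the helper name resolve_max_coalesced_range_bytes, the constant MAX_COALESCED_RANGE_BYTES_DEFAULT, and the precedence (explicit argument, then env var, then default) are all assumptions, not the current xrspatial code. The env var name is the one proposed under "Suggested fix".

```python
import os

# Hypothetical module-level default; the issue suggests reusing 256 MiB.
MAX_COALESCED_RANGE_BYTES_DEFAULT = 256 << 20
# Env var name as proposed in this issue.
ENV_VAR = "XRSPATIAL_COG_MAX_COALESCED_RANGE_BYTES"


def resolve_max_coalesced_range_bytes(explicit=None, environ=None):
    """Pick the cap: explicit argument, else env var, else the default.

    `environ` is injectable only to keep the sketch testable.
    """
    env = os.environ if environ is None else environ
    if explicit is not None:
        value = explicit
    else:
        raw = env.get(ENV_VAR)
        if raw is None or raw.strip() == "":
            return MAX_COALESCED_RANGE_BYTES_DEFAULT
        try:
            value = int(raw)
        except ValueError:
            raise ValueError(f"{ENV_VAR} must be an integer, got {raw!r}") from None
    if value <= 0:
        raise ValueError("max coalesced range bytes must be positive")
    return value
```

Rejecting non-positive and non-integer values up front keeps a misconfigured env var from silently disabling the cap.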
Reproduction
from xrspatial.geotiff._sources import coalesce_ranges
# 4096 tiles, each 1 KB compressed, but offsets spaced 1 MiB apart.
# Every individual range passes a 256 MiB per-tile cap.
ranges = [(i * (1 << 20), 1024) for i in range(4096)]
merged, mapping = coalesce_ranges(ranges)
assert len(merged) == 1
merged_len = merged[0][1]
print(f'merged length: {merged_len / (1 << 30):.2f} GiB') # ~4.00 GiB
A single merged range that large drives an unbounded HTTP / cloud read even though each tile individually was tiny.
Additional context
xrspatial/geotiff/_cog_http.py:883 -- per-tile cap (max_tile_bytes), enforced before coalescing.
xrspatial/geotiff/_cog_http.py:914 -- call site for read_ranges_coalesced.
xrspatial/geotiff/_sources.py:547 -- COALESCE_GAP_THRESHOLD_DEFAULT = 1 << 20.
xrspatial/geotiff/_sources.py:597 -- the merge predicate; only checks gap <= gap_threshold.
Suggested fix
Add a max_coalesced_range_bytes parameter (and matching XRSPATIAL_COG_MAX_COALESCED_RANGE_BYTES env var) to coalesce_ranges. Before extending the current merged range, also check that the resulting new_end - cur_start would not exceed the cap; if it would, seal the current merged range and start a new one. Propagate the parameter through read_ranges_coalesced so the COG HTTP and fsspec call sites pick up the same cap. Default: reuse MAX_TILE_BYTES_DEFAULT (256 MiB).
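To make the proposal concrete, here is a minimal standalone sketch of the capped merge. It is not the code in _sources.py: the function name coalesce_ranges_capped is hypothetical, and the return shape (a list of (offset, length) merged ranges plus a per-input (merged_index, offset_within_merged) mapping) is an assumption based on how the reproduction uses the result.

```python
GAP_THRESHOLD = 1 << 20          # mirrors COALESCE_GAP_THRESHOLD_DEFAULT (1 MiB)
MAX_COALESCED_BYTES = 256 << 20  # proposed default, same value as the tile cap


def coalesce_ranges_capped(ranges, gap_threshold=GAP_THRESHOLD,
                           max_coalesced_range_bytes=MAX_COALESCED_BYTES):
    """Merge (offset, length) ranges, never growing a merged range past the cap.

    Returns (merged, mapping). The mapping shape is an assumption:
    mapping[i] = (index into merged, byte offset inside that merged range).
    """
    order = sorted(range(len(ranges)), key=lambda i: ranges[i][0])
    merged = []
    mapping = [(-1, 0)] * len(ranges)
    cur_start = cur_end = None
    for i in order:
        off, length = ranges[i]
        end = off + length
        if cur_start is not None:
            new_end = max(cur_end, end)
            gap_ok = off - cur_end <= gap_threshold
            cap_ok = new_end - cur_start <= max_coalesced_range_bytes
            if gap_ok and cap_ok:
                # Extend the current merged range in place.
                cur_end = new_end
                merged[-1] = (cur_start, cur_end - cur_start)
                mapping[i] = (len(merged) - 1, off - cur_start)
                continue
        # Gap too large or cap would be exceeded: seal and start a new range.
        cur_start, cur_end = off, end
        merged.append((cur_start, length))
        mapping[i] = (len(merged) - 1, 0)
    return merged, mapping


# Same input as the reproduction: 4096 tiles of 1 KB, offsets 1 MiB apart.
ranges = [(i * (1 << 20), 1024) for i in range(4096)]
merged, mapping = coalesce_ranges_capped(ranges)
print(len(merged), max(length for _, length in merged))
```

With the 256 MiB default, the reproduction input splits into 16 merged ranges of 256 tiles each instead of one ~4 GiB range. A single input range that is already larger than the cap is passed through unmerged in this sketch, on the assumption that the per-tile cap has rejected such tiles before coalescing.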