Centralize VRT capability validation (#2329) #2339
Add `_vrt_validation.validate_parsed_vrt` as the single entry point for
auditing a parsed `VRTDataset` against every capability the read
pipeline does not honour. Both `read_vrt` (eager and chunked dispatch)
and `open_geotiff('foo.vrt')` call the validator before any source
decode, so the two entry points produce equivalent failures for the
same bad input. Previously the per-source SrcRect/DstRect, zero
pixel-size, and unsupported-resample checks fired mid-decode (and per
chunk task under chunked dispatch), so a malformed VRT could build a
dask graph successfully and blow up deep in a `compute()` chunk
function.
Validator rules: band count sanity, dtype kind compatibility, transform
orientation (rotated/sheared), pixel-size compatibility,
SrcRect/DstRect non-negativity and within-extent, and the supported
resampling set. The mixed-band-nodata case keeps its existing
`MixedBandMetadataError` typed-error contract by delegating to the
already-registered `_check_read_mixed_band_metadata` hook. The
rotated-transform and unparseable-CRS cases keep their existing typed
subclasses (`RotatedTransformError`, `UnparseableCRSError`) so
`except`-by-subclass callers stay green; the validator adds the
source-path to the message and lifts the check ahead of any decode.
The new capability checks raise `VRTUnsupportedError`, a
`GeoTIFFAmbiguousMetadataError` subclass (and therefore `ValueError`).
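A minimal sketch of the error hierarchy and the validate-before-decode ordering described above. The class and function names `VRTUnsupportedError`, `GeoTIFFAmbiguousMetadataError`, and `validate_parsed_vrt` come from this PR; everything else (the `ParsedVRT` and `Source` stand-ins, the contents of `SUPPORTED_RESAMPLE`, the single resampling rule, and `build_read_tasks`) is assumed for illustration and is not the actual implementation.

```python
from dataclasses import dataclass, field


class GeoTIFFAmbiguousMetadataError(ValueError):
    """Stand-in for the existing base class (a ValueError per the PR text)."""


class VRTUnsupportedError(GeoTIFFAmbiguousMetadataError):
    """Raised for VRT capabilities the read pipeline does not honour."""


@dataclass
class Source:  # hypothetical stand-in for one parsed VRT source
    path: str
    resample: str = "nearest"


@dataclass
class ParsedVRT:  # hypothetical stand-in for VRTDataset
    sources: list = field(default_factory=list)


SUPPORTED_RESAMPLE = {"nearest"}  # assumed contents, for illustration only


def validate_parsed_vrt(parsed, *, mode="read"):
    """Audit the parsed dataset up front; only one rule is sketched here."""
    if mode != "read":
        raise ValueError(f"unsupported mode: {mode!r}")
    for src in parsed.sources:
        if src.resample.lower() not in SUPPORTED_RESAMPLE:
            # Embed the offending source path in the message.
            raise VRTUnsupportedError(
                f"{src.path}: unsupported resampling {src.resample!r}"
            )


def build_read_tasks(parsed):
    """Validate first, then build lazy per-source tasks (hypothetical)."""
    validate_parsed_vrt(parsed)  # a bad input fails here, at build time
    return [lambda s=s: f"decoded {s.path}" for s in parsed.sources]
```

Because the validator runs before any task is created, a malformed input is rejected when the graph is built rather than inside a deferred task, and callers that catch `ValueError` still catch the new error type.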
brendancol
left a comment
PR Review: Centralize VRT capability validation (#2329)
Read every changed file in full from the issue-2329 worktree. Findings below; the rest of the diff is fine.
Blockers
None.
Suggestions
- Duplicate `GeoTIFFFallbackWarning` on malformed CRS (`_vrt_validation.py:246`). The validator calls `_wkt_to_epsg`, which emits a `GeoTIFFFallbackWarning` on parse failure. The existing `_check_read_unparseable_crs` in `_validation.py:1025` uses `pyproj.CRS.from_user_input` directly with no warning. Under the new code, a malformed VRT CRS emits a fallback warning that the existing check did not. You can see it in the test output for `test_unparseable_crs_rejected_without_opt_in`. Either swap to `from_user_input` directly, or wrap the probe in `warnings.catch_warnings()`.
- `_looks_like_wkt` and `_NEAREST_RESAMPLE_ALGS` are duplicated instead of imported (`_vrt_validation.py:52-77`). `_NEAREST_RESAMPLE_ALGS` already lives in `_vrt.py:566` and `_looks_like_wkt` already lives in `_crs.py:35`. The docstring justifies the copy by pointing at an import cycle, but a lazy import inside the validator body sidesteps the cycle without the drift risk. If someone adds `Nearest_Neighbor` to one set but not the other later, the bug will be hard to spot.
Nits
- `mode='read'` parameter (`_vrt_validation.py:84, 132-137`). The only accepted value is `'read'`; the documented future write mode does not exist yet. Either drop the parameter until a write validator actually needs to share the entry point, or accept that every caller is going to pass `mode='read'` forever.
- Rotated and unparseable-CRS branches shadow the registered hooks (`_vrt_validation.py:181-193, 240-258`). `_check_read_rotated_transform` and `_check_read_unparseable_crs` still run from `validate_read_metadata` right after the validator, but the validator's branches now preempt them by raising first. That is intentional (to embed the source path in the message), but it is worth a comment in the docstring or in `validate_read_metadata` so a future maintainer modifying the registered hook realises the hook is now reachable only when the validator passes the case (e.g. `allow_rotated=True`).
What looks good
- TDD order is correct: 17 negative tests added, validator written to make them pass, full geotiff suite stays at 5276 passing.
- The entry-point parity tests (`test_resample_parity_across_entry_points`, `test_rotated_parity_across_entry_points`, `test_zero_bands_parity_across_entry_points`) compare both error type and message between `read_vrt` and `open_geotiff`, which is the strongest form of the parity claim.
- Chunked dispatch is wired (`test_unsupported_resample_chunked_raises_at_build` exercises it), so the "deep in a chunk function" failure mode the parent issue called out is closed.
- `VRTUnsupportedError` is a `GeoTIFFAmbiguousMetadataError` subclass and therefore a `ValueError`, so `except ValueError` callers keep working.
- Existing typed errors keep their contracts: the validator raises `RotatedTransformError` / `MixedBandMetadataError` / `UnparseableCRSError` for cases that already had those types; only the new capability rules raise `VRTUnsupportedError`.
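The parity tests named above compare both the exception type and its message across the two entry points. A sketch of that assertion shape follows, using hypothetical stand-ins (`read_vrt_stub`, `open_geotiff_stub`, `_validate`) rather than the real functions or the real test bodies:

```python
def _validate(path):
    """Hypothetical shared validator: rejects one known-bad input."""
    if path.endswith("bad.vrt"):
        raise ValueError(f"{path}: unsupported resampling 'cubic'")


def read_vrt_stub(path):
    _validate(path)
    return "data"


def open_geotiff_stub(path):
    # Delegates, so the same validator runs on this path too.
    return read_vrt_stub(path)


def capture_error(fn, *args):
    """Return the exception raised by fn(*args), or None if it succeeds."""
    try:
        fn(*args)
    except Exception as exc:
        return exc
    return None


def assert_parity(path):
    """Both entry points must reject with the same type and message."""
    a = capture_error(read_vrt_stub, path)
    b = capture_error(open_geotiff_stub, path)
    assert a is not None and b is not None, "both entry points must reject"
    assert type(a) is type(b)
    assert str(a) == str(b)
```

Comparing the message as well as the type catches the case where one path wraps or rewords the error, which a type-only check would miss.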
Checklist
- Algorithm matches reference: not applicable (no algorithmic change).
- All implemented backends produce consistent results: validator runs at both eager and chunked dispatch.
- NaN handling: not applicable.
- Edge cases covered: zero bands, negative SrcRect/DstRect, zero pixel size, out-of-extent DstRect, unsupported resample, rotated transform, unparseable CRS, mixed-band nodata.
- Dask chunk boundaries: rejection happens at graph build, not in a chunk function.
- No premature materialization or unnecessary copies.
- Benchmark not needed (validation overhead is a few ms per parse).
- README feature matrix: not applicable (no new public API).
- Docstrings present and accurate.
…2329)
- Replace ``_wkt_to_epsg`` (which emits ``GeoTIFFFallbackWarning`` on parse failure) with a direct ``pyproj.CRS.from_user_input`` probe so the validator's unparseable-CRS check matches the no-warning behaviour of the registered ``_check_read_unparseable_crs`` hook.
- Lazy-import ``_NEAREST_RESAMPLE_ALGS`` from ``_vrt`` and ``_looks_like_wkt`` from ``_crs`` instead of hand-copying both constants. Sidesteps the import cycle without the drift risk that comes with maintaining two copies.
- Add a Notes section to the ``validate_parsed_vrt`` docstring that flags the overlap with the registered ``validate_read_metadata`` hooks so a future maintainer modifying the rotated-transform or unparseable-CRS check knows the validator preempts them on the VRT path.
brendancol
left a comment
Follow-up review
Second pass after commit 8be5d84. Re-read the diff in the worktree.
Dispositions from the first review
- Fixed: duplicate `GeoTIFFFallbackWarning` on malformed CRS. The validator now probes pyproj directly with `from_user_input`, matching the silent behaviour of the registered hook. Test output for `test_unparseable_crs_rejected_without_opt_in` no longer emits the spurious warning.
- Fixed: duplicated `_NEAREST_RESAMPLE_ALGS` and `_looks_like_wkt`. The validator lazy-imports both from their canonical homes (`_vrt._NEAREST_RESAMPLE_ALGS` and `_crs._looks_like_wkt`). Drift risk gone.
- Fixed: shadow on registered hooks. Added a Notes section to the validator docstring spelling out the overlap with `_check_read_rotated_transform` and `_check_read_unparseable_crs` so a future maintainer editing either side knows the validator preempts them on the VRT path.
- Dismissed (with reason): drop the `mode='read'` parameter. The master plan (`docs/superpowers/plans/2026-05-22-geotiff-vrt-release-hardening-2321.md`) calls out `validate_parsed_vrt(parsed, *, mode)` as the canonical signature for the centralised entry point, with a write-mode follow-up implied. Dropping the parameter would diverge from the plan; keeping it costs a single `mode != 'read'` check per call.
Remaining
None. Full geotiff suite stays at 5276 passing.
Checklist
- All first-pass blockers and suggestions addressed.
- Nit dismissals recorded with reason.
- Tests still pass on the eager and chunked paths.
Closes #2329. Part of #2321.
Summary
- `xrspatial/geotiff/_vrt_validation.py` with one `validate_parsed_vrt` entry point that audits an already-parsed `VRTDataset` against every capability the read pipeline does not honour: band count, dtype kind, transform orientation, pixel-size, SrcRect/DstRect non-negativity and within-extent, the supported resampling set, plus the existing CRS / mixed-nodata policies.
- `read_vrt` dispatch in `_backends/vrt.py`. `open_geotiff('foo.vrt')` delegates to `read_vrt`, so the same validator runs there too. Direct and dispatched entry points produce equivalent failures for the same bad input.
- `compute()`.
- `VRTUnsupportedError` (a `GeoTIFFAmbiguousMetadataError` / `ValueError` subclass) carries the offending source path and field in its message. Existing typed errors (`RotatedTransformError`, `MixedBandMetadataError`, `UnparseableCRSError`) keep their contracts so `except`-by-subclass callers stay green; the validator delegates those cases to the existing registered hooks rather than swapping the type.
Backend coverage
Test plan
- `test_vrt_validation_2321.py`: one negative test per validator rule, plus entry-point parity tests asserting `read_vrt` and `open_geotiff` raise the same type and message for the same input.