Skip to content

Centralize VRT capability validation (#2329)#2339

Open
brendancol wants to merge 2 commits into
mainfrom
issue-2329
Open

Centralize VRT capability validation (#2329)#2339
brendancol wants to merge 2 commits into
mainfrom
issue-2329

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2329. Part of #2321.

Summary

  • New xrspatial/geotiff/_vrt_validation.py with one validate_parsed_vrt entry point that audits an already-parsed VRTDataset against every capability the read pipeline does not honour: band count, dtype kind, transform orientation, pixel-size, SrcRect/DstRect non-negativity and within-extent, the supported resampling set, plus the existing CRS / mixed-nodata policies.
  • Wired into both eager and chunked read_vrt dispatch in _backends/vrt.py. open_geotiff('foo.vrt') delegates to read_vrt, so the same validator runs there too. Direct and dispatched entry points produce equivalent failures for the same bad input.
  • The rejection lives ahead of any source decode. Previously some checks fired mid-decode (and per chunk task under chunked dispatch), so a malformed VRT could build a dask graph and then blow up deep inside compute().
  • New VRTUnsupportedError (a GeoTIFFAmbiguousMetadataError / ValueError subclass) carries the offending source path and field in its message. Existing typed errors (RotatedTransformError, MixedBandMetadataError, UnparseableCRSError) keep their contracts so except-by-subclass callers stay green; the validator delegates those cases to the existing registered hooks rather than swapping the type.

Backend coverage

  • numpy: yes (eager read path)
  • dask+numpy: yes (chunked dispatcher)
  • cupy: not applicable (VRT reads do not go through the GPU decoder pipeline)
  • dask+cupy: not applicable

Test plan

  • 17 new tests in test_vrt_validation_2321.py: one negative test per validator rule, plus entry-point parity tests asserting read_vrt and open_geotiff raise the same type and message for the same input.
  • Full geotiff suite: 5276 passed, 68 skipped.
  • Chunked dispatch: unsupported VRT raises at graph build, not inside a chunk function.

Add `_vrt_validation.validate_parsed_vrt` as the single entry point for
auditing a parsed `VRTDataset` against every capability the read
pipeline does not honour. Both `read_vrt` (eager and chunked dispatch)
and `open_geotiff('foo.vrt')` call the validator before any source
decode, so the two entry points produce equivalent failures for the
same bad input. Previously the per-source SrcRect/DstRect, zero
pixel-size, and unsupported-resample checks fired mid-decode (and per
chunk task under chunked dispatch), so a malformed VRT could build a
dask graph successfully and blow up deep in a `compute()` chunk
function.

Validator rules: band count sanity, dtype kind compatibility, transform
orientation (rotated/sheared), pixel-size compatibility,
SrcRect/DstRect non-negativity and within-extent, and the supported
resampling set. The mixed-band-nodata case keeps its existing
`MixedBandMetadataError` typed-error contract by delegating to the
already-registered `_check_read_mixed_band_metadata` hook. The
rotated-transform and unparseable-CRS cases keep their existing typed
subclasses (`RotatedTransformError`, `UnparseableCRSError`) so
`except`-by-subclass callers stay green; the validator adds the
source-path to the message and lifts the check ahead of any decode.
The new capability checks raise `VRTUnsupportedError`, a
`GeoTIFFAmbiguousMetadataError` subclass (and therefore `ValueError`).
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label May 23, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Centralize VRT capability validation (#2329)

Read every changed file in full from the issue-2329 worktree. Findings below; the rest of the diff is fine.

Blockers

None.

Suggestions

  • Duplicate GeoTIFFFallbackWarning on malformed CRS (_vrt_validation.py:246). The validator calls _wkt_to_epsg, which emits a GeoTIFFFallbackWarning on parse failure. The existing _check_read_unparseable_crs in _validation.py:1025 uses pyproj.CRS.from_user_input directly with no warning. Under the new code, a malformed VRT CRS emits a fallback warning that the existing check did not. You can see it in the test output for test_unparseable_crs_rejected_without_opt_in. Either swap to from_user_input directly, or wrap the probe in warnings.catch_warnings().

  • _looks_like_wkt and _NEAREST_RESAMPLE_ALGS are duplicated instead of imported (_vrt_validation.py:52-77). _NEAREST_RESAMPLE_ALGS already lives in _vrt.py:566 and _looks_like_wkt already lives in _crs.py:35. The docstring justifies the copy by pointing at an import cycle, but a lazy import inside the validator body sidesteps the cycle without the drift risk. If someone adds Nearest_Neighbor to one set but not the other later, the bug will be hard to spot.

Nits

  • mode='read' parameter (_vrt_validation.py:84, 132-137). The only accepted value is 'read'; the documented future write mode does not exist yet. Either drop the parameter until a write validator actually needs to share the entry point, or accept that every caller is going to pass mode='read' forever.

  • Rotated and unparseable-CRS branches shadow the registered hooks (_vrt_validation.py:181-193, 240-258). _check_read_rotated_transform and _check_read_unparseable_crs still run from validate_read_metadata right after the validator, but the validator's branches now preempt them by raising first. That is intentional (to embed the source path in the message), but it is worth a comment in the docstring or in validate_read_metadata so a future maintainer modifying the registered hook realises the hook is now reachable only when the validator passes the case (e.g. allow_rotated=True).

What looks good

  • TDD order is correct: 17 negative tests added, validator written to make them pass, full geotiff suite stays at 5276 passing.
  • The entry-point parity tests (test_resample_parity_across_entry_points, test_rotated_parity_across_entry_points, test_zero_bands_parity_across_entry_points) compare both error type and message between read_vrt and open_geotiff, which is the strongest form of the parity claim.
  • Chunked dispatch is wired (test_unsupported_resample_chunked_raises_at_build exercises it), so the "deep in a chunk function" failure mode the parent issue called out is closed.
  • VRTUnsupportedError is a GeoTIFFAmbiguousMetadataError subclass and therefore a ValueError, so except ValueError callers keep working.
  • Existing typed errors keep their contracts: the validator raises RotatedTransformError / MixedBandMetadataError / UnparseableCRSError for cases that already had those types; only the new capability rules raise VRTUnsupportedError.

Checklist

  • Algorithm matches reference: not applicable (no algorithmic change).
  • All implemented backends produce consistent results: validator runs at both eager and chunked dispatch.
  • NaN handling: not applicable.
  • Edge cases covered: zero bands, negative SrcRect/DstRect, zero pixel size, out-of-extent DstRect, unsupported resample, rotated transform, unparseable CRS, mixed-band nodata.
  • Dask chunk boundaries: rejection happens at graph build, not in a chunk function.
  • No premature materialization or unnecessary copies.
  • Benchmark not needed (validation overhead is a few ms per parse).
  • README feature matrix: not applicable (no new public API).
  • Docstrings present and accurate.

…2329)

- Replace ``_wkt_to_epsg`` (which emits ``GeoTIFFFallbackWarning`` on
  parse failure) with a direct ``pyproj.CRS.from_user_input`` probe so
  the validator's unparseable-CRS check matches the no-warning
  behaviour of the registered ``_check_read_unparseable_crs`` hook.
- Lazy-import ``_NEAREST_RESAMPLE_ALGS`` from ``_vrt`` and
  ``_looks_like_wkt`` from ``_crs`` instead of hand-copying both
  constants. Sidesteps the import cycle without the drift risk that
  comes with maintaining two copies.
- Add a Notes section to the ``validate_parsed_vrt`` docstring that
  flags the overlap with the registered ``validate_read_metadata``
  hooks so a future maintainer modifying the rotated-transform or
  unparseable-CRS check knows the validator preempts them on the
  VRT path.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review

Second pass after commit 8be5d84. Re-read the diff in the worktree.

Dispositions from the first review

  • Fixed: duplicate GeoTIFFFallbackWarning on malformed CRS. The validator now probes pyproj directly with from_user_input, matching the silent behaviour of the registered hook. Test output for test_unparseable_crs_rejected_without_opt_in no longer emits the spurious warning.
  • Fixed: duplicated _NEAREST_RESAMPLE_ALGS and _looks_like_wkt. The validator lazy-imports both from their canonical homes (_vrt._NEAREST_RESAMPLE_ALGS and _crs._looks_like_wkt). Drift risk gone.
  • Fixed: shadow on registered hooks. Added a Notes section to the validator docstring spelling out the overlap with _check_read_rotated_transform and _check_read_unparseable_crs so a future maintainer editing either side knows the validator preempts them on the VRT path.
  • Dismissed (with reason): drop the mode='read' parameter. The master plan (docs/superpowers/plans/2026-05-22-geotiff-vrt-release-hardening-2321.md) calls out validate_parsed_vrt(parsed, *, mode) as the canonical signature for the centralised entry point, with a write-mode follow-up implied. Dropping the parameter would diverge from the plan; keeping it costs a single mode != 'read' check per call.

Remaining

None. Full geotiff suite stays at 5276 passing.

Checklist

  • All first-pass blockers and suggestions addressed.
  • Nit dismissals recorded with reason.
  • Tests still pass on the eager and chunked paths.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

GeoTIFF: centralize VRT capability validation (#2321 sub-task 2)

1 participant