326 changes: 326 additions & 0 deletions docs/source/user_guide/geotiff_safe_io.rst
@@ -0,0 +1,326 @@
.. _user_guide.geotiff_safe_io:

***********************
Safe GeoTIFF IO usage
***********************

This page is the user-facing answer to "is this safe to rely on?" for
:mod:`xrspatial.geotiff`. It explains which entry points to prefer,
how to read the tier vocabulary the module publishes, which codecs and
COG combinations sit inside the stable contract, the fail-closed errors
a caller will hit, and the env vars / kwargs that bound remote reads.

The page does not claim full GDAL / VRT / GPU parity. Where a feature
is tested but the public surface is not yet pinned, it is called out as
``advanced`` or ``experimental`` and a caller should treat it as such.

.. contents:: On this page
   :local:
   :depth: 2


Entry points
============

The public IO surface lives at ``xrspatial.geotiff``. Five names cover
the read and write paths:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Entry point
     - What it does
   * - :func:`xrspatial.geotiff.open_geotiff`
     - The read entry point. A path or a binary file-like is the only
       required argument. Pass ``chunks=N`` for a dask-backed lazy
       read; pass ``gpu=True`` for a CuPy-backed eager read; combine
       both for a dask + CuPy read. Returns a 2D
       :class:`xarray.DataArray` for single-band input and a 3D one for
       multi-band input. The binary file-like form is restricted to the
       eager numpy reader; dask, GPU, VRT, and remote-URL paths require
       a string.
   * - :func:`xrspatial.geotiff.read_vrt`
     - Dedicated entry point for reading a GDAL ``.vrt`` mosaic over a
       set of GeoTIFF sources. Tier: ``advanced``. The VRT path honours
       a documented subset of the GDAL VRT schema; unsupported features
       raise ``VRTUnsupportedError`` or
       :class:`xrspatial.geotiff.UnsupportedGeoTIFFFeatureError` at
       graph-build time rather than producing wrong pixels. Both error
       classes live in :mod:`xrspatial.geotiff._errors`.
   * - :func:`xrspatial.geotiff.to_geotiff`
     - Write a DataArray to a local path. Pass ``cog=True`` for a
       Cloud-optimized GeoTIFF layout. Pass ``allow_experimental_codecs=True``
       to opt into ``lerc``, ``jpeg2000`` / ``j2k``, or ``lz4``; pass
       ``allow_internal_only_jpeg=True`` to opt into the
       internal-only ``jpeg`` codec.
   * - :func:`xrspatial.geotiff.write_geotiff_gpu`
     - GPU writer. Tier: ``experimental``. Use the CPU writer for
       anything you intend to round-trip through external tools.
   * - :func:`xrspatial.geotiff.write_vrt`
     - Emit a GDAL ``.vrt`` over local GeoTIFF sources. Tier:
       ``advanced``.

A dask-backed read is just ``open_geotiff(source, chunks=...)`` -- there
is no separate ``read_geotiff_dask`` name on the public surface. The
internal helper exists for backend wiring; callers should go through
``open_geotiff``.
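
The read modes in the table above come down to two keyword arguments. The
sketch below is a hypothetical helper, not part of xrspatial: ``read_kwargs``
and ``open_raster`` are names invented here, and the chunk size of 512 is an
arbitrary illustration. Only ``chunks`` and ``gpu`` are taken from this page.

```python
def read_kwargs(lazy=False, gpu=False, chunk_size=512):
    """Build keyword arguments for open_geotiff (hypothetical helper).

    lazy=True -> dask-backed read via ``chunks=``
    gpu=True  -> CuPy-backed read via ``gpu=True``
    both      -> dask + CuPy read
    """
    kwargs = {}
    if lazy:
        kwargs["chunks"] = chunk_size
    if gpu:
        kwargs["gpu"] = True
    return kwargs


def open_raster(source, **mode):
    # Imported lazily so read_kwargs stays usable without xrspatial.
    from xrspatial.geotiff import open_geotiff

    kwargs = read_kwargs(**mode)
    # Binary file-likes are restricted to the eager numpy reader, so
    # reject them early when a dask or GPU read was requested.
    if kwargs and not isinstance(source, str):
        raise TypeError("dask / GPU reads need a string path")
    return open_geotiff(source, **kwargs)
```

Calling ``open_raster("dem.tif", lazy=True)`` would then be equivalent to
``open_geotiff("dem.tif", chunks=512)``; the file name is a placeholder.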


Tier vocabulary
===============

:data:`xrspatial.geotiff.SUPPORTED_FEATURES` is a dict that maps every
feature name on the public surface to one of four tier strings. Read
the tier before depending on a feature in production:

* ``stable`` -- the path a new user should be on. Covered by the
  cross-backend parity matrix and a release-gate test. A regression
  here fails CI. Safe to rely on for the supported release.
* ``advanced`` -- works and is tested, but the caller should know what
  they are signing up for. Cloud cost, partial VRT mosaics, rotated
  transforms dropping on write, BigTIFF promotion, and ``.tif.ovr``
  sidecar discovery all live here. No kwarg gate; the docstring
  carries an ``Advanced:`` marker.
* ``experimental`` -- works in our tests, no claim about external
  interop or numerical parity across backends. GPU read and write,
  rotated-transform escape hatches, and Tier 3 codecs sit here. Tier 3
  codecs additionally require ``allow_experimental_codecs=True`` on the
  writer.
* ``internal_only`` -- the strictest tier. The output does not round-trip
  through libtiff / GDAL / rasterio. ``codec.jpeg`` is the only entry
  today and requires its own ``allow_internal_only_jpeg=True`` opt-in;
  ``allow_experimental_codecs`` does not cover it.

To check a feature at runtime::

    from xrspatial.geotiff import SUPPORTED_FEATURES

    if SUPPORTED_FEATURES.get('writer.cog') != 'stable':
        # The release you are on has not promoted COG writes.
        # Fall back to a plain GeoTIFF write or pin a known release.
        ...

The full tier map and the rationale for each entry live in
:ref:`reference.geotiff_release_contract`. The release-gate audit table
that ties each ``stable`` promise to a regression test lives in
:ref:`reference.geotiff_release_gate`.
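
The single-feature check can be extended to a whole pipeline. The sketch
below is illustrative only: ``features_outside`` is a hypothetical helper,
the ranking of tiers from most to least dependable is an assumption drawn
from the descriptions above, and the ``example`` dict is a stand-in for
``SUPPORTED_FEATURES`` so the block runs without xrspatial installed.

```python
# Tier strings from this page, ranked most to least dependable
# (the ranking itself is an assumption of this sketch).
TIER_ORDER = ["stable", "advanced", "experimental", "internal_only"]


def features_outside(features, needed, floor="stable"):
    """Return the needed feature names whose tier is weaker than `floor`.

    Unknown names are reported too: a missing key means the mapping
    makes no promise about that feature.
    """
    limit = TIER_ORDER.index(floor)
    weak = []
    for name in needed:
        tier = features.get(name)
        if tier not in TIER_ORDER or TIER_ORDER.index(tier) > limit:
            weak.append(name)
    return weak


# Stand-in mapping; real code would pass
# xrspatial.geotiff.SUPPORTED_FEATURES instead.
example = {
    "writer.cog": "stable",
    "reader.http_cog": "advanced",
    "writer.gpu": "experimental",
}
print(features_outside(example, ["writer.cog", "writer.gpu"]))
```

Running a check like this at start-up turns a silent tier downgrade after an
upgrade into an explicit list of features to review.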


Recommended codecs
==================

Five codecs are tagged ``stable`` and form the lossless contract:

* ``none`` -- no compression (``COMPRESSION_NONE`` in the TIFF spec).
* ``deflate`` -- DEFLATE.
* ``lzw`` -- LZW.
* ``packbits`` -- PackBits.
* ``zstd`` -- Zstandard.

Each of these is lossless and round-trips byte-for-byte for integer and
float dtypes through the CPU writer and CPU reader. If you do not have
a reason to pick something else, write with one of these.

The following codecs are tagged ``experimental`` and require
``allow_experimental_codecs=True`` on :func:`xrspatial.geotiff.to_geotiff`:

* ``lerc`` -- Limited Error Raster Compression.
* ``jpeg2000`` and ``j2k`` -- JPEG 2000.
* ``lz4`` -- LZ4.

The ``jpeg`` codec is tagged ``internal_only``. It does not round-trip
through libtiff / GDAL / rasterio and the writer rejects it unless the
caller passes ``allow_internal_only_jpeg=True``. The general
``allow_experimental_codecs=True`` flag does not unlock it.

A file falls outside the stable codec contract whenever it uses a
non-``stable`` codec, or whenever it is read or written through a
non-``stable`` path (GPU, BigTIFF COG, HTTP COG, file-like destinations
with ``cog=True``).
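
The codec lists above can be turned into a small lookup that tells a caller
which writer opt-in a codec needs. This is a sketch, not library code:
``codec_opt_ins`` is a hypothetical helper, and the keyword that selects the
codec on ``to_geotiff`` is not named on this page, so it is left out.

```python
STABLE_CODECS = {"none", "deflate", "lzw", "packbits", "zstd"}
EXPERIMENTAL_CODECS = {"lerc", "jpeg2000", "j2k", "lz4"}


def codec_opt_ins(codec):
    """Return the writer opt-in kwargs a codec needs, per the lists above."""
    if codec in STABLE_CODECS:
        return {}
    if codec in EXPERIMENTAL_CODECS:
        return {"allow_experimental_codecs": True}
    if codec == "jpeg":
        # internal_only: the experimental flag does not unlock it.
        return {"allow_internal_only_jpeg": True}
    raise ValueError(f"codec not listed on this page: {codec!r}")


for name in ("zstd", "lz4", "jpeg"):
    print(name, codec_opt_ins(name))
```

The returned dict can be splatted into the writer call, which keeps the
opt-in visible at the call site instead of hard-coding the flag.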


COG output
==========

Pass ``cog=True`` to :func:`xrspatial.geotiff.to_geotiff` to write a
Cloud-optimized GeoTIFF. The writer emits an IFD-first, tiled layout
with internal overviews using a lossless codec.

The stable COG contract covers:

* Axis-aligned 2D / 3D rasters.
* CPU writer and CPU reader paths (``writer.cog`` and
  ``reader.local_cog`` are both ``stable``).
* Stable codecs only.
* Internal overviews only -- no ``.tif.ovr`` sidecars in the stable
  layout.
* Normal CRS, transform, dtype, nodata, band, and
  pixel-is-area / pixel-is-point round-trip.

The following combinations stay outside the stable contract even when
``cog=True`` is set:

* GPU COG read or write -- ``writer.gpu`` and ``reader.gpu`` are
  ``experimental``.
* Experimental codecs (``lerc``, ``jpeg2000`` / ``j2k``, ``lz4``) and
  the internal-only ``jpeg`` codec.
* Rotated transforms -- read-side ``allow_rotated=True`` is
  ``experimental``, and the writer drops rotation terms on round-trip.
* External ``.tif.ovr`` sidecars (``reader.sidecar_ovr`` is
  ``advanced``).
* File-like destinations with ``cog=True``.
* BigTIFF COG (``writer.bigtiff_cog`` is ``advanced``).
* HTTP / range COG (``reader.http_cog`` is ``advanced``).

If your pipeline relies on any of these, pin the xrspatial release and
treat the behaviour as opt-in rather than as part of the stable
contract.
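
One way to make that opt-in explicit is a pre-flight check that lists why a
planned COG operation leaves the stable contract. The helper below is a
hypothetical sketch: ``cog_contract_gaps`` and its flag names are invented
for illustration and only encode the two lists above.

```python
STABLE_CODECS = {"none", "deflate", "lzw", "packbits", "zstd"}


def cog_contract_gaps(codec, *, gpu=False, rotated=False, sidecar_ovr=False,
                      file_like_dest=False, bigtiff=False, http=False):
    """List the reasons a COG read/write falls outside the stable contract."""
    checks = [
        (codec not in STABLE_CODECS, f"codec {codec!r} is not stable"),
        (gpu, "GPU read/write is experimental"),
        (rotated, "rotated transforms are experimental"),
        (sidecar_ovr, "external .tif.ovr sidecars are advanced"),
        (file_like_dest, "file-like destinations with cog=True"),
        (bigtiff, "BigTIFF COG is advanced"),
        (http, "HTTP / range COG is advanced"),
    ]
    return [reason for failed, reason in checks if failed]


print(cog_contract_gaps("zstd"))            # empty list: inside the contract
print(cog_contract_gaps("lz4", gpu=True))   # two reasons reported
```

An empty list means the combination stays inside the stable contract as this
page describes it; anything else is a prompt to pin the release.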


Fail-closed errors
==================

The reader and writer raise typed errors instead of guessing when the
input is ambiguous or unsupported. The hierarchy lives in
:mod:`xrspatial.geotiff`. Every entry below subclasses
:class:`ValueError`, so existing ``except ValueError`` callers keep
catching them. Every entry except
:class:`~xrspatial.geotiff.UnsupportedGeoTIFFFeatureError` also subclasses
:class:`~xrspatial.geotiff.GeoTIFFAmbiguousMetadataError`, which catches
the ambiguous-metadata family at once.
:class:`~xrspatial.geotiff.UnsupportedGeoTIFFFeatureError` is a direct
``ValueError`` subclass and sits outside that family on purpose --
"we refuse this input" is distinct from "the input is malformed".

.. list-table::
   :header-rows: 1
   :widths: 30 50 20

   * - Error
     - Meaning
     - Opt-in or remedy
   * - :class:`~xrspatial.geotiff.InvalidCRSCodeError`
     - The CRS code does not resolve to a known authority entry.
     - Pass a valid EPSG code or full WKT.
   * - :class:`~xrspatial.geotiff.UnparseableCRSError`
     - The CRS string cannot be parsed as WKT or an authority code.
     - ``allow_unparseable_crs=True`` (experimental).
   * - :class:`~xrspatial.geotiff.RotatedTransformError`
     - The affine transform has non-zero rotation / shear terms.
     - ``allow_rotated=True`` (experimental). The opt-in returns the
       pixel grid without the geospatial assumption.
   * - :class:`~xrspatial.geotiff.NonUniformCoordsError`
     - The DataArray coords on write imply a non-uniform pixel grid.
     - Regrid the array to uniform spacing first.
   * - :class:`~xrspatial.geotiff.MixedBandMetadataError`
     - A VRT declares conflicting per-band metadata (most often
       disagreeing nodata sentinels).
     - ``band_nodata='first'`` to keep the legacy "use band 0" behaviour
       explicitly.
   * - :class:`~xrspatial.geotiff.ConflictingCRSError`
     - ``attrs['crs']`` and ``attrs['crs_wkt']`` do not canonicalise to
       the same WKT on write.
     - Resolve the conflict in caller code before writing.
   * - :class:`~xrspatial.geotiff.ConflictingNodataError`
     - ``attrs['nodata']`` and ``attrs['nodatavals']`` disagree on
       write.
     - Resolve in caller code; the writer will not pick one silently.
   * - ``VRTUnsupportedError``
     - The parsed VRT declares a feature the read pipeline does not
       honour (CRS / dtype / band / nodata / transform / pixel-size /
       window / resampling mismatch).
     - No opt-in. Either fix the VRT or read the sources directly.
   * - :class:`~xrspatial.geotiff.UnknownCRSModelTypeError`
     - The writer cannot classify an EPSG code as geographic or
       projected.
     - Pass a code pyproj can resolve, or install pyproj.
   * - :class:`~xrspatial.geotiff.UnsupportedGeoTIFFFeatureError`
     - The input declares a feature the GeoTIFF module does not
       implement (warped / reprojection VRTs, pansharpened or derived
       VRT subclasses, non-zero skew on a VRT mosaic source transform,
       and so on).
     - No opt-in. The error message names the feature and the source
       that triggered it.
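
Because the errors form a hierarchy, the order of ``except`` clauses matters:
the most specific class has to come first. The sketch below uses stand-in
classes that mirror the relationships described above so that it runs without
xrspatial; real code would import the classes from ``xrspatial.geotiff``
instead. ``run_guarded`` is a hypothetical helper.

```python
# Stand-ins mirroring the documented hierarchy (not the real classes).
class GeoTIFFAmbiguousMetadataError(ValueError):
    pass


class RotatedTransformError(GeoTIFFAmbiguousMetadataError):
    pass


class UnsupportedGeoTIFFFeatureError(ValueError):
    pass


def run_guarded(action):
    """Call `action` and report which error family, if any, it raised."""
    try:
        return "ok", action()
    except RotatedTransformError as exc:
        # Specific case with a documented opt-in (allow_rotated=True).
        return "rotated", str(exc)
    except GeoTIFFAmbiguousMetadataError as exc:
        # Any other member of the ambiguous-metadata family.
        return "ambiguous", str(exc)
    except UnsupportedGeoTIFFFeatureError as exc:
        # Outside the family: there is no opt-in for these.
        return "unsupported", str(exc)
```

A plain ``except ValueError`` placed before these clauses would swallow all
three cases, which is why it belongs last if it is used at all.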

Remote-read safety limits
=========================

When :func:`xrspatial.geotiff.open_geotiff` is pointed at an
``http://``, ``https://``, ``s3://``, ``gs://``, ``az://``, or
``memory://`` URI, the reader applies several bounded-read guards
before fetching pixel bytes.

Byte budget
-----------

The reader caps the total bytes pulled from a remote source via the
``max_cloud_bytes`` kwarg on
:func:`~xrspatial.geotiff.open_geotiff`. The resolution order is:

1. The ``max_cloud_bytes`` kwarg, if the caller passed one.
2. The ``XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES`` env var, if it is set to a
   positive integer.
3. The module default, 256 MiB. The constant lives at
   :data:`xrspatial.geotiff._sources.MAX_CLOUD_BYTES_DEFAULT`.

Pass ``max_cloud_bytes=None`` to disable the cap explicitly when the
caller has another reason to trust the source. The cap is a guard
against an unintended full-file fetch; it is not a substitute for an
explicit window or chunked read.
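
The resolution order can be sketched in a few lines. This is an illustration
of the documented precedence, not the library's implementation:
``resolve_max_cloud_bytes`` is a hypothetical name, and treating a
non-numeric or non-positive env value as unset is an assumption of the
sketch.

```python
import os

DEFAULT_MAX_CLOUD_BYTES = 256 * 1024 * 1024  # 256 MiB, per this page
_UNSET = object()  # distinguishes "not passed" from an explicit None


def resolve_max_cloud_bytes(kwarg=_UNSET, environ=os.environ):
    """Mirror the documented resolution order (illustrative only)."""
    # 1. An explicit kwarg wins; None means "no cap".
    if kwarg is not _UNSET:
        return kwarg
    # 2. The env var counts only when it parses as a positive integer.
    raw = environ.get("XRSPATIAL_GEOTIFF_MAX_CLOUD_BYTES", "")
    try:
        value = int(raw)
    except ValueError:
        value = 0
    if value > 0:
        return value
    # 3. Otherwise fall back to the module default.
    return DEFAULT_MAX_CLOUD_BYTES
```

The sentinel is what lets ``max_cloud_bytes=None`` mean "disable the cap"
while an omitted kwarg still falls through to the env var and the default.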

Private-host rejection
----------------------

HTTP / HTTPS reads resolve the URL's host and reject any address that
maps to a private, loopback, link-local, or otherwise non-public IP.
The check is on by default and exists to keep an SSRF-style request
from reaching an internal service. Set
``XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS=1`` to opt out when the caller
is intentionally targeting a host on a private network.
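
To show the kind of check involved, the sketch below classifies already
resolved IP literals with the standard-library ``ipaddress`` module. It is
not the reader's actual implementation: ``is_public_address`` and
``host_allowed`` are hypothetical names, the exact set of rejected ranges is
an assumption, and DNS resolution is left out.

```python
import ipaddress
import os


def is_public_address(text):
    """True when `text` is an IP literal outside the non-public ranges."""
    ip = ipaddress.ip_address(text)
    return not (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local
        or ip.is_multicast
        or ip.is_reserved
        or ip.is_unspecified
    )


def host_allowed(resolved_ips, environ=os.environ):
    """Reject a host if any resolved address is non-public, unless opted out."""
    if environ.get("XRSPATIAL_GEOTIFF_ALLOW_PRIVATE_HOSTS") == "1":
        return True
    return all(is_public_address(ip) for ip in resolved_ips)
```

Checking every resolved address rather than only the first is the
conservative choice, since a hostname can resolve to a mix of public and
private addresses.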

Timeouts
--------

Two env vars control the HTTP timeouts on remote reads:

* ``XRSPATIAL_GEOTIFF_HTTP_CONNECT_TIMEOUT`` -- connect timeout in
  seconds.
* ``XRSPATIAL_GEOTIFF_HTTP_READ_TIMEOUT`` -- read timeout in seconds.

Both fall back to the module default when unset.

Strict mode
-----------

``XRSPATIAL_GEOTIFF_STRICT=1`` flips several "warn and continue" sites
to "raise". The flag affects CRS resolution, VRT validation, and a
handful of decode-side fallback paths. Use it in CI when you want a
hard failure on metadata that the default path would tolerate.
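
In a test suite it is often convenient to scope these variables to a single
block. The context manager below is a generic standard-library sketch;
``geotiff_env`` is a hypothetical name. It assumes the library reads the
variables at call time. If they are read at import time instead, set them
before importing ``xrspatial.geotiff``.

```python
import os
from contextlib import contextmanager


@contextmanager
def geotiff_env(**overrides):
    """Temporarily set environment variables, restoring them on exit."""
    saved = {name: os.environ.get(name) for name in overrides}
    os.environ.update({name: str(value) for name, value in overrides.items()})
    try:
        yield
    finally:
        for name, old in saved.items():
            if old is None:
                os.environ.pop(name, None)   # was unset before
            else:
                os.environ[name] = old       # restore previous value


with geotiff_env(XRSPATIAL_GEOTIFF_STRICT=1,
                 XRSPATIAL_GEOTIFF_HTTP_READ_TIMEOUT=30):
    # Reads issued here would see strict mode and a 30 s read timeout.
    print(os.environ["XRSPATIAL_GEOTIFF_STRICT"])
```

The timeout value of 30 seconds is an arbitrary example, not a recommended
setting.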

Other env vars
--------------

* ``XRSPATIAL_GEOTIFF_MMAP_CACHE_SIZE`` -- caps the mmap cache size for
  local-file reads. Default 32.

The full list of env vars lives in the source under
:mod:`xrspatial.geotiff._sources` and :mod:`xrspatial.geotiff._runtime`.
The user-facing names above cover everything a caller normally
configures.


See also
========

* :ref:`reference.geotiff` -- the API reference for every public name on
  the module, including signatures, kwargs, and the stable COG contract
  text.
* :ref:`reference.geotiff_release_contract` -- the user-facing release
  contract that enumerates every feature in
  :data:`xrspatial.geotiff.SUPPORTED_FEATURES` against its tier.
* :ref:`reference.geotiff_release_gate` -- the release-gate audit
  checklist that ties each ``stable`` promise to a regression test.
* :ref:`user_guide.attrs_contract` -- the round-trip contract for the
  ``DataArray.attrs`` mapping that the reader emits and the writer
  consumes.
1 change: 1 addition & 0 deletions docs/source/user_guide/index.rst
@@ -18,4 +18,5 @@ User Guide
surface
zonal
attrs_contract
geotiff_safe_io
local-migration