Surfaced by #2290 / #2289 while wiring up the conda-forge rasterio CI workflow.
Symptom
xrspatial/geotiff/tests/golden_corpus/test_corpus_determinism.py::test_fixture_bytes_are_deterministic fails for two fixtures when run against the GDAL / libjpeg shipped by conda-forge today:
cog_internal_overview_uint16: committed md5 ad1c9adf5958ba10f19319c59e99e1dc, regenerated md5 3f5111931f0c6e24c90f7139376bc72b.
compression_jpeg_uint8_ycbcr: committed md5 ccf1387a7799cc30065882874de5d709, regenerated md5 d3702bde1e887945740796257ff80dc2.
Both are sensitive to GDAL internals (overview pyramid encoding) and libjpeg version (JPEG encoder output), so the byte mismatch is expected when the toolchain differs from the one that originally produced the fixtures.
Workaround in #2290
The new pytest-geotiff-corpus workflow passes --ignore for test_corpus_determinism.py so the parity oracle still runs in CI. The oracle and nodata tests compare semantic output via rasterio, which is what we actually want from the conda-forge lane.
What needs to happen here
Decide how the determinism test should behave when the toolchain differs. Options worth weighing:
1. Pin GDAL / libjpeg versions in setup.cfg and in the corpus generator's docstring so anyone regenerating gets a deterministic result. The CI workflow then matches those pins. Brittle long-term but explicit.
2. Loosen the determinism test: compare semantic content (rasterio read of the file) instead of file-byte md5 for fixtures known to depend on the encoder (COG overviews, JPEG). Keep md5 checks for the simple uncompressed fixtures.
3. Regenerate the two drifting fixtures against the conda-forge toolchain and commit the new bytes. Trades the current local-machine baseline for a conda-forge baseline; better for CI but worse for developers who don't have conda-forge installed.
Option 2 (loosening the test to a semantic comparison) is probably the right shape, since the test's stated purpose is "the generator is reproducible," not "the encoder bytes are stable across versions."
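A rough sketch of what the loosened check could look like, to make the proposal concrete. Everything here is an assumption rather than existing code: the ENCODER_SENSITIVE table, the tolerance values, and the helper names fixtures_match and read_with_rasterio are hypothetical. Only the two fixture names come from the failures above. The JPEG fixture gets a small per-pixel tolerance because a different libjpeg may decode to slightly different values; the COG fixture is compared exactly at full resolution, and this sketch does not inspect the overview levels.

```python
import hashlib
from pathlib import Path

# Hypothetical table: fixture name -> allowed absolute per-pixel difference.
# Fixtures not listed here keep the strict md5 byte comparison.
ENCODER_SENSITIVE = {
    "cog_internal_overview_uint16": 0,   # lossless, so decoded pixels should match exactly
    "compression_jpeg_uint8_ycbcr": 2,   # lossy; the tolerance value is a guess
}


def file_md5(path):
    """Return the hex md5 of a file's bytes."""
    return hashlib.md5(Path(path).read_bytes()).hexdigest()


def read_with_rasterio(path):
    """Decode a GeoTIFF into plain Python values (rasterio is imported lazily)."""
    import rasterio

    with rasterio.open(path) as src:
        arr = src.read()  # full-resolution data, all bands
        return {
            "shape": arr.shape,
            "dtype": str(arr.dtype),
            "crs": str(src.crs),
            "pixels": arr.ravel().tolist(),
        }


def fixtures_match(name, committed, regenerated, reader=read_with_rasterio):
    """Byte-compare ordinary fixtures; semantically compare encoder-sensitive ones."""
    tol = ENCODER_SENSITIVE.get(name)
    if tol is None:
        return file_md5(committed) == file_md5(regenerated)

    a, b = reader(committed), reader(regenerated)
    # Structural metadata must agree before pixel values are compared.
    if any(a[key] != b[key] for key in ("shape", "dtype", "crs")):
        return False
    return all(abs(x - y) <= tol for x, y in zip(a["pixels"], b["pixels"]))
```

The reader argument is injectable so the comparison logic can be unit-tested without rasterio installed. In the real test, fixtures_match would replace the bare md5 assertion inside test_fixture_bytes_are_deterministic.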
Tracking this so the new conda-forge workflow can drop its --ignore once the determinism test is toolchain-agnostic.