
Add Geospatial geography files with statistics#109

Merged: pitrou merged 4 commits into apache:master from paleolimbot:geospatial-geography-files on Jun 11, 2026

@paleolimbot (Member)

This PR adds some geography test files with statistics that may be useful for testing implementations. Notably, geography statistics can have xmin > xmax, so that row group statistics can "wrap around" the antimeridian (e.g., so that statistics for ship positions in the Pacific Ocean, or for a catalogue of wildlife in Fiji, do not have longitude bounds that span the globe).
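To make the wraparound convention concrete, here is a minimal sketch of how a reader might decide whether a row group can be skipped for a longitude-range query. It assumes only what is stated above: an x (longitude) range with xmin > xmax covers [xmin, 180] plus [-180, xmax]. The function names are hypothetical and not part of any Parquet library, and the y (latitude) range, which never wraps, is left out.

```python
def lon_interval_contains(xmin: float, xmax: float, lon: float) -> bool:
    """True if lon lies in the longitude range [xmin, xmax].

    When xmin > xmax the range wraps across the antimeridian and is
    read as [xmin, 180] plus [-180, xmax].
    """
    if xmin <= xmax:
        return xmin <= lon <= xmax
    return lon >= xmin or lon <= xmax


def _pieces(lo: float, hi: float) -> list[tuple[float, float]]:
    """Split a possibly wrapping range into at most two non-wrapping pieces."""
    if lo <= hi:
        return [(lo, hi)]
    return [(lo, 180.0), (-180.0, hi)]


def lon_intervals_intersect(a_min: float, a_max: float,
                            b_min: float, b_max: float) -> bool:
    """True if two longitude ranges (either of which may wrap) overlap."""
    return any(
        p_lo <= q_hi and q_lo <= p_hi
        for p_lo, p_hi in _pieces(a_min, a_max)
        for q_lo, q_hi in _pieces(b_min, b_max)
    )


def can_skip_row_group(stats_xmin: float, stats_xmax: float,
                       query_xmin: float, query_xmax: float) -> bool:
    """Hypothetical pruning check: skip only if the ranges cannot overlap."""
    return not lon_intervals_intersect(stats_xmin, stats_xmax,
                                       query_xmin, query_xmax)
```

For a row group of Fiji observations with statistics xmin = 177 and xmax = -178, a query over [0, 10] can be pruned, while a query over [170, 180] or [-179, -170] cannot. A reader that ignored the wraparound and treated the range as [-178, 177] would wrongly prune exactly the rows near Fiji.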

I recently implemented this in SedonaDB (apache/sedona-db#805) based on the pluggable statistics writer in arrow-rs (apache/arrow-rs#8414).

The underlying stats come from s2geometry's S2LatLngRectBounder (https://github.com/google/s2geometry/blob/master/src/s2/s2latlng_rect_bounder.h) via s2geography (https://github.com/paleolimbot/s2geography/blob/main/src/s2geography/coverings.h#L13-L19). I'd love to simplify that into a self-contained implementation, but certain aspects of bounding on the sphere (e.g., handling a polygon that contains the north pole) are non-trivial.
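For the simplest case, points only, the narrowest wraparound longitude range can be found by sorting the longitudes and excluding the widest empty gap between neighbours. The sketch below is a hypothetical illustration of that idea and is not what S2LatLngRectBounder does: it ignores latitude entirely, and it does not handle edges (which on the sphere can bulge toward a pole between their endpoints) or polygons that contain a pole, which is where the non-trivial parts lie.

```python
def wraparound_lon_bounds(lons: list[float]) -> tuple[float, float]:
    """Return (xmin, xmax) covering all longitudes with the narrowest range.

    Points only: sort the longitudes, find the widest empty gap between
    neighbours (including the gap that crosses the antimeridian), and
    report the complement of that gap. xmin > xmax means the range wraps.
    """
    if not lons:
        raise ValueError("need at least one longitude")
    xs = sorted(lons)
    # Start with the gap that crosses the antimeridian (largest -> smallest);
    # excluding it gives an ordinary, non-wrapping range.
    best_gap = xs[0] + 360.0 - xs[-1]
    xmin, xmax = xs[0], xs[-1]
    for left, right in zip(xs[:-1], xs[1:]):
        gap = right - left
        if gap > best_gap:
            # The empty region is (left, right); cover everything else,
            # which wraps across the antimeridian.
            best_gap = gap
            xmin, xmax = right, left
    return xmin, xmax
```

For example, points at longitudes 178, 179, -179 and -178 produce (178, -178), a 4-degree range, instead of the 357-degree range (-179, 179) that a naive min/max would report.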

The files contain points distributed basically uniformly on the sphere, segments (basically sequential points sorted along a Hilbert curve), and polygons (buffered points, basically rectangles). Both the line and polygon files have some geographies that cross the antimeridian, and every file has at least two row groups with wraparound statistics. Every file also has at least one geometry intersecting the north pole and one intersecting the south pole (for polygons, the geometry contains the pole).
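The PR doesn't say exactly how the files were generated, so the following is only a hypothetical reconstruction of the two ideas named above: area-uniform sampling on the sphere (longitude uniform, sine of latitude uniform) and ordering points along a Hilbert curve so that consecutive points are usually close together. The helper names are made up, and the Hilbert curve here is computed on a plain lon/lat grid rather than on S2's cube-face cells.

```python
import math
import random


def uniform_points_on_sphere(n: int, rng: random.Random) -> list[tuple[float, float]]:
    """Sample n (lon, lat) points in degrees, uniformly by area on the sphere."""
    points = []
    for _ in range(n):
        lon = rng.uniform(-180.0, 180.0)
        # sin(lat) uniform on [-1, 1] gives an area-uniform distribution.
        lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        points.append((lon, lat))
    return points


def hilbert_index(order: int, x: int, y: int) -> int:
    """Distance along a Hilbert curve of side 2**order for integer cell (x, y)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def to_cell(lon: float, lat: float, order: int) -> tuple[int, int]:
    """Map lon/lat degrees onto a 2**order x 2**order integer grid."""
    n = 1 << order
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def hilbert_sorted(points: list[tuple[float, float]], order: int = 16) -> list[tuple[float, float]]:
    """Order points by their Hilbert index on the lon/lat grid."""
    return sorted(points, key=lambda p: hilbert_index(order, *to_cell(p[0], p[1], order)))


rng = random.Random(42)
points = uniform_points_on_sphere(1000, rng)
path = hilbert_sorted(points)
# Pair consecutive points to form (mostly short) line segments.
segments = list(zip(path[:-1], path[1:]))
```

Sorting on a lon/lat grid keeps most consecutive points close together, but the curve never joins x = -180 to x = 180, so segments that cross the antimeridian would have to be added deliberately.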

These aren't exhaustive cases for geographical testing, but the addition of wraparound statistics will hopefully help ensure that pruning is correct.

@jiayuasu (Member) left a comment

Exciting!

@pitrou (Member) commented May 13, 2026

Hmm, can the files be smaller?

@paleolimbot (Member, Author)

That's a great point... the biggest one is now 60 KB and gets the same point (or line or polygon, as it may be) across.

@pitrou (Member) commented May 13, 2026

> That's a great point... the biggest one is now 60 KB and gets the same point (or line or polygon, as it may be) across.

Across the antimeridian, right?

@pitrou (Member) commented May 13, 2026

> Both lines and polygon have some geographies that cross the antimeridian, and all the files have at least two row groups with wraparound statistics. All the files have at least one geometry intersecting the north pole and one intersecting the south pole (for polygons, the geometry contains it).

Can you add this to the geospatial README?

@pitrou (Member) commented Jun 11, 2026

Sorry for forgetting about this, @paleolimbot. This LGTM; are you OK with merging it?

@paleolimbot (Member, Author)

No problem!

Yes, these are good to merge.

@pitrou merged commit 1a2a751 into apache:master on Jun 11, 2026

3 participants