
add vortex-geo crate and WKB extension type #7722

Merged
a10y merged 17 commits into develop from aduffy/geo-v0
Jun 9, 2026

@a10y a10y commented Apr 29, 2026

Contributor

Summary

Part of the implementation of #7686

This PR adds a new crate, vortex-geo, which will hold all extension types, custom layouts, and encodings necessary to store geospatial vector datasets in Vortex files.

The goal of this crate is to enable integration with DuckDB Spatial, GeoDataFusion, SedonaDB, and Iceberg v3 geo types.

This initial PR implements an extension type for the Well-Known Binary (WKB) encoding. WKB is the most common format for geospatial data in analytics; it's what both GeoParquet and DuckDB use to represent geometry types.
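To make the format concrete, here is a minimal sketch of the simplest WKB geometry, a 2D Point: one byte-order byte (0 = big-endian, 1 = little-endian), a 4-byte geometry type code (1 = Point), then X and Y as 8-byte IEEE-754 doubles, for 21 bytes in total. The `encode_point`/`decode_point` helpers are hypothetical illustrations, not part of the vortex-geo API.

```rust
/// Encode a 2D point as little-endian WKB (21 bytes total).
fn encode_point(x: f64, y: f64) -> Vec<u8> {
    let mut buf = Vec::with_capacity(21);
    buf.push(1u8); // byte order: 1 = little-endian
    buf.extend_from_slice(&1u32.to_le_bytes()); // geometry type 1 = Point
    buf.extend_from_slice(&x.to_le_bytes());
    buf.extend_from_slice(&y.to_le_bytes());
    buf
}

/// Decode a 2D WKB Point written in either byte order.
/// Returns None for malformed input or any non-Point geometry.
fn decode_point(wkb: &[u8]) -> Option<(f64, f64)> {
    if wkb.len() != 21 {
        return None;
    }
    let little_endian = match wkb[0] {
        0 => false,
        1 => true,
        _ => return None,
    };
    // Read a u32 at offset `i`, honoring the declared byte order.
    let u32_at = |i: usize| {
        let mut b = [0u8; 4];
        b.copy_from_slice(&wkb[i..i + 4]);
        if little_endian { u32::from_le_bytes(b) } else { u32::from_be_bytes(b) }
    };
    // Read an f64 at offset `i`, honoring the declared byte order.
    let f64_at = |i: usize| {
        let mut b = [0u8; 8];
        b.copy_from_slice(&wkb[i..i + 8]);
        if little_endian { f64::from_le_bytes(b) } else { f64::from_be_bytes(b) }
    };
    if u32_at(1) != 1 {
        return None; // not a Point
    }
    Some((f64_at(5), f64_at(13)))
}
```

For example, `encode_point(1.5, -2.0)` yields 21 bytes that `decode_point` maps back to `Some((1.5, -2.0))`. Real geometries (LineString, Polygon, Multi*, Z/M variants) add counts and nesting on top of this header.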

We also wire this into the Vortex DuckDB extension to support converting GEOMETRY columns between the Vortex and DuckDB formats.

API Changes

Adds a new crate, vortex-geo, with the extension type WellKnownBinary (geo.wkb).

Also adds support for geometry columns in DuckDB, e.g. you can now do something like:

SELECT building_id, building_name, ST_Area(geometry) AS area 
FROM read_vortex("buildings.vortex")
ORDER BY area DESC
LIMIT 10;

It won't be very performant until we support better layouts and stats for geometry.

Testing

We have unit tests for metadata serde and for round-trip conversion between the Vortex and DuckDB geometry formats, plus an E2E test that reads a Vortex file with geometry data and passes it to the DuckDB Spatial extension.

@a10y a10y force-pushed the aduffy/geo-v0 branch 2 times, most recently from a445f7d to 1c99386 April 29, 2026 20:51
codspeed-hq Bot commented Apr 29, 2026


Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 1523 untouched benchmarks


Comparing aduffy/geo-v0 (acad7c7) with develop (f2148d4)

@a10y a10y added the changelog/feature A new feature label Apr 29, 2026
@a10y a10y force-pushed the aduffy/geo-v0 branch 3 times, most recently from 3382e4f to 05dcff9 May 1, 2026 16:02
@a10y a10y marked this pull request as ready for review May 1, 2026 16:24
Comment on lines +28 to +34
duckdb_logical_type duckdb_vx_create_geometry(const char *crs) {
    D_ASSERT(crs);
    auto geom =
        (*crs == '\0') ? duckdb::LogicalType::GEOMETRY() : duckdb::LogicalType::GEOMETRY(std::string(crs));
    auto copy = duckdb::make_uniq<duckdb::LogicalType>(std::move(geom));
    return reinterpret_cast<duckdb_logical_type>(copy.release());
}

Contributor Author

These C++ changes are needed because upstream DuckDB doesn't expose the full GEOMETRY type functionality over its C API.
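The shim above treats an empty `crs` string as "no CRS". A minimal sketch of how a Rust caller might build that argument from an optional CRS; `crs_arg` is a hypothetical helper, not the PR's actual binding code.

```rust
use std::ffi::{CString, NulError};

/// Build the `const char *crs` argument for a shim like `duckdb_vx_create_geometry`.
/// `None` maps to the empty string, which the C++ side turns into a GEOMETRY
/// type without a CRS. Fails if the CRS contains an interior NUL byte.
fn crs_arg(crs: Option<&str>) -> Result<CString, NulError> {
    CString::new(crs.unwrap_or(""))
}
```

The returned `CString` must stay alive for the whole FFI call, because `as_ptr()` only borrows it; `as_ptr()` is never null, which also satisfies the shim's `D_ASSERT(crs)`.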

Comment thread vortex-duckdb/cpp/logical_type.cpp
Comment thread vortex-duckdb/src/datasource.rs Outdated
Comment on lines -100 to -102
let blob = unsafe { cpp::duckdb_get_blob(self.as_ptr()) };
let slice =
    unsafe { std::slice::from_raw_parts(blob.data.cast::<u8>(), blob.size.as_()) };

@a10y a10y May 1, 2026

Contributor Author

in certain situations this could lead to UB depending on the allocator: if duckdb_malloc returns NULL, you'd end up calling slice::from_raw_parts(NULL, ...), which is UB even for a zero-length slice.

see the new take_blob function below

i only fixed this b/c i was going to repeat the same logic for the GEOMETRY branch and figured i'd fix it in both places instead
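The underlying rule is that `std::slice::from_raw_parts` requires a non-null, aligned pointer even when the length is zero. A minimal sketch of a guard, using a hypothetical `bytes_from_raw` helper; this is not the PR's `take_blob`, whose body isn't shown here.

```rust
/// View `len` bytes at `data` as a slice, mapping a null pointer
/// (or a zero length) to an empty slice instead of calling `from_raw_parts`.
///
/// # Safety
/// If `data` is non-null, it must point to `len` initialized bytes that stay
/// valid and unmodified for the lifetime `'a`.
unsafe fn bytes_from_raw<'a>(data: *const u8, len: usize) -> &'a [u8] {
    if data.is_null() || len == 0 {
        // `from_raw_parts` must never see a null pointer, even for len == 0.
        &[]
    } else {
        // SAFETY: upheld by the caller per the contract above.
        unsafe { std::slice::from_raw_parts(data, len) }
    }
}
```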

}

- pub unsafe fn set_vector_buffer(&self, buffer: &VectorBufferRef) {
+ pub unsafe fn set_vector_buffer(&mut self, buffer: &VectorBufferRef) {

Contributor Author

these methods should always have taken &mut self for soundness reasons; that was just missing before
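A minimal sketch of the soundness point, using a hypothetical `OwnedVector` type in place of the real DuckDB vector wrapper: when the setter takes `&mut self`, the borrow checker refuses to replace the buffer while a shared borrow into the old one is still alive. A `&self` setter that mutated through a raw pointer would let that borrow dangle.

```rust
/// Hypothetical stand-in for a vector that owns an auxiliary buffer.
struct OwnedVector {
    buffer: Vec<u8>,
}

impl OwnedVector {
    /// Borrow the current buffer contents.
    fn data(&self) -> &[u8] {
        &self.buffer
    }

    /// Replacing the buffer requires exclusive access, so no `&[u8]` obtained
    /// from `data()` can still be in use when the old buffer is dropped.
    fn set_buffer(&mut self, buffer: Vec<u8>) {
        self.buffer = buffer;
    }
}
```

With this signature, `let d = v.data(); v.set_buffer(vec![]); println!("{d:?}");` is rejected at compile time instead of reading freed memory.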

@a10y a10y requested review from 0ax1 and joseph-isaacs May 1, 2026 18:51
}

#[test]
fn test_geometry() {

Contributor Author

see this for a (rather simple) example of querying geometry data from DuckDB

@a10y a10y force-pushed the aduffy/geo-v0 branch from 134b470 to f499553 May 4, 2026 13:17
Comment thread vortex-geo/src/extension/wkb.rs Outdated
type NativeValue<'a> = Wkb<'a>;

fn id(&self) -> ExtId {
    ExtId::new_static("geo.wkb")

Contributor

let's think about this id a little more.

What namespace do we want?

Contributor Author

yep, this is just a placeholder. we can do vortex.geo.wkb; idk, we don't have great rules around this rn

Contributor Author

i updated this to vortex.geo.wkb, wdyt?

Comment thread vortex-duckdb/src/exporter/mod.rs Outdated
Comment thread vortex-duckdb/src/exporter/geo.rs Outdated
Comment on lines 273 to +308

fn temporal_to_duckdb(temporal: TemporalMetadata) -> VortexResult<LogicalType> {
    let duckdb_type = match temporal {
        TemporalMetadata::Timestamp(unit, None) => match unit {
            TimeUnit::Nanoseconds => DUCKDB_TYPE::DUCKDB_TYPE_TIMESTAMP_NS,
            TimeUnit::Microseconds => DUCKDB_TYPE::DUCKDB_TYPE_TIMESTAMP,
            TimeUnit::Milliseconds => DUCKDB_TYPE::DUCKDB_TYPE_TIMESTAMP_MS,
            TimeUnit::Seconds => DUCKDB_TYPE::DUCKDB_TYPE_TIMESTAMP_S,
            _ => vortex_bail!("Invalid TimeUnit {} for timestamp", unit),
        },
        TemporalMetadata::Timestamp(unit, Some(tz)) => {
            if tz.as_ref() != "UTC" {
                vortex_bail!("Invalid timezone for timestamp_tz {tz}, must be UTC");
            }
            if unit != &TimeUnit::Microseconds {
                vortex_bail!(
                    "Invalid TimeUnit {} for timestamp_tz, must be Microseconds",
                    unit
                );
            }
            DUCKDB_TYPE::DUCKDB_TYPE_TIMESTAMP_TZ
        }
        TemporalMetadata::Date(unit) => match unit {
            TimeUnit::Days => DUCKDB_TYPE::DUCKDB_TYPE_DATE,
            _ => vortex_bail!("Invalid TimeUnit {} for date", unit),
        },
        TemporalMetadata::Time(unit) => match unit {
            TimeUnit::Microseconds => DUCKDB_TYPE::DUCKDB_TYPE_TIME,
            TimeUnit::Nanoseconds => DUCKDB_TYPE::DUCKDB_TYPE_TIME_NS,
            _ => vortex_bail!("Invalid TimeUnit {} for time", unit),
        },
    };

    Ok(LogicalType::new(duckdb_type))
}

Contributor

shall we also move these into an ext mod?

Comment thread vortex-duckdb/cpp/value.cpp
Comment thread vortex-duckdb/cpp/include/duckdb_vx/value.h Outdated
gatesn commented May 17, 2026

Contributor

Is this still blocked on the Arrow stuff, @a10y? Is that moving, or do you want it taken over? Keen for this to merge this time!

a10y commented May 18, 2026

Contributor Author

The Arrow stuff is merged, so this is unblocked now. I will find some time this week to fix this up and get it merged.

a10y added 3 commits May 18, 2026 11:00
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
a10y added 6 commits May 18, 2026 11:06
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y force-pushed the aduffy/geo-v0 branch from f499553 to 8352d6c May 23, 2026 14:17
a10y added 4 commits May 23, 2026 10:21
Resolve conflict in vortex-duckdb/src/convert/dtype.rs by keeping
both new match arms: DUCKDB_TYPE_GEOMETRY (from this branch) and
DUCKDB_TYPE_VARIANT (from develop).

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
…aths

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Resolved conflict in vortex-duckdb/cpp/value.cpp and value.h by adopting
develop's removal of duckdb_diagnostics.h wrapping (PR #7747) while
preserving the new geometry value APIs and geometry_crs.hpp include.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@0ax1 0ax1 removed their request for review June 2, 2026 09:07
Comment thread vortex-duckdb/src/duckdb/logical_type.rs
Comment thread vortex-duckdb/src/exporter/canonical.rs Outdated
Comment on lines +42 to +48
if ext.ext_dtype().is::<AnyTemporal>() {
    return temporal::new_exporter(TemporalArray::try_from(ext)?, ctx);
}

if ext.ext_dtype().is::<WellKnownBinary>() {
    return geo::new_wkb_exporter(WellKnownBinaryData::try_from(ext)?, ctx);
}

@joseph-isaacs joseph-isaacs Jun 4, 2026

Contributor

mind moving this out of the base canonical exporter?

Contributor Author

hmm, how should i thread it thru then? geometry is a core type in duckdb

Contributor

if array.is_ext() {
    extension::new_exporter(array...)
}
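The suggestion amounts to a single extension entry point that the canonical exporter delegates to, with per-type branches living in their own module. A minimal, self-contained sketch of that shape; `ExtDType`, `DType`, and `Exporter` are hypothetical stand-ins, not the real vortex-duckdb types.

```rust
/// Hypothetical extension dtypes the exporter understands.
#[derive(Debug)]
enum ExtDType {
    Temporal,
    WellKnownBinary,
    Other(String),
}

/// Hypothetical logical type of an array being exported.
enum DType {
    Primitive,
    Extension(ExtDType),
}

/// Hypothetical stand-in for a DuckDB vector exporter.
#[derive(Debug, PartialEq)]
enum Exporter {
    Flat,
    Temporal,
    Wkb,
}

mod extension {
    use super::{ExtDType, Exporter};

    /// Single entry point for every extension type; a new type gets a new
    /// arm here instead of a new branch in the canonical exporter.
    pub fn new_exporter(ext: &ExtDType) -> Result<Exporter, String> {
        match ext {
            ExtDType::Temporal => Ok(Exporter::Temporal),
            ExtDType::WellKnownBinary => Ok(Exporter::Wkb),
            ExtDType::Other(id) => Err(format!("unsupported extension type {id}")),
        }
    }
}

/// The canonical exporter only checks *whether* an array is an extension.
fn new_exporter(dtype: &DType) -> Result<Exporter, String> {
    match dtype {
        DType::Extension(ext) => extension::new_exporter(ext),
        DType::Primitive => Ok(Exporter::Flat),
    }
}
```

This keeps type-specific imports (temporal, geo) out of the base exporter, at the cost of one extra level of dispatch.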

Contributor Author

done

Comment thread vortex-duckdb/cpp/logical_type.cpp
Signed-off-by: Andrew Duffy <andrew@a10y.dev>
@a10y a10y requested a review from a team June 9, 2026 15:24
@a10y a10y enabled auto-merge (squash) June 9, 2026 15:24
a10y added 3 commits June 9, 2026 08:45
Develop bumped workspace prost to 0.14.4 via lockfile maintenance, but
vortex-geo only exists on this branch so its Cargo.lock entry still
pointed at 0.14.3. Fix the lockfile so `cargo check --locked` succeeds
after merging develop.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
… module

`vortex_array` is only in vortex-duckdb's [dev-dependencies], so lib code
can't reference it directly. Switch to the `vortex::array::*` re-exports
that the neighboring exporter modules already use.

Also drop the unused `ConversionCache` import and add the missing
`use crate::exporter::geo;` so the bare `geo::new_wkb_exporter(...)` call
resolves to the sibling module instead of an external `geo` crate.

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
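A sibling module is not automatically in scope inside another module, so a bare `geo::...` path either fails to resolve or, if a crate named `geo` is a dependency, refers to that crate. A minimal sketch of the fix with a hypothetical `exporter` module tree mirroring the one described above; the function bodies are placeholders.

```rust
mod exporter {
    pub mod geo {
        /// Hypothetical stand-in for the WKB exporter constructor.
        pub fn new_wkb_exporter() -> &'static str {
            "wkb exporter"
        }
    }

    pub mod canonical {
        // Without this import, `geo` is not in scope here: sibling modules are
        // only reachable via `super::geo` or `crate::exporter::geo`.
        use crate::exporter::geo;

        pub fn export_wkb() -> &'static str {
            geo::new_wkb_exporter()
        }
    }
}
```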
@a10y a10y merged commit efd3e9b into develop Jun 9, 2026
62 checks passed
@a10y a10y deleted the aduffy/geo-v0 branch June 9, 2026 16:32

Contributor

:(

Contributor Author

ah crap

Contributor Author

fixed now #8394

a10y added a commit that referenced this pull request Jun 10, 2026
## Summary

Part of #7686 

Following up on #7722, which added the WellKnownBinary extension type
and handling for exporting to DuckDB vectors.

This PR adds support for import/export to Arrow for the extension type.

## Testing

Unit tests are added to exercise both code paths

---------

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
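Arrow has no built-in geometry type, so extension types travel as field metadata under the standard keys `ARROW:extension:name` and `ARROW:extension:metadata`. Assuming the Arrow mapping targets the GeoArrow `geoarrow.wkb` extension (the follow-up's description above doesn't say which name it uses), a minimal sketch of detecting such a field from its metadata map, without depending on the arrow crate; `is_geoarrow_wkb` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Standard Arrow field-metadata key carrying an extension type's name.
const EXTENSION_NAME_KEY: &str = "ARROW:extension:name";

/// Returns true if a field's metadata marks it as GeoArrow WKB.
/// Assumes the `geoarrow.wkb` name; the binary storage type would be
/// checked separately against the field's data type.
fn is_geoarrow_wkb(metadata: &HashMap<String, String>) -> bool {
    metadata
        .get(EXTENSION_NAME_KEY)
        .is_some_and(|name| name == "geoarrow.wkb")
}
```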
@a10y a10y mentioned this pull request Jun 12, 2026
a10y added a commit that referenced this pull request Jun 12, 2026
We are no longer using the lockfiles, but I accidentally added this one
as part of #7722

Signed-off-by: Andrew Duffy <andrew@a10y.dev>
Labels

changelog/feature A new feature

4 participants