feat[gpu]: arrow device array stream support #8483
Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
This reverts commit 52952d2.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
Merging this PR will degrade performance by 10.99%

| | Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|---|
| ❌ | Simulation | decompress_rd[f64, (10000, 0.01)] | 108.7 µs | 139.1 µs | -21.89% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.1)] | 109 µs | 139.5 µs | -21.85% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.0)] | 108.7 µs | 139.1 µs | -21.83% |
| ❌ | Simulation | decompress_rd[f32, (100000, 0.0)] | 496 µs | 583.8 µs | -15.05% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.1)] | 78.1 µs | 91.2 µs | -14.43% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.01)] | 78.1 µs | 91 µs | -14.2% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.0)] | 78.5 µs | 91.2 µs | -13.91% |
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 206.8 µs | 170.2 µs | +21.45% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(100, 100)] | 307.1 µs | 272.5 µs | +12.71% |

Comparing ad/arrow-device-array-stream (f1a8999) with develop (35e4d72)
Force-pushed from 3ae0d00 to 07926a0
The Arrow C device array stream export drove the Vortex stream on a private CurrentThreadRuntime, but a partition scan spawns its decode work onto the session's runtime (vortex-ffi's RUNTIME). Nothing ever drove that runtime during streaming, so the first get_next on a real partition deadlocked waiting on tasks that never ran. The existing tests only exercise an inert in-memory stream, so they never hit it.

Thread the session's runtime through export_device_array_stream and drive the stream and per-array exports on it, removing the private runtime and worker pool. Expose vortex_ffi::runtime() so layered FFI crates can pass the same runtime the partition's scan spawns onto.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
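To make the failure mode concrete, here is a toy model of it. This is a sketch and not Vortex code: `ToyRuntime`, `scan_partition`, and the free function `get_next` are hypothetical stand-ins, and the real hang is replaced by a `None` result so the example terminates.

```rust
use std::cell::RefCell;
use std::collections::VecDeque;
use std::rc::Rc;

/// Hypothetical single-threaded runtime: queued tasks run only when
/// somebody calls `drive` on this particular runtime.
#[derive(Default)]
struct ToyRuntime {
    queue: RefCell<VecDeque<Box<dyn FnOnce()>>>,
}

impl ToyRuntime {
    fn spawn(&self, task: impl FnOnce() + 'static) {
        self.queue.borrow_mut().push_back(Box::new(task));
    }

    /// Runs queued tasks until the queue is empty and returns how many ran.
    fn drive(&self) -> usize {
        let mut ran = 0;
        loop {
            // Pop in its own statement so the borrow ends before the task runs.
            let task = self.queue.borrow_mut().pop_front();
            match task {
                Some(task) => {
                    task();
                    ran += 1;
                }
                None => return ran,
            }
        }
    }
}

type Slot = Rc<RefCell<Option<Vec<i32>>>>;

/// Stand-in for a partition scan: the decode work is spawned onto `session`.
fn scan_partition(session: &ToyRuntime) -> Slot {
    let slot: Slot = Rc::new(RefCell::new(None));
    let out = Rc::clone(&slot);
    session.spawn(move || *out.borrow_mut() = Some(vec![1, 2, 3]));
    slot
}

/// Stand-in for the stream's `get_next`: drives `driver`, then checks whether
/// the decoded chunk has arrived. The real code would block instead of
/// returning `None`.
fn get_next(driver: &ToyRuntime, slot: &Slot) -> Option<Vec<i32>> {
    driver.drive();
    slot.borrow_mut().take()
}
```

Driving a private runtime leaves the decode task sitting in the session's queue, which is the toy equivalent of the deadlock; driving the session's runtime lets the task run and the chunk come back.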
The device stream derives its schema from the first array and rejects any later array whose Arrow schema differs. The Arrow C stream contract requires this, but it means a stream whose chunks vary their encoding (a dictionary-encoded chunk among plain chunks) fails mid-stream.

Document this on the trait, note that an empty stream reports a dtype-derived schema that can differ from the schema of a non-empty run, and sharpen the mismatch error to name the cause.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
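The first-array rule described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `ToySchema`, `SchemaGuard`, and the error wording are hypothetical, and a schema is reduced to a list of (field name, type name) pairs.

```rust
/// Hypothetical stand-in for an Arrow schema: (field name, type name) pairs.
#[derive(Debug, Clone, PartialEq, Eq)]
struct ToySchema(Vec<(String, String)>);

fn schema(fields: &[(&str, &str)]) -> ToySchema {
    ToySchema(
        fields
            .iter()
            .map(|(name, ty)| (name.to_string(), ty.to_string()))
            .collect(),
    )
}

/// Remembers the schema of the first array and rejects later mismatches.
#[derive(Default)]
struct SchemaGuard {
    expected: Option<ToySchema>,
}

impl SchemaGuard {
    fn check(&mut self, next: &ToySchema) -> Result<(), String> {
        // The first array seen fixes the schema for the rest of the stream.
        let first = self.expected.get_or_insert_with(|| next.clone());
        if *first == *next {
            Ok(())
        } else {
            Err(format!(
                "array schema {:?} differs from stream schema {:?}; \
                 chunks of one stream must share a single Arrow schema",
                next, first
            ))
        }
    }
}
```

With this guard, two plain chunks pass, and a later chunk whose column is reported with a dictionary type is rejected with an error that names the mismatch.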
Add a shared ArrowDeviceArray::empty() constructor and build the end-of-stream marker from it, replacing the hand-rolled struct literal. The stream tests now call the module-level release_schema/release_device_array helpers instead of redefining byte-for-byte copies, and drop the duplicate empty_device_array placeholder in favor of ArrowDeviceArray::empty().

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
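For readers unfamiliar with the end-of-stream convention, a minimal sketch of what such a constructor could look like is shown below. The struct layouts follow the Arrow C data and C device data interfaces; the body of `empty()` and the `is_released` helper are assumptions for illustration, not the definitions in this PR.

```rust
use std::ffi::c_void;
use std::ptr;

/// Mirrors `struct ArrowArray` from the Arrow C data interface.
#[repr(C)]
pub struct ArrowArray {
    pub length: i64,
    pub null_count: i64,
    pub offset: i64,
    pub n_buffers: i64,
    pub n_children: i64,
    pub buffers: *mut *const c_void,
    pub children: *mut *mut ArrowArray,
    pub dictionary: *mut ArrowArray,
    pub release: Option<unsafe extern "C" fn(*mut ArrowArray)>,
    pub private_data: *mut c_void,
}

/// Mirrors `struct ArrowDeviceArray` from the Arrow C device data interface.
#[repr(C)]
pub struct ArrowDeviceArray {
    pub array: ArrowArray,
    pub device_id: i64,
    pub device_type: i32,
    pub sync_event: *mut c_void,
    pub reserved: [i64; 3],
}

impl ArrowDeviceArray {
    /// Assumed shape of the constructor: every field zeroed or null.
    /// A null `release` callback marks the array as released, which is how
    /// a C stream's `get_next` signals end of stream.
    pub fn empty() -> Self {
        Self {
            array: ArrowArray {
                length: 0,
                null_count: 0,
                offset: 0,
                n_buffers: 0,
                n_children: 0,
                buffers: ptr::null_mut(),
                children: ptr::null_mut(),
                dictionary: ptr::null_mut(),
                release: None,
                private_data: ptr::null_mut(),
            },
            device_id: 0,
            device_type: 0,
            sync_event: ptr::null_mut(),
            reserved: [0; 3],
        }
    }

    /// Hypothetical helper: true when the array carries no release callback.
    pub fn is_released(&self) -> bool {
        self.array.release.is_none()
    }
}
```

Centralizing the literal in one constructor means a future field addition only has to be handled in one place, which is the duplication the commit removes from the tests.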
Several doc and line comments added for the Arrow device array stream exceeded the 100-column limit. Wrap them; no behavior change.

Signed-off-by: Alexander Droste <alexander.droste@protonmail.com>
myrrc left a comment
Changes look good to me, but this PR would benefit from adding C-side tests.
Adds Arrow device stream support, which is exercised and tested through the cuDF harness.