Use R2 bucket for duckdb libraries#8486
Conversation
8129660 to
8af34b4
Compare
Merging this PR will not alter performance
|
| Mode | Benchmark | BASE |
HEAD |
Efficiency | |
|---|---|---|---|---|---|
| ❌ | Simulation | decompress_rd[f64, (10000, 0.01)] |
108.7 µs | 139.1 µs | -21.89% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.1)] |
109 µs | 139.5 µs | -21.85% |
| ❌ | Simulation | decompress_rd[f64, (10000, 0.0)] |
108.7 µs | 139.1 µs | -21.83% |
| ❌ | Simulation | decompress_rd[f32, (100000, 0.0)] |
496 µs | 583.8 µs | -15.05% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.1)] |
78.1 µs | 91.2 µs | -14.43% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.01)] |
78.1 µs | 91 µs | -14.2% |
| ❌ | Simulation | decompress_rd[f32, (10000, 0.0)] |
78.5 µs | 91.2 µs | -13.91% |
| ⚡ | Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] |
206.8 µs | 170.2 µs | +21.46% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[128] |
215.3 ns | 186.1 ns | +15.67% |
| ⚡ | Simulation | chunked_varbinview_into_canonical[(100, 100)] |
307.1 µs | 272.8 µs | +12.59% |
| ⚡ | Simulation | bitwise_not_vortex_buffer_mut[1024] |
275.6 ns | 246.4 ns | +11.84% |
Tip
Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.
Comparing myrrc/duckdb-r2-infra (3c6ac96) with develop (575db9c)
c3d1f15 to
d3fc1da
Compare
|
Macos deployment error: |
DuckdbFS implementation for Vortex was introduced in #6198 as opt-out, but changed to opt-in in #6564 due to performance regressions. There were multiple issues (#6709, #6565 #6685) associated with it which differ from vortex's file system behaviour. It also requires additional dependencies CI which are a blocker for #8486 since MacOS runner doesn't bundle openssl for x86_64 on arm, and builds fail. As a long term goal, calling duckdb's blocking IO inside our event loop isn't the right abstraction. We want to allow duckdb to use its own IO outside vortex. Duckdb fs is also not maintaned actively so we're removing it Signed-off-by: Mikhail Kot <mikhail@spiraldb.com>
e12de4f to
59d0cd7
Compare
b32cb14 to
3c6ac96
Compare
|
I've verified manually this works for commits, they are uploaded and used from R2 as well. |
0ax1
left a comment
There was a problem hiding this comment.
Great to see this coming into shape! Couple of questions inline. In general, will this be able to handle diff build configs: debug, release, asan etc.? The build config should prob be part of the key with which we store builds.
| needs: duckdb-mirror | ||
| if: ${{ !cancelled() }} | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 5 |
There was a problem hiding this comment.
What's the rationale for the timeout duration here?
| uses: ./.github/workflows/duckdb-r2.yml | ||
| secrets: inherit | ||
|
|
||
| duckdb-ready: |
There was a problem hiding this comment.
Do we really need this, or asked diff can we encode the dependency?
| @@ -0,0 +1,197 @@ | |||
| name: DuckDB R2 mirror | |||
|
|
|||
| # Mirror DuckDB libraries referenced by vortex-duckdb/build.rs to R2 when they | |||
There was a problem hiding this comment.
What does mirror mean exactly? Should we extend the text here a bit on how the whole setup works with R2 and caching?
| runs-on: ubuntu-latest | ||
| timeout-minutes: 10 | ||
| outputs: | ||
| version: ${{ steps.resolve.outputs.version }} |
There was a problem hiding this comment.
Do we support individual commits? (by default DDB sets sth like version 0.0.0 or so right ?)
There was a problem hiding this comment.
Yes, we support commits.
| - name: Resolve version and check R2 | ||
| id: resolve | ||
| run: | | ||
| set -Eeuo pipefail |
There was a problem hiding this comment.
Bit complex and long to inline a shell script into the GH action, wdyt?
| matrix='{"include":[]}' | ||
| any_missing=false | ||
| else | ||
| include=$(printf '%s\n' "${entries[@]}" | jq -sc '.') |
There was a problem hiding this comment.
If the runners have python preinstalled, we could maybe consider that.
|
|
||
| lib_dir="${src_dir}/build/release/src" | ||
| stage="stage" | ||
| rm -rf "$stage" |
There was a problem hiding this comment.
How come we need to clear the dir here? Isn't this empty for each new runner?
Yes, but for the first version it won't to keep the diff small |
| // extensions statically, otherwise DuckDB tries to load them from an http | ||
| // endpoint with version 0.0.1 (all non-tagged builds) which doesn't exist. | ||
| let static_extensions = match version { | ||
| DuckDBVersion::Release(_) => "parquet;jemalloc", |
There was a problem hiding this comment.
It's not an extension anymore, the build prints this as a warning in the log.
| timeout-minutes: 5 | ||
| steps: | ||
| - name: Verify DuckDB mirror | ||
| if: ${{ needs.duckdb-mirror.result == 'failure' }} |
There was a problem hiding this comment.
There's more result types than success and failure. We could consider if: ${{ needs.duckdb-mirror.result != 'success' }} to signal a failure here.
There was a problem hiding this comment.
Apparently there's also: needs.check.outputs.any_missing == 'true'`.
Uh oh!
There was an error while loading. Please reload this page.