
Use R2 bucket for duckdb libraries #8486

Open
myrrc wants to merge 1 commit into develop from myrrc/duckdb-r2-infra

myrrc commented Jun 18, 2026 (Contributor)
  • Use the ci-builds.vortex.dev R2 bucket as the source for DuckDB release and commit builds.
  • Mirror release builds from the DuckDB GitHub releases page. Build commits from source.
  • For commit builds, also try to download from R2 (useful for testing pre-release builds in CI).
  • Run test_geometry for DuckDB release builds only. Running it for commit builds would mean bundling the "spatial" extension, which is hard on macOS due to openssl-dev symbols.
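The resolution order in the bullets above can be sketched from build.rs's point of view. This is a minimal, hypothetical sketch: the `DuckDbVersion` and `Source` enums, `candidate_sources`, the `https://ci-builds.vortex.dev/duckdb/<version>/<artifact>` URL layout, and trying R2 before a source build are all assumptions for illustration, not the PR's actual code (the real build.rs has its own `DuckDBVersion` type).

```rust
/// Hypothetical stand-in for the version pinned by vortex-duckdb/build.rs.
#[derive(Debug, Clone, PartialEq)]
enum DuckDbVersion {
    /// A tagged DuckDB release, e.g. "v1.3.0".
    Release(String),
    /// A git commit hash of an untagged DuckDB build.
    Commit(String),
}

/// One place build.rs could try to obtain libduckdb from.
#[derive(Debug, Clone, PartialEq)]
enum Source {
    /// Download a prebuilt artifact from this R2 URL.
    R2(String),
    /// Compile DuckDB from source at this commit.
    BuildFromSource(String),
}

/// Assumed public URL of the ci-builds R2 bucket (illustrative only).
const R2_BASE: &str = "https://ci-builds.vortex.dev";

/// Returns the sources to try, in order, for a given DuckDB version.
fn candidate_sources(version: &DuckDbVersion, artifact: &str) -> Vec<Source> {
    match version {
        // Releases are mirrored into R2 from GitHub, so R2 is the only source.
        DuckDbVersion::Release(tag) => {
            vec![Source::R2(format!("{}/duckdb/{}/{}", R2_BASE, tag, artifact))]
        }
        // Commits: reuse a prebuilt artifact from R2 if one was uploaded,
        // otherwise fall back to building from source.
        DuckDbVersion::Commit(sha) => vec![
            Source::R2(format!("{}/duckdb/{}/{}", R2_BASE, sha, artifact)),
            Source::BuildFromSource(sha.clone()),
        ],
    }
}
```

Under this assumed ordering, once one CI job has built and uploaded a commit, later jobs only need the download.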

myrrc requested a review from a team June 18, 2026 09:30
myrrc force-pushed the myrrc/duckdb-r2-infra branch from 8129660 to 8af34b4 June 18, 2026 09:31
myrrc marked this pull request as draft June 18, 2026 09:35
codspeed-hq (Bot) commented Jun 18, 2026

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

⚡ 4 improved benchmarks
❌ 7 regressed benchmarks
✅ 1570 untouched benchmarks

Warning

Please fix the performance issues or acknowledge them on CodSpeed.

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
|---|---|---|---|---|
| Simulation | decompress_rd[f64, (10000, 0.01)] | 108.7 µs | 139.1 µs | -21.89% |
| Simulation | decompress_rd[f64, (10000, 0.1)] | 109 µs | 139.5 µs | -21.85% |
| Simulation | decompress_rd[f64, (10000, 0.0)] | 108.7 µs | 139.1 µs | -21.83% |
| Simulation | decompress_rd[f32, (100000, 0.0)] | 496 µs | 583.8 µs | -15.05% |
| Simulation | decompress_rd[f32, (10000, 0.1)] | 78.1 µs | 91.2 µs | -14.43% |
| Simulation | decompress_rd[f32, (10000, 0.01)] | 78.1 µs | 91 µs | -14.2% |
| Simulation | decompress_rd[f32, (10000, 0.0)] | 78.5 µs | 91.2 µs | -13.91% |
| Simulation | chunked_varbinview_opt_canonical_into[(1000, 10)] | 206.8 µs | 170.2 µs | +21.46% |
| Simulation | bitwise_not_vortex_buffer_mut[128] | 215.3 ns | 186.1 ns | +15.67% |
| Simulation | chunked_varbinview_into_canonical[(100, 100)] | 307.1 µs | 272.8 µs | +12.59% |
| Simulation | bitwise_not_vortex_buffer_mut[1024] | 275.6 ns | 246.4 ns | +11.84% |

Tip

Investigate this regression by commenting @codspeedbot fix this regression on this PR, or directly use the CodSpeed MCP with your agent.


Comparing myrrc/duckdb-r2-infra (3c6ac96) with develop (575db9c)


myrrc force-pushed the myrrc/duckdb-r2-infra branch from c3d1f15 to d3fc1da June 18, 2026 10:44
myrrc temporarily deployed to duckdb-build (3 deployments) June 18, 2026 10:45 with GitHub Actions (inactive)
myrrc temporarily deployed to duckdb-build (3 deployments) June 18, 2026 11:17 with GitHub Actions (inactive)
myrrc requested a review from joseph-isaacs June 18, 2026 11:19
myrrc marked this pull request as ready for review June 18, 2026 11:19
myrrc marked this pull request as draft June 18, 2026 11:19
myrrc temporarily deployed to duckdb-build (2 deployments) June 18, 2026 13:39 with GitHub Actions (inactive)
myrrc commented Jun 18, 2026 (Contributor Author)

macOS deployment error:

error:  cannot create /Users/runner/work/vortex/vortex/duckdb-08e34c447bae34eaee3723cac61f2878b6bdf787/data/csv/issue2628_������.csv
        Illegal byte sequence

myrrc temporarily deployed to duckdb-build June 19, 2026 09:53 with GitHub Actions (inactive)
myrrc mentioned this pull request Jun 19, 2026
myrrc requested a review from AdamGS June 19, 2026 10:16
myrrc added a commit that referenced this pull request Jun 19, 2026
The DuckdbFS implementation for Vortex was introduced in #6198 as opt-out, but changed to opt-in in #6564 due to performance regressions. There were multiple issues (#6709, #6565, #6685) associated with it, where its behaviour differed from that of Vortex's file system.

It also requires additional CI dependencies, which are a blocker for #8486, since the macOS runner doesn't bundle OpenSSL for x86_64 on arm, and builds fail.

As a long-term goal, calling DuckDB's blocking IO inside our event loop isn't the right abstraction. We want to allow DuckDB to use its own IO outside Vortex.

DuckdbFS is also not actively maintained, so we're removing it.

Signed-off-by: Mikhail Kot <mikhail@spiraldb.com>
myrrc force-pushed the myrrc/duckdb-r2-infra branch from e12de4f to 59d0cd7 June 19, 2026 11:02
myrrc temporarily deployed to duckdb-build (3 deployments) June 19, 2026 13:06 with GitHub Actions (inactive)
Signed-off-by: Mikhail Kot <mikhail@spiraldb.com>
myrrc force-pushed the myrrc/duckdb-r2-infra branch from b32cb14 to 3c6ac96 June 19, 2026 13:44
myrrc temporarily deployed to duckdb-build (3 deployments) June 19, 2026 13:44 with GitHub Actions (inactive)
myrrc marked this pull request as ready for review June 19, 2026 13:46
myrrc requested review from robert3005 and removed request for joseph-isaacs June 19, 2026 13:47
myrrc commented Jun 19, 2026 (Contributor Author)

I've manually verified that this works for commits: they are uploaded to and used from R2 as well.

myrrc added the ext/duckdb (Relates to the DuckDB integration) label Jun 22, 2026
myrrc enabled auto-merge (squash) June 22, 2026 13:22
robert3005 requested a review from 0ax1 June 22, 2026 13:25

0ax1 left a comment (Contributor)

Great to see this coming into shape! A couple of questions inline. In general, will this be able to handle different build configs: debug, release, ASan, etc.? The build config should probably be part of the key under which we store builds.
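One way to act on this suggestion is to fold the build configuration into the R2 object key, so debug, release and ASan builds of the same DuckDB version never overwrite each other. This is a minimal sketch; the `BuildConfig` enum, the `r2_key` function and the `duckdb/<version>/<config>/<target>/<artifact>` layout are hypothetical, not what the PR implements.

```rust
/// Hypothetical build flavours a cached DuckDB artifact could come from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BuildConfig {
    Debug,
    Release,
    Asan,
}

impl BuildConfig {
    /// Stable, lowercase name used as a path segment in the object key.
    fn as_str(self) -> &'static str {
        match self {
            BuildConfig::Debug => "debug",
            BuildConfig::Release => "release",
            BuildConfig::Asan => "asan",
        }
    }
}

/// Builds an R2 object key from every input that changes the binary:
/// DuckDB version, build config and target triple.
fn r2_key(version: &str, config: BuildConfig, target: &str, artifact: &str) -> String {
    format!("duckdb/{}/{}/{}/{}", version, config.as_str(), target, artifact)
}
```

Including the target triple is a further assumption; it keeps, for example, x86_64 and arm macOS builds of the same version apart.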

Comment thread .github/workflows/ci.yml
needs: duckdb-mirror
if: ${{ !cancelled() }}
runs-on: ubuntu-latest
timeout-minutes: 5

Contributor

What's the rationale for the timeout duration here?

Contributor Author

Chosen randomly

Comment thread .github/workflows/ci.yml
uses: ./.github/workflows/duckdb-r2.yml
secrets: inherit

duckdb-ready:

Contributor

Do we really need this, or, asked differently, can we encode the dependency directly?

@@ -0,0 +1,197 @@
name: DuckDB R2 mirror

# Mirror DuckDB libraries referenced by vortex-duckdb/build.rs to R2 when they

Contributor

What does "mirror" mean exactly? Should we extend the text here a bit to explain how the whole setup works with R2 and caching?

Contributor Author

+1
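As described in the PR summary, "mirroring" means making sure every DuckDB library that build.rs needs is present in the ci-builds R2 bucket: release builds are copied from DuckDB's GitHub releases, and commit builds are compiled from source, then uploaded. The per-artifact decision can be sketched as follows; `MirrorAction`, `plan` and the `present_in_r2` flag are hypothetical names, and the actual workflow does this in shell.

```rust
/// Hypothetical DuckDB version kinds the mirror workflow distinguishes.
#[derive(Debug, Clone, PartialEq)]
enum DuckDbVersion {
    Release(String),
    Commit(String),
}

/// What the mirror job would do for one artifact.
#[derive(Debug, Clone, PartialEq)]
enum MirrorAction {
    /// Already in R2: nothing to do.
    Skip,
    /// Download the prebuilt library for this GitHub release tag, then upload to R2.
    CopyFromGitHubRelease(String),
    /// Build DuckDB at this commit from source, then upload to R2.
    BuildAndUpload(String),
}

/// Decides the action given whether the artifact already exists in R2.
fn plan(version: &DuckDbVersion, present_in_r2: bool) -> MirrorAction {
    if present_in_r2 {
        return MirrorAction::Skip;
    }
    match version {
        DuckDbVersion::Release(tag) => MirrorAction::CopyFromGitHubRelease(tag.clone()),
        DuckDbVersion::Commit(sha) => MirrorAction::BuildAndUpload(sha.clone()),
    }
}
```

In this sketch the `present_in_r2` check is what makes the mirror behave like a cache: each artifact is produced at most once, and later runs only download it.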

runs-on: ubuntu-latest
timeout-minutes: 10
outputs:
version: ${{ steps.resolve.outputs.version }}

Contributor

Do we support individual commits? (By default, DDB sets something like version 0.0.0, right?)

Contributor Author

Yes, we support commits.
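Supporting both kinds of version means telling a release tag apart from a commit. A minimal sketch, assuming releases are written as `v<major>.<minor>.<patch>` and commits as full 40-character hex SHAs (like the `08e34c447bae34eaee3723cac61f2878b6bdf787` checkout in the macOS error earlier in this thread); `parse_version` is a hypothetical helper, not the actual build.rs logic.

```rust
/// Hypothetical classification of a pinned DuckDB version string.
#[derive(Debug, PartialEq)]
enum DuckDbVersion {
    Release(String),
    Commit(String),
}

/// Returns `None` for strings that are neither a release tag nor a full SHA.
fn parse_version(input: &str) -> Option<DuckDbVersion> {
    let s = input.trim();
    // A full git SHA-1 commit hash: exactly 40 hex digits.
    if s.len() == 40 && s.chars().all(|c| c.is_ascii_hexdigit()) {
        return Some(DuckDbVersion::Commit(s.to_ascii_lowercase()));
    }
    // A release tag: "v" followed by three non-empty numeric components.
    let rest = s.strip_prefix('v')?;
    let parts: Vec<&str> = rest.split('.').collect();
    if parts.len() == 3
        && parts
            .iter()
            .all(|p| !p.is_empty() && p.chars().all(|c| c.is_ascii_digit()))
    {
        return Some(DuckDbVersion::Release(s.to_string()));
    }
    None
}
```

Keying on the commit hash rather than the version DuckDB reports matters because, as the build.rs comment in this PR notes, untagged builds all report version 0.0.1.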

- name: Resolve version and check R2
id: resolve
run: |
set -Eeuo pipefail

Contributor

It's a bit complex and long to inline a shell script in the GH action, wdyt?

Contributor Author

+1

matrix='{"include":[]}'
any_missing=false
else
include=$(printf '%s\n' "${entries[@]}" | jq -sc '.')

Contributor

If the runners have Python preinstalled, we could consider using it instead.
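For comparison, the matrix-building shell fragment above (`matrix='{"include":[]}'` when nothing is missing, otherwise slurping the entries with `jq -sc '.'`) stays short in a general-purpose language. The sketch uses Rust to match the rest of the repository rather than Python. `build_matrix` is hypothetical, and it assumes each entry is already a compact JSON object and that the non-empty branch sets `any_missing=true`, which the quoted excerpt does not show.

```rust
/// Hypothetical port of the shell snippet: turns per-artifact JSON objects
/// (assumed to be valid, compact JSON) into a GitHub Actions matrix string
/// plus an `any_missing` flag.
fn build_matrix(entries: &[String]) -> (String, bool) {
    if entries.is_empty() {
        // Same as: matrix='{"include":[]}'; any_missing=false
        return (r#"{"include":[]}"#.to_string(), false);
    }
    // Same idea as: printf '%s\n' "${entries[@]}" | jq -sc '.'
    // (collect the objects into one JSON array). Unlike jq, this neither
    // validates nor re-serializes the entries.
    let include = format!("[{}]", entries.join(","));
    (format!(r#"{{"include":{}}}"#, include), true)
}
```

Production code would more likely build the structure with a JSON library such as serde_json and serialize it once, which also guarantees valid output.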


lib_dir="${src_dir}/build/release/src"
stage="stage"
rm -rf "$stage"

Contributor

Why do we need to clear the dir here? Isn't it empty on each new runner?

myrrc commented Jun 22, 2026 (Contributor Author)

> In general, will this be able to handle diff build configs: debug, release, asan etc.?

Yes, but the first version won't, to keep the diff small.

vortex-data deleted a comment from github-actions (Bot) Jun 22, 2026
vortex-data deleted a comment from 0ax1 Jun 22, 2026
Comment thread vortex-duckdb/build.rs
// extensions statically, otherwise DuckDB tries to load them from an http
// endpoint with version 0.0.1 (all non-tagged builds) which doesn't exist.
let static_extensions = match version {
DuckDBVersion::Release(_) => "parquet;jemalloc",

Contributor

Why remove jemalloc?

Contributor Author

It's not an extension anymore; the build prints a warning about this in the log.

Comment thread .github/workflows/ci.yml
timeout-minutes: 5
steps:
- name: Verify DuckDB mirror
if: ${{ needs.duckdb-mirror.result == 'failure' }}

Contributor

There are more result types than success and failure. We could consider `if: ${{ needs.duckdb-mirror.result != 'success' }}` to signal a failure here.

Contributor

Apparently there's also `needs.check.outputs.any_missing == 'true'`.
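In GitHub Actions, `needs.<job_id>.result` can be `success`, `failure`, `cancelled` or `skipped`, so a `== 'failure'` check lets a cancelled or skipped mirror job through. The difference can be sketched as follows. `JobResult` and the gate functions are hypothetical; `mirror_gate` is only one possible reading of the `any_missing` remark, assuming the mirror job is legitimately skipped when nothing is missing from R2.

```rust
/// Possible values of `needs.<job_id>.result` in GitHub Actions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum JobResult {
    Success,
    Failure,
    Cancelled,
    Skipped,
}

impl JobResult {
    fn parse(s: &str) -> Option<JobResult> {
        match s {
            "success" => Some(JobResult::Success),
            "failure" => Some(JobResult::Failure),
            "cancelled" => Some(JobResult::Cancelled),
            "skipped" => Some(JobResult::Skipped),
            _ => None,
        }
    }
}

// Each gate returns true when the verify step should fail.

/// Current condition: `needs.duckdb-mirror.result == 'failure'`.
fn gate_equals_failure(r: JobResult) -> bool {
    r == JobResult::Failure
}

/// Suggested condition: `needs.duckdb-mirror.result != 'success'`.
fn gate_not_success(r: JobResult) -> bool {
    r != JobResult::Success
}

/// Hypothetical combination with the check job's output: only require a
/// successful mirror when something was actually missing from R2.
fn mirror_gate(r: JobResult, any_missing: bool) -> bool {
    any_missing && r != JobResult::Success
}
```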


Labels: changelog/ci, ext/duckdb (Relates to the DuckDB integration)

Projects: None yet

2 participants