Use R2 bucket for duckdb libraries #8486
base: develop
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -25,6 +25,25 @@ env: | |
| NIGHTLY_TOOLCHAIN: nightly-2026-02-05 | ||
|
|
||
| jobs: | ||
| duckdb-mirror: | ||
| name: "Mirror DuckDB to R2" | ||
| if: github.event_name == 'pull_request' | ||
| uses: ./.github/workflows/duckdb-r2.yml | ||
| secrets: inherit | ||
|
|
||
| duckdb-ready: | ||
| name: "DuckDB libraries available in R2" | ||
| needs: duckdb-mirror | ||
| if: ${{ !cancelled() }} | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 5 | ||
|
Contributor
What's the rationale for the timeout duration here?
Contributor
Author
Chosen randomly
||
| steps: | ||
| - name: Verify DuckDB mirror | ||
| if: ${{ needs.duckdb-mirror.result == 'failure' }} | ||
|
Contributor
There are more result types than success and failure. We could consider
Contributor
Apparently there's also:
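To make the concern in this thread concrete: `needs.<job_id>.result` can be `success`, `failure`, `cancelled`, or `skipped`, so a check for `failure` alone would let a cancelled mirror run through. Below is a minimal sketch of a stricter gate, assuming the result is handed to the step as a plain string. `mirror_gate` is a hypothetical helper, not part of this PR, and treating `skipped` as acceptable is an assumption (the mirror job only runs on pull requests).

```shell
# Hypothetical helper (not in this PR): takes the value of
# needs.duckdb-mirror.result and returns 0 if downstream builds may proceed.
mirror_gate() {
  case "$1" in
    # Assumption: "skipped" is fine, since the mirror job only runs on PRs.
    success|skipped) return 0 ;;
    # failure, cancelled, or any unexpected value blocks downstream jobs.
    *) return 1 ;;
  esac
}

for result in success skipped failure cancelled; do
  if mirror_gate "$result"; then
    echo "$result: proceed"
  else
    echo "$result: block"
  fi
done
```

Whether `cancelled` should block or pass is a judgment call; the sketch errs on the side of blocking.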
||
| run: | | ||
| echo "DuckDB mirror failed; downstream builds would 404" | ||
| exit 1 | ||
|
|
||
| lint-toml: | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 10 | ||
|
|
@@ -115,6 +134,7 @@ jobs: | |
|
|
||
| rust-docs: | ||
| name: "Rust (docs)" | ||
| needs: duckdb-ready | ||
| timeout-minutes: 30 | ||
| runs-on: >- | ||
| ${{ github.repository == 'vortex-data/vortex' | ||
|
|
@@ -204,6 +224,7 @@ jobs: | |
|
|
||
| rust-lint: | ||
| name: "Rust (lint)" | ||
| needs: duckdb-ready | ||
| timeout-minutes: 30 | ||
| runs-on: >- | ||
| ${{ github.repository == 'vortex-data/vortex' | ||
|
|
@@ -301,6 +322,7 @@ jobs: | |
|
|
||
| rust-test-other: | ||
| name: "Rust tests (${{ matrix.os }})" | ||
| needs: duckdb-ready | ||
| timeout-minutes: 30 | ||
| strategy: | ||
| fail-fast: false | ||
|
|
@@ -422,6 +444,7 @@ jobs: | |
|
|
||
| sqllogic-test: | ||
| name: "SQL logic tests" | ||
| needs: duckdb-ready | ||
| runs-on: >- | ||
| ${{ github.repository == 'vortex-data/vortex' | ||
| && format('runs-on={0}/runner=amd64-medium/image=ubuntu24-full-x64-pre-v2/tag=sql-logic-test', github.run_id) | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,197 @@ | ||
| name: DuckDB R2 mirror | ||
|
|
||
| # Mirror DuckDB libraries referenced by vortex-duckdb/build.rs to R2 when they | ||
|
Contributor
What does "mirror" mean exactly? Should we extend the comment here a bit to explain how the whole setup works with R2 and caching?
Contributor
Author
+1
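As the workflow reads, "mirror" appears to mean: copy each DuckDB archive into the R2 bucket under `<ref_dir>/<archive>`, then probe `PUBLIC_BASE_URL` with a `HEAD` request to decide whether anything is missing. A small sketch of how the probed URL is put together; `mirror_url` is a hypothetical helper and `v1.4.0` is an illustrative version, neither is taken from the PR.

```shell
PUBLIC_BASE_URL="https://ci-builds.vortex.dev"

# Hypothetical helper: public URL the check step probes for an archive.
mirror_url() {
  # $1 = ref dir (a "vX.Y.Z" tag or a commit), $2 = archive file name
  printf '%s/%s/%s\n' "$PUBLIC_BASE_URL" "$1" "$2"
}

# "v1.4.0" is an illustrative version, not taken from the PR.
mirror_url "v1.4.0" "libduckdb-linux-amd64.zip"
```

An archive counts as present only when that URL answers HTTP 200; any other status puts it into the build matrix.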
||
| # are not present yet. Download tagged archives or build commits from source. | ||
| on: | ||
| workflow_call: { } | ||
|
|
||
| concurrency: | ||
| group: duckdb-r2-${{ github.event.pull_request.number || github.ref }} | ||
| cancel-in-progress: false | ||
|
|
||
| permissions: | ||
| contents: read | ||
|
|
||
| env: | ||
| PUBLIC_BASE_URL: "https://ci-builds.vortex.dev" | ||
| R2_BUCKET: "duckdb-builds" | ||
| R2_ENDPOINT_URL: "https://52bdeab5651e1584747feefd051fd566.r2.cloudflarestorage.com" | ||
|
|
||
| jobs: | ||
| check: | ||
| name: "Resolve DuckDB version and check R2" | ||
| runs-on: ubuntu-latest | ||
| timeout-minutes: 10 | ||
| outputs: | ||
| version: ${{ steps.resolve.outputs.version }} | ||
|
Contributor
Do we support individual commits? (By default DDB sets something like version 0.0.0 or so, right?)
Contributor
Author
Yes, we support commits.
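For reference, the release-versus-commit rule used by the resolve step can be sketched on its own. This is a POSIX `sh` restatement under the assumption that it matches the inline bash regex; `classify_ref` is a hypothetical name and the sample inputs are illustrative.

```shell
# Hypothetical helper restating the rule from the resolve step:
# prints "<release> <ref_dir>" for a DuckDB version string.
classify_ref() {
  ref="${1#v}"   # drop an optional leading "v"
  if printf '%s\n' "$ref" | grep -Eq '^[0-9]+(\.[0-9]+)+$'; then
    echo "true v$ref"    # tagged release: ref dir is "vX.Y.Z"
  else
    echo "false $ref"    # anything else is treated as a commit
  fi
}

# Illustrative inputs, not taken from the PR.
classify_ref "v1.4.0"
classify_ref "1.4"
classify_ref "0b83e5d2f6"
```

Tagged releases are downloaded from the DuckDB GitHub releases page, while commits are built from source in the `mirror` job.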
||
| ref_dir: ${{ steps.resolve.outputs.ref_dir }} | ||
| release: ${{ steps.resolve.outputs.release }} | ||
| matrix: ${{ steps.resolve.outputs.matrix }} | ||
| any_missing: ${{ steps.resolve.outputs.any_missing }} | ||
| steps: | ||
| - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6 | ||
| - name: Resolve version and check R2 | ||
| id: resolve | ||
| run: | | ||
| set -Eeuo pipefail | ||
|
Contributor
It's a bit complex and long to inline this shell script into the GH action, wdyt?
Contributor
Author
+1
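One possible way to shorten the inline script would be to move self-contained pieces into a checked-in script. As a sketch, the archive-to-runner mapping could live in a function like the one below. `runner_for` and the idea of a separate script file are hypothetical; the mapping values are copied from the step in this diff.

```shell
# Hypothetical function; the mapping itself is copied from the inline step.
# Prints "<runner> <os> <arch>", or fails for an unknown archive name.
runner_for() {
  case "$1" in
    *linux-amd64*)   echo "ubuntu-latest linux amd64" ;;
    *linux-arm64*)   echo "ubuntu-24.04-arm linux arm64" ;;
    *osx-universal*) echo "macos-14 osx universal" ;;
    *) echo "unknown archive: $1" >&2; return 1 ;;
  esac
}

runner_for "libduckdb-linux-arm64.zip"
```

Unlike the inline `case`, this version fails loudly on an unrecognized archive instead of silently reusing the previous iteration's values.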
||
| version=$(grep -oP 'DEFAULT_DUCKDB_VERSION:\s*&str\s*=\s*"\K[^"]+' \ | ||
| vortex-duckdb/build.rs) | ||
| # Same as in vortex-duckdb/build.rs: >=2 dot-separated numeric | ||
| # components is a tagged release (ref dir "vX.Y.Z"), anything | ||
| # else is a commit. | ||
| ref="${version#v}" | ||
| if [[ "$ref" =~ ^[0-9]+(\.[0-9]+)+$ ]]; then | ||
| release=true | ||
| ref_dir="v$ref" | ||
| else | ||
| release=false | ||
| ref_dir="$ref" | ||
| fi | ||
| echo "DuckDB $version release=$release" | ||
| entries=() | ||
| for archive in \ | ||
| libduckdb-linux-amd64.zip \ | ||
| libduckdb-linux-arm64.zip \ | ||
| libduckdb-osx-universal.zip; do | ||
| url="${PUBLIC_BASE_URL}/${ref_dir}/${archive}" | ||
| code=$(curl -o /dev/null -s -w '%{http_code}' --head "$url" || echo 000) | ||
| if [ "$code" = "200" ]; then | ||
| echo "present in R2: $archive" | ||
| continue | ||
| fi | ||
| echo "missing in R2 (HTTP $code): $archive" | ||
| case "$archive" in | ||
| *linux-amd64*) runner="ubuntu-latest"; os="linux"; arch="amd64" ;; | ||
| *linux-arm64*) runner="ubuntu-24.04-arm"; os="linux"; arch="arm64" ;; | ||
| *osx-universal*) runner="macos-14"; os="osx"; arch="universal" ;; | ||
| esac | ||
| entries+=("$(jq -nc \ | ||
| --arg archive "$archive" \ | ||
| --arg runner "$runner" \ | ||
| --arg os "$os" \ | ||
| --arg arch "$arch" \ | ||
| '{archive: $archive, runner: $runner, os: $os, arch: $arch}')") | ||
| done | ||
| if [ "${#entries[@]}" -eq 0 ]; then | ||
| matrix='{"include":[]}' | ||
| any_missing=false | ||
| else | ||
| include=$(printf '%s\n' "${entries[@]}" | jq -sc '.') | ||
|
Contributor
If the runners have Python preinstalled, we could maybe consider that.
||
| matrix=$(jq -nc --argjson include "$include" '{include: $include}') | ||
| any_missing=true | ||
| fi | ||
| echo "any_missing=$any_missing" | ||
| { | ||
| echo "version=$version" | ||
| echo "ref_dir=$ref_dir" | ||
| echo "release=$release" | ||
| echo "matrix=$matrix" | ||
| echo "any_missing=$any_missing" | ||
| } >> "$GITHUB_OUTPUT" | ||
| mirror: | ||
| name: "Mirror DuckDB ${{ matrix.archive }} to R2" | ||
| needs: check | ||
| if: >- | ||
| needs.check.outputs.any_missing == 'true' && | ||
| github.repository == 'vortex-data/vortex' && | ||
| github.event.pull_request.head.repo.full_name == github.repository | ||
| environment: duckdb-build | ||
| timeout-minutes: 120 | ||
| strategy: | ||
| fail-fast: false | ||
| matrix: ${{ fromJSON(needs.check.outputs.matrix) }} | ||
| runs-on: ${{ matrix.runner }} | ||
| steps: | ||
| - uses: actions/checkout@df4cb1c069e1874edd31b4311f1884172cec0e10 # v6 | ||
|
|
||
| - name: Install build dependencies (Linux) | ||
| if: needs.check.outputs.release != 'true' && runner.os == 'Linux' | ||
| run: | | ||
| sudo apt-get update | ||
| sudo apt-get install -y ninja-build libcurl4-openssl-dev zip unzip | ||
| # MacOS already has ninja and p7zip | ||
|
|
||
| - name: Prepare ${{ matrix.archive }} | ||
| env: | ||
| ARCHIVE: ${{ matrix.archive }} | ||
| REF_DIR: ${{ needs.check.outputs.ref_dir }} | ||
| RELEASE: ${{ needs.check.outputs.release }} | ||
| PLATFORM_OS: ${{ matrix.os }} | ||
| run: | | ||
| set -Eeuo pipefail | ||
| if [ "$RELEASE" = "true" ]; then | ||
| echo "Mirroring DuckDB release ${REF_DIR}/${ARCHIVE}" | ||
| curl -fSL --retry 3 -o "$ARCHIVE" \ | ||
| "https://github.com/duckdb/duckdb/releases/download/${REF_DIR}/${ARCHIVE}" | ||
| else | ||
| echo "Building DuckDB commit ${REF_DIR} from source" | ||
| curl -fSL --retry 3 -o duckdb-src.zip \ | ||
| "https://github.com/duckdb/duckdb/archive/${REF_DIR}.zip" | ||
| # macos zip extract error: cannot create | ||
| # <...>/issue2628_������.csv Illegal byte sequence | ||
| if [ "$PLATFORM_OS" = "osx" ]; then | ||
| 7z x duckdb-src.zip | ||
| else | ||
| unzip -q duckdb-src.zip | ||
| fi | ||
| src_dir="duckdb-${REF_DIR}" | ||
| extra="" | ||
| if [ "$PLATFORM_OS" = "osx" ]; then | ||
| extra="OSX_BUILD_UNIVERSAL=1" | ||
| fi | ||
| make -C "$src_dir" \ | ||
| GEN=ninja \ | ||
| DISABLE_SANITIZER=1 \ | ||
| THREADSAN=0 \ | ||
| BUILD_SHELL=false \ | ||
| BUILD_UNITTESTS=false \ | ||
| ENABLE_UNITTEST_CPP_TESTS=false \ | ||
| BUILD_EXTENSIONS="parquet;tpch;tpcds" \ | ||
| $extra | ||
| lib_dir="${src_dir}/build/release/src" | ||
| stage="stage" | ||
| rm -rf "$stage" | ||
|
Contributor
How come we need to clear the dir here? Isn't this empty for each new runner?
||
| mkdir -p "$stage" | ||
| cp -a "${lib_dir}/libduckdb.so" "$stage/" 2>/dev/null || true | ||
| cp -a "${lib_dir}/libduckdb.dylib" "$stage/" 2>/dev/null || true | ||
| cp -a "${lib_dir}/libduckdb_static.a" "$stage/" | ||
| cp -a "${src_dir}/src/include/duckdb.h" "$stage/" 2>/dev/null || true | ||
| cp -a "${src_dir}/src/include/duckdb.hpp" "$stage/" 2>/dev/null || true | ||
| ( cd "$stage" && zip -r "../${ARCHIVE}" . ) | ||
| fi | ||
| ls -la "$ARCHIVE" | ||
| - name: Upload to R2 | ||
| env: | ||
| AWS_ACCESS_KEY_ID: ${{ secrets.DUCKDB_R2_ACCESS_KEY_ID }} | ||
| AWS_SECRET_ACCESS_KEY: ${{ secrets.DUCKDB_R2_SECRET_ACCESS_KEY }} | ||
| AWS_REGION: "us-east-1" | ||
| AWS_ENDPOINT_URL: ${{ env.R2_ENDPOINT_URL }} | ||
| run: | | ||
| set -Eeuo pipefail | ||
| python3 scripts/s3-upload.py \ | ||
| --bucket "$R2_BUCKET" \ | ||
| --key "${{ needs.check.outputs.ref_dir }}/${{ matrix.archive }}" \ | ||
| --body "${{ matrix.archive }}" \ | ||
| --checksum-algorithm CRC32 | ||
Do we really need this, or, asked differently, can we encode the dependency?