fix(kv-store): gate flaky sqlite-opfs browser tests behind VITE_SQLITE_OPFS=1 #22693
Merged
mverzilli approved these changes Apr 21, 2026
This was referenced Apr 21, 2026
mverzilli added a commit that referenced this pull request Apr 28, 2026
vitest.config.ts only includes src/sqlite-opfs/**/*.test.ts when VITE_SQLITE_OPFS=1 (gated since #22693 due to a separate hang). The wrapper script enumerated both directories unconditionally, so vitest received a positional path that the include filter excluded, returned "no test files found", and exited 1 — killing the loop via set -e. Mirror the gate at file-enumeration time so the script only feeds vitest paths that the config will accept.
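Mirroring the gate at file-enumeration time can be sketched as a small filter. This is a minimal illustration, not the wrapper script itself (which is not shown here): the function name `selectTestFiles`, the `src/indexeddb/` path, and the file name `map.test.ts` are assumptions; only the `src/sqlite-opfs/` directory and the `VITE_SQLITE_OPFS=1` flag come from the commit message.

```typescript
// Hypothetical sketch: selectTestFiles and the indexeddb path are illustrative names,
// not taken from the real wrapper script.
function selectTestFiles(
  files: string[],
  env: Record<string, string | undefined>,
): string[] {
  // Same condition the config applies: sqlite-opfs tests are opt-in via VITE_SQLITE_OPFS=1.
  const opfsEnabled = env.VITE_SQLITE_OPFS === '1';
  // Drop sqlite-opfs paths when the gate is off, so vitest never receives
  // a positional path that its include filter would reject.
  return files.filter(file => opfsEnabled || !file.startsWith('src/sqlite-opfs/'));
}

const candidates = [
  'src/indexeddb/map.test.ts', // assumed file name
  'src/sqlite-opfs/multi_map.test.ts',
];

console.log(selectTestFiles(candidates, {}));
console.log(selectTestFiles(candidates, { VITE_SQLITE_OPFS: '1' }));
```

With the gate off, only the indexeddb path survives the filter; with it on, both paths are passed through.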
mverzilli added a commit that referenced this pull request Apr 28, 2026
…olation The VITE_SQLITE_OPFS gate was added in #22693 as a workaround for the 2-CPU CI hang. That hang was a vitest+chromium CDP teardown deadlock at test-file transitions, now fixed by per-file vitest invocation (see run-browser-tests.sh). The gating was always meant to be removed once the hang was root-caused. Drop the env-flag gate in vitest.config.ts and stop mirroring it in the wrapper script — vitest now discovers sqlite-opfs tests by default and the wrapper enumerates them in the same loop as the indexeddb files. The existing wide-net catch-all in .test_patterns.yml continues to quarantine any residual flakiness to martin so colleagues aren't blocked.
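The per-file invocation idea can be sketched as follows. This is an assumption-laden illustration: the real `run-browser-tests.sh` is a shell script whose contents are not shown here, and `perFileInvocations` and the indexeddb file name are hypothetical. The sketch only builds the argument lists (one `vitest run <file>` per test file) and does not spawn any process.

```typescript
// Hypothetical sketch: builds one vitest argv per test file instead of a single
// invocation covering every file, so each run starts and tears down its own browser session.
function perFileInvocations(files: string[]): string[][] {
  // Sort a copy for a deterministic order; the input array is left untouched.
  return [...files].sort().map(file => ['vitest', 'run', file]);
}

const invocations = perFileInvocations([
  'src/sqlite-opfs/multi_map.test.ts',
  'src/indexeddb/map.test.ts', // assumed file name
]);

for (const argv of invocations) {
  console.log(argv.join(' '));
}
```

With the env-flag gate dropped, sqlite-opfs and indexeddb files go through the same loop, as the commit message describes.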
chrismarino pushed a commit to chrismarino/aztec-packages that referenced this pull request May 5, 2026
BEGIN_COMMIT_OVERRIDE
fix(kv-store): ensure LMDB cursor is closed on iteration abort (AztecProtocol#22509)
fix(telemetry-client): use appropriate histogram buckets for L1 gas prices (AztecProtocol#22512)
fix(telemetry-client): log warning when BatchSpanProcessor drops spans (AztecProtocol#22511)
fix(stdlib): wrap HA signer databaseUrl in SecretValue (AztecProtocol#22510)
fix(prover-client): don't mark in-progress epoch N jobs as stale when epoch N+1 starts (AztecProtocol#22508)
chore: (A-730) graceful shutdown for services in node startup failure path (AztecProtocol#22112)
fix(prover-client): reject stale job promises and count timeouts toward retry limit (AztecProtocol#21842)
feat(archiver): validate historical L1 log availability at startup (AztecProtocol#22644)
fix(archiver): do not query MessageSent events by blockhash (AztecProtocol#22641)
refactor(e2e): skip initial sequencer in p2p and epochs tests (AztecProtocol#22535)
fix: handle missing L1 finalized block on devnets (AztecProtocol#22663)
fix(world-state): treat historical block 0 queries as historical, not latest (AztecProtocol#22679)
fix(sequencer): re-check parent checkpoint validity before pipelined L1 submission (AztecProtocol#22586)
fix(world-state): make block 0 a first-class historical block (AztecProtocol#22711)
chore: show all running versions (AztecProtocol#22376)
chore: fix prettier inside worktrees (AztecProtocol#22557)
feat: use optimized verifier for rollup (AztecProtocol#21840)
fix(kv-store): skip pool creation on ephemeral deleteDb to unstick browser tests (AztecProtocol#22693)
chore: rm claude lockfile (AztecProtocol#22718)
fix(e2e): wait for first checkpoint in fee_asset_price_oracle_gossip test (AztecProtocol#22719)
chore(prover-node): track estimated L1 fee when proof publishing is disabled (AztecProtocol#22691)
fix(ci): rerun squashed PR check on base branch change (AztecProtocol#22713)
feat(archiver): decouple calldata from blob fetching in L1 synchronizer (AztecProtocol#22716)
refactor(e2e): enable pipelining in e2e_epochs tests (AztecProtocol#22544)
feat(p2p): reject and evict txs with insufficient max fee per gas (AztecProtocol#22118)
refactor(world-state): always index block 0 regardless of initial tree size (AztecProtocol#22724)
fix(e2e): fix redistribution test (AztecProtocol#22729)
END_COMMIT_OVERRIDE
What
`cd yarn-project/kv-store && yarn test` consistently hangs the CI container for 10 min after `sqlite-opfs/multi_map.test.ts` finishes its last test (`multiple keys are independent`). The next test file never runs. This is reproducible across two CI runs on this branch and was the original merge-train/spartan failure (http://ci.aztec-labs.com/1776770924267106).

Initial hypothesis was that `handleDeleteDb` was installing an OPFS SAH Pool for ephemeral `:memory:` DBs and contending on the OPFS directory lock. Skipping the pool install there is still a real cleanup win, so that change stays. But it did not resolve the hang: CI still times out at the exact same point, so the root cause is somewhere else in the `sqlite-opfs` browser stack (vitest browser-mode file transition, chromium resource exhaustion with dozens of SQLite-WASM workers, or similar).

Root-causing this inside the merge-train isn't viable: every failed run holds the train up. Gate the `sqlite-opfs` browser tests behind `VITE_SQLITE_OPFS=1` so they stay runnable in dev (they pass consistently locally) but stop blocking CI until someone owning the OPFS backend can dig in. The OPFS backend itself is still marked experimental in #22658.

Default CI browser run: 70 tests (indexeddb only), ~7s.
Opt-in with `VITE_SQLITE_OPFS=1`: 131 tests (adds sqlite-opfs), ~9s locally; hangs in the CI container.

Also keeps the earlier `handleDeleteDb` cleanup change (skip pool install on ephemeral DB delete, drop the now-dead `poolDirectory` variable, drop `async` since no `await` remains).

Follow-up
A proper investigation of the `sqlite-opfs` browser-test hang belongs in a dedicated issue; it likely needs traces from a reproducing CI run. Filing separately.
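For reference, the `VITE_SQLITE_OPFS=1` gate described under What could look roughly like the sketch below. It is not the actual diff: `browserTestIncludes` is a hypothetical helper and the `src/indexeddb/**/*.test.ts` glob is an assumption; the `src/sqlite-opfs/**/*.test.ts` glob is the one quoted in the follow-up commit message. In a real `vitest.config.ts` the returned list would presumably feed the `test.include` option.

```typescript
// Hypothetical sketch: computes the include globs for the browser test run.
// The indexeddb glob is assumed; the sqlite-opfs glob is quoted from the follow-up commit.
function browserTestIncludes(env: Record<string, string | undefined>): string[] {
  const include = ['src/indexeddb/**/*.test.ts'];
  // sqlite-opfs tests are opt-in: they pass locally but hang in the CI container.
  if (env.VITE_SQLITE_OPFS === '1') {
    include.push('src/sqlite-opfs/**/*.test.ts');
  }
  return include;
}

// Default run: indexeddb only.
console.log(browserTestIncludes({}));
// Opt-in run: adds the sqlite-opfs tests.
console.log(browserTestIncludes({ VITE_SQLITE_OPFS: '1' }));
```

Keeping the decision in one pure function makes it easy for any wrapper script to apply the same condition, which avoids handing vitest a path the config would exclude.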