
test: mark fee_settings.test.ts teardown segfault as flake #23378

Closed
AztecBot wants to merge 1 commit into merge-train/spartan from claudebox/fee-settings-segfault-flake


@AztecBot

Collaborator

Why

PR #23344 (merge-train/spartan) was dequeued from the merge queue at 2026-05-18 17:08:03Z after CI run 26047392849 failed on ci/x8-full (1 of 10 merge-queue-heavy grinds). CI on the PR branch at the same head commit (8caa1d336a) passed cleanly.

The only failure was src/e2e_fees/fee_settings.test.ts exiting with code 139 (SIGSEGV) at 355s. Log: http://ci.aztec-labs.com/14142e6c59162a95

What's happening

The segfault occurs in the afterAll teardown, not in the test body. The log shows this sequence:

  1. Sequencer.stop awaits the in-flight checkpoint L1 submission, which gets interrupted: the log shows "Transaction sending is interrupted", a clean abort signal from #23358 ("fix: interrupt prover jobs in stop").
  2. The node fully shuts down: sequencer, slashing, p2p, world-state, archiver.
  3. Only then does the prover-node start stopping. Its still-running EpochProvingJob calls into the native world-state DB ("Create fork at 2", "Insert 0 L1 to L2 messages in fork").
  4. The native code logs "GET_TREE_INFO failed: Fork not found" and segfaults the Jest process.
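The ordering above can be modelled with a toy Python asyncio sketch. Everything in it is hypothetical: FakeWorldState, create_fork and epoch_proving_job are illustrative stand-ins, not the Aztec TypeScript or native world-state APIs. Where the real native code segfaults, the toy raises an exception instead so the outcome can be observed.

```python
import asyncio


class WorldStateStopped(Exception):
    """Hypothetical stand-in for the native 'Fork not found' condition."""


class FakeWorldState:
    """Toy model of the native world-state store (not the real API)."""

    def __init__(self) -> None:
        self.stopped = False

    async def create_fork(self, block: int) -> str:
        await asyncio.sleep(0)  # yield, as a real async native call would
        if self.stopped:
            # The real native code segfaults here; the toy raises instead.
            raise WorldStateStopped(f"GET_TREE_INFO failed: Fork not found (block {block})")
        return f"fork@{block}"

    async def stop(self) -> None:
        self.stopped = True


async def epoch_proving_job(ws: FakeWorldState, log: list[str]) -> None:
    await asyncio.sleep(0.01)  # simulated proving work still pending at teardown
    try:
        log.append(await ws.create_fork(2))
    except WorldStateStopped as err:
        log.append(str(err))


async def racy_teardown() -> list[str]:
    ws = FakeWorldState()
    log: list[str] = []
    job = asyncio.create_task(epoch_proving_job(ws, log))
    await ws.stop()  # step 2: world-state is stopped with the node
    await job        # step 3: the prover job only gets to run afterwards
    return log


print(asyncio.run(racy_teardown()))
```

Because world-state is stopped before the job is awaited, the job always hits the stopped store in this toy; in CI the real race is timing-dependent, which is why the crash is intermittent.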

#23358 ("fix: interrupt prover jobs in stop"), already on the train, interrupts prover jobs at stop but does not fully serialise prover-node shutdown against in-flight native fork operations, so the segfault intermittently wins the race. This is the same class of teardown-time native crash that the existing e2e_fees/gas_estimation.test.ts flake entry covers, though that one surfaces differently ("timeout: sending signal TERM to command 'bash'").

Fix

Mark fee_settings.test.ts as a flake only when the error matches the segfault signature (Segmentation fault.*core dumped|code: 139). Real assertion failures in the test body do not match this signature and still fail CI. Assigned to *alex (PR author, owns the related gas_estimation flake entry).
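The signature gate can be illustrated with Python's re module. Only the pattern is taken from this PR; is_known_flake is a hypothetical helper and not how ci3/filter_test_cmds or ci3/get_test_entry actually apply the .test_patterns.yml entry, and the sample strings are illustrative rather than copied from the CI log.

```python
import re

# Pattern quoted from this PR; everything else here is illustrative.
SEGFAULT_SIGNATURE = re.compile(r"Segmentation fault.*core dumped|code: 139")


def is_known_flake(test_path: str, error_output: str) -> bool:
    """Hypothetical filter: tolerate only the fee_settings teardown segfault."""
    if not test_path.endswith("e2e_fees/fee_settings.test.ts"):
        return False
    return SEGFAULT_SIGNATURE.search(error_output) is not None


path = "src/e2e_fees/fee_settings.test.ts"
print(is_known_flake(path, "Segmentation fault (core dumped)"))  # True
print(is_known_flake(path, "process exited with code: 139"))     # True
print(is_known_flake(path, "expect(received).toBe(expected)"))   # False
```

Since the pattern is searched only in the error output, an ordinary assertion failure produces no match and is still reported as a real failure.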

A proper fix would either (a) cancel and await in-flight epoch-proving jobs before the world-state synchronizer stops, or (b) make the native world-state DB return a JS error for "Fork not found" on a stopped store instead of segfaulting. Both are out of scope here and left as a follow-up for the prover-node / world-state team.
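Option (a) is essentially an ordering guarantee: cancel every in-flight job, wait for each to finish unwinding, and only then stop world-state. The sketch below shows that ordering in a toy Python asyncio model; FakeWorldState, epoch_proving_job and ordered_teardown are hypothetical names, not the TypeScript prover-node code, and this illustrates the idea rather than the actual follow-up fix.

```python
import asyncio


class FakeWorldState:
    """Hypothetical stand-in for the native world-state store."""

    def __init__(self) -> None:
        self.stopped = False

    async def create_fork(self, block: int) -> str:
        await asyncio.sleep(0)
        if self.stopped:
            raise RuntimeError("Fork not found")
        return f"fork@{block}"

    async def stop(self) -> None:
        self.stopped = True


async def epoch_proving_job(ws: FakeWorldState, log: list[str]) -> None:
    try:
        await asyncio.sleep(0.01)  # simulated proving work
        log.append(await ws.create_fork(2))
    except asyncio.CancelledError:
        log.append("job cancelled")
        raise  # re-raise so the task is marked as cancelled


async def ordered_teardown() -> tuple[list[str], list[BaseException]]:
    ws = FakeWorldState()
    log: list[str] = []
    jobs = [asyncio.create_task(epoch_proving_job(ws, log))]
    await asyncio.sleep(0)  # let the job start running

    # Option (a): cancel every in-flight job...
    for job in jobs:
        job.cancel()
    # ...and wait until each one has finished unwinding.
    results = await asyncio.gather(*jobs, return_exceptions=True)

    # Only now is it safe to stop world-state.
    await ws.stop()
    log.append("world-state stopped")
    return log, results


log, results = asyncio.run(ordered_teardown())
print(log)  # ['job cancelled', 'world-state stopped']
```

Awaiting the cancelled jobs, not just signalling them, is what closes the race: no job can touch a fork after world-state stops.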

Full analysis: https://gist.github.com/AztecBot/704d54fc69850b1b9ceb1aeaeae64667

Note on local CI

./bootstrap.sh ci was not run locally because the change is metadata only: .test_patterns.yml is consumed by ci3/filter_test_cmds and ci3/get_test_entry, and no compiled artifact depends on it. The YAML was validated with yaml.safe_load, and both regex and error_regex were checked against the actual failure string.

ClaudeBox log: https://claudebox.work/s/16f3aaf1a7b118c7?run=1

AztecBot added the labels ci-draft (Run CI on draft PRs.) and claudebox (Owned by claudebox; it can push to this PR.) on May 18, 2026.
@AztecBot

Collaborator Author

Automatically closing this stale claudebox draft PR (no updates for 5+ days). Re-open if still needed.

AztecBot closed this on May 24, 2026.