Skip to content

refactor(p2p): merge FastTxCollection into TxCollection with sequential pipeline#23245

Merged
fcarreiro merged 1 commit into
merge-train/spartanfrom
fc/absorb-fast-tx-collection
May 13, 2026
Merged

refactor(p2p): merge FastTxCollection into TxCollection with sequential pipeline#23245
fcarreiro merged 1 commit into
merge-train/spartanfrom
fc/absorb-fast-tx-collection

Conversation

@fcarreiro

@fcarreiro fcarreiro commented May 13, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Removes FastTxCollection as a separate class and absorbs all its logic directly into TxCollection
  • Replaces the old parallel file-store delay with a single sequential pipeline: node RPC → reqresp → file store, where each phase blocks on the previous (cancellation-aware)
  • File store collection is now driven by IRequestTracker — the same synchronization primitive used by node and reqresp paths. The tracker is the single source of truth for "is this tx still missing?" and "is this request still alive?"
  • FileStoreTxCollection simplified: dropped start()/stop()/persistent worker pool/wakeSignal. startCollecting(requestTracker, context) returns Promise<void>, spins up its own per-call worker pool, and workers self-terminate when the tracker is cancelled (all-fetched / deadline / external)

Collection flow inside collectFast

  1. Start node RPC collection in the background
  2. Wait txCollectionFastNodesTimeoutBeforeReqRespMs — interruptible by cancellation or by node exhaustion (so when no nodes are configured, reqresp starts immediately)
  3. Start reqresp in the background (parallel with nodes)
  4. Wait txCollectionFileStoreFastDelayMs — interruptible by cancellation or reqresp completion
  5. Start file store collection in the background (its workers self-terminate)
  6. Promise.allSettled on node + reqresp + file store

txCollectionFileStoreFastDelayMs description updated to reflect it is now anchored to reqresp start, not collection start.

File store / tracker integration

  • FileStoreTxCollection.startCollecting no longer takes (txHashes, context, deadline); it takes (requestTracker, context) and reads the missing txs + deadline from the tracker
  • Workers check requestTracker.isMissing(hash) each scan — if the tx was found via another path (node/reqresp/gossipsub), the entry is dropped without an extra fetch
  • Workers race their backoff sleeps against requestTracker.cancellationToken — cancelling a request (deadline, stopCollectingForBlocksUpTo/After, or stop()) propagates to file store workers immediately
  • Removed foundTxs/clearPending plumbing on FileStoreTxCollection — the tracker handles both implicitly
  • startCollecting yields once after building its entry set, so a synchronous follow-up call (e.g. markFetched in tests, or the gossipsub-found path in production) lands before workers begin scanning

Tests

  • tx_collection.test.ts: collapsed the TestFastTxCollection subclass; all accesses go directly through TxCollection. Added "starts reqresp immediately when no nodes are configured" covering the node-exhaustion shortcut
  • file_store_tx_collection.test.ts: rewritten for the new shape — no start()/stop(), lifecycle driven by the tracker (cancel to terminate workers). New "workers exit when tracker is cancelled" covers the per-call worker-pool teardown

Closes https://linear.app/aztec-labs/issue/A-933/tx-collection-dont-retrieve-transactions-that-have-already-been via new synchronization with the request tracker.

@fcarreiro fcarreiro marked this pull request as draft May 13, 2026 13:16
Base automatically changed from fc/remove-old-reqresp to merge-train/spartan May 13, 2026 14:17
@fcarreiro fcarreiro force-pushed the fc/absorb-fast-tx-collection branch from 0d77c01 to d6be609 Compare May 13, 2026 14:41

Copy link
Copy Markdown
Contributor Author

This stack of pull requests is managed by Graphite. Learn more about stacking.

@fcarreiro fcarreiro force-pushed the fc/absorb-fast-tx-collection branch from d6be609 to eb8d696 Compare May 13, 2026 15:15
@fcarreiro fcarreiro marked this pull request as ready for review May 13, 2026 15:16
@fcarreiro fcarreiro requested a review from spalladino May 13, 2026 15:16
@AztecBot

Copy link
Copy Markdown
Collaborator

Flakey Tests

🤖 says: This CI run detected 1 tests that failed, but were tolerated due to a .test_patterns.yml entry.

\033FLAKED\033 (8;;http://ci.aztec-labs.com/b22d1e1ac820ccc7�b22d1e1ac820ccc78;;�): yarn-project/scripts/run_test.sh ethereum/src/test/tx_delayer.test.ts (110s) (code: 0)

/** Clears all pending entries. */
public clearPending(): void {
this.entries.clear();
await Promise.allSettled(times(this.config.workerCount, () => this.workerLoop(entries, requestTracker, context)));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up this changes the semantics of workerCount. If I understood correctly, now we have workerCount per request, not global. I think this is fine though, since worker count is not too high (defaults to 5).

@fcarreiro fcarreiro enabled auto-merge (squash) May 13, 2026 15:52
@fcarreiro fcarreiro disabled auto-merge May 13, 2026 16:31
@fcarreiro fcarreiro merged commit 566d7f5 into merge-train/spartan May 13, 2026
29 checks passed
@fcarreiro fcarreiro deleted the fc/absorb-fast-tx-collection branch May 13, 2026 16:31
danielntmd pushed a commit to danielntmd/aztec-packages that referenced this pull request Jun 4, 2026
BEGIN_COMMIT_OVERRIDE
refactor(p2p): merge FastTxCollection into TxCollection with sequential
pipeline (AztecProtocol#23245)
refactor(publisher): bundle-level simulate; drop per-action enqueue sims
(AztecProtocol#23165)
refactor(stdlib): remove deprecated RevertCode/TxExecutionResult aliases
(AztecProtocol#23249)
test(e2e): fix race in 'proposer invalidates multiple checkpoints'
(AztecProtocol#23259)
fix: clean up old jobs regardless of pending status (AztecProtocol#23260)
refactor(p2p): remove unused sendBatchRequest (AztecProtocol#23273)
chore(p2p): remove proposal_tx_collector leftovers (AztecProtocol#23276)
feat: slash truncated checkpoint proposals (AztecProtocol#23250)
refactor: remove unused map in attestation pool (AztecProtocol#23284)
chore(p2p): assert last block in checkpoint proposal is correct (AztecProtocol#23274)
refactor(l1-tx-utils): use DateProvider for fail-fast timeout check
(AztecProtocol#23257)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23277)
test(e2e): fix race in broadcasted_invalid_block_proposal_slash under
pipelining (AztecProtocol#23302)
fix(archiver): atomic getter for L2 tips (AztecProtocol#23295)
fix(sequencer): use targetSlot in tryVoteWhenEscapeHatchOpen under
pipelining (AztecProtocol#23296)
fix(world-state): make fork close idempotent for pruned forks (AztecProtocol#23298)
test(e2e): migrate passing tests to proposer pipelining (AztecProtocol#23275)
chore: update dashboard (AztecProtocol#23312)
chore: Revert "feat(sandbox): support proposer pipelining in local
network" (AztecProtocol#23313)
test: slash on bad attestation (AztecProtocol#23184)
feat(slasher): per-slot data-withholding watcher (A-523, A-525) (AztecProtocol#23116)
test(e2e): enable pipelining on e2e debug trace (AztecProtocol#23301)
test(e2e): enable pipelining on l1-to-l2 test (AztecProtocol#23300)
test(e2e): switch fee_settings to organic fee bumps under pipelining
(AztecProtocol#23303)
fix(ci): retry sqlite3mc-wasm download on transient DNS/TLS failures
(AztecProtocol#23333)
test(e2e): wait for real oracle rotation in fee_settings inflate helper
(AztecProtocol#23334)
test(e2e): anchor e2e_amm PXE to checkpointed tip under pipelining
(AztecProtocol#23336)
fix(spartan-bench): tolerate older node images in SlasherConfig schema
(AztecProtocol#23351)
fix: interrupt prover jobs in stop (AztecProtocol#23358)
test(e2e): enable pipelining on bot, fees, and avm simulator tests
(AztecProtocol#23329)
feat(sentinel): end-of-epoch evaluation with re-execution outcomes
(AztecProtocol#23286)
feat: slash for invalid checkpoint proposals (AztecProtocol#23270)
fix: fork closure in epoch proving jobs (AztecProtocol#23390)
fix(slasher): anchor watcher scans at archiver synced L2 slot (AztecProtocol#23394)
fix: avoid npm uplink for aztec-up local publishes (AztecProtocol#23396)
test(e2e): ignore benign 'Insufficient valid txs' block-build-failed in
epochs tests (AztecProtocol#23424)
chore: refactor weekly proving test wait (AztecProtocol#23395)
refactor: add fifo set (AztecProtocol#23271)
feat(sandbox): support proposer pipelining in local network (AztecProtocol#23327)
fix(p2p): validate BLOCK_TXS in BatchTxRequester (AztecProtocol#23371)
chore(p2p): simplify IBatchRequestTxValidator (AztecProtocol#23373)
feat(sequencer): AutomineSequencer for single-sequencer e2e tests
(AztecProtocol#23354)
fix(prover): wait for previous epoch to be proven (AztecProtocol#23458)
chore: collocate provers (AztecProtocol#23439)
chore: rm staging-ignition (AztecProtocol#23440)
chore: rm unused networks (AztecProtocol#23441)
test(e2e): migrate block_building, multi_validator_node,
publisher_funding, invalid_checkpoint_proposal to pipelining (AztecProtocol#23414)
fix(archiver): reconcile local blocks with L1 checkpoints by block
number (AztecProtocol#23461)
feat: Updated slash conditions on block proposals (AztecProtocol#23466)
test(e2e): migrate HA full test to pipelining (AztecProtocol#23463)
chore: update resource profiles (AztecProtocol#23442)
chore: update debug log levels (AztecProtocol#23456)
test: fix flaky sentinel_status_slash by asserting the fault on the
checkpoint slot (AztecProtocol#23483)
feat(slasher): slash checkpoint equivocation between P2P and L1 (A-980)
(AztecProtocol#23436)
refactor(slasher): rename ATTESTED_DESCENDANT_OF_INVALID ->
PROPOSED_DESCENDANT_OF_CHECKPOINT_WITH_INVALID_ATTESTATIONS (AztecProtocol#23468)
fix: reject block proposals in poisoned slots (AztecProtocol#23411)
fix: retry nargo dep + solc downloads to survive transient DNS drops
(AztecProtocol#23490)
fix: enrich json-rpc tracing (AztecProtocol#23412)
feat: add trace export controls (AztecProtocol#23413)
test(e2e): assert no equivocation offenses in HA full test (AztecProtocol#23496)
test: cover invalid checkpoint proposal slashing (AztecProtocol#23503)
test(e2e): migrate more e2e suites to proposer pipelining (AztecProtocol#23482)
test: flag e2e_slashing_attested_invalid_proposal as flake under
pipelining (AztecProtocol#23501)
test: flag e2e_p2p_duplicate_proposal_slash as flake under pipelining
(AztecProtocol#23515)
test(e2e): require cross-observer agreement on sentinel fault slot
(AztecProtocol#23513)
test: flag e2e_ha_full afterAll hook timeout as flake under pipelining
(AztecProtocol#23524)
fix(e2e): propagate l1ContractsArgs into node config so archiver matches
L1 (AztecProtocol#23514)
test: flag e2e_multi_validator_node_key_store P2P tx-dropped failure as
flake (AztecProtocol#23528)
test(cheat-codes): retry warpL2TimeAtLeastTo in-current-slot test on L1
race (AztecProtocol#23533)
test(e2e_ha_full): parallel HA peer node teardown with per-node deadline
(AztecProtocol#23539)
test: flag e2e_ha_full as flake under HA pipelining (AztecProtocol#23541)
test(ci): skip e2e_ha_full entirely on merge-train/spartan (AztecProtocol#23542)
test(ci): skip e2e_multi_validator_node_key_store entirely on
merge-train/spartan (AztecProtocol#23544)
END_COMMIT_OVERRIDE
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants