
consensus: persist AppQC, blocks, and CommitQCs with async persistence #2896

Open

wen-coding wants to merge 58 commits into main from wen/persist_appqc_and_blocks

Conversation


@wen-coding wen-coding commented Feb 16, 2026

Summary

Crash-safe persistence for availability state (AppQC, signed lane proposals, and CommitQCs). All I/O is fully asynchronous — no disk operations on the critical path or under locks.

  • Extract consensus/persist/ sub-package: Generic A/B file persistence (Persister[T]) with crash-safe alternating writes. BlockPersister manages per-block files in a blocks/ subdirectory. CommitQCPersister manages per-road-index CommitQC files. No-op implementations (NewNoOpPersister, NewNoOpBlockPersister, NewNoOpCommitQCPersister) for test/disabled paths.
  • Generic Persister[T proto.Message] interface: Strongly-typed persistence API; concrete abPersister[T] handles A/B file strategy. NewNoOpPersister[T] silently discards writes. A/B suffixes are unexported; WriteRawFile helper for corruption tests.
  • Block-file persistence (persist/blocks.go): Each signed lane proposal stored as <lane_hex>_<blocknum>.pb. On load, blocks are sorted and truncated at the first gap. PersistBatch encapsulates the I/O: persist blocks, update tips, clean up old files. When noop, all disk I/O is skipped but cursor/tips tracking still works.
  • CommitQC persistence (persist/commitqcs.go): Each CommitQC stored as commitqc_<roadindex>.pb. On load, QCs are sorted and truncated at the first gap. Needed for reconstructing FullCommitQCs on restart. When noop, disk I/O is skipped but next index tracking still works.
  • Single persist goroutine with ordered writes: One background goroutine (always spawned — real or no-op) watches inner state directly (the in-memory state is the queue). collectPersistBatch acquires the lock, waits for new data, reads persistence cursors (inner.nextBlockToPersist, inner.nextCommitQCToPersist) directly from inner state, clamps past pruned entries, and collects the batch. I/O runs with no lock held. No channel, no backpressure. Write order: blocks → CommitQCs → AppQC → delete old CommitQCs — guaranteeing a matching CommitQC is always on disk before its AppQC, and the AppQC is persisted before old CommitQCs are deleted (crash-safe ordering).
  • Persistence cursors in inner state: nextBlockToPersist (per-lane) and nextCommitQCToPersist track how far persistence has progressed. Always initialized — the no-op persist goroutine bumps them immediately. Reconstructed from disk on restart. collectPersistBatch reads these cursors under lock (no parameter passing); markPersisted updates them after successful I/O.
  • Cursor clamp for prune safety: Between persist iterations (lock not held), prune can delete map entries. Cursors are clamped to q.first before reading to prevent nil pointer dereference. Regression test (TestStateWithPersistence) reliably catches this.
  • No-op persister pattern: When persistence is disabled (stateDir is None), newNoOpPersisters() creates no-op implementations for all three persisters. The persist goroutine always runs (same code path as production) — it just skips disk I/O and immediately bumps cursors. This eliminates all nil checks for disabled persistence and ensures tests exercise the exact same control flow as production.
  • persisters struct — pure I/O, always initialized: Groups the three disk persisters (appQC, blocks, commitQCs). Always present on State (not wrapped in Option). All inner state access goes through State methods (collectPersistBatch, markPersisted), keeping a clean separation between orchestration and I/O.
  • Gate consensus on CommitQC persistence: PushCommitQC no longer publishes latestCommitQC directly — the persist goroutine publishes it after writing to disk (or immediately for no-op persisters) via markPersisted. Since consensus subscribes to LastCommitQC() to advance views, it won't proceed until the CommitQC is durable — preventing CommitQC loss on crash.
  • Gate voting on block persistence (avail/subscriptions.go): RecvBatch only yields blocks below the nextBlockToPersist watermark, so votes are only signed for durably written blocks.
  • Wire persistence into availability state (avail/state.go): NewState accepts stateDir, initialises the A/B persister (for AppQC), BlockPersister, and CommitQCPersister, and loads persisted data on restart. Returns error for corrupt state.
  • Restore state on restart (avail/inner.go): Uses queue.reset() to set starting positions, then pushBack to reload entries. Finally calls inner.prune() with the persisted AppQC to advance all queues — same code path as runtime. Returns error for corrupt state (non-consecutive CommitQCs, AppQC without matching CommitQC on disk).
  • Clean up orphaned block files: DeleteBefore removes files for pruned blocks and orphaned lanes (from previous committees). Driven by the persist goroutine observing laneFirsts.
  • prune() returns (bool, error): Simplified from returning a laneFirsts map — callers only need to know if pruning occurred. The persist goroutine reads q.first directly.
  • Test injection via newState constructors: consensus/state.go exposes newState() accepting a custom Persister for test mocks (e.g. failPersister), avoiding fragile field mutation after construction. avail/state.go's NewState accepts stateDir as Option[string].
  • queue.reset() method: Clearly sets the starting position of an empty queue, replacing misleading prune() calls during initialization.
  • Latency TODO: Documents a potential positive feedback loop where slow block persistence causes batches to grow, which makes PersistBatch slower, which delays nextBlockToPersist and RecvBatch, which delays voting, which grows batches further. Mitigation (e.g. persisting one block at a time) deferred.

Ref: sei-protocol/sei-v3#512

Test plan

  • persist/blocks_test.go: load/store, gap truncation, DeleteBefore, orphaned lane cleanup, header mismatch, corrupt files
  • persist/commitqcs_test.go: load/store, gap truncation, DeleteBefore, corrupt files, mismatched index
  • persist/persist_test.go: A/B file crash safety, seq management, corrupt fallback, generic typed API
  • avail/state_test.go: fresh start, load AppQC, load blocks, load both, load commitQCs, load commitQCs with AppQC, corrupt data, headers returns ErrPruned for blocks before loaded range
  • avail/state_test.go (TestStateWithPersistence): end-to-end persist + prune race regression test (5 iterations with disk persistence; reliably catches cursor race without the clamp fix)
  • avail/inner_test.go: newInner with loaded state, newInner with all three (AppQC + CommitQCs + blocks), newInner error cases (non-consecutive CommitQCs, AppQC without matching CommitQC), nextBlockToPersist reconstruction, nextCommitQCToPersist reconstruction, votes queue advancement
  • avail/queue_test.go: newQueue, pushBack, reset, prune, stale prune, prune past next
  • consensus/inner_test.go: consensus inner persistence round-trip, persist error propagation via newState injection
  • data/state_test.go: data state tests


github-actions bot commented Feb 16, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

Build: ✅ passed · Format: ✅ passed · Lint: ✅ passed · Breaking: ✅ passed · Updated (UTC): Mar 2, 2026, 7:30 PM


codecov bot commented Feb 16, 2026

Codecov Report

❌ Patch coverage is 80.81535% with 80 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.23%. Comparing base (1c50d43) to head (010e5c4).
⚠️ Report is 10 commits behind head on main.

Files with missing lines | Patch % | Lines
...mint/internal/autobahn/consensus/persist/blocks.go 78.57% 12 Missing and 12 partials ⚠️
...t/internal/autobahn/consensus/persist/commitqcs.go 79.59% 10 Missing and 10 partials ⚠️
sei-tendermint/internal/autobahn/avail/state.go 82.85% 9 Missing and 9 partials ⚠️
...int/internal/autobahn/consensus/persist/persist.go 80.55% 5 Missing and 2 partials ⚠️
sei-tendermint/internal/autobahn/avail/inner.go 86.66% 3 Missing and 3 partials ⚠️
...ei-tendermint/internal/autobahn/consensus/state.go 58.33% 4 Missing and 1 partial ⚠️
Additional details and impacted files

Impacted file tree graph

@@             Coverage Diff             @@
##             main    #2896       +/-   ##
===========================================
+ Coverage   58.13%   77.23%   +19.09%     
===========================================
  Files        2113       20     -2093     
  Lines      174071     1735   -172336     
===========================================
- Hits       101204     1340    -99864     
+ Misses      63812      254    -63558     
+ Partials     9055      141     -8914     
Flag | Coverage | Δ
sei-chain-pr 79.71% <80.81%> (?)
sei-db 70.41% <ø> (+0.90%) ⬆️

Flags with carried forward coverage won't be shown.

Files with missing lines | Coverage | Δ
sei-tendermint/internal/autobahn/avail/queue.go 100.00% <100.00%> (ø)
...endermint/internal/autobahn/avail/subscriptions.go 86.66% <100.00%> (+0.30%) ⬆️
...ei-tendermint/internal/autobahn/consensus/inner.go 63.21% <100.00%> (ø)
...ei-tendermint/internal/autobahn/consensus/state.go 83.03% <58.33%> (+0.10%) ⬆️
sei-tendermint/internal/autobahn/avail/inner.go 87.50% <86.66%> (-3.93%) ⬇️
...int/internal/autobahn/consensus/persist/persist.go 80.16% <80.55%> (ø)
sei-tendermint/internal/autobahn/avail/state.go 76.48% <82.85%> (+2.74%) ⬆️
...t/internal/autobahn/consensus/persist/commitqcs.go 79.59% <79.59%> (ø)
...mint/internal/autobahn/consensus/persist/blocks.go 78.57% <78.57%> (ø)

... and 2097 files with indirect coverage changes


wen-coding force-pushed the wen/persist_appqc_and_blocks branch from ebf93df to f4a9c1e on February 17, 2026 04:50
wen-coding changed the title from "Port sei-v3 PR #512: persist AppQC and blocks to disk" to "consensus: persist AppQC and blocks to disk" on Feb 18, 2026
wen-coding changed the title from "consensus: persist AppQC and blocks to disk" to "consensus: persist AppQC and blocks, async block fsync" on Feb 18, 2026
wen-coding changed the title from "consensus: persist AppQC and blocks, async block fsync" to "consensus: persist AppQC and blocks in avail" on Feb 18, 2026
wen-coding and others added 7 commits February 20, 2026 10:15
Extract generic A/B file persistence into a reusable consensus/persist/
sub-package and add block-file persistence for crash-safe availability
state recovery.

Changes:
- Move persist.go and persist_test.go into consensus/persist/ (git mv to
  preserve history), exporting Persister, NewPersister, WriteAndSync,
  SuffixA, SuffixB.
- Add persist/blocks.go: per-block file persistence using
  <lane_hex>_<blocknum>.pb files in a blocks/ subdirectory, with load,
  delete-before, and header-mismatch validation.
- Wire avail.NewState to accept stateDir, create A/B persister for
  AppQC and BlockPersister for signed lane proposals, and restore both
  on restart (contiguous block runs, queue alignment).
- Update avail/state.go to persist AppQC on prune and delete obsolete
  block files after each AppQC advance.
- Thread PersistentStateDir from consensus.Config through to
  avail.NewState.
- Expand consensus/inner.go doc comment with full persistence design
  (what, why, recovery, write behavior, rebroadcasting).
- Move TestRunOutputsPersistErrorPropagates to consensus/inner_test.go
  for proper package alignment.
- Add comprehensive tests for blocks persistence (empty dir, multi-lane,
  corrupt/mismatched skip, DeleteBefore, filename roundtrip).

Ref: sei-protocol/sei-v3#512
Co-authored-by: Cursor <cursoragent@cursor.com>
Move persisted data loading (AppQC deserialization and block loading)
into a dedicated function for readability.

Co-authored-by: Cursor <cursoragent@cursor.com>
Move block sorting, contiguous-prefix extraction, and gap truncation
from avail/inner.go into persist/blocks.go so all disk-recovery logic
lives in one place. This isolates storage concerns in the persistence
layer, simplifying newInner and preparing for a future storage backend
swap.

Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
PushBlock and ProduceBlock now add blocks to the in-memory queue
immediately and send a persist job to a background goroutine via a
buffered channel. The background writer fsyncs each block to disk
and advances a per-lane blockPersisted cursor under the inner lock.

RecvBatch gates on this cursor so votes are only signed for blocks
that have been durably written to disk. When persistence is disabled
(testing), the cursor is nil and RecvBatch falls back to bq.next.

Co-authored-by: Cursor <cursoragent@cursor.com>
newInner no longer takes a separate persistEnabled bool; loaded != nil
already implies persistence is enabled. Tests with loaded data now
correctly reflect this.

Co-authored-by: Cursor <cursoragent@cursor.com>
blockPersisted is reconstructed from disk on restart, not persisted
itself. Move its creation to just above the block restoration loop
(past the loaded==nil early return) so the code reads top-down.

Co-authored-by: Cursor <cursoragent@cursor.com>
wen-coding force-pushed the wen/persist_appqc_and_blocks branch from 05beddb to 2f0bbad on February 20, 2026 18:46
wen-coding and others added 6 commits February 20, 2026 10:48
Co-authored-by: Cursor <cursoragent@cursor.com>
Move persistCh, persistJob, and the writer loop from avail/State into
BlockPersister.Queue + BlockPersister.Run, so callers just call Queue()
and the persist layer owns the channel, buffer sizing, and drain loop.

Queue blocks with context to avoid holes in the sequential
blockPersisted cursor (which would permanently stall voting).
Call sites use utils.IgnoreAfterCancel to swallow shutdown errors.

Co-authored-by: Cursor <cursoragent@cursor.com>
…l/sei-chain into wen/persist_appqc_and_blocks
wen-coding changed the title from "consensus: persist AppQC and blocks in avail" to "consensus: persist AppQC and blocks with async block persistence" on Feb 20, 2026
wen-coding and others added 5 commits February 22, 2026 09:27
Remove redundant loop that explicitly zeroed every lane. Map zero-values
handle lanes without loaded blocks; only lanes with blocks on disk need
an explicit write. Add comment explaining why starting at 0 is safe.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Add comment explaining why votes are not persisted and why the votes
  queue must be advanced past loaded blocks on restart.
- Consolidate redundant tests: fold blockPersisted assertions into
  existing tests, remove TestNewInnerLoadedBlocksContiguousPrefix.
- Add test that headers() returns ErrPruned for blocks before the
  loaded range (verifies votes queue advancement prevents hangs).

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace require.Contains(err.Error(), "...") with require.Error(err).
Callers don't branch on specific error messages, so string matching
adds no value; the test name already documents what is being rejected.

Co-authored-by: Cursor <cursoragent@cursor.com>
BlockPersister now owns the per-lane contiguous persistence cursor and
passes the exclusive upper bound to the callback. The caller no longer
needs to compute n+1 or guard against out-of-order completion.

This localizes the ordering assumption (FIFO queue) inside
BlockPersister, so switching to parallel storage only requires changing
BlockPersister.Run.

Co-authored-by: Cursor <cursoragent@cursor.com>
Also add TODO for retry on persistence failure.

Co-authored-by: Cursor <cursoragent@cursor.com>
// When persistence is disabled, publish immediately.
// When enabled, the persist goroutine publishes after writing to disk,
// so consensus won't advance until the CommitQC is durable.
if inner.nextBlockToPersist == nil {
Contributor

@pompon0 pompon0 Feb 27, 2026


don't compare to nil, either make nextBlockToPersist into an Option, or check persisters field presence instead.

Contributor


alternatively, you can move branching to the persisters logic - i.e. have a dummy persister task which will just bump nextBlockToPersist and latestCommitQC without persisting anything - it would make the control flow in tests and in prod more similar imo.

Contributor Author


Added dummy persisters.

wen-coding and others added 4 commits February 27, 2026 11:14
Moved DeleteBefore after AppQC persist so a crash between the two
never leaves the on-disk AppQC pointing at a deleted CommitQC.
Extracted the persist loop into its own runPersist method.

Made-with: Cursor
Instead of conditionally spawning the persist goroutine and checking
nextBlockToPersist == nil in PushCommitQC and RecvBatch, always run the
persist goroutine with either real or no-op persisters. The no-op
persisters skip disk I/O but still track cursors, making the test and
production code paths identical.

Made-with: Cursor
Instead of batching all block writes and updating nextBlockToPersist
once at the end, persist each block individually and advance the cursor
after each fsync. This makes vote latency equal to single-block write
time regardless of backlog, preventing the positive feedback loop where
slow persistence causes ever-growing batches.

Also moves persistence cursors (nextBlockToPersist, nextCommitQCToPersist)
into inner state, read directly by collectPersistBatch under lock.

Made-with: Cursor
}
first := bs[0].Number
q.reset(first)
for _, b := range bs {
Contributor


nit: for consistency with populating CommitQCs, you might want to check the block numbers here as well. Maybe also put parent hash checking here?

Contributor Author


done

import (
"testing"

"github.com/stretchr/testify/require"
Contributor


libs/utils/require

Contributor Author


done

}

func TestQueueReset(t *testing.T) {
q := newQueue[uint64, string]()
Contributor


add a test for reset() after populating the queue.

Contributor Author


done

const innerFile = "avail_inner"

// loadPersistedState loads persisted avail state from disk and creates persisters for ongoing writes.
func loadPersistedState(dir string) (*loadedAvailState, persisters, error) {
Contributor


nit: for consistency, you might want to accept dir Option[string] and pass it to each persister constructor, which will act as a noop in case no dir is provided (given that you have added a noop mode to each of them anyway).

Contributor


nit: the caveat with how noop mode currently works is that you enable noop mode for each persister separately. Which is fine as long as fields of persisters type are private, and therefore it is not possible to set a combination of noop and real persisters (which would result in an invalid avail State on restart). Just a FYI

Contributor Author


done

}
s.markBlockPersisted(h.Lane(), h.BlockNumber()+1)
}
if err := pers.blocks.DeleteBefore(batch.laneFirsts); err != nil {
Contributor


Deleting blocks before persisting AppQC may lead to weird avail State, where we have a gap between the AppQC and the first persisted block of the lane. I don't see any obvious reason why it would be a bug in consensus protocol, but it is a nice invariant to have.

Contributor Author


moved all deletion to after AppQC is persisted

if err := pers.commitQCs.DeleteBefore(batch.commitQCFirst); err != nil {
return fmt.Errorf("commitqc deleteBefore: %w", err)
}
s.markCommitQCsPersisted(commitQCCur, utils.Some(batch.commitQCs[len(batch.commitQCs)-1]))
Contributor

@pompon0 pompon0 Mar 2, 2026


nit: that can happen just after persisting commitQCs. CommitQCCur seems redundant - NextOpt(persistedCommitQC) should do.

Contributor Author


done

appQC utils.Option[*types.AppQC]
laneFirsts map[types.LaneID]types.BlockNumber
commitQCFirst types.RoadIndex
commitQCCur types.RoadIndex // snapshot of nextCommitQCToPersist (clamped)
Contributor


nit: commitQCNext for consistency?

Contributor Author


done

for {
for lane, bq := range inner.blocks {
for i := max(bq.first, r.next[lane]); i < bq.next; i++ {
upperBound := min(bq.next, inner.nextBlockToPersist[lane])
Contributor

@pompon0 pompon0 Mar 2, 2026


nit: add a TODO, that nextBlockToPersist might deserve a separate Watch to avoid waking up too often too many tasks (a potential optimization).

Contributor Author


done

// PersistBatch persists a batch of blocks to disk, updates tips after each
// successful write, and cleans up old files below laneFirsts.
// Returns the updated tips snapshot.
func (bp *BlockPersister) PersistBatch(
Contributor


unused?

Contributor Author


removed

sorted := slices.Sorted(maps.Keys(bs))
var contiguous []LoadedBlock
for i, n := range sorted {
if i > 0 && n != sorted[i-1]+1 {
Contributor

@pompon0 pompon0 Mar 2, 2026


I think here we should load the LAST contiguous range, no? Actually, this is more complicated. We need pruning to be consistent across the CommitQCs, blocks, and AppQC. In particular, which range should be loaded depends on the AppQC. Blocks that do not belong to the relevant range should also be pruned, so that they are not loaded later by accident. Say we have a gap, but during loading we find that we did not persist the AppQC for the last range, so we load the earlier range. Then after a while we persist the blocks from the gap and restart again. Suddenly there is no gap any more, yet the old blocks after the gap do not match the blocks before it.

Contributor


also, since persisting may fail before AppQC is persisted, we might end up with a block range which exceeds the lane capacity - this also needs to be taken care of during loading.

require.NoError(t, utils.TestDiff(b0, blocks[lane][0].Proposal))
}

func TestLoadCorruptMidSequenceTruncatesAtGap(t *testing.T) {
Contributor


if it is not too complicated, it would be nice to have a test which makes avail.State generate a gap (i.e. persist some data, then suddenly receive future data, then persist it) and then restarts the state from it, ensuring that all the latest data is actually loaded back.


// PersistCommitQC writes a CommitQC to its own file.
func (cp *CommitQCPersister) PersistCommitQC(qc *types.CommitQC) error {
idx := qc.Index()
Contributor


nit: you might want to add a check that qc.Index() >= cp.next and return an error otherwise.

Contributor Author


done


var contiguous []LoadedCommitQC
for i, idx := range sorted {
if i > 0 && idx != sorted[i-1]+1 {
Contributor

@pompon0 pompon0 Mar 2, 2026


ditto, we need the LAST contiguous range.

…tors private

Each persister constructor (NewPersister, NewBlockPersister,
NewCommitQCPersister) now accepts utils.Option[string] and returns a
no-op implementation when None is passed. The previously-public no-op
constructors are now private, ensuring they can only be created through
the unified constructor path.

This simplifies avail.NewState to a single loadPersistedState(stateDir)
call with no branching, and guarantees all persisters are atomically
either all-real or all-noop.

Also adds block number/parent hash checks during inner block restoration
and a queue reset-after-populate test.

Made-with: Cursor
…CNext

Block DeleteBefore now happens after AppQC is persisted, matching the
existing CommitQC deletion ordering. This prevents a crash from leaving
a gap between the on-disk AppQC and the first persisted block.

Also renamed commitQCCur to commitQCNext for consistency with the
"next" naming convention used elsewhere (queue.next,
nextCommitQCToPersist).

Made-with: Cursor
BlockPersister.PersistBatch, LoadTips, and the tips AtomicSend field
are unused now that runPersist drives persistence via PersistBlock +
markBlockPersisted directly. Removed all three.

Added an order check in PersistCommitQC: returns error if idx < next
(caller bug). Fixed stale comment referencing AtomicSend tips.

Made-with: Cursor
latestCommitQC is now only written by markCommitQCsPersisted (after disk
write) and on startup. The Store() in prune() is removed so that
PushAppQC fast-forward no longer advances the cursor past what's
actually persisted. collectPersistBatch derives the cursor via
NextIndexOpt(latestCommitQC) with a max clamp against commitQCs.first
to handle queue jumps from prune().

Made-with: Cursor
Comment on lines +671 to +675
for lane, q := range inner.blocks {
if inner.nextBlockToPersist[lane] < q.next {
return true
}
}

Check warning

Code scanning / CodeQL

Iteration over map Warning

Iteration over map may be a possible source of non-determinism
Comment on lines +681 to +687
for lane, q := range inner.blocks {
start := max(inner.nextBlockToPersist[lane], q.first)
for n := start; n < q.next; n++ {
b.blocks = append(b.blocks, q.q[n])
}
b.laneFirsts[lane] = q.first
}

Check warning

Code scanning / CodeQL

Iteration over map Warning

Iteration over map may be a possible source of non-determinism