Skip to content

feat(blockchain): defer early gossip messages to their production interval#318

Draft
MegaRedHand wants to merge 1 commit intomainfrom
defer-early-gossip-arrivals
Draft

feat(blockchain): defer early gossip messages to their production interval#318
MegaRedHand wants to merge 1 commit intomainfrom
defer-early-gossip-arrivals

Conversation

@MegaRedHand
Copy link
Copy Markdown
Collaborator

Summary

  • Gossip attestations / aggregated attestations arriving before their slot has started locally are currently rejected with AttestationTooFarInFuture and dropped, wasting the peer's signature and re-aggregation work.
  • This change buffers them via send_after and replays each message at exactly the interval an honest validator would have produced it (interval 1 of slot S for single attestations, interval 2 for aggregated).
  • Bounds on receipt prevent abuse: messages more than MAX_DEFER_FUTURE_SLOTS (4) ahead are dropped, and the in-flight retry budget is capped at MAX_DEFERRED_GOSSIP_MESSAGES (1024).

Closes (potentially) #307.

⚠️ Draft — necessity not yet verified

We don't have direct evidence that early-gossip arrivals actually happen at meaningful rates on our network. Before merging this we should:

  1. Add a counter for AttestationTooFarInFuture rejections (cheap diagnostic, separate PR).
  2. Run a devnet for a non-trivial window and read the counter from :5054/metrics.
  3. Use the reading to decide:
    • 0 / near-zero — fix isn't justified, close this PR.
    • non-zero but small — modest benefit, ship.
    • frequent — clear win, ship and investigate why peers gossip that early.

The recent leanSpec PR #682 (branch port/leanspec-682-tighten-attestation-future-bound) tightens the bound from a full slot to ~800ms of clock skew. Once that lands, this rejection becomes much easier to hit at the edge, which would tilt the answer toward "yes, fix it." Suggested ordering: land the diagnostic counter, land #682, measure, then decide on this PR.

Implementation notes

  • New retry envelopes RetryDeferredAttestation / RetryDeferredAggregated are scheduled with send_after(delay, …) where delay is computed by the pure function defer_delay(store_time, slot, interval_in_slot).
  • A deferred_gossip_count: usize on BlockChainServer bounds in-flight retry timers; the count is incremented on enqueue and decremented when the retry handler runs.
  • defer_delay and the retry-target intervals are unit-tested in a new defer_tests module in lib.rs (production-interval timing for both kinds, slot-window edge, past-interval collapse to zero).
  • The on-time path is unchanged: is_gossip_too_early mirrors store::validate_attestation_data's rejection rule, and only future-slot messages divert to the retry path.

Test plan

  • cargo fmt --all
  • cargo clippy --workspace --all-targets -- -D warnings
  • cargo test --workspace --release — 327 passed, 6 ignored, 0 failed
  • Add lean_attestations_rejected_too_early_total counter and observe on devnet
  • Decide whether to merge based on the reading

…erval

Gossip attestations arriving before their slot has started locally are
rejected with `AttestationTooFarInFuture` and dropped today, wasting the
peer's work even though the next slot tick is at most a few seconds
away. Buffer them via `send_after` and replay each message at the
interval an honest validator would have produced it (interval 1 of slot
S for single attestations, interval 2 for aggregated).

Bounds on receipt prevent abuse: messages more than
MAX_DEFER_FUTURE_SLOTS (4) ahead are dropped, and the in-flight retry
budget is capped at MAX_DEFERRED_GOSSIP_MESSAGES (1024).

Refs #307
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant