
fix: protect HA validator Postgres from cluster scale-down #23772

Merged
PhilWindle merged 1 commit into merge-train/spartan from spy/ha-postgres-scale-down-protection on Jun 1, 2026



@spypsy (Member) commented Jun 1, 2026

Summary

  • Annotate validator HA Postgres pods with cluster-autoscaler.kubernetes.io/safe-to-evict: "false" so GKE does not evict them during node scale-down.
  • Add a PodDisruptionBudget (minAvailable: 1) to guard against voluntary disruptions, following the same pattern as the aztec-node chart.
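As a rough illustration of what the two changes render to, here is a minimal TypeScript sketch that builds the pod metadata and the PodDisruptionBudget as plain objects mirroring the Kubernetes API shapes. The label keys and values and the PDB name are hypothetical placeholders, not the chart's actual values; only the annotation key, its "false" value, the policy/v1 kind, and minAvailable: 1 come from this PR.

```typescript
// Hypothetical sketch of the objects the chart would render for the
// validator HA Postgres pods. Label keys/values and the PDB name are assumptions.

interface PodMetadata {
  labels: Record<string, string>;
  annotations: Record<string, string>;
}

interface PodDisruptionBudget {
  apiVersion: "policy/v1";
  kind: "PodDisruptionBudget";
  metadata: { name: string };
  spec: {
    minAvailable: number;
    selector: { matchLabels: Record<string, string> };
  };
}

// Hypothetical selector labels; the real chart's labels may differ.
const dbLabels: Record<string, string> = {
  "app.kubernetes.io/name": "validator-ha-db",
  "app.kubernetes.io/component": "postgres",
};

function buildPodMetadata(labels: Record<string, string>): PodMetadata {
  return {
    labels: { ...labels },
    annotations: {
      // Annotation values are strings, so "false" must be quoted, not a boolean.
      "cluster-autoscaler.kubernetes.io/safe-to-evict": "false",
    },
  };
}

function buildPdb(name: string, labels: Record<string, string>): PodDisruptionBudget {
  return {
    apiVersion: "policy/v1",
    kind: "PodDisruptionBudget",
    metadata: { name },
    spec: {
      minAvailable: 1,
      // Must match the Postgres pod labels, otherwise the PDB selects nothing.
      selector: { matchLabels: { ...labels } },
    },
  };
}

const podMeta = buildPodMetadata(dbLabels);
const pdb = buildPdb("validator-ha-db-postgres", dbLabels); // hypothetical name
console.log(JSON.stringify({ podMeta, pdb }, null, 2));
```

The key design point is that the PDB selector and the pod labels come from the same source, so the budget cannot silently drift away from the pods it is meant to protect.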

Motivation: in CI, validator_nuke_and_suppression.test.ts missed slot 445 when an autoscaler scale-down deleted v5-scenario-2-validator-ha-db-postgres-0, leaving the validators unable to reach the DB service (ECONNREFUSED) for ~53s.

Prevent the GKE autoscaler from evicting the single-replica signing DB during
node scale-down, which caused missed slots in the validator nuke e2e test.
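Why both guards keep a single-replica pod in place can be shown with a simplified TypeScript model. This is not the cluster-autoscaler's or the PDB controller's actual code: it assumes an integer minAvailable (no percentages, no maxUnavailable) and reduces the autoscaler's scale-down check to two conditions. The pod name is taken from the failing CI run; everything else is illustrative.

```typescript
// Simplified model (not real cluster-autoscaler or PDB controller code) of
// why the annotation and a minAvailable: 1 PDB keep a single-replica pod in place.

interface PdbState {
  minAvailable: number;   // integer form only in this sketch
  currentHealthy: number; // healthy pods matched by the PDB selector
}

interface PodInfo {
  name: string;
  annotations: Record<string, string>;
  pdb?: PdbState;
}

const SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict";

// Allowed disruptions = healthy pods minus minAvailable, never below zero.
function disruptionsAllowed(pdb: PdbState): number {
  return Math.max(0, pdb.currentHealthy - pdb.minAvailable);
}

// A voluntary eviction is permitted only if the pod's PDB (if any) allows one.
function evictionPermitted(pod: PodInfo): boolean {
  return pod.pdb === undefined || disruptionsAllowed(pod.pdb) > 0;
}

// Scale-down skips a node if any pod opts out via the annotation
// or cannot be evicted under its PDB.
function nodeRemovable(pods: PodInfo[]): boolean {
  return pods.every(
    (p) => p.annotations[SAFE_TO_EVICT] !== "false" && evictionPermitted(p),
  );
}

// After this PR: annotated and covered by a PDB with one healthy replica.
const postgres: PodInfo = {
  name: "v5-scenario-2-validator-ha-db-postgres-0",
  annotations: { [SAFE_TO_EVICT]: "false" },
  pdb: { minAvailable: 1, currentHealthy: 1 },
};
// Before this PR: no annotation, no PDB.
const before: PodInfo = { name: "v5-scenario-2-validator-ha-db-postgres-0", annotations: {} };

console.log(disruptionsAllowed(postgres.pdb!)); // 0
console.log(nodeRemovable([postgres])); // false
console.log(nodeRemovable([before])); // true
```

With a single replica, minAvailable: 1 leaves zero allowed disruptions, so a voluntary drain of that node stalls instead of taking the database down. Neither guard covers involuntary disruptions such as a node failure.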
PhilWindle merged commit 772f03a into merge-train/spartan on Jun 1, 2026
17 checks passed
PhilWindle deleted the spy/ha-postgres-scale-down-protection branch on June 1, 2026 16:14
danielntmd pushed a commit to danielntmd/aztec-packages that referenced this pull request Jun 4, 2026
BEGIN_COMMIT_OVERRIDE
chore: deploy next-net and reuse contracts (AztecProtocol#23761)
chore: turn on autoscaling (AztecProtocol#23706)
chore: rename staging-public to staging (AztecProtocol#23767)
chore(p2p): use sync hash for tx validation hashing (AztecProtocol#23768)
test(e2e): wait warmup slots in slashing tests (AztecProtocol#23719)
feat(api)!: make getTxReceipt the single tx-lookup API (AztecProtocol#23660)
fix: cap cloned n_tps fees within sponsored FPC balance (AztecProtocol#23770)
fix: protect HA validator Postgres from cluster scale-down (AztecProtocol#23772)
refactor: remove non-pipelining sequencer code path (AztecProtocol#23665)
feat(archiver): add getL2ToL1MembershipWitness node RPC (AztecProtocol#23646)
fix(p2p)!: revamp BLOCK_TXS validations (AztecProtocol#23778)
chore: name the bots (AztecProtocol#23795)
fix(e2e): ensure BBSync init (AztecProtocol#23793)
fix(p2p)!: fix BLOCK_TXS response under proposer equivocation (AztecProtocol#23786)
fix: reconnect L1 port-forward after epoch-boundary sleep in n_tps_prove (AztecProtocol#23800)
chore: add empty vscode settings for yarn-project (AztecProtocol#23808)
fix(sequencer): only warn about missing proposed checkpoint once overdue (AztecProtocol#23807)
fix: refresh n_tps fee quotes during sustained benchmark (AztecProtocol#23797)
fix(sequencer): enforce build-frame deadlines and align attestation/publish windows (AztecProtocol#23776)
END_COMMIT_OVERRIDE