1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Changes

* Added publisher-mode synchronization option for failover scenarios with early P2P infrastructure readiness [#3222](https://github.com/evstack/ev-node/pull/3222)
* Improved the execution/evm check that stored meta is not stale [#3221](https://github.com/evstack/ev-node/pull/3221)

## v1.1.0-rc.1
8 changes: 4 additions & 4 deletions apps/evm/go.mod
@@ -2,10 +2,10 @@ module github.com/evstack/ev-node/apps/evm

go 1.25.7

// replace (
// github.com/evstack/ev-node => ../../
// github.com/evstack/ev-node/execution/evm => ../../execution/evm
// )
replace (
github.com/evstack/ev-node => ../../
github.com/evstack/ev-node/execution/evm => ../../execution/evm
)
Comment on lines +5 to +8
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check for go.work file (monorepo pattern)
fd -H 'go.work' .

# Explore repository structure
ls -la apps/ 2>/dev/null || echo "apps directory structure"
find . -name 'go.mod' -type f | head -20

Repository: evstack/ev-node

Length of output: 631


🏁 Script executed:

# Check the complete go.mod for apps/evm
cat -n apps/evm/go.mod

Repository: evstack/ev-node

Length of output: 14568


🏁 Script executed:

# Look for CI/CD configuration to understand build patterns
find . -name '.github' -type d
find . -type f \( -name '*.yml' -o -name '*.yaml' \) -path '*/.github/workflows/*' | head -10

Repository: evstack/ev-node

Length of output: 434


🏁 Script executed:

# Check if there are any docs or comments about module publishing
grep -r "apps/evm" --include='*.md' --include='*.txt' . 2>/dev/null | head -10

Repository: evstack/ev-node

Length of output: 579


Remove repo-local replace directives from the published module manifest.

These relative paths only resolve within this repository layout and break downstream builds when github.com/evstack/ev-node/apps/evm is used as a dependency. The module is published with independent versioning (per RELEASE.md); local wiring should use the existing go.work pattern or be removed before merge.

Suggested adjustment
-replace (
-	github.com/evstack/ev-node => ../../
-	github.com/evstack/ev-node/execution/evm => ../../execution/evm
-)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
replace (
github.com/evstack/ev-node => ../../
github.com/evstack/ev-node/execution/evm => ../../execution/evm
)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/evm/go.mod` around lines 5-8, remove the repo-local replace directives
from the module manifest (the replace block that points
github.com/evstack/ev-node and github.com/evstack/ev-node/execution/evm to
relative ../../ paths) so the published module no longer contains
repository-specific local wiring; keep the go.mod clean for downstream consumers
and, if local development linking is needed, use a go.work file or temporary
local replace only in your working copy before publishing.
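
For local development in this monorepo layout, the wiring the review suggests removing can live in a `go.work` file instead of `go.mod`, so it never ships with the published module. A minimal sketch, with module paths assumed from this repository's structure:

```go
go 1.25.7

use (
	.
	./apps/evm
	./execution/evm
)
```

A `go.work` file is read only by the local toolchain and is conventionally git-ignored, which is why it is the safer home for repository-relative paths than `replace` directives in a published `go.mod`.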


require (
github.com/ethereum/go-ethereum v1.17.2
4 changes: 0 additions & 4 deletions apps/evm/go.sum
@@ -472,12 +472,8 @@ github.com/ethereum/go-bigmodexpfix v0.0.0-20250911101455-f9e208c548ab h1:rvv6MJ
github.com/ethereum/go-bigmodexpfix v0.0.0-20250911101455-f9e208c548ab/go.mod h1:IuLm4IsPipXKF7CW5Lzf68PIbZ5yl7FFd74l/E0o9A8=
github.com/ethereum/go-ethereum v1.17.2 h1:ag6geu0kn8Hv5FLKTpH+Hm2DHD+iuFtuqKxEuwUsDOI=
github.com/ethereum/go-ethereum v1.17.2/go.mod h1:KHcRXfGOUfUmKg51IhQ0IowiqZ6PqZf08CMtk0g5K1o=
github.com/evstack/ev-node v1.1.0-rc.1 h1:NtPuuDLqN2h4/edu5zxRlZAxmLkTG3ncXBO2PlCDvVs=
github.com/evstack/ev-node v1.1.0-rc.1/go.mod h1:6rhWWzuyiqNn/erDmWCk1aLxUuQphyOGIRq56/smSyk=
github.com/evstack/ev-node/core v1.0.0 h1:s0Tx0uWHme7SJn/ZNEtee4qNM8UO6PIxXnHhPbbKTz8=
github.com/evstack/ev-node/core v1.0.0/go.mod h1:n2w/LhYQTPsi48m6lMj16YiIqsaQw6gxwjyJvR+B3sY=
github.com/evstack/ev-node/execution/evm v1.0.0 h1:UTAdCrnPsLoGzSgsBx4Kv76jkXpMmHBIpNv3MxyzWPo=
github.com/evstack/ev-node/execution/evm v1.0.0/go.mod h1:UrqkiepfTMiot6M8jnswgu3VU8SSucZpaMIHIl22/1A=
github.com/fatih/color v1.7.0/go.mod h1:Zm6kSWBoL9eyXnKyktHP6abPY2pDugNf5KwzbycvMj4=
github.com/fatih/color v1.10.0/go.mod h1:ELkj/draVOlAH/xkhN6mQ50Qd0MPOk5AAr3maGEBuJM=
github.com/fatih/color v1.13.0/go.mod h1:kLAiJbzzSOZDVNGyDpeOxJ47H46qBXwg5ILebYFFOfk=
13 changes: 10 additions & 3 deletions block/internal/syncing/syncer.go
@@ -694,6 +694,12 @@
// TrySyncNextBlock attempts to sync the next available block
// the event is always the next block in sequence as processHeightEvent ensures it.
func (s *Syncer) TrySyncNextBlock(ctx context.Context, event *common.DAHeightEvent) error {
return s.trySyncNextBlockWithState(ctx, event, s.getLastState())
}

// trySyncNextBlockWithState attempts to sync the next available block using
// the provided current state as the validation/apply baseline.
func (s *Syncer) trySyncNextBlockWithState(ctx context.Context, event *common.DAHeightEvent, currentState types.State) error {
select {
case <-ctx.Done():
return ctx.Err()
@@ -703,7 +709,6 @@ func (s *Syncer) TrySyncNextBlock(ctx context.Context, event *common.DAHeightEve
header := event.Header
data := event.Data
nextHeight := event.Header.Height()
currentState := s.getLastState()
headerHash := header.Hash().String()

s.logger.Info().Uint64("height", nextHeight).Str("source", string(event.Source)).Msg("syncing block")
@@ -1201,6 +1206,7 @@ func (s *Syncer) RecoverFromRaft(ctx context.Context, raftState *raft.RaftBlockS
s.logger.Debug().Err(err).Msg("no state in store, using genesis defaults for recovery")
currentState = types.State{
ChainID: s.genesis.ChainID,
InitialHeight: s.genesis.InitialHeight,
LastBlockHeight: s.genesis.InitialHeight - 1,
}
}
@@ -1214,11 +1220,12 @@
return nil
} else if currentState.LastBlockHeight+1 == raftState.Height { // raft is 1 block ahead
// apply block
err := s.TrySyncNextBlock(ctx, &common.DAHeightEvent{
event := &common.DAHeightEvent{
Header: &header,
Data: &data,
Source: "",
})
}
err := s.trySyncNextBlockWithState(ctx, event, currentState)
Member


this change is effectively to use the correct state when we override it at L1207, right?

Contributor Author


yes, before it was re-read in the method and could be empty
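
The pattern under discussion — passing the already-validated state into the sync call instead of having the method re-read it — can be shown in isolation. This sketch is illustrative only; the types and names are not ev-node's:

```go
package main

import "fmt"

type state struct{ height uint64 }

// syncer's stored state may still be zero-valued during recovery,
// before normal startup has initialized it.
type syncer struct{ last state }

// applyWithState takes the caller's baseline explicitly, so a recovery path
// that has just computed a bootstrap state cannot be undercut by an empty
// s.last — the failure mode the thread above describes.
func (s *syncer) applyWithState(base state, next uint64) error {
	if base.height+1 != next {
		return fmt.Errorf("height %d does not follow %d", next, base.height)
	}
	s.last = state{height: next}
	return nil
}

func main() {
	s := &syncer{} // s.last is empty, as in recovery-before-start
	bootstrapped := state{height: 0}
	fmt.Println(s.applyWithState(bootstrapped, 1)) // <nil>
}
```

Had `applyWithState` re-read `s.last` internally, the empty zero state would have been used as the baseline even though the recovery path had just derived the correct one.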

if err != nil {
return err
}
116 changes: 116 additions & 0 deletions block/internal/syncing/syncer_test.go
@@ -26,6 +26,7 @@ import (
"github.com/evstack/ev-node/pkg/config"
datypes "github.com/evstack/ev-node/pkg/da/types"
"github.com/evstack/ev-node/pkg/genesis"
"github.com/evstack/ev-node/pkg/raft"
signerpkg "github.com/evstack/ev-node/pkg/signer"
"github.com/evstack/ev-node/pkg/signer/noop"
"github.com/evstack/ev-node/pkg/store"
@@ -306,6 +307,121 @@ func TestSequentialBlockSync(t *testing.T) {
requireEmptyChan(t, errChan)
}

func TestSyncer_RecoverFromRaft_BootstrapsStateWhenUninitialized(t *testing.T) {
ds := dssync.MutexWrap(datastore.NewMapDatastore())
st := store.New(ds)

cm, err := cache.NewManager(config.DefaultConfig(), st, zerolog.Nop())
require.NoError(t, err)

addr, pub, signer := buildSyncTestSigner(t)
gen := genesis.Genesis{
ChainID: "1234",
InitialHeight: 1,
StartTime: time.Now().Add(-time.Second),
ProposerAddress: addr,
}

mockExec := testmocks.NewMockExecutor(t)
mockHeaderStore := extmocks.NewMockStore[*types.P2PSignedHeader](t)
mockDataStore := extmocks.NewMockStore[*types.P2PData](t)
s := NewSyncer(
st,
mockExec,
nil,
cm,
common.NopMetrics(),
config.DefaultConfig(),
gen,
mockHeaderStore,
mockDataStore,
zerolog.Nop(),
common.DefaultBlockOptions(),
make(chan error, 1),
nil,
)

// lastState intentionally not initialized to simulate recovery-before-start path.
data := makeData(gen.ChainID, 1, 0)
headerBz, hdr := makeSignedHeaderBytes(t, gen.ChainID, 1, addr, pub, signer, []byte("app0"), data, nil)
dataBz, err := data.MarshalBinary()
require.NoError(t, err)

mockExec.EXPECT().
ExecuteTxs(mock.Anything, mock.Anything, uint64(1), mock.Anything, mock.Anything).
Return([]byte("app1"), nil).
Once()

err = s.RecoverFromRaft(t.Context(), &raft.RaftBlockState{
Height: 1,
Hash: hdr.Hash(),
Header: headerBz,
Data: dataBz,
})
require.NoError(t, err)

state, err := st.GetState(t.Context())
require.NoError(t, err)
require.Equal(t, gen.ChainID, state.ChainID)
require.Equal(t, uint64(1), state.LastBlockHeight)
}

func TestSyncer_RecoverFromRaft_KeepsStrictValidationAfterStateExists(t *testing.T) {
ds := dssync.MutexWrap(datastore.NewMapDatastore())
st := store.New(ds)

cm, err := cache.NewManager(config.DefaultConfig(), st, zerolog.Nop())
require.NoError(t, err)

addr, pub, signer := buildSyncTestSigner(t)
gen := genesis.Genesis{
ChainID: "1234",
InitialHeight: 1,
StartTime: time.Now().Add(-time.Second),
ProposerAddress: addr,
}

mockExec := testmocks.NewMockExecutor(t)
mockHeaderStore := extmocks.NewMockStore[*types.P2PSignedHeader](t)
mockDataStore := extmocks.NewMockStore[*types.P2PData](t)
s := NewSyncer(
st,
mockExec,
nil,
cm,
common.NopMetrics(),
config.DefaultConfig(),
gen,
mockHeaderStore,
mockDataStore,
zerolog.Nop(),
common.DefaultBlockOptions(),
make(chan error, 1),
nil,
)

// Non-empty state must remain strictly validated.
s.SetLastState(types.State{
ChainID: "wrong-chain",
InitialHeight: 1,
LastBlockHeight: 0,
})

data := makeData(gen.ChainID, 1, 0)
headerBz, hdr := makeSignedHeaderBytes(t, gen.ChainID, 1, addr, pub, signer, []byte("app0"), data, nil)
dataBz, err := data.MarshalBinary()
require.NoError(t, err)

err = s.RecoverFromRaft(t.Context(), &raft.RaftBlockState{
Height: 1,
Hash: hdr.Hash(),
Header: headerBz,
Data: dataBz,
})
require.Error(t, err)
require.ErrorContains(t, err, "invalid chain ID")
}

func TestSyncer_processPendingEvents(t *testing.T) {
ds := dssync.MutexWrap(datastore.NewMapDatastore())
st := store.New(ds)
32 changes: 18 additions & 14 deletions docs/guides/raft_production.md
@@ -22,7 +22,7 @@ This guide details the Raft consensus implementation in `ev-node`, used for High

## Configuration

Raft is configured via CLI flags or the `config.toml` file under the `[raft]` (or `[rollkit.raft]`) section.
Raft is configured via CLI flags or the `config.toml` file under the `[raft]` (or `[evnode.raft]`) section.

### Essential Flags

@@ -33,7 +33,7 @@ Raft is configured via CLI flags or the `config.toml` file under the `[raft]` (o
| `--evnode.raft.raft_addr` | `raft.raft_addr` | TCP address for Raft transport. | `0.0.0.0:5001` (Bind to private IP) |
| `--evnode.raft.raft_dir` | `raft.raft_dir` | Directory for Raft data. | `/data/raft` (Must be persistent) |
| `--evnode.raft.peers` | `raft.peers` | Comma-separated list of peer addresses in format `nodeID@host:port`. | `node-1@10.0.0.1:5001,node-2@10.0.0.2:5001,node-3@10.0.0.3:5001` |
| `--evnode.raft.bootstrap` | `raft.bootstrap` | Bootstrap the cluster. **Required** for initial setup. | `true` (See Limitations) |
| `--evnode.raft.bootstrap` | `raft.bootstrap` | Compatibility flag. Startup mode is selected automatically from persisted raft configuration state. | optional |

### Timeout Tuning

Expand All @@ -55,11 +55,15 @@ Ideally, a failover should complete within `2 * BlockTime` to minimize user impa

## Production Deployment Principles

### 1. Static Peering & Bootstrap
Current implementation requires **Bootstrap Mode** (`--evnode.raft.bootstrap=true`) for all nodes participating in the cluster initialization.
* **All nodes** should list the full set of peers in `--evnode.raft.peers`.
### 1. Static Peering & Automatic Startup Mode
Use static peering with automatic mode selection from local raft configuration:
* If local raft configuration already exists in `--evnode.raft.raft_dir`, the node starts in rejoin mode.
* If no local raft configuration exists yet, the node bootstraps from configured peers.
* `--evnode.raft.bootstrap` is retained for compatibility but does not control mode selection.
* **All configured cluster members** should list the full set of peers in `--evnode.raft.peers`.
* The `peers` list format is strict: `NodeID@Host:Port`.
* **Limitation**: Dynamic addition of peers (Run-time Membership Changes) via RPC/CLI is not currently exposed. The cluster membership is static based on the initial bootstrap configuration.
* **Limitation**: Dynamic addition of peers (run-time membership changes) via RPC/CLI is not currently exposed.
* **Not supported**: Joining an existing cluster as a brand-new node that was not part of the initial static membership.
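
The strict `NodeID@Host:Port` peer format above lends itself to a small validation helper. The following is an illustrative sketch, not ev-node's actual parser:

```go
package main

import (
	"fmt"
	"strings"
)

// parsePeers splits a comma-separated "nodeID@host:port" list into an
// ID -> address map, rejecting entries that do not match the strict format.
// Illustrative only; ev-node's real parser may differ.
func parsePeers(s string) (map[string]string, error) {
	peers := make(map[string]string)
	for _, entry := range strings.Split(s, ",") {
		id, addr, ok := strings.Cut(strings.TrimSpace(entry), "@")
		if !ok || id == "" || !strings.Contains(addr, ":") {
			return nil, fmt.Errorf("invalid peer entry %q, want NodeID@Host:Port", entry)
		}
		peers[id] = addr
	}
	return peers, nil
}

func main() {
	peers, err := parsePeers("node-1@10.0.0.1:5001,node-2@10.0.0.2:5001")
	if err != nil {
		panic(err)
	}
	fmt.Println(len(peers), peers["node-1"]) // 2 10.0.0.1:5001
}
```

Validating the list once at startup surfaces a malformed entry immediately, rather than as a confusing transport error when the node first tries to contact a peer.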

### 2. Infrastructure Requirements
* **Encrypted Network (CRITICAL)**: Raft traffic is **unencrypted** (plain TCP). You **MUST** run the cluster inside a private network, VPN, or encrypted mesh (e.g., WireGuard, Tailscale). **Never expose Raft ports to the public internet**; doing so allows attackers to hijack the cluster consensus.
@@ -86,13 +90,13 @@ Monitor the following metrics (propagated via Prometheus if enabled):

```bash
./ev-node start \
--node.aggregator \
--raft.enable \
--raft.node_id="node-1" \
--raft.raft_addr="0.0.0.0:5001" \
--raft.raft_dir="/var/lib/ev-node/raft" \
--raft.bootstrap=true \
--raft.peers="node-1@10.0.1.1:5001,node-2@10.0.1.2:5001,node-3@10.0.1.3:5001" \
--p2p.listen_address="/ip4/0.0.0.0/tcp/26656" \
--evnode.node.aggregator=true \
--evnode.raft.enable=true \
--evnode.raft.node_id="node-1" \
--evnode.raft.raft_addr="0.0.0.0:5001" \
--evnode.raft.raft_dir="/var/lib/ev-node/raft" \
--evnode.raft.bootstrap=true \
--evnode.raft.peers="node-1@10.0.1.1:5001,node-2@10.0.1.2:5001,node-3@10.0.1.3:5001" \
--evnode.p2p.listen_address="/ip4/0.0.0.0/tcp/26656" \
...other flags
```
12 changes: 11 additions & 1 deletion docs/learn/config.md
@@ -1321,7 +1321,7 @@ _Constant:_ `FlagRaftDir`
### Raft Bootstrap

**Description:**
If true, bootstraps a new Raft cluster. Only set this on the very first node when initializing a new cluster.
Legacy compatibility flag. Startup mode is now auto-selected from persisted raft configuration state, so this flag is not used to choose bootstrap vs rejoin.

**YAML:**

@@ -1352,6 +1352,16 @@ raft:
_Default:_ `""` (empty)
_Constant:_ `FlagRaftPeers`

### Raft Startup Mode

Raft startup mode is selected automatically from local raft configuration state:

* If the node already has persisted raft configuration in `raft.raft_dir`, it starts in rejoin mode.
* If no raft configuration exists yet, it bootstraps a cluster from configured peers.
* `raft.bootstrap` is retained for compatibility but does not control mode selection.

`--evnode.raft.rejoin` has been removed.

### Raft Snap Count

**Description:**
47 changes: 45 additions & 2 deletions node/failover.go
@@ -33,6 +33,8 @@ type failoverState struct {
dataSyncService *evsync.DataSyncService
rpcServer *http.Server
bc *block.Components
raftNode *raft.Node
isAggregator bool

// catchup fields — used when the aggregator needs to sync before producing
catchupEnabled bool
@@ -172,13 +174,41 @@
dataSyncService: dataSyncService,
rpcServer: rpcServer,
bc: bc,
raftNode: raftNode,
isAggregator: isAggregator,
store: rktStore,
catchupEnabled: catchupEnabled,
catchupTimeout: nodeConfig.Node.CatchupTimeout.Duration,
daBlockTime: nodeConfig.DA.BlockTime.Duration,
}, nil
}

// shouldStartSyncInPublisherMode avoids startup deadlock when a raft leader boots
// with empty sync stores and no peer can serve height 1 yet.
func (f *failoverState) shouldStartSyncInPublisherMode(ctx context.Context) bool {
if !f.isAggregator || f.raftNode == nil || !f.raftNode.IsLeader() {
return false
}

storeHeight, err := f.store.Height(ctx)
if err != nil {
f.logger.Warn().Err(err).Msg("cannot determine store height; keeping blocking sync startup")
return false
}
headerHeight := f.headerSyncService.Store().Height()
dataHeight := f.dataSyncService.Store().Height()
if headerHeight > 0 || dataHeight > 0 {
return false
}

f.logger.Info().
Uint64("store_height", storeHeight).
Uint64("header_height", headerHeight).
Uint64("data_height", dataHeight).
Msg("raft-enabled aggregator with empty sync stores: starting sync services in publisher mode")
return true
}

func (f *failoverState) Run(pCtx context.Context) (multiErr error) {
stopService := func(stoppable func(context.Context) error, name string) { //nolint:contextcheck // shutdown uses context.Background intentionally
// parent context is cancelled already, so we need to create a new one
@@ -207,15 +237,28 @@ func (f *failoverState) Run(pCtx context.Context) (multiErr error) {
})

// start header and data sync services concurrently to avoid cumulative startup delay.
startSyncInPublisherMode := f.shouldStartSyncInPublisherMode(ctx)
syncWg, syncCtx := errgroup.WithContext(ctx)
syncWg.Go(func() error {
if err := f.headerSyncService.Start(syncCtx); err != nil {
var err error
if startSyncInPublisherMode {
err = f.headerSyncService.StartForPublishing(syncCtx)
} else {
err = f.headerSyncService.Start(syncCtx)
}
if err != nil {
return fmt.Errorf("header sync service: %w", err)
}
return nil
})
syncWg.Go(func() error {
if err := f.dataSyncService.Start(syncCtx); err != nil {
var err error
if startSyncInPublisherMode {
err = f.dataSyncService.StartForPublishing(syncCtx)
} else {
err = f.dataSyncService.Start(syncCtx)
}
if err != nil {
return fmt.Errorf("data sync service: %w", err)
}
return nil
4 changes: 2 additions & 2 deletions pkg/config/config.go
@@ -400,7 +400,7 @@ type RaftConfig struct {
NodeID string `mapstructure:"node_id" yaml:"node_id" comment:"Unique identifier for this node in the Raft cluster"`
RaftAddr string `mapstructure:"raft_addr" yaml:"raft_addr" comment:"Address for Raft communication (host:port)"`
RaftDir string `mapstructure:"raft_dir" yaml:"raft_dir" comment:"Directory for Raft logs and snapshots"`
Bootstrap bool `mapstructure:"bootstrap" yaml:"bootstrap" comment:"Bootstrap a new Raft cluster (only for the first node)"`
Bootstrap bool `mapstructure:"bootstrap" yaml:"bootstrap" comment:"Bootstrap a new static Raft cluster during initial bring-up"`
Peers string `mapstructure:"peers" yaml:"peers" comment:"Comma-separated list of peer Raft addresses (nodeID@host:port)"`
SnapCount uint64 `mapstructure:"snap_count" yaml:"snap_count" comment:"Number of log entries between snapshots"`
SendTimeout time.Duration `mapstructure:"send_timeout" yaml:"send_timeout" comment:"Max duration to wait for a message to be sent to a peer"`
@@ -646,7 +646,7 @@ func AddFlags(cmd *cobra.Command) {
cmd.Flags().String(FlagRaftNodeID, def.Raft.NodeID, "unique identifier for this node in the Raft cluster")
cmd.Flags().String(FlagRaftAddr, def.Raft.RaftAddr, "address for Raft communication (host:port)")
cmd.Flags().String(FlagRaftDir, def.Raft.RaftDir, "directory for Raft logs and snapshots")
cmd.Flags().Bool(FlagRaftBootstrap, def.Raft.Bootstrap, "bootstrap a new Raft cluster (only for the first node)")
cmd.Flags().Bool(FlagRaftBootstrap, def.Raft.Bootstrap, "bootstrap a new static Raft cluster during initial bring-up")
cmd.Flags().String(FlagRaftPeers, def.Raft.Peers, "comma-separated list of peer Raft addresses (nodeID@host:port)")
cmd.Flags().Uint64(FlagRaftSnapCount, def.Raft.SnapCount, "number of log entries between snapshots")
cmd.Flags().Duration(FlagRaftSendTimeout, def.Raft.SendTimeout, "max duration to wait for a message to be sent to a peer")