fix: stop genservers gracefully#319

Open
dicethedev wants to merge 1 commit into lambdaclass:main from dicethedev:fix/stop-genservers-gracefully

Conversation

@dicethedev

🗒️ Description

This PR implements graceful shutdown for the ethlambda node's actors and HTTP servers. Previously, the node would abruptly terminate on Ctrl+C without waiting for in-flight operations to complete.

Now, on shutdown signal:

  • The BlockChain and P2P actors are stopped cleanly via their context's stop() method
  • The API and metrics HTTP servers are shut down gracefully using Axum's with_graceful_shutdown()
  • The main thread waits for all actors and server tasks to finish before exiting

These changes ensure that:

  • Aggregation workers complete their current jobs
  • Database writes are flushed
  • Network connections are closed properly

This prevents data corruption and improves reliability.
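A minimal, self-contained sketch of this flow (the server helper is a stub standing in for the real start_api_server, which passes the shutdown future to Axum's with_graceful_shutdown(); the actor calls are shown as comments):

```rust
use std::sync::Arc;
use tokio::sync::Notify;

// Stub standing in for the real server helpers, which hand the shutdown
// future to axum's with_graceful_shutdown() internally.
async fn start_api_server(shutdown: impl std::future::Future<Output = ()>) {
    shutdown.await; // "serve" until the shutdown future resolves
}

#[tokio::main]
async fn main() {
    let shutdown_notify = Arc::new(Notify::new());

    let api_notify = shutdown_notify.clone();
    let api_handle = tokio::spawn(async move {
        start_api_server(async move { api_notify.notified().await }).await;
    });

    // Block until Ctrl+C, then fan the shutdown signal out.
    tokio::signal::ctrl_c().await.expect("failed to listen for ctrl_c");

    // The actors are stopped via their context's stop() method here,
    // e.g. blockchain_ref.context().stop() and p2p_ref.context().stop().
    shutdown_notify.notify_waiters();

    // Wait for the server task to finish before exiting.
    let _ = api_handle.await;
}
```

Note that the review below flags a race in exactly this notify_waiters() pattern.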

🔗 Related Issues or PRs

Closes #195

@greptile-apps
Contributor

greptile-apps Bot commented Apr 28, 2026

Greptile Summary

This PR adds graceful shutdown to the ethlambda node by stopping the BlockChain and P2P actors via context().stop(), signalling both HTTP servers through a shared Arc<Notify>, and awaiting all handles before exit. The crates/net/rpc changes are clean; the main concern is in the shutdown signalling path.

  • notify_waiters() race: Notify::notify_waiters() only wakes currently registered waiters. If the spawned server tasks haven't been polled (and thus haven't registered their notified() futures) before Ctrl+C fires, both notifications are silently dropped and api_handle.await / metrics_handle.await block indefinitely. Using notify_one() with two separate Arc<Notify> instances, or a watch channel, would eliminate this race (see the sketch just after this list).
  • Unbounded shutdown awaits: There is no timeout on the four join/await calls, so a stuck actor or a server holding open long-lived connections will prevent the process from ever exiting.
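A tiny self-contained illustration of the permit semantics behind that P1 finding (not project code):

```rust
use std::sync::Arc;
use tokio::sync::Notify;

#[tokio::main]
async fn main() {
    let n = Arc::new(Notify::new());

    // notify_one() stores a permit when no task is waiting, so a waiter
    // that registers afterwards still completes:
    n.notify_one();
    n.notified().await; // resolves immediately, consuming the stored permit

    // notify_waiters() stores nothing: called with no registered waiters,
    // the wake-up is lost, and a later notified().await would hang forever.
    n.notify_waiters();
    println!("done");
}
```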

Confidence Score: 3/5

Safe to merge for typical usage, but a race condition in the shutdown signalling path can cause an indefinite hang.

One P1 finding (notify_waiters race that silently drops shutdown signals and causes indefinite hangs) and one P2 (no shutdown timeout). The P1 drives the score below the 4/5 ceiling.

bin/ethlambda/src/main.rs — specifically the notify_waiters() call and the unbounded join awaits in the shutdown sequence.

Important Files Changed

bin/ethlambda/src/main.rs: Adds graceful shutdown via Arc<Notify> and actor context().stop(); the notify_waiters() call can silently drop notifications if spawned tasks haven't registered their waiters yet, causing an indefinite hang on shutdown.
crates/net/rpc/src/lib.rs: Adds a generic shutdown future parameter to both server functions and wires it into axum's with_graceful_shutdown(); a clean and idiomatic change with no issues.

Sequence Diagram

```mermaid
sequenceDiagram
    participant Main
    participant BlockChain
    participant P2P
    participant ApiServer
    participant MetricsServer

    Main->>BlockChain: spawn actor
    Main->>P2P: spawn actor
    Main->>ApiServer: tokio::spawn (with shutdown_future)
    Main->>MetricsServer: tokio::spawn (with shutdown_future)
    Main->>Main: ctrl_c().await

    Main->>BlockChain: context().stop()
    Main->>P2P: context().stop()
    Main->>ApiServer: shutdown_notify.notify_waiters()
    Main->>MetricsServer: shutdown_notify.notify_waiters()

    Main->>BlockChain: join().await
    BlockChain-->>Main: done
    Main->>P2P: join().await
    P2P-->>Main: done
    Main->>ApiServer: api_handle.await
    ApiServer-->>Main: done
    Main->>MetricsServer: metrics_handle.await
    MetricsServer-->>Main: done

    Main->>Main: info!("Shutdown complete")
```

Comment thread bin/ethlambda/src/main.rs
let p2p_ref = p2p.actor_ref().clone();
blockchain_ref.context().stop();
p2p_ref.context().stop();
shutdown_notify.notify_waiters();
Contributor

P1 notify_waiters() can silently drop shutdown signals

Notify::notify_waiters() only wakes futures that are currently registered as waiters at the moment of the call — it does not store a permit for futures that haven't been polled yet. The shutdown futures inside the spawned tasks (async move { X.notified().await }) don't register themselves until axum polls the shutdown future for the first time, which happens asynchronously after tokio::spawn. If ctrl_c fires before the spawned tasks have had a chance to run, both notifications are silently dropped and api_handle.await / metrics_handle.await will block indefinitely — the process never exits.

The clearest fix is two separate Arc<Notify> handles each called with notify_one(), which stores a permit that survives until consumed:

```rust
let api_shutdown_notify = Arc::new(Notify::new());
let metrics_shutdown_notify = Arc::new(Notify::new());
// ... pass clones into spawned tasks ...
api_shutdown_notify.notify_one();
metrics_shutdown_notify.notify_one();
```

Alternatively, a tokio::sync::watch channel with a stored value avoids this race entirely for any number of listeners.
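For the watch-channel alternative, a sketch might look like this (channel and task names are illustrative):

```rust
use tokio::sync::watch;

#[tokio::main]
async fn main() {
    // watch stores the latest value, so even a receiver that starts
    // polling after send(true) observes the shutdown flag: no lost wake-up.
    let (shutdown_tx, shutdown_rx) = watch::channel(false);

    let mut api_rx = shutdown_rx.clone();
    let api_handle = tokio::spawn(async move {
        // wait_for resolves immediately if the value is already true.
        let _ = api_rx.wait_for(|&stop| stop).await;
    });

    let mut metrics_rx = shutdown_rx;
    let metrics_handle = tokio::spawn(async move {
        let _ = metrics_rx.wait_for(|&stop| stop).await;
    });

    // On Ctrl+C: flip the flag once; every listener sees it.
    let _ = shutdown_tx.send(true);

    let _ = api_handle.await;
    let _ = metrics_handle.await;
}
```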


Collaborator

This is why I suggested CancellationToken, since it was made specifically for these cases
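A sketch of that suggestion, assuming the tokio-util crate is available (task wiring is illustrative):

```rust
use tokio_util::sync::CancellationToken;

#[tokio::main]
async fn main() {
    let token = CancellationToken::new();

    let api_token = token.clone();
    let api_handle = tokio::spawn(async move {
        // cancelled() resolves immediately if cancel() already happened,
        // so the notify_waiters() race above does not apply.
        api_token.cancelled().await;
    });

    // On Ctrl+C:
    token.cancel();

    let _ = api_handle.await;
}
```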

Comment thread bin/ethlambda/src/main.rs
Comment on lines +240 to +243
blockchain_ref.join().await;
p2p_ref.join().await;
let _ = api_handle.await;
let _ = metrics_handle.await;
Contributor

P2 No timeout on shutdown awaits

All four await calls — blockchain_ref.join(), p2p_ref.join(), api_handle.await, and metrics_handle.await — are unbounded. If any actor gets stuck in a long-running handler or the HTTP servers hold open long-lived connections, the process will hang indefinitely. Consider wrapping the shutdown sequence in tokio::time::timeout so the node can force-exit after a reasonable deadline (e.g., 30 s) even if something stalls.
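A sketch of that suggestion (the commented-out calls stand in for the real handles in main.rs):

```rust
use std::time::Duration;
use tokio::time::timeout;

// Bound the whole shutdown sequence with one deadline.
async fn shutdown_with_deadline(api_handle: tokio::task::JoinHandle<()>) {
    let result = timeout(Duration::from_secs(30), async {
        // blockchain_ref.join().await;
        // p2p_ref.join().await;
        let _ = api_handle.await;
        // let _ = metrics_handle.await;
    })
    .await;

    if result.is_err() {
        eprintln!("shutdown deadline exceeded, force-exiting");
    }
}
```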


Collaborator

I think it'd be better to do "1 ctrl+C to stop, N ctrl+C to abort immediately" instead. We can tackle that in another PR, if we find it necessary
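A sketch of that behavior (signal wiring only; the graceful path is elided):

```rust
use std::process;

#[tokio::main]
async fn main() {
    // First Ctrl+C: begin graceful shutdown.
    tokio::signal::ctrl_c().await.expect("failed to listen for ctrl_c");
    eprintln!("shutting down gracefully; press Ctrl+C again to abort");

    // Second Ctrl+C, handled in the background: force-exit.
    tokio::spawn(async {
        if tokio::signal::ctrl_c().await.is_ok() {
            eprintln!("aborting immediately");
            process::exit(1);
        }
    });

    // ... graceful shutdown sequence runs here ...
}
```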

Collaborator

@MegaRedHand left a comment


Hey @dicethedev! Thank you for your contribution! I left you some comments.

Also, remember to sign your commits, as explained in our CONTRIBUTING.md

Comment thread bin/ethlambda/src/main.rs
Comment on lines 207 to +208

tokio::spawn(async move {
let _ = ethlambda_rpc::start_metrics_server(metrics_socket)
let shutdown_notify = Arc::new(Notify::new());
Collaborator

Can we use a CancellationToken here?

Comment thread crates/net/rpc/src/lib.rs
address: SocketAddr,
store: Store,
aggregator: AggregatorController,
shutdown: impl std::future::Future<Output = ()> + Send + 'static,
Collaborator

Let's try to use the specific type here, instead of generics
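One way that could look, assuming a CancellationToken from tokio-util and axum 0.7-style serving (the Store and AggregatorController parameters are elided; only the signature shape is the point):

```rust
use std::net::SocketAddr;
use tokio_util::sync::CancellationToken;

// Concrete shutdown parameter instead of `impl Future`.
pub async fn start_api_server(address: SocketAddr, shutdown: CancellationToken) {
    let listener = tokio::net::TcpListener::bind(address).await.unwrap();
    let app = axum::Router::new();
    axum::serve(listener, app)
        // cancelled_owned() yields the Send + 'static future required here.
        .with_graceful_shutdown(shutdown.cancelled_owned())
        .await
        .unwrap();
}
```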

