
test: testapp bench #3214

Draft
julienrbrt wants to merge 4 commits into main from julien/testapp-bench

Conversation

@julienrbrt
Member

@julienrbrt julienrbrt commented Mar 30, 2026

Improve the command to stress-test evnode via testapp.

Summary by CodeRabbit

  • New Features

    • Added /stats HTTP endpoint exposing injected/executed transaction and block metrics
    • Stress-test tool: worker-based concurrency, target-RPS, live RPS reporting, and detailed summary output
    • Expanded runtime metrics from the KV service (improved capacity and execution counters)
  • Documentation

    • Revamped testapp README with 3-terminal quick start, CLI subcommand/flag reference, stress-test examples, and HTTP endpoint docs

@coderabbitai
Contributor

coderabbitai bot commented Mar 30, 2026

📝 Walkthrough

Transforms testapp into a KV store rollup reference: adds stress-test tooling, HTTP /stats endpoint, executor metrics, updated defaults, and comprehensive README/CLI docs for running and benchmarking the KV server.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| **Documentation**<br>`apps/testapp/README.md`, `apps/testapp/kv/bench/README.md` | Rewrote docs from generic example to a KV rollup reference. Added 3-terminal quick start, build/run steps, CLI flag tables, stress-test usage, and HTTP endpoints (`/tx`, `/kv`, `/store`, `/stats`). |
| **Init & Defaults**<br>`apps/testapp/cmd/init.go`, `apps/testapp/examples/passphrase.txt` | Set `Node.BlockTime` to 100ms and changed `ChainIDFlag` default to `ev-test`. Added example passphrase file (`foo:bar`). |
| **Stress Test Tool**<br>`apps/testapp/kv/bench/main.go` | Replaced interval-based benchmark with a worker-pool stress test: `-workers`, `-target-rps`, persistent TCP HTTP requests, per-second progress and peak RPS tracking, server `/stats` sampling, and final summary reporting. |
| **HTTP Server & Tests**<br>`apps/testapp/kv/http_server.go`, `apps/testapp/kv/http_server_test.go` | Added `injectedTxs` atomic counter and GET `/stats` handler returning `injected_txs`, `executed_txs`, `blocks_produced`. `handleTx` increments the injected counter. Minor test payload construction change. |
| **Executor & Metrics**<br>`apps/testapp/kv/kvexecutor.go` | Increased mempool buffer to 100_000_000. Added atomic counters (`blocksProduced`, `totalExecutedTxs`), `ExecutorStats` type and `GetStats()` method. Increments stats on commit and optimized string/byte handling. |
| **Runtime Param**<br>`apps/testapp/cmd/run.go` | Adjusted sequencer parameter from 1000 to 1_000_000 when constructing the sequencer. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant StressClient as Stress-Test
    participant HTTP as testapp HTTPServer
    participant Exec as KV Executor
    participant Store as Persistent Store
    StressClient->>HTTP: POST /tx (raw HTTP over TCP)
    HTTP->>Exec: executor.InjectTx(tx)
    Exec->>Exec: mempool enqueue / ExecuteTxs -> Commit
    Exec->>Store: persist block, set final
    Exec-->>HTTP: return acceptance (202)
    HTTP-->>StressClient: 202 Accepted
    StressClient->>HTTP: GET /stats (periodic)
    HTTP->>Exec: executor.GetStats()
    Exec-->>HTTP: {executed_txs, blocks_produced}
    HTTP-->>StressClient: {injected_txs, executed_txs, blocks_produced}
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested reviewers

  • randygrok
  • tac0turtle
  • Manav-Aggarwal

Poem

🐰 I hopped in with bytes and a tiny test,
Workers all scurrying, aim for the best,
I poke /tx doors and query /stats bright,
Blocks stack like carrots in soft evening light,
Hop on, roll up — the KV sings tonight.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Description check | ❓ Inconclusive | The description is brief but vague; it states the goal without explaining the implementation details, changes made, or technical rationale behind the improvements. | Expand the description to explain what was changed (stress-test tool rewrite, KV server enhancements, queue size increases) and why these changes improve testapp's stress-testing capabilities. |
✅ Passed checks (2 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'test: testapp bench' directly reflects the PR's core objective of improving stress-testing capabilities for testapp, following semantic commit conventions. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |


@github-actions
Contributor

github-actions bot commented Mar 30, 2026

The latest Buf updates on your PR. Results from workflow CI / buf-check (pull_request).

| Build | Format | Lint | Breaking | Updated (UTC) |
| --- | --- | --- | --- | --- |
| ✅ passed | ⏩ skipped | ✅ passed | ✅ passed | Mar 30, 2026, 9:33 PM |

@claude
Contributor

claude bot commented Mar 30, 2026

Claude finished @julienrbrt's task in 1m 44s.


Code Review: test: testapp bench

  • Gather context and understand the PR
  • Review apps/testapp/kv/bench/main.go
  • Review apps/testapp/kv/http_server.go
  • Review apps/testapp/kv/kvexecutor.go
  • Review apps/testapp/kv/http_server_test.go
  • Review documentation and other files
  • Post final review

Good set of improvements overall — the persistent-TCP approach in the bench tool is the right call for maximizing throughput, the /stats endpoint is clean, and the docs are now genuinely useful. A few issues worth fixing before merge:


Critical

txChannelBufferSize = 100_000_000 causes extreme memory allocation
kvexecutor.go:25

```go
txChannelBufferSize = 100_000_000
```

A Go channel with 100M capacity pre-allocates internal storage proportional to element size. For []byte (a slice header = 24 bytes), this is ~2.4 GB just for the channel's internal buffer array, before any data is written. This will OOM most development machines and CI runners. The original intent was likely to prevent InjectTx from blocking under extreme load, but even at 10M req/s the bench runs for only 10s — 100M capacity is 10x more than needed. A value of 10_000_000 (or even 1_000_000) would be plenty and far more reasonable.


Medium

handleKV fallback DB lookup uses wrong key space
http_server.go:139

```go
_, err := hs.executor.db.Get(r.Context(), ds.NewKey(key))
```

GetStoreValue searches height-prefixed keys (/height/{h}/{key}). But this fallback calls db.Get with the raw key (no height prefix). Since all KV data is stored under /height/…, a raw lookup will always return ds.ErrNotFound — even when the real cause is a DB I/O error. The error discrimination between "key not found" and "internal error" is effectively broken; both will return 404.
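To illustrate the key-space mismatch, here is a hypothetical helper mirroring the /height/{h}/{key} layout described above (the real code builds datastore keys via ds.NewKey; heightKey is not from the PR):

```go
package main

import "fmt"

// heightKey builds the height-prefixed key the fallback lookup should use,
// matching the /height/{h}/{key} scheme described in the review comment.
func heightKey(height uint64, key string) string {
	return fmt.Sprintf("/height/%d/%s", height, key)
}

func main() {
	// The fallback queries "foo" directly; data actually lives under this key.
	fmt.Println(heightKey(42, "foo"))
}
```

A raw `db.Get(ctx, ds.NewKey("foo"))` can never match `/height/42/foo`, which is why the fallback always reports not-found.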

TestHTTPServerStartStop is nearly a no-op
http_server_test.go:171-213

The test creates an httptest.NewServer that's immediately unused by the code under test, then creates a bare testServer struct just to do _ = testServer.Start. It verifies no real behavior. This is misleading — a reader expects a Start/Stop test to actually start and stop the server. Either extend it or remove it (the context-cancellation test below it covers real behavior). The time.Sleep(10 * time.Millisecond) on line 70 is also flaky.

Port TOCTOU race in TestHTTPServerContextCancellation
http_server_test.go:222-230

```go
listener, err := net.Listen("tcp", "127.0.0.1:0")
// ...
listener.Close()
// ... port may be grabbed here by another process
server := NewHTTPServer(exec, serverAddr)
```

A small but real race: another process can bind the port between listener.Close() and server.Start(ctx). The cleaner approach is to pass the already-open listener to the server (or use httptest.NewServer).


Minor

fetchStats silently returns zero on any error
bench/main.go:173-182

```go
func fetchStats(addr string) serverStats {
    resp, err := http.Get("http://" + addr + "/stats")
    if err != nil {
        return serverStats{}  // error silently dropped
    }
    defer resp.Body.Close()
    var s serverStats
    json.NewDecoder(resp.Body).Decode(&s)  // decode error also dropped
    return s
}
```

If the /stats fetch fails (e.g. network hiccup after the bench), the before/after delta is meaningless (zero). A brief warning to stderr would help users diagnose misleading results.
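One way to surface the failure is to return an error alongside the stats, as sketched below; the serverStats field names are assumed from the PR summary and the rest is illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// serverStats field names assumed from the documented /stats payload.
type serverStats struct {
	InjectedTxs    uint64 `json:"injected_txs"`
	ExecutedTxs    uint64 `json:"executed_txs"`
	BlocksProduced uint64 `json:"blocks_produced"`
}

// fetchStats propagates HTTP, status, and decode errors instead of
// collapsing them into an all-zero snapshot.
func fetchStats(addr string) (serverStats, error) {
	resp, err := http.Get("http://" + addr + "/stats")
	if err != nil {
		return serverStats{}, fmt.Errorf("stats: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return serverStats{}, fmt.Errorf("stats: unexpected status %s", resp.Status)
	}
	var s serverStats
	if err := json.NewDecoder(resp.Body).Decode(&s); err != nil {
		return serverStats{}, fmt.Errorf("stats: decode: %w", err)
	}
	return s, nil
}

func main() {
	// Nothing is listening here, so the error now reaches the caller,
	// which can skip the server-side delta instead of printing garbage.
	if _, err := fetchStats("127.0.0.1:1"); err != nil {
		fmt.Println("stats fetch failed; skipping server-side delta")
	}
}
```

Callers then only subtract before/after snapshots when both fetches succeeded.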

formatNum allocates a string per character
bench/main.go:185-197

```go
result.WriteString(string(c))  // allocates a new string for each rune
```

c is a rune, so string(c) allocates. Use result.WriteRune(c) instead. Trivial in context but easy to fix.
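A WriteRune-based version looks like this; the grouping logic below is illustrative, not copied from the PR's formatNum:

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// formatNum inserts thousands separators; WriteRune appends the rune's
// UTF-8 bytes directly, whereas string(c) would allocate per character.
func formatNum(n uint64) string {
	s := strconv.FormatUint(n, 10)
	var b strings.Builder
	for i, c := range s {
		if i > 0 && (len(s)-i)%3 == 0 {
			b.WriteRune(',')
		}
		b.WriteRune(c)
	}
	return b.String()
}

func main() {
	fmt.Println(formatNum(1234567))
}
```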

blocksProduced incremented even on zero-tx blocks
kvexecutor.go:310

k.blocksProduced.Add(1) is called unconditionally, even when validTxCount == 0. This means the bench's "avg txs per block" stat will be diluted by empty blocks. Whether this is intentional (a block is a block regardless of content) is fine — just worth a comment since empty blocks will skew the bench output.


What's Good

  • Raw TCP with HTTP/1.1 persistent connections in worker() is the right approach for this benchmark — avoids TLS and connection-setup overhead per request.
  • Atomic counters for blocksProduced/totalExecutedTxs/injectedTxs are all correctly used with no data races.
  • Live progress line (\r overwrite) with per-second RPS tracking is a nice UX touch.
  • Context cancellation flow in HTTPServer.Start (a goroutine that calls Shutdown on ctx.Done()) is correct.
  • examples/passphrase.txt is a sensible addition for the quick-start flow.
  • Docs are clear and the quick-start guide accurately reflects the 3-terminal workflow.

@codecov

codecov bot commented Mar 30, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 61.41%. Comparing base (8d68f9d) to head (b8dc969).

Additional details and impacted files
```
@@           Coverage Diff           @@
##             main    #3214   +/-   ##
=======================================
  Coverage   61.41%   61.41%
=======================================
  Files         120      120
  Lines       12474    12474
=======================================
  Hits         7661     7661
  Misses       3953     3953
  Partials      860      860
```

| Flag | Coverage Δ |
| --- | --- |
| combined | 61.41% <ø> (ø) |

Flags with carried forward coverage won't be shown.

☔ View full report in Codecov by Sentry.
@julienrbrt julienrbrt marked this pull request as ready for review March 30, 2026 21:30
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 7

🧹 Nitpick comments (1)
apps/testapp/kv/kvexecutor.go (1)

45-54: Unexport ExecutorStats and GetStats() since they're package-private.

ExecutorStats and GetStats() are only consumed internally in http_server.go within the same package. Unexporting them to executorStats and getStats() keeps the API surface minimal. If they need to remain exported, they require Go doc comments per the coding guidelines.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/kvexecutor.go` around lines 45 - 54, Rename the exported
types and methods to unexported identifiers: change ExecutorStats to
executorStats and GetStats() to getStats() in KVExecutor, update the return type
and constructor usages accordingly, and update all internal references (e.g., in
http_server.go) to use executorStats and getStats(); ensure method receiver
signature remains the same (func (k *KVExecutor) getStats() executorStats) and
run `go build`/tests to catch any missed references.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/testapp/cmd/init.go`:
- Line 38: The init command currently unconditionally sets cfg.Node.BlockTime =
rollconf.DurationWrapper{Duration: 100 * time.Millisecond}, overwriting any
value rollconf.Load(cmd) parsed (e.g., --evnode.node.block_time=1s); change this
so you only apply 100ms when the user did not supply the flag or instead set the
flag's default to 100ms. Locate the init command logic around rollconf.Load(cmd)
and cfg.Node.BlockTime and either (a) detect whether the flag was provided (use
the command/flag API to check if evnode.node.block_time was set) and only assign
the 100ms default when not provided, or (b) set the CLI flag default for
evnode.node.block_time to 100ms and remove the unconditional assignment to
cfg.Node.BlockTime.

In `@apps/testapp/kv/bench/main.go`:
- Around line 173-181: The fetchStats function currently swallows HTTP/JSON
errors and returns an all-zero serverStats which can cause underflow when
callers subtract uint64 deltas; change fetchStats to return (serverStats, error)
instead of serverStats, propagate errors from http.Get and json.Decode
(including non-200 responses), and have callers check the error and skip
server-side delta calculations/reporting when a snapshot cannot be trusted (use
the presence of an error to avoid subtracting snapshots or to re-sample). Ensure
you update the signature and all call sites accordingly (references: fetchStats,
serverStats).
- Line 28: The flag targetRPS (variable targetRPS) is only reported and not
enforced; either implement pacing or rename it to avoid confusion — to implement
pacing, add a token-bucket or ticker driven rate limiter in the request send
path (e.g., inside the send loop function used to launch workers / sendRequests)
that consumes tokens at the rate specified by *targetRPS and blocks or sleeps
when tokens are exhausted, or alternatively rename the flag (and all uses in
banner/logging) to something like target-rps-goal or desired-rps to make it
clear it’s only informational; update the worker launch/send loop (the function
that iterates and issues requests referenced at lines 56-57 and 103-108) to
either use the limiter or to remove/rename the misleading references so behavior
matches the flag name.
- Around line 121-149: The worker currently uses net.DialTimeout and then
performs conn.Write and http.ReadResponse without timeouts, which can block
forever; replace the dial with a context-aware dial (use net.Dialer.DialContext
with ctx) and before each network operation set per-operation deadlines on conn
(via conn.SetDeadline(time.Now().Add(timeout))) so the conn.Write(rawReq) and
http.ReadResponse(br, nil) calls will time out and return errors that you handle
by calling failures.Add(1), closing conn, and breaking the loop; ensure the
ctx.Done() check remains and choose a sensible timeout value (e.g., a short
per-op timeout) so workers can exit promptly on server stalls or context
cancellation.

In `@apps/testapp/kv/bench/README.md`:
- Around line 16-20: The README uses an incorrect startup command: it shows
./build/testapp but the build artifacts are ./stress-test and ./testapp; update
the Terminal 1 command to start the built testapp binary (use ./testapp start
--kv-endpoint localhost:9090) so it matches the documented build output and the
Terminal 2 stress-test command; ensure the symbols shown (./build/testapp,
./testapp, ./stress-test, --kv-endpoint) are updated consistently in the README
entry.

In `@apps/testapp/kv/http_server.go`:
- Line 112: The injectedTxs counter is incremented unconditionally at
hs.injectedTxs.Add(1) even when KVExecutor.InjectTx drops transactions; change
the flow so InjectTx (in apps/testapp/kv/kvexecutor.go) returns a success
boolean or error indicating whether the tx was enqueued, then in the HTTP
handler only call hs.injectedTxs.Add(1) and return 202 when InjectTx reports
success—otherwise return an error status (e.g., 429/500) and do not increment
injectedTxs so dropped transactions are not counted.

In `@apps/testapp/kv/kvexecutor.go`:
- Line 25: The txChannelBufferSize constant is far too large and will
preallocate gigabytes when used in make(chan []byte, txChannelBufferSize) inside
NewKVExecutor; change txChannelBufferSize to a much smaller default (e.g., 1_000
or similar) and make it configurable (via constructor param or config/env) so
callers can tune it, or replace the large buffered channel with a bounded/ring
buffer implementation or an unbuffered channel plus worker backpressure; update
NewKVExecutor to accept the buffer size parameter (or config) and use it instead
of the hardcoded 100_000_000, and add a comment about memory implications.

---

Nitpick comments:
In `@apps/testapp/kv/kvexecutor.go`:
- Around line 45-54: Rename the exported types and methods to unexported
identifiers: change ExecutorStats to executorStats and GetStats() to getStats()
in KVExecutor, update the return type and constructor usages accordingly, and
update all internal references (e.g., in http_server.go) to use executorStats
and getStats(); ensure method receiver signature remains the same (func (k
*KVExecutor) getStats() executorStats) and run `go build`/tests to catch any
missed references.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5360d9f5-19ba-4c64-bdd9-9121413b1041

📥 Commits

Reviewing files that changed from the base of the PR and between 8d68f9d and 5581431.

📒 Files selected for processing (8)
  • apps/testapp/README.md
  • apps/testapp/cmd/init.go
  • apps/testapp/examples/passphrase.txt
  • apps/testapp/kv/bench/README.md
  • apps/testapp/kv/bench/main.go
  • apps/testapp/kv/http_server.go
  • apps/testapp/kv/http_server_test.go
  • apps/testapp/kv/kvexecutor.go

```go
// we use load in order to parse all the flags
cfg, _ := rollconf.Load(cmd)
cfg.Node.Aggregator = aggregator
cfg.Node.BlockTime = rollconf.DurationWrapper{Duration: 100 * time.Millisecond}
```
Contributor


⚠️ Potential issue | 🟠 Major

Don't overwrite the parsed block-time value.

Line 38 clobbers whatever rollconf.Load(cmd) just read, so testapp init --evnode.node.block_time=1s still writes 100ms. If the goal is a faster default for this command, set the flag default or only apply 100ms when the user did not pass the flag; otherwise this also drifts from apps/testapp/cmd/init_test.go, which still expects block_time: 1s.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/cmd/init.go` at line 38, The init command currently
unconditionally sets cfg.Node.BlockTime = rollconf.DurationWrapper{Duration: 100
* time.Millisecond}, overwriting any value rollconf.Load(cmd) parsed (e.g.,
--evnode.node.block_time=1s); change this so you only apply 100ms when the user
did not supply the flag or instead set the flag's default to 100ms. Locate the
init command logic around rollconf.Load(cmd) and cfg.Node.BlockTime and either
(a) detect whether the flag was provided (use the command/flag API to check if
evnode.node.block_time was set) and only assign the 100ms default when not
provided, or (b) set the CLI flag default for evnode.node.block_time to 100ms
and remove the unconditional assignment to cfg.Node.BlockTime.

```go
addr := flag.String("addr", "localhost:9090", "server host:port")
duration := flag.Duration("duration", 10*time.Second, "test duration")
workers := flag.Int("workers", 1000, "concurrent workers")
targetRPS := flag.Uint64("target-rps", 10_000_000, "target requests per second")
```
Contributor


⚠️ Potential issue | 🟠 Major

-target-rps is only reported, not enforced.

The new flag looks like a rate-control knob, but nothing in the worker launch or send loop uses it to pace requests. Right now it only changes the pass/fail banner, so runs with -target-rps=100 and -target-rps=10000000 generate the same load. Either add pacing/token-bucket logic or rename the flag so it is clearly just a goal.

Also applies to: 56-57, 103-108

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/bench/main.go` at line 28, The flag targetRPS (variable
targetRPS) is only reported and not enforced; either implement pacing or rename
it to avoid confusion — to implement pacing, add a token-bucket or ticker driven
rate limiter in the request send path (e.g., inside the send loop function used
to launch workers / sendRequests) that consumes tokens at the rate specified by
*targetRPS and blocks or sleeps when tokens are exhausted, or alternatively
rename the flag (and all uses in banner/logging) to something like
target-rps-goal or desired-rps to make it clear it’s only informational; update
the worker launch/send loop (the function that iterates and issues requests
referenced at lines 56-57 and 103-108) to either use the limiter or to
remove/rename the misleading references so behavior matches the flag name.

Comment on lines +121 to +149
```go
conn, err := net.DialTimeout("tcp", addr, time.Second)
if err != nil {
	failures.Add(1)
	continue
}

br := bufio.NewReaderSize(conn, 512)

for {
	select {
	case <-ctx.Done():
		conn.Close()
		return
	default:
	}

	if _, err := conn.Write(rawReq); err != nil {
		failures.Add(1)
		conn.Close()
		break
	}

	resp, err := http.ReadResponse(br, nil)
	if err != nil {
		failures.Add(1)
		conn.Close()
		break
	}
	io.Copy(io.Discard, resp.Body)
```
Contributor


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

cat -n apps/testapp/kv/bench/main.go | head -200

Repository: evstack/ev-node

Length of output: 5905


Set deadlines on TCP operations to prevent workers from blocking indefinitely.

Line 121 only limits the connection establishment time. After that, conn.Write and http.ReadResponse have no timeouts and can block forever. If the server is slow or stalled, workers get stuck in these operations, never checking ctx.Done(), and the main goroutine hangs waiting for them to drain the done channel.

Use DialContext to respect context cancellation during connection, and set per-operation deadlines with SetDeadline so Write and ReadResponse calls are bounded:

Suggested direction
```diff
-		conn, err := net.DialTimeout("tcp", addr, time.Second)
+		d := net.Dialer{Timeout: time.Second}
+		conn, err := d.DialContext(ctx, "tcp", addr)
 		if err != nil {
 			failures.Add(1)
 			continue
 		}
@@
-			if _, err := conn.Write(rawReq); err != nil {
+			if err := conn.SetDeadline(time.Now().Add(time.Second)); err != nil {
+				failures.Add(1)
+				conn.Close()
+				break
+			}
+			if _, err := conn.Write(rawReq); err != nil {
 				failures.Add(1)
 				conn.Close()
 				break
 			}

 			resp, err := http.ReadResponse(br, nil)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```go
d := net.Dialer{Timeout: time.Second}
conn, err := d.DialContext(ctx, "tcp", addr)
if err != nil {
	failures.Add(1)
	continue
}

br := bufio.NewReaderSize(conn, 512)

for {
	select {
	case <-ctx.Done():
		conn.Close()
		return
	default:
	}

	if err := conn.SetDeadline(time.Now().Add(time.Second)); err != nil {
		failures.Add(1)
		conn.Close()
		break
	}
	if _, err := conn.Write(rawReq); err != nil {
		failures.Add(1)
		conn.Close()
		break
	}

	resp, err := http.ReadResponse(br, nil)
	if err != nil {
		failures.Add(1)
		conn.Close()
		break
	}
	io.Copy(io.Discard, resp.Body)
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/bench/main.go` around lines 121 - 149, The worker currently
uses net.DialTimeout and then performs conn.Write and http.ReadResponse without
timeouts, which can block forever; replace the dial with a context-aware dial
(use net.Dialer.DialContext with ctx) and before each network operation set
per-operation deadlines on conn (via conn.SetDeadline(time.Now().Add(timeout)))
so the conn.Write(rawReq) and http.ReadResponse(br, nil) calls will time out and
return errors that you handle by calling failures.Add(1), closing conn, and
breaking the loop; ensure the ctx.Done() check remains and choose a sensible
timeout value (e.g., a short per-op timeout) so workers can exit promptly on
server stalls or context cancellation.

Comment on lines +173 to +181
func fetchStats(addr string) serverStats {
resp, err := http.Get("http://" + addr + "/stats")
if err != nil {
return serverStats{}
}
defer resp.Body.Close()
var s serverStats
json.NewDecoder(resp.Body).Decode(&s)
return s
Contributor


⚠️ Potential issue | 🟠 Major

Don't collapse /stats failures into zero counters.

Line 176 turns any fetch failure into an all-zero snapshot, and the caller later subtracts those snapshots as uint64. If the initial sample succeeds and the final sample fails, or the server restarts between samples, the delta math underflows and the summary prints enormous bogus block/tx counts. Return an error from fetchStats and skip server-side delta reporting when a snapshot cannot be trusted.
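The underflow mode is easy to reproduce: uint64 subtraction wraps rather than going negative, so a zeroed final snapshot minus a healthy initial one yields a near-2^64 "delta". A minimal demonstration:

```go
package main

import "fmt"

func main() {
	// before: a successful initial /stats sample; after: the final fetch
	// failed and was silently replaced by a zero snapshot.
	var before, after uint64 = 100, 0
	// uint64 arithmetic wraps modulo 2^64 instead of going negative.
	fmt.Println(after - before)
}
```

This prints 18446744073709551516, the kind of bogus block/tx count the summary would report.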

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/bench/main.go` around lines 173 - 181, The fetchStats
function currently swallows HTTP/JSON errors and returns an all-zero serverStats
which can cause underflow when callers subtract uint64 deltas; change fetchStats
to return (serverStats, error) instead of serverStats, propagate errors from
http.Get and json.Decode (including non-200 responses), and have callers check
the error and skip server-side delta calculations/reporting when a snapshot
cannot be trusted (use the presence of an error to avoid subtracting snapshots
or to re-sample). Ensure you update the signature and all call sites accordingly
(references: fetchStats, serverStats).

Comment on lines +16 to +20
# Terminal 1: start the testapp with KV endpoint
./build/testapp start --kv-endpoint localhost:9090

### List all key-value pairs in the store

```bash
./txclient -list [-addr <url>]
# Terminal 2: run the stress test
./stress-test --addr localhost:9090 --duration 10s --workers 1000
Contributor


⚠️ Potential issue | 🟡 Minor

The testapp startup command doesn't match the documented build output.

Line 17 runs ./build/testapp, but the documented build commands produce ./stress-test here and ./testapp in apps/testapp/README.md. Following the steps as written will not create ./build/testapp.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/bench/README.md` around lines 16 - 20, The README uses an
incorrect startup command: it shows ./build/testapp but the build artifacts are
./stress-test and ./testapp; update the Terminal 1 command to start the built
testapp binary (use ./testapp start --kv-endpoint localhost:9090) so it matches
the documented build output and the Terminal 2 stress-test command; ensure the
symbols shown (./build/testapp, ./testapp, ./stress-test, --kv-endpoint) are
updated consistently in the README entry.

```go
}

hs.executor.InjectTx(body)
hs.injectedTxs.Add(1)
```
Contributor


⚠️ Potential issue | 🟠 Major

Don't count dropped transactions as injected.

KVExecutor.InjectTx in apps/testapp/kv/kvexecutor.go still drops when txChan is full, but Line 112 increments injectedTxs unconditionally and Line 113 still returns 202. Under load, /stats will over-report injections and callers will see success for work that was discarded. Have InjectTx return a bool/error and only increment/respond 202 on a successful enqueue.
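The bool-returning InjectTx suggested above amounts to a non-blocking channel send. A minimal sketch (the executor struct here is a stand-in, not the PR's KVExecutor):

```go
package main

import "fmt"

// executor stands in for KVExecutor; only the enqueue path is sketched.
type executor struct {
	txChan chan []byte
}

// injectTx reports whether the tx was enqueued instead of dropping silently.
func (e *executor) injectTx(tx []byte) bool {
	select {
	case e.txChan <- tx:
		return true
	default:
		return false // channel full: handler should return 429, not 202
	}
}

func main() {
	e := &executor{txChan: make(chan []byte, 1)}
	fmt.Println(e.injectTx([]byte("a"))) // fits in the buffer
	fmt.Println(e.injectTx([]byte("b"))) // buffer of 1 is full, reported as dropped
}
```

The HTTP handler then increments injectedTxs and returns 202 only when injectTx returns true.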

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/http_server.go` at line 112, The injectedTxs counter is
incremented unconditionally at hs.injectedTxs.Add(1) even when
KVExecutor.InjectTx drops transactions; change the flow so InjectTx (in
apps/testapp/kv/kvexecutor.go) returns a success boolean or error indicating
whether the tx was enqueued, then in the HTTP handler only call
hs.injectedTxs.Add(1) and return 202 when InjectTx reports success—otherwise
return an error status (e.g., 429/500) and do not increment injectedTxs so
dropped transactions are not counted.

```diff
 	finalizedHeightKey = ds.NewKey("/finalizedHeight")
 	// Define a buffer size for the transaction channel
-	txChannelBufferSize = 10000
+	txChannelBufferSize = 100_000_000
```
Contributor


⚠️ Potential issue | 🔴 Critical

This mempool size will allocate gigabytes at startup.

Line 25 drives make(chan []byte, txChannelBufferSize) in NewKVExecutor. With 100_000_000 slots, a buffered chan []byte needs roughly 24 bytes per slot just for slice headers on 64-bit Go, so the queue alone is already well over 2 GB before any payload bytes are stored. That is likely to OOM the process or starve everything else of memory. As per coding guidelines, "Consider memory allocation in hot paths of Go code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/kv/kvexecutor.go` at line 25, The txChannelBufferSize constant
is far too large and will preallocate gigabytes when used in make(chan []byte,
txChannelBufferSize) inside NewKVExecutor; change txChannelBufferSize to a much
smaller default (e.g., 1_000 or similar) and make it configurable (via
constructor param or config/env) so callers can tune it, or replace the large
buffered channel with a bounded/ring buffer implementation or an unbuffered
channel plus worker backpressure; update NewKVExecutor to accept the buffer size
parameter (or config) and use it instead of the hardcoded 100_000_000, and add a
comment about memory implications.


@coderabbitai coderabbitai bot left a comment


Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
apps/testapp/cmd/run.go (1)

140-149: ⚠️ Potential issue | 🟠 Major

Avoid hardcoding a 1,000,000 sequencer queue cap.

Line 146 sets an extremely high maxQueueSize with no local validation/clamping. Given BatchQueue grows via append and Prepend bypasses the cap, this can cause severe memory pressure/OOM during stress or recovery paths. Please make this value configurable and enforce a bounded limit before passing it to single.NewSequencer.

Suggested direction
-	sequencer, err := single.NewSequencer(
+	// TODO: source from config/flag with validation instead of hardcoding.
+	maxQueueSize := 10_000
+	sequencer, err := single.NewSequencer(
 		logger,
 		datastore,
 		daClient,
 		nodeConfig,
 		[]byte(genesis.ChainID),
-		1_000_000,
+		maxQueueSize,
 		genesis,
 		executor,
 	)

As per coding guidelines, "Consider memory allocation in hot paths of Go code."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/testapp/cmd/run.go` around lines 140 - 149, The call to
single.NewSequencer currently passes a hardcoded maxQueueSize (1_000_000) which
risks unbounded memory use; make this value configurable (e.g., add a CLI flag
or a field on nodeConfig such as MaxSequencerQueueSize) and validate/clamp it to
a safe upper bound (e.g., MAX_ALLOWED_QUEUE_SIZE constant) before calling
single.NewSequencer in run.go; ensure the clamping logic is applied where
sequencer is constructed so both normal append and any Prepend/BatchQueue paths
cannot exceed the enforced limit.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@apps/testapp/cmd/run.go`:
- Around line 140-149: The call to single.NewSequencer currently passes a
hardcoded maxQueueSize (1_000_000) which risks unbounded memory use; make this
value configurable (e.g., add a CLI flag or a field on nodeConfig such as
MaxSequencerQueueSize) and validate/clamp it to a safe upper bound (e.g.,
MAX_ALLOWED_QUEUE_SIZE constant) before calling single.NewSequencer in run.go;
ensure the clamping logic is applied where sequencer is constructed so both
normal append and any Prepend/BatchQueue paths cannot exceed the enforced limit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 30c00489-4bfe-407d-8107-9189f8023f9d

📥 Commits

Reviewing files that changed from the base of the PR and between 5581431 and b8dc969.

📒 Files selected for processing (1)
  • apps/testapp/cmd/run.go

@@ -0,0 +1 @@
foo:bar

this needed?

Member Author


It just makes running the command faster. I can remove it from being committed.

| `-addr` | `localhost:9090` | Server host:port |
| `-duration` | `10s` | Test duration |
| `-workers` | `1000` | Number of concurrent TCP workers |
| `-target-rps` | `10000000` | Target requests per second (goal) |

is this transaction submission or requests on the api?


@julienrbrt julienrbrt Mar 31, 2026


It measures tx execution: how fast the KV executor is able to process txs.
Maybe we should clarify what we actually mean by req/s.

@julienrbrt julienrbrt marked this pull request as draft March 31, 2026 09:06