A zero-allocation matching engine and full-stack exchange simulator, from byte-level wire protocols to a 60 fps React dashboard.
Live dashboard under the random-order simulator: free-floating Order Book, OHLC Price chart
with 3s/10s/1m/3m timeframes, Order Entry, Depth heatmap, Trade tape, Metrics, and Latency
monitor. Three themes in the top-right toggle: dark, light, and a
colorblind-safe palette (blue/orange in place of green/red). Deep dives in
docs/ARCHITECTURE.md (processes, threads, memory model),
docs/PROTOCOL.md (byte-level wire formats),
docs/PERFORMANCE.md (benchmark methodology and results), and
docs/DECISIONS.md (20 ADRs).
A self-contained, CLOB-style matching engine plus the production infrastructure around it:
binary TCP order gateway, UDP multicast market-data feed with snapshot + incremental recovery,
deterministic memory-mapped journal, Python client library, WebSocket bridge, React dashboard,
and JMH benchmarks that quantify every layer. The hot path is allocation-free after warmup
across all order types (LIMIT, MARKET, IOC, FOK, ICEBERG), verified by JMH -prof gc; the
dashboard holds 60 fps under 10 k market-data msg/s because every WebSocket message is queued
and drained inside a single requestAnimationFrame per tick.
Three processes. Two wire protocols (binary TCP for order entry, UDP multicast for market
data). One JSON envelope for the browser. Source for the diagram above:
docs/architecture.d2. The component-level diagrams live in
docs/ARCHITECTURE.md; the byte-level layouts live in
docs/PROTOCOL.md.
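To make the framing concrete, here is a minimal sketch of what a little-endian, length-prefixed NEW_ORDER encode can look like. The field layout, type byte, and names below are illustrative assumptions, not the documented format, and CRC handling is omitted; `docs/PROTOCOL.md` is the source of truth.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative sketch only -- the real field layout lives in docs/PROTOCOL.md.
// Shows the general shape: little-endian, length-prefixed, written into a
// pre-sized buffer with no per-message allocation.
final class FrameSketch {
    // Hypothetical layout: u16 length | u8 msgType | u64 orderId | i64 price | i64 qty
    static void encodeNewOrder(ByteBuffer buf, long orderId, long priceTicks, long qty) {
        buf.clear();
        buf.order(ByteOrder.LITTLE_ENDIAN);
        buf.position(2);                     // reserve the u16 length prefix
        buf.put((byte) 0x01);                // msgType = NEW_ORDER (illustrative value)
        buf.putLong(orderId);
        buf.putLong(priceTicks);             // fixed-point ticks, no floats on the wire
        buf.putLong(qty);
        int payloadLen = buf.position() - 2; // length covers everything after the prefix
        buf.putShort(0, (short) payloadLen);
        buf.flip();                          // ready for a single socket write
    }
}
```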
Numbers are per-component, not end-to-end. The 10 k figure is the dashboard/bridge ceiling; the engine is three orders of magnitude higher.
| Component | Metric | Result |
|---|---|---|
| `MatchingEngine.process`, resting limit | throughput | 33.9 M ops/s |
| | latency / op | 29.5 ns |
| | allocation | 0 B/op after warmup |
| `MatchingEngine.process`, 5-level sweep | throughput | 6.5 M ops/s |
| | latency / op | 155 ns |
| `RingBuffer` SPSC hand-off | throughput | 58 M ops/s |
| | vs `ArrayBlockingQueue` | ~1.3× faster, ~4× less variance |
| `WireCodec` NEW_ORDER encode | throughput | 28.6 M ops/s (35 ns) |
| `WireCodec` NEW_ORDER decode | throughput | 27.9 M ops/s |
| Dashboard under 10 k msg/s load | frame rate | 60.0 fps sustained |
| | p99 frame time | 17.8 ms |
| | longest task | 42 ms |
Apple M5 · JDK 21.0.10 · macOS 26.1 · JMH 1.37 default config. Methodology, flamegraph
pointers, and interpretation notes in docs/PERFORMANCE.md.
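As a hedged illustration of the methodology (not the project's actual benchmark classes), a throughput cell like the codec rows can be measured with a JMH benchmark along these lines, reusing the illustrative encoder sketched earlier:

```java
import java.nio.ByteBuffer;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

// Illustrative only: mirrors the shape of a "WireCodec NEW_ORDER encode" style
// measurement. FrameSketch is the hypothetical encoder from the sketch above.
@State(Scope.Thread)
@BenchmarkMode(Mode.Throughput)
public class EncodeBenchSketch {
    ByteBuffer buf;
    long seq;

    @Setup
    public void setup() {
        buf = ByteBuffer.allocateDirect(64); // sized once, reused every operation
    }

    @Benchmark
    public void encodeNewOrder(Blackhole bh) {
        FrameSketch.encodeNewOrder(buf, seq++, 101_250L, 500L);
        bh.consume(buf); // keep the write from being dead-code eliminated
    }
}
```

Running a benchmark like this with `-prof gc` is what backs a "0 B/op after warmup" cell: the GC profiler reports per-operation allocation alongside throughput.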
- Zero-allocation hot path, end-to-end. Pooled orders, pooled execution reports, a
  length-prefix codec that writes into a pre-sized `ByteBuffer`, and a `LongHashMap`
  keyed by primitive `long` so order-ID lookups never box. JMH `-prof gc` is the
  contract, not an afterthought.
- Deterministic replay. Every input event and every emitted report is journaled to a
  memory-mapped file framed with CRC32; a minimal sketch of that framing follows this
  list. Replaying the file into a fresh engine reproduces the output stream
  byte-for-byte, which is how the restart test proves the engine is deterministic
  (ADR-008).
- Real wire protocols, documented to the byte. Little-endian, length-prefix-framed,
  CRC-checked binary TCP for order entry. UDP multicast with monotonic sequence numbers
  and snapshot + incremental recovery for market data. Both specified in
  `docs/PROTOCOL.md` with hex examples, not English.
- Frame-accurate dashboard instrumentation. Incoming WebSocket messages are not
  dispatched to React on arrival; they are queued and drained inside a single
  `requestAnimationFrame` per tick (ADR-016). Frame metrics use `useSyncExternalStore`
  so the LatencyMonitor re-renders at 1 Hz while the rest of the UI re-renders at 60 Hz
  (ADR-018).
- Analytics worth running. A Python analytics package computes VPIN (Easley / López de
  Prado / O'Hara, 2012) off the journal, renders a latency histogram and depth heatmap,
  and includes a market-making simulator that drives the live engine via the TCP
  gateway. See `make analytics`.
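The record framing referenced in the deterministic-replay bullet could look something like the following. This is a sketch under an assumed layout (length, payload, CRC32 of the payload), not the journal's actual format; a hot-path version would also reuse the `CRC32` instance rather than allocating one per record.

```java
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

// Assumed record layout: u32 payloadLength | payload | u32 crc32(payload).
// Byte order follows whatever the journal buffer is configured with.
final class JournalFrameSketch {
    static void append(ByteBuffer journal, ByteBuffer payload) {
        CRC32 crc = new CRC32();
        crc.update(payload.duplicate());      // checksum without consuming the payload
        journal.putInt(payload.remaining());
        journal.put(payload.duplicate());
        journal.putInt((int) crc.getValue());
    }

    // Replay side: a torn or corrupted tail fails the CRC, so replay stops at
    // the last intact record instead of feeding garbage to the engine.
    static boolean verify(ByteBuffer record) {
        int len = record.getInt();
        ByteBuffer payload = record.slice(record.position(), len);
        CRC32 crc = new CRC32();
        crc.update(payload);
        record.position(record.position() + len);
        return record.getInt() == (int) crc.getValue();
    }
}
```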
Prerequisites: Python ≥ 3.11 and Node ≥ 20. JDK 21 is fetched automatically by the Gradle wrapper via the foojay resolver.
```sh
git clone https://github.com/qflen/NanoExchange.git && cd NanoExchange
./run.sh
```

The first run auto-bootstraps the Python venv and dashboard npm packages, builds the
engine, then starts all three processes. Open http://localhost:5173. Ctrl-C tears
everything down. `./run.sh --help` explains each piece.
To run the full test suite across Java, Python, and the dashboard:
```sh
./gradlew check
.venv/bin/pytest client/tests bridge/tests analytics/tests
npm --prefix dashboard test -- --run
```

- The price-level container is still a sorted array. ADR-005 pins this as a deliberate tradeoff for shallow books; the JMH numbers agreed when I measured it. The first time I profile a thousand-level book under realistic cancel churn I expect a B-tree-of-arrays to beat it, and the replay machinery makes the swap safe. It is in the backlog, not shipped.
- MPSC ring buffer. The current SPSC hand-off is fine for one gateway thread, but the moment a second matching engine (different instrument) appears, the gateway wants to fan out. MPSC with a claim strategy is half a day of work (a sketch of the claim side follows this list); it is in the backlog because this build did not need it.
- Cross-language protocol test. Python's `struct` layouts and Java's `ByteBuffer` calls agree today because I wrote them both and PROTOCOL.md is the source of truth. A byte-for-byte round-trip test that generates frames from both stacks would catch silent drift. In the backlog.
- Playwright E2E. Vitest covers the components; a Playwright run that submits an order and asserts the exec-report lands in the Open Orders table would be the last mile. Out of scope here.
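For a sense of what the MPSC item would involve: the core of a claim strategy is producers racing on an atomic sequence so each message owns a unique slot. A minimal sketch, assuming a power-of-two ring and eliding the consumer cursor and backpressure checks a real implementation needs; names are illustrative, not the project's `RingBuffer` API.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReferenceArray;

// Claim side of a hypothetical MPSC ring. Deliberately incomplete: a real
// implementation must also gate on the consumer's cursor before wrapping.
final class MpscClaimSketch<E> {
    private final AtomicReferenceArray<E> slots;
    private final int mask;
    private final AtomicLong claim = new AtomicLong(); // next free sequence

    MpscClaimSketch(int capacityPow2) {
        slots = new AtomicReferenceArray<>(capacityPow2);
        mask = capacityPow2 - 1;
    }

    // Producers race on getAndIncrement; each winner owns a unique slot.
    void offer(E value) {
        long seq = claim.getAndIncrement();
        slots.set((int) (seq & mask), value); // volatile store publishes to the consumer
    }
}
```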
MIT
