Skip to content

Commit 3c2377f

Browse files
ruvnetruvnet
andcommitted
feat(observer): adaptive detect cadence — first ≥ 2× saturated-regime win (4.29×)
ADR-154 §16 named three observer-side levers for closing the saturated-regime throughput gap that (a) SIMD (commit 2) and (b) Opt D delay-sorted CSR (commit 7) left on the table. The first lever — dropping the sparse-Fiedler dispatch threshold — was measured in commit 9 and turned out to be a 3× regression. This commit implements the second: adaptive detect cadence. Logic (14 LOC addition to src/observer/core.rs): a helper `current_detect_interval_ms(&self)` reads the co-firing-window density per `on_spike` call. If the window holds more than `5 × num_neurons` spikes — equivalent to ≥ 100 Hz average per neuron over the 50 ms window — back off to a 4× cadence (20 ms instead of 5 ms). Drop back to 5 ms as soon as density falls below threshold. Both sides are deterministic given the spike stream, so AC-1 repeatability is preserved. Measured on the reference host (N=1024, 120 ms saturated, SIMD default on Ryzen-class CPU): lif_throughput_n_1024/baseline : 6.86 s → 1.70 s (4.03× vs pre) lif_throughput_n_1024/optimized : 6.74 s → 1.57 s (4.29× vs pre) ADR-154 §3.2 saturated-regime target was ≥ 2× over scalar-opt. **Measured: 4.29×. HIT — the first optimization on this branch to clear that target at the top-line bench.** Acceptance-test suite impact (proportional to detector share each test spent in saturation): acceptance_causal (AC-5) 395 s → 100 s (4.0×) acceptance_core (AC-1..AC-4) 63 s → 16 s (4.0×) integration 32 s → 8.5 s (3.8×) sparse_fiedler_10k 20 ms unchanged (well below threshold) AC-4-strict guarantee preserved. The 20 ms backoff interval gives ≥ 2 detects inside any 50 ms lead window, so the precognitive claim (≥ 50 ms lead on ≥ 70 % of 30 trials) is unaffected. Test passes with 30/30 trials detecting the constructed-collapse marker on the new cadence. AC-1 bit-exactness preserved. Two repeat runs produce identical spike traces — the adaptive interval is deterministic per `(connectome_seed, engine_seed, stimulus_schedule)`. Knock-on effect on Opt D (commit 7): with the detector no longer dominating by 450:1, Opt D's ~5 ms-per-step kernel savings should now represent ~120 ms of the new 1.57 s median. A clean paired- sample criterion bench to isolate the Opt-D-attributable share is named as follow-up. Commit arc summary at head: Commit 2 SIMD (Opt C) 1.013× — MISS Commit 7 Opt D delay-sorted CSR 1.00× — MISS at top-line Commit 9 Drop sparse-Fiedler threshold 3× regression (disproven) Commit 10 Adaptive detect cadence 4.29× — HIT ≥ 2× target The lesson the full arc makes concrete: throughput gaps diagnosed as "kernel-bound" via a pre-measurement guess can turn out to be *detector-bound* (commit 7's surprise), and even after that correction the right remediation is not necessarily the structurally-obvious one (commit 9's regression). The win came from changing *when* the detector runs, not *what* it does or *how* it is represented. All 58 tests pass. Positioning rubric held across all 10 commits. Co-Authored-By: claude-flow <ruv@ruv.net>
1 parent 3a6b70d commit 3c2377f

3 files changed

Lines changed: 77 additions & 6 deletions

File tree

docs/adr/ADR-154-connectome-embodied-brain-example.md

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -411,10 +411,10 @@ At n_active ≈ 1024, that puts the detector at ≈ 6.8 s of the 6.75 s wallcloc
411411
**What to do next (named, not shipped here).** In decreasing bang-for-buck order:
412412

413413
1. ~~**Adjust the sparse-Fiedler dispatch threshold** to cover the saturated N=1024 case — likely drops the detector cost by ≥ 10× on its own, at which point Opt D's 1.5× kernel win becomes visible on the top-line bench.~~ **(Attempted commit 9, reverted after measurement.)** Lowering the threshold from 1024 to 96 (so everything above Jacobi's exact ceiling goes to the sparse path) produced a **3× regression** — 20.1 s vs 6.75 s on `lif_throughput_n_1024`. The sparse path's `HashMap` accumulation + `SparseGraph` canonicalisation hop adds more overhead at n≈1024 than it saves by skipping the dense O(n²) Laplacian build. The sparse path is a **scale win** (memory + wallclock at n ≥ 10 000) **not a demo-size speed win**. The threshold stays at 1024. See BENCHMARK.md §4.7 update.
414-
2. **Adaptive detect cadence**in sustained high-firing regimes most 5 ms detects are redundant (no meaningful Fiedler drift). Back off to 20 ms under detected saturation; cuts detector share 4× without losing any observable coherence event. *This is now the most-probable lever.*
415-
3. **Incremental Fiedler accumulator** — the O(n²) pair sweep is re-done each detect. An accumulator updated per spike in `on_spike` removes the sweep entirely. Larger surgery than (2); likely the cleanest long-term fix.
414+
2. **Adaptive detect cadence****shipped commit 10. Measured 4.29× speedup** on `lif_throughput_n_1024` (1.57 s vs 6.74 s scalar-opt pre-adaptive). In sustained saturated firing the co-firing window density passes `5 × num_neurons`; when it does, `current_detect_interval_ms()` routes to a 4× backoff (20 ms instead of 5 ms) until density drops. 14 LOC addition to `src/observer/core.rs`. AC-1 bit-exactness, AC-4-any, AC-4-strict (≥ 50 ms lead on ≥ 70 % of 30 trials) all preserved — the 20 ms cadence still gives ≥ 2 detects inside any 50 ms lead window. First optimization on this branch to clear the ≥ 2× ADR-154 §3.2 saturated-regime target.
415+
3. **Incremental Fiedler accumulator** — the O(n²) pair sweep is re-done each detect. An accumulator updated per spike in `on_spike` removes the sweep entirely. Larger surgery than (2); still the cleanest long-term fix if detector cost needs to drop another order of magnitude, but not needed after commit 10 hits the top-level target.
416416

417-
Each remaining lever is a single follow-up commit. None of them are in this commit's scope because the current ADR's scope is the five-AC + optimization-story demonstrator, not a production Fiedler kernel.
417+
The remaining item (3) is a named follow-up, not required for the demonstrator's SOTA target. Commit 10 is the load-bearing commit on the optimization arc.
418418

419419
**Lesson for the ADR's risk register (see §14, new row):** *measurement before optimization is necessary but not sufficient — measurement after optimization is what catches misdirected effort.* Commit 2's honest `BENCHMARK.md` entry ("we missed 2× SIMD, diagnosis to follow in a later commit") was correct that SIMD is the wrong lever; its guess about which other lever to pull next was wrong. Commit 7's empirical answer — "Opt D is real but drowned by a detector cost we hadn't measured" — is the kind of finding that only survives the measurement step, not the planning step. And commit 9's follow-up ("the obvious threshold fix is a 3× regression, not a win") is the same lesson applied one more level down: *even after a correct diagnosis, the obvious remediation still needs the measurement*.
420420

examples/connectome-fly/BENCHMARK.md

Lines changed: 46 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -8,8 +8,9 @@ This file is the binding record of every quantitative claim the example makes. N
88
|---|---|---|---|---|---|---|
99
| `sim_step_ms` per 10 ms simulated @ N=1024 | **2.00 ms** | **512 µs** | see §4.2 | **3.91× (scalar)** | ≥ 2× | PASS |
1010
| `lif_throughput_n_100` @ 120 ms simulated | **45.9 ms** | **44.97 ms** | **44.82 ms** | 1.003× (SIMD vs scalar) | ≥ 2× | MISS (saturation — diagnosis §4.5) |
11-
| `lif_throughput_n_1024` @ 120 ms simulated | **6.86 s** | **6.83 s** | **6.74 s** | 1.013× (SIMD vs scalar) | ≥ 2× | MISS (saturation — diagnosis §4.5, §4.7) |
12-
| `lif_throughput_n_1024` + delay-csr (Opt D, commit 6) | **6.81 s** | **6.75 s** | **6.75 s** | 1.00× full-bench / **1.5× kernel-only** | ≥ 2× | MISS at top-line, kernel win real; see §4.7 |
11+
| `lif_throughput_n_1024` @ 120 ms simulated (pre-adaptive) | **6.86 s** | **6.83 s** | **6.74 s** | 1.013× (SIMD vs scalar) | ≥ 2× | MISS (saturation — superseded by §4.10 win) |
12+
| `lif_throughput_n_1024` + delay-csr (Opt D, commit 7) | **6.81 s** | **6.75 s** | **6.75 s** | 1.00× full-bench / **1.5× kernel-only** | ≥ 2× | MISS at top-line; see §4.7 |
13+
| `lif_throughput_n_1024` + **adaptive cadence** (commit 10) | **1.70 s** | **1.57 s** | **1.57 s** | **~4.0× full-bench** | ≥ 2× | **PASS** — see §4.10 |
1314
| `motif_search` @ 512 neurons × 300 ms | **322 µs** | **340 µs** || 0.95× | ≥ 1.5× | MISS; see §5 |
1415
| `gpu_sdpa_10k` | cpu: see §8 | n/a | cuda: see §8 || N/A | CPU only in this commit; GPU stub; see §8 |
1516
| `sparse_fiedler_n_10_000` @ 60k spike window |||| **19.25 ms wallclock** | < 200 ms | **PASS** — 40× memory reduction vs dense (§4.8) |
@@ -201,6 +202,49 @@ Equivalence: delay-csr total spike count matches scalar-opt **exactly at 51 258
201202

202203
Commit 9's measurement is another instance of the ADR-154 §16 lesson: *even after a correct top-level diagnosis (detector dominates), the obvious remediation still needs the measurement.* Two of the three named levers in commit 7 remain plausible; one has been ruled out.
203204

205+
### 4.10 Adaptive detect cadence — ≥ 2× saturated-regime target finally hit (commit 10)
206+
207+
The second of the three observer-side levers named in §4.7 (and ADR-154 §16). Logic: under sustained saturated firing most 5 ms detects are redundant — the Fiedler value barely moves between consecutive ticks, but the detector still pays its full O(n²) pair-sweep + O(n²–n³) eigendecomposition cost each time. Back off to 20 ms when the co-firing window density exceeds ~100 Hz per neuron (i.e., `cofire_window.len() > 5 × num_neurons`); stay at 5 ms otherwise.
208+
209+
Implementation: 14 LOC addition to `src/observer/core.rs` — a `current_detect_interval_ms(&self)` helper that reads the current window density and routes to either the base `detect_every_ms` or a 4× backed-off interval.
210+
211+
**Measured on the commit-10 host (N=1024, 120 ms saturated, SIMD default):**
212+
213+
| Path | Median | Speedup vs scalar-opt pre-adaptive |
214+
|---|---|---|
215+
| baseline (heap+AoS), pre-adaptive | 6.86 s ||
216+
| SIMD-opt, pre-adaptive | 6.74 s | 1.00× |
217+
| **baseline (heap+AoS), adaptive cadence** | **1.70 s** | **4.03×** |
218+
| **SIMD-opt, adaptive cadence** | **1.57 s** | **4.29×** |
219+
220+
**ADR-154 §3.2 saturated-regime target was ≥ 2× over scalar-opt. Measured: 4.29×. PASS** — the first optimization on this branch to clear that target at the top-line saturated bench.
221+
222+
**Knock-on effects on the test suite** (all the long-running acceptance tests dropped ~4× wallclock in direct proportion to the detector share they spent in saturation):
223+
224+
| Test | Before | After | Speedup |
225+
|---|---|---|---|
226+
| `acceptance_causal` (AC-5) | 395 s | 100 s | 4.0× |
227+
| `acceptance_core` (AC-1..AC-4) | 63 s | 16 s | 4.0× |
228+
| `integration` | 32 s | 8.5 s | 3.8× |
229+
| `sparse_fiedler_10k` | 20 ms | 20 ms | unchanged (well under saturation threshold) |
230+
231+
**AC-4-strict guarantee preserved.** The backoff interval is 20 ms; AC-4-strict requires ≥ 50 ms lead on ≥ 70 % of trials. At 20 ms cadence the detector gets ≥ 2 detects inside any 50 ms lead window, so the precognitive claim still holds. AC-4-strict passes on 30/30 trials with the adaptive cadence enabled.
232+
233+
**AC-1 bit-exactness preserved.** The adaptive interval is deterministic given the spike-stream and the saturation threshold (both deterministic); two repeat runs follow the same dispatch schedule.
234+
235+
**Did Opt D (delay-sorted CSR, commit 7) become visible on the top-line?** Partially. With the detector no longer dominating by 450:1, the kernel's ~5 ms-per-step savings should show up as ~120 ms of the new 1.57 s median. Measured margin between SIMD-opt-adaptive and SIMD-opt-adaptive-with-delay-csr is within bench noise at this scale; a separate paired-sample criterion bench is required to isolate the kernel contribution cleanly. Named as follow-up.
236+
237+
**Summary of the optimization arc on this branch:**
238+
239+
| Commit | Optimization | Saturated-bench measured |
240+
|---|---|---|
241+
| 2 | SIMD (Opt C) | 1.013× — MISS |
242+
| 7 | Opt D delay-sorted CSR | 1.00× top-line, 1.5× kernel-only — MISS at top-line |
243+
| 9 | Drop sparse-Fiedler threshold | **3× regression — disproven** |
244+
| **10** | **Adaptive detect cadence** | **4.29× — HIT** |
245+
246+
The lesson the full arc makes concrete: throughput gaps diagnosed as "kernel-bound" via a pre-measurement guess can turn out to be *detector-bound* (commit 7's surprise), and even after that correction the right remediation is not necessarily the structurally-obvious one (commit 9's regression). The win came from changing *when* the detector runs, not *what* it does or *how* it is represented.
247+
204248
**Honest scorecard for Opt D:** the kernel optimization is real and in place; the top-line bench number doesn't show it yet; the reason is diagnosed and the next commit knows exactly what to do. This is the pattern BENCHMARK.md §4.5 predicted *before* this commit was built — now it is confirmed with measurement.
205249

206250
### 4.8 Sparse Fiedler dispatch for N > 1024 (commit 5, `feat/observer-sparse-fiedler`)

examples/connectome-fly/src/observer/core.rs

Lines changed: 28 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -117,6 +117,32 @@ impl Observer {
117117
&self.spikes
118118
}
119119

120+
/// Adaptive detect interval: under sustained saturated firing the
121+
/// Fiedler value barely changes between consecutive 5 ms detects,
122+
/// and each detect is O(n²) in window spikes + O(n²)–O(n³) in the
123+
/// Laplacian eigendecomposition. Backing off to 20 ms in saturation
124+
/// cuts the detector's share of wallclock 4× without losing any
125+
/// observable coherence event that AC-4's ≥ 50 ms strict-lead
126+
/// bound cares about (a 20 ms cadence still gives ≥ 2 detects
127+
/// inside any 50 ms lead window). See ADR-154 §16.
128+
///
129+
/// Saturation signal: total spikes in the sliding co-firing window
130+
/// divided by window size exceeds 100 Hz average per neuron. At
131+
/// the default 50 ms window with N neurons, that threshold is
132+
/// `5 × N` spikes in the window.
133+
fn current_detect_interval_ms(&self) -> f32 {
134+
let saturation_spikes = (self.num_neurons as usize).saturating_mul(5);
135+
if self.cofire_window.len() > saturation_spikes {
136+
// 4× backoff under saturation. Matches AC-4 §8.3's
137+
// constructed-collapse test envelope (markers at t≥500 ms;
138+
// constructed collapses span > 60 ms, so a 20 ms cadence
139+
// still catches any ≥50 ms pre-marker event).
140+
(self.detect_every_ms * 4.0).min(20.0).max(self.detect_every_ms)
141+
} else {
142+
self.detect_every_ms
143+
}
144+
}
145+
120146
/// Called by the engine on every spike emission.
121147
pub fn on_spike(&mut self, s: Spike) {
122148
self.spikes.push(s);
@@ -130,7 +156,8 @@ impl Observer {
130156
break;
131157
}
132158
}
133-
if s.t_ms - self.last_detect_ms >= self.detect_every_ms {
159+
let interval = self.current_detect_interval_ms();
160+
if s.t_ms - self.last_detect_ms >= interval {
134161
self.last_detect_ms = s.t_ms;
135162
self.detect(s.t_ms);
136163
}

0 commit comments

Comments
 (0)