
knowledge: neurosymbolic + RLVR + causal learning-layer curriculum (v1) — 8 papers, 5-PR roadmap, governance only #373

Merged: AdaWorldAPI merged 1 commit into main from claude/learning-layer-curriculum on May 15, 2026

@AdaWorldAPI
Owner

Summary

8-paper curriculum + 5-PR roadmap for the stack's missing self-improvement loop. No code changes — plan + board hygiene per CLAUDE.md Mandatory Board-Hygiene Rule (new integration plan → INTEGRATION_PLANS.md PREPEND + .claude/knowledge/<name>-v<N>.md + EPIPHANIES.md PREPEND).

What this composes

The substrate landed in PR #372 (causaledge64-mailbox-rename-soa-v1) — AriGraph SPO-G + CausalEdge64 v2 + Σ-tier router + MailboxSoA. What this curriculum adds is the learning loop on top of that substrate — the verbs that turn the existing Think struct into a system that trains itself.

8 papers in 4 reading tiers (~6 hours total):

| Tier | Papers | Stack verb |
|---|---|---|
| 0 (Doctrinal) | Causal de Finetti · Executable CFG | ICM grouping, Pearl 2³ trainable |
| 1 (Method) | LINC · GRPO | Σ9-Σ10 prover dispatch, RLVR algorithm |
| 2 (Generation) | LPN · TextGrad · Opt-Sym | StyleVector test-time grad, hybrid prompt opt, symbolic data gen |
| 3 (Safety) | Conformal CFG | Calibrated bounds for L4 outputs |

Stack mapping (5 live, 4 missing)

Live (PR #372): AriGraph SPO-G quads · StyleVectors · Σ9-Σ10 → L4 shell · MUL gate · Pearl 2³ in NarsEngine

Missing (this curriculum's PR roadmap fills): NARS Intervention/Counterfactual verbs · ICM-invariance column · TextGrad-style style_synthesize · GRPO trainer · LINC bridge + conformal CFG

5-PR sequencing

| # | Scope | LOC | Risk |
|---|---|---|---|
| PR-LL-1 | NARS Intervention/Counterfactual verbs + AriGraph::intervene_on | ~200 | Low |
| PR-LL-2 | ICM-invariance column + lance-graph-planner::data_gen (Opt-Sym generator) | ~800 | Med |
| PR-LL-3 | Hybrid TextGrad/LPN style_synthesize | ~400 | Med |
| PR-LL-4 | crates/lance-graph-trainer/ (GRPO loop) | ~800 | High |
| PR-LL-5 | crates/linc-bridge/ (Z3 prover + conformal CFG) | ~600 | Med |

The five PRs land sequentially. PR-LL-4 requires ~2 weeks of separate prep for the Qwen3-head-via-candle wiring.

Doctrinal claims worth flagging

  1. Stack's NARS verifier is strictly stronger than Opt-Sym's LLM verifier: a graded confidence ∈ [0,1] gives GRPO a more informative reward signal than a binary commit/reject, so the stack gets a quality improvement over the published method for free.
  2. The stack's StyleVectors are already an LPN-style continuous latent space; the missing piece is the gradient-at-inference operator, not the representation.
  3. The MUL gate is already the LINC dispatch shape — LINC fills the "what does L4 actually compute" slot in the existing Σ9-Σ10 escalation path.
  4. Conformal CFG with Jirak bounds, not Berry-Esseen — counterfactual rollouts share latent abduction; correlation makes classical bounds underestimate variance. The stack's I-NOISE-FLOOR-JIRAK iron rule already encodes this.
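
To make claim 1 concrete, here is a minimal Rust sketch of the group-relative advantage that GRPO computes over a group of sampled completions, fed once with graded confidences and once with a thresholded binary verdict. The function names, the threshold, and the choice of population standard deviation are illustrative assumptions, not code from the stack or from the Opt-Sym paper.

```rust
/// Group-relative advantages in the GRPO style: each reward is standardized
/// against the mean and standard deviation of its own sampling group.
/// Assumption: population standard deviation; implementations differ here.
fn group_relative_advantages(rewards: &[f64]) -> Vec<f64> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let n = rewards.len() as f64;
    let mean = rewards.iter().sum::<f64>() / n;
    let var = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std = var.sqrt();
    // Small epsilon keeps the division defined when all rewards are equal.
    let eps = 1e-8;
    rewards.iter().map(|r| (r - mean) / (std + eps)).collect()
}

/// Hypothetical stand-in for a binary verifier: accept only at or above `threshold`.
fn binarize(confidences: &[f64], threshold: f64) -> Vec<f64> {
    confidences
        .iter()
        .map(|&c| if c >= threshold { 1.0 } else { 0.0 })
        .collect()
}
```

With confidences `[0.2, 0.4, 0.6, 0.8]` and an acceptance threshold of `0.9`, the binarized rewards are all zero, so every advantage is zero and the group contributes no learning signal. The graded rewards still rank the four samples.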

6 Open Questions (ratify before sprint fan-out)

  • OQ-LL-1 reward shape (graded NARS confidence vs binary) — recommendation: graded
  • OQ-LL-2 TextGrad optimizer location (local Qwen3 vs frontier API) — recommendation: local with frontier fallback
  • OQ-LL-3 prover choice (Z3 vs Prover9 vs HOL Light) — recommendation: Z3 default, HOL Light for verified-code consumers
  • OQ-LL-4 style-pool location (contract vs separate learned_styles) — recommendation: separate pool, contract exposes StylePoolProvider trait
  • OQ-LL-5 ICM-invariance update protocol — recommendation: clear bit on counterfactual contradiction
  • OQ-LL-6 Σ-tier-as-difficulty probe (hot-path latency)

Iron rule audit (all 6 satisfied)

| Rule | Status |
|---|---|
| I-SUBSTRATE-MARKOV | All synthesized trajectories pass Chapman-Kolmogorov test |
| I-NOISE-FLOOR-JIRAK | Conformal calibration uses Jirak bounds, not Berry-Esseen |
| I-VSA-IDENTITIES | style_synthesize produces identity fingerprints, not content |
| I1 BindSpace read-only | IcmInvarianceColumn writes via CollapseGate::bundle |
| Method-on-carrier | All 4 new capabilities are methods on existing carriers |
| AGI-as-glove SoA | New styles land in StyleColumn extension, no new layer |
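
One way to read the I-SUBSTRATE-MARKOV check: for a time-homogeneous Markov chain, the two-step transition matrix equals the square of the one-step matrix (Chapman-Kolmogorov). The sketch below assumes trajectories are sequences of discrete state indices and uses the largest entrywise gap as the test statistic. All names are hypothetical, and the stack's actual verifier may differ.

```rust
type Matrix = Vec<Vec<f64>>;

/// Multiply two square matrices of equal size.
fn mat_mul(a: &Matrix, b: &Matrix) -> Matrix {
    let n = a.len();
    let mut out = vec![vec![0.0; n]; n];
    for i in 0..n {
        for j in 0..n {
            for k in 0..n {
                out[i][j] += a[i][k] * b[k][j];
            }
        }
    }
    out
}

/// Estimate the lag-`lag` transition matrix from a trajectory by counting.
/// Rows for states that are never visited stay all-zero.
fn empirical_transitions(traj: &[usize], n_states: usize, lag: usize) -> Matrix {
    let mut counts = vec![vec![0.0; n_states]; n_states];
    for w in traj.windows(lag + 1) {
        counts[w[0]][w[lag]] += 1.0;
    }
    for row in counts.iter_mut() {
        let total: f64 = row.iter().sum();
        if total > 0.0 {
            for c in row.iter_mut() {
                *c /= total;
            }
        }
    }
    counts
}

/// Largest entrywise gap between the two-step estimate and the squared
/// one-step estimate. A gap near zero is consistent with the Markov property.
fn chapman_kolmogorov_gap(traj: &[usize], n_states: usize) -> f64 {
    let p1 = empirical_transitions(traj, n_states, 1);
    let p2 = empirical_transitions(traj, n_states, 2);
    let p1_sq = mat_mul(&p1, &p1);
    let mut gap = 0.0_f64;
    for i in 0..n_states {
        for j in 0..n_states {
            gap = gap.max((p2[i][j] - p1_sq[i][j]).abs());
        }
    }
    gap
}
```

A deterministic cycle `0, 1, 2, 0, 1, 2, ...` gives a gap of zero, while the second-order pattern `0, 0, 1, 1, 0, 0, 1, 1, ...` gives a large gap because the next state depends on the previous two. A real check would also need a tolerance that accounts for sampling noise.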

Blast radius

  • 2 new crates: lance-graph-trainer + linc-bridge (~1400 LOC)
  • 3 crates modified: lance-graph-planner, causal-edge, lance-graph-contract
  • Zone 3 surface UNCHANGED
  • ndarray side UNCHANGED (curriculum stays thinking-side of the doctrinal split)
  • External deps gated: z3-rs (PR-LL-5), candle/burn (PR-LL-4)

Files changed

  • NEW: .claude/knowledge/neurosymbolic-rlvr-causal-curriculum-v1.md (~620 lines, 12 sections)
  • PREPEND: .claude/board/EPIPHANIES.md (E-LL-CURRICULUM-1)
  • PREPEND: .claude/board/INTEGRATION_PLANS.md (plan-index entry)

3 files changed, +617 lines. Zero code.

Test plan

  • Knowledge doc renders cleanly (markdown)
  • EPIPHANIES.md PREPEND format matches existing entries (append-only governance preserved)
  • INTEGRATION_PLANS.md PREPEND format matches existing entries
  • No code in this PR — nothing to compile or test
  • Reviewer ratifies §7 OQ-LL-1 through OQ-LL-6 before sprint-11 worker fan-out
  • Reviewer confirms §5 stack-substrate alignment table accurately reflects current state post-PR #372 (specs(sprint-10): 12-worker CCA2A fleet + meta-review (governance))

What this PR does NOT touch

  • No code. Zero .rs / Cargo.toml changes.
  • No new crates. All 5 PR-LL-* sub-PRs describe future crates; this PR does not create them.
  • No ndarray changes. Curriculum is doctrinally thinking-side; ndarray stays as hardware substrate.
  • PR-LL-1 through PR-LL-5 implementation workers — owned by the next sprint kick-off after this curriculum merges and OQs are ratified.
  • Pre-training of foundation models — outside scope. Stack uses Jina v5 / Qwen3 / ModernBERT / etc. as frozen encoders; PR-LL-4 fine-tunes a head, not the foundation.

Predecessor

PR #372 (causaledge64-mailbox-rename-soa-v1) — the substrate this curriculum builds the learning loop on top of. Without PR #372's SPO-G quads, MailboxSoA, and Σ-tier router, the curriculum has nothing to wire into.


Generated by Claude Code

8-paper synthesis composing Schölkopf SCMs + MIT BPL + LINC + GRPO into a
5-PR roadmap for the stack's missing self-improvement loop. Governance
only — no code changes. Doc + 2 board PREPENDs per CLAUDE.md Mandatory
Board-Hygiene Rule.

## What this composes

Eight papers, four tiers, ~6 hours total reading load:

Tier 0 (doctrinal frame):
- Causal de Finetti (Guo+Schölkopf 2022, arXiv:2203.15756) — ICM principle,
  AriGraph SPO-G grouping doctrine
- Executable Counterfactuals (Vashishtha 2025, arXiv:2510.01539) — Pearl
  2³ trainable verbs, RL>SFT for OOD

Tier 1 (method substrate):
- LINC (Olausson+Solar-Lezama+Tenenbaum 2023, arXiv:2310.15164) — Σ9-Σ10
  classical-prover dispatch
- GRPO/DeepSeekMath (Shao 2024, arXiv:2402.03300) — RLVR algorithm spec

Tier 2 (closed-loop generation):
- LPN (Bonnet 2024, arXiv:2411.08706) — StyleVectors test-time gradient
- TextGrad (Yuksekgonul 2024, ~arXiv:2406.07496) — textual-gradient prompt
  optimizer
- Opt-Sym (Yeo+Solar-Lezama 2026, ID pending) — symbolic-space adaptive
  data generation

Tier 3 (safety/calibration):
- Conformal CFG (Farzaneh 2026, arXiv:2601.20090) — calibrated bounds for
  L4 planner outputs

## Stack mapping (5 live components, 4 missing modules)

Live: AriGraph SPO-G quads (PR #372) · StyleVectors (cache::triple_model) ·
Σ9-Σ10 → L4 dispatch shell · MUL gate · Pearl 2³ in nars_engine

Missing: NARS Intervention/Counterfactual verbs · ICM-invariance column ·
TextGrad-style style_synthesize · GRPO trainer · LINC bridge

## 5-PR roadmap (each ~200-800 LOC, sequential, PR-LL-1..LL-5)

PR-LL-1: NARS Intervention/Counterfactual InferenceType variants + AriGraph
::intervene_on — closes Pearl 2³ dispatch gap
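
As a rough illustration of what an `intervene_on` operation could look like, the sketch below implements Pearl-style graph surgery for do(target): it returns a copy of the graph with every edge pointing into the target removed. The `Graph` and `CausalEdge` types are hypothetical stand-ins, since the real AriGraph SPO-G API is not shown in this PR.

```rust
/// Hypothetical stand-in for a causal edge; not the stack's CausalEdge64.
#[derive(Debug, Clone, PartialEq)]
struct CausalEdge {
    cause: String,
    effect: String,
}

/// Hypothetical stand-in for the graph carrier.
#[derive(Debug, Clone, Default)]
struct Graph {
    edges: Vec<CausalEdge>,
}

impl Graph {
    fn add_edge(&mut self, cause: &str, effect: &str) {
        self.edges.push(CausalEdge {
            cause: cause.to_string(),
            effect: effect.to_string(),
        });
    }

    /// Graph surgery for do(target): drop every edge INTO `target`,
    /// keep everything else. The original graph is left untouched.
    fn intervene_on(&self, target: &str) -> Graph {
        Graph {
            edges: self
                .edges
                .iter()
                .filter(|e| e.effect != target)
                .cloned()
                .collect(),
        }
    }

    /// Direct causes of `node` in this graph.
    fn parents(&self, node: &str) -> Vec<&str> {
        self.edges
            .iter()
            .filter(|e| e.effect == node)
            .map(|e| e.cause.as_str())
            .collect()
    }
}
```

In a graph with `rain -> wet`, `sprinkler -> wet`, and `wet -> slippery`, intervening on `wet` leaves it with no parents while `slippery` still depends on `wet`. Returning a new graph rather than mutating in place is a design choice of this sketch, not something the plan specifies.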

PR-LL-2: ICM-invariance BindSpace column (1 bit/row, gated through
CollapseGate per I1) + lance-graph-planner::data_gen module (Opt-Sym
generator targeting Σ-tier as difficulty axis, with NARS+Chapman-Kolmogorov
deterministic verification stronger than Opt-Sym's LLM verifier)
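
The "1 bit/row, gated through CollapseGate" constraint can be sketched with Rust module privacy: the setter is private to the module, so callers outside it can only change bits by going through the gate. Everything here is an assumption for illustration: the packed `u64` layout, the `Update` enum, and treating `bundle` as a batch-apply. The clear-on-contradiction behaviour follows the OQ-LL-5 recommendation, which is not yet ratified.

```rust
mod bindspace {
    /// Hypothetical 1-bit-per-row column, packed into 64-bit words.
    pub struct IcmInvarianceColumn {
        words: Vec<u64>,
        rows: usize,
    }

    impl IcmInvarianceColumn {
        pub fn new(rows: usize) -> Self {
            Self { words: vec![0; (rows + 63) / 64], rows }
        }

        /// Reads are public.
        pub fn get(&self, row: usize) -> bool {
            assert!(row < self.rows, "row out of range");
            (self.words[row / 64] >> (row % 64)) & 1 == 1
        }

        /// Private: only code inside this module (the gate) may mutate bits.
        fn set(&mut self, row: usize, value: bool) {
            assert!(row < self.rows, "row out of range");
            let mask = 1u64 << (row % 64);
            if value {
                self.words[row / 64] |= mask;
            } else {
                self.words[row / 64] &= !mask;
            }
        }
    }

    /// Hypothetical update kinds for the invariance bit.
    pub enum Update {
        MarkInvariant(usize),
        /// A counterfactual contradicted the row, so its bit is cleared.
        Contradicted(usize),
    }

    /// Hypothetical stand-in for the stack's CollapseGate.
    pub struct CollapseGate;

    impl CollapseGate {
        /// Apply a batch of updates in order; the only write path to the column.
        pub fn bundle(column: &mut IcmInvarianceColumn, updates: &[Update]) {
            for u in updates {
                match u {
                    Update::MarkInvariant(row) => column.set(*row, true),
                    Update::Contradicted(row) => column.set(*row, false),
                }
            }
        }
    }
}
```

Code outside `bindspace` can call `get` and `CollapseGate::bundle` but not `set`, so the "never raw assignment" rule is enforced by the compiler rather than by convention.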

PR-LL-3: Hybrid TextGrad/LPN style_synthesize — numerical autograd on
StyleVector + textual gradient on rendering prompt — closes
THINKING_ORCHESTRATION_WIRING.md Gap 1 (12 vs 36 ThinkingStyle)
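
The numerical half of this hybrid can be pictured as test-time gradient ascent in the style latent space. The sketch below uses central finite differences in place of real autograd and takes an arbitrary caller-supplied scoring function. The function names, step size, and learning rate are assumptions, and the textual-gradient half is not shown because it needs an LLM in the loop.

```rust
/// Central finite-difference gradient of `score` at `z`.
/// Stand-in for autograd; costs two score evaluations per dimension.
fn numeric_gradient<F: Fn(&[f64]) -> f64>(score: &F, z: &[f64], h: f64) -> Vec<f64> {
    let mut probe = z.to_vec();
    let mut grad = Vec::with_capacity(z.len());
    for i in 0..z.len() {
        let orig = probe[i];
        probe[i] = orig + h;
        let up = score(&probe);
        probe[i] = orig - h;
        let down = score(&probe);
        probe[i] = orig; // restore before moving to the next coordinate
        grad.push((up - down) / (2.0 * h));
    }
    grad
}

/// Refine a style vector by plain gradient ascent on `score`.
/// `start` plays the role of an existing StyleVector (hypothetical layout).
fn refine_style<F: Fn(&[f64]) -> f64>(
    score: F,
    start: &[f64],
    lr: f64,
    steps: usize,
) -> Vec<f64> {
    let mut z = start.to_vec();
    for _ in 0..steps {
        let g = numeric_gradient(&score, &z, 1e-5);
        for (zi, gi) in z.iter_mut().zip(g.iter()) {
            *zi += lr * gi;
        }
    }
    z
}
```

With a toy score such as the negative squared distance to a target vector, `refine_style` converges to that target. In the planned module the score would presumably come from a verifier or decoder likelihood, which this sketch leaves open.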

PR-LL-4: crates/lance-graph-trainer/ — GRPO loop with NARS confidence
∈ [0,1] as graded reward (strictly stronger than Opt-Sym binary). Backed
by candle or burn for the Qwen3 head fine-tune. ~2 weeks separate prep
work for the head-via-candle wiring.

PR-LL-5: crates/linc-bridge/ — Z3 SMT prover + Farzaneh-style conformal
counterfactual wrap. SMT theories (arithmetic, bitvectors, arrays) match
stack queries better than pure FOL. Required for MedCare-rs / q2
high-stakes safety.
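
For orientation, the sketch below shows the textbook split-conformal threshold for exchangeable calibration data: the ceil((n+1)(1-alpha))-th smallest nonconformity score. This is only the classical baseline. The dependence-aware, Jirak-bound calibration that the plan requires for correlated counterfactual rollouts is not implemented here, and the function name is hypothetical.

```rust
/// Split-conformal threshold under exchangeability (classical baseline only).
/// Returns the ceil((n + 1) * (1 - alpha))-th smallest calibration score, or
/// +infinity when the calibration set is too small for the requested level.
fn conformal_threshold(calibration_scores: &[f64], alpha: f64) -> f64 {
    assert!(alpha > 0.0 && alpha < 1.0, "alpha must lie in (0, 1)");
    let n = calibration_scores.len();
    let rank = ((n as f64 + 1.0) * (1.0 - alpha)).ceil() as usize;
    if rank > n {
        // Not enough calibration points: only the trivial set is valid.
        return f64::INFINITY;
    }
    let mut sorted = calibration_scores.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    sorted[rank - 1]
}
```

A candidate output is then kept in the prediction set when its nonconformity score is at or below the threshold. Swapping in a dependence-corrected quantile would change only how `rank` (or the threshold) is derived, which is why the calibration step is a natural seam for the I-NOISE-FLOOR-JIRAK rule.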

## 6 Open Questions (ratify before sprint fan-out)

OQ-LL-1 reward shape (graded NARS confidence vs binary)
OQ-LL-2 TextGrad optimizer location (local Qwen3 vs frontier API)
OQ-LL-3 prover choice (Z3 vs Prover9 vs HOL Light)
OQ-LL-4 style-pool location (contract vs separate learned_styles pool)
OQ-LL-5 ICM-invariance update protocol (when does invariance bit clear?)
OQ-LL-6 Σ-tier-as-difficulty probe (hot-path latency check)

## Iron rule audit (all 6 satisfied)

I-SUBSTRATE-MARKOV: synthesized trajectories pass Chapman-Kolmogorov in
PR-LL-2 verify step
I-NOISE-FLOOR-JIRAK: PR-LL-5 conformal calibration uses Jirak bounds (not
classical Berry-Esseen) — counterfactual rollouts share latent abduction,
classical bounds underestimate variance
I-VSA-IDENTITIES: style_synthesize produces identity fingerprints, not
content
I1: IcmInvarianceColumn writes through CollapseGate::bundle, never raw
assignment
Method-on-carrier: all 4 new capabilities are methods on existing carriers
(AriGraph, StyleVector, Student, Query)
AGI-as-glove SoA: synthesized styles land in StyleColumn extension; no
new layer

## Blast radius

- 2 new crates: lance-graph-trainer + linc-bridge (~1400 LOC total)
- 3 crates modified: lance-graph-planner, causal-edge, lance-graph-contract
- Zone 3 surface UNCHANGED
- ndarray side UNCHANGED (curriculum stays thinking-side of doctrinal split)
- External deps gated behind features: z3-rs (PR-LL-5), candle/burn (PR-LL-4)

## Files changed

- NEW: .claude/knowledge/neurosymbolic-rlvr-causal-curriculum-v1.md
  (~620 lines, 12 sections, 5-PR roadmap + 6 OQs + iron-rule audit)
- PREPEND: .claude/board/EPIPHANIES.md (E-LL-CURRICULUM-1)
- PREPEND: .claude/board/INTEGRATION_PLANS.md (plan-index entry)

3 files changed. Pure plan + board hygiene. No code.

## Predecessor / successor

Predecessor: PR #371/#372 (causaledge64-mailbox-rename-soa-v1) substrate
— this curriculum is the learning loop on top of that substrate

Successor: PR-LL-1 through PR-LL-5 (the curriculum is the spec for the
sequential implementation wave; each PR fans out via the established
CCA2A 12-worker pattern)