knowledge: neurosymbolic + RLVR + causal learning-layer curriculum (v1), 8 papers, 5-PR roadmap, governance only #373
Merged
8-paper synthesis composing Schölkopf SCMs + MIT BPL + LINC + GRPO into a 5-PR roadmap for the stack's missing self-improvement loop. Governance only: no code changes. Doc + 2 board PREPENDs per CLAUDE.md Mandatory Board-Hygiene Rule.

## What this composes

Eight papers, four tiers, ~6 hours total reading load:

Tier 0 (doctrinal frame):
- Causal de Finetti (Guo+Schölkopf 2022, arXiv:2203.15756): ICM principle, AriGraph SPO-G grouping doctrine
- Executable Counterfactuals (Vashishtha 2025, arXiv:2510.01539): Pearl 2³ trainable verbs, RL>SFT for OOD

Tier 1 (method substrate):
- LINC (Olausson+Solar-Lezama+Tenenbaum 2023, arXiv:2310.15164): Σ9-Σ10 classical-prover dispatch
- GRPO/DeepSeekMath (Shao 2024, arXiv:2402.03300): RLVR algorithm spec

Tier 2 (closed-loop generation):
- LPN (Bonnet 2024, arXiv:2411.08706): StyleVectors test-time gradient
- TextGrad (Yuksekgonul 2024, ~arXiv:2406.07496): textual-gradient prompt optimizer
- Opt-Sym (Yeo+Solar-Lezama 2026, ID pending): symbolic-space adaptive data generation

Tier 3 (safety/calibration):
- Conformal CFG (Farzaneh 2026, arXiv:2601.20090): calibrated bounds for L4 planner outputs

## Stack mapping (5 live components, 4 missing modules)

Live: AriGraph SPO-G quads (PR #372) · StyleVectors (cache::triple_model) · Σ9-Σ10 → L4 dispatch shell · MUL gate · Pearl 2³ in nars_engine

Missing: NARS Intervention/Counterfactual verbs · ICM-invariance column · TextGrad-style style_synthesize · GRPO trainer · LINC bridge

## 5-PR roadmap (each ~200-800 LOC, sequential, PR-LL-1..LL-5)

- PR-LL-1: NARS Intervention/Counterfactual InferenceType variants + AriGraph ::intervene_on. Closes the Pearl 2³ dispatch gap.
- PR-LL-2: ICM-invariance BindSpace column (1 bit/row, gated through CollapseGate per I1) + lance-graph-planner::data_gen module (Opt-Sym generator targeting Σ-tier as the difficulty axis, with NARS + Chapman-Kolmogorov deterministic verification, which is stronger than Opt-Sym's LLM verifier).
- PR-LL-3: Hybrid TextGrad/LPN style_synthesize: numerical autograd on StyleVector + textual gradient on the rendering prompt. Closes THINKING_ORCHESTRATION_WIRING.md Gap 1 (12 vs 36 ThinkingStyle).
- PR-LL-4: crates/lance-graph-trainer/: GRPO loop with NARS confidence ∈ [0,1] as graded reward (strictly stronger than Opt-Sym binary). Backed by candle or burn for the Qwen3 head fine-tune. Needs ~2 weeks of separate prep work for the head-via-candle wiring.
- PR-LL-5: crates/linc-bridge/: Z3 SMT prover + Farzaneh-style conformal counterfactual wrap. SMT theories (arithmetic, bitvectors, arrays) match stack queries better than pure FOL. Required for MedCare-rs / q2 high-stakes safety.

## 6 Open Questions (ratify before sprint fan-out)

- OQ-LL-1: reward shape (graded NARS confidence vs binary)
- OQ-LL-2: TextGrad optimizer location (local Qwen3 vs frontier API)
- OQ-LL-3: prover choice (Z3 vs Prover9 vs HOL Light)
- OQ-LL-4: style-pool location (contract vs separate learned_styles pool)
- OQ-LL-5: ICM-invariance update protocol (when does the invariance bit clear?)
- OQ-LL-6: Σ-tier-as-difficulty probe (hot-path latency check)

## Iron rule audit (all 6 satisfied)

- I-SUBSTRATE-MARKOV: synthesized trajectories pass Chapman-Kolmogorov in the PR-LL-2 verify step
- I-NOISE-FLOOR-JIRAK: PR-LL-5 conformal calibration uses Jirak bounds (not classical Berry-Esseen), because counterfactual rollouts share latent abduction and classical bounds therefore underestimate variance
- I-VSA-IDENTITIES: style_synthesize produces identity fingerprints, not content
- I1: IcmInvarianceColumn writes through CollapseGate::bundle, never raw assignment
- Method-on-carrier: all 4 new capabilities are methods on existing carriers (AriGraph, StyleVector, Student, Query)
- AGI-as-glove SoA: synthesized styles land in the StyleColumn extension; no new layer

## Blast radius

- 2 new crates: lance-graph-trainer + linc-bridge (~1400 LOC total)
- 3 crates modified: lance-graph-planner, causal-edge, lance-graph-contract
- Zone 3 surface UNCHANGED
- ndarray side UNCHANGED (curriculum stays on the thinking side of the doctrinal split)
- External deps gated behind features: z3-rs (PR-LL-5), candle/burn (PR-LL-4)

## Files changed

- NEW: .claude/knowledge/neurosymbolic-rlvr-causal-curriculum-v1.md (~620 lines, 12 sections, 5-PR roadmap + 6 OQs + iron-rule audit)
- PREPEND: .claude/board/EPIPHANIES.md (E-LL-CURRICULUM-1)
- PREPEND: .claude/board/INTEGRATION_PLANS.md (plan-index entry)

3 files changed. Pure plan + board hygiene. No code.

## Predecessor / successor

Predecessor: PR #371/#372 (causaledge64-mailbox-rename-soa-v1) substrate. This curriculum is the learning loop on top of that substrate.

Successor: PR-LL-1 through PR-LL-5. The curriculum is the spec for the sequential implementation wave; each PR fans out via the established CCA2A 12-worker pattern.
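As a rough illustration of the OQ-LL-1 trade-off (graded NARS confidence vs binary reward) for the PR-LL-4 GRPO loop, the sketch below computes GRPO-style group-relative advantages, A_i = (r_i - mean) / std, for one group of sampled completions. It is not the planned trainer: the function names, the 0.5 binarization threshold, the use of the population standard deviation, and the epsilon guard are all assumptions made for illustration.

```rust
/// Small constant guarding against division by zero when every reward in a
/// group is identical (an assumption of this sketch, not taken from the plan).
const EPS: f64 = 1e-8;

/// GRPO-style group-relative advantages: each reward is centered on the group
/// mean and scaled by the group's (population) standard deviation.
fn group_relative_advantages(rewards: &[f64]) -> Vec<f64> {
    if rewards.is_empty() {
        return Vec::new();
    }
    let n = rewards.len() as f64;
    let mean = rewards.iter().sum::<f64>() / n;
    let variance = rewards.iter().map(|r| (r - mean).powi(2)).sum::<f64>() / n;
    let std_dev = variance.sqrt();
    rewards.iter().map(|r| (r - mean) / (std_dev + EPS)).collect()
}

/// Hypothetical binary reward: 1.0 if the confidence clears the threshold, else 0.0.
fn binarize(confidences: &[f64], threshold: f64) -> Vec<f64> {
    confidences
        .iter()
        .map(|&c| if c >= threshold { 1.0 } else { 0.0 })
        .collect()
}

fn main() {
    // Hypothetical NARS confidences in [0, 1] for four sampled completions.
    let confidences = [0.9, 0.6, 0.3, 0.0];

    let graded = group_relative_advantages(&confidences);
    let binary = group_relative_advantages(&binarize(&confidences, 0.5));

    println!("graded advantages: {:?}", graded);
    println!("binary advantages: {:?}", binary);
}
```

With graded rewards the four completions receive four distinct advantages, so the ordering within the group is preserved; after binarization the top two and bottom two become indistinguishable. That is one way to read the "strictly stronger than binary" claim, although the actual reward shape remains an open question in this plan.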
This was referenced May 15, 2026
Summary
8-paper curriculum + 5-PR roadmap for the stack's missing self-improvement loop. No code changes: plan + board hygiene per CLAUDE.md Mandatory Board-Hygiene Rule (new integration plan → INTEGRATION_PLANS.md PREPEND + .claude/knowledge/<name>-v<N>.md + EPIPHANIES.md PREPEND).
What this composes
The substrate landed in PR #372 (causaledge64-mailbox-rename-soa-v1): AriGraph SPO-G + CausalEdge64 v2 + Σ-tier router + MailboxSoA. What this curriculum adds is the learning loop on top of that substrate: the verbs that turn the existing Think struct into a system that trains itself.
8 papers in 4 reading tiers (~6 hours total):
Stack mapping (5 live, 4 missing)
Live (PR #372): AriGraph SPO-G quads · StyleVectors · Σ9-Σ10 → L4 shell · MUL gate · Pearl 2³ in NarsEngine
Missing (this curriculum's PR roadmap fills): NARS Intervention/Counterfactual verbs · ICM-invariance column · TextGrad-style style_synthesize · GRPO trainer · LINC bridge + conformal CFG
5-PR sequencing
- lance-graph-planner::data_gen (Opt-Sym generator)
- crates/lance-graph-trainer/ (GRPO loop)
- crates/linc-bridge/ (Z3 prover + conformal CFG)
Sequential. PR-LL-4 requires ~2 weeks separate prep for the Qwen3-head-via-candle wiring.
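For readers unfamiliar with the conformal step planned for crates/linc-bridge/, the following is a minimal sketch of the standard split-conformal threshold: given n calibration nonconformity scores and a miscoverage level alpha, take the ceil((n + 1)(1 - alpha))-th smallest score. This is the textbook exchangeable-data version, not the Jirak-bounded calibration the plan calls for and not Farzaneh's method; `conformal_threshold`, `accept`, and the example scores are hypothetical.

```rust
/// Standard split-conformal threshold over calibration nonconformity scores.
/// Returns None when alpha is outside (0, 1), when there are no scores, or when
/// the calibration set is too small for the requested coverage (the required
/// rank would exceed n, i.e. the threshold would be infinite).
fn conformal_threshold(calibration_scores: &[f64], alpha: f64) -> Option<f64> {
    let n = calibration_scores.len();
    if n == 0 || alpha <= 0.0 || alpha >= 1.0 {
        return None;
    }
    // 1-indexed rank of the order statistic used as the threshold.
    let rank = ((n as f64 + 1.0) * (1.0 - alpha)).ceil() as usize;
    if rank > n {
        return None;
    }
    let mut sorted = calibration_scores.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    Some(sorted[rank - 1])
}

/// A candidate output is kept when its nonconformity score does not exceed
/// the calibrated threshold.
fn accept(score: f64, threshold: f64) -> bool {
    score <= threshold
}

fn main() {
    // Hypothetical nonconformity scores from a held-out calibration set.
    let scores = [0.7, 0.1, 0.9, 0.3, 1.0, 0.5, 0.2, 0.8, 0.4, 0.6];

    match conformal_threshold(&scores, 0.2) {
        Some(t) => {
            println!("threshold at alpha = 0.2: {t}");
            println!("candidate with score 0.85 accepted: {}", accept(0.85, t));
        }
        None => println!("calibration set too small for the requested coverage"),
    }
}
```

The guarantee behind this threshold assumes exchangeable calibration and test scores. The iron-rule audit in this plan argues that counterfactual rollouts sharing latent abduction violate that kind of independence assumption, which is why the plan specifies different bounds for PR-LL-5.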
Doctrinal claims worth flagging
6 Open Questions (ratify before sprint fan-out)
- learned_styles), recommendation: separate pool, contract exposes StylePoolProvider trait
Iron rule audit (all 6 satisfied)
- style_synthesize produces identity fingerprints, not content
- IcmInvarianceColumn writes via CollapseGate::bundle
- StyleColumn extension, no new layer
Blast radius
- lance-graph-trainer + linc-bridge (~1400 LOC)
- lance-graph-planner, causal-edge, lance-graph-contract
- z3-rs (PR-LL-5), candle/burn (PR-LL-4)
Files changed
- .claude/knowledge/neurosymbolic-rlvr-causal-curriculum-v1.md (~620 lines, 12 sections)
- .claude/board/EPIPHANIES.md (E-LL-CURRICULUM-1)
- .claude/board/INTEGRATION_PLANS.md (plan-index entry)
3 files changed, +617 lines. Zero code.
Test plan
What this PR does NOT touch
- .rs / Cargo.toml changes.
Predecessor
PR #372 (causaledge64-mailbox-rename-soa-v1) — the substrate this curriculum builds the learning loop on top of. Without PR #372's SPO-G quads, MailboxSoA, and Σ-tier router, the curriculum has nothing to wire into.
Generated by Claude Code