feat: MoE architecture — 4096 experts, top-128 sparse, 4-group hierarchy
4096 experts → router selects top-128 → each runs 16 internal layers
→ 4 hierarchical groups where expert outputs compose → collapse → token
Architecture maps Qwopus's actual MoE structure:
  Router: 4096×4096 input distance table (which experts respond?)
  Expert internals: 256×256 per-layer tables (how does each expert think?)
  Sparse activation: only top-128 of 4096 fire (97% sparsity)
  Hierarchical meeting: experts compose at 4 intermediate points
Performance: 42s for 3 prompts in release (128×16 MatVec per token)
→ production needs SIMD batching or GPU
Output quality still blocked by attractor collapse ("!" dominates).
Needs: thinking style temperature + persona routing + ONNX gate correction.
https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp #110
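The routing pipeline described above (distance-table lookup, top-128 selection, composition in 4 groups) can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `route_top_k`, `compose_in_groups`, the random `distance_row`, the "smallest distance wins" rule, and the mean-based composition are all assumptions made for the example. Only the constants come from the commit message.

```python
import random

NUM_EXPERTS = 4096        # from the commit message
TOP_K = 128               # from the commit message
LAYERS_PER_EXPERT = 16    # from the commit message
NUM_GROUPS = 4            # from the commit message


def route_top_k(distances, k):
    """Return the indices of the k experts with the smallest distance.

    Assumption: a smaller table distance means the expert responds more strongly.
    """
    return sorted(range(len(distances)), key=lambda i: distances[i])[:k]


def compose_in_groups(expert_outputs, num_groups):
    """Average outputs inside equal-sized groups, then average the group means.

    Assumption: a plain mean stands in for whatever composition the PR uses.
    """
    size = len(expert_outputs) // num_groups
    group_means = [
        sum(expert_outputs[g * size:(g + 1) * size]) / size
        for g in range(num_groups)
    ]
    return sum(group_means) / num_groups


# Hypothetical stand-in for one row of the 4096x4096 router distance table.
rng = random.Random(0)
distance_row = [rng.random() for _ in range(NUM_EXPERTS)]

active = route_top_k(distance_row, TOP_K)
sparsity = 1 - TOP_K / NUM_EXPERTS                 # fraction of experts that stay idle
matvecs_per_token = TOP_K * LAYERS_PER_EXPERT      # the "128x16 MatVec per token" cost

# Scalar stand-ins for expert outputs; real outputs would be vectors.
expert_outputs = [1.0 - distance_row[i] for i in active]
token_score = compose_in_groups(expert_outputs, NUM_GROUPS)
```

With these constants the idle fraction is 0.96875, which the commit message rounds to 97%, and each token costs 2048 matrix-vector products, which is consistent with the note that production needs batching.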
Merged
…ared
Three gate modes on Qwopus 27B (64 layers, real tokenizer):
  No Gate:     attn → up → down (no gate at all)
  Gate Filter: attn → gate×SiLU×up → down (multiplicative, current)
  Gate NARS:   gate modulates NARS truth → NARS gates attention+FFN
Gate NARS approach:
- Gate topology → per-centroid agreement score with active neighbors
- Agreement → NARS freq↑ conf↑ (trusted path, strengthen)
- Disagreement → conf↓ (uncertain, explore)
- NARS expectation modulates attention output
- NARS confidence gates FFN magnitude
- Confidence decays 1%/layer (prevents premature crystallization)
Results on "The meaning of life is":
  No Gate:     entropy=4.733, top=[85,5,53] (diverse, unfocused)
  Gate Filter: entropy=4.534, top=[97,121,81] (focused, code-heavy)
  Gate NARS:   entropy=4.639, top=[0,4,5] (intermediate, common tokens)
All three discriminate between prompts. Gate NARS produces unique routing.
https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
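One way to read the Gate NARS bullets is as a per-layer truth-value update followed by two scalar gates. The sketch below is an assumption-laden illustration: the `Truth` class, the `step` size, the 0.5 agreement midpoint, and the `update` and `gate_layer` helpers are hypothetical. The expectation formula `conf * (freq - 0.5) + 0.5` is the standard NARS one, and the 1% per-layer decay and the 64-layer depth come from the commit message.

```python
from dataclasses import dataclass


@dataclass
class Truth:
    freq: float = 0.5   # NARS frequency
    conf: float = 0.5   # NARS confidence


def clamp01(x):
    return max(0.0, min(1.0, x))


def update(truth, agreement, step=0.1, decay=0.01):
    """One layer of truth revision (hypothetical rule).

    agreement > 0.5 raises freq and conf; agreement < 0.5 lowers conf only.
    Confidence then decays by 1% per layer, as in the commit message.
    """
    delta = step * (agreement - 0.5)
    freq = clamp01(truth.freq + delta) if delta >= 0 else truth.freq
    conf = clamp01(truth.conf + delta)
    return Truth(freq, conf * (1.0 - decay))


def expectation(truth):
    """Standard NARS expectation of a truth value."""
    return truth.conf * (truth.freq - 0.5) + 0.5


def gate_layer(attn_out, ffn_out, truth):
    """Expectation scales the attention output; confidence scales the FFN output."""
    e = expectation(truth)
    return [a * e for a in attn_out], [f * truth.conf for f in ffn_out]


# Decay alone over 64 layers with neutral agreement.
truth = Truth()
for _ in range(64):
    truth = update(truth, agreement=0.5)
decayed_conf = truth.conf
```

Under neutral agreement the confidence after 64 layers is the starting value times 0.99^64, so roughly half of it survives; sustained agreement is needed to keep a path strongly trusted.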
The system DRIVES ITSELF:
  16M compositions → interference pattern → peaks emerge
  peaks = TENSION (free energy = |hidden - ghost_prediction|²)
  high tension → must keep composing → next cycle
  ghost learns (EMA) → tension drops → thought resolves → stop
Autoregressive: each collapsed token re-enters context for next step.
Recency-weighted: recent tokens contribute more to hidden state.
Free energy decay: 1.0 → 0.12 across 8-12 steps = thought completes.
Results (nonsense but mechanically correct):
  "meaning of life" → generates 9 tokens, stops on natural break
  "AI will" → 12 tokens, stops when tension resolves (fe=0.15)
  "once upon a time" → 8 tokens, tension resolves quickly (fe=0.12)
Output quality blocked by:
- 256 routing centroids too coarse (need 4096 in loop)
- First-token-in-cluster selection (need frequency-weighted)
- No temperature sampling (argmax locks onto attractors)
The metabolic loop IS the driving force. Resolution IS the answer.
https://claude.ai/code/session_01ChLvBfpJS8dQhHxRD4pYNp
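The stopping rule described above (free energy as squared error against an EMA "ghost" prediction, halting once tension resolves) can be sketched as a small loop. This is a simplified illustration under stated assumptions: the hidden state is held fixed rather than updated by re-entering tokens, and `alpha`, `threshold`, `max_steps`, and the function names are hypothetical choices, not values from the PR.

```python
def free_energy(hidden, ghost):
    """Tension: squared distance between the hidden state and the ghost prediction."""
    return sum((h - g) ** 2 for h, g in zip(hidden, ghost))


def run_until_resolved(hidden, ghost, alpha=0.1, threshold=0.15, max_steps=32):
    """Keep cycling while tension is high; the ghost learns by EMA each cycle.

    Returns the free-energy trace, one entry per cycle.
    Assumption: `hidden` stays fixed; in the described system each collapsed
    token would re-enter the context and shift it.
    """
    trace = []
    for _ in range(max_steps):
        fe = free_energy(hidden, ghost)
        trace.append(fe)
        if fe <= threshold:
            break  # tension resolved: the thought stops
        # EMA update moves the ghost a fraction alpha toward the hidden state.
        ghost = [g + alpha * (h - g) for h, g in zip(hidden, ghost)]
    return trace


trace = run_until_resolved(hidden=[1.0, 0.0], ghost=[0.0, 0.0])
```

With a fixed hidden state each EMA step shrinks the error by a factor of `1 - alpha`, so the free energy falls geometrically and the loop is guaranteed to stop; in the real loop the moving hidden state is what makes the step count vary per prompt.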
AdaWorldAPI pushed a commit that referenced this pull request on Apr 19, 2026
Append FINDING to EPIPHANIES.md.
PR #213 + ndarray PR #110 demonstrated the dumb-bookkeeper pattern:
~90 seconds, Haiku, enumerate+match+append. Result is a grep-addressable
index of every shipped artifact keyed by the prompt-file brief that
birthed it. For every future "what did we ship about X" query the ledger
replaces a full-codebase grep with a single line: ~25 tokens vs ~25M
tokens. Six orders of magnitude cheaper (25M / 25 = 10⁶).
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
AdaWorldAPI pushed a commit that referenced this pull request on Apr 19, 2026
Concrete three-pass recipe added to the cca2a skill:
Pass 1 — Haiku bookkeeper (~90 s, mechanical): enumerate prompt files,
match against git log, append one ledger line per pair.
Pass 2 — Opus meta-synthesizer (read-only inputs): annotate the "none"
rows from Pass 1 with superseded/open/stale classification.
Pass 3 — Main thread consumer (sub-second per query): grep the ledger
for every "what's open / shipped / about X" question.
Closes three token-waste channels simultaneously:
- cold-start (20-30 turns → 3-5 turns)
- find-code (~25M tokens → ~25 tokens, 10⁶×)
- ambient arc knowledge (30-50% → 0%)
First deployments:
- PR #213 (lance-graph, 41 prompts mapped, 90 s)
- PR #110 (ndarray, 25 prompts mapped, 90 s)
Linked from SKILL.md under "What to read when".
https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
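Pass 1 and Pass 3 of the recipe above amount to building a one-line-per-brief ledger and then searching it. The sketch below illustrates that shape with in-memory data only. The file names, commit subjects, the "file stem appears in commit subject" matching rule, and the `brief | commit` line format are all hypothetical; the actual skill may match and format differently.

```python
from pathlib import PurePosixPath


def build_ledger(prompt_files, commit_subjects):
    """Pass 1 (bookkeeper): one ledger line per prompt file.

    Hypothetical matching rule: a commit matches when the prompt file's stem
    appears in its subject line. Unmatched briefs are recorded as "none".
    """
    lines = []
    for path in prompt_files:
        key = PurePosixPath(path).stem
        match = next((s for s in commit_subjects if key in s), "none")
        lines.append(f"{path} | {match}")
    return lines


def grep(ledger, term):
    """Pass 3 (consumer): plain substring search over ledger lines."""
    return [line for line in ledger if term in line]


# Hypothetical inputs standing in for a prompt directory and `git log` subjects.
prompt_files = [
    "prompts/moe-router.md",
    "prompts/gate-nars.md",
    "prompts/onnx-gate.md",
]
commit_subjects = [
    "feat: moe-router top-128 sparse activation",
    "feat: gate-nars truth modulation",
]

ledger = build_ledger(prompt_files, commit_subjects)
open_rows = grep(ledger, "| none")      # rows Pass 2 would classify
shipped_moe = grep(ledger, "moe-router")
```

The "none" rows are the ones Pass 2 would annotate as superseded, open, or stale; everything else answers a "what did we ship about X" query with a single matching line.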