Skip to content

RISC Thought Engine: softmax fix + highheelbgz encoding proof#142

Merged
AdaWorldAPI merged 2 commits into
mainfrom
claude/risc-thought-engine-TCZw7
Apr 6, 2026
Merged

RISC Thought Engine: softmax fix + highheelbgz encoding proof#142
AdaWorldAPI merged 2 commits into
mainfrom
claude/risc-thought-engine-TCZw7

Conversation

@AdaWorldAPI
Copy link
Copy Markdown
Owner

Summary

Solves attractor collapse and proves highheelbgz encoding is lossless.

Attractor Collapse Fix

ReLU → Softmax(T=0.1). One line change. Measured on 3 models × 20 queries:

Method Qwen3-VL Reranker Jina-v5
Positive-only (old) 42% 26%
Signed + ReLU 43% 32%
Signed + softmax T=0.1 70% 77% 43%

HighHeelBGZ Encoding Proof

Encoding Qwen3-VL Reranker Jina-v5 Size
RAW f32 (truth) 70% 77% 43% 256 KB
BF16 highheelbgz 71% 78% 43% 128 KB
i8 direct 70% 77% 43% 64 KB
u8 CDF 49% 50% 49% 64 KB
γ+φ 49% 50% 49% 64 KB

BF16 and i8 = lossless. u8 CDF confirmed broken. γ+φ confirmed no-op.

New Files

  • f32_engine.rs — f32 ThinkingEngine with softmax normalization (7 tests)
  • contrastive_learner.rs — online table learning from forward passes (8 tests)
  • Dockerfile — build with codebooks from GitHub Release
  • QUICKSTART.md — LM Studio-style usage guide
  • benchmark_thinking.rs — honest thinking vs plain cosine benchmark

Dead Code Audit

15 warnings → 0. All unused imports/variables fixed.
signed_engine: from_unsigned() deprecated, from_f32_cosines() is correct path.

Stats

  • 310 lib tests pass
  • 3 models encoded (Release v0.2.0)
  • Reranker correction tables (ONNX training targets)
  • Cronbach alpha quorum measured
  • 8-stage development roadmap documented

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A

claude added 2 commits April 6, 2026 18:42
ReLU destroys inhibition information → power iteration to dominant eigenvector.
Softmax(1/T) preserves relative ordering → energy concentrates on CORRECT matches.

Benchmark results (20 queries, 10 cycles, f32 tables):

  Method                    Qwen3-VL    Reranker
  ──────                    ────────    ────────
  Positive-only (old)       42% top-5   26% top-5   entropy +6%/+19%
  Signed + ReLU (broken)    43%         32%         entropy +1%/+2%
  Signed + softmax T=0.1    70% top-5   77% top-5   entropy -21%/-31%  ← WINNER
  Sparse top-8              40%         39%         entropy -30%/+4%
  Residual α=0.3            47%         74%         entropy +5%/+17%
  SiLU dampening            43%         36%         entropy +1%/-12%
  Gestalt awareness         42%         26%         zero improvement
  Orchestrated combo        42%         46%         worse than individual

Softmax T=0.1 wins on BOTH models:
  - Top-5 agreement with plain cosine: 70-77% (was 26-42%)
  - Entropy DECREASES (focuses energy, doesn't diffuse)
  - 100% peak diversity (no attractor collapse)
  - Simple: one line change (ReLU → softmax)

7 unit tests pass including anti-collapse test.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
ENCODING PROOF (softmax T=0.1, 10 cycles, 20 queries × 3 models):

  Source              Qwen3-VL  Reranker  Jina-v5
  ────────            ────────  ────────  ───────
  RAW F32 (truth)       70%       77%       43%
  BF16 highheelbgz      71%       78%       43%   ← LOSSLESS (128 KB)
  i8 direct             70%       77%       43%   ← LOSSLESS (64 KB)
  u8 CDF                49%       50%       49%   ← BROKEN
  γ+φ                   49%       50%       49%   ← IDENTICAL to CDF (no-op)

BF16 and i8 match or beat f32 ground truth on all 3 models.
u8 CDF confirmed broken (destroys cosine geometry).
γ+φ confirmed no-op (identical ranking to CDF).

Also: Dockerfile, QUICKSTART.md (LM Studio-style usage),
fixed dual_signed_experiment.rs broken API call.

https://claude.ai/code/session_019RzHP8tpJu55ESTxhfUy1A
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f890ad0a05

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@@ -0,0 +1,36 @@
FROM rust:1.82-bookworm AS builder
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Use a Rust toolchain that satisfies crate rust-version

crates/thinking-engine/Cargo.toml declares rust-version = "1.94", but this Dockerfile pins the builder image to Rust 1.82; Cargo will refuse to compile when the compiler is below the package’s rust-version. As written, the image build is blocked before the thinking-engine binary is produced, so the base image/toolchain needs to be upgraded to match the crate requirement.

Useful? React with 👍 / 👎.

rm /tmp/*.tar.gz

# Build thinking engine
RUN cargo build --release --manifest-path crates/thinking-engine/Cargo.toml
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Build the playground example before copying it

The runtime stage copies target/release/examples/playground, but the build step only runs plain cargo build --release --manifest-path ...; per cargo build --help, examples are only built when --example <name> or --examples is specified. This means the expected artifact is not created, causing the later COPY --from=builder .../examples/playground to fail.

Useful? React with 👍 / 👎.

Comment on lines +50 to +51
cd crates/thinking-engine
docker build -t thinking-engine .
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Correct Docker quickstart build context

The quickstart tells users to run docker build from crates/thinking-engine, but this Dockerfile references workspace-root paths like crates/thinking-engine/... and crates/bgz-tensor/... in COPY directives. With the documented context, those paths do not exist, so the first copy step fails and users cannot reproduce the Docker workflow.

Useful? React with 👍 / 👎.

@AdaWorldAPI AdaWorldAPI merged commit 718b4ca into main Apr 6, 2026
AdaWorldAPI pushed a commit that referenced this pull request Apr 17, 2026
…eam #150

## Stale artifact removal (182 files, 3 MB)

`AdaWorldAPI-lance-graph-d9df43b/` was a committed snapshot of an older
upstream version (48 .rs files vs our 98). Full audit confirmed:
  - ZERO files exist only in the artifact (every file has a counterpart)
  - Every differing file: ours >= artifact in LOC (ours is strictly ahead)
  - All upstream features (#125 parameter_substitution, #140 lance_vector_search)
    are already in our src tree

The directory created GitHub path confusion — duplicate navigation paths
for datafusion_planner, spo, blasgraph, neighborhood, arigraph. Removing
it eliminates that confusion with zero content loss.

## Cherry-pick: spark_dialect.rs from upstream PR #150

The ONE file upstream has that we didn't:
  - `crates/lance-graph/src/spark_dialect.rs` (107 LOC)
    Spark SQL dialect for DataFusion unparser: backtick quoting, STRING
    type casting, EXTRACT for dates, BIGINT/INT types, LENGTH(), derived
    table aliases.
  - `crates/lance-graph/tests/test_to_spark_sql.rs` (293 LOC)
    Full test suite for Spark SQL output.
  - `pub mod spark_dialect;` added to lib.rs

Adapted from upstream's DF 50.3 to our DF 51 — same API surface, no
changes needed.

## Upstream audit result (for the record)

Upstream (lance-format/lance-graph) is at v0.5.4. Our fork is at v0.5.3
with newer deps (arrow 57 vs 56.2, datafusion 51 vs 50.3). Other than
spark_dialect, every upstream feature and fix is already present in our
source tree — parameter_substitution (#125), lance_vector_search (#140),
complex RETURN clauses (#142), duplicate columns fix (#128) are all in
`crates/lance-graph/src/`.

Their deleted `simple_executor` was a prototype cold-path executor we
never had. Our `ExecutionStrategy::DataFusion` path (6K LOC planner)
+ `ExecutionStrategy::BlasGraph` (semiring algebra) subsume it. The
user has flagged adding a deliberate `ExecutionStrategy::Simple` cold
path as a 4th strategy for trivial queries — that's a separate PR per
the documented matrix of execution strategies.

https://claude.ai/code/session_01NYGrxVopyszZYgLBxe4hgj
AdaWorldAPI pushed a commit that referenced this pull request May 13, 2026
…1/P2 fixes)

PREPEND #364 entry to PR_ARC_INVENTORY.md (Added / Locked / Deferred /
Docs / Confidence / Correction lines per the file's append-only contract).
Update LATEST_STATE.md header and PR table row with #364 summary plus
ndarray #142 adjacent-landing note.

Per CLAUDE.md Mandatory Board-Hygiene Rule: a merged PR requires
LATEST_STATE row + PR_ARC PREPEND in the same commit. This entry closes
that gap (commit lands after merge because #364 was merged before this
session reconstructed state; not a same-PR commit but the durable
record is what the rule protects).

Notes on locked decisions:
* OwlIdentity canonical wire form is now 3 bytes [family, slot_lo,
  slot_hi]; cross-language emitters use OwlIdentity::to_canonical_bytes.
* UnifiedAuditEvent::canonical_bytes is 26 bytes (owl at [13..16)).
* OgitFamilyTable is sparse HashMap<u16, FamilyEntry>; 256-slot framing
  retired.
* Audit super_domain comes from AuditChain.super_domain(), not the
  static FAMILY_TO_SUPER_DOMAIN.
* Sprint-5+ worker prompts: 12-step .claude/plans/ read-order is a hard
  precondition.

Deferred (documented in #364 entry, not closed by this commit):
* PR-B medcare-rs UnifiedBridge: commits on remote integration branch,
  no PR opened.
* PR-C smb-office-rs UnifiedBridge: same shape.
* Per-namespace u8 slot in RegistryState::append: declined this session
  (widening to u16 in 3208743 is the chosen fix; per-namespace would
  cascade into BindSpace + enumerate_first_with_entity_type_id rewrite,
  see TECH_DEBT).
AdaWorldAPI pushed a commit that referenced this pull request May 13, 2026
#364

Sprint-5 cross-repo coordinated landing complete: MedCare-rs#112 (PR-B,
UnifiedBridge<MedcareBridge> + medcare-rbac/realtime substrate) and
smb-office-rs#31 (PR-C, UnifiedBridge<OgitBridge> wiring) both merged
2026-05-13 same day as lance-graph #364 + ndarray #142.

Updates:
* PR_ARC #364 Confidence line: append "Adjacent landings (2026-05-13)"
  note listing both downstream merges. Confidence line is the only
  mutable field per APPEND-ONLY RULE point 3.
* LATEST_STATE header: rewrite to capture the full four-PR landing
  (lance-graph + MedCare + smb-office + ndarray) so cold-start sessions
  see the coordinated landing at a glance.

D-SDR-5's UnifiedBridge surface is now consumed end-to-end.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants