131 changes: 131 additions & 0 deletions .claude/CODING_PRACTICES.md
@@ -258,6 +258,137 @@ scalar fallback INTERNALLY; the consumer never hand-rolls.
hot path → reject + cite this section. Exception: the ndarray
crate itself implements backends, not a violation.

### How `ndarray::simd::*` resolves to backends (polyfill chain)

The `simd.rs` module in ndarray is the **single public surface**; it
re-exports concrete types from backend files based on `cfg` target
features. Consumers never reach around it. The chain:

```
┌─────────────────────────────────────────────────────────────────┐
│ ndarray::simd  (src/simd.rs)   ← the ONLY consumer surface      │
│                                                                 │
│ Re-exports F32x16 / U8x64 / F16x32 / F64x8 / BF16x32 etc. from  │
│ the right backend, chosen by cfg(target_feature):               │
│                                                                 │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐             │
│  │ simd_amx.rs  │ │simd_avx512.rs│ │ simd_avx2.rs │             │
│  │ Intel AMX    │ │ AVX-512 base │ │ AVX-2 fallbk │             │
│  │ tiles +      │ │ F32x16 /     │ │ F32x8 /      │             │
│  │ VNNI +       │ │ U8x64 / ...  │ │ F64x4        │             │
│  │ TDPBUSD /    │ │ (mandatory   │ │ (cfg-gated   │             │
│  │ TDPBF16PS    │ │ floor at     │ │ when build   │             │
│  │ via inline   │ │ target-cpu=  │ │ drops to     │             │
│  │ asm (stable) │ │ x86-64-v4)   │ │ x86-64-v3)   │             │
│  └──────────────┘ └──────────────┘ └──────────────┘             │
│         │                │                │                     │
│         ├─ runtime-opt ──┤                │                     │
│         │ (amx_available)│                                      │
│                          │  compile-time  │                     │
│                          │   cfg(avx2)    │                     │
│                                                                 │
│  ┌──────────────┐ ┌──────────────┐ ┌──────────────┐             │
│  │ simd_neon.rs │ │ simd_wasm.rs │ │ (scalar)     │             │
│  │ aarch64      │ │ wasm32-simd  │ │ last resort  │             │
│  │              │ │              │ │ INTERNAL to  │             │
│  │              │ │              │ │ each backend │             │
│  └──────────────┘ └──────────────┘ └──────────────┘             │
│                                                                 │
│ hpc/simd_caps.rs  — runtime capability struct                   │
│ hpc/amx_matmul.rs — Intel AMX tile primitives (tile_dpbusd /    │
│                     tile_dpbf16ps etc.) surfaced for callers    │
│                     that want explicit matmul routing           │
└─────────────────────────────────────────────────────────────────┘
```

**Mandatory consumer rule:** only ever write `use ndarray::simd::…`.
The backend files are a private implementation detail: they can be
reshuffled at any time (a new `simd_avx512fp16.rs` shipped in a point
release, backends split per architecture, etc.) without breaking
consumers.
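
The re-export chain can be sketched in a single file. This is a minimal
illustration of the `cfg(target_feature)` mechanism only, not ndarray's
actual source: the `backend` modules, the `NAME` / `F32_LANES` constants,
and the `sum` helper are hypothetical stand-ins for the real backend files
and SIMD types.

```rust
// Hypothetical backend modules; exactly one is compiled in per target.
#[cfg(all(target_arch = "x86_64", target_feature = "avx512f"))]
mod backend {
    pub const NAME: &str = "avx512";
    pub const F32_LANES: usize = 16;
}

#[cfg(all(
    target_arch = "x86_64",
    target_feature = "avx2",
    not(target_feature = "avx512f")
))]
mod backend {
    pub const NAME: &str = "avx2";
    pub const F32_LANES: usize = 8;
}

#[cfg(not(all(
    target_arch = "x86_64",
    any(target_feature = "avx2", target_feature = "avx512f")
)))]
mod backend {
    pub const NAME: &str = "scalar";
    pub const F32_LANES: usize = 1;
}

/// The single public surface: consumers import from here and nowhere else.
pub mod simd {
    pub use super::backend::{F32_LANES, NAME};

    /// Sums a slice one lane-width chunk at a time (plain Rust, no intrinsics).
    pub fn sum(xs: &[f32]) -> f32 {
        xs.chunks(F32_LANES).map(|c| c.iter().sum::<f32>()).sum()
    }
}

fn main() {
    println!("backend = {}, f32 lanes = {}", simd::NAME, simd::F32_LANES);
    println!("sum = {}", simd::sum(&[1.0, 2.0, 3.0, 4.0]));
}
```

Because callers only name `simd::…`, renaming or splitting the `backend`
modules in this sketch would not touch any call site.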

**Explicit AMX routing** (when the caller wants to force the tile
path rather than let `simd::*` infer it): the AMX sibling modules
(`ndarray::simd_amx::*` and `ndarray::hpc::amx_matmul::*`) are
**first-class canonical surfaces**, not backend reach. They're
named at the top level because Intel AMX needs explicit OS
enablement (XCR0 tile-state bits, plus an
`arch_prctl(ARCH_REQ_XCOMP_PERM, …)` permission request on Linux)
and runtime `amx_available()` gating, both of which are orthogonal
to compile-time cfg.
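
Runtime gating of this kind usually reduces to a cached capability probe
plus an explicit fallback route. The sketch below is illustrative only:
`probe_amx`, `Route`, and `dot_i8` are hypothetical names, the
`amx_available` here is a local stand-in rather than ndarray's function,
and the probe is a stub that always returns `false` instead of a real
CPUID / OS-permission check.

```rust
use std::sync::OnceLock;

/// Stub probe (assumption): a real one would query CPUID and request
/// tile-state permission from the OS. Here it always reports "unavailable".
fn probe_amx() -> bool {
    false
}

/// Cached runtime gate; the probe runs once per process.
fn amx_available() -> bool {
    static AMX: OnceLock<bool> = OnceLock::new();
    *AMX.get_or_init(probe_amx)
}

#[derive(Debug, PartialEq)]
enum Route {
    Tiles,
    Portable,
}

/// i8 dot product that reports which route the runtime gate selected.
fn dot_i8(a: &[i8], b: &[i8]) -> (Route, i32) {
    // Portable math; a tile kernel would replace this on the Tiles route.
    let dot: i32 = a
        .iter()
        .zip(b)
        .map(|(&x, &y)| i32::from(x) * i32::from(y))
        .sum();
    let route = if amx_available() {
        Route::Tiles
    } else {
        Route::Portable
    };
    (route, dot)
}

fn main() {
    let (route, dot) = dot_i8(&[1, 2, 3], &[4, 5, 6]);
    println!("{route:?} -> {dot}");
}
```

The gate is evaluated at run time, so the same binary can take either
route depending on the machine it lands on, which is why it cannot be
folded into a compile-time `cfg`.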

---

## MANDATORY `cargo clippy` + feature-matrix discipline

Every PR that touches `crates/*/src/` runs this full matrix before
being declared complete. `--features serve` alone is NOT enough
(learned the hard way at PR #238 when `--features grpc` and
`--features lab` silently broke after months of feature-drift).

```bash
# All four compile-and-warning-clean before commit:
cargo check # default
cargo check --manifest-path crates/<name>/Cargo.toml --features serve
cargo check --manifest-path crates/<name>/Cargo.toml --features grpc
cargo check --manifest-path crates/<name>/Cargo.toml --features lab

# Clippy WITH -D warnings (not just --no-deps); catches redundant
# closures, needless collects, manual Default impls, hidden type
# complexity, etc.:
cargo clippy --manifest-path crates/<name>/Cargo.toml --features lab -- -D warnings
cargo clippy --manifest-path crates/<name>/Cargo.toml --features serve -- -D warnings

# Full test under the widest feature set:
cargo test --manifest-path crates/<name>/Cargo.toml --features lab --lib

# Doc-tests (separate target; --lib skips them):
cargo test --manifest-path crates/<name>/Cargo.toml --features lab --doc
```

**Why `--lib` is not enough.** `cargo test` without `--lib` also runs
integration tests in `tests/` and the doc-tests embedded in `///`
comments. A doc example that reads fine as prose but no longer
compiles or passes as code is a latent failure; doc-tests catch it.
The `--doc` run is cheap (seconds) and mandatory.

**Why `--features lab` is not enough.** The `lab` umbrella enables
every feature at once, so it only proves that the union compiles.
`cargo check --features grpc` ALONE still needs to work: downstream
consumers that only want gRPC (not REST) compile grpc-only. If
`wire.rs` is `serve`-gated but `grpc.rs` references it, the grpc-only
build breaks, and a `lab`-only check never notices.

**Fix pattern** (applied in PR #238 as the `_lab-dtos` internal
feature): when two features share deps (serde / serde_json / base64 /
bytemuck, used by both `serve` and `grpc`), factor them into an
internal feature:

```toml
[features]
_lab-dtos = ["dep:serde", "dep:serde_json", "dep:base64", "dep:bytemuck"]
serve = ["_lab-dtos", "dep:axum", "dep:tokio"]
grpc = ["_lab-dtos", "dep:prost", "dep:tonic", "dep:tonic-build", "dep:tokio"]
lab = ["serve", "grpc", "with-engine", "with-planner"]
```

And widen `pub mod wire` from `#[cfg(feature = "serve")]` to
`#[cfg(any(feature = "serve", feature = "grpc"))]` so both transports
see the shared DTOs.
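
A minimal sketch of the widened gate, assuming a crate that declares
`serve` and `grpc` features. The `Ping` DTO, the `rest` / `grpc` modules,
and the `transports` helper are hypothetical and stand in for the real
`wire.rs` types and handlers.

```rust
// Shared DTOs: compiled when EITHER transport is enabled.
#[cfg(any(feature = "serve", feature = "grpc"))]
pub mod wire {
    #[derive(Debug, Clone, PartialEq)]
    pub struct Ping {
        pub label: String,
    }
}

// REST transport: only needs `serve`.
#[cfg(feature = "serve")]
pub mod rest {
    use super::wire::Ping;
    pub fn handle(p: Ping) -> String {
        format!("rest:{}", p.label)
    }
}

// gRPC transport: only needs `grpc`, yet still sees `wire`.
#[cfg(feature = "grpc")]
pub mod grpc {
    use super::wire::Ping;
    pub fn handle(p: Ping) -> String {
        format!("grpc:{}", p.label)
    }
}

/// Lists the transports compiled into this build.
pub fn transports() -> Vec<&'static str> {
    let mut out = Vec::new();
    if cfg!(feature = "serve") {
        out.push("serve");
    }
    if cfg!(feature = "grpc") {
        out.push("grpc");
    }
    out
}

fn main() {
    println!("transports: {:?}", transports());
}
```

If `wire` were gated on `serve` alone, the `grpc` module's
`use super::wire::Ping` would fail to resolve in a grpc-only build, which
is the breakage the per-feature `cargo check` runs are meant to surface.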

**Reviewer trigger:** a PR whose description cites only
`--features serve` test results → request re-run across the full
matrix before approval. The matrix is a first-class part of the
contract, not an afterthought.

**Rust 1.95 transition note:** `mut ref` / `mut ref mut` bindings in
struct pattern field shorthand are now feature-gated (they were
accidentally stable through 1.94); ordinary `ref mut` bindings are
long-stable and unaffected. When the toolchain pin bumps, grep both
`src/` trees:

```bash
grep -rn "mut ref\b" crates/*/src/
```

Zero hits today across `lance-graph/crates/` + `ndarray/src/`.
Stay that way.

---

## The 3-Way BindSpace Mutation Scheme
10 changes: 10 additions & 0 deletions crates/cognitive-shader-driver/src/wire.rs
@@ -145,6 +145,7 @@ pub struct WireTensorsResponse {
/// object after ingress — per Rule F, there is no second deserialise anywhere
/// in the pipeline after the handler consumes the request.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[non_exhaustive]
pub struct WireCalibrateRequest {
    pub model_path: String,
    pub tensor_name: String,
@@ -183,6 +184,7 @@ fn default_cal_iters() -> usize { 20 }
fn default_icc_samples() -> usize { 512 }

#[derive(Debug, Clone, Serialize, Deserialize)]
#[non_exhaustive]
pub struct WireCalibrateResponse {
    pub tensor_name: String,
    pub dims: Vec<u64>,
@@ -246,6 +248,7 @@ pub struct WireResidualSpec {
}

#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[non_exhaustive]
pub struct WireCodecParams {
    pub subspaces: u32,
    pub centroids: u32,
@@ -348,6 +351,7 @@ impl TryFrom<WireCodecParams> for CodecParams {
// ═══════════════════════════════════════════════════════════════════════════

#[derive(Debug, Clone, Serialize, Deserialize)]
#[non_exhaustive]
pub struct WireTensorView {
    /// [rows, cols] in elements (not bytes). Actual byte size inferred from lane_width.
    pub shape: [u32; 2],
@@ -955,6 +959,7 @@ pub enum WireBaseline {
/// `top1_rate = 0.0` and `candidate_latency_us = 0`. D2.1–D2.3 land the
/// real decode-and-compare loop.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[non_exhaustive]
pub struct WireTokenAgreement {
    /// Model root directory (safetensors + config.json). Passed to
    /// `auto_detect::detect` to infer lane width + architecture defaults
@@ -974,6 +979,7 @@ pub struct WireTokenAgreement {

/// `POST /v1/shader/token-agreement` response.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[non_exhaustive]
pub struct WireTokenAgreementResult {
    /// Top-1 token-match rate across the full prompt set. Pass gate: ≥ 0.99.
    pub top1_rate: f32,
@@ -1049,6 +1055,7 @@ pub enum WireMeasure {
/// × |distances| × |lane_widths|. Clients SHOULD keep the product ≤ a few
/// hundred to fit in one JIT kernel cache warm-up round.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[non_exhaustive]
pub struct WireSweepGrid {
    #[serde(default = "default_subspaces_axis")]
    pub subspaces: Vec<u32>,
@@ -1131,6 +1138,7 @@ impl WireSweepGrid {
/// `POST /v1/shader/sweep` request. Client submits one grid + a measure
/// set; server enumerates + calibrates + token-agreements each grid point.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[non_exhaustive]
pub struct WireSweepRequest {
    pub tensor_path: String,
    pub grid: WireSweepGrid,
@@ -1156,6 +1164,7 @@ fn default_measure_set() -> Vec<WireMeasure> {
/// One grid-point result, streamed by the sweep handler. Carries the
/// candidate that produced it + optional per-measure payloads.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[non_exhaustive]
pub struct WireSweepResult {
    /// Zero-based grid index (0 .. grid.cardinality()).
    pub grid_index: u32,
@@ -1179,6 +1188,7 @@ pub struct WireSweepResult {
/// `POST /v1/shader/sweep` response for batch (non-streaming) clients.
/// Streaming clients receive one `WireSweepResult` per SSE event instead.
#[derive(Debug, Clone, Serialize, Deserialize)]
#[non_exhaustive]
pub struct WireSweepResponse {
    pub label: String,
    pub cardinality: u32,
31 changes: 29 additions & 2 deletions scripts/codec_sweep.sh
@@ -56,6 +56,33 @@ echo "$response" | jq '.'

echo
echo "=== Stub honesty check ==="
stub_flag=$(echo "$response" | jq '.results[0].stub // "no results"')
# Per EPIPHANIES.md 2026-04-20 "D0.2 stub flag is anti-#219 defense at
# the type level" — the check MUST fail the script (not just log) when
# the flag is absent or false. Until D2.2 lands real decode-and-compare,
# Phase 0/2 runs return stub:true. A non-stub response here means
# either the wrong endpoint was hit, the response was malformed, or
# (worst case) the server silently shipped non-stub code and this
# script is now pretending synthetic numbers are real.

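# jq's `//` treats false like null, so `.stub // "missing"` would turn a
# false flag into "missing"; test for null explicitly instead.
stub_flag=$(echo "$response" | jq -r '.results[0].stub | if . == null then "missing" else tostring end')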
echo "results[0].stub = $stub_flag"
echo "Expected: true (Phase 0 stub; D2.2 flips to false when real decode lands)."

case "$stub_flag" in
  true)
    echo "OK — Phase 0 stub honored. (D2.2 will flip this to false when real decode lands;"
    echo " at that point, flip this check too.)"
    ;;
  false)
    echo "FAIL — results[0].stub is false but D2.2 has not landed." >&2
    echo " This script refuses to treat non-stub output as real during Phase 0." >&2
    echo " Either the server is running non-scaffold code (update this check)," >&2
    echo " or the request hit the wrong endpoint / unexpected handler." >&2
    exit 3
    ;;
  *)
    echo "FAIL — results[0].stub missing or unparseable (got: $stub_flag)." >&2
    echo " Response may be malformed or an error payload." >&2
    echo " Inspect the --- response --- section above." >&2
    exit 3
    ;;
esac