
Unified API: codec research + runbook + planner-via-DTO + OrchestrationBridge #221

Merged
AdaWorldAPI merged 5 commits into main from claude/cam-pq-orchestration-unified
Apr 20, 2026

@AdaWorldAPI
Owner

Summary

Follow-up to merged PR #220. Lands the unified-API wiring the review kept correcting toward:

  1. Codec research on the canonical endpoint: /v1/shader/{tensors,calibrate,probe} over REST + gRPC, same Wire DTOs, no parallel server on the planner.
  2. WireRunbook — scheduled test injection: POST one DTO, server executes a labeled sequence of steps (Tensors/Calibrate/Probe/Dispatch/Ingest/Plan).
  3. Planner-via-DTO: /v1/shader/plan behind --features with-planner, plus a Plan step in the runbook.
  4. impl OrchestrationBridge for PlannerAwareness — the canonical dedup trait finally has an implementation; consumers can route lg.* step-types without coupling to planner internals.
  5. cam-pq-unified-pipeline.md — knowledge doc mapping the existing lance-graph/src/cam_pq/* surface (storage/udf/ivf/jitson_kernel) and 5 integration gaps found during this PR.

Architectural corrections captured

  • cognitive-shader-driver IS the unified REST + gRPC API — not lance-graph-planner.
  • cam_pq/storage.rs is already the D3 Lance schema; cam_pq_calibrate (from PR #220, "D1+D2+D5: CAM-PQ calibration pipeline, honest negative result") writes a parallel raw format. Migration is queued.
  • register_cam_udfs exists but isn't called from datafusion_planner — 1-line fix queued.
  • OrchestrationBridge was documented in the contract since before this session but had no impl — now filled for StepDomain::LanceGraph.
  • planner_bridge.rs (introduced in this PR) duplicates OrchestrationBridge. Retirement queued.

New endpoints (cognitive-shader-driver)

| Route | Feature | Purpose |
|---|---|---|
| POST /v1/shader/tensors | serve | List tensors with CamPq/Passthrough/Skip routes |
| POST /v1/shader/calibrate | serve | Train CAM-PQ, measure ICC + reconstruction |
| POST /v1/shader/probe | serve | ICC vs row-count degradation curve |
| POST /v1/shader/runbook | serve | Scheduled sequence of labeled steps |
| POST /v1/shader/plan | serve + with-planner | Plan via PlannerAwareness |
| rpc Tensors/Calibrate/Probe | grpc | gRPC mirror of REST research ops |

DTO parameters (num_subspaces, num_centroids, kmeans_iterations, max_rows, icc_samples) drive codec research without recompilation.
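
As a rough illustration of what "parameters in the DTO" buys, the sketch below mirrors the five parameter names in a plain struct and checks them against a tensor's column count before any training would start. Only the field names come from this PR; the struct name `CalibrateParams`, the field types, and the even-split check (standard for product quantization, assumed here for CAM-PQ) are hypothetical.

```rust
/// Hypothetical mirror of the calibration parameters named in the PR text.
/// The field names are from the PR; the types and checks are assumptions.
#[derive(Debug, Clone, PartialEq)]
pub struct CalibrateParams {
    pub num_subspaces: usize,
    pub num_centroids: usize,
    pub kmeans_iterations: usize,
    pub max_rows: usize,
    pub icc_samples: usize,
}

impl CalibrateParams {
    /// Validates the parameters against a tensor's column count and returns
    /// the width of each subvector (columns divided by subspaces).
    pub fn subvector_dim(&self, tensor_cols: usize) -> Result<usize, String> {
        if self.num_subspaces == 0 || self.num_centroids == 0 {
            return Err("num_subspaces and num_centroids must be non-zero".to_string());
        }
        if tensor_cols % self.num_subspaces != 0 {
            return Err(format!(
                "{} columns do not split evenly into {} subspaces",
                tensor_cols, self.num_subspaces
            ));
        }
        Ok(tensor_cols / self.num_subspaces)
    }
}
```

Because the values arrive per request, a different `num_subspaces` or `num_centroids` is just another POST against the same running server.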

New trait impl (lance-graph-planner)

impl OrchestrationBridge for PlannerAwareness {
    // Signatures only: method bodies and the parameter types of
    // resolve_thinking are omitted in this summary.
    fn route(&self, step: &mut UnifiedStep) -> Result<(), OrchestrationError>;
    fn resolve_thinking(&self, style, inference_type) -> ThinkingContext;
    fn domain_available(&self, domain: StepDomain) -> bool;
}

Step-type routing: lg.plan_auto / lg.orchestrate / lg.health. Other domains return DomainUnavailable — consumers combine bridges via Vec<Box<dyn OrchestrationBridge>>.
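
A minimal sketch of the "combine bridges" pattern, assuming heavily simplified stand-ins for the contract types. The real `UnifiedStep`, `StepDomain`, and `OrchestrationError` live in lance-graph-contract and will differ; `PlannerBridge` and `route_via` are hypothetical names used only for illustration.

```rust
// Simplified stand-ins for the contract types (assumptions, not the real definitions).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum StepDomain {
    LanceGraph,
    Ndarray,
}

#[derive(Debug, PartialEq, Eq)]
pub enum OrchestrationError {
    DomainUnavailable(StepDomain),
}

pub struct UnifiedStep {
    pub step_type: String,
    pub domain: StepDomain,
    pub result: Option<String>,
}

pub trait OrchestrationBridge {
    fn route(&self, step: &mut UnifiedStep) -> Result<(), OrchestrationError>;
    fn domain_available(&self, domain: StepDomain) -> bool;
}

/// Hypothetical stand-in for PlannerAwareness: serves only the LanceGraph domain.
pub struct PlannerBridge;

impl OrchestrationBridge for PlannerBridge {
    fn route(&self, step: &mut UnifiedStep) -> Result<(), OrchestrationError> {
        if !self.domain_available(step.domain) {
            return Err(OrchestrationError::DomainUnavailable(step.domain));
        }
        step.result = Some(format!("planner handled {}", step.step_type));
        Ok(())
    }

    fn domain_available(&self, domain: StepDomain) -> bool {
        domain == StepDomain::LanceGraph
    }
}

/// Routes a step to the first bridge that reports its domain as available.
pub fn route_via(
    bridges: &[Box<dyn OrchestrationBridge>],
    step: &mut UnifiedStep,
) -> Result<(), OrchestrationError> {
    match bridges.iter().find(|b| b.domain_available(step.domain)) {
        Some(bridge) => bridge.route(step),
        None => Err(OrchestrationError::DomainUnavailable(step.domain)),
    }
}
```

The consumer holds only trait objects, so adding a bridge for another domain would not touch the routing code.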

Tests

| Suite | Count | Status |
|---|---|---|
| lance-graph-contract | 133 | ✅ all pass |
| lance-graph-planner lib (incl. 5 new orchestration_impl) | 167 | ✅ all pass |
| cognitive-shader-driver lib with serve grpc with-planner | 46 | ✅ all pass |

New wire tests: wire_plan_request_defaults, wire_plan_request_full_mode, wire_runbook_accepts_plan_step, wire_runbook_parses_mixed_steps.

Queued follow-ups (per cam-pq-unified-pipeline.md)

  1. Wire register_cam_udfs in datafusion_planner/mod.rs
  2. Migrate cam_pq_calibrate output to Lance schema via build_codebook_batch
  3. impl OrchestrationBridge for codec research (nd.* step-types)
  4. Retire planner_bridge.rs → shader-driver holds Box<dyn OrchestrationBridge>
  5. Collapse per-op Wire DTOs → UnifiedStep { step_type, args } + BridgeSlot

~1-2 days total, each step additive and independently verifiable.

Test plan

  • cargo test -p lance-graph-contract — 133/133
  • cargo test -p lance-graph-planner --lib — 167/167 (was 162, +5 new)
  • cargo test --manifest-path crates/cognitive-shader-driver/Cargo.toml --features "serve grpc with-planner" --lib — 46/46

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

claude added 5 commits April 20, 2026 10:10
cognitive-shader-driver IS the unified endpoint. Extends its canonical
Wire* DTO surface with 3 codec research operations — same endpoint,
same feature gates (`serve` / `grpc`), same DTOs for REST and gRPC.
No parallel API on the planner.

REST (behind `serve`):
  POST /v1/shader/tensors   — list tensors with route (CamPq/Passthrough/Skip)
  POST /v1/shader/calibrate — calibrate CAM-PQ, return ICC + reconstruction
  POST /v1/shader/probe     — ICC vs row-count degradation curve

gRPC (behind `grpc`):
  rpc Tensors / Calibrate / Probe — mirrors the REST with identical semantics.

DTOs drive parameters (num_subspaces, num_centroids, kmeans_iterations,
max_rows, icc_samples) so you encode / embed / measure without
recompiling. One running server drives every codec experiment.

Backing logic: crates/cognitive-shader-driver/src/codec_research.rs
reuses ndarray::hpc::{cam_pq, gguf, safetensors} + lance-graph-contract's
route_tensor. Zero reimplementation.

Follow-ups queued (separate PRs):
- WireJitCompile DTO: runtime codec-kernel swap via ndarray JitsonTemplate
  (ref: crates/lance-graph-planner/src/strategy/jit_compile.rs scaffold).
- Planner-via-DTO: expose plan_auto / plan_full / 16 strategies through
  the same shader-driver endpoint so the planner becomes remote-drivable.

42/42 shader-driver tests pass with --features "serve grpc".

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Adds POST /v1/shader/runbook: accepts a list of labeled steps, each
reusing an existing Wire* request type (Tensors / Calibrate / Probe /
Dispatch / Ingest). Server executes in order, returns matching results
with per-step label for correlation. `stop_on_error` flag controls
error semantics.

Use cases:
  - Inject a codec test suite from a script / notebook / CI in ONE POST
  - Replay a calibration protocol across many tensors
  - Seed BindSpace with ingests then dispatch queries in one round-trip

Example payload (JSON):
  {"label": "qwen3-tts full-size ICC sweep", "stop_on_error": false,
   "steps": [
     {"label": "inventory", "op": "Tensors", "args": {"model_path": "..."}},
     {"label": "calibrate", "op": "Calibrate", "args": {...}},
     {"label": "probe",     "op": "Probe",     "args": {...}}
   ]}
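
The execution semantics described above (run in order, tag each result with its step label, optionally stop at the first failure) can be sketched as follows. This is an illustrative model, not the handler in serve.rs: `Step`, `StepResult`, and `run_runbook` are hypothetical names, steps are modelled as closures rather than Wire* requests, and reading `stop_on_error` as "abort after the first failing step, returning the results so far" is an assumption.

```rust
/// Outcome of one runbook step, tagged with the caller-supplied label.
#[derive(Debug, Clone, PartialEq)]
pub struct StepResult {
    pub label: String,
    pub outcome: Result<String, String>,
}

/// Hypothetical step: a label plus the work to run for it.
pub struct Step {
    pub label: &'static str,
    pub run: Box<dyn Fn() -> Result<String, String>>,
}

/// Executes steps in order. With `stop_on_error` set, execution ends after
/// the first failing step; the failure itself is still reported.
pub fn run_runbook(steps: &[Step], stop_on_error: bool) -> Vec<StepResult> {
    let mut results = Vec::with_capacity(steps.len());
    for step in steps {
        let outcome = (step.run)();
        let failed = outcome.is_err();
        results.push(StepResult {
            label: step.label.to_string(),
            outcome,
        });
        if failed && stop_on_error {
            break;
        }
    }
    results
}
```

Under this reading, a caller correlates results with steps by label, and a result list shorter than the submitted step list signals an early stop.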

Architectural note: per THINKING_RECONCILIATION.md, the runbook's proper
home is a planner Sequence strategy (`lance-graph-planner`), not the
shader driver. This iteration executes inline on the shader-driver so
REST/gRPC test injection ships now; the planner-delegation refactor
(follow-up) swaps the handler body to call a Sequence strategy while
keeping the same DTO surface.

REST-only this iteration — gRPC Runbook RPC deferred because nested
oneofs (Dispatch | Ingest | Calibrate | ...) are awkward in proto3.
Will land with the planner Sequence-strategy wiring.

New test: wire_runbook_parses_mixed_steps validates the JSON shape for
a 3-step Tensors/Calibrate/Probe runbook. 43/43 lib tests pass with
--features "serve grpc".

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

… step

Follow-up per INTEGRATION_PLAN_CS.md §5-layer-stack Layer 4. The
unified shader-driver endpoint now delegates planning to
`lance-graph-planner::PlannerAwareness` through the same Wire* DTO
surface.

New DTOs (wire.rs):
  WirePlanRequest  — query, mode ("auto"|"full"), explicit strategies,
                     optional SituationInput for plan_full
  WirePlanResponse — strategies_used, free_will_modifier, compass_score,
                     mul_gate, thinking_style, nars_type, elapsed_ms
  WireRunbookStep::Plan(WirePlanRequest) — runbook can now mix planning
                     steps with codec research and dispatch/ingest steps
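
For orientation, the request side might look roughly like the sketch below. It is std-only and does not reproduce wire.rs: `PlanMode`, `PlanRequest`, and `Situation` are hypothetical names, the `felt_competence` field is taken from the example payload in this commit message, and the real DTO is deserialized from JSON rather than built by hand.

```rust
/// The two documented values of the `mode` field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PlanMode {
    Auto,
    Full,
}

impl std::str::FromStr for PlanMode {
    type Err = String;

    /// Parses the wire strings "auto" and "full"; anything else is rejected.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "auto" => Ok(PlanMode::Auto),
            "full" => Ok(PlanMode::Full),
            other => Err(format!("unknown plan mode: {}", other)),
        }
    }
}

/// Hypothetical stand-in for SituationInput (only one field shown).
#[derive(Debug, Clone, PartialEq)]
pub struct Situation {
    pub felt_competence: f32,
}

/// Hypothetical shape of a plan request; the field set follows the DTO description.
#[derive(Debug, Clone, PartialEq)]
pub struct PlanRequest {
    pub query: String,
    pub mode: PlanMode,
    pub strategies: Vec<String>,
    pub situation: Option<Situation>,
}
```

Modelling `mode` as an enum instead of a free string rejects unsupported modes at the boundary, before the planner is invoked.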

New route + runbook variant:
  POST /v1/shader/plan           — direct planning endpoint
  runbook step "op": "Plan"      — plan inside a sequenced runbook

Feature gate: `with-planner` (optional lance-graph-planner dep). Pattern
matches existing `with-engine` — heavy dep is opt-in; without it the
/v1/shader/plan endpoint returns 503 so URL shape stays stable.
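
The "stable URL, optional dependency" pattern can be sketched with two `cfg`-gated versions of one handler. `PlanReply` and the function body are hypothetical, and the real handler in serve.rs is an HTTP handler rather than a plain function; only the feature name and the 503 behaviour come from this commit.

```rust
/// Hypothetical reply type: HTTP status code plus a text body.
pub struct PlanReply {
    pub status: u16,
    pub body: String,
}

/// Compiled only when the `with-planner` feature is enabled.
#[cfg(feature = "with-planner")]
pub fn plan_handler(query: &str) -> PlanReply {
    // Placeholder: the real build would delegate to the planner here.
    PlanReply {
        status: 200,
        body: format!("planned: {}", query),
    }
}

/// Fallback compiled when the feature is off: the route still exists,
/// but answers 503 Service Unavailable.
#[cfg(not(feature = "with-planner"))]
pub fn plan_handler(_query: &str) -> PlanReply {
    PlanReply {
        status: 503,
        body: "planner not compiled in (enable `with-planner`)".to_string(),
    }
}
```

Either way a client sees the same route; only the status code reveals whether the planner was compiled in.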

Files:
  crates/cognitive-shader-driver/src/planner_bridge.rs  (NEW, 75 LOC)
  crates/cognitive-shader-driver/src/wire.rs            (+82 LOC: WirePlan* + Plan variant)
  crates/cognitive-shader-driver/src/serve.rs           (+90 LOC: plan_handler + runbook Plan routing)
  crates/cognitive-shader-driver/src/lib.rs             (+7 LOC: mod gate)
  crates/cognitive-shader-driver/Cargo.toml             (+with-planner feature + dep)

Example payload (POST /v1/shader/plan):
  {"query": "MATCH (n)-[:KNOWS]->(m) RETURN n, m",
   "mode": "full",
   "strategies": ["cypher_parse", "dp_join"],
   "situation": {"felt_competence": 0.8}}

Runbook payload mixing plan + calibrate:
  {"steps": [
    {"op": "Plan",      "args": {"query": "...", "mode": "auto"}},
    {"op": "Calibrate", "args": {"model_path": "...", "tensor_name": "..."}}
  ]}

46/46 lib tests pass with --features "serve grpc with-planner". 3 new
wire tests: wire_plan_request_defaults, wire_plan_request_full_mode,
wire_runbook_accepts_plan_step.

Next commit (same PR): `impl PlannerContract for PlannerAwareness` per
CONSUMER_WIRING_INSTRUCTIONS.md line 207 — shader-driver switches to
`Box<dyn PlannerContract>` so consumers share the trait without concrete
coupling to lance-graph-planner.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Fills the gap flagged by `check duplication via OrchestrationBridge`.
The contract module already defined `OrchestrationBridge` as THE
routing trait that replaces per-consumer bridge logic (crewai-rust
StepRouter, n8n-rs crew_router/ladybug_router, ladybug-rs
HybridEngine, and the `planner_bridge.rs` I introduced on the shader
driver). No implementation existed. Now `PlannerAwareness` implements
it for the `StepDomain::LanceGraph` domain.

Step-type routing table (prefix: `lg.`):
  lg.plan_auto    → plan_auto (reads query from step.reasoning)
  lg.orchestrate  → minimal ThinkingContext resolution
  lg.health       → domain availability + strategy count
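
  The table above reads naturally as a prefix check followed by a match. The sketch below assumes routing keys on the step-type string alone; `LgOp`, `RouteError`, and `parse_lg_step` are hypothetical names, and the separate "unknown step" error is an illustration rather than something this commit describes.

```rust
/// The three operations in the routing table.
#[derive(Debug, PartialEq, Eq)]
pub enum LgOp {
    PlanAuto,
    Orchestrate,
    Health,
}

#[derive(Debug, PartialEq, Eq)]
pub enum RouteError {
    /// No `lg.` prefix: the step belongs to another bridge.
    DomainUnavailable,
    /// The prefix matched, but the operation is not in the table.
    UnknownStep(String),
}

/// Maps a step-type string such as "lg.plan_auto" to its operation.
pub fn parse_lg_step(step_type: &str) -> Result<LgOp, RouteError> {
    let op = step_type
        .strip_prefix("lg.")
        .ok_or(RouteError::DomainUnavailable)?;
    match op {
        "plan_auto" => Ok(LgOp::PlanAuto),
        "orchestrate" => Ok(LgOp::Orchestrate),
        "health" => Ok(LgOp::Health),
        other => Err(RouteError::UnknownStep(other.to_string())),
    }
}
```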

Other domains (Crew/Ladybug/N8n/Ndarray) return `DomainUnavailable` —
consumers combine bridges via Vec<Box<dyn OrchestrationBridge>> and
route each step to whichever bridge reports `domain_available() = true`.

Type bridging:
  planner::ThinkingStyle (12)    → contract::ThinkingStyle (36)
  planner::NarsInferenceType (5) → contract::InferenceType (5)
Mapping follows THINKING_RECONCILIATION.md §Mapping-to-12-canonical.

Files:
  crates/lance-graph-planner/src/orchestration_impl.rs  NEW (+260 LOC, 5 tests)
  crates/lance-graph-planner/src/lib.rs                 +6 LOC mod gate
  crates/lance-graph-planner/Cargo.toml                 +contract dep

167/167 planner lib tests pass (was 162, +5 OrchestrationBridge tests).

Follow-up: retire cognitive-shader-driver/src/planner_bridge.rs in
favour of `Box<dyn OrchestrationBridge>` so shader-driver holds the
trait object and routes any `lg.*` / `nd.*` / `lb.*` step-type
uniformly. That refactor also lets us collapse the per-operation Wire
DTOs (WirePlanRequest / WireCalibrateRequest / WireProbeRequest) into
one UnifiedStep envelope + BridgeSlot payload per contract guidance.

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Captures the insight the user kept correcting toward through this PR:
the full CAM-PQ pipeline (OrchestrationBridge → DataFusion → cam_pq
UDF/storage/IVF/JITSON) already EXISTS in code. The duplication I was
producing (CMPQ/CMFP raw format, per-op Wire DTOs, planner_bridge.rs)
is now visible against the canonical pieces.

Key findings recorded:
- lance-graph/src/cam_pq/storage.rs IS the D3 Lance format (already
  shipped); cam_pq_calibrate writes a parallel raw format
- cam_pq/udf.rs register_cam_udfs exists but is NOT called from
  datafusion_planner — one-line gap, unblocks SQL/Cypher CAM queries
- cam_pq/jitson_kernel.rs IS the runtime JIT codec calibration hook
  the user referenced
- OrchestrationBridge (now implemented for PlannerAwareness) is the
  canonical dedup trait; planner_bridge.rs duplicates it

5 concrete follow-ups queued with clear integration sequence:
1. Wire register_cam_udfs in datafusion_planner
2. Migrate cam_pq_calibrate output to Lance schema
3. OrchestrationBridge impl for codec research (nd.* step-types)
4. Retire planner_bridge.rs -> Box<dyn OrchestrationBridge>
5. Collapse per-op Wire DTOs to UnifiedStep + BridgeSlot

~1-2 days total, each step additive and independently verifiable.

File: .claude/knowledge/cam-pq-unified-pipeline.md (150 lines).

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh
@AdaWorldAPI AdaWorldAPI merged commit a1f5423 into main Apr 20, 2026
AdaWorldAPI pushed a commit that referenced this pull request Apr 20, 2026
… I11 measurability

The prior "positive purpose" framing was too narrow (codec iteration
velocity). The actual architecture the lab surface buys is three-part:

  REST/gRPC API  — no rebuild per codec candidate
  Planner        — real dispatch path under test (not a toy bench)
  JIT            — swap kernels at runtime without relinking

Two loads share this stack; neither is secondary:

1. Codec certification. Reconstruction ICC on real safetensors is
   necessary but not sufficient — the cert gate is token agreement
   vs Passthrough on full decode. PR #219's 0.9998 was synthetic /
   overfit-on-training; PR #220's 0.195 was real-weight but still
   reconstruction-only. The next load-bearing measurement is the
   token-level comparison, which is only tractable on this stack.
   At 8-17 min/rebuild × ~200 codec invariants to tune, iteration
   without the API is infeasible.

2. Thinking harvest (the AGI magic bullet). The same API + Planner +
   JIT externalises the planner's 36-style / 13-verb / NARS trace.
   POST a Cypher query, get {rows, thinking_trace} back. The trace
   is log / replay / NARS-revise-able — which is the architectural
   shape of a system that learns its own meta-inference. This is
   the REST/Cypher injection path we can revive at near-zero cost
   now that PR #221 landed the REST/gRPC scaffolding.

I11 (new invariant): Measurable stack, not a black box. Every layer
(L0 ndarray → L4 planner) emits a harvest-ready trace through the
lab surface. Proposed changes that shrink trace for perf/simplicity
are rejected — the trace contract is what makes the feedback loop
mechanisable.

Also refined: Decision Procedure item 3 (codec research is a
legitimate positive use, not a grudging exception); rule-of-thumb
measurement order (reconstruction error → reconstruction ICC →
token agreement) with token agreement as the cert gate.
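
To make the cert gate concrete, one plausible reading of "token agreement vs Passthrough" is the fraction of decode positions where both runs emit the same token id. The sketch below assumes that reading; the function names, the treatment of length mismatches, and the caller-supplied threshold are all hypothetical, since this commit does not define the metric or a pass mark.

```rust
/// Fraction of positions where the codec decode emits the same token id as
/// the Passthrough decode. Positions beyond the shorter sequence count as
/// disagreements; two empty sequences agree trivially.
pub fn token_agreement(passthrough: &[u32], codec: &[u32]) -> f64 {
    let total = passthrough.len().max(codec.len());
    if total == 0 {
        return 1.0;
    }
    let matches = passthrough
        .iter()
        .zip(codec.iter())
        .filter(|(a, b)| a == b)
        .count();
    matches as f64 / total as f64
}

/// Hypothetical gate: passes when agreement reaches the caller-chosen minimum.
pub fn passes_cert_gate(passthrough: &[u32], codec: &[u32], min_agreement: f64) -> bool {
    token_agreement(passthrough, codec) >= min_agreement
}
```

Unlike reconstruction ICC, this compares what the model actually emits, which is why it sits last in the measurement order.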

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh