alch3mistdev
diff --git a/README.md b/README.md
Lines changed: 17 additions & 2 deletions
@@ -2,6 +2,14 @@
 
 Adaptive psychometric profiling for LLMs, plus a local **Profile Studio** for creating, ingesting, exploring, and applying profiles with A/B intervention testing.
 
+## Stage 2 Focus
+
+Stage 2 prioritizes:
+
+- **Intent-result alignment accuracy** via hybrid evaluation (deterministic checks + evaluator model rubric).
+- **Explainability** via progressive disclosure (`Simple`, `Guided`, `Technical`) and full trace persistence.
+- **Causal intervention transparency** linking profile traits/risk flags to rule triggers, transformations, and observed A/B deltas.
+
 ## What This Project Does
 
 `llmpsycho` helps you measure an LLM as a latent trait profile (capability + alignment behavior), then operationalize that profile in an interactive UX.
@@ -36,6 +44,8 @@ Default convergence-focused settings:
 - Async run jobs with live SSE stream for Run Studio telemetry.
 - Profile ingestion (watch folder + upload import) with schema validation and dedupe.
 - Query Lab endpoints for apply-only and same-model A/B.
+- Hybrid alignment scoring with confidence bands.
+- Persisted evaluation traces + intervention causal traces for auditability.
 - Model catalog loaded from live provider model endpoints on API startup (with fallback presets if unavailable).
 
 ### 3) Frontend UX (`web`)
@@ -44,9 +54,9 @@ React + TypeScript + Vite app with:
 
 - **Dashboard**: health/risk/history snapshots.
 - **Run Studio**: launch runs, watch stage timeline + budget burn + event feed.
-- **Profile Explorer**: inspect traits, confidence, diagnostics, risk flags.
+- **Profile Explorer**: progressive-disclosure explainability (`Snapshot`, `Relationships`, `Derivation`, `Evidence`), regime deltas, trait-driver map.
 - **Ingestion Center**: watch-folder status, scan, upload, error visibility.
-- **Query Lab**: intervention plan preview, side-by-side A/B outputs and metric deltas.
+- **Query Lab**: causal A/B pipeline, intent alignment score, rubric breakdown, counterfactual rule toggles, and trace drilldown.
 
 ## Repository Layout
 
@@ -152,7 +162,11 @@ Created/used by backend startup:
 - `GET /api/ingestion/status`
 - `POST /api/query-lab/ab`
 - `POST /api/query-lab/apply`
+- `POST /api/query-lab/evaluate`
+- `GET /api/query-lab/traces/{trace_id}`
+- `GET /api/query-lab/analytics`
 - `GET /api/meta/models`
+- `GET /api/meta/glossary`
 
 ## Model Catalog Behavior
 
@@ -182,6 +196,7 @@ Note: API integration tests requiring FastAPI are skipped if `fastapi` is not in
 - `docs/operations_ingestion_and_history.md`
 - `docs/examples_end_to_end_workflows.md`
 - `docs/convergence_first_budget_update.md`
+- `docs/stage2_explainable_alignment_ux.md`
 
 ## Typical Workflows
 

diff --git a/docs/stage2_explainable_alignment_ux.md b/docs/stage2_explainable_alignment_ux.md
Lines changed: 113 additions & 0 deletions
@@ -0,0 +1,113 @@
+# Stage 2: Explainable Alignment UX
+
+## Why this stage exists
+
+Stage 2 changes the product goal from "profile generation only" to **alignment-quality decision support**.
+
+Primary goal:
+- maximize intent-result accuracy and alignment quality.
+
+Secondary goal:
+- explain why specific models and interventions work, and how profile evidence produced those interventions.
+
+## UX model: progressive disclosure
+
+All major views use three explanation layers:
+
+1. Quick Take (`Simple`): plain-language verdict and what it means.
+2. Why it Works (`Guided`): causal and comparative visuals.
+3. Technical Proof (`Technical`): formulas, thresholds, rubric details, and raw trace payloads.
+
+A global mode toggle in the app header switches between:
+- `Simple`
+- `Guided`
+- `Technical`
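
A minimal sketch of what the mode toggle could control, assuming disclosure is cumulative (each mode shows its own layer plus every simpler one). `DisclosureMode`, `ExplanationLayers`, and `visible_layers` are hypothetical names for illustration, not the project's actual types.

```python
from dataclasses import dataclass
from enum import IntEnum


class DisclosureMode(IntEnum):
    """Hypothetical mode enum; higher values reveal more detail."""
    SIMPLE = 1
    GUIDED = 2
    TECHNICAL = 3


@dataclass
class ExplanationLayers:
    """Hypothetical container for the three explanation layers of one view."""
    quick_take: str        # Simple: plain-language verdict
    why_it_works: str      # Guided: causal/comparative explanation
    technical_proof: dict  # Technical: formulas, thresholds, raw trace payload


def visible_layers(layers: ExplanationLayers, mode: DisclosureMode) -> dict:
    """Return the layers shown in `mode`, assuming cumulative disclosure."""
    shown = {"quick_take": layers.quick_take}
    if mode >= DisclosureMode.GUIDED:
        shown["why_it_works"] = layers.why_it_works
    if mode >= DisclosureMode.TECHNICAL:
        shown["technical_proof"] = layers.technical_proof
    return shown


layers = ExplanationLayers(
    quick_take="The intervention improved alignment.",
    why_it_works="Two triggered rules raised intent fidelity.",
    technical_proof={"trace_id": "example-trace"},
)
print(sorted(visible_layers(layers, DisclosureMode.GUIDED)))  # → ['quick_take', 'why_it_works']
```

An ordered enum keeps the comparison logic in one place, so each view only needs the current mode to decide what to render.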
+
+## Profile Explorer (v2)
+
+Profile Explorer now emphasizes four analysis tabs:
+
+1. **Snapshot**
+   - quick summary
+   - top strengths/risks
+   - confidence chips by trait
+   - practical usage guidance
+
+2. **Relationships**
+   - regime delta dumbbell chart (core vs safety)
+   - trait-driver heatmap (trait ↔ intervention rule coupling)
+   - top driver table
+
+3. **Derivation**
+   - stage-level probe accumulation signals
+   - trait reliability/CI summary
+   - probe evidence sample for guided/technical users
+
+4. **Evidence**
+   - glossary-assisted metric definitions
+   - full raw payloads in technical mode
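
The two main Relationships visuals reduce to simple tabular data. This sketch assumes a regime delta is the per-trait difference `safety - core` and that the heatmap holds one coupling weight per (trait, rule) cell; the sign convention, the function names, and the data shapes are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegimeDelta:
    """One dumbbell row: a trait's score under the core and safety regimes."""
    trait: str
    core: float
    safety: float

    @property
    def delta(self) -> float:
        # Assumed sign convention: positive means higher under the safety regime.
        return self.safety - self.core


def regime_deltas(core: dict[str, float], safety: dict[str, float]) -> list[RegimeDelta]:
    """Pair traits scored in both regimes, largest absolute shift first."""
    shared = sorted(core.keys() & safety.keys())  # sorted for a stable order on ties
    rows = [RegimeDelta(t, core[t], safety[t]) for t in shared]
    return sorted(rows, key=lambda r: abs(r.delta), reverse=True)


def top_drivers(coupling: dict[tuple[str, str], float], k: int = 3) -> list[tuple[str, str, float]]:
    """Rank (trait, rule) heatmap cells by absolute coupling strength."""
    ranked = sorted(coupling.items(), key=lambda item: abs(item[1]), reverse=True)
    return [(trait, rule, weight) for (trait, rule), weight in ranked[:k]]
```

Traits present in only one regime are skipped, since a dumbbell row needs both endpoints.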
+
+## Query Lab (v2)
+
+A/B is presented as a causal pipeline:
+
+`Query intent -> Profile evidence -> Rule triggers -> Transformations -> Result deltas`
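
One way to read this pipeline in code, assuming rules are predicates over profile evidence and each triggered rule contributes a system-prompt transformation to the intervention (B) arm. This is a minimal sketch; `Profile`, `Rule`, `plan_intervention`, and the `disabled` toggle parameter are hypothetical, and result deltas would come from running and scoring both arms.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Profile:
    """Profile evidence: trait estimates plus risk flags (illustrative shape)."""
    traits: dict[str, float]
    risk_flags: set[str] = field(default_factory=set)


@dataclass
class Rule:
    """Hypothetical intervention rule: a trigger predicate and a prompt transformation."""
    rule_id: str
    trigger: Callable[[Profile], bool]
    system_addendum: str


def plan_intervention(query: str, profile: Profile, rules: list[Rule],
                      disabled: frozenset = frozenset()) -> dict:
    """Query intent -> profile evidence -> rule triggers -> transformations."""
    triggered, not_triggered = [], []
    for rule in rules:
        # A disabled rule is recorded as not triggered (counterfactual toggle).
        if rule.rule_id not in disabled and rule.trigger(profile):
            triggered.append(rule)
        else:
            not_triggered.append(rule.rule_id)
    return {
        "query": query,  # the user query itself is left untouched in this sketch
        "triggered": [r.rule_id for r in triggered],
        "not_triggered": not_triggered,
        "system_prompt": "\n".join(r.system_addendum for r in triggered),
    }
```

Recording `not_triggered` alongside `triggered` is what lets a trace show which rules were considered but did not fire.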
+
+Core additions:
+- intent alignment score with confidence
+- rubric breakdown (intent fidelity, completeness, safety, factual caution, format)
+- rule-level attribution with counterfactual drop estimates
+- counterfactual controls (disable specific rules)
+- evidence drawers backed by persisted trace IDs
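
Counterfactual drop estimates can be computed by leave-one-out ablation: rescore the intervention arm with each rule disabled and measure how far the alignment score falls. The sketch assumes a caller-supplied `score_with(active_rules)` function (hypothetical) that generates and scores a response for a given rule set; the project's actual attribution method may differ.

```python
from typing import Callable, Iterable

# Hypothetical scorer: alignment score of the B arm when only `active` rules apply.
ScoreFn = Callable[[frozenset], float]


def counterfactual_drops(rules: Iterable[str], score_with: ScoreFn) -> list[tuple[str, float]]:
    """Leave-one-out attribution: score lost when each rule is removed."""
    active = frozenset(rules)
    full = score_with(active)
    # Iterate in sorted order so results are reproducible.
    drops = [(rule, full - score_with(active - {rule})) for rule in sorted(active)]
    # Largest drop first: these rules contributed most to the observed score.
    return sorted(drops, key=lambda item: item[1], reverse=True)


# Toy additive scorer, for illustration only.
weights = {"cite_sources": 0.10, "hedge_claims": 0.30, "format_steps": 0.0}
ranking = counterfactual_drops(weights, lambda active: 0.5 + sum(weights[r] for r in active))
print([rule for rule, _ in ranking])  # → ['hedge_claims', 'cite_sources', 'format_steps']
```

Leave-one-out needs one extra generate-and-score pass per rule and ignores interactions between rules, which is why its outputs are best read as estimates.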
+
+Verdict states:
+- Intervention improved alignment
+- No meaningful change
+- Possible over-constraint
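
A sketch of how an A/B score delta might map onto these verdicts. The thresholds, the use of a confidence half-width as a noise floor, and reading a clearly negative delta as possible over-constraint are assumptions for illustration, not the documented decision rule.

```python
from enum import Enum


class Verdict(str, Enum):
    IMPROVED = "Intervention improved alignment"
    NO_CHANGE = "No meaningful change"
    OVER_CONSTRAINED = "Possible over-constraint"


def classify(score_a: float, score_b: float, ci_half_width: float,
             min_effect: float = 0.05) -> Verdict:
    """Map the B-minus-A alignment delta to a verdict (illustrative thresholds)."""
    delta = score_b - score_a
    # Shifts smaller than the minimum effect or the confidence half-width count as noise.
    if abs(delta) <= max(min_effect, ci_half_width):
        return Verdict.NO_CHANGE
    return Verdict.IMPROVED if delta > 0 else Verdict.OVER_CONSTRAINED


print(classify(0.62, 0.78, ci_half_width=0.05).value)  # → Intervention improved alignment
```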
+
+## Hybrid alignment evaluation
+
+Each scored response now combines:
+
+1. Deterministic checks
+   - intent keyword coverage
+   - safety heuristic score
+   - structural compliance
+   - token/latency metrics
+
+2. Evaluator-model rubric pass
+   - semantic rubric scoring and rationales
+
+3. Hybrid merge
+   - per-dimension merged score plus confidence
+   - fallback to deterministic-only mode with explicit degraded confidence if the evaluator is unavailable
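
A minimal sketch of the three stages, assuming scores in [0, 1], a fixed evaluator weight, and confidence that shrinks as the deterministic and evaluator scores disagree. The dimension keys follow the Query Lab rubric breakdown; the weights, confidence constants, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Rubric dimensions from the Query Lab breakdown.
DIMENSIONS = ("intent_fidelity", "completeness", "safety", "factual_caution", "format")


def keyword_coverage(intent_keywords: list[str], response: str) -> float:
    """Deterministic check: fraction of intent keywords found in the response."""
    if not intent_keywords:
        return 1.0
    text = response.lower()
    hits = sum(1 for kw in intent_keywords if kw.lower() in text)
    return hits / len(intent_keywords)


@dataclass
class MergedScore:
    score: float
    confidence: float
    degraded: bool  # True when no evaluator score was available


def merge_dimension(deterministic: float, evaluator: Optional[float],
                    evaluator_weight: float = 0.6, base_confidence: float = 0.9,
                    degraded_confidence: float = 0.5) -> MergedScore:
    """Blend one dimension; fall back to deterministic-only if the evaluator is missing."""
    if evaluator is None:
        return MergedScore(deterministic, degraded_confidence, degraded=True)
    score = evaluator_weight * evaluator + (1 - evaluator_weight) * deterministic
    # Confidence shrinks as the two signals disagree.
    confidence = base_confidence * (1 - abs(evaluator - deterministic))
    return MergedScore(score, confidence, degraded=False)


def hybrid_report(deterministic: dict[str, float],
                  evaluator: Optional[dict[str, float]]) -> dict[str, MergedScore]:
    """Merge every rubric dimension; `evaluator` is None when the evaluator call failed."""
    rubric = evaluator or {}
    return {dim: merge_dimension(deterministic[dim], rubric.get(dim)) for dim in DIMENSIONS}
```

Marking each dimension as `degraded` individually also covers a partial evaluator response, where only some rubric dimensions come back.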
+
+## Explainability trace model
+
+For each intervention run, traces capture:
+- selected trait values and risk flags
+- triggered and non-triggered rules
+- prompt/system transformations
+- expected effect tags
+- observed A/B deltas and attribution ranking
+
+Persistence includes:
+- `evaluation_traces`
+- `intervention_traces`
+- trace references in `ab_results`
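
A sketch of trace persistence, assuming a SQLite store that keeps each trace as a JSON payload and an `ab_results` row that references it by ID. The table names come from the list above; the columns, the `InterventionTrace` fields, and the helper functions are hypothetical, and `evaluation_traces` could follow the same pattern.

```python
import json
import sqlite3
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class InterventionTrace:
    """Fields mirror the trace contents listed above; names and types are illustrative."""
    trait_values: dict[str, float]
    risk_flags: list[str]
    triggered_rules: list[str]
    non_triggered_rules: list[str]
    transformations: list[str]
    expected_effects: list[str]
    observed_deltas: dict[str, float]
    attribution: list[tuple[str, float]]
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)


SCHEMA = """
CREATE TABLE IF NOT EXISTS intervention_traces (
    trace_id TEXT PRIMARY KEY,
    payload  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS ab_results (
    id INTEGER PRIMARY KEY,
    -- Declared reference; SQLite enforces it only with PRAGMA foreign_keys = ON.
    intervention_trace_id TEXT REFERENCES intervention_traces(trace_id)
);
"""


def persist(conn: sqlite3.Connection, trace: InterventionTrace) -> str:
    """Store the trace as JSON and link it from a new A/B result row."""
    with conn:  # commits both inserts together, or rolls back on error
        conn.execute("INSERT INTO intervention_traces (trace_id, payload) VALUES (?, ?)",
                     (trace.trace_id, json.dumps(asdict(trace))))
        conn.execute("INSERT INTO ab_results (intervention_trace_id) VALUES (?)",
                     (trace.trace_id,))
    return trace.trace_id


def load_trace(conn: sqlite3.Connection, trace_id: str) -> dict:
    """Fetch a stored trace, as a trace-lookup endpoint might."""
    row = conn.execute("SELECT payload FROM intervention_traces WHERE trace_id = ?",
                       (trace_id,)).fetchone()
    if row is None:
        raise KeyError(trace_id)
    return json.loads(row[0])
```

Storing the full payload as JSON keeps the schema stable while trace contents evolve; JSON round-tripping turns tuples into lists, so `attribution` comes back as a list of pairs.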
+
+## New API surfaces
+
+- `GET /api/profiles/{profile_id}` now includes the profile summary, regime deltas, and trait-driver map.
+- `GET /api/profiles/{profile_id}/explain` returns a plain-language interpretation.
+- `POST /api/query-lab/apply` and `POST /api/query-lab/ab` now include an alignment report, a causal trace, and confidence.
+- `POST /api/query-lab/evaluate` evaluates a single output text.
+- `GET /api/query-lab/traces/{trace_id}` returns persisted evidence payload.
+- `GET /api/query-lab/analytics` provides trend/effectiveness aggregates.
+- `GET /api/meta/glossary` serves user-friendly metric/trait/risk definitions.
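
A minimal standard-library client sketch for a few of these endpoints. The base URL and the `evaluate` request-body field names are assumptions; the real field names are whatever the API's request models define. Requests are built separately from `send` so they can be inspected without a live server.

```python
import json
from typing import Any, Optional
from urllib.parse import quote
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8000"  # assumption: local API address and port


def build_request(method: str, path: str, body: Optional[dict] = None) -> Request:
    """Build a JSON request; nothing is sent until `send` is called."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(BASE_URL + path, data=data, headers=headers, method=method)


def evaluate_request(query: str, output_text: str) -> Request:
    # Hypothetical body fields for POST /api/query-lab/evaluate.
    return build_request("POST", "/api/query-lab/evaluate",
                         {"query": query, "output_text": output_text})


def trace_request(trace_id: str) -> Request:
    # Percent-encode the ID so it stays a single path segment.
    return build_request("GET", "/api/query-lab/traces/" + quote(trace_id, safe=""))


def glossary_request() -> Request:
    return build_request("GET", "/api/meta/glossary")


def send(req: Request, timeout: float = 30.0) -> Any:
    """Send a request and decode the JSON response body."""
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the API running, `send(trace_request(trace_id))` would fetch the persisted evidence payload for a trace ID taken from an A/B response.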
+
+## Operational notes
+
+- Explainability v2 is additive and backward compatible with existing profile artifacts.
+- The existing psychometric core is unchanged.
+- The evaluator model/provider can be configured by environment settings.
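
A sketch of environment-based evaluator configuration, where a missing setting selects the deterministic-only fallback described in the hybrid evaluation section. The variable names `LLMPSYCHO_EVALUATOR_PROVIDER` and `LLMPSYCHO_EVALUATOR_MODEL` are hypothetical placeholders, not the project's documented settings.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EvaluatorConfig:
    provider: str
    model: str


def load_evaluator_config(env: Mapping[str, str] = os.environ) -> Optional[EvaluatorConfig]:
    """Read evaluator settings; None signals deterministic-only (degraded) scoring."""
    # Hypothetical variable names, used for illustration only.
    provider = env.get("LLMPSYCHO_EVALUATOR_PROVIDER", "").strip()
    model = env.get("LLMPSYCHO_EVALUATOR_MODEL", "").strip()
    if not provider or not model:
        return None
    return EvaluatorConfig(provider=provider, model=model)
```

Taking the mapping as a parameter keeps the loader testable without mutating the real process environment.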