BitRaptors · csacsi · May 23, 2026
diff --git a/archie/assets/viewer/src/components/ReportSections.tsx b/archie/assets/viewer/src/components/ReportSections.tsx
diff --git a/archie/assets/viewer/src/pages/ReportPage.tsx b/archie/assets/viewer/src/pages/ReportPage.tsx
@@ -173,6 +173,7 @@ export default function ReportPage({ bundle: bundleProp, createdAt: createdAtPro
       'guidelines',
       'communications',
       'components',
+      'data-models',
       'integrations',
       'technology',
       'deployment',
@@ -292,6 +293,10 @@ export default function ReportPage({ bundle: bundleProp, createdAt: createdAtPro
   const stack = Array.isArray(technology.stack) ? technology.stack : []
   const runCommands = technology.run_commands || {}
   const deployment = bp.deployment || {}
+  const dataModels = Array.isArray(bp.data_models) ? bp.data_models : []
+  const persistenceStores = Array.isArray(bp.persistence_stores) ? bp.persistence_stores : []
+  const dataOverview = typeof bp.data_overview === 'string' ? bp.data_overview : ''
+  const hasDataSurface = dataModels.length > 0 || persistenceStores.length > 0
   const implementationGuidelines = [
     ...(bp.implementation_guidelines || []),
     ...(bp.decisions?.implementation_guidelines || []),
@@ -538,7 +543,7 @@ export default function ReportPage({ bundle: bundleProp, createdAt: createdAtPro
           )}
 
           {/* Inventory */}
-          {(componentsList.length > 0 || stack.length > 0 || integrations.length > 0) && (
+          {(componentsList.length > 0 || stack.length > 0 || integrations.length > 0 || hasDataSurface) && (
             <div className="space-y-1">
               <p className="px-3 text-[10px] font-black uppercase tracking-[0.2em] text-ink/20 mb-4">Inventory</p>
               {componentsList.length > 0 && (
@@ -549,6 +554,14 @@ export default function ReportPage({ bundle: bundleProp, createdAt: createdAtPro
                   label="Components"
                 />
               )}
+              {hasDataSurface && (
+                <NavButton
+                  active={activeSection === 'data-models'}
+                  onClick={() => scrollToSection('data-models')}
+                  icon={Database}
+                  label="Data Models"
+                />
+              )}
               {integrations.length > 0 && (
                 <NavButton
                   active={activeSection === 'integrations'}
@@ -840,6 +853,13 @@ export default function ReportPage({ bundle: bundleProp, createdAt: createdAtPro
             </section>
           )}
 
+          {/* 10a. Data Models — persistence stores + per-model lifecycle */}
+          {hasDataSurface && (
+            <section id="data-models" className="scroll-mt-24">
+              <Sections.DataModelsSection models={dataModels} stores={persistenceStores} dataOverview={dataOverview} />
+            </section>
+          )}
+
           {/* 10b. Integrations — third-party services wired into the app */}
           {integrations.length > 0 && (
             <section id="integrations" className="scroll-mt-24">

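The guards added to `ReportPage` decide whether the Data Models nav entry and section appear at all. Non-list `data_models` / `persistence_stores` values are treated as empty, a non-string `data_overview` becomes `''`, and the section renders only when at least one model or store is present, so an overview on its own does not surface it. The following is a minimal Python sketch that mirrors those guards, which could be used to check a blueprint dict (the object the viewer calls `bp`) before opening the viewer. The `data_surface` helper and its return tuple are hypothetical and not part of Archie.

```python
from typing import Any


def data_surface(bp: dict[str, Any]) -> tuple[list, list, str, bool]:
    """Mirror ReportPage's defensive reads of the data fields (hypothetical helper)."""
    # Anything that is not a list counts as "no entries", like Array.isArray(x) ? x : [].
    models = bp.get("data_models")
    models = models if isinstance(models, list) else []
    stores = bp.get("persistence_stores")
    stores = stores if isinstance(stores, list) else []
    # Only a real string survives; None, numbers, or objects become "".
    overview = bp.get("data_overview")
    overview = overview if isinstance(overview, str) else ""
    # The nav entry and section render only when there is a model or a store.
    has_surface = len(models) > 0 or len(stores) > 0
    return models, stores, overview, has_surface


if __name__ == "__main__":
    # An overview without models or stores does not surface the section.
    print(data_surface({"data_overview": "Postgres only"})[3])  # False
    # A malformed field degrades to empty instead of breaking the page.
    print(data_surface({"data_models": "oops", "persistence_stores": [{"name": "pg"}]})[3])  # True
```

Keeping the overview out of the `hasDataSurface` condition means a blueprint with prose but no structured data never produces an empty-looking section.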
diff --git a/archie/assets/workflow/deep-scan/steps/step-3-wave1/data-agent.md b/archie/assets/workflow/deep-scan/steps/step-3-wave1/data-agent.md
diff --git a/archie/assets/workflow/deep-scan/steps/step-3-wave1/orchestration.md b/archie/assets/workflow/deep-scan/steps/step-3-wave1/orchestration.md
@@ -47,12 +47,18 @@ Then skip to Step 4.
 
 ### If SCAN_MODE = "full" (default):
 
-Spawn 3–4 {{ANALYSIS_MODEL}} subagents in parallel, each focused on a different analytical concern. ALL agents read ALL source files under `$PROJECT_ROOT` — they are not split by directory. Each agent gets: the scan.json file_tree, dependencies, config files, and the GROUNDING RULES at the end of this step.
+Spawn 3–5 {{ANALYSIS_MODEL}} subagents in parallel, each focused on a different analytical concern. ALL agents read ALL source files under `$PROJECT_ROOT` — they are not split by directory. Each agent gets: the scan.json file_tree, dependencies, config files, and the GROUNDING RULES at the end of this step.
 
-**If `frontend_ratio` >= 0.20, spawn all 4 agents. Otherwise spawn only the first 3 (skip UI Layer).**
+**Conditional agents:**
+- **UI Layer** — spawn when `frontend_ratio >= 0.20`; otherwise skip.
+- **Data** — spawn when `has_persistence_signal == true` (set by scanner.py based on detected ORM deps, schema files, migrations dirs, mobile local-persistence APIs, or declared databases in the tech stack); otherwise skip.
+
+The first 3 agents (Structure, Patterns, Technology) always spawn. The Data and UI Layer agents spawn independently — a pure-frontend SPA with no persistence gets UI Layer but not Data; a headless backend service with a DB gets Data but not UI Layer; a full-stack app gets all 5.
 
 **Bulk content — off-limits for reading.** `scan.json.bulk_content_manifest` lists files classified by `.archiebulk` as "visible inventory, not contents": categories like `ui_resource` (Android `res/`, iOS storyboards), `generated`, `localization`, `migration`, `fixture`, `asset`, `lockfile`, `dependency`, `data`. Every agent below inherits this rule: **you may reference these paths by name and inventory counts, but you MUST NOT call Read on them.** The scanner has already summarized their shape. If a specific file is genuinely required to resolve a finding, read it surgically and note why — it is an exception, not the default.
 
+**Data agent exception (stated, not implicit):** the Data agent MAY surgically Read 1-2 of the most recent migration files per persistence store to extract the observed `how_to_add` / `how_to_modify` procedure. This is the only standing exception to the bulk-content rule, and it is bounded: enumerating every file in the `migration` category is still forbidden.
+
 **Dispatching the sub-agents:**
 
 For each sub-agent below, Read the corresponding prompt file, then ALSO Read `{{WORKFLOW_ROOT}}/deep-scan/steps/step-3-wave1/grounding-rules.md`, and use the concatenated text (agent body + blank line + grounding rules body) as that sub-agent's prompt.
@@ -65,6 +71,7 @@ All paths are relative to the project root (your cwd).
 | Patterns | `{{WORKFLOW_ROOT}}/deep-scan/steps/step-3-wave1/patterns-agent.md` | `.archie/tmp/archie_sub2_$PROJECT_NAME.json` | Always |
 | Technology | `{{WORKFLOW_ROOT}}/deep-scan/steps/step-3-wave1/technology-agent.md` | `.archie/tmp/archie_sub3_$PROJECT_NAME.json` | Always |
 | UI Layer | `{{WORKFLOW_ROOT}}/deep-scan/steps/step-3-wave1/ui-layer-agent.md` | `.archie/tmp/archie_sub4_$PROJECT_NAME.json` | Only when `frontend_ratio >= 0.20` |
+| Data | `{{WORKFLOW_ROOT}}/deep-scan/steps/step-3-wave1/data-agent.md` | `.archie/tmp/archie_sub5_$PROJECT_NAME.json` | Only when `has_persistence_signal == true` |
 
 **Before dispatch — append the output contract to each sub-agent's prompt,
 substituting its output path from the table above as the "file path named
@@ -80,5 +87,15 @@ OUTPUT CONTRACT (mandatory):
 The merge step (Step 4) reads each agent's output file directly — do NOT
 copy or transcribe a subagent's output yourself.
 
-All four sub-agents run at the {{ANALYSIS_MODEL}} model. {{>dispatch_parallel}}
+All spawned sub-agents (the three core agents plus UI Layer and/or Data where applicable) run at the {{ANALYSIS_MODEL}} model. {{>dispatch_parallel}}
+
+After the parallel dispatch returns, record the Data agent's output counts for trend tracking. If the Data agent was skipped (`archie_sub5_$PROJECT_NAME.json` is absent because `has_persistence_signal == false`), each one-liner prints 0 instead of failing, so telemetry records a zero count. Both commands use the standard `python3 -c …` form that Claude and Codex auto-approve via the installer's command catalogue, so no new permission rules are needed:
+
+```bash
+DATA_COUNT=$(python3 -c "import json,os,sys; p=sys.argv[1]; print(len((json.load(open(p)).get('data_models') or [])) if os.path.exists(p) else 0)" .archie/tmp/archie_sub5_$PROJECT_NAME.json)
+python3 .archie/telemetry.py extra "$PROJECT_ROOT" wave1 data_models_count=$DATA_COUNT
+
+STORE_COUNT=$(python3 -c "import json,os,sys; p=sys.argv[1]; print(len((json.load(open(p)).get('persistence_stores') or [])) if os.path.exists(p) else 0)" .archie/tmp/archie_sub5_$PROJECT_NAME.json)
+python3 .archie/telemetry.py extra "$PROJECT_ROOT" wave1 persistence_stores_count=$STORE_COUNT
+```
 
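The conditional-dispatch rules in `orchestration.md` reduce to two independent booleans. The sketch below shows one way the decision could be expressed. `scanner.py`'s real detection logic is not part of this diff, so the signal sets (`ORM_DEPS`, `MOBILE_PERSISTENCE_DEPS`, `SCHEMA_FILES`, `MIGRATION_DIRS`, `DATABASES`) and the function names are hypothetical placeholders for the categories the prose names: ORM deps, schema files, migrations dirs, mobile local-persistence APIs, and declared databases.

```python
from pathlib import PurePosixPath
from typing import Iterable

# Hypothetical example signals for each category named in the dispatch rules;
# scanner.py's real lists are not shown in this diff and will differ.
ORM_DEPS = {"sqlalchemy", "django", "prisma", "typeorm", "sequelize", "mongoose", "gorm"}
MOBILE_PERSISTENCE_DEPS = {"androidx.room", "realm", "sqflite", "coredata"}
SCHEMA_FILES = {"schema.prisma", "schema.sql", "schema.rb", "structure.sql"}
MIGRATION_DIRS = {"migrations", "alembic", "migrate"}
DATABASES = {"postgresql", "mysql", "sqlite", "mongodb", "redis"}


def has_persistence_signal(deps: Iterable[str], file_tree: Iterable[str], stack: Iterable[str]) -> bool:
    """Return True if any persistence signal is present (sketch of the scanner flag)."""
    dep_names = {d.lower() for d in deps}
    if dep_names & (ORM_DEPS | MOBILE_PERSISTENCE_DEPS):
        return True  # ORM or mobile local-persistence dependency
    if {s.lower() for s in stack} & DATABASES:
        return True  # database declared in the tech stack
    for raw in file_tree:
        path = PurePosixPath(raw)
        if path.name in SCHEMA_FILES:
            return True  # schema file
        if any(part in MIGRATION_DIRS for part in path.parts[:-1]):
            return True  # file lives under a migrations directory
    return False


def spawn_plan(frontend_ratio: float, persistence: bool, project_name: str) -> list[tuple[str, str]]:
    """Return (agent, output path) pairs following the Wave 1 dispatch table."""
    agents = [(1, "Structure"), (2, "Patterns"), (3, "Technology")]  # always spawned
    # UI Layer and Data are decided independently of each other.
    if frontend_ratio >= 0.20:
        agents.append((4, "UI Layer"))
    if persistence:
        agents.append((5, "Data"))
    return [(name, f".archie/tmp/archie_sub{n}_{project_name}.json") for n, name in agents]


if __name__ == "__main__":
    # A headless backend: tiny frontend share, but an ORM dependency and an alembic/ directory.
    backend = has_persistence_signal(["fastapi", "SQLAlchemy"], ["app/main.py", "alembic/env.py"], [])
    for name, out in spawn_plan(0.05, backend, "demo"):
        print(name, out)
```

Because the two conditions are independent, a frontend-heavy SPA with no persistence and a database-backed headless service each get four agents, just with a different fourth agent.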
diff --git a/archie/assets/workflow/deep-scan/steps/step-4-merge.md b/archie/assets/workflow/deep-scan/steps/step-4-merge.md
@@ -27,12 +27,13 @@ Step 4 is a **consumer step**: Step 3 already assigned each Wave 1 sub-agent
 its output path and appended the output contract to its prompt before
 dispatch. Each sub-agent has now written its file under `.archie/tmp/`.
 
-**Expected files** (skip UI Layer if it was not spawned — `frontend_ratio < 0.20`):
+**Expected files** (skip UI Layer when `frontend_ratio < 0.20`; skip Data when `has_persistence_signal == false`):
 
 - `.archie/tmp/archie_sub1_$PROJECT_NAME.json` (Structure)
 - `.archie/tmp/archie_sub2_$PROJECT_NAME.json` (Patterns)
 - `.archie/tmp/archie_sub3_$PROJECT_NAME.json` (Technology)
 - `.archie/tmp/archie_sub4_$PROJECT_NAME.json` (UI Layer, optional)
+- `.archie/tmp/archie_sub5_$PROJECT_NAME.json` (Data, optional)
 
 **If resuming via `--from` or `--continue`:** `.archie/tmp/` is workspace-relative
 so the files normally survive reboots, but an interrupted or `--from 4` run
@@ -43,9 +44,11 @@ missing — do NOT attempt to re-extract output from a subagent's transcript.
 Merge the files that exist:
 
 ```bash
-python3 .archie/merge.py "$PROJECT_ROOT" .archie/tmp/archie_sub1_$PROJECT_NAME.json .archie/tmp/archie_sub2_$PROJECT_NAME.json .archie/tmp/archie_sub3_$PROJECT_NAME.json .archie/tmp/archie_sub4_$PROJECT_NAME.json
+python3 .archie/merge.py "$PROJECT_ROOT" .archie/tmp/archie_sub1_$PROJECT_NAME.json .archie/tmp/archie_sub2_$PROJECT_NAME.json .archie/tmp/archie_sub3_$PROJECT_NAME.json .archie/tmp/archie_sub4_$PROJECT_NAME.json .archie/tmp/archie_sub5_$PROJECT_NAME.json
 ```
 
+`merge.py` warns and skips files that weren't produced (skipped agents) — listing all five paths unconditionally keeps the command stable across full / frontend-only / backend-only repos.
+
 This saves `$PROJECT_ROOT/.archie/blueprint_raw.json` (raw merged data). Verify the output shows non-zero component/section counts. If it says "0 sections, 0 components", the merge failed — check the agent output files.
 
 ```bash

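`merge.py`'s source is not part of this diff, so the following is only a sketch of the behaviour Step 4 describes: every sub-agent path is passed unconditionally, missing files produce a warning and are skipped, and whatever exists is combined. The merge rule used here (list sections concatenate, any other key keeps the first file's value) and the `merge_wave1` name are assumptions for illustration, not Archie's actual merge semantics.

```python
import json
import sys
import tempfile
from pathlib import Path
from typing import Any


def merge_wave1(paths: list[str]) -> dict[str, Any]:
    """Merge the Wave 1 files that exist; warn about and skip missing ones (sketch)."""
    merged: dict[str, Any] = {}
    for raw in paths:
        path = Path(raw)
        if not path.exists():
            # A skipped agent (e.g. Data when has_persistence_signal is false) is not an error.
            print(f"warning: {path} not found, skipping", file=sys.stderr)
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        for key, value in data.items():
            existing = merged.get(key)
            if isinstance(value, list) and isinstance(existing, list):
                # Assumed rule: list sections (components, findings, data_models) accumulate.
                existing.extend(value)
            elif key not in merged:
                # Assumed rule: otherwise the first file to define a key wins.
                merged[key] = list(value) if isinstance(value, list) else value
    return merged


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sub1 = Path(tmp, "archie_sub1_demo.json")
        sub1.write_text(json.dumps({"components": [{"name": "api"}], "findings": []}), encoding="utf-8")
        sub5 = Path(tmp, "archie_sub5_demo.json")  # never written: Data agent was skipped
        result = merge_wave1([str(sub1), str(sub5)])
        print(sorted(result))  # ['components', 'findings']
```

Treating a missing file as a warning rather than an error is what lets one command line serve full-stack, frontend-only, and backend-only repositories.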
diff --git a/archie/assets/workflow/deep-scan/steps/step-5-wave2-reasoning.md b/archie/assets/workflow/deep-scan/steps/step-5-wave2-reasoning.md
@@ -73,7 +73,7 @@ Wave 1 gathered facts: components, patterns, technology, deployment, UI layer. N
 
 Tell the Reasoning agent:
 
-> Read `$PROJECT_ROOT/.archie/blueprint_raw.json` — it contains the full analysis from Wave 1 agents: components, communication patterns, technology stack, deployment, frontend. It may also carry a top-level `findings` array holding **draft findings from Wave 1 agents** (for example, the Structure agent's workspace-level observations — cross-workspace cycles, monorepo constraint violations). Pick those drafts up and include them in your own findings output, upgrading them to canonical (fill `root_cause` and `fix_direction`, keep their `problem_statement`/`evidence`/`applies_to`, set `depth: "canonical"`, `source: "deep:synthesis"`). Also read `$PROJECT_ROOT/.archie/findings.json` **if it exists** — it is the accumulated findings store across every prior scan and deep-scan run, each entry shaped as `{id, problem_statement, evidence, root_cause, fix_direction, depth, source, ...}`. If the file is absent, proceed without it and produce findings from scratch. Also read key source files: entry points, main configs, core abstractions.
+> Read `$PROJECT_ROOT/.archie/blueprint_raw.json` — it contains the full analysis from Wave 1 agents: components, communication patterns, technology stack, deployment, frontend, **data models and persistence stores** (when the Data agent spawned). It may also carry a top-level `findings` array holding **draft findings from Wave 1 agents** — for example, the Structure agent's workspace-level observations (cross-workspace cycles, monorepo constraint violations) and the Data agent's data-shaped drafts (model referenced in code with no schema declaration, migration without model update, schema-vs-business-code mismatches). Pick those drafts up and include them in your own findings output, upgrading them to canonical (fill `root_cause` and `fix_direction`, keep their `problem_statement`/`evidence`/`applies_to`, set `depth: "canonical"`, `source: "deep:synthesis"`). Also read `$PROJECT_ROOT/.archie/findings.json` **if it exists** — it is the accumulated findings store across every prior scan and deep-scan run, each entry shaped as `{id, problem_statement, evidence, root_cause, fix_direction, depth, source, ...}`. If the file is absent, proceed without it and produce findings from scratch. Also read key source files: entry points, main configs, core abstractions.
 >
 > With the COMPLETE picture of what was built and how, produce deep architectural reasoning. You will upgrade any draft findings in the accumulated store, emit new findings you discover, AND emit pitfalls (classes of problem rooted in architectural decisions). Both findings and pitfalls share the same 4-field core (`problem_statement`, `evidence`, `root_cause`, `fix_direction`); pitfalls differ in altitude (class-of-problem, not instance) and ownership (blueprint-durable, not per-run).
 >
@@ -109,7 +109,9 @@ Tell the Reasoning agent:
 >
 > **Probe C — Seams:** locate every place designed for substitution or extension — abstract interfaces with multiple concrete implementations, registry- or config-driven dispatch, protocol boundaries, plugin surfaces, hook/callback systems. For each: what **varies** across the seam, what is held **stable**, and what is the **mechanism** for adding a new implementation.
 >
-> **Working from Wave 1 output.** Wave 1 has already read the codebase (skeletons and, where needed, source). Run each probe primarily against `blueprint_raw.json` — specifically the `communication`, `components`, and `technology` sections, plus any raw agent output captured there. The probes are synthesis questions over Wave 1's data, not a fresh read pass.
+> **Probe D — Data architecture** (run only when `blueprint_raw.data_models` is non-empty, which means the Data agent spawned and recorded at least one model): consolidate write fan-in (which components mutate each entity), read fan-out (which components read each entity), cross-entity coupling (entities that always change together, e.g. a single migration touches both), store-role split (why a primary DB *and* a cache *and* a search index, and what each one absorbs), denormalization choices visible in the schema (the same field stored on two entities, audit columns, soft-delete columns), and the model lifecycle observed by the Data agent (migration discipline, repository discipline). Each finding should name the specific models and stores from `blueprint_raw.data_models` / `blueprint_raw.persistence_stores`. This probe surfaces decisions the other probes miss ("we chose event sourcing for `AuditLog` because…", "we keep a denormalized `latest_order_total` on `User` to avoid a join on every dashboard load; the trade-off is dual-write discipline in `OrderService`").
+>
+> **Working from Wave 1 output.** Wave 1 has already read the codebase (skeletons and, where needed, source). Run each probe primarily against `blueprint_raw.json` — specifically the `communication`, `components`, `technology`, `data_models`, and `persistence_stores` sections, plus any raw agent output captured there. The probes are synthesis questions over Wave 1's data, not a fresh read pass.
 >
 > Only read source directly when Wave 1's output is genuinely insufficient to answer a probe (e.g., Wave 1 named a seam but didn't record what varies across it, or flagged a gate without naming the invariant). Judge file-by-file — no blanket re-read. If a signal clearly exists in the codebase but Wave 1's data is thin, prefer to record that as a gap in `pitfalls` (so the next scan catches more) over re-doing Wave 1's job here.
 >
@@ -159,6 +161,8 @@ Tell the Reasoning agent:
 >
 > **Novelty check for pitfalls.** If the blueprint already contains pitfalls, reuse their `id`s when you upgrade them (preserve `first_seen`, bump `confirmed_in_scan`). Before minting a new `pf_NNNN`, verify no existing pitfall covers the same class of problem — the store is durable across runs, and a pitfall that re-emerges under a new id loses its history. Spend your cognitive budget surfacing NEW classes of problem (architectural traps visible only from the whole-system view) rather than restating existing ones.
 >
+> **Data-shaped pitfall classes to look for actively** (when `blueprint_raw.data_models` is non-empty): **schema drift** (model field set ≠ migration set), **orphan FK** (referenced table absent, or column type mismatched across models), **unbounded collection** (entity grows without a retention strategy), **denormalized field drift** (the same value stored in two places, and only one mutation path updates both copies; root cause: missing dual-write discipline), **missing audit trail** (mutable entity with neither an `updated_at` column nor an event log), **migration-without-model-update** (or vice versa: schema and ORM declaration diverged), **N+1 query risk** (relationship without a documented batch-loading strategy). These are *classes* of problem; only emit them when the architectural conditions are present (e.g. emit denormalized-field-drift only when you can name two entities that share a column with no single owner). Each pitfall must trace to a decision or pattern visible in `blueprint_raw.data_models` or the lifecycle observations.
+>
 > Quality bar: only emit pitfalls whose `root_cause` traces to something visible in the blueprint (decision, pattern, component absence). Soft floor of 3; if fewer meet the bar, say so. If no new pitfalls emerged, say so explicitly rather than duplicating existing ones under new wording.
 >
 > Only describe problems grounded in actual code and observed decisions. Do NOT recommend alternatives the code doesn't use.
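Probe D's first two questions, write fan-in and read fan-out, amount to a group-by over component-to-entity access edges. The sketch below assumes a simplified edge list of `(component, entity, "read" | "write")` tuples; the real shape of `blueprint_raw.data_models` is not shown in this diff, and the `fan_in_out` / `shared_write_entities` helpers and the example component and entity names are hypothetical. Entities with more than one writer are only candidates for an ownership or dual-write finding, not findings in themselves.

```python
from collections import defaultdict
from typing import Iterable, Literal

# One observed access: (component, entity, mode). Assumed shape, for illustration only.
Access = tuple[str, str, Literal["read", "write"]]


def fan_in_out(edges: Iterable[Access]) -> tuple[dict[str, set[str]], dict[str, set[str]]]:
    """Group access edges into per-entity writer sets (fan-in) and reader sets (fan-out)."""
    writers: dict[str, set[str]] = defaultdict(set)
    readers: dict[str, set[str]] = defaultdict(set)
    for component, entity, mode in edges:
        (writers if mode == "write" else readers)[entity].add(component)
    return dict(writers), dict(readers)


def shared_write_entities(writers: dict[str, set[str]]) -> list[str]:
    """Entities mutated by more than one component, sorted by name."""
    return sorted(entity for entity, comps in writers.items() if len(comps) > 1)


if __name__ == "__main__":
    edges: list[Access] = [
        ("OrderService", "Order", "write"),
        ("OrderService", "User", "write"),  # e.g. maintains a denormalized total on User
        ("ProfileService", "User", "write"),
        ("Dashboard", "User", "read"),
        ("Dashboard", "Order", "read"),
    ]
    writers, readers = fan_in_out(edges)
    print(shared_write_entities(writers))  # ['User']
    print(sorted(readers["User"]))  # ['Dashboard']
```

A multi-writer entity such as `User` above is where the denormalized-field-drift check becomes worth running: the pitfall applies only if a shared column on it has no single owning write path.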