23 commits
d1eb7a3
feat(rules): add canonical kind taxonomy (9 values)
csacsi May 23, 2026
b0b2223
test(rules): make taxonomy tests robust to kind set changes
csacsi May 23, 2026
2cc2107
feat(rules): add classify_kind heuristic for backfill
csacsi May 23, 2026
e9c49db
fix(rules): guard classify_kind against empty pattern_name and violat…
csacsi May 23, 2026
bd37e7b
feat(rules): backfill_kinds CLI populates missing kind via classify_kind
csacsi May 23, 2026
24e1a62
fix(backfill): catch JSONDecodeError, trailing newline, cleanup tmp o…
csacsi May 23, 2026
f9ea0f6
feat(rules): map infrastructure_rules to new infrastructure kind
csacsi May 23, 2026
91515c9
refactor(migrate): explicit kind at both call sites; pin infra id prefix
csacsi May 23, 2026
b92a6cc
feat(prompt): step-6 rule synthesis taxonomy expanded to 9 kinds
csacsi May 23, 2026
98dda5d
feat(prompt): /archie-scan taxonomy matches step-6 (9 kinds)
csacsi May 23, 2026
72c7a67
fix(prompt): /archie-scan kind placeholder avoids anchoring bias
csacsi May 23, 2026
7a8ef05
chore(sync): mirror rule_kinds + backfill_kinds + updated prompts to …
csacsi May 23, 2026
6046c94
feat(deep-scan): add Wave 1 Data agent — inventory + per-model lifecycle
gbrbks May 27, 2026
0354284
feat(deep-scan): wire data_models into renderer, viewers, rules, drift
gbrbks May 27, 2026
e200da4
fix(deep-scan): neutralize Claude-only ARCHIE_PERMISSIONS reference i…
gbrbks May 27, 2026
592e1d2
fix(scanner): walk up to find .archiebulk for monorepo subpackage scans
gbrbks May 27, 2026
18ebd40
feat(data-agent): richer per-model schema — fields/guarantees/consume…
gbrbks May 28, 2026
f8dd95c
feat(data-agent): model/store descriptions, AI overview, reorder UI, …
gbrbks May 28, 2026
c585e7e
Merge origin/feature/data-agent into feature/newKinds
csacsi May 28, 2026
6248b12
feat(viewer): show rule kind badge next to severity_class
csacsi May 29, 2026
c497d71
feat(viewer): hover tooltip explaining each rule kind
csacsi May 29, 2026
bd18282
feat(viewer): drop severity_class badge from rule cards, keep kind
csacsi May 29, 2026
cd85b38
feat(viewer): kind distribution donut in Rules Management
csacsi May 29, 2026
442 changes: 439 additions & 3 deletions archie/assets/viewer/src/components/ReportSections.tsx

Large diffs are not rendered by default.

22 changes: 21 additions & 1 deletion archie/assets/viewer/src/pages/ReportPage.tsx
@@ -173,6 +173,7 @@ export default function ReportPage({ bundle: bundleProp, createdAt: createdAtPro
'guidelines',
'communications',
'components',
'data-models',
'integrations',
'technology',
'deployment',
@@ -292,6 +293,10 @@ export default function ReportPage({ bundle: bundleProp, createdAt: createdAtPro
const stack = Array.isArray(technology.stack) ? technology.stack : []
const runCommands = technology.run_commands || {}
const deployment = bp.deployment || {}
const dataModels = Array.isArray(bp.data_models) ? bp.data_models : []
const persistenceStores = Array.isArray(bp.persistence_stores) ? bp.persistence_stores : []
const dataOverview = typeof bp.data_overview === 'string' ? bp.data_overview : ''
const hasDataSurface = dataModels.length > 0 || persistenceStores.length > 0
const implementationGuidelines = [
...(bp.implementation_guidelines || []),
...(bp.decisions?.implementation_guidelines || []),
@@ -538,7 +543,7 @@ export default function ReportPage({ bundle: bundleProp, createdAt: createdAtPro
)}

{/* Inventory */}
-{(componentsList.length > 0 || stack.length > 0 || integrations.length > 0) && (
+{(componentsList.length > 0 || stack.length > 0 || integrations.length > 0 || hasDataSurface) && (
<div className="space-y-1">
<p className="px-3 text-[10px] font-black uppercase tracking-[0.2em] text-ink/20 mb-4">Inventory</p>
{componentsList.length > 0 && (
@@ -549,6 +554,14 @@ export default function ReportPage({ bundle: bundleProp, createdAt: createdAtPro
label="Components"
/>
)}
{hasDataSurface && (
<NavButton
active={activeSection === 'data-models'}
onClick={() => scrollToSection('data-models')}
icon={Database}
label="Data Models"
/>
)}
{integrations.length > 0 && (
<NavButton
active={activeSection === 'integrations'}
@@ -840,6 +853,13 @@ export default function ReportPage({ bundle: bundleProp, createdAt: createdAtPro
</section>
)}

{/* 10a. Data Models — persistence stores + per-model lifecycle */}
{hasDataSurface && (
<section id="data-models" className="scroll-mt-24">
<Sections.DataModelsSection models={dataModels} stores={persistenceStores} dataOverview={dataOverview} />
</section>
)}

{/* 10b. Integrations — third-party services wired into the app */}
{integrations.length > 0 && (
<section id="integrations" className="scroll-mt-24">
232 changes: 232 additions & 0 deletions archie/assets/workflow/deep-scan/steps/step-3-wave1/data-agent.md

Large diffs are not rendered by default.

@@ -47,12 +47,18 @@ Then skip to Step 4.

### If SCAN_MODE = "full" (default):

-Spawn 3–4 {{ANALYSIS_MODEL}} subagents in parallel, each focused on a different analytical concern. ALL agents read ALL source files under `$PROJECT_ROOT` — they are not split by directory. Each agent gets: the scan.json file_tree, dependencies, config files, and the GROUNDING RULES at the end of this step.
+Spawn 3–5 {{ANALYSIS_MODEL}} subagents in parallel, each focused on a different analytical concern. ALL agents read ALL source files under `$PROJECT_ROOT` — they are not split by directory. Each agent gets: the scan.json file_tree, dependencies, config files, and the GROUNDING RULES at the end of this step.

-**If `frontend_ratio` >= 0.20, spawn all 4 agents. Otherwise spawn only the first 3 (skip UI Layer).**
+**Conditional agents:**
+- **UI Layer** — spawn when `frontend_ratio >= 0.20`; otherwise skip.
+- **Data** — spawn when `has_persistence_signal == true` (set by scanner.py based on detected ORM deps, schema files, migrations dirs, mobile local-persistence APIs, or declared databases in the tech stack); otherwise skip.
+
+The first 3 agents (Structure, Patterns, Technology) always spawn. The Data and UI Layer agents spawn independently — a pure-frontend SPA with no persistence gets UI Layer but not Data; a headless backend service with a DB gets Data but not UI Layer; a full-stack app gets all 5.
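
The spawn conditions above can be sketched as a small selection function. This is an illustrative sketch only, not code from this PR: the function name `select_wave1_agents` and the constant names are hypothetical, and it assumes `frontend_ratio` and `has_persistence_signal` have already been read from the scanner output.

```python
from __future__ import annotations

# Hypothetical names for illustration; only the agent labels, the 0.20
# threshold, and the two input signals come from the step text.
ALWAYS_SPAWNED = ["Structure", "Patterns", "Technology"]
FRONTEND_RATIO_THRESHOLD = 0.20


def select_wave1_agents(frontend_ratio: float, has_persistence_signal: bool) -> list[str]:
    """Return the Wave 1 agents to dispatch for one scan."""
    agents = list(ALWAYS_SPAWNED)
    # The two conditional agents are decided independently of each other.
    if frontend_ratio >= FRONTEND_RATIO_THRESHOLD:
        agents.append("UI Layer")
    if has_persistence_signal:
        agents.append("Data")
    return agents
```

Under these assumptions a pure-frontend SPA yields four agents (with UI Layer), a headless backend with a DB yields four agents (with Data), and a full-stack app yields all five.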

**Bulk content — off-limits for reading.** `scan.json.bulk_content_manifest` lists files classified by `.archiebulk` as "visible inventory, not contents": categories like `ui_resource` (Android `res/`, iOS storyboards), `generated`, `localization`, `migration`, `fixture`, `asset`, `lockfile`, `dependency`, `data`. Every agent below inherits this rule: **you may reference these paths by name and inventory counts, but you MUST NOT call Read on them.** The scanner has already summarized their shape. If a specific file is genuinely required to resolve a finding, read it surgically and note why — it is an exception, not the default.

**Data agent exception (stated, not implicit):** the Data agent MAY surgically Read 1-2 recent migration files per persistence store to extract the observed `how_to_add` / `how_to_modify` procedure. This is the only blanket exception to the bulk-content rule and is bounded — enumeration of every migration in `migration` category is still forbidden.

**Dispatching the sub-agents:**

For each sub-agent below, Read the corresponding prompt file, then ALSO Read `{{WORKFLOW_ROOT}}/deep-scan/steps/step-3-wave1/grounding-rules.md`, and use the concatenated text (agent body + blank line + grounding rules body) as that sub-agent's prompt.
@@ -65,6 +71,7 @@ All paths are relative to the project root (your cwd).
| Patterns | `{{WORKFLOW_ROOT}}/deep-scan/steps/step-3-wave1/patterns-agent.md` | `.archie/tmp/archie_sub2_$PROJECT_NAME.json` | Always |
| Technology | `{{WORKFLOW_ROOT}}/deep-scan/steps/step-3-wave1/technology-agent.md` | `.archie/tmp/archie_sub3_$PROJECT_NAME.json` | Always |
| UI Layer | `{{WORKFLOW_ROOT}}/deep-scan/steps/step-3-wave1/ui-layer-agent.md` | `.archie/tmp/archie_sub4_$PROJECT_NAME.json` | Only when `frontend_ratio >= 0.20` |
| Data | `{{WORKFLOW_ROOT}}/deep-scan/steps/step-3-wave1/data-agent.md` | `.archie/tmp/archie_sub5_$PROJECT_NAME.json` | Only when `has_persistence_signal == true` |

**Before dispatch — append the output contract to each sub-agent's prompt,
substituting its output path from the table above as the "file path named
@@ -80,5 +87,15 @@ OUTPUT CONTRACT (mandatory):
The merge step (Step 4) reads each agent's output file directly — do NOT
copy or transcribe a subagent's output yourself.

-All four sub-agents run at the {{ANALYSIS_MODEL}} model. {{>dispatch_parallel}}
+All spawned sub-agents (3 always + UI Layer and/or Data as applicable) run at the {{ANALYSIS_MODEL}} model. {{>dispatch_parallel}}

After the parallel dispatch returns, record per-agent counts for trend tracking. Each call no-ops gracefully when its source file is missing (skipped agent — sub5 absent when `has_persistence_signal == false`). Uses the standard `python3 -c …` form that both Claude and Codex auto-approve via the installer's command catalogue — no new permission rules needed:

```bash
DATA_COUNT=$(python3 -c "import json,os,sys; p=sys.argv[1]; print(len((json.load(open(p)).get('data_models') or [])) if os.path.exists(p) else 0)" .archie/tmp/archie_sub5_$PROJECT_NAME.json)
python3 .archie/telemetry.py extra "$PROJECT_ROOT" wave1 data_models_count=$DATA_COUNT

STORE_COUNT=$(python3 -c "import json,os,sys; p=sys.argv[1]; print(len((json.load(open(p)).get('persistence_stores') or [])) if os.path.exists(p) else 0)" .archie/tmp/archie_sub5_$PROJECT_NAME.json)
python3 .archie/telemetry.py extra "$PROJECT_ROOT" wave1 persistence_stores_count=$STORE_COUNT
```
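
For readers who find the one-liners dense, the counting logic can be unpacked as below. This is a sketch for explanation only: the helper name `count_entries` is hypothetical and is not part of the PR, which keeps the inline `python3 -c` form.

```python
import json
import os


def count_entries(path: str, key: str) -> int:
    """Count the items under `key` in a sub-agent output file.

    Returns 0 when the file is absent (agent was skipped) or when the key
    is missing or null, matching the `or []` fallback in the one-liner.
    """
    if not os.path.exists(path):
        return 0
    with open(path, encoding="utf-8") as fh:
        doc = json.load(fh)
    return len(doc.get(key) or [])
```

Like the inline form, this does not catch malformed JSON; a truncated output file would still raise.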

7 changes: 5 additions & 2 deletions archie/assets/workflow/deep-scan/steps/step-4-merge.md
@@ -27,12 +27,13 @@ Step 4 is a **consumer step**: Step 3 already assigned each Wave 1 sub-agent
its output path and appended the output contract to its prompt before
dispatch. Each sub-agent has now written its file under `.archie/tmp/`.

-**Expected files** (skip UI Layer if it was not spawned — `frontend_ratio < 0.20`):
+**Expected files** (skip UI Layer when `frontend_ratio < 0.20`; skip Data when `has_persistence_signal == false`):

- `.archie/tmp/archie_sub1_$PROJECT_NAME.json` (Structure)
- `.archie/tmp/archie_sub2_$PROJECT_NAME.json` (Patterns)
- `.archie/tmp/archie_sub3_$PROJECT_NAME.json` (Technology)
- `.archie/tmp/archie_sub4_$PROJECT_NAME.json` (UI Layer, optional)
- `.archie/tmp/archie_sub5_$PROJECT_NAME.json` (Data, optional)

**If resuming via `--from` or `--continue`:** `.archie/tmp/` is workspace-relative
so the files normally survive reboots, but an interrupted or `--from 4` run
@@ -43,9 +44,11 @@ missing — do NOT attempt to re-extract output from a subagent's transcript.
Merge the files that exist:

```bash
-python3 .archie/merge.py "$PROJECT_ROOT" .archie/tmp/archie_sub1_$PROJECT_NAME.json .archie/tmp/archie_sub2_$PROJECT_NAME.json .archie/tmp/archie_sub3_$PROJECT_NAME.json .archie/tmp/archie_sub4_$PROJECT_NAME.json
+python3 .archie/merge.py "$PROJECT_ROOT" .archie/tmp/archie_sub1_$PROJECT_NAME.json .archie/tmp/archie_sub2_$PROJECT_NAME.json .archie/tmp/archie_sub3_$PROJECT_NAME.json .archie/tmp/archie_sub4_$PROJECT_NAME.json .archie/tmp/archie_sub5_$PROJECT_NAME.json
```

`merge.py` warns and skips files that weren't produced (skipped agents) — listing all five paths unconditionally keeps the command stable across full / frontend-only / backend-only repos.
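
The warn-and-skip behaviour described above might look roughly like the sketch below. It is not taken from `merge.py`, whose internals are not shown in this diff; the function name `load_existing` and the warning wording are assumptions made for illustration.

```python
import json
import os
import sys


def load_existing(paths):
    """Load each JSON file that exists; warn about and skip the rest.

    Hypothetical helper: illustrates why passing all five paths is safe
    even when the UI Layer or Data agent did not run.
    """
    loaded, skipped = [], []
    for path in paths:
        if os.path.exists(path):
            with open(path, encoding="utf-8") as fh:
                loaded.append(json.load(fh))
        else:
            print(f"warning: {path} not found, skipping", file=sys.stderr)
            skipped.append(path)
    return loaded, skipped
```

With this shape the caller never has to branch on which agents ran, which is the stability property the step relies on.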

This saves `$PROJECT_ROOT/.archie/blueprint_raw.json` (raw merged data). Verify the output shows non-zero component/section counts. If it says "0 sections, 0 components", the merge failed — check the agent output files.

```bash
@@ -73,7 +73,7 @@ Wave 1 gathered facts: components, patterns, technology, deployment, UI layer. N

Tell the Reasoning agent:

-> Read `$PROJECT_ROOT/.archie/blueprint_raw.json` — it contains the full analysis from Wave 1 agents: components, communication patterns, technology stack, deployment, frontend. It may also carry a top-level `findings` array holding **draft findings from Wave 1 agents** (for example, the Structure agent's workspace-level observations — cross-workspace cycles, monorepo constraint violations). Pick those drafts up and include them in your own findings output, upgrading them to canonical (fill `root_cause` and `fix_direction`, keep their `problem_statement`/`evidence`/`applies_to`, set `depth: "canonical"`, `source: "deep:synthesis"`). Also read `$PROJECT_ROOT/.archie/findings.json` **if it exists** — it is the accumulated findings store across every prior scan and deep-scan run, each entry shaped as `{id, problem_statement, evidence, root_cause, fix_direction, depth, source, ...}`. If the file is absent, proceed without it and produce findings from scratch. Also read key source files: entry points, main configs, core abstractions.
+> Read `$PROJECT_ROOT/.archie/blueprint_raw.json` — it contains the full analysis from Wave 1 agents: components, communication patterns, technology stack, deployment, frontend, **data models and persistence stores** (when the Data agent spawned). It may also carry a top-level `findings` array holding **draft findings from Wave 1 agents** — for example, the Structure agent's workspace-level observations (cross-workspace cycles, monorepo constraint violations) and the Data agent's data-shaped drafts (model referenced in code with no schema declaration, migration without model update, schema-vs-business-code mismatches). Pick those drafts up and include them in your own findings output, upgrading them to canonical (fill `root_cause` and `fix_direction`, keep their `problem_statement`/`evidence`/`applies_to`, set `depth: "canonical"`, `source: "deep:synthesis"`). Also read `$PROJECT_ROOT/.archie/findings.json` **if it exists** — it is the accumulated findings store across every prior scan and deep-scan run, each entry shaped as `{id, problem_statement, evidence, root_cause, fix_direction, depth, source, ...}`. If the file is absent, proceed without it and produce findings from scratch. Also read key source files: entry points, main configs, core abstractions.
>
> With the COMPLETE picture of what was built and how, produce deep architectural reasoning. You will upgrade any draft findings in the accumulated store, emit new findings you discover, AND emit pitfalls (classes of problem rooted in architectural decisions). Both findings and pitfalls share the same 4-field core (`problem_statement`, `evidence`, `root_cause`, `fix_direction`); pitfalls differ in altitude (class-of-problem, not instance) and ownership (blueprint-durable, not per-run).
>
@@ -109,7 +109,9 @@ Tell the Reasoning agent:
>
> **Probe C — Seams:** locate every place designed for substitution or extension — abstract interfaces with multiple concrete implementations, registry- or config-driven dispatch, protocol boundaries, plugin surfaces, hook/callback systems. For each: what **varies** across the seam, what is held **stable**, and what is the **mechanism** for adding a new implementation.
>
-> **Working from Wave 1 output.** Wave 1 has already read the codebase (skeletons and, where needed, source). Run each probe primarily against `blueprint_raw.json` — specifically the `communication`, `components`, and `technology` sections, plus any raw agent output captured there. The probes are synthesis questions over Wave 1's data, not a fresh read pass.
+> **Probe D — Data architecture** (run only when `blueprint_raw.data_models` is non-empty — i.e. the Data agent spawned): consolidate write fan-in (which components mutate each entity), read fan-out (which components read each entity), cross-entity coupling (entities that always change together — single migration touches both), store-role split (why a primary DB *and* a cache *and* a search index — what does each absorb), denormalization choices visible in the schema (same field stored on two entities, audit columns, soft-delete columns), and the model lifecycle observed by the Data agent (migration discipline, repository discipline). Each finding should name the specific models and stores from `blueprint_raw.data_models` / `blueprint_raw.persistence_stores`. Surfaces decisions the other probes miss ("we chose event sourcing for `AuditLog` because…", "we keep a denormalized `latest_order_total` on `User` to avoid a join on every dashboard load — the trade-off is dual-write discipline in `OrderService`").
+>
+> **Working from Wave 1 output.** Wave 1 has already read the codebase (skeletons and, where needed, source). Run each probe primarily against `blueprint_raw.json` — specifically the `communication`, `components`, `technology`, `data_models`, and `persistence_stores` sections, plus any raw agent output captured there. The probes are synthesis questions over Wave 1's data, not a fresh read pass.
>
> Only read source directly when Wave 1's output is genuinely insufficient to answer a probe (e.g., Wave 1 named a seam but didn't record what varies across it, or flagged a gate without naming the invariant). Judge file-by-file — no blanket re-read. If a signal clearly exists in the codebase but Wave 1's data is thin, prefer to record that as a gap in `pitfalls` (so the next scan catches more) over re-doing Wave 1's job here.
>
@@ -159,6 +161,8 @@ Tell the Reasoning agent:
>
> **Novelty check for pitfalls.** If the blueprint already contains pitfalls, reuse their `id`s when you upgrade them (preserve `first_seen`, bump `confirmed_in_scan`). Before minting a new `pf_NNNN`, verify no existing pitfall covers the same class of problem — the store is durable across runs, and a pitfall that re-emerges under a new id loses its history. Spend your cognitive budget surfacing NEW classes of problem (architectural traps visible only from the whole-system view) rather than restating existing ones.
>
> **Data-shaped pitfall classes to look for actively** (when `blueprint_raw.data_models` is non-empty): **schema drift** (model field set ≠ migration set), **orphan FK** (referenced table absent or column type mismatch across models), **unbounded collection** (entity grows without retention strategy), **denormalized field drift** (same value stored in two places, only one mutation path updates both — root cause: missing dual-write discipline), **missing audit trail** (mutable entity with no `updated_at` / no event log), **migration-without-model-update** (or vice versa — schema and ORM declaration diverged), **N+1 query risk** (relationship without a documented batch-loading strategy). These are *classes* of problem; only emit them when the architectural conditions are present (e.g. emit denormalized-field-drift only when you can name two entities that share a column with no single owner). Each pitfall must trace to a decision or pattern visible in `blueprint_raw.data_models` or the lifecycle observations.
>
> Quality bar: only emit pitfalls whose `root_cause` traces to something visible in the blueprint (decision, pattern, component absence). Soft floor of 3; if fewer meet the bar, say so. If no new pitfalls emerged, say so explicitly rather than duplicating existing ones under new wording.
>
> Only describe problems grounded in actual code and observed decisions. Do NOT recommend alternatives the code doesn't use.