diff --git a/AGENTS.md b/AGENTS.md index 85830e8f..0346ac32 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -22,6 +22,7 @@ Use this routing before editing so the right package and tests get updated: | Agent runtime (clone, tools, prompts, container) | `agent/src/` (`pipeline.py`, `runner.py`, `config.py`, `hooks.py`, `policy.py`, `prompts/`, Dockerfile, etc.) | `agent/tests/`, `agent/README.md` for env/PAT | | Agent progress events (written to `TaskEventsTable` from the MicroVM; read by `bgagent watch`) | `agent/src/progress_writer.py`, `agent/src/pipeline.py` and `agent/src/runner.py` (integration points) | `agent/tests/test_progress_writer.py`; `cli/src/commands/watch.ts` for the consumer side | | User-facing or design prose | `docs/guides/`, `docs/design/` | Run **`mise //docs:sync`** or **`mise //docs:build`** (do not edit `docs/src/content/docs/` by hand) | +| Architecture decisions (ADRs) | `docs/decisions/` | Run **`mise //docs:sync`** after adding or editing an ADR | | Monorepo tasks, CI glue | Root `mise.toml`, `scripts/`, `.github/workflows/` | — | ### CDK handler tests (quick map) diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md index 8b87db96..9dc52c21 100644 --- a/CONTRIBUTING.md +++ b/CONTRIBUTING.md @@ -28,6 +28,7 @@ Guidelines: - If you change API types in `cdk/src/handlers/shared/types.ts`, update `cli/src/types.ts` to match. - If you change docs sources (`docs/guides/`, `docs/design/`), run `mise //docs:sync` so generated content stays in sync. - For significant features, add a design document to `docs/design/`. +- For cross-cutting or hard-to-reverse decisions, add an ADR to `docs/decisions/` (see [ADR README](./docs/decisions/README.md)). ### 4. Commit diff --git a/docs/astro.config.mjs b/docs/astro.config.mjs index 6f086742..4bbe89e4 100644 --- a/docs/astro.config.mjs +++ b/docs/astro.config.mjs @@ -98,6 +98,11 @@ export default defineConfig({ { slug: 'architecture/repo-onboarding' }, ], }, + { + label: 'Decisions', + collapsed: true, + autogenerate: { directory: 'decisions' }, + }, { label: 'Roadmap', autogenerate: { directory: 'roadmap' }, diff --git a/docs/decisions/001-stacked-pull-requests.md b/docs/decisions/001-stacked-pull-requests.md new file mode 100644 index 00000000..0b4db726 --- /dev/null +++ b/docs/decisions/001-stacked-pull-requests.md @@ -0,0 +1,120 @@ +# ADR-001: Stacked pull requests for multi-PR features + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +Complex features in ABCA often span multiple packages, resource types, and concerns. Delivering these as a single large PR creates several problems: + +- **Review fatigue:** PRs exceeding ~500 lines suffer from diminished reviewer attention — critical issues get missed in the noise of mechanical changes. +- **Context loss:** Without a framework, sequential PRs leave reviewers without knowledge of where they are in the overall delivery, what came before, or what remains. +- **Agent discoverability:** AI coding agents picking up a sub-task cannot determine the broader goal, prior decisions, or remaining work without reconstructing context from scattered commits and issues. +- **Blocked progress:** A single large PR blocks all progress until the entire feature is reviewed. Stalling on one concern (e.g., IAM review) blocks unrelated work (e.g., documentation). + +The [Pragmatic Engineer analysis of stacked diffs](https://newsletter.pragmaticengineer.com/p/stacked-diffs) documents how organizations (Meta, Google, Graphite users) use this pattern to maintain velocity on complex changes while keeping review quality high. + +## Decision + +Use **stacked pull requests** for features spanning multiple concerns or where review time and blast radius justify decomposition. The numeric thresholds below are guidelines — the primary signal is whether a single PR would exceed a reasonable review session, not file count alone. Each PR in the stack follows these rules: + +### 1. Position statement + +Every PR description states its position: + +```markdown +## Stack position + +PR {N} for #{parent-issue} — {overall goal one-liner} + +### Prior: {what the previous PR delivered} +### This PR: {what this adds} +### Next (optional): {what comes next, if scope is known} +``` + +This gives reviewers and agents immediate orientation. The "Next" section is optional — include it when the remaining scope is fixed and known; omit it when scope is still evolving. The parent issue is the source of truth for overall progress. + +### 2. Branch targeting + +- PR 1 targets `main` +- PR N targets PR N-1's branch +- Final PR merges the full stack to `main` + +``` +main + └── feat/first-concern (PR 1) + └── feat/second-concern (PR 2) + └── feat/third-concern (PR 3 → merge to main) +``` + +### 3. Self-contained reviewability + +Each PR: +- Compiles and passes tests independently +- Can be deployed without breaking the system (see exception below) +- Has a single clear responsibility (one concern per PR) +- Does not leave dead code, TODOs, or broken intermediate states + +**Infrastructure stack exception:** For multi-PR CDK/IAM changes where intermediate slices cannot deploy independently (e.g., a policy referencing a resource added in a later PR), the validation gate is **synth + tests passing** — not a successful deploy. In this case, designate a **deploy-gate PR** in the stack position block: the specific PR where the stack becomes end-to-end deployable. Acceptable intermediate states include feature-flagged resources, no-op stubs, and constructs gated behind context variables. + +### 4. Size guidelines + +| Metric | Target | Maximum | +|--------|--------|---------| +| Lines changed | 200–400 | 600 | +| Review time | 20–30 min | 45 min | +| Files touched | 3–8 | 12 | + +If a PR exceeds these, decompose further. + +### 5. Rebase discipline + +When a lower PR changes after review feedback: +- All PRs above it in the stack must be rebased +- CI must pass on each PR independently after rebase +- Reviewers are notified of the rebase (GitHub does this automatically) + +### 6. Sub-issue linking + +- Parent issue lists all sub-issues with a stack visualization diagram +- Each sub-issue references the parent and its position in the stack +- GitHub's task list in the parent tracks completion +- Estimated review time is listed per sub-issue to help reviewers plan +- Sub-issues use `blocked by #NNN` / `blocking #NNN` relationships to express dependency order — agents and reviewers can identify which issues are unblocked and ready for pickup + +### 7. When NOT to use stacked PRs + +- Changes under ~200 lines that fit naturally in one PR +- Hotfixes that need immediate merge +- Dependency bumps (use Dependabot grouping instead) +- Documentation-only changes that are self-contained + +### 8. Merge semantics + +The default topology is a **classic stack** — each PR targets its predecessor's branch. When an early PR merges to `main` before later PRs are reviewed: + +1. **Retarget** all PRs that pointed at the merged branch to `main` (or to the next unmerged predecessor). Use `gh pr edit --base main` or GitHub's "Retarget" button. +2. **Rebase** each retargeted PR onto its new base so the diff is clean. +3. **CI must pass** on each retargeted PR independently after rebase. + +After retargeting, the remaining PRs form a shorter stack rooted on `main`. This is the expected, normal path — not an exception. + +**When the stack diverges:** If review feedback on PR 2 invalidates assumptions in PRs 3+, prefer closing and re-opening the affected PRs over accumulating fixup commits that obscure intent. The parent issue remains the source of truth for what shipped and what remains. + +## Consequences + +- (+) Each PR stays in the "reviewable without fatigue" window (~15–40 min) +- (+) Agents can pick up any sub-issue independently — the position statement provides full context +- (+) Partial delivery is meaningful — each merged PR adds value independently +- (+) Reviewers approve incrementally without needing full-stack mental context +- (+) Early PRs can merge and ship while later ones are still in review +- (-) Rebase cascades when early PRs receive feedback +- (-) More overhead in PR descriptions and branch management +- (-) Requires discipline to keep each PR independently valid (no "this will be fixed in PR N+1") +- (!) If the stack grows beyond ~8 PRs, consider decomposing into independent sub-stacks + +## References + +- [Stacked Diffs — Pragmatic Engineer](https://newsletter.pragmaticengineer.com/p/stacked-diffs) +- RFC #120 — first formal use of this pattern in ABCA +- Issue #129 — implementation of this ADR diff --git a/docs/decisions/README.md b/docs/decisions/README.md new file mode 100644 index 00000000..05983c2f --- /dev/null +++ b/docs/decisions/README.md @@ -0,0 +1,69 @@ +# Architecture Decision Records (ADRs) + +This directory captures significant design decisions for the ABCA project. Each ADR explains **why** a decision was made — not just what was decided — so that future contributors (human and AI) can understand the reasoning without excavating git history or PR discussions. + +## When to write an ADR + +Write an ADR when a decision: + +- Affects multiple packages or the overall architecture +- Establishes a pattern other code will follow +- Is non-obvious — a reasonable person might choose differently +- Is hard to reverse once implemented + +Do **not** write an ADR for routine implementation choices that are self-evident from the code. + +## Template + +```markdown +# ADR-NNN: Title + +**Status:** proposed | accepted | superseded | deprecated +**Date:** YYYY-MM-DD +**Supersedes:** ADR-NNN (if applicable) +**Superseded by:** ADR-NNN (if applicable) + +## Context + +What is the problem or situation that requires a decision? Include constraints, requirements, and forces at play. + +## Decision + +What was decided and why. Be specific — name the approach chosen. + +## Consequences + +What follows from this decision: +- (+) Positive outcomes +- (-) Negative outcomes or trade-offs +- (!) Risks or things to watch + +## References + +- Links to RFCs, issues, PRs, or external resources that informed the decision +``` + +## Numbering + +ADRs are numbered sequentially with zero-padded three-digit prefixes: `001-slug.md`, `002-slug.md`, etc. Numbers are never reused. + +## Lifecycle + +| Status | Meaning | +|--------|---------| +| `proposed` | Under discussion, not yet binding | +| `accepted` | Active and authoritative | +| `superseded` | Replaced by a newer ADR (link to successor) | +| `deprecated` | No longer applicable (context changed) | + +A decision starts as `proposed` during RFC discussion and moves to `accepted` when the implementing PR merges. To change an accepted decision, write a new ADR that supersedes it — do not edit the original. + +## Relationship to `docs/design/` + +Design documents describe system shape, interfaces, and implementation detail. ADRs capture cross-cutting choices that constrain multiple designs. When a design decision is significant enough to be "hard to reverse" or "non-obvious," extract it as an ADR and reference it from the design doc. An ADR may supersede another ADR; a design doc is simply updated in place. + +## Discovery + +- **Agents:** `AGENTS.md` routes to this directory for understanding past design rationale. +- **Humans:** Browse this directory or the docs site under the "Decisions" section. +- **Search:** Each ADR title and context section are written to be grep-friendly. diff --git a/docs/scripts/sync-starlight.mjs b/docs/scripts/sync-starlight.mjs index 9cfbea24..96c080c2 100644 --- a/docs/scripts/sync-starlight.mjs +++ b/docs/scripts/sync-starlight.mjs @@ -262,5 +262,8 @@ mirrorMarkdownFile( // AGENTS.md reference for contributors. The rename happens only at the site level. mirrorDirectory(path.join(docsRoot, 'design'), path.join('src', 'content', 'docs', 'architecture')); +// --- Decision records (ADRs): mirror to decisions/ --- +mirrorDirectory(path.join(docsRoot, 'decisions'), path.join('src', 'content', 'docs', 'decisions')); + // Guardrail: ensure target tree exists when running in a clean checkout. fs.mkdirSync(targetRoot, { recursive: true }); diff --git a/docs/src/content/docs/decisions/001-stacked-pull-requests.md b/docs/src/content/docs/decisions/001-stacked-pull-requests.md new file mode 100644 index 00000000..ebcef43e --- /dev/null +++ b/docs/src/content/docs/decisions/001-stacked-pull-requests.md @@ -0,0 +1,124 @@ +--- +title: 001 stacked pull requests +--- + +# ADR-001: Stacked pull requests for multi-PR features + +**Status:** accepted +**Date:** 2026-05-19 + +## Context + +Complex features in ABCA often span multiple packages, resource types, and concerns. Delivering these as a single large PR creates several problems: + +- **Review fatigue:** PRs exceeding ~500 lines suffer from diminished reviewer attention — critical issues get missed in the noise of mechanical changes. +- **Context loss:** Without a framework, sequential PRs leave reviewers without knowledge of where they are in the overall delivery, what came before, or what remains. +- **Agent discoverability:** AI coding agents picking up a sub-task cannot determine the broader goal, prior decisions, or remaining work without reconstructing context from scattered commits and issues. +- **Blocked progress:** A single large PR blocks all progress until the entire feature is reviewed. Stalling on one concern (e.g., IAM review) blocks unrelated work (e.g., documentation). + +The [Pragmatic Engineer analysis of stacked diffs](https://newsletter.pragmaticengineer.com/p/stacked-diffs) documents how organizations (Meta, Google, Graphite users) use this pattern to maintain velocity on complex changes while keeping review quality high. + +## Decision + +Use **stacked pull requests** for features spanning multiple concerns or where review time and blast radius justify decomposition. The numeric thresholds below are guidelines — the primary signal is whether a single PR would exceed a reasonable review session, not file count alone. Each PR in the stack follows these rules: + +### 1. Position statement + +Every PR description states its position: + +```markdown +## Stack position + +PR {N} for #{parent-issue} — {overall goal one-liner} + +### Prior: {what the previous PR delivered} +### This PR: {what this adds} +### Next (optional): {what comes next, if scope is known} +``` + +This gives reviewers and agents immediate orientation. The "Next" section is optional — include it when the remaining scope is fixed and known; omit it when scope is still evolving. The parent issue is the source of truth for overall progress. + +### 2. Branch targeting + +- PR 1 targets `main` +- PR N targets PR N-1's branch +- Final PR merges the full stack to `main` + +``` +main + └── feat/first-concern (PR 1) + └── feat/second-concern (PR 2) + └── feat/third-concern (PR 3 → merge to main) +``` + +### 3. Self-contained reviewability + +Each PR: +- Compiles and passes tests independently +- Can be deployed without breaking the system (see exception below) +- Has a single clear responsibility (one concern per PR) +- Does not leave dead code, TODOs, or broken intermediate states + +**Infrastructure stack exception:** For multi-PR CDK/IAM changes where intermediate slices cannot deploy independently (e.g., a policy referencing a resource added in a later PR), the validation gate is **synth + tests passing** — not a successful deploy. In this case, designate a **deploy-gate PR** in the stack position block: the specific PR where the stack becomes end-to-end deployable. Acceptable intermediate states include feature-flagged resources, no-op stubs, and constructs gated behind context variables. + +### 4. Size guidelines + +| Metric | Target | Maximum | +|--------|--------|---------| +| Lines changed | 200–400 | 600 | +| Review time | 20–30 min | 45 min | +| Files touched | 3–8 | 12 | + +If a PR exceeds these, decompose further. + +### 5. Rebase discipline + +When a lower PR changes after review feedback: +- All PRs above it in the stack must be rebased +- CI must pass on each PR independently after rebase +- Reviewers are notified of the rebase (GitHub does this automatically) + +### 6. Sub-issue linking + +- Parent issue lists all sub-issues with a stack visualization diagram +- Each sub-issue references the parent and its position in the stack +- GitHub's task list in the parent tracks completion +- Estimated review time is listed per sub-issue to help reviewers plan +- Sub-issues use `blocked by #NNN` / `blocking #NNN` relationships to express dependency order — agents and reviewers can identify which issues are unblocked and ready for pickup + +### 7. When NOT to use stacked PRs + +- Changes under ~200 lines that fit naturally in one PR +- Hotfixes that need immediate merge +- Dependency bumps (use Dependabot grouping instead) +- Documentation-only changes that are self-contained + +### 8. Merge semantics + +The default topology is a **classic stack** — each PR targets its predecessor's branch. When an early PR merges to `main` before later PRs are reviewed: + +1. **Retarget** all PRs that pointed at the merged branch to `main` (or to the next unmerged predecessor). Use `gh pr edit --base main` or GitHub's "Retarget" button. +2. **Rebase** each retargeted PR onto its new base so the diff is clean. +3. **CI must pass** on each retargeted PR independently after rebase. + +After retargeting, the remaining PRs form a shorter stack rooted on `main`. This is the expected, normal path — not an exception. + +**When the stack diverges:** If review feedback on PR 2 invalidates assumptions in PRs 3+, prefer closing and re-opening the affected PRs over accumulating fixup commits that obscure intent. The parent issue remains the source of truth for what shipped and what remains. + +## Consequences + +- (+) Each PR stays in the "reviewable without fatigue" window (~15–40 min) +- (+) Agents can pick up any sub-issue independently — the position statement provides full context +- (+) Partial delivery is meaningful — each merged PR adds value independently +- (+) Reviewers approve incrementally without needing full-stack mental context +- (+) Early PRs can merge and ship while later ones are still in review +- (-) Rebase cascades when early PRs receive feedback +- (-) More overhead in PR descriptions and branch management +- (-) Requires discipline to keep each PR independently valid (no "this will be fixed in PR N+1") +- (!) If the stack grows beyond ~8 PRs, consider decomposing into independent sub-stacks + +## References + +- [Stacked Diffs — Pragmatic Engineer](https://newsletter.pragmaticengineer.com/p/stacked-diffs) +- RFC #120 — first formal use of this pattern in ABCA +- Issue #129 — implementation of this ADR diff --git a/docs/src/content/docs/decisions/Readme.md b/docs/src/content/docs/decisions/Readme.md new file mode 100644 index 00000000..a6e8d0af --- /dev/null +++ b/docs/src/content/docs/decisions/Readme.md @@ -0,0 +1,73 @@ +--- +title: Readme +--- + +# Architecture Decision Records (ADRs) + +This directory captures significant design decisions for the ABCA project. Each ADR explains **why** a decision was made — not just what was decided — so that future contributors (human and AI) can understand the reasoning without excavating git history or PR discussions. + +## When to write an ADR + +Write an ADR when a decision: + +- Affects multiple packages or the overall architecture +- Establishes a pattern other code will follow +- Is non-obvious — a reasonable person might choose differently +- Is hard to reverse once implemented + +Do **not** write an ADR for routine implementation choices that are self-evident from the code. + +## Template + +```markdown +# ADR-NNN: Title + +**Status:** proposed | accepted | superseded | deprecated +**Date:** YYYY-MM-DD +**Supersedes:** ADR-NNN (if applicable) +**Superseded by:** ADR-NNN (if applicable) + +## Context + +What is the problem or situation that requires a decision? Include constraints, requirements, and forces at play. + +## Decision + +What was decided and why. Be specific — name the approach chosen. + +## Consequences + +What follows from this decision: +- (+) Positive outcomes +- (-) Negative outcomes or trade-offs +- (!) Risks or things to watch + +## References + +- Links to RFCs, issues, PRs, or external resources that informed the decision +``` + +## Numbering + +ADRs are numbered sequentially with zero-padded three-digit prefixes: `001-slug.md`, `002-slug.md`, etc. Numbers are never reused. + +## Lifecycle + +| Status | Meaning | +|--------|---------| +| `proposed` | Under discussion, not yet binding | +| `accepted` | Active and authoritative | +| `superseded` | Replaced by a newer ADR (link to successor) | +| `deprecated` | No longer applicable (context changed) | + +A decision starts as `proposed` during RFC discussion and moves to `accepted` when the implementing PR merges. To change an accepted decision, write a new ADR that supersedes it — do not edit the original. + +## Relationship to `docs/design/` + +Design documents describe system shape, interfaces, and implementation detail. ADRs capture cross-cutting choices that constrain multiple designs. When a design decision is significant enough to be "hard to reverse" or "non-obvious," extract it as an ADR and reference it from the design doc. An ADR may supersede another ADR; a design doc is simply updated in place. + +## Discovery + +- **Agents:** `AGENTS.md` routes to this directory for understanding past design rationale. +- **Humans:** Browse this directory or the docs site under the "Decisions" section. +- **Search:** Each ADR title and context section are written to be grep-friendly. diff --git a/docs/src/content/docs/developer-guide/Contributing.md b/docs/src/content/docs/developer-guide/Contributing.md index 900da4ed..1d186a7d 100644 --- a/docs/src/content/docs/developer-guide/Contributing.md +++ b/docs/src/content/docs/developer-guide/Contributing.md @@ -32,6 +32,7 @@ Guidelines: - If you change API types in `cdk/src/handlers/shared/types.ts`, update `cli/src/types.ts` to match. - If you change docs sources (`docs/guides/`, `docs/design/`), run `mise //docs:sync` so generated content stays in sync. - For significant features, add a design document to `docs/design/`. +- For cross-cutting or hard-to-reverse decisions, add an ADR to `docs/decisions/` (see [ADR README](/architecture/readme)). ### 4. Commit