diff --git a/.agents/skills/writing-tech-post/SKILL.md b/.agents/skills/writing-tech-post/SKILL.md new file mode 100644 index 000000000..629beec79 --- /dev/null +++ b/.agents/skills/writing-tech-post/SKILL.md @@ -0,0 +1,229 @@ +--- +name: writing-tech-post +description: >- + Authors engineering blog posts end-to-end: launch deep-dives, incident + postmortems, architecture migrations, performance case studies, tutorials, + AI/agent system writeups, security disclosures, and research-to-product + translations. Picks the correct archetype, plans the abstraction ladder, + enforces an evidence cadence (diagrams, benchmarks, profiles, traces, code, + ablations), tunes voice against publisher house styles (Datadog, Vercel, + GitHub, AWS, Meta, Cloudflare, Jane Street), and runs a pre-publish gate for + narrative momentum and disclosure ethics. Use when drafting a new engineering + post, restructuring a draft that feels flat, deciding which evidence form + belongs where, validating that depth and product context are balanced, or + preparing a postmortem, migration, or performance narrative for external + publication. Do not use for API reference documentation, README authoring, + marketing copy, release notes, generic SEO content, ghost-written executive + thought leadership, or non-engineering long-form essays. +metadata: + author: Pedro Nauck + github: https://github.com/pedronauck + repository: https://github.com/pedronauck/skills +--- +# Writing Engineering Posts + +SOTA authoring loop for technical blog posts. The philosophy is **archetype-first** (pick the genre), **depth-second** (commit the abstraction ladder), **evidence-third** (every claim attaches to an artifact), **voice & disclosure fourth** (publisher register + ethics), **momentum last** (narrative spine, lede, closer). Inline content in this SKILL.md is a dispatcher; the contract lives in `references/`. + +## Required Reading Router + +Match the post's archetype (or the active phase) to the row. Read the listed files **in full before** producing the corresponding output. Inline content in this SKILL.md is a pointer, not a substitute. + +| Task / Archetype | MUST read | +| --------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------ | +| Selecting an archetype (Phase 1, every post) | `references/archetypes-and-structure.md` | +| Planning depth / abstraction ladder (Phase 2, every post) | `references/depth-and-abstraction.md` | +| Writing an incident postmortem | `references/postmortems.md` + `references/voice-and-disclosure.md` | +| Writing an architecture migration narrative | `references/migrations.md` + `references/depth-and-abstraction.md` | +| Writing a performance deep-dive | `references/performance-deep-dive.md` + `references/evidence-diagrams-code.md` | +| Writing an AI/agent system post | `references/ai-and-agents.md` + `references/evidence-diagrams-code.md` | +| Writing a security or reliability post | `references/security-and-reliability.md` + `references/voice-and-disclosure.md` | +| Writing a tutorial or research-to-product translation | `references/archetypes-and-structure.md` + `references/depth-and-abstraction.md` | +| Placing diagrams, charts, code, tables, ablations (Phase 3) | `references/evidence-diagrams-code.md` | +| Tuning to a publisher house voice (Phase 4) | `references/voice-and-disclosure.md` + `references/publisher-voice-matrix.md` | +| Tightening lede, headlines, transitions, closer (Phase 5) | `references/narrative-and-pacing.md` | +| Pre-publish gate (Phase 6) | `references/pre-publish-checklist.md` + `references/anti-patterns.md` | + +## Reference Index + +- `references/archetypes-and-structure.md` — Eight canonical archetypes (launch deep-dive, postmortem, migration, performance, tutorial, research-to-product translation, AI/agent, security/reliability) with their 9-column contract (opening / sequence / closing / length / byline / evidence / voice / hybrid partner / silent failure mode) plus the 9-question decision tree and hybrid-disclosure rules. +- `references/depth-and-abstraction.md` — Five-rung abstraction ladder (R1 user → R5 measurement) with three traversal patterns (staircase, yo-yo, spiral), the rung-whiplash diagnostic, the anchor-N rule, and per-archetype default depth profiles. +- `references/postmortems.md` — Canonical postmortem section sequence (Summary → Background → Incident → Timeline → Contributing factors → Mitigation → Action items), blameless register three rules, UTC-timeline + named-artifact obligations, failed-mitigation requirement. +- `references/migrations.md` — Five migration sub-types (rewrite, decomposition, stack replacement, pipeline replacement, component modernization, dual-stack, maturity-level), seven-panel canonical structure, empathetic strategic voice, "what we'd do differently" honesty, dated-status-snapshot closing. +- `references/performance-deep-dive.md` — Detective-arc structure (hypothesis → measurement → reveal → next bottleneck), distribution-shift evidence contract (percentile + sample size + window + environment), iterative bottleneck-peeling, partial-victory disclosure between fixes. +- `references/ai-and-agents.md` — Paper-link-first attribution, named-benchmark + ablation evidence contract, named-checker idiom (debugging-agent + leakage-checker + usage-checker), "AI handles the long tail" closing motif and its slop variant (bolted-on AI-future paragraphs). +- `references/security-and-reliability.md` — Threat-model opening, layered-defense walkthrough, coordinated-disclosure four-panel (threat-model → background → defense → mitigations), probabilistic register for adversary capabilities, CVE + upstream-commit citation contract. +- `references/evidence-diagrams-code.md` — Twelve-form evidence taxonomy (architecture diagram, sequence/flow, timeline, dashboard, distribution chart, flamegraph/profile, trace view, results table, code listing, config/manifest, ablation matrix, benchmark plot), captioning conventions (caption states the finding), `claim → artifact → reading` cadence rule, code-length thresholds (1–15 / 16–30 / >30). +- `references/voice-and-disclosure.md` — Blameless register, coordinated-disclosure four-panel, paper-link-first attribution, "what we'd do differently" honesty pattern, vendor-naming conventions, charity toward predecessor systems, disclaimer-paragraph genre signal. +- `references/narrative-and-pacing.md` — Five-lede taxonomy (result-first / mystery / stakes-first / 3-W summary / paper-link-first / shipping-status), H2-as-question-resolution discipline, story-shape catalogue (detective / migration / blameless / paper-link-first / tutorial arcs), closer multiple-choice gate, first-200 + last-200 callback-coupling test. +- `references/publisher-voice-matrix.md` — Cross-publisher matrix (Datadog, Vercel, GitHub, AWS, Meta, Cloudflare, Jane Street) by six surface features (byline weight, sentence length, vendor density, evidence reflex, opening register, closing register), plus banned moves per publisher. +- `references/anti-patterns.md` — Banned moves catalogue: buried lede, rung whiplash, evidence-free percent claims, blame-by-implication, code-without-context, depth-without-product-framing, paper-name-dropping, false-precision metrics, hedged ledes, exciting-announcement templates, coordinated-disclosure violations, AI-eval-as-anecdote, archetype-bait headlines, framework-without-instance, bolted-on AI-future close. +- `references/pre-publish-checklist.md` — Archetype-conditional checklist (postmortem-only / migration-only / performance-only / AI-only / security-only rows), disclosure-blocker list, lint-thresholds, publishable / hold-for-review / rework rubric. + +## Bundled Path Rule + +Resolve every bundled helper relative to the directory containing this `SKILL.md`. When the command below appears as `scripts/`, treat the actual invocation as `/scripts/` — expand `` to the absolute skill directory before running. Likewise expand `assets/` and `references/` to the bundled paths under the skill directory. + +## Operating Loop (6 Phases) + +Each phase ends in a STOP directive. The inline content in each phase is a gist tripwire — enough to detect violations during scanning — not the contract. The reference file holds the contract. + +### Phase 1 — Archetype Selection & Audience Framing + +1. Identify the artifact's purpose against the eight archetypes (launch deep-dive, incident postmortem, architecture migration, performance case study, developer tutorial, research-to-product translation, AI/agent system writeup, security/reliability post). When the post straddles two, name the primary as load-bearing and the secondary as the absorbed archetype; the primary's contract wins. +2. Name the audience by abstraction rung target (`product-user` / `engineer-adopter` / `peer-engineer-deep` / `infra-or-research-peer`). The lede must speak to that rung. +3. Capture archetype + audience in the draft's frontmatter before any prose. Treat this commitment as a contract — every later phase derives its rules from it. + +Gist tripwires (the moves the archetype demands, in one line each): + +- Postmortems open with **date + scope + impact in the first two sentences**, never internal cause. +- Migrations open with **empathetic legacy framing + why-now**, never contemptuous-of-predecessor. +- Performance posts open with **felt experience + stakes**, never a headline number. +- Launches open with **scale-then-headline-number** and land the number above the first H2. +- AI/agent posts open with **one-paragraph capability claim + paper/repo link in the first scroll**. +- Security posts open with **threat-model + adversary capability**, never the fix. + +**STOP. Read `references/archetypes-and-structure.md` in full before drafting the outline.** That file holds the 9-column contract per archetype (opening / sequence / closing / length / byline / evidence / voice / hybrid partner / silent failure mode), the 9-question decision tree, the archetype-straddle rules, and the per-archetype silent failure modes. The tripwires above are pointers, not the contract. + +### Phase 2 — Outline & Depth Planning + +1. Sketch every section and tag it with its abstraction rung: **R1 (product / user experience) → R2 (system shape) → R3 (component design) → R4 (mechanism / algorithm) → R5 (measurement / proof)**. +2. Commit a four-tuple `(opening rung, body residency band, closing rung, traversal)`. Pick the traversal: *staircase* (R1→R5 monotonic), *yo-yo* (R3↔R5 oscillation around a probe), or *spiral* (R1→R3→R1→R4→R1→R5, restating product context between depth dives). +3. Load the archetype's outline skeleton from `assets/outline..md`. Mark sections that the archetype requires versus optional. +4. Apply the **anchor-N rule**: if more than four consecutive R4/R5 paragraphs appear without surfacing to R2/R3, insert an anchor paragraph or open a new section. + +Gist tripwires: + +- Every section gets a rung tag (R1–R5); jumping >2 rungs without an anchor is **rung whiplash**. +- Tutorials live at R1–R3; deep-dives live at R3–R5; both still need an **R1 anchor in the lede**. +- Vendor names appear at **R5 (implementation)**, never at R1/R2 (framing). + +**STOP. Read `references/depth-and-abstraction.md` in full before fixing the outline.** That file holds the rung definitions with corpus examples, the three traversal patterns with worked walkthroughs, the rung-whiplash diagnostic, and the per-archetype default depth profile. **Also STOP and read the archetype's deep-dive reference file** (`references/postmortems.md`, `migrations.md`, `performance-deep-dive.md`, `ai-and-agents.md`, or `security-and-reliability.md`) when the archetype matches one of those five. The outline skeleton in `assets/` is a starting point, not a contract. + +### Phase 3 — Draft with Evidence Cadence + +1. For each outlined section, pre-declare the evidence form it will carry (one of: architecture diagram, sequence/flow diagram, timeline, dashboard screenshot, distribution/percentile plot, flamegraph/profile, trace/span view, table-of-results, code listing, config/manifest, ablation matrix, named-benchmark plot). Sections without evidence stay prose-only; flag them explicitly. +2. Enforce the **`claim → artifact → reading` cadence**: every artifact is preceded by a prose claim and followed by a prose reading. Captions state the *finding*, never the artifact name. +3. Apply the per-archetype mandatory evidence-form set (postmortems require UTC timeline + named-artifact root cause; migrations require phase-completion dates + per-phase tracking metrics + named cutover safety mechanism; performance requires distribution-shift charts not means; AI requires named benchmark + ablation + named guardrails; security requires CVE + upstream commit + coordinated-disclosure timeline). +4. Write code blocks at the right rung: tutorial code is runnable end-to-end with prerequisites stated; deep-dive code is the **minimal disclosing slice with elision markers** (`// ...` patterns and a one-line note of what was cut). Warn when a snippet crosses 30 lines — split with prose, elide, or replace with a source link plus a load-bearing slice. + +Gist tripwires: + +- Every figure caption states the **finding**, not the artifact name. Captions starting with "Figure showing…" or "Diagram of…" are rejected. +- Every percent claim points to a chart, table, or named benchmark. Mean-only performance charts are rejected. +- Code blocks declare their **elision marker** and what was cut. + +**STOP. Read `references/evidence-diagrams-code.md` in full before placing any chart, diagram, code block, or table.** That file holds the twelve-form evidence taxonomy with selection guide, the captioning conventions, the prose↔evidence cadence rule, the elision-marker code standard, the distribution-shift evidence pattern (mandatory for any performance claim), the named-benchmark + ablation contract (mandatory for AI/agent capability claims), and the alt-text-as-prose discipline. + +### Phase 4 — Voice & Disclosure Pass + +1. Pick the publisher house voice the post targets — Datadog (systems-pragmatic), Vercel (product-tight), GitHub (team-narrative), AWS (deliberate-measured), Meta (cross-organisational), Cloudflare (technical-confident), or Jane Street (precise-academic). Rewrite the lede and closer against the chosen register. +2. Apply the disclosure layer matching the archetype: + - **Postmortems** → blameless register (system-subject sentences; polarity-correct passive/active; first-person plural ownership). Engineers' names never appear in causality. + - **Security posts** → coordinated-disclosure four-panel (threat-model opening → background → layered-defense walkthrough → mitigations close). Probabilistic register for adversary capabilities. + - **AI/agent posts** → paper-link-first attribution. Named benchmarks. Capability claims declarative, limitations hedged conditional. + - **Migrations** → empathetic strategic voice; "what we'd do differently" honesty paragraph. +3. Apply vendor-naming conventions: partners named verbatim with shared responsibility; own products named as instruments; competitors named specifically when load-bearing; redaction reasons disclosed when scope-limited. + +Gist tripwires: + +- Active third-person team voice. **"We" or "the system" — never "you" except in tutorials.** +- Postmortems use the **blameless register**; no person's name attached to causation. +- AI posts cite the **paper before claiming the result**. +- Migrations include **"what we'd do differently"**. Triumphal language ("successfully", "smoothly", "without issue") at any density is a regression. + +**STOP. Read `references/voice-and-disclosure.md` in full before the voice pass.** Disclosure ethics, blameless register rules, paper-link-first attribution conventions, coordinated-disclosure constraints, and vendor-naming conventions live there. **STOP and additionally read `references/publisher-voice-matrix.md`** when targeting a specific publisher voice; that file holds the seven-publisher matrix (byline weight, sentence length, vendor density, evidence reflex, opening register, closing register) and the per-publisher banned moves. + +### Phase 5 — Narrative & Momentum Pass + +1. Audit the lede against the five-lede taxonomy. It must promise either a *result* (numbers/outcome), a *mystery* (the unexplained behaviour), a *stakes frame* (what was at risk), a *3-W summary* (postmortem), or a *paper-link-first* attribution (AI). Hedged "we want to share some thoughts on…" ledes are rejected. +2. Audit the H2 chain against the **H2-as-question-resolution** discipline. Each H2 must answer the question the previous section left open. List the open questions; if any H2 starts without resolving one, restructure. +3. Apply the archetype-specific story arc: + - **Detective-arc** (performance): hypothesis → measurement → reveal → next bottleneck. Require partial-victory paragraphs between fixes. + - **Migration-arc**: why now → what stayed fixed → what changed → cutover → results → what we'd do differently. + - **Blameless-arc** (postmortem): impact → timeline → causality trace → recovery → prevention. Require at least one failed-mitigation paragraph. + - **Paper-link-first arc** (AI/agent): paper → gap → system → eval → ablation → limits. + - **Tutorial-arc**: prerequisites → primer → stepwise → verify → pitfalls → next. +4. Run the **first-200 + last-200 callback-coupling check**: extract the first 200 words and the last 200 words; pair them. If they do not describe the same thread, the post has lost its spine — restructure. + +Gist tripwires: + +- The lede promises a **result, mystery, stakes, 3-W summary, or paper-link-first**. Not "we want to share." +- Each H2 **resolves a question** the previous section opened. +- The **first 200 and last 200 words** must thread back to each other. + +**STOP. Read `references/narrative-and-pacing.md` in full before finalizing the lede, headlines, transitions, and closer.** That file holds the five lede types with archetype pairings, the H2-as-question-resolution rule with worked examples, the story-shape catalogue, the momentum-stall diagnostics, the closer multiple-choice gate (call-to-build / call-to-adopt / open-question / shipping-status-roadmap / prevention-list), and the headline-as-compressed-lede check. + +### Phase 6 — Pre-publish Gate + +1. Run `python3 /scripts/lint-post.py ` (read-only). It scans for slop signatures: triumphal-vocabulary density, hedged-lede patterns, uncaptioned figures, evidence-free percent claims, blame language inside a postmortem context, missing rung tags in outline comments, code blocks over 30 lines without elision markers. +2. Walk the archetype-conditional checklist in `references/pre-publish-checklist.md` row-by-row against the draft. Each row is a hard gate; warnings are not optional. +3. Run the **headline-as-compressed-lede check**: compare the title's commitment (number, decision, surprise, system-name, stakes) against the body's evidence load. A "60x" title with no 60x evidence is **archetype-bait**; reject it. +4. Run the **closer multiple-choice gate**: the closer must be one of the five forward-pointing shapes (call-to-build / call-to-adopt / open-question / shipping-status-roadmap / prevention-list). Summarising closers are a regression. + +Gist tripwires: + +- No uncaptioned figures, no evidence-free percentages, no blame language in postmortems. +- Disclosure layer matches the archetype (blameless / threat-model / paper-link-first / coordinated-disclosure / what-we'd-do-differently). +- Headline does not promise an archetype the body does not deliver. + +**STOP. Read `references/pre-publish-checklist.md` in full before declaring the post publishable.** That file holds the archetype-conditional checklist (postmortem-only / migration-only / performance-only / AI-only / security-only rows), the disclosure-blocker list, the lint-threshold rationale, and the publishable / hold-for-review / rework rubric. **STOP and additionally read `references/anti-patterns.md`** to confirm no banned move slipped in. + +## Anti-Patterns (gist tripwires) + +The full elaborated list with examples lives in `references/anti-patterns.md`. The list below catches the most common slop during scanning. + +- **Buried lede.** Three paragraphs of context before the result, mystery, or stakes. +- **Rung whiplash.** Jumping R1→R5 mid-section with no restated product anchor. +- **Evidence-free percent claims.** "30% faster" with no chart, table, or methodology block. +- **Blame-by-implication in postmortems.** Naming a team, a person, or "the on-call engineer." +- **Code without context.** A 60-line listing with no surrounding prose claim and no elision marker. +- **Depth without product framing.** A deep-dive that never returns to R1 — peer-impressive, user-illegible. +- **Paper-name-dropping.** Citing a paper title without summarising its contribution or linking it. +- **False-precision metrics.** "p99 = 47.3ms" with no measurement window, sample size, or environment. +- **Hedged ledes.** "We wanted to share some thoughts on…", "In this post we'll explore…". +- **Exciting-announcement template.** "We are excited to announce that we are excited to announce." +- **Coordinated-disclosure violations.** Naming the vulnerable version before mitigation deployment. +- **AI-eval-as-anecdote.** "It works great in our tests" without a named benchmark or ablation. +- **Archetype-bait headlines.** Title promises archetype A; body delivers archetype B. +- **Framework-without-instance.** Migration post that names a maturity model but never populates the publisher's own trajectory through it. +- **Bolted-on AI-future close.** Generic "AI handles the long tail" paragraph on a non-AI post. + +**STOP. Read `references/anti-patterns.md` in full before any review pass.** The bullets above are tripwires, not the full catalogue. + +## Error Handling + +- **Unknown or hybrid archetype.** When the artifact does not match exactly one of the eight archetypes, declare it as `hybrid: +` and follow the primary's structural contract. Add a one-sentence note in the post's frontmatter declaring the hybridisation. Do not invent a ninth archetype. +- **No measurable data for a performance claim.** When a performance claim cannot be backed by a chart or table, either (a) downgrade the claim to qualitative language ("noticeably faster on cold start") or (b) cut the claim. Never inflate with placeholder numbers. The post may still be publishable as a launch deep-dive or migration narrative — but **not** as a performance deep-dive. +- **Security disclosure constraints.** When CVE coordination, embargo, or legal review limits what may be disclosed, generate the post with explicit `[REDACTED: pending disclosure]` markers and produce a "what can be shared now / what comes later" addendum. Do not publish until `references/security-and-reliability.md`'s disclosure-blocker checklist clears. +- **Multi-author conflicts.** When two contributors disagree on archetype framing, generate two outlines side-by-side (not two drafts) and surface the disagreement as a decision request. Do not silently pick a side or average them. Tag the diff: archetype-level / rung-level / voice-level / evidence-level. +- **No engineering depth available.** When the underlying work is not engineering-substantive (pure rebrand, OKR recap, executive vision), reject the request and redirect to marketing or thought-leadership channels. This skill does not produce non-engineering long-form. +- **Postmortem before remediation closure.** When remediations are not yet shipped, generate the postmortem with a "what we're doing next" section that names the remediations as in-flight with owners and ETAs. Do not omit the remediations section. +- **AI/agent post with no public benchmark.** When no named benchmark applies, build an ablation matrix against the system's own prior version and label it explicitly as "internal ablation, no external benchmark." Do not name-drop a benchmark that was not run. +- **Lint failure from `scripts/lint-post.py`.** When the lint script flags slop signatures, treat each as a blocker, not a warning. The pre-publish gate does not pass with open lint findings. +- **Metadata validation failure (`scripts/validate-metadata.py`).** Re-read the failure (NAME ERROR / DESCRIPTION ERROR / STYLE WARNING) and rewrite the offending field. Do not bypass the validator. + +## When NOT To Use + +- API reference documentation — use Diátaxis reference patterns instead. +- README authoring — use `crafting-effective-readmes`. +- Marketing copy, landing pages, pricing pages — use `copywriting`. +- Release notes, changelogs. +- Generic SEO content with no engineering substance. +- Ghost-written executive thought leadership — sibling genre, redirect elsewhere. +- Non-engineering long-form essays — strategic essays (Stripe-style) are a sibling genre; label honestly rather than disguise. +- Internal-only design documents — use `creating-spec`. + +## Verification + +Before declaring the skill output publishable, confirm: + +1. **Archetype committed.** The draft's frontmatter names a primary archetype (and an absorbed archetype when hybrid). +2. **Depth four-tuple committed.** `(opening rung, body residency band, closing rung, traversal)` is recorded. +3. **Outline gate passed.** Every H2 resolves the previous section's open question. +4. **Evidence cadence honoured.** Every artifact is preceded by a claim and followed by a reading; every percent claim attaches to an artifact. +5. **Voice and disclosure pass run.** Publisher register applied; archetype-specific disclosure layer present. +6. **First-200 + last-200 callback coupling.** Threaded back to each other. +7. **Lint clean.** `scripts/lint-post.py` exits 0; no open findings. +8. **Pre-publish checklist passed.** Archetype-conditional gate cleared. + +Each verification step has a corresponding gate in `references/pre-publish-checklist.md`. The skill is not done until every gate clears. diff --git a/.agents/skills/writing-tech-post/assets/evidence-block.snippets.md b/.agents/skills/writing-tech-post/assets/evidence-block.snippets.md new file mode 100644 index 000000000..facb20aaf --- /dev/null +++ b/.agents/skills/writing-tech-post/assets/evidence-block.snippets.md @@ -0,0 +1,135 @@ +# Evidence-Block Snippets + +Reusable captioned-evidence templates. Each snippet honours the `claim → artifact → reading` triple and the captioning conventions. + +## Figure with finding caption + +```markdown +[Claim sentence that sets up what the reader should look for.] + +![Alt text written as prose reconstructing the diagram's claim — e.g., "A line graph shows the count of instance conntrack entries over time for different node types, with the spike at 14:32 UTC corresponding to the deployment window."](path/to/figure.png) + +*Figure N: [Finding stated declaratively — "Latency dropped from p99 1s to p99 100ms after the cache rollout." Never "Figure showing latency." Never "Diagram of the cache."]* + +[Reading sentence interpreting what the reader has just seen.] +``` + +## Distribution-shift chart (performance) + +```markdown +[Claim — what the change was supposed to do.] + +![Alt text describing the bucket boundaries and the visible shift across versions.](path/to/distribution.png) + +*Figure N: p99 latency distribution before (top) and after (bottom) the cache rollout, measured on m1 MacBook Pro with 4x slowdown across 12,500 navigations during the 2026-04-08 to 2026-04-15 rollout window.* + +[Reading — what the shift confirms or refutes.] +``` + +## Before/after metrics table (performance closer) + +```markdown +| Metric | Before | After | Delta | +|--------|--------|-------|-------| +| Total lines of code | 2,800 | 2,000 | -27% | +| Unique component types | 19 | 10 | -47% | +| Components rendered | ~183,504 | ~50,004 | -73% | +| DOM nodes | ~200,000 | ~180,000 | -10% | +| Memory | 150-250 MB | 80-120 MB | -50% | +| INP (large PR) | 450 ms | 100 ms | -78% | + +*Table caption: Evaluated on a pull request using a split-diff setting with 10,000 line changes on m1 MacBook Pro with 4x slowdown.* +``` + +## Architecture diagram with scope contract + +```markdown +*The diagram below outlines the high-level architecture of [SYSTEM]. Anything outside the dashed box is out of scope for this post.* + +![Alt text walking the named components and their relationships in prose.](path/to/architecture.png) + +*Figure N: High-level overview of [SYSTEM] node. [Component A] receives [DATA] from [SOURCE]; [Component B] forwards [PROCESSED DATA] to [DESTINATION]; [Component C] persists [STATE] in [STORE].* +``` + +## Sequence diagram (timing-sensitive) + +```markdown +[Claim — why the order matters.] + +![Alt text describing the message flow chronologically as a step list.](path/to/sequence.png) + +*Figure N: Sequence diagram for the [INITIAL / FAILURE / RECOVERY] design. Top-to-bottom denotes time. Failure handoff occurs at step 4 when [COMPONENT] times out.* + +[Reading — how this ordering surfaces the bug or the fix.] +``` + +## UTC-timeline table (postmortem) + +```markdown +## Timeline + +| Time (UTC) | Event | +|--------------|-------| +| 2024-11-12 09:08 | Automated upgrade was triggered. | +| 2024-11-12 09:11 | First customer-facing 5xx surfaced in [REGION]. | +| 2024-11-12 09:14 | On-call paged; investigation began. | +| 2024-11-12 09:23 | First mitigation attempted (task count scaling). It did not mitigate the issue. | +| 2024-11-12 09:41 | Second mitigation deployed (CDN block + traffic shift). | +| 2024-11-12 10:00 | canva.com fully recovered. | +``` + +## Code listing with elision marker + +```markdown +[Claim — what the snippet illustrates.] + +```rust +// Excerpt from src/router.rs lines 145–162 (see GitHub PR #482) +pub fn route_request(req: Request) -> Response { + let target = lookup_target(&req.host)?; + // ... validation and authorization checks elided ... + let response = forward(target, req).await?; + record_metric(&response); + response +} +``` + +[Reading — what to notice in the code; what the elided section does.] +``` + +## Named-benchmark result table (AI/agent) + +```markdown +| Method | MLE-Bench-Lite (Kaggle) | Finance-Agent | PlanCraft | +|--------|--------------------------|---------------|-----------| +| AIDE (baseline) | 25.8% | 41.2% | 28.7% | +| **MLE-STAR** | **63.6%** | **74.3%** | 8.4% (regression) | + +*Table caption: Medal rate on MLE-Bench-Lite across 100 Kaggle competitions; baseline AIDE evaluated on the same competition set. Note: MLE-STAR regresses on PlanCraft (sequential tasks); ablation in §[In-depth analysis] decomposes the parallelisable vs sequential gain.* +``` + +## Failed-mitigation paragraph (postmortem) + +```markdown +We attempted to work around this issue by significantly increasing the desired task count manually. Unfortunately, it didn't mitigate the issue, and additional tasks ended up immediately failing health checks once they came up. + +[Next mitigation paragraph naming what we tried next and why.] +``` + +## "What we'd do differently" paragraph (migration) + +```markdown +In retrospect, we'd underestimated the complexity of [SPECIFIC SUB-PROBLEM]. The dual-stack approach surfaced thousands of duplicate symbol errors that required modifying hundreds of thousands of lines across thousands of files. If we were starting over, we would [SPECIFIC CHANGE WE'D MAKE]. +``` + +## Verbatim partner quote (postmortem with vendor responsibility) + +```markdown +Cloudflare provided the following statement regarding the contributing factor on their side: + +> [Verbatim quote from the partner, blockquoted, with attribution to a named role or team.] +> +> — [Named team / role at partner company] + +We've been working closely with [PARTNER] to gain an in-depth understanding of the contributing factor. On our side, we underestimated [PUBLISHER-SIDE FACTOR]. +``` diff --git a/.agents/skills/writing-tech-post/assets/frontmatter.template.md b/.agents/skills/writing-tech-post/assets/frontmatter.template.md new file mode 100644 index 000000000..cc145c7b4 --- /dev/null +++ b/.agents/skills/writing-tech-post/assets/frontmatter.template.md @@ -0,0 +1,67 @@ +# Frontmatter template + +Publish-time metadata block. Drop this at the top of any draft authored with `writing-tech-post`. The skill validates the draft against the values committed here. + +```yaml +--- +title: [POST TITLE — should match the headline conventions in narrative-and-pacing.md] +slug: [kebab-case-slug] +authors: + - name: [Author 1] + role: [Engineer / Staff Engineer / Principal SRE] + - name: [Author 2] + role: [...] +acknowledgements: [Optional — for cross-team contributions or external researchers] + +# Archetype commitment (gates every later phase) +archetype: + primary: [launch | postmortem | migration | performance | tutorial | research-translation | ai-agent | security] + absorbed: [optional secondary archetype if hybrid] + hybrid-note: [one sentence if hybrid — e.g., "launch + migration: the lineage section is structural ornament; the launch contract is load-bearing"] + +# Audience commitment +audience: + rung-target: [product-user | engineer-adopter | peer-engineer-deep | infra-or-research-peer] + +# Depth four-tuple (commit before drafting prose) +depth-tuple: + opening-rung: [R1 | R2 | R3 | R4 | R5] + body-residency: [e.g., "R3 → R4 → R5 with R3 re-measurement"] + closing-rung: [R1 | R2 | R3 | R4 | R5] + traversal: [staircase | yo-yo | spiral | anchor-and-dive | sidebar-interlude | braided] + +# Publisher voice target +voice: + publisher: [datadog | vercel | github | aws | meta | cloudflare | jane-street | canva | docker | slack | tailscale | other] + register: [systems-pragmatic | product-tight | team-narrative | deliberate-measured | cross-organisational | technical-confident | precise-academic] + +# Length budget +length-band: + estimated-words: [number] + archetype-band: [e.g., "5,000–8,000 for launch deep-dive"] + +# Evidence forms declared upfront +evidence-forms: + - [architecture-diagram | sequence-diagram | flowchart | data-flow-diagram | before-after-migration-diagram | code-snippet | shell-session | assembly | chart | distribution-chart | table | screenshot | embedded-quote | named-benchmark-table | ablation-matrix | agent-trace | role-graph | knowledge-pyramid | eval-harness-evolution | structured-output-schema | alert-screenshot] + +# Disclosure layer (per archetype) +disclosure: + blameless-register: [required-for-postmortem | not-applicable] + coordinated-disclosure: [required-for-cve-response | not-applicable] + paper-link-first: [required-for-ai-agent-or-research | not-applicable] + what-wed-do-differently: [required-for-migration-or-retrospective | not-applicable] + vendor-naming-discipline: [audit-required] + +# Closing register +closer: + shape: [call-to-build | call-to-adopt | open-question | shipping-status-roadmap | prevention-list | distribution-chart-close] + +# Pre-publish gate status +status: [drafting | outline-review | evidence-pass | voice-pass | narrative-pass | pre-publish-gate | publishable | hold-for-review | rework] + +# Optional: SEO / metadata +meta: + description: [≤160 char description — first 200 words of the lede compressed] + tags: [optional list] +--- +``` diff --git a/.agents/skills/writing-tech-post/assets/outline.ai-agent-post.md b/.agents/skills/writing-tech-post/assets/outline.ai-agent-post.md new file mode 100644 index 000000000..e44dacea9 --- /dev/null +++ b/.agents/skills/writing-tech-post/assets/outline.ai-agent-post.md @@ -0,0 +1,92 @@ +# [AI/AGENT TITLE — "[SYSTEM-NAME]: A [CAPABILITY] agent" or "How we built [SYSTEM]"] + + + +## Quick Links + + + +- **[Paper]** — [arXiv link] +- **[Repository]** — [GitHub link] + +## [Capability claim + task framing] + + + +[One-paragraph capability claim — what the agent does, the named task, why it matters.] + +In our recent [paper](LINK), we introduce [SYSTEM NAME] — [LOAD-BEARING SUMMARY]. + +## Product context + + + +[Workload context: scale, fleet size, engineer-hours saved or capacity unlocked.] + +## System architecture + + + +[Architecture diagram with named agent personas (e.g., Director / Expert / Critic), tools, MCP integrations.] + +[Each persona's role; each tool's contract.] + +## Evaluation setup + + + +[Cited benchmark — public (MLE-Bench-Lite, BrowseComp-Plus, Finance-Agent, PlanCraft, Workbench, SWE-Bench) or internal with documented composition.] + +| Method | [Benchmark 1] | [Benchmark 2] | +|--------|---------------|---------------| +| [Baseline] | [X] | [Y] | +| **[SYSTEM NAME]** | **[X']** | **[Y']** | + +[Methodology disclosure — what the eval harness does, what counts as success, what was held back as ground truth.] + +## In-depth analysis (ablation) + + + +### Component 1: [Named contribution] + +[How much of the headline number this contributed. Include a chart or table.] + +### Component 2: [Named contribution] + +[...] + +### Negative finding (if any) + +[Honest disclosure of a regression or a domain where the system underperforms. *"+81% on parallelizable tasks (Finance-Agent), −70% on sequential tasks (PlanCraft)"* style.] + +## Guardrails (named checkers) + + + +- **[Checker 1]** — [Role; detection contract.] +- **[Checker 2]** — [Role.] +- **[Checker 3]** — [Role.] + +## Failure modes + + + +[LLM-generated [OUTPUT] carries the risk of [FAILURE MODE].] [In our evaluation, [SYSTEM] would sometimes [INCORRECT BEHAVIOUR].] + +## What's next + + + +[Engineers who previously [TIME-CONSUMING TASK] now [RECOVERED TIME — review AI-generated analyses in minutes].] + +[Forward work — at least two specific applications, not generic "AI handles the long tail" unless the work is genuinely on the roadmap.] + +## Open-source / availability + +[Repo link + how to try the system.] diff --git a/.agents/skills/writing-tech-post/assets/outline.launch-deep-dive.md b/.agents/skills/writing-tech-post/assets/outline.launch-deep-dive.md new file mode 100644 index 000000000..3f01246b7 --- /dev/null +++ b/.agents/skills/writing-tech-post/assets/outline.launch-deep-dive.md @@ -0,0 +1,60 @@ +# [LAUNCH TITLE — system-name introduction or number-first framing] + + + +## [Scale-then-headline-number opening] + + + +As [COMPANY] continues to scale, [VOLUME / COMPLEXITY / CARDINALITY] of [METRIC] steadily [GROWS]. + +Today we're sharing [SYSTEM NAME] — [ONE-PARAGRAPH CAPABILITY CLAIM]. The headline result: [60x] increase in [METRIC] and [5x] [METRIC] at peak scale. + +## The lineage that brought us here (optional — launch+migration hybrid) + + + +[Gen 1 → Gen N narrative compressed into a section that lands the reader at the new system.] + +## Overview of [SYSTEM NAME] + + + +[High-level architecture diagram. Named subsystems that recur in the rest of the post.] + +## [Component 1] — [name + role] + + + +[How the component works; what it replaces or augments; named tooling.] + +## [Component 2] — [name + role] + +[...] + +## [Component N] — [name + role] + +[...] + +## Results + + + +[Concrete production numbers: ingestion rate, query latency, cost per request, fleet size deployed against.] + +## Looking ahead + + + +- [Specific next-direction commitment 1] +- [Specific next-direction commitment 2] +- [Specific next-direction commitment 3] + +## Acknowledgements + +[Named contributors across teams.] diff --git a/.agents/skills/writing-tech-post/assets/outline.migration.md b/.agents/skills/writing-tech-post/assets/outline.migration.md new file mode 100644 index 000000000..90dcac7e9 --- /dev/null +++ b/.agents/skills/writing-tech-post/assets/outline.migration.md @@ -0,0 +1,78 @@ +# [MIGRATION TITLE — decision-narrative or system-name framing] + + + +## [Why we relied on the predecessor — empathetic legacy framing] + + + +Like many other organizations, [COMPANY] has long relied on [PREDECESSOR]. This pattern is pervasive because [WHY IT WORKED WELL]. But eventually, the trade-offs started to pile up. + +## Why now + + + +[The specific trigger — scaling ceiling / security pressure / cost trajectory / organisational restructuring.] + +## The before state + + + +[Architecture diagram before. Quantified scope (e.g., 700+ jobs across 8 regions).] + +## Constraints we kept fixed + + + +[The constraints that bounded the migration — backwards compatibility, SLAs, customer interfaces.] + +## The shape we picked (numbered phases) + + + +### Phase 1: [Logical separation] + +[What changed in this phase; tracking metric for completion; named safety mechanism.] + +### Phase 2: [Access reduction] + +[...] + +### Phase 3: [Physical separation] + +[...] + +## How we cut over (named safety mechanisms) + + + +[Cutover discipline — per-phase, per-region, or per-cohort. Name the rollback window and how it was validated.] + +## Results & costs + + + +[Concrete numbers: phase-completion dates, tracking metric trajectory, customer-facing impact (or absence thereof).] + +## What we'd do differently + + + +[What surprised us; what we under-estimated; what we'd skip or do earlier next time. *"thousands of duplicate symbol errors"* / *"we'd underestimated the impact"* style.] + +## Where we are now (dated status snapshot) + + + +Phase 1 started in [Q1 2024] and finished in [middle of Q1 2025]. Phase 2 started [halfway into Q1 2024] and will continue through [2026]. + +## Forward work + + + +[Two or more specific next-direction commitments.] diff --git a/.agents/skills/writing-tech-post/assets/outline.performance.md b/.agents/skills/writing-tech-post/assets/outline.performance.md new file mode 100644 index 000000000..49b37c0cc --- /dev/null +++ b/.agents/skills/writing-tech-post/assets/outline.performance.md @@ -0,0 +1,66 @@ +# [PERFORMANCE TITLE — surprise-clause or stakes/journey framing] + + + +## [Felt-experience + stakes opening — not a number] + + + +[Felt-experience claim about user pain or team operational reality.] + +[Promise the arc — name the multi-fix structure to come.] + +## Our [system] at a glance + + + +[Architecture diagram with the metric the post will move. Define the percentile bands.] + +## The first hypothesis & how it broke + + + +[Hypothesis claim.] + +[Named tool / instrument used (`pg_walinspect`, `lldb`, ENA metric ID).] + +[Distribution-shift chart showing what changed.] + +This was a slight improvement, but clearly still higher than normal. [Partial-victory paragraph re-arming the investigation.] + +## Bottleneck N — [name the bottleneck] + + + +[Hypothesis] → [Instrument] → [Distribution-shift chart] → [Partial-victory paragraph.] + +## Bottleneck N+1 — [name the next bottleneck] + +[...] + +## Final distribution shift + + + +[Final percentile distribution chart on the same axes as the baseline.] + +## What we left on the table + + + +[Specific known bottlenecks not addressed yet.] + +## Recap & lessons + + + +[Bulleted chain of fixes + operational change in monitoring/alerting.] + +[Optionally: one transferable general lesson + the tool that surfaced it.] + +[Final partial-victory cadence: *"this didn't end here"*-style forward-pointing close.] diff --git a/.agents/skills/writing-tech-post/assets/outline.postmortem.md b/.agents/skills/writing-tech-post/assets/outline.postmortem.md new file mode 100644 index 000000000..bcf06cfce --- /dev/null +++ b/.agents/skills/writing-tech-post/assets/outline.postmortem.md @@ -0,0 +1,82 @@ +# [POSTMORTEM TITLE — date + scope, no jargon, no blame] + + + +## Summary + + + +On [DATE], [COMPANY] experienced [WHAT] that affected [SCOPE]. From [START UTC] to [END UTC], [USER-VISIBLE IMPACT]. + +## Background + + + +[The system being affected; what it does; why the failure mattered.] + +## The incident + + + +[Narrative of the unfolding event.] + +## Timeline + + + +| Time (UTC) | Event | +|-----------|-------| +| [HH:MM] | [System-subject sentence describing the event.] | +| [HH:MM] | [...] | +| [HH:MM] | [Recovery milestone.] | + +## Contributing factors + + + +1. **[Factor 1]** — [System-subject description, linking to specific artifact, e.g., PR #17477 / CVE-2026-XXXXX]. +2. **[Factor 2]** — [...] +3. **[Telemetry / detection factor]** — [...] + +## Mitigation (including failed attempts) + + + +[Description of the first mitigation attempt.] + +We attempted [X]. Unfortunately, it didn't mitigate [Y]. + +[Description of the next attempt that worked.] + +## Action items + + + +### [Category 1, e.g., Incident response process improvements] + +- [Action with owner / ETA] +- [...] + +### [Category 2, e.g., Resilience improvements] + +- [Action with owner / ETA] + +### [Category 3, e.g., Collaboration with vendor] + +- [Action with owner / ETA] + +## Lessons (optional — reliability essay variant) + + + +- [Principle 1, e.g., "Always start with what's important to the end user."] +- [Principle 2] + +## Acknowledgements (optional) + +[Named cross-team contributors; partner companies; external researchers.] diff --git a/.agents/skills/writing-tech-post/assets/outline.research-translation.md b/.agents/skills/writing-tech-post/assets/outline.research-translation.md new file mode 100644 index 000000000..03d5dfeca --- /dev/null +++ b/.agents/skills/writing-tech-post/assets/outline.research-translation.md @@ -0,0 +1,73 @@ +# [RESEARCH-TRANSLATION TITLE — "[SYSTEM-NAME]: A state-of-the-art [system class]"] + + + +## Quick Links + + + +- **[Paper]** — [arXiv link] +- **[Codebase]** — [GitHub link, if available] +- **[Blog announcement]** — [if multi-post] + +## [Motivation — discipline-framed problem] + + + +[Discipline-framed problem statement. e.g., *"crafting these models remains an arduous endeavor"* — frames the contribution within the field's open challenges.] + +In our recent [paper](LINK), we introduce [SYSTEM NAME], a [DISCIPLINE-FRAMED CONTRIBUTION]. + +## Introducing [SYSTEM NAME] + + + +[Architecture / method diagram.] + +[How the system works at a glance — components, key innovations.] + +## Evaluations and results + + + +| Method | [Benchmark 1] | [Benchmark 2] | [Benchmark 3] | +|--------|---------------|---------------|---------------| +| [Baseline] | [X] | [Y] | [Z] | +| **[SYSTEM NAME]** | **[X']** | **[Y']** | **[Z']** | + +[One-paragraph interpretation of the result.] + +## In-depth analysis (ablation) + + + +### Component 1: [Named contribution] + +[How much of the headline number this component contributed.] + +### Component 2: [Named contribution] + +[...] + +### Component N: [Named guardrail / checker] + +[Distinct role and detection contract.] + +## Open-source / availability + + + +[Developers and researchers can now accelerate their [USE CASE] by using our newly released [open-source codebase](LINK).] + +## Acknowledgements + +[Named contributors not on the byline — inherited from papers.] + +--- + +*[Licensing footnote — "Intended for research purposes only; the expectation is for the user to verify that the models and other content sourced by [SYSTEM NAME] adhere to appropriate licensing restrictions."]* diff --git a/.agents/skills/writing-tech-post/assets/outline.security-disclosure.md b/.agents/skills/writing-tech-post/assets/outline.security-disclosure.md new file mode 100644 index 000000000..d5fc42d88 --- /dev/null +++ b/.agents/skills/writing-tech-post/assets/outline.security-disclosure.md @@ -0,0 +1,84 @@ +# [SECURITY TITLE — "How [COMPANY] responded to [VULNERABILITY]" or threat-model-first framing] + + + +## TL;DR (severity + scope) + + + +On [DATE], a [VULNERABILITY TYPE] vulnerability was publicly disclosed under the name [CVE-XXXX-XXXXX]. The vulnerability affected [SCOPE]. By the end of the rollout, [OUTCOME — including absence of customer impact when preparedness paid off]. + +## Threat model + + + +[Adversary capability — what an attacker can do. Use probabilistic register: "could", "might", "if X then Y".] + +[Asset at risk — what is being defended.] + +[Impact horizon — 10–15 years (PQC) / immediate (CVE response).] + +## Background — the affected system + + + +[How the affected system works. Name the specific kernel subsystem (`AF_ALG`), cryptographic algorithm (`sntrup761x25519-sha512`), or library version.] + +## How the vulnerability works + + + +[Mechanism walkthrough at the right level of abstraction.] + +A comprehensive write-up of the exploit can be found in the original [researcher disclosure post](LINK). + +## How we responded (UTC timeline) + + + +| Time (UTC) | Event | +|--------------|-------| +| [DATE TIME] | [Public disclosure event.] | +| [+N hours] | [Detection and triage.] | +| [+N hours] | [Surgical mitigation deployed (e.g., bpf-lsm program).] | +| [+N hours] | [Patched-LTS rollout begins.] | +| [+N days] | [Patched-LTS rollout completes across fleet.] | + +## Layered defense + + + +### Defense 1: [Name] + +[What it does; how it caught / prevented the issue.] + +### Defense 2: [Name] + +[...] + +## Scope and caveats + + + +This [analysis/mitigation/disclosure] only affects [SPECIFIC SCOPE] and does not impact [ADJACENT SCOPE]. [Specific FIPS / regional / product caveats.] + +## Remediation and follow-up steps + + + +- [Better visibility into kernel-API dependencies / similar.] +- [Better runtime mitigation infrastructure.] +- [Reduce attack surface — specific architectural change.] + +## Acknowledgements + +[Named external researchers + multi-engineer response team (e.g., 26 engineers + Linux upstream maintainers).] + +--- + +*[Disclaimer — "The information in this article is shared for informational purposes only and does not constitute professional, technical, or legal advice, nor does it constitute a guarantee of any particular security outcome."]* diff --git a/.agents/skills/writing-tech-post/assets/outline.tutorial.md b/.agents/skills/writing-tech-post/assets/outline.tutorial.md new file mode 100644 index 000000000..ccb32d390 --- /dev/null +++ b/.agents/skills/writing-tech-post/assets/outline.tutorial.md @@ -0,0 +1,73 @@ +# [TUTORIAL TITLE — "How to X" or "We show you how to X"] + + + +## What you'll build + + + +In this tutorial, we show you how to [BUILD/IMPLEMENT X] to achieve [QUANTIFIED RESULT]. By the end, you will have [DELIVERABLE]. + +## Prerequisites + + + +- [Account, language version, runtime] +- [Tools to install] +- [Knowledge assumed] + +## How [the system / API / runtime] works + + + +[Brief explanation of the mechanism the tutorial will use.] + +| Configuration | Value | Notes | +|---------------|-------|-------| +| [...] | [...] | [...] | + +## Step 1: [Action verb + object] + + + +[Direct address — *"Create a new Lambda project using Cargo Lambda"*.] + +```[language] +[Runnable code block, end-to-end, copy-paste safe.] +``` + +[One sentence interpreting the code.] + +## Step 2: [Action verb + object] + +[...] + +## Step 3: [Action verb + object] + +[...] + +## Verification + + + +Run [VERIFICATION COMMAND]. You should see [EXPECTED OUTPUT]. + +## Common pitfalls + + + +- **[Pitfall 1]** — [How to detect it; how to recover.] +- **[Pitfall 2]** — [...] + +## Next steps + + + +- Try [VARIATION] to explore [ADJACENT CAPABILITY]. +- Reference: [GitHub repo / docs link / further tutorial]. +- Optional results context: [Closing with launch-style number is acceptable when the tutorial is also a capability demo]. diff --git a/.agents/skills/writing-tech-post/references/ai-and-agents.md b/.agents/skills/writing-tech-post/references/ai-and-agents.md new file mode 100644 index 000000000..ade206c97 --- /dev/null +++ b/.agents/skills/writing-tech-post/references/ai-and-agents.md @@ -0,0 +1,117 @@ +# AI and Agent Posts (2026 wave) + +The AI/agent specialty surface's contract: paper-link-first attribution, named-benchmark + ablation evidence, named-checker idiom, "AI handles the long tail" close motif, and the four sub-variants (research-translation, capability-launch, tooling-platform, strategic-essay). + +## Contents + +- [Opening: capability claim + paper/repo link in first scroll](#opening-capability-claim--paperrepo-link-in-first-scroll) +- [Nine-stage canonical section sequence](#nine-stage-canonical-section-sequence) +- [Named-benchmark contract (mandatory evidence)](#named-benchmark-contract-mandatory-evidence) +- [Ablation as load-bearing credibility move](#ablation-as-load-bearing-credibility-move) +- [Named guardrails (checkers as concrete artifacts)](#named-guardrails-checkers-as-concrete-artifacts) +- [Closing move: open-source repo or "long tail" motif](#closing-move-open-source-repo-or-long-tail-motif) +- [Four sub-variants](#four-sub-variants) +- [Capability vs limitation register switch](#capability-vs-limitation-register-switch) +- [Silent failure modes (AI/agent slop)](#silent-failure-modes-aiagent-slop) + +## Opening: capability claim + paper/repo link in first scroll + +Hero image, one-paragraph capability claim, then a paper link or open-source repo link inside the first scroll. The convention signals the post participates in a research conversation even when the host blog is product-facing. + +Exemplars: + +- MLE-STAR (`074:40-54`) — capability claim followed by *"In our recent [paper], we introduce MLE-STAR, a novel ML engineering agent"* with arXiv link inside "Quick links" block above the first body paragraph. +- Datadog Bits AI eval platform (`019:46-51`) — capability claim before product context. + +**Paper-link-first attribution** is the cohort's load-bearing voice move. Pair the citation with a one-sentence summary of the load-bearing claim. Slop variant: name-drop the paper title without summarising the result. + +## Nine-stage canonical section sequence + +1. **Capability claim** — one paragraph. +2. **Paper / repo link** — first scroll. +3. **Product context** — what task the system performs. +4. **System overview diagram** — named agents, tools, MCP. +5. **Evaluation table** — named benchmarks with baseline. +6. **Ablation / "In-depth analysis" sub-section** — decompose the headline gain. +7. **Guardrails enumeration** — named checkers as concrete artifacts. +8. **Lessons** — what we learned about the eval, the system, the failure modes. +9. **"AI handles the long tail" close** — or open-source repo link. + +## Named-benchmark contract (mandatory evidence) + +Any AI/agent capability claim attaches to: + +- **Cited benchmark.** Public — MLE-Bench-Lite, BrowseComp-Plus, Finance-Agent, PlanCraft, Workbench, SWE-Bench. Or internal with documented composition — BewAIre's curated dataset of malicious + simulated + benign PRs, with weekly updates. +- **Baseline.** MLE-STAR vs AIDE (25.8% → 63.6%). Scaling-agents single-agent baseline against centralised / independent / decentralised / hybrid architectures. +- **Methodology disclosure.** Datadog's evaluation-platform regression — publishing an 11% pass-rate drop and a 35% label-count drop as deliberate short-term degradation in service of a more honest evaluation — is the corpus's standing example of how to disclose methodology shifts. +- **Evaluation slice.** Task-specific reporting: scaling-agents reports *"+81% on parallelizable tasks (Finance-Agent), −70% on sequential tasks (PlanCraft)"* — the negative finding is published as a load-bearing result, not a buried caveat. + +Capability claims without a benchmark are **proof-of-concept**, not production. The skill rejects them at the pre-publish gate. + +## Ablation as load-bearing credibility move + +The cohort's load-bearing momentum move: decompose the headline gain across components. + +MLE-STAR's "In-depth analysis" section breaks the medal-rate gain into: + +1. Model-usage shift (EfficientNet/ViT vs ResNet). +2. Human intervention (RealMLP integration). +3. Per-checker contribution (debugging agent + data-leakage checker + usage checker). + +Scaling-agents publishes a five-architecture box-plot comparison with per-architecture computational complexity, communication overhead, and coordination mechanisms — plus an error-amplification reliability chart showing 17.2× for independent multi-agent vs 4.4× for centralised. + +**Distinct obligation:** publish negative findings as load-bearing results, not buried caveats. + +**Requirement:** the skill refuses to mark an AI-cohort draft as complete until the H2 chain includes an "In-depth analysis" or "Ablation" section that names what fraction of the headline number each component contributed. + +## Named guardrails (checkers as concrete artifacts) + +Three exemplars: + +- **MLE-STAR** — *"debugging agent, data leakage checker, data usage checker."* Three named artifacts; each has a distinct role. +- **Slack security-investigation** — three-persona Director/Expert/Critic loop with the Critic as a *"weakly adversarial"* check. Knowledge-pyramid diagram showing acquired knowledge flowing up between progressively more advanced models. +- **BewAIre** — prompt engineering + recursive chunking + pattern exclusion + balanced accuracy + manual validation as named checkers with quantitative claims. + +**Slop variant:** vague "we added safety checks" without naming what each check does and what it catches. Strong posts name the checker, its role, and its detection contract. + +## Closing move: open-source repo or "long tail" motif + +Two variants: + +- **Call-to-build** — open-source repo link. MLE-STAR closes with `google/adk-samples`; Google production-ready-agents closes with a fork link. +- **"AI handles the long tail" motif** — Meta capacity-efficiency: *"the end goal is a self-sustaining efficiency engine where AI handles the long tail."* The motif is so consistent that its absence is itself diagnostic of a different sub-genre. + +**Slop:** bolted-on "AI handles the long tail" on non-AI posts. The trope has become so common it reads as performative when the AI work is not genuinely on the roadmap. If used, the forward-section must name at least two specific applications (build-health + conflict-resolution in Meta WebRTC). + +## Four sub-variants + +1. **Research-translation** (MLE-STAR, scaling-agent-systems) — strongest constraints, academic-closest voice. Co-authored with the paper's researchers. +2. **Capability-launch** (Meta capacity-efficiency, Datadog BewAIre) — operational metric instead of benchmark, lessons instead of ablation. +3. **Tooling-platform** (Datadog Bits AI eval platform, Datadog hackerbot-claw) — meta-infrastructure as subject; closest to a migration narrative in shape. +4. **Strategic-essay** (contrast case — Stripe checkout) — same vocabulary, different reader contract; **explicitly not a cohort post**. Sibling genre. + +## Capability vs limitation register switch + +Failure modes shift register: capabilities in declarative present, limitations in hedged conditional. + +- **Capability:** *"MLE-STAR uses web search to retrieve relevant and potentially state-of-the-art approaches."* +- **Limitation:** *"LLM-generated Python scripts carry the risk of introducing data leakage."* +- **Limitation:** *"errors cascaded unchecked"* (scaling-agents). +- **Limitation:** *"it would quickly jump to a convenient or spurious conclusion"* (Slack). + +These are observations of failure in past or hedged present, never in the assertive present that capability claims use. + +**Audit test:** if a post reports limitations in the same voice as capabilities, the post is over-claiming. + +## Silent failure modes (AI/agent slop) + +- **Capability claims without benchmarks.** Proof-of-concept, not production. +- **Demo-only evidence** — a single transcript or screenshot. +- **Missing ablation.** Headline gain not decomposed. +- **"We used AI agents to…" as a credential.** AI-as-buzzword. +- **Bolt-on AI-future closing on unrelated posts.** "AI handles the long tail" on a non-AI post. +- **Conflating LLM-calls with "agentic."** Single-call systems are not agent systems. +- **Under-disclosed failure modes.** Limitations buried as caveats instead of published as load-bearing results. +- **Paper-name-dropping without summary.** Citing a paper title without summarising its contribution. +- **AI-eval-as-anecdote.** "It works great in our tests" without a named benchmark. +- **No regression disclosed.** Strong posts publish methodological regressions (Datadog's 11% pass-rate drop) as honesty signals. Absence is a possible over-claim flag. diff --git a/.agents/skills/writing-tech-post/references/anti-patterns.md b/.agents/skills/writing-tech-post/references/anti-patterns.md new file mode 100644 index 000000000..bbf5ef4cb --- /dev/null +++ b/.agents/skills/writing-tech-post/references/anti-patterns.md @@ -0,0 +1,82 @@ +# Anti-Patterns + +The full banned-move catalogue. Each pattern: name, why it fails, banned example, fixed example, and the archetype(s) it most often appears in. + +## Contents + +- [Structural anti-patterns (across archetypes)](#structural-anti-patterns-across-archetypes) +- [Lede anti-patterns](#lede-anti-patterns) +- [Voice and disclosure anti-patterns](#voice-and-disclosure-anti-patterns) +- [Evidence anti-patterns](#evidence-anti-patterns) +- [Depth and abstraction anti-patterns](#depth-and-abstraction-anti-patterns) +- [Narrative and closer anti-patterns](#narrative-and-closer-anti-patterns) +- [Per-archetype-specific anti-patterns](#per-archetype-specific-anti-patterns) + +## Structural anti-patterns (across archetypes) + +- **Archetype-bait headlines.** Title promises archetype A; body delivers archetype B. Erodes publisher trust across a series of posts. Example: a "60x" title with no 60x evidence. Fix: align headline commitment with body evidence; reject mismatches at the pre-publish gate. +- **Framework-without-instance.** Migration post names a maturity model or phase taxonomy but never populates the publisher's own trajectory. Example: introducing "PQC Migration Levels" without showing where Meta itself sits. Fix: every framework must be populated with the publisher's own concrete journey. +- **Triumphal language at any density.** *"Successfully," "smoothly," "without issue."* Triumphal language signals luck or untruth; the genre rejects it. Fix: replace with concrete evidence (named cutover safety mechanism, dated phase completion, concrete job count). +- **One-size-fits-all "engineering blog template."** A skill that emits a single generic template — *Problem / Architecture / Results / Conclusion* — produces archetype-confused posts. Fix: archetype selection gates everything else. + +## Lede anti-patterns + +- **Buried lede.** Headline number is in section three. *"If the 60x is in section three, the reader has already left."* Fix: launch deep-dives land the headline number above the first H2. +- **Hedged-lede.** *"We wanted to share some thoughts on…"* *"In this post we'll explore…"* *"This article is written for…"* Burns the first paragraph on apologia. Fix: use the five-lede taxonomy (result-first / mystery / stakes-first / 3-W summary / paper-link-first / shipping-status). +- **Exciting-announcement template.** *"We are excited to announce that we are excited to announce."* Banned at Datadog by editorial reflex; reads as marketing on any engineering post. Fix: replace with feature framing + code block (Vercel changelog voice) or stakes-and-problem two-step (GitHub). +- **Abstract-then-concrete inversion.** Opens with a generic industry trend and waits until section two to introduce the specific system. Strategic essays do this on purpose; engineering deep-dives that imitate forfeit pull. Fix: open with the specific system or pathology, defer industry framing to the body if needed. + +## Voice and disclosure anti-patterns + +- **Blame-by-implication in postmortems.** Naming a team, a person, or *"the on-call engineer."* Fix: system-subject sentences. *"On start-up of v248, systemd-networkd flushes all IP rules"* — not *"the systemd maintainers chose to flush."* +- **Defensive "While/Although" register in postmortems.** *"While our autoscaling capability was outpaced…"* Signals concern for reputation over learning. Fix: *"Our autoscaling was outpaced."* +- **"We have you covered" closures** without naming residual gaps. The promise of completeness invites the reader to discover the gaps later. Fix: name specific defenses, scopes, and explicit residual gaps. +- **Ghost-written-byline.** Engineer's name on a post written by a marketer ventriloquising the engineer. The prose loses the texture of decisions-made-under-uncertainty. Fix: multi-engineer bylines on incident and security posts; refuse to author posts requiring author voice the publisher does not actually have. +- **Mismatched closing CTA.** A 4,000-word postmortem ending with *"Sign up for a free trial"* alienates both audiences. Fix: postmortems close with engineering reflection or prevention; security responses close with follow-up work + hedge; migrations close with dated remaining work; AI posts close with a callable artifact. +- **Over-redacted postmortem.** *"A critical microservice," "an upstream provider."* Readers notice. Fix: disclose why redaction is necessary; do not silently genericise. +- **FUD-style threat framing.** *"The quantum apocalypse," "the AI threat is reshaping…"* Fix: probabilistic register with concrete impact horizons and literature-anchored terms. +- **Paper-name-dropping without summary.** Citing a paper title without summarising its contribution. The academic equivalent of vendor name-dropping. Fix: cite the paper, then summarise the load-bearing claim in one sentence. +- **AI-eval-as-anecdote.** *"It works great in our tests"* without a named benchmark or ablation. Fix: cite a public benchmark or label internal evaluation explicitly ("internal ablation, no external benchmark") with documented composition. +- **Triumphal migration post.** *"We migrated 700 jobs with zero issues"* reads as luck or untruth. Fix: disclose specific problems and how each was resolved ("what we'd do differently" paragraph). +- **Capability and limitation in the same voice.** Reporting failure modes in the assertive present that capability claims use = over-claiming. Fix: declarative present for capabilities; hedged future-conditional for limitations. + +## Evidence anti-patterns + +- **Evidence-free percent claims.** *"30% faster"* with no chart, table, or methodology block. Fix: every percent claim points to a chart, table, or named benchmark. +- **Mean-only performance charts.** Performance is about distributions, not averages. Fix: percentile reporting with sample size, measurement window, and environment. +- **Code without context.** A 60-line listing with no surrounding prose claim and no elision marker. Fix: split with prose, elide with `// ...`, or replace with source-link plus a load-bearing slice. +- **False-precision metrics.** *"p99 = 47.3ms"* with no measurement window, sample size, or environment. *"99.3% accuracy"* without dataset, baseline, and class-imbalance disclosure. Fix: every metric attaches to its four-tuple of context. +- **Decoration in evidence's clothing.** Marketing-styled dashboards with sparklines, summary cards, and product logos. Before/after charts with cropped y-axes implying improvement without quantifying it. Fix: cover the figure with a hand and re-read the surrounding prose — if the claim still extracts, the figure is evidence; if not, it is filler. +- **Caption-as-label.** Captions starting with *"Figure showing…"* or *"Diagram of…"* Fix: caption states the *finding*, not the artifact name. +- **Missing alt text** or **alt text as label** (not as prose reconstructing the diagram's claim). +- **Screenshots of code instead of code blocks.** Unsearchable, uncopyable, inaccessible. Fix: use code blocks with language tag. Exception: when the DOM is the evidence (diff-lines `054`), not the source. +- **Missing-distribution.** Quoting *"p99 latency dropped from 1s to 100ms"* without graphing the distribution between. Fix: distribution-shift evidence is mandatory for any perf claim. + +## Depth and abstraction anti-patterns + +- **Rung whiplash.** Jumping >2 rungs without an anchor. Most often: R1 testimonial → R5 vendor product name with no R3 or R4 bridge (customer-story collapse). Fix: gradual descent; no jump exceeds one rung without a transition sentence or section break. +- **Missing-rung descent.** Post moves through rungs in order but skips one — most often R3. The reader is left holding a design they cannot evaluate. Fix: insert an R3 paragraph that states the measurable property in units, with a baseline. +- **Premature R1 ascent in close.** *"We rewrote the storage engine in Rust, which means our customers can sleep at night"* — skips R2-R4. Fix: graduated ascent — end R5, name R3 result, name R2 implication, only then close at R1. +- **Vendor-name padding.** Vendor product names at R1/R2 (framing) instead of R5 (implementation). Fix: name the abstract capability first, then disclose the specific product. Apply the "remove the vendor's name" test. +- **Depth without product framing.** A deep-dive that never returns to R1 — peer-impressive, user-illegible. Fix: R1 anchor in opener and/or closer (mandatory for posts owing user-perceived impact). +- **Customer-story collapse.** R1 testimonial → R5 vendor product name with no R3 bridge → R1 testimonial. **Whiplash when sold as engineering.** Fix: refuse to author customer stories in this skill; route to the customer-story template. + +## Narrative and closer anti-patterns + +- **Every section a noun phrase.** H2s read as a TOC, not as the spine of an argument. Diagnostic: when the H2s can be skimmed in isolation and re-ordered without changing meaning, the question-chain is broken. Fix: each H2 is a verb-phrase that names an action taken to resolve a question. +- **Callbacks that never connect.** A fact named in the lede is not referenced again. Diagnostic: extract the first 200 words and the closing 200 words — if they cannot be paired into a same-thread statement, the post has no spine. +- **Closers that summarise instead of forward-pointing.** *"In conclusion, we have shown that…"* Almost every read-at-depth closer in the corpus picks one of the five forward-pointing shapes (call-to-build / call-to-adopt / open-question / shipping-status-roadmap / prevention-list). +- **Bolted-on AI-future close** on a non-AI post. Generic *"AI handles the long tail"* paragraph that reads as performative. Fix: if used, the forward-section must name at least two specific applications. +- **Rhetorical-question budget exceeded.** More than three rhetorical-question H2s feels performative. Cap at three per long post. + +## Per-archetype-specific anti-patterns + +- **Launch — bury the headline number.** The marketing instinct to "set up" the number before revealing it is the genre's most common credibility leak. +- **Postmortem — naming a specific engineer** in the causality section. +- **Migration — aspirational language.** *"We are beginning to roll out…"* without a current state. Or **percentage-complete without a base** — *"we're 75% migrated"* with no denominator. +- **Performance — single-fix-no-failures register.** Real perf work produces failed attempts; their absence reads as fabrication. Or **headline-number-only lede** (launch register on perf archetype). +- **Tutorial — no prerequisites stated.** Wastes reader time and is the genre's most common craft failure. +- **Research — no paper link, no method diagram, no evaluation table.** Reads as a corporate announcement disguised as research. +- **AI/agent — capability claim without benchmark or ablation.** Proof-of-concept, not production. Or **"we used AI agents to" as a credential** — AI-as-buzzword. +- **Security — over-disclosure of working PoC** before upstream patch is widely deployed. Or **missing coordinated-disclosure timeline** for CVE response posts. +- **Strategic essay disguised as engineering deep-dive.** Sibling genre — label honestly rather than disguise. Engineering readers discount disguised essays. diff --git a/.agents/skills/writing-tech-post/references/archetypes-and-structure.md b/.agents/skills/writing-tech-post/references/archetypes-and-structure.md new file mode 100644 index 000000000..bca211cea --- /dev/null +++ b/.agents/skills/writing-tech-post/references/archetypes-and-structure.md @@ -0,0 +1,215 @@ +# Archetypes and Structure + +The eight canonical archetypes of engineering blog posts, each with its 9-column contract, plus the decision tree for picking the right archetype and the rules for legitimate hybridisation. + +## Contents + +- [Why archetype selection is load-bearing](#why-archetype-selection-is-load-bearing) +- [A — Launch deep-dive](#a--launch-deep-dive) +- [B — Incident postmortem](#b--incident-postmortem) +- [C — Architecture migration](#c--architecture-migration) +- [D — Performance case study](#d--performance-case-study) +- [E — Developer tutorial](#e--developer-tutorial) +- [F — Research-to-product translation](#f--research-to-product-translation) +- [G — AI / agent post (2026 wave specialty surface)](#g--ai--agent-post-2026-wave-specialty-surface) +- [H — Security / reliability post (specialty surface)](#h--security--reliability-post-specialty-surface) +- [Archetype-straddle rules (legitimate hybrids)](#archetype-straddle-rules-legitimate-hybrids) +- [Decision tree — pick the archetype](#decision-tree--pick-the-archetype) +- [Per-archetype silent failure modes (slop catalogue)](#per-archetype-silent-failure-modes-slop-catalogue) + +## Why archetype selection is load-bearing + +The corpus converges on eight archetypes: six canonical (launch, postmortem, migration, performance, tutorial, research-to-product translation) plus two specialty surfaces that have hardened into sub-genres (AI/agent, security/reliability). An archetype is the recognisable *shape* of a post: opening move, H2 progression, obligatory evidence form, closing move, length norm, multi-author signature, and the hybridisation rules that decide which shape dominates when a post straddles two genres. + +Archetype selection gates every other dimension: voice rules, abstraction-ladder mechanics, evidence forms, and lede architecture all attach to a specific archetype. **Get the archetype wrong and the other dimensions misfire** — a tutorial in postmortem voice loses readers in paragraph one; a migration with launch closing moves reads as wishful marketing. + +## A — Launch deep-dive + +| Column | Contract | +|--------|----------| +| Opening move | Scale-then-headline-number. Quantified scaling problem in the first sentences, headline result above the first H2. | +| Section sequence | Problem → Architecture overview → Component walkthroughs → Results → Roadmap. | +| Closing move | Forward roadmap with named next steps. "Looking ahead: smarter routing and integrated indexing." Declarative, scoped, never aspirational. | +| Length norm | 5,000–8,000 words. | +| Byline norm | Multi-author. | +| Obligatory evidence | Headline before/after numbers in the lede, with both baseline and slice. | +| Voice register | Confident technical "we". First-person plural, declarative, present tense for live state. | +| Dominant hybrid partner | Migration (lineage narrative inside a launch). | +| Silent failure mode | Burying the headline number. The marketing instinct to "set up" the number before revealing it is the genre's most common credibility leak. | + +**Exemplar:** Datadog Rust storage launch (`raw/articles/016-datadoghq-com-evolving-our-real-time-timeseries-storage-again-built-in-rust-for-performance.md`) opens with the volume/cardinality scaling problem and lands "60x ingestion / 5x query" above the first H2. Six-generation lineage hybrid; 5,977 words; four-author byline. + +## B — Incident postmortem + +| Column | Contract | +|--------|----------| +| Opening move | Date + scope + impact in the first two sentences. The 3 W's + impact (when, what, where, how-bad) inside the first 200 words. Delay reads as evasive. | +| Section sequence | Summary → Background → The incident → Timeline → Contributing factors → Mitigation (including failed attempts) → Action items → Optional lessons → Optional acknowledgments. | +| Closing move | Prevention commitments per category, OR a forward-looking design statement for reliability essays. | +| Length norm | 3,000–12,000 words (standalone 3,000–4,000; paired diptychs 11,000+). | +| Byline norm | Single-author and senior (principal SRE, platform lead). Reliability essays drift to 2–3 authors. | +| Obligatory evidence | UTC timestamps with defined granularity + a specific named artifact (commit, package version, kernel function) for root cause. Vagueness here is fatal. | +| Voice register | Blameless retrospective. System-subject for causality. Organisational "we" for ownership. Passive isolates fault; active claims learning. Engineers' names never appear in the causality section. | +| Dominant hybrid partner | Reliability essay (postmortem zoomed out 6–18 months later). | +| Silent failure modes | Naming a specific engineer; defensive register; "we have you covered" closures; over-redacted ("a critical microservice"); 24-hour postmortems with placeholder action items; missing timeline; "five whys without the answer". | + +**Exemplars:** + +- Datadog 2023-03-08 (`raw/articles/034-…platform-level-impact.md`) — single senior author; dormant-fault causality back to 2020 systemd PRs `#17477` and `#19287`. +- Canva API gateway outage (`raw/articles/050-canva-dev-canva-incident-report-api-gateway-outage…md`) — explicit Cloudflare quote; "We attempted to work around this issue… Unfortunately, it didn't mitigate" failure-disclosure language; per-category action items. + +## C — Architecture migration + +| Column | Contract | +|--------|----------| +| Opening move | Empathetic legacy framing + why-now justification. The legacy charity is deliberate — migration posts are read by the engineers who built the prior system. | +| Section sequence | Empathetic legacy framing → Why-now → Phased plan (numbered) → Tracking metrics per phase → Cutover discipline → Where we are now (dated) → Forward work. | +| Closing move | Dated status snapshot + forward work. Explicit and falsifiable: "Phase 1 finished Q1 2025; Phase 2 will continue through 2026." | +| Length norm | 5,000–8,000 words. | +| Byline norm | Multi-author (2–9). Named acknowledgements paragraph common for cross-team work. | +| Obligatory evidence | Phase-completion dates + per-phase tracking metrics + named cutover safety mechanisms (PG Proxy, shadow tables, dual-stack A/B testing) + quantified scope. | +| Voice register | Empathetic strategic "we"; probabilistic for risks; first-person plural across teams. Rejects heroic-engineer and post-hoc-clean registers; retains the messiness. | +| Sub-variants | System rewrite, decomposition, stack replacement, pipeline replacement, component modernization, dual-stack, maturity-level (industry-wide), security-driven. | +| Silent failure modes | Aspirational ("we are beginning to roll out…"); triumphal ("700 jobs migrated with zero issues"); percentage-complete claims without a base ("we're 75% migrated"); toolset-as-story; framework-without-instance; premature-completion. | + +**Exemplars:** + +- Datadog unwinding shared database (`raw/articles/036-…unwinding-shared-database.md`) — decomposition reference. "Shared database limps along… and then some" lede. +- Meta WebRTC fork escape (`raw/articles/199-…escaping-the-fork.md`) — dual-stack variant; explicit difficulty disclosure ("thousands of duplicate symbol errors"). +- Meta PQC migration (`raw/articles/113-…pqc-migration.md`) — maturity-level variant; PQ-Unaware → PQ-Enabled ladder; formal disclaimer paragraph. +- Slack SSH→REST (`raw/articles/211-…ssh-to-rest.md`) — security-driven variant; "We had a massive security surface. Not ideal." + +## D — Performance case study + +| Column | Contract | +|--------|----------| +| Opening move | Felt experience + stakes, not a number. Promise the arc in the second paragraph by naming the multi-fix structure to come. | +| Section sequence | Baseline metric definition → Bottleneck identification → Architectural moves → Distribution shift evidence → Tradeoffs / next bottleneck. Middle sections repeat: hypothesis → instrument → distribution-shift graph → transition. | +| Closing move | Two species: (a) honest-recap close (bulleted chain of fixes + monitoring change); (b) lessons-from-the-trenches close (one transferable lesson + the tool that surfaced it). Both refuse the wrapped-bow conclusion. | +| Length norm | 1,500–5,500 words (cluster 3,000–4,500; depth-chases extend longer). | +| Byline norm | Two authors. The genre rewards co-authorship because the investigations were collaborative. | +| Obligatory evidence | Distribution-shift charts, **not means**. Per-fix percentile graphs on the same axes so the reader can read the shift visually. Named tooling (`pg_walinspect`, `lldb`, ENA metric IDs) is part of the evidence contract. | +| Voice register | Confident-quantitative narrator-as-detective. "We thought X, then we measured Y." Past for investigation, present for current state, future for remaining work. Honest disclosure of partial victories between fixes. | +| Sub-variants | Narrow-detective (one query, one root cause); multi-bottleneck peeling (one symptom, chain of independent causes); root-cause-chase-to-depth (descending layers most readers won't visit). | +| Dominant hybrid partner | Migration (perf framing for an underlying architecture migration). | +| Silent failure modes | "We sped it up by 60%" headline without baseline disclosure; mean-only evidence chart; single-fix-no-failures register (real perf work produces failed attempts); P99 graph without x-axis baseline; missing distribution shift. | + +**Exemplars:** + +- GitHub Issues navigation (`raw/articles/017-…from-latency-to-instant.md`) — felt-experience opening; HPC threshold-bucketing into Instant/Fast/Slow. +- Datadog network-latency (`raw/articles/041-…network-latency.md`) — multi-bottleneck peeling reference; five fixes, each section closing "a slight improvement, but clearly still higher than normal." +- GitHub diff-lines (`raw/articles/054-…diff-lines.md`) — six-row before/after metrics table as performance climax; "this didn't end here" close. + +## E — Developer tutorial + +| Column | Contract | +|--------|----------| +| Opening move | Use case + capability promise + "we show you how". Declare both the *what* (the procedure) and the *expected result* before any code. | +| Section sequence | Prerequisites → Conceptual primer → Stepwise procedure → Working example → Caveats → Next steps. | +| Closing move | Next-steps pointer with a runnable artifact. GitHub repo or "what to try next" path. | +| Length norm | 1,500–3,000 words. | +| Byline norm | Single author (developer advocate or solutions architect). | +| Obligatory evidence | Runnable code (end-to-end, copy-paste safe) + explicit prerequisites. Tutorials that hide code behind prose have misidentified the archetype. | +| Voice register | Imperative procedural. Second person or "we show you." "Create a new Lambda project using Cargo Lambda" — direct address, present tense. | +| Dominant hybrid partner | Launch (tutorials closing with launch-style headline numbers). | +| Silent failure mode | No prerequisites stated. Wastes reader time and is the genre's most common craft failure. | + +**Exemplar:** AWS Lambda multi-threaded Rust (`raw/articles/170-…multi-threaded-rust-on-lambda.md`) — explicit prerequisites; vCPU-by-memory table; closes with launch-style "4-6x performance improvements" (tutorial + launch hybrid). + +## F — Research-to-product translation + +| Column | Contract | +|--------|----------| +| Opening move | Discipline-framed problem + paper link in the first scroll. arXiv link inside the Quick Links block above the first body paragraph. | +| Section sequence | Motivation → Method overview → Evaluation → Analysis (ablation) → Open-source / availability → Acknowledgments + licensing footnote. | +| Closing move | Acknowledgments block + licensing/scope footnote. Inheritance from academic papers. | +| Length norm | 1,500–2,500 words. Deliberately the shortest archetype — it exists to point at a longer paper. | +| Byline norm | Multi-author, plus a separate acknowledgments section listing contributors not named on the byline — inherited from papers. | +| Obligatory evidence | Paper link (preferred). When no paper exists, a method diagram + evaluation table. | +| Voice register | Academic gloss + engineering imperative. Third-person constructions, inline citations, footnotes carry licensing or scope caveats. Capability declarative; limitations hedged conditional. | +| Dominant hybrid partner | Tutorial (research post ships an open-source artifact; close pulls in tutorial structure). | +| Silent failure mode | No paper link, no method diagram, no evaluation table. Reads as a corporate announcement disguised as research. | + +**Exemplar:** Google Research MLE-STAR (`raw/articles/074-…mle-star.md`) — arXiv link in Quick Links; MLE-Bench-Lite benchmark; ablation: model usage, human intervention, leakage/usage checkers; licensing footnote. + +## G — AI / agent post (2026 wave specialty surface) + +| Column | Contract | +|--------|----------| +| Opening move | One-paragraph capability claim + paper/repo link in the first scroll. Hero image, capability claim, then paper or open-source repo link before the second screen. Signals participation in a research conversation. | +| Section sequence | Capability claim → Paper/repo link → Product context → System overview diagram (named agents/tools/MCP) → Evaluation table (named benchmarks) → Ablation / "In-depth analysis" sub-section → Guardrails enumeration (named checkers) → Lessons → "AI handles the long tail" close. | +| Closing move | Open-source repo link OR "AI handles the long tail" motif. The motif's absence is itself diagnostic of a different sub-genre. | +| Length norm | 2,000–5,000 words (provisional — cohort still consolidating). | +| Byline norm | Multi-author; research-translation variants co-authored with the paper's researchers. | +| Obligatory evidence | Named benchmark + ablation + named guardrails + cost/token-burn disclosure (newer). Ablation is the load-bearing credibility move — *not* an aside. Named checkers as concrete artifacts (MLE-STAR's debugging-agent + data-leakage-checker + usage-checker). | +| Voice register | Hybrid academic gloss + engineering imperative. Declarative present for capability, hedged conditional for limitations. Capability and limitation in same voice = over-claiming. | +| Sub-variants | Research-translation (MLE-STAR — strongest constraints, academic-closest voice); capability-launch (operational metric instead of benchmark, lessons instead of ablation); tooling-platform (meta-infrastructure as subject; closest to migration shape); strategic-essay (contrast case — sibling genre, *not* a cohort post). | +| Silent failure modes | Capability claims without benchmarks; demo-only evidence; missing ablation; "we used AI agents to" as a credential; bolt-on AI-future closing on unrelated posts; conflating LLM-calls with "agentic"; under-disclosed failure modes. | + +**Exemplars:** + +- Google MLE-STAR (`raw/articles/074-…`) — research-translation reference. +- Datadog Bits AI eval platform (`raw/articles/019-…eval-platform.md`) — tooling-platform variant. +- Slack security agents (`raw/articles/160-…security-agents.md`) — three-persona Director-Expert-Critic loop with Critic as "weakly adversarial" check. + +## H — Security / reliability post (specialty surface) + +| Column | Contract | +|--------|----------| +| Opening move | Threat model first, not the fix. Adversary capability + asset + impact horizon. | +| Section sequence | Threat-model opening → Background/system context → Layered-defense walkthrough → Disclosure-timing section (for CVE responses) → Mitigations close. | +| Closing move | Follow-up work + what we'd do differently + explicit disclaimer. Notably more modest than launch-post closes. | +| Length norm | 1,000–7,000 words. | +| Byline norm | Multi-author for response posts with named acknowledgments paragraph; single or paired for security launches. | +| Obligatory evidence | CVE numbers, upstream commit links, named adversary capabilities, behavioural-detection validation timestamps, named external researchers, explicit scope caveats. | +| Voice register | Careful-declarative. Probabilistic for uncertainty, exact for load-bearing terms. Mature-without-alarmist. "An attacker could save encrypted sessions now and, if a suitable quantum computer is built in the future, decrypt them later" — conditional, calibrated, never "the quantum apocalypse." | +| Sub-variants | Preparedness / non-incident response (Cloudflare Copy Fail — no impact, "outcome" reports the *absence* of impact); industry migration / maturity-model framework (Meta PQC); educational explainer / failure analysis (Docker horror stories). | +| Dominant hybrid partner | Migration (when the security work is a long-running migration). | +| Silent failure modes | FUD-style threat framing; over-disclosure of working PoC; under-disclosure paraphrased into uselessness; "we have you covered" closures; missing coordinated-disclosure timeline. | + +**Exemplars:** + +- Cloudflare Copy Fail (`raw/articles/232-…copy-fail.md`) — preparedness / non-incident-response reference; CVE-2026-31431; upstream commit `a664bf3d603d`; bpf-lsm surgical mitigation. +- GitHub PQ SSH (`raw/articles/053-…post-quantum-security-for-ssh.md`) — compact security launch (1,145 words); explicit FIPS scope caveat; reader-action one-liner (`ssh -Q kex`). +- Meta PQC migration (`raw/articles/113-…pqc-migration.md`) — industry-migration / maturity-model variant. + +## Archetype-straddle rules (legitimate hybrids) + +- **Rule 1 — One archetype must be load-bearing.** Pure single-archetype posts are rare; mature posts hybridise. The opening 200 words decide which archetype the reader picks up. Whichever shape the lede establishes is the *dominant* archetype; the secondary archetype is recognised by structural absorption (lineage section inside a launch; performance result inside a migration; tutorial code inside a research post). +- **Rule 2 — Canonical hybrids:** + - Launch + migration (Datadog Rust storage; signal: chronological "Gen 1… Gen N" framing inside an otherwise launch-shaped post). + - Migration + performance (GitHub Issues navigation; signal: performance is the framing, migration is the substance). + - Research translation + tutorial (MLE-STAR closing with ADK link inviting hands-on use). + - Postmortem + reliability essay (Datadog 034/038 pair — same event, different time horizon). + - Migration + security (Meta PQC, Slack SSH→REST — why-now leans on security). + - AI-agent + postmortem (Datadog hackerbot-claw — cohort fingerprint meets incident response). +- **Rule 3 — Archetype-bait headlines are the silent failure.** A headline that promises archetype A and a body that delivers archetype B trains readers to mistrust the publisher's headlines. Once that trust erodes, even well-written posts get skimmed. +- **Rule 4 — Strategic essays are a sibling genre, not a hybrid.** Stripe-style industry essays share vocabulary with engineering archetypes but participate in a different genre. Engineering readers discount disguised essays unless the publisher has earned the right to opine. + +## Decision tree — pick the archetype + +Answer in order. The first "yes" wins. + +1. **Did a user-visible failure occur?** → Postmortem (incident or non-incident-response if preparedness avoided customer impact). +2. **Is the artifact an architectural transition between two states with cutover discipline?** → Migration. Sub-decide: rewrite / decomposition / stack replacement / pipeline replacement / component modernization / maturity-level framework. +3. **Is the artifact a new public capability with a quantified envelope?** → Launch deep-dive. Sub-decide: pure launch or launch + migration (lineage-shaped). +4. **Is the artifact a system that was fine, became slow, then fast, and the work was the hunt?** → Performance deep-dive. Sub-decide: narrow-detective / multi-bottleneck peeling / root-cause-chase-to-depth. +5. **Is the artifact "how to do X, with runnable code"?** → Developer tutorial. +6. **Is the artifact a research result becoming a product-shipped capability?** → Research-to-product translation. Sub-decide: pure or with tutorial close (open-source release). +7. **Is the load-bearing claim an autonomous/LLM system + its capability/correctness contract?** → AI/agent post. Sub-decide: research-translation / capability-launch / tooling-platform. +8. **Is the load-bearing claim about *risk* (vulnerability, adversary capability, defense)?** → Security/reliability post. Sub-decide: preparedness / industry-migration / educational-explainer. +9. **None of the above; the post is market/industry framing?** → Strategic essay (sibling genre — label honestly, do not disguise as deeper-than-it-is technical post). + +## Per-archetype silent failure modes (slop catalogue) + +Each archetype has a single most-likely failure mode. Catch it during draft review. + +- **Launch** — bury the headline number below the first H2. +- **Postmortem** — name a specific engineer in the causality section. +- **Migration** — claim percentage-complete without a base, or omit the dated status snapshot. +- **Performance** — mean-only chart, or "we sped it up by 60%" without baseline + slice + distribution. +- **Tutorial** — no prerequisites stated. +- **Research** — no paper link, no method diagram, no evaluation table. +- **AI/agent** — capability claim without benchmark or ablation. +- **Security** — FUD framing, missing coordinated-disclosure timeline, or "we have you covered" close without naming residual gaps. +- **All archetypes** — archetype-bait headline (title promises archetype A, body delivers archetype B). diff --git a/.agents/skills/writing-tech-post/references/depth-and-abstraction.md b/.agents/skills/writing-tech-post/references/depth-and-abstraction.md new file mode 100644 index 000000000..49ef49409 --- /dev/null +++ b/.agents/skills/writing-tech-post/references/depth-and-abstraction.md @@ -0,0 +1,135 @@ +# Depth and Abstraction + +The five-rung abstraction ladder, the five traversal patterns the corpus surfaces, the rung-whiplash diagnostic, the anchor-N rule, and the per-archetype default depth profile. + +## Contents + +- [The five rungs](#the-five-rungs) +- [The four-tuple commitment](#the-four-tuple-commitment) +- [Traversal patterns](#traversal-patterns) +- [Per-archetype default depth profile](#per-archetype-default-depth-profile) +- [Rung-whiplash diagnostic](#rung-whiplash-diagnostic) +- [The anchor obligation (empirical N)](#the-anchor-obligation-empirical-n) +- [Decision tree — pick the depth](#decision-tree--pick-the-depth) +- [Vendor-name placement rule](#vendor-name-placement-rule) + +## The five rungs + +Every paragraph of an engineering post lives at exactly one of these five rungs. Mixing rungs at the sentence level is rare and deliberate; mixing them inside a paragraph without a transition is a craft failure. + +- **R1 — User experience.** What the reader feels when the system is good or bad. Voice is empathetic, often second-person. The only rung where addressing the reader directly is unambiguously appropriate. + - Exemplar: *"When you're working through a backlog—opening an issue, jumping to a linked thread, then back to the list—latency isn't just a metric. It's a context switch."* — GitHub Issues navigation (`raw/articles/017-…from-latency-to-instant…md:53`). +- **R2 — Business / product framing.** Why the organisation invested engineering time. Headcount, fleet size, opportunity cost, lineage. Voice is institutional first-person plural. + - Exemplar: *"When the code you ship serves more than 3 billion people, even a 0.1% performance regression can translate to significant additional power consumption."* — Meta capacity efficiency (`raw/articles/031-…capacity-efficiency…md:97`). +- **R3 — System behaviour.** Externally observable properties with units, percentiles, baselines. Voice is technical-quantitative; adverbs of certainty attach only to numbers. + - Exemplar: *"Instant: HPC < 200 ms / Fast: HPC < 1000 ms / Slow: HPC >= 1000 ms"* — GitHub Issues (`raw/articles/017-…:65-67`). +- **R4 — Architecture.** Components, boundaries, data flow. Subjects are systems; verbs describe interaction patterns. + - Exemplar: *"client-side caching layer backed by IndexedDB, added a preheating strategy … and introduced a service worker so cached data remains usable even on hard navigations"* — GitHub Issues (`raw/articles/017-…:54`). +- **R5 — Implementation detail.** Code, library names, exact function calls, named tooling. Subjects are functions, types, files. + - Exemplar: *"`sntrup761x25519-sha512` … combining a new post-quantum-secure algorithm, Streamlined NTRU Prime, with the classical Elliptic Curve Diffie-Hellman algorithm using the X25519 curve"* — GitHub PQ SSH (`raw/articles/053-…`). + +A post does not have to visit every rung, but the *opening rung* and the *closing rung* together encode its archetype: open R1 / close R5 reads as a deep-dive; open R1 / close R1 reads as strategic; live entirely at R5 is documentation; live entirely at R1–R2 is marketing. + +## The four-tuple commitment + +Before writing prose, commit a tuple: + +``` +(opening rung, body residency band, closing rung, traversal pattern) +``` + +Examples: + +- Launch deep-dive: `(R2, R3–R5, R2 roadmap, Staircase with R2 anchor)` +- Performance deep-dive: `(R1, R3 + R4 + R5 with R3 re-measurement, R1 return, Staircase braided with Spiral)` +- Migration: `(R2 legacy charity, R4 dominant, R2 dated status, Anchor-and-dive)` +- Postmortem: `(R2 date+scope, R3 + R4 + R5, R2 prevention, Staircase with R2 anchor at both ends)` +- Tutorial: `(R2 use case, R5 dominant, R5 next-steps OR R2 results, Anchor-and-dive into R5)` +- Research translation: `(R2 discipline framing — skips R1, R3 eval + R4 method, R2 acknowledgments, Anchor-and-dive with no R1 visit)` +- AI/agent: `(R2 workload scale, R4 + R5, R1 in book-ended form, Yo-yo)` +- Security: `(R2 threat framing, R5 dominant + thin R4, R2 forward posture, Anchor-and-dive into R5)` + +Writing to the tuple is the procedural move. The tuple replaces "discovering it mid-draft." + +## Traversal patterns + +The corpus surfaces five traversal patterns. Most elite posts braid two or more. + +- **A — Staircase (descending ladder).** Monotonic descent R1 → R2 → R3 → R4 → R5, with optional one-paragraph return to R1 in the close. The default pattern because it mirrors the reader's natural curiosity arc: convince me this matters, then teach me how it works. + - Canonical: GitHub Issues navigation — R1 lede → R2 amplifier → R3 HPC buckets → R4 IndexedDB+service-worker → R5 service-worker request-header code → R1 return. +- **B — Yo-yo (book-ended ladder).** R1/R2 → sustained middle at R3–R5 → ascent back to R1 in close. The close is a *deliberate* move that tells the reader the architecture they just learned is, finally, in service of human time. + - Canonical: Meta capacity efficiency — opens R2 ("3 billion people"), descends through R4 (offense/defense AI-agent platform with FBDetect), closes R1 ("Engineers who spent mornings on defensive triage now review AI-generated analyses in minutes"). +- **C — Spiral (iterative re-measurement beat).** Revisit the same R3/R4 anchor multiple times across sections; each loop tightens on a new fix. The genre's defining shape for performance deep-dives. + - Canonical: Datadog network-latency — five fixes, each closing with the same cadence ("This was a slight improvement, but clearly still higher than normal"). + - Licensed only when the work is *investigative* (no foreknowledge of destination). Reads fake if applied to teleological work. +- **D — Anchor-and-dive (depth-first / pillar-and-spike).** Brief R2 opening (one to three paragraphs), then sustained R3–R4 residency with R5 in named tooling. Licensed by audience model (publisher's reader is presumed already convinced) and archetype (migration tolerates depth). + - Canonical: Datadog "Breaking up a monolith" — R2 charity ("limps along for as long as it possibly can — and then some"), then sustained R4 through the three-phase plan. +- **E — Sidebar interlude.** R3–R5 chain pauses to drop in a *labelled* R2 or pure-explanation block, then resumes. Diátaxis would forbid this; engineering-blog craft permits it when explicitly labelled. + - Canonical: "A crash course on JIT in Postgres" sidebar inside the Postgres segfault deep-dive. + +**Braiding is normal.** Meta WebRTC migration runs anchor-and-dive at the document level but spiral inside its dual-stack difficulty disclosures. Datadog Rust storage runs staircase at the document level but injects a Gen 1 → Gen 5 R2 lineage interlude before reaching the Gen 6 R4 architecture. + +## Per-archetype default depth profile + +| Archetype | Opening rung | Body residency | Closing rung | Default traversal | +|-----------|-------------|----------------|-------------|------------------| +| Performance case study | R1 (felt experience) | R3 → R4 → R5 with R3 re-measurement after each R4 move | R1 return | Staircase, often braided with Spiral | +| Launch deep-dive | R2 (scale or lineage) | R3 → R4 → R5 with R3 headline number in lede | R2 roadmap | Staircase with R2 anchor | +| Architecture migration | R2 (legacy charity) | R4 dominant; R3 only as phased tracking metrics; R5 only in named tooling | R2 dated status snapshot | Anchor-and-dive | +| Incident postmortem | R2 (date + scope) | R3 (timestamps, duration) → R4 (causality walk) → R5 (named commit, package version) | R2 prevention | Staircase with R2 at both ends | +| Developer tutorial | R2 (use case + promise) | R5 dominant; R3 surfaces to prove the lesson | R5 next-steps OR R2 results | Anchor-and-dive into R5 | +| Research-to-product translation | R2 (discipline framing — skips R1) | R3 (evaluation tables) → R4 (method overview) | R2 acknowledgments + footnote | Anchor-and-dive without R1 visit | +| AI / agent post | R2 (workload scale) | R4 (agent platform components) → R5 (specific skills) | R1 in book-ended form | Yo-yo | +| Security post | R2 (threat framing) | R5 dominant (algorithm names, exact commands) + thin R4 mechanism walk | R2 forward security posture | Anchor-and-dive into R5 | + +**Sibling genres (refuse these in this skill):** + +- Strategic essay (R1/R2 only; sustained R1/R2; R2 product positioning; flat — no descent). +- Customer story (R1 testimonial → R5 vendor product names with no R3 bridge → R1 testimonial; **whiplash** when sold as engineering). + +A migration post that opens R1 is suspect (the legacy-charity opener is the migration's load-bearing voice move). A tutorial that opens R1 with a user testimonial has misidentified its archetype. + +## Rung-whiplash diagnostic + +The reader bounces between R1 and R5 inside a single section. A paragraph about user pain followed by a paragraph of code with no R2/R3/R4 bridge. + +- **Slop example (whiplash unresolved):** `raw/articles/033-vercel-com-how-scale-ai-unifies-design-and-performance-with-next-js-and-vercel-vercel.md:55-77` — opens R1 ("only three designers"), descends to vendor-product names at R5 ("Preview Deployments, performance analytics, Vercel CLI") in three sentences. Zero R3 evidence; zero R4 walkthrough. 670 words total. The post collapses under the **"remove the vendor's name" test** (see below). +- **Fixed example (whiplash repaired in one paragraph):** `raw/articles/043-research-google-coral-npu-a-full-stack-platform-for-edge-ai.md` opens with a generalised R1 industry claim, then by paragraph two has descended to three specific R2 constraints ("the performance gap … the fragmentation tax … the user trust deficit"), and by paragraph four is at R4 ("RISC-V ISA compliant architectural IP blocks"). Recovery rescues the opener because the descent is *gradual* — no jump exceeds one rung. +- **Subtler whiplash — the missing-rung descent.** The post moves through rungs in order but skips one. Most often, **R3 (system behaviour) is the missing rung**. Diagnosis: the reader is left holding a design they cannot evaluate. Repair: insert an R3 paragraph that states the measurable property in units, with a baseline. +- **Premature R1 ascent in close.** "We rewrote the storage engine in Rust, which means our customers can sleep at night" — skips R2-R4. The strong move is graduated ascent: end R5, name R3 result, name R2 implication, only then close at R1. +- **Vendor-name padding (publisher-specific whiplash).** Vendor product names appear at the R1/R2 *framing* rung instead of the R5 *implementation* rung. Repair: name the abstract capability first, then disclose the specific product. + +## The anchor obligation (empirical N) + +Cross-referencing the corpus for "how many paragraphs of R4/R5 a post can sustain before re-surfacing": + +- **Inside a single paragraph:** at most one rung-shift, and the shift must be motivated by a transition sentence or section break. Abrupt rung-shifts inside a paragraph are usually a craft failure — the reader has to recalibrate mid-sentence. +- **Inside a section:** ~3–5 paragraphs of R4/R5 before re-anchoring at R3 (a number) or R2 (a user/business referent). +- **Across sections:** every section heading is itself an anchor checkpoint — the heading restates an R2 motivation, names an R3 result, or labels an R4 component. Pure-R5 sections without an anchor heading are rare and read as missing structure. +- **Performance deep-dives** are the strictest — they re-measure (R3) *after every intervention*, making the spiral's anchor N effectively ≤ 2 paragraphs per loop. +- **Architecture migrations** are the loosest — phased-plan structure lets them sustain R4 across 8–15 paragraphs between R2 anchors, because each phase heading is implicitly an R2 anchor. +- **Research translations** are paradoxical — they often sustain R3 (eval tables) without ever surfacing to R1, because the audience expects academic-gloss residency. + +**Operational rule of thumb:** if a draft has more than four consecutive R4/R5 paragraphs without surfacing to R3 (a metric) or R2 (a user/business referent), the next paragraph must either re-anchor or open a new section with an anchored heading. + +## Decision tree — pick the depth + +Run in order; the tree's output is the four-tuple. + +1. **Q1 — Audience.** Mixed-readership (engineers + decision-makers, public landing-page traffic) → mandatory R1 anchor in opener *and* closer. Peer-engineer only → R2 anchor sufficient; R1 optional. +2. **Q2 — Publisher credibility on this topic.** Publisher established the *why* in prior posts → Anchor-and-dive licensed. Publisher never published on this topic → Staircase required to earn the descent. +3. **Q3 — Work nature.** Investigative (no foreknowledge of destination) → Spiral. Teleological (planned outcome) → Staircase or Anchor-and-dive. Mixed → braided. +4. **Q4 — Artifact obligation.** Owes user-perceived impact (perf, AI/agent) → R1 anchor mandatory in opener and closer; Yo-yo is the safest. Owes organisational-impact (capacity, scale, migration) → R2 anchor sufficient. +5. **Q5 — Closing handoff.** Operational change (perf, postmortem, migration) → close at R2 status snapshot or R3 distribution result. Advertised capability (launch, research) → close at R2 roadmap or footnote. User-time recovery (AI/agent) → close at R1. +6. **Q6 — Length budget.** ≤2,000 words → pick one rung band (R2–R3 or R4–R5) and stay; trying to traverse all five guarantees whiplash. 3,000–5,500 words → full staircase or spiral feasible. ≥5,000 words → braided patterns mandatory or the reader fatigues. + +## Vendor-name placement rule + +**Vendor product names appear at R5 (implementation), never at R1/R2 (framing).** The rule separates engineering posts from customer stories. + +**The "remove the vendor's name" test.** If removing the vendor's name from every sentence still leaves a coherent post, the post has engineering depth. If the post collapses without the vendor's name, it is marketing in engineering-blog clothing. + +- Engineering: "We selected a CDN with edge compute support; we use Vercel." (vendor name at R5; capability described first.) +- Marketing: "Vercel's Preview Deployments enabled our designers to ship faster." (vendor name at R1/R2; capability collapses without it.) + +Apply the test as the **final pre-publish gate** for any post written from a vendor or hosted platform. diff --git a/.agents/skills/writing-tech-post/references/evidence-diagrams-code.md b/.agents/skills/writing-tech-post/references/evidence-diagrams-code.md new file mode 100644 index 000000000..955de6e12 --- /dev/null +++ b/.agents/skills/writing-tech-post/references/evidence-diagrams-code.md @@ -0,0 +1,128 @@ +# Evidence, Diagrams, and Code + +The twelve-form evidence taxonomy, captioning conventions, the `claim → artifact → reading` cadence, code-curation rules, and the distribution-shift / named-benchmark contracts. + +## Contents + +- [The atomic unit: claim → artifact → reading](#the-atomic-unit-claim--artifact--reading) +- [The twelve evidence forms](#the-twelve-evidence-forms) +- [Emergent 2026 evidence forms](#emergent-2026-evidence-forms) +- [Captioning conventions (six rules)](#captioning-conventions-six-rules) +- [Prose↔evidence cadence (three shapes)](#proseevidence-cadence-three-shapes) +- [Code-curation rules](#code-curation-rules) +- [Distribution-shift contract (mandatory for performance)](#distribution-shift-contract-mandatory-for-performance) +- [Named-benchmark contract (mandatory for AI/agent)](#named-benchmark-contract-mandatory-for-aiagent) +- [Per-archetype mandatory evidence forms](#per-archetype-mandatory-evidence-forms) +- [The "no-decoration" rule](#the-no-decoration-rule) + +## The atomic unit: claim → artifact → reading + +Every evidence asset must obey this triple: + +1. **Claim** — a prose sentence that sets up what the reader should look for. +2. **Artifact** — the chart, diagram, code listing, table, or screenshot. +3. **Reading** — a prose sentence that interprets what the reader has just seen. + +Textbook execution from Datadog network-latency `041:62-63`: + +> *"We noticed that when `counter` restarted, the Envoy sidecar would max out its CPU allocation and get throttled, as shown in the graph below."* **(claim)** +> +> *[CPU usage chart]* **(artifact)** +> +> *"which explained the spikes in TCP retransmits and remote cache latency."* **(reading)** + +Any artifact missing either the preceding claim or the following reading is a draft warning. Any claim without an artifact on the same screen is a credibility leak. + +## The twelve evidence forms + +Each form names a specific claim. Pick the form by the claim, not the artifact you have. + +| Form | Claim | Caption convention | Common slop variant | +|------|-------|-------------------|--------------------| +| **Architecture diagram** | "Here are the components and how data crosses between them." | Noun phrase identifying the system tier. The diagram contains nothing the post will not name in prose. | Boxes never reappear in prose; arrows unlabelled; unique iconography forces icon-to-meaning mapping mid-narrative. | +| **Sequence diagram** | "Here is the order of operations across components, including failure handoffs." | Verb-led ("Sequence diagram for the initial Courier design"). Use only when timing matters. | Steady-state topology depicted as a sequence; time axis unlabelled. | +| **Flowchart / decision tree** | "Here is the branching logic of a process." | Names the decision being branched. | Branches non-exhaustive; "Yes/No" labels missing; used for architecture (a static structure is not a flow). | +| **Data-flow / pipeline diagram** | "Here is what the bytes look like as they move." | Names the data stage. Diagram and prose use identical terminology — if the prose says "fragment," the diagram cannot say "shard." | Vocabulary drift between diagram and prose. | +| **Before/after migration diagram** | "Here is what changed." | Names both the dimension and the delta. Migration archetypes require **phase intermediaries**, not just two states. | Cropped y-axes; no annotation of what the delta represents. | +| **Code snippets** | "This is the actual code, not a paraphrase." | Language-tagged code block; provenance link when borrowed. | Length thresholds: 1–15 lines at a glance; 16–30 lines require slowdown; >30 lines without intermediate prose almost always fail. | +| **Shell sessions / SQL traces** | "We ran this and observed this output." | Name the host, the time, and the command if any matter. | Prompt text decorative; host IPs inconsistent across snippets; output trimmed without an ellipsis marking the cut. | +| **Assembly / disassembly** | "The bug is at this level of the stack." | Print the address span and the function symbol. Identify architecture (Arm64 vs x86_64). | No register dump; reader not told which line is the focus. | +| **Charts and time-series plots** | "Here is the numeric shape of a phenomenon over a range." | Both axes labelled with units, time range stated, baseline shown if any, specific metric named in title or caption. | Single number ("we sped it up 60%") without distribution. | +| **Distribution charts** | "The population shape changed" — a stronger claim than "the mean changed." | State bucket width; y-axis labelled as count or share; tail visible. | "Improvement" claimed only on the median while the tail is hidden. | +| **Tables** | "Here is the comparison in normalized rows." | Caption names measurement conditions. | Units inconsistent within a column; "improvement" column absent; sprawls past ten rows without sub-grouping. | +| **Screenshots** | "This is what the operator saw." | Redact sensitive content; name what to look at; crop tightly. | Used as a substitute for code (always show the code); zoom level unstated; chrome distracts from focus. | +| **Embedded quotes / citations** | "This is not our wording." | Blockquote + link to original. | Quoted out of context to seem to support a stronger claim than the source actually makes. | + +## Emergent 2026 evidence forms + +Sub-types of existing slots with distinct obligations, surfacing in the AI/agent + security cohorts: + +- **Named-benchmark result tables / charts** — first-class evidence in the AI cohort. Must cite a public benchmark (MLE-Bench-Lite, BrowseComp-Plus, Finance-Agent, PlanCraft, SWE-Bench) or an internal benchmark with documented composition, plus baseline and evaluation slice. +- **Ablation matrices / box-plot comparisons** — the cohort's credibility move. Decompose the headline gain across components. Publish negative findings as load-bearing results, not buried caveats. +- **Agent-trace transcripts** — closer to a shell capture than a chart. Identify agent role, timestamp, and what was redacted. +- **Multi-persona / role-graph diagrams** — hybrid of flowchart and architecture diagram. Every named persona must recur in prose and at least one structured-output schema. +- **Knowledge-pyramid / cost-shape diagrams** — disclose operational cost without dollar amounts (Slack's Director/Expert/Critic pyramid). +- **Eval-harness evolution diagrams** — trace the evaluation platform's phases; each diagram anchored to a quantitative claim (Datadog's 95% validation-time reduction, 11% pass-rate regression, 30% root-cause quality increase). +- **Structured-output schemas / JSON rubrics** — code snippet whose claim is contractual ("the exact format the model is constrained to produce"), not illustrative. +- **Alert / Slack-message screenshots as deployment evidence** — proves the system is wired into a real on-call workflow, not just running in a notebook. + +## Captioning conventions (six rules) + +1. **Declarative, not imperative.** A caption states what the figure shows, not what the reader should do with it. *"High-level overview of real-time timeseries database (RTDB) node"* — declarative. *"Note the three subsystems on the left"* — wrong. +2. **Tense.** Static diagrams take present tense ("the request flows through Envoy"). Time-bounded charts take past tense ("p99 latency dropped from 1s to 100ms after we increased Envoy CPU"). +3. **Subject of the caption is the artifact, not the system.** *"The diagram shows the reverse path filter diagnosing traffic coming in on ens6 as Martian packets"* — diagram-as-subject. *"Reverse path filtering drops Martian packets coming in on ens6"* — wrong; the system is the subject and the caption has become redundant prose. +4. **Alt text as prose, not as label.** Alt text should let a screen-reader user reconstruct the diagram's claim. GitHub Issues' flowchart alt text reads as a step list — the corpus's strongest example. +5. **Code blocks must be syntactically copyable.** No smart quotes, no zero-width spaces, no ellipses inside code, no line numbers the reader has to strip. +6. **Captions name the finding, not the artifact.** Strong: *"Increasing Envoy's CPU helped mitigate the high latency, which now oscillated between 300ms-1s"* — the chart's reading. Weak: *"Latency chart"* — the chart's label. Captions starting with "Figure showing…" or "Diagram of…" are rejected. + +## Prose↔evidence cadence (three shapes) + +- **Lead-with-architecture.** A high-level system diagram appears in the first or second screen. The diagram functions as a **scope contract**: here is the map, we will not relitigate what is outside the box. Used by Datadog's Husky and RTDB posts. +- **Chart-as-cold-open.** A distribution chart appears immediately and reframes the post around its shape. The chart is the thesis. GitHub Issues' navigation-mix distribution graph; scaling-agents' task-performance hero image. +- **Alternating block.** Six successive sections follow *prose claim → diagram → prose interpretation*. The reader's eye and mind refresh on every screen. Husky `040`; DNS post `022` (charts + shell captures); `041` runs five iterations of *hypothesis → measurement → distribution graph → partial-victory paragraph*. + +## Code-curation rules + +1. **Minimum readable fragment over completeness.** Show the smallest excerpt that supports the claim; link the rest. The DNS post `022` uses 1–3 line kernel excerpts (`if (no_addr) goto last_resort;`). +2. **Elision marker conventions.** When cutting code, mark the cut explicitly with `// ...` or `# ...` ellipses; never silently truncate. +3. **Provenance.** Borrowed code (Linux kernel, Postgres, open-source library) must link back to its source line with a named PR or file/line link. +4. **Tutorial vs deep-dive split.** Tutorial code is runnable, end-to-end, and copy-paste-safe. Deep-dive code is the minimal disclosing slice — explanations are allowed fewer code examples than tutorials, but each must be irrefutable rather than illustrative. +5. **Language tag + syntax highlighting are mandatory.** No screenshots of code instead of code blocks (a screenshot of an IDE displaying code is unsearchable, uncopyable, and inaccessible). Exception: when the DOM is the evidence (diff-lines `054`), not the source. +6. **Code-without-context anti-pattern.** A 30+ line block with no surrounding prose claim almost always fails. Fix: split with explanatory prose, elide irrelevant lines with `// ...`, or link the full file and quote only the load-bearing region. + +## Distribution-shift contract (mandatory for performance) + +Any performance claim attaches to four pieces of context, or it is not falsifiable: + +- **Percentile.** p50 / p90 / p99 / pXX. Mean-only charts are rejected. +- **Sample size.** Number of pull requests, investigations, competitions, nodes. State explicitly. +- **Measurement window.** Time range, traffic level, rollout window. Charts on the same axes so the reader can read the shift visually. +- **Environment.** Instance type, browser, OS, hardware. `054:107` reports *"m1 MacBook pro with 4x slowdown"* — without this, the INP numbers are not falsifiable. + +The corpus's standing rebuke is the **missing-distribution anti-pattern**: a post quoting "p99 latency dropped from 1s to 100ms" without graphing the distribution between leaves the reader unable to see whether the improvement was uniform or whether a long tail moved while the body stayed put. + +## Named-benchmark contract (mandatory for AI/agent) + +For 2025–26 AI/agent capability claims, the contract substitutes capability-and-correctness for latency-and-throughput. Discipline is identical: + +- **Cited benchmark.** Public (MLE-Bench-Lite, BrowseComp-Plus, Finance-Agent, PlanCraft, Workbench, SWE-Bench) or internal with documented composition. +- **Baseline.** MLE-STAR vs AIDE (25.8% → 63.6%); scaling-agents single-agent vs centralised/independent/decentralised/hybrid. +- **Methodology.** What the eval harness does — Datadog's evaluation-platform regression (publishing an 11% pass-rate drop and 35% label-count drop as deliberate short-term degradation) is the standing example. +- **Ablation.** Decompose the headline gain. MLE-STAR's "In-depth analysis" breaks the medal-rate gain into model-usage shift, human intervention, and per-checker contribution. Posts without ablation read as proof-of-concept, not production. + +## Per-archetype mandatory evidence forms + +Each archetype has a non-negotiable evidence form set: + +- **Performance deep-dive** → distribution-shift evidence per fix + named tooling (`pg_walinspect`, `lldb`, ENA metric IDs) + partial-victory disclosure between fixes. +- **Postmortem** → UTC timestamps with defined granularity + named services and versions + quantitative impact + specific root-cause artifact (commit SHA, PR number, CVE). +- **Architecture migration** → paired before/after **with phase intermediaries** + dated phase-completion milestones + named cutover safety mechanisms + quantified scope. +- **AI/agent cohort post** → named benchmark + ablation + guardrails enumeration (named checkers / personas) + tools-and-MCP diagram + open-source repo or product preview at the close. +- **Incident-postmortem timeline** → both prose and tabular or graphical timeline. +- **Security post** → CVE number + upstream commit link + named adversary capability + behavioural-detection validation timestamp + named external researchers + explicit scope caveats. + +## The "no-decoration" rule + +Marketing-styled dashboards with sparklines, summary cards, and product logos are not engineering evidence. Before/after charts with cropped y-axes imply improvement without quantifying it. Undated traces cannot be correlated. Assembly listings without architecture labels do not localise the bug. + +**The single test:** *cover the figure with a hand and re-read the surrounding prose — can the reader still extract the claim? If yes, the figure is doing additional work and is evidence. If no, the prose has not earned the figure and the figure is filler.* diff --git a/.agents/skills/writing-tech-post/references/migrations.md b/.agents/skills/writing-tech-post/references/migrations.md new file mode 100644 index 000000000..6720de319 --- /dev/null +++ b/.agents/skills/writing-tech-post/references/migrations.md @@ -0,0 +1,98 @@ +# Architecture Migrations + +The migration archetype's contract: legacy-charity opening, seven-panel canonical structure, phase-completion dates + tracking metrics, named cutover safety mechanisms, "what we'd do differently" honesty, and the seven sub-variants the corpus surfaces. + +## Contents + +- [Opening move: legacy charity + why-now](#opening-move-legacy-charity--why-now) +- [Seven-panel canonical structure](#seven-panel-canonical-structure) +- [Evidence obligations](#evidence-obligations) +- [Cutover discipline and named safety mechanisms](#cutover-discipline-and-named-safety-mechanisms) +- ["What we'd do differently" honesty pattern](#what-wed-do-differently-honesty-pattern) +- [Seven sub-variants](#seven-sub-variants) +- [Closing move: dated status snapshot](#closing-move-dated-status-snapshot) +- [Silent failure modes (migration slop)](#silent-failure-modes-migration-slop) + +## Opening move: legacy charity + why-now + +Migration writing characterises the prior state with **charity, not contempt**. Migration posts are read by the engineers who built the prior system; contemptuous framing burns credibility internally and externally. + +Exemplars: + +- Datadog: *"Like many other organizations, Datadog has long relied on the convenience of a large, shared relational database. This pattern is pervasive in the industry because it works well for many workloads — and continues to work well even at surprisingly large scales. But eventually, the trade-offs start to pile up."* +- Meta WebRTC: *"Permanently forking a big open-source project can result in a common industry trap. It starts with good intentions."* +- Slack SSH→REST: *"It worked. But it came with some potential problems."* + +The "why now" pre-empts the reader's natural question. State the trigger explicitly (security pressure, scaling ceiling, cost trajectory, organisational restructuring). + +## Seven-panel canonical structure + +1. **Empathetic legacy framing** — see above. +2. **Why-now justification** — concrete trigger. +3. **Phased plan (numbered)** — three to seven phases, each named. +4. **Tracking metrics per phase** — the numbers the team watched. +5. **Cutover discipline** — named safety mechanisms (see below). +6. **Where we are now (dated)** — explicit "Phase 1 finished Q1 2025; Phase 2 will continue through 2026." +7. **Forward work** — what comes next, with named applications. + +Meta WebRTC walks the variant: *Challenge of monorepo+linker → Shim Layer and Dual-Stack Architecture → patch automation → upgrade cycle → results → Future Work: AI-Driven Maintenance.* + +## Evidence obligations + +- **Phase-completion dates.** Datadog: *"Phase 1 started in Q1 2024 and finished in the middle of Q1 2025."* Explicit and falsifiable. +- **Per-phase tracking metrics.** Datadog: `postgresql.table.count` aggregated by schema converging toward 0. +- **Quantified scope.** Slack: *"700+ jobs in production… across 8 data regions."* Meta WebRTC: *"over 50 use cases"* and explicit difficulty disclosure (*"thousands of duplicate symbol errors"*). +- **Paired before/after with phase intermediaries** — not just two states. Migration archetypes have a stricter contract than other archetypes: show the *journey*, not just endpoints. +- **Vendor names at R5 (implementation)**, not R1/R2 (framing). See `depth-and-abstraction.md` for the rule. + +## Cutover discipline and named safety mechanisms + +Named tooling for rollback is the **credibility hinge**: + +- Datadog: PG Proxy, shadow tables. +- Meta WebRTC: dual-stack A/B testing per cohort. +- Meta data ingestion: shadow tables with reconciliation. +- Slack: per-region cutover with retained-rollback window. + +Generic claims ("we used industry-standard practices") fail the test. Name the tool, name the strategy, name the rollback window. + +## "What we'd do differently" honesty pattern + +The pattern converts a marketing-shaped draft into a credibility-shaped one. The strong form retains the messiness: + +- Meta WebRTC: *"thousands of duplicate symbol errors, hundreds of thousands of lines were modified across thousands of files."* +- Slack EMR: discloses orphaned processes, custom SSH operators, audit complexity — not "we migrated 700 jobs with zero issues." +- Canva (postmortem-derived): *"we'd underestimated the impact of the bug and didn't expedite deploying the fix."* + +The "what we'd do differently" paragraph is **load-bearing**, not a courtesy hedge. Skipping it produces a triumphal-shaped post that the genre rejects. + +## Seven sub-variants + +Each is a legitimate variant, not a deviation: + +1. **System rewrite / generational replacement** — Datadog Rust storage (six-generation lineage). +2. **Decomposition / monolith split** — Datadog unwinding shared database. +3. **Stack replacement / protocol modernization** — Slack SSH→REST. +4. **Pipeline replacement** — Meta data ingestion. +5. **Component modernization / hybrid retrofit** — Meta Groups Search, Meta WebRTC fork escape. +6. **Dual-stack variant** — Meta WebRTC. Adds a "Shim Layer" / "Adapter" section; per-cohort cutover. +7. **Maturity-level variant** — Meta PQC's PQ-Unaware → PQ-Enabled. Industry-wide framework instead of phased plan. **Must populate the framework with the publisher's own trajectory** — framework-without-instance is an explicit anti-pattern. + +Eighth variant: **Security-driven** — Slack SSH→REST. Why-now leans on security pain (*"we had a massive security surface… not ideal"*). + +## Closing move: dated status snapshot + +*"Phase 1 started in Q1 2024 and finished in the middle of Q1 2025. Phase 2 started halfway into Q1 2024 and will continue through 2026."* — explicit and falsifiable. Migration posts that omit the dated status snapshot read as if completion is being claimed before earned. + +**Forward work** names specific next applications (not "AI will help" or "we plan to optimise further"). Meta WebRTC names build-health automation and conflict-resolution as the two specific applications of the Future Work section. + +## Silent failure modes (migration slop) + +- **Aspirational language** — *"we are beginning to roll out…"* without a current state. +- **Triumphal claims** — *"700 jobs migrated with zero issues."* +- **Percentage-complete claims without a base** — *"we're 75% migrated"* with no denominator. +- **Toolset-as-story** — the narrative is "here are the tools we used" rather than "here is what changed and why." +- **Over-weighted forward-work** — "AI-Driven Maintenance" close on an otherwise unrelated migration; the bolt-on AI-future anti-pattern. +- **Premature-completion claim** — *"the migration is done"* without explicit "fully deprecated" language. +- **Framework-without-instance** — naming a maturity model or phase taxonomy without showing the publisher's own trajectory through it. +- **Contemptuous predecessor framing** — the most common credibility leak. Charity is mandatory. diff --git a/.agents/skills/writing-tech-post/references/narrative-and-pacing.md b/.agents/skills/writing-tech-post/references/narrative-and-pacing.md new file mode 100644 index 000000000..229e01fee --- /dev/null +++ b/.agents/skills/writing-tech-post/references/narrative-and-pacing.md @@ -0,0 +1,159 @@ +# Narrative, Momentum, and Pacing + +The five-lede taxonomy, the H2-as-question-resolution discipline, the story-shape catalogue (detective / migration / blameless / paper-link-first / tutorial arcs), momentum-stall diagnostics, the closer multiple-choice gate, and the headline taxonomy. + +## Contents + +- [Five lede types (paired to archetypes)](#five-lede-types-paired-to-archetypes) +- [H2-as-question-resolution discipline](#h2-as-question-resolution-discipline) +- [Story-shape catalogue](#story-shape-catalogue) +- [Momentum-stall diagnostics](#momentum-stall-diagnostics) +- [Closer taxonomy (multiple-choice gate)](#closer-taxonomy-multiple-choice-gate) +- [Headline + title patterns](#headline--title-patterns) +- [The first-200 + last-200 callback-coupling test](#the-first-200--last-200-callback-coupling-test) +- [Anti-narrative patterns](#anti-narrative-patterns) + +## Five lede types (paired to archetypes) + +Pick the lede by archetype. Mispairing is the first failure mode the skill catches. + +| Lede type | Archetype | Mechanism | Exemplar | +|-----------|-----------|-----------|----------| +| **Result-first** | Launch deep-dive | Open with the headline number, often a multiple. The number anchors every later section. | Datadog Rust storage: "60x increase in ingestion / 5x faster queries at peak scale" (`016`); Datadog 100x process-metrics post (`067`). | +| **Mystery / surprise** | Performance, Jane Street narrative | Open with a pathology the reader has no obvious explanation for. The post commits to explaining the gap. | Datadog Postgres upsert: "We expected this new query to have minimal impact… But when we rolled out the new query, disk writes doubled" (`023`); Jane Street build system: "Ha! What actually happened is that nobody really wanted to use Jenga" (`219`). | +| **Stakes-first** | Performance, migration, postmortem (3-W variant) | Open with what is at risk for the user or the team. Earns the right to spend the next 2,000 words on engineering minutiae. | Datadog network-latency: "Getting paged to investigate high-urgency issues is a normal aspect of being an engineer. But none of us expect…" (`041`); GitHub diff-lines: "Pull requests are the beating heart of GitHub" (`054`); GitHub eBPF: "If github.com were ever to go down, we wouldn't be able to access our own source code" (`028`). | +| **Shipping-status** | Launch, product changelog | "Today we shipped" or "now generally available." Carries an explicit CTA contract. | Vercel AI Gateway GA (`059`); Vercel streaming (`062`). | +| **Paper-link-first** | AI/agent, research-to-product translation | Open with a capability claim and a citation. The paper link is structural, not decorative — transfers credibility from the literature. | Google Research MLE-STAR with arXiv link in "Quick links" block above first body paragraph (`074`). | + +**Postmortem 3-W variant** (a stakes-first specialization): date + scope + impact in the first two sentences. Canva: *"On November 12, 2024, Canva experienced a critical outage that affected the availability of canva.com. From 9:08 AM UTC to approximately 10:00 AM UTC, canva.com was unavailable."* The 3-W contract: when, what, where, how-bad, all inside the first 200 words. + +**Hedged-lede slop (banned):** + +- *"In this post, we'll explore…"* — burns paragraph one on apologia. +- *"We wanted to share some thoughts on…"* — issues no debt to the reader. +- *"This article is written for…"* — announces an explainer rather than showing an arresting fact. + +The **productive form** uses "In this post, we'll…" as a road-marker AFTER the pathology paragraph (Datadog 016 third paragraph; 023 fourth paragraph; 036 fourth paragraph), never as the lede itself. + +## H2-as-question-resolution discipline + +Each H2 should answer the question the previous section left dangling, opening a new question that the next H2 answers, until the post lands. The H2 chain is *not* a noun-phrase index — each header is a verb-phrase that names an action taken to resolve a question, which is what gives the reader the sense the investigation is progressing. + +**Textbook execution — Datadog network-latency** (`041:50-90`): + +> "Our usage estimation service at a glance" *(sets up the system)* +> → "Allocating more CPU to a remote cache dependency" *(answers: what was the first hypothesis? what did fixing it teach us?)* +> → "Patching a Linux kernel bug" *(answers: why didn't the CPU fix complete the work?)* +> → "Optimizing AWS instance network configurations" *(answers: why didn't the kernel patch land the distribution?)* +> → "Routing client requests away from terminating pods" *(answers: what was left after AWS?)* +> → "Recap" *(closes the chain)* +> → "Bolster your visibility and learn to look twice" *(lifts the lesson)* + +Each section closes with the partial-victory paragraph (e.g. *"This was a slight improvement, but clearly still higher than normal"*) that licenses the next H2. + +**Pre-prose outline review:** produce the H2 chain and annotate each H2 with what question it answers and what question it leaves open. Reject outlines where any H2 fails the gate. + +**Counter-example — noun-phrase H2s:** AWS architecting-for-agentic-AI (`044:62-75`) runs noun-phrase H2s ("Why traditional architectures hinder agentic AI", "System architecture for fast agentic feedback loops") and resists the question-chain. The result is correct for the reference archetype (the H2s function as a TOC, not a narrative spine) but reads as catalogue-paced. + +## Story-shape catalogue + +Each archetype gets a default arc. + +### Detective-arc (performance) + +Hypothesis → instrument → measurement → partial victory → re-arm. + +Signature beat: the **partial-victory paragraph** that the network-latency post `041` repeats five times. Two closing variants: + +- *Honest-recap close* — bullets the chain and names the operational gap that should make next time faster (Datadog `041`, `067`). +- *Lessons-from-the-trenches close* — generalises one transferable lesson (Datadog `023`, `069`). + +Both refuse the wrapped-bow close; the diff-lines post `054` says explicitly *"the improvements didn't end there"* and lists ongoing work. + +### Migration-arc + +Legacy-charity opening → why-now → phased plan with tracking metrics → cutover discipline → dated status snapshot → forward work. + +Datadog shared-database (`036:51-83`) opens with three full sections of legacy charity: *"Why do shared databases exist?"*, *"When is it time to take apart the shared database?"*, *"What keeps teams from moving off shared infrastructure?"* — each H2 a literal question, the section answering it. Rhetorical-question H2s do the work of justifying the migration's existence before the phased-plan section begins. + +Meta WebRTC (`199:50-117`) runs a tighter variant — Challenge → Solution 1 (Shim Layer) → Solution 2 (Feature Branches) → The Result → Future Work. The "Future Work: AI-Driven Maintenance" close is the trope flagged as over-weighted in 2026 posts; if used, must name at least two specific applications. + +### Blameless-arc (postmortem) + +3-W summary → background → the incident → timeline → contributing factors → mitigation (including mitigations that failed) → action items → optional lessons / acknowledgments. + +Signature beat: at least one **failed-mitigation paragraph**. Canva `050`'s *"We attempted to work around this issue by significantly increasing the desired task count manually. Unfortunately, it didn't mitigate the issue"* is what licenses the next mitigation paragraph to exist. Without it, the escalation reads as panic; with it, it reads as ordered learning. + +Datadog `034` runs an ambitious temporal-displacement variant: opens with the outage date, then traces causality back two years to a December 2020 systemd commit. + +### Paper-link-first arc (AI/agent) + +Capability claim → paper link → motivation → method overview → architecture diagram → named-benchmark table → ablation → named guardrails → forward-section motif. + +MLE-STAR `074:43-89` runs it tightly: Quick links → Introducing MLE-STAR → Evaluations and results → In-depth analysis of MLE-STAR's gains → Conclusion → Acknowledgements. The "In-depth analysis" is the cohort's load-bearing momentum move — it decomposes the headline number so the reader can audit which part of the system delivered it. + +Slack security-investigation `160:48-130` runs a related variant where the H2 chain is The Development Process → From Prototype to Production → Service Architecture → Example Report → Conclusion, with the *Critic-finds-what-Expert-missed* worked example doing the ablation work in narrative form. + +### Tutorial-arc + +Prerequisites → primer → stepwise procedure → working example → caveats → next steps. + +AWS Lambda multi-threaded Rust (`170:36-90`) runs it cleanly: "Our Test Workload: Why Bcrypt Password Hashing?" → "Understanding Lambda's vCPU Allocation" → "Solution Overview" → "Creating a Multi-threaded Rust Lambda Function" (stepwise). + +Tutorial momentum runs on different fuel from the detective-arc: the reader is asked to *do* something at each step, and the H2 chain's job is to confirm they have not lost the thread. Tutorial H2s are allowed to be more noun-phrase-heavy ("Dependencies", "Solution Overview") because the action is in the code blocks. + +### Narrative-essay (corpus minor variant) + +Jane Street's build-system post `219:48-77`: We had a tool → We released it and it failed → We built a smaller compatibility shim → The shim accidentally became popular → We had to rename it → We had to migrate to it → We made it scale → It worked. Six-paragraph story; H2 chain functions as chapter titles. + +Rare in the corpus because most engineering organisations cannot write in this voice; Jane Street can because its publisher voice tolerates first-person individual register. + +## Momentum-stall diagnostics + +Four diagnostics recur. + +- **Scene-setting density too high.** Pre-pathology paragraph runs longer than necessary; reader skims past the lede. Resolved by pairing one scene-setting paragraph with one pathology paragraph (one-and-one). Longer scene-setting needs a stronger pathology to balance it. +- **Payload-density ratio too low.** Sections that introduce no new fact, number, code reference, or distinction. Diagnostic: read the H2s and ask "what does the reader know after this section that they did not know before?" If the answer is "we explained the architecture in more words," cut. +- **Callback frequency too low.** A fact introduced in section 1 is not referenced again before section 5. Datadog Rust storage resolves this by introducing the "6th generation in a lineage that started 15 years ago" frame in the lede and explicitly running Gen 1 → Gen 6 H2s under "How we built the 6th generation of our real-time metrics storage" (line 77). The lede claim is paid back in the body's spine, then again in the closing. +- **Rhetorical-question budget exceeded.** Datadog shared-database `036:51-83` runs three rhetorical-question H2s — the maximum the genre tolerates before the rhetorical move starts to feel performative. Cap at three per long post. + +## Closer taxonomy (multiple-choice gate) + +The closer must be one of five forward-pointing shapes. Summarising closers ("In conclusion, we have shown that…") are a regression. + +- **Call-to-build.** Close on an artifact the reader can run, build, or extend. GitHub eBPF `028:317-326`: *"Want to dive in? Get started by having a look through the examples in [cilium/ebpf](https://github.com/cilium/ebpf/tree/main/examples)…"*. MLE-STAR `074:81-86`: link to `google/adk-samples`. The open-source / research-translation move. +- **Call-to-adopt.** Close on a managed-service link, a sign-up button, or an "available now" CTA. Vercel product-changelog signature. Tolerable when the post has paid out the engineering substance; bait-and-switch when the substance is thin. +- **Open-question.** Close on the unresolved part. Datadog diff-lines `054:53-56`: *"even within our large and mature codebase, can deliver meaningful benefits to all users — and that sometimes focusing on small, simple improvements can have the largest impact."* Lifts the post's specific finding into a general principle without claiming the principle is settled. +- **Shipping-status / roadmap.** Close on what is now in production and what is next. Datadog Rust storage `016:194-210`: *"Looking ahead: smarter routing and integrated indexing"* with three concrete next-direction commitments. Launch and migration archetype's signature; forward-points without overclaiming. +- **Prevention-list.** Close on a bulleted action items list naming what changes are now committed. Canva `050:95-115` groups action items by category. Cloudflare Copy Fail `232:178-195`: "Remediation and follow-up steps" then "Conclusion" naming the specific resolved state. Postmortem and non-incident-response signature. + +A sixth variant exists for performance posts: **distribution-chart close** — close on a graph that shows the distribution moving. The chart *is* the closing argument. + +## Headline + title patterns + +- **Number-first.** Lead the title with the quantified result. *"How we improved efficiency of live process metrics by 100x"* (`067`). The number is the contract; the post must pay it back. A "60x" title with no 60x evidence is **archetype-bait**; reject. +- **Decision-narrative.** Lead with the verb that names the strategic choice. *"Breaking up a monolith: How we're unwinding a shared database at scale"* (`036`). *"Escaping the Fork: How Meta Modernized WebRTC Across 50+ Use Cases"* (`199`). Migration-arc's most common title shape. +- **Surprise-clause / pathology.** Open the title with the negation of the reader's expectation. *"Not just another network latency issue: How we unraveled a series of hidden bottlenecks"* (`041`). *"When upserts don't update but still write: Debugging Postgres performance at scale"* (`023`). Detective-arc signature; pairs with the mystery lede. +- **System-name / introduction.** Lead with the named artifact. *"MLE-STAR: A state-of-the-art machine learning engineering agent"* (`074`). Launch deep-dive and research-translation signature; forces the post to define the system before the reader leaves the lede. +- **Stakes / journey.** Lead with the user experience. *"From latency to instant: Modernizing GitHub Issues navigation performance"* (`017`). Pre-commits the post to a journey shape. +- **Question-form.** Less common but distinctive. True interrogative titles invite skim-and-skip behaviour unless the question is genuinely puzzling. + +## The first-200 + last-200 callback-coupling test + +Extract the first 200 words and the last 200 words and pair them. If they cannot be paired into a same-thread statement, the post has no spine. + +Datadog Rust storage `016` passes the test: lede introduces "6th generation in a lineage that started 15 years ago" + "60x ingestion / 5x query" → closer pays back lineage callback ("Looking ahead: smarter routing and integrated indexing") + roadmap. + +Datadog network-latency `041` passes: lede issues debt ("we don't expect to get paged about every single deployment") → closer "Bolster your visibility and learn to look twice" pays the lesson. + +Run this as a **pre-flight check** before any other review. + +## Anti-narrative patterns + +- **Buried lede.** Headline number in section three. Catalogued as launch-archetype failure. +- **Abstract-then-concrete inversion.** Post opens with a generic industry trend and waits until section two to introduce the specific system. Strategic essays do this on purpose; engineering deep-dives that imitate forfeit pull. +- **Every section a noun phrase.** H2s read as a TOC, not as the spine of an argument. Diagnostic: when the H2s can be skimmed in isolation and re-ordered without changing meaning, the question-chain is broken. +- **Callbacks that never connect.** A fact named in the lede is not referenced again. Diagnostic: extract the first 200 words and the closing 200 words — if they cannot be paired into a same-thread statement, the post has no spine. +- **Closers that summarise instead of forward-pointing.** *"In conclusion, we have shown that…"* recapitulations. Almost every read-at-depth closer in the corpus picks one of the five forward-pointing shapes. +- **Bolted-on AI-future close on a non-AI post.** Generic "AI handles the long tail" paragraph that reads as performative when the AI work is not genuinely on the roadmap. If used, the forward-section must name at least two specific applications. +- **Archetype-bait headline.** Title promises archetype A; body delivers archetype B. Erodes publisher trust over a series of posts. diff --git a/.agents/skills/writing-tech-post/references/performance-deep-dive.md b/.agents/skills/writing-tech-post/references/performance-deep-dive.md new file mode 100644 index 000000000..d99ab2a3d --- /dev/null +++ b/.agents/skills/writing-tech-post/references/performance-deep-dive.md @@ -0,0 +1,106 @@ +# Performance Deep-Dive + +The performance archetype's contract: detective-arc structure, distribution-shift evidence (not means), iterative bottleneck-peeling, partial-victory paragraph cadence, and the two honest-recap closing variants. + +## Contents + +- [Opening: felt experience + stakes (not a number)](#opening-felt-experience--stakes-not-a-number) +- [The detective-arc structure](#the-detective-arc-structure) +- [The partial-victory paragraph (load-bearing cadence)](#the-partial-victory-paragraph-load-bearing-cadence) +- [Distribution-shift contract (mandatory)](#distribution-shift-contract-mandatory) +- [Named tooling as evidence](#named-tooling-as-evidence) +- [Three sub-variants](#three-sub-variants) +- [Closing move (two honest-recap variants)](#closing-move-two-honest-recap-variants) +- [Silent failure modes (performance slop)](#silent-failure-modes-performance-slop) + +## Opening: felt experience + stakes (not a number) + +Performance posts open with the *pathology + stakes*, not the headline number. The post then "promises the arc" in the second paragraph by naming the multi-fix structure to come. + +Exemplars: + +- GitHub Issues: *"Latency isn't just a metric. It's a context switch. Even small delays add up, and they hit hardest at the exact moments developers are trying to stay in flow."* +- Datadog network-latency: *"Getting paged to investigate high-urgency issues is a normal aspect of being an engineer. But none of us expect (or want) to get paged about every single deployment."* + +**Why not result-first?** Result-first is the launch archetype's lede. Performance is a *detective story* — the reader has to live through the investigation. Opening with the result spoils the arc. + +**Promise the arc in paragraph 2.** Datadog network-latency names "a series of hidden bottlenecks" within the second paragraph. The reader now knows the post is multi-fix and stays through the H2 chain to collect. + +## The detective-arc structure + +Each performance section repeats a beat: + +> Hypothesis → instrument → measurement → partial victory → re-arm + +Datadog network-latency `041:50-90` runs this beat five times — each section closing with the partial-victory cadence ("a slight improvement, but clearly still higher than normal"). + +**Five-beat section skeleton:** + +1. **Baseline metric definition** — what we measured, how, in what units. +2. **Bottleneck identification** — current hypothesis, what made us look here. +3. **Architectural moves** — what we changed. +4. **Distribution shift evidence** — chart aligned with prior charts on the same axes. +5. **Tradeoffs / next bottleneck** — partial-victory paragraph re-arming the investigation. + +## The partial-victory paragraph (load-bearing cadence) + +The signature beat. Honest disclosure of partial victories between fixes is what licenses the next H2. + +Examples (Datadog network-latency, repeated five times): + +- *"This was a slight improvement, but clearly still higher than normal."* +- *"Instead of plateauing at about one second, the p99 remote cache latency was now oscillating between 300 ms and one second."* + +**Without it,** the post reads as a sequence of unrelated fixes. **With it,** the post reads as ordered investigation. + +The skill must require at least one partial-victory paragraph between successive fixes in any detective-arc draft. + +## Distribution-shift contract (mandatory) + +Any performance claim attaches to four pieces of context, or it is not falsifiable. (Full text in `evidence-diagrams-code.md`.) + +- **Percentile.** p50 / p90 / p99. Mean-only charts are rejected. +- **Sample size.** State explicitly. +- **Measurement window.** Time range, traffic level, rollout window. Per-fix charts on the same axes. +- **Environment.** Instance type, browser, OS, hardware. + +GitHub Issues uses HPC distribution histograms as the spine of the post — distribution at the start, after cache rollout, after preheating, after Turbo navigations, plus a final percentile chart. + +**Missing-distribution anti-pattern:** a post quoting *"p99 latency dropped from 1s to 100ms"* without graphing the distribution between leaves the reader unable to see whether the improvement was uniform or whether a long tail moved while the body stayed put. + +## Named tooling as evidence + +Vague tooling reads as marketing. Strong perf posts name the exact tool: + +- `pg_walinspect`, `pg_test_fsync`, `lldb` (with explicit version requirements: *"Starting in Postgres 15"*). +- ENA metric IDs like `system.net.aws.ec2.bw_in_allowance_exceeded`. +- Network Performance Monitoring as the instrument that surfaced a bottleneck. + +The contract: when an investigation depends on a specific tool, name it. When the tool is a publisher's own product, name it **as an instrument**, not as a CTA. + +## Three sub-variants + +1. **Narrow-detective** — one query, one tool, one root cause. Datadog Postgres upsert. +2. **Multi-bottleneck peeling** — one symptom, chain of independent causes. Datadog network-latency, five fixes. The signature shape of the archetype. +3. **Root-cause-chase-to-depth** — single symptom, descending layers most readers won't visit. Datadog Postgres segfault → LLVM Arm64. + +## Closing move (two honest-recap variants) + +Both refuse the wrapped-bow close. + +- **Honest-recap close** — bulleted chain of fixes + operational change in monitoring/alerting. Datadog network-latency, Datadog scaling-down-to-speed-up. Names the operational gap that should make next time faster. +- **Lessons-from-the-trenches close** — one transferable general lesson + the tool that surfaced it. Datadog Postgres upsert, Postgres segfault. + +GitHub diff-lines says explicitly *"this didn't end here"* and lists ongoing work. + +A third variant: **distribution-chart close** — close on a graph that shows the distribution moving. The chart *is* the closing argument; prose around it is supplementary. + +## Silent failure modes (performance slop) + +- **Headline-number-only lede.** "We sped it up by 60%" without disclosure of how. Launch register on performance archetype. +- **Mean-only evidence chart.** Performance is about distributions, not averages. +- **Single-fix-no-failures register.** Real performance work produces failed attempts; their absence reads as luck or fabrication. +- **P99 graph without x-axis baseline.** No time range, traffic volume, or version annotation. +- **Missing distribution shift.** Quoting a percentile move without graphing the distribution between. +- **Missing partial-victory cadence.** Each section claims complete success before the next one starts. +- **Triumphal close ("we sped it up by Xx and everyone was happy").** Performance closers are honest-recap or lessons-from-the-trenches, not wrapped-bow. diff --git a/.agents/skills/writing-tech-post/references/postmortems.md b/.agents/skills/writing-tech-post/references/postmortems.md new file mode 100644 index 000000000..2c808b316 --- /dev/null +++ b/.agents/skills/writing-tech-post/references/postmortems.md @@ -0,0 +1,103 @@ +# Incident Postmortems + +The postmortem archetype's contract: canonical section sequence, blameless register, UTC-timeline + named-artifact root cause obligations, failed-mitigation discipline, and the postmortem/reliability-essay hybrid rule. + +## Contents + +- [Opening move and section sequence](#opening-move-and-section-sequence) +- [The blameless register (three rules)](#the-blameless-register-three-rules) +- [Evidence obligations](#evidence-obligations) +- [The failed-mitigation paragraph](#the-failed-mitigation-paragraph) +- [Closing move (prevention vs reliability essay)](#closing-move-prevention-vs-reliability-essay) +- [Postmortem / reliability-essay hybrid](#postmortem--reliability-essay-hybrid) +- [Tone calibration: between detached and contrite](#tone-calibration-between-detached-and-contrite) +- [Silent failure modes (postmortem slop)](#silent-failure-modes-postmortem-slop) + +## Opening move and section sequence + +**Opening:** date + scope + impact in the first two sentences. The 3 W's + impact (when, what, where, how-bad) inside the first 200 words. Delay reads as evasive. + +Exemplars: + +- Canva: *"On November 12, 2024, Canva experienced a critical outage that affected the availability of canva.com. From 9:08 AM UTC to approximately 10:00 AM UTC, canva.com was unavailable."* +- Datadog 2023-03-08: *"On March 8, 2023, Datadog experienced an outage that affected all services across multiple regions."* + +**Canonical section sequence** (strict, near-canonical): + +1. **Summary** — third-person ("Datadog experienced an outage"). +2. **Background** — system context the reader needs. +3. **The incident** — narrative of what happened. +4. **Timeline** — UTC timestamps with defined granularity, narrated. +5. **Contributing factors** — usually multiple, not "one root cause." +6. **Mitigation** — including failed attempts (see §[failed-mitigation paragraph](#the-failed-mitigation-paragraph)). +7. **Action items** — per category, with owners and ETAs. +8. **Optional: Lessons** — for reliability essay hybrid (see below). +9. **Optional: Acknowledgements** — for cross-team or vendor-collaborative incidents. + +The sequence is not aspirational — every section is required *unless* the post is explicitly labelled as a non-canonical variant (e.g., "preparedness post" / "non-incident response"). + +## The blameless register (three rules) + +1. **System-subject sentences.** Subjects in the causality narrative are systems, not people. *"The affected asset was a JavaScript file responsible for displaying the editor's object panel"* (Canva `050:61`) — not *"the engineer who deployed that bundle."* *"On start-up of v248, systemd-networkd flushes all IP rules it does not know about"* (Datadog `034:48`) — not *"the systemd maintainers chose to flush."* + +2. **Passive voice where it isolates fault, active voice where it claims learning.** *"An automated upgrade was triggered at 6:00 UTC"* (passive — isolates fault). *"We have built substantially more robust persistent disk storage"* (active — claims the learning). The polarity is the genre's signature; reversing it is the most common failure mode. Google's general active-voice preference relaxes specifically in postmortem causality sections. + +3. **First-person plural for ownership.** Postmortems use "we" — not "Datadog" or "the SRE team" — when claiming responsibility *and* learning. Canva: *"We attempted to work around this issue… Unfortunately, it didn't mitigate."* Third-person ("Datadog experienced an outage") appears only in the summary. + +**Test:** read the draft aloud and replace every "we" with the publisher's name. If the sentence still scans, the register is correct. If awkward, the "we" was hiding individual blame. + +## Evidence obligations + +- **UTC timestamps with defined granularity.** Minute-resolution for incidents under an hour; ten-minute or hour-resolution for longer incidents. Always UTC; never local time without UTC conversion alongside. +- **A specific named artifact for root cause.** Commit SHA (Datadog 034 links systemd PRs `#17477` and `#19287`), CVE number (Cloudflare Copy Fail cites `CVE-2026-31431`), kernel function, package version. **Vagueness here is fatal.** +- **Quantitative impact.** Sample: Canva's "1.5M req/s, 3× peak load, 1700% TTFB increase, 270,000+ pending requests." Numbers, not "severe impact." +- **Named services and versions.** Canva names Netty, Amazon ECS, Cloudflare tiered cache, AWS S3. Datadog 034 names systemd v248/v249, Ubuntu 22.04. +- **Verbatim partner quote** when a third party shares responsibility. Canva blockquotes Cloudflare's full statement about the stale traffic-management rule with attribution. The verbatim quote establishes accountability without Canva making claims on Cloudflare's behalf. + +## The failed-mitigation paragraph + +Strong postmortems disclose mitigations that **did not work**. The failed-mitigation paragraph is *load-bearing momentum*: it licenses the next mitigation paragraph to exist. Without it, escalation reads as panic; with it, it reads as ordered learning. + +Canonical execution — Canva `050`: *"We attempted to work around this issue by significantly increasing the desired task count manually. Unfortunately, it didn't mitigate the issue."* + +**Requirement:** at least one failed-mitigation paragraph in any blameless-arc draft before flagging the draft as complete. + +## Closing move (prevention vs reliability essay) + +Two variants depending on archetype scope: + +- **Pure incident postmortem** → prevention commitments per category. Canva closes with grouped action items: *"Incident response process improvements / Increased resilience of the API Gateway / Fix the telemetry bug / Improvements to detecting page deployment failures / Collaboration with Cloudflare."* + +- **Reliability essay derived from postmortem** → bulleted design principles. Datadog 038 (`038-…rethinking-reliability.md`) closes with *"Always start with what's important to the end user… Persist data early…"* The post is the postmortem zoomed out 6–18 months later, with the incident as motivation and design principles as the load. + +Both variants forbid the **"we have you covered"** closure that promises completeness without naming residual gaps. + +## Postmortem / reliability-essay hybrid + +The reference example is the Datadog 034 / 038 diptych: + +- `034` (March 2023, single author) — pure incident postmortem; date + scope opening; dormant-fault causality back to 2020 systemd PRs. +- `038` (October 2025, 3 authors) — reliability essay derived from the March 2023 incident; "square-wave failure pattern" coined-term lede; "never-fail architecture" / "failing better" vocabulary; published 18 months after. + +The hybrid produces *paired diptychs* in the 3,000–4,000 + 4,000–7,000 word band, with the second post linked from the first's closing and vice versa. + +## Tone calibration: between detached and contrite + +Blameless does not require false neutrality. + +- Datadog: *"This incident reminded us"* and *"we have to accept"* — acknowledges significance. +- Canva: *"We've been working closely with Cloudflare to gain an in-depth understanding"* — accepts shared responsibility without ducking. + +Overly neutral postmortems read as detached. Overly contrite postmortems read as performative. + +## Silent failure modes (postmortem slop) + +- **Naming a specific engineer** in the causality section. +- **Defensive register** ("While our autoscaling…"). The "While"/"Although" construction signals more concern for reputation than learning. +- **"We have you covered" closures** without naming residual gaps. +- **Over-redaction** ("a critical microservice," "an upstream provider"). Disclose why redaction is necessary, do not silently genericise. +- **24-hour postmortems with placeholder action items.** Wait for the action items to be specific. +- **Missing timeline.** UTC timestamps are non-negotiable. +- **"Five whys without the answer."** Asking five whys but stopping at the third without commitment. +- **No failed-mitigation paragraph.** Escalation reads as panic; the post lacks the cadence that licenses the next mitigation section. +- **Mismatched closing CTA.** A 4,000-word postmortem ending with *"Sign up for a free trial"* alienates both audiences. Postmortems close with engineering reflection or prevention commitments. diff --git a/.agents/skills/writing-tech-post/references/pre-publish-checklist.md b/.agents/skills/writing-tech-post/references/pre-publish-checklist.md new file mode 100644 index 000000000..2141ebd4a --- /dev/null +++ b/.agents/skills/writing-tech-post/references/pre-publish-checklist.md @@ -0,0 +1,179 @@ +# Pre-Publish Checklist + +The archetype-conditional checklist that gates a draft before publication. Each row is a hard gate; warnings are not optional. The lint script `scripts/lint-post.py` automates a subset of these checks; the rest are manual review gates. + +## Contents + +- [How to use this checklist](#how-to-use-this-checklist) +- [Universal gates (every post)](#universal-gates-every-post) +- [Postmortem gates](#postmortem-gates) +- [Migration gates](#migration-gates) +- [Performance gates](#performance-gates) +- [AI/agent gates](#aiagent-gates) +- [Security gates](#security-gates) +- [Launch gates](#launch-gates) +- [Tutorial gates](#tutorial-gates) +- [Research-translation gates](#research-translation-gates) +- [Publishable / hold-for-review / rework rubric](#publishable--hold-for-review--rework-rubric) +- [Disclosure blockers](#disclosure-blockers) +- [Lint script integration](#lint-script-integration) + +## How to use this checklist + +1. Identify the post's primary archetype (and absorbed archetype if hybrid). +2. Walk the **Universal gates** + the **archetype-specific gates** row by row. +3. Mark each as ✅ pass / ⚠️ hold / ❌ rework. +4. Apply the publishable / hold-for-review / rework rubric. +5. Run `scripts/lint-post.py ` to verify the automated subset. + +A single ❌ blocks publication. Any ⚠️ requires explicit acknowledgement from the author plus a one-line note in the draft frontmatter. + +## Universal gates (every post) + +- [ ] **Archetype committed.** Primary archetype named in frontmatter; absorbed archetype named if hybrid. +- [ ] **Depth four-tuple committed.** `(opening rung, body residency band, closing rung, traversal)` recorded in frontmatter or outline header. +- [ ] **Lede matches archetype.** See lede-taxonomy / archetype-pairing in `narrative-and-pacing.md`. Hedged ledes ("we want to share some thoughts on…") are rejected. +- [ ] **H2-as-question-resolution.** Every H2 resolves a question the previous section opened. Noun-phrase TOC-style H2s are rejected unless the archetype is reference-shaped (AWS-style how-to). +- [ ] **Claim → artifact → reading cadence.** Every artifact (chart, diagram, code, table) is preceded by a prose claim and followed by a prose reading. Captions state the *finding*, not the artifact name. +- [ ] **Evidence-form set per archetype.** The archetype's mandatory evidence forms are present (see archetype-specific gates). +- [ ] **First-200 + last-200 callback coupling.** Extract first 200 words and last 200 words; they thread back to each other. +- [ ] **Closer matches one of five forward-pointing shapes.** Call-to-build / call-to-adopt / open-question / shipping-status-roadmap / prevention-list. Summarising closers are rejected. +- [ ] **Headline does not promise an archetype the body does not deliver.** No archetype-bait. +- [ ] **Vendor names live at R5 (implementation), not R1/R2 (framing).** Apply the "remove the vendor's name" test. +- [ ] **Triumphal-language lint.** *"Successfully," "smoothly," "without issue"* appear at most twice per long post (≤5,000 words) or once per short post. +- [ ] **No hedged-lede slop.** *"In this post we'll explore…"* / *"We wanted to share some thoughts on…"* / *"This article is written for…"* +- [ ] **No "we're excited to announce" template.** +- [ ] **Anti-patterns sweep.** Run `anti-patterns.md` line-by-line against the draft; no patterns hit. + +## Postmortem gates + +- [ ] **Date + scope + impact in the first two sentences.** 3-W summary inside first 200 words. +- [ ] **UTC timeline with defined granularity.** Minute-resolution for incidents under an hour; hour-resolution for longer. +- [ ] **Specific root-cause artifact named.** Commit SHA, PR number, CVE, kernel function, package version. Vagueness is fatal. +- [ ] **Blameless register.** System-subject sentences in causality; no engineer's name attached to causation. +- [ ] **Passive/active polarity correct.** Passive isolates fault; active claims learning. +- [ ] **First-person plural for ownership.** Third-person only in summary. +- [ ] **At least one failed-mitigation paragraph.** *"We attempted X. Unfortunately, it didn't mitigate."* +- [ ] **Quantitative impact.** Numbers, not "severe impact." +- [ ] **Verbatim partner quote** if a third party shares responsibility. +- [ ] **Closing move: prevention commitments per category** (pure postmortem) OR **design principles** (reliability essay variant). No "we have you covered" closure. +- [ ] **No 24-hour postmortem with placeholder action items.** Wait until action items are specific. + +## Migration gates + +- [ ] **Empathetic legacy framing.** Charity toward predecessor system; no contempt. +- [ ] **Why-now justification.** Concrete trigger named. +- [ ] **Phased plan (numbered).** Three to seven phases, each named. +- [ ] **Per-phase tracking metrics.** Datadog: `postgresql.table.count` aggregated by schema; analogous in other migrations. +- [ ] **Named cutover safety mechanisms.** PG Proxy / shadow tables / dual-stack A/B / per-region cutover. +- [ ] **Quantified scope.** 700+ jobs, 50+ use cases, 30 schemas — explicit numbers. +- [ ] **Paired before/after WITH phase intermediaries.** Not just two states. +- [ ] **Dated status snapshot in close.** *"Phase 1 finished Q1 2025; Phase 2 will continue through 2026."* +- [ ] **"What we'd do differently" paragraph.** Honest disclosure of difficulties. +- [ ] **Forward work names specific applications.** No bolt-on AI-future paragraphs. +- [ ] **Multi-author byline** OR named acknowledgements paragraph. +- [ ] **No framework-without-instance.** Maturity model must be populated with the publisher's own trajectory. + +## Performance gates + +- [ ] **Felt-experience + stakes opening** (not a number). Promise the arc in paragraph 2. +- [ ] **Five-beat section skeleton per fix.** Hypothesis → instrument → measurement → partial victory → re-arm. +- [ ] **Partial-victory paragraph between fixes.** *"A slight improvement, but clearly still higher than normal."* +- [ ] **Distribution-shift evidence per fix.** Percentile + sample size + measurement window + environment. Mean-only charts rejected. +- [ ] **Per-fix charts on the same axes.** Reader can read the shift visually across interventions. +- [ ] **Named tooling.** `pg_walinspect` / `lldb` / ENA metric IDs / specific instrument used. +- [ ] **Two-author byline** (standard for the genre). +- [ ] **Closing move: honest-recap OR lessons-from-the-trenches.** No wrapped-bow close. + +## AI/agent gates + +- [ ] **Capability claim + paper/repo link in first scroll.** Paper-link-first attribution. +- [ ] **Cited benchmark** — public (MLE-Bench-Lite, BrowseComp-Plus, etc.) or internal with documented composition. +- [ ] **Baseline reported.** MLE-STAR vs AIDE; single-agent vs multi-architecture; etc. +- [ ] **Methodology disclosed.** What the eval harness does. +- [ ] **Ablation / "In-depth analysis" section.** Decomposes the headline gain across components. +- [ ] **Negative findings published as load-bearing**, not buried caveats. +- [ ] **Named guardrails / checkers.** Each named, with role and detection contract. +- [ ] **Capability vs limitation register switch.** Declarative present for capabilities; hedged future-conditional for limitations. +- [ ] **Closing move: open-source repo link OR "AI handles the long tail" motif** (only if genuinely on roadmap with at least two specific applications). +- [ ] **No "we used AI agents to" as a credential.** No AI-as-buzzword. + +## Security gates + +- [ ] **Threat-model opens before the fix.** Adversary capability + asset + impact horizon. +- [ ] **CVE number explicit** (response posts). +- [ ] **Upstream commit linked** (response posts). +- [ ] **UTC timeline with parallel workstreams** (response posts). +- [ ] **Upstream attribution** for exploit mechanics. Defer to upstream researcher write-up; do not re-publish working exploit. +- [ ] **Scope caveats explicit.** What the post does not cover. +- [ ] **Probabilistic register for adversary capabilities.** *"Could," "might," "if X then Y."* No FUD framing. +- [ ] **Closing move: follow-up + what we'd do differently + disclaimer.** No "we have you covered" closure. +- [ ] **Named external researchers credited.** +- [ ] **Multi-engineer byline + acknowledgements paragraph** for CVE response. + +## Launch gates + +- [ ] **Scale-then-headline-number opening.** Headline result above the first H2. +- [ ] **Confident technical "we" register.** Declarative present. +- [ ] **Architecture overview in first or second screen.** +- [ ] **Component walkthroughs in body.** +- [ ] **Results section with quantified envelope.** +- [ ] **Closing move: forward roadmap with named next steps.** Declarative, scoped, not aspirational. +- [ ] **Multi-author byline.** +- [ ] **Headline number not buried below the first H2.** + +## Tutorial gates + +- [ ] **Use case + capability promise + "we show you how" lede.** +- [ ] **Prerequisites stated.** Explicit. The genre's most common craft failure is missing prerequisites. +- [ ] **Conceptual primer before stepwise.** +- [ ] **Runnable code, end-to-end, copy-paste safe.** No elision in tutorial code; full snippets. +- [ ] **Imperative procedural voice.** Second person or "we show you." +- [ ] **Caveats section.** +- [ ] **Closing move: next-steps pointer with a runnable artifact** (GitHub repo / "what to try next" path). + +## Research-translation gates + +- [ ] **Discipline-framed problem + paper link in first scroll.** arXiv link inside Quick Links block above first body paragraph. +- [ ] **Method overview section** with diagram. +- [ ] **Evaluation table** with named benchmarks. +- [ ] **Analysis / ablation section.** +- [ ] **Open-source / availability section.** +- [ ] **Acknowledgments block + licensing/scope footnote.** Inheritance from academic papers. +- [ ] **Capability declarative; limitations hedged conditional.** +- [ ] **No corporate-announcement disguised as research.** + +## Publishable / hold-for-review / rework rubric + +After walking the gates: + +- **✅ Publishable** — all universal gates and all archetype-specific gates pass. Lint script exits 0. +- **⚠️ Hold for review** — universal gates pass; ≥1 archetype-specific gate is ambiguous; lint script passes. Author + editor jointly decide. +- **❌ Rework** — ≥1 universal gate fails OR ≥2 archetype-specific gates fail OR lint script fails. Do not publish; return to the relevant phase (1–5) for repair. + +## Disclosure blockers + +Specific blockers that prevent publication regardless of other gates: + +- **Security disclosure constraints** — CVE coordination embargo, legal review pending, customer notification not yet complete. Hold until resolved. +- **Postmortem before remediation closure** — remediations not shipped; publish only if "what we're doing next" section names remediations as in-flight with owners and ETAs. +- **AI/agent post with no benchmark** — must label internal evaluation explicitly as "internal ablation, no external benchmark" with documented composition. Do not name-drop a benchmark that was not run. +- **Migration with no dated status snapshot** — reads as if completion is being claimed before earned. Add the snapshot or downgrade the closing to "in-flight migration update." +- **Performance claim with no measurable data** — downgrade to qualitative language ("noticeably faster on cold start") or cut the claim. Do not inflate with placeholder numbers. + +## Lint script integration + +Run `python3 /scripts/lint-post.py ` (read-only). The script automates the following gates from the checklist: + +- Triumphal-vocabulary density (configurable threshold). +- Hedged-lede patterns (regex against the first 200 words). +- Uncaptioned figures (Markdown image syntax without surrounding caption). +- Evidence-free percent claims (numeric percentages not followed by a metric reference within N lines). +- Blame-by-implication patterns ("While our", "Although the team"). +- "We're excited to announce" template. +- Code blocks over 30 lines without an elision marker (`// ...` or `# ...`). +- Headline-vs-body callback (extracts first 200 + last 200 words; reports if they share no nouns). + +Lint failures are blockers, not warnings. The pre-publish gate does not pass with open lint findings. + +The script exits 0 on a clean draft and non-zero with a structured findings report on any failure. Run it before walking the manual gates so the automated subset is already resolved. diff --git a/.agents/skills/writing-tech-post/references/publisher-voice-matrix.md b/.agents/skills/writing-tech-post/references/publisher-voice-matrix.md new file mode 100644 index 000000000..d62a2f442 --- /dev/null +++ b/.agents/skills/writing-tech-post/references/publisher-voice-matrix.md @@ -0,0 +1,64 @@ +# Publisher Voice Matrix + +Cross-publisher matrix for seven publishers (Datadog, Vercel, GitHub, AWS, Meta, Cloudflare, Jane Street) by six surface features, plus banned moves per publisher. + +## Contents + +- [How to use this matrix](#how-to-use-this-matrix) +- [The seven-publisher matrix](#the-seven-publisher-matrix) +- [Extended publisher signatures](#extended-publisher-signatures) + +## How to use this matrix + +Triangulate publisher voice on six axes (none alone is the voice): + +1. **Byline weight** — single / two / multi / multi + named acknowledgements +2. **Sentence length distribution** — short imperative / medium technical / long essayistic +3. **Vendor / product mention density** — low / moderate / high +4. **Evidence reflex** — which evidence form the publisher reaches for first +5. **Opening register** — lede preference +6. **Closing register** — engineering reflection / hard CTA / forward-work / open-source / HN link / acknowledgements + +When tuning a draft to a publisher's house style, audit the lede and closer against the publisher's row, then sweep the body for evidence-reflex match. + +## The seven-publisher matrix + +| Publisher | Register / Person / Tense | Lede preference | Evidence reflex | Hedging budget | Closing register | Banned move | +|-----------|---------------------------|----------------|-----------------|---------------|------------------|-------------| +| **Datadog** | Formal-quantitative, first-person plural, mixed-tense narrative ("we observed", "we now persist") | Problem-from-experience hook; sometimes a "square-wave failure pattern"-style coined-term lede | Annotated percentile graphs over time, named services and package versions | Moderate; uses "we have to accept", "this incident reminded us" to signal genuine learning | Engineering reflection or system-design claim — never a hard CTA on postmortems and reliability essays | Never opens an incident post with "we are excited"; refuses to genericise subsystem names | +| **Vercel** | Two sub-voices. **Customer-narrative:** marketing-adjacent, short sentences, frequent block quotes from named customer roles. **Product-changelog:** short imperative, code-first | Customer name (customer voice) OR feature framing + code block (changelog voice) | Customer block quotes; or code blocks + architecture diagrams | Low; customer voice avoids hedge entirely, asserting outcomes via quote | Hard CTA — "Start Deploying" / "Talk to an Expert" buttons; never reflection | Customer-narrative voice never narrates engineering tradeoffs in first-person engineer voice; the marketing-byline guarantees the register | +| **GitHub** | Engineer-as-individual under a personal byline; first-person plural for the team plus occasional first-person singular asides; sentence length range is widest in corpus | Stakes-then-problem two-step: one sentence on user stakes, one sentence on engineering problem | Mixed: metric tables, screenshots, code listings, INP percentile graphs | Moderate; one-line teaser subtitle often functions as the hedge ("The path to better performance is often found in simplicity") | Reflection paragraph plus a one-line invitation; never a hard CTA | An engineer is never anonymised in a GitHub post; ghost-written-byline anti-pattern is structurally prevented by named photo + per-author archive | +| **AWS** | Capability-catalog third-person prose; passive constructions; mid-to-long sentences with heavy noun phrases; "you" as reader, never first-person plural for the team | Problem-the-reader-already-has framing ("If you're architecting cloud systems for AI development on AWS, you've likely discovered that…") | Architecture diagrams + captioned figures + "**Note:**" advisory call-out boxes after each sub-section | Low; "you can", "you should" predominates over "we believe" | Roll-up of services plus documentation links; sub-blog footer; documentation *is* the CTA | Never surfaces individual author byline in the visible header; the sub-blog identity outranks the engineer | +| **Meta** | Institutional-multi-author; first-person plural across teams; ALL-CAPS "POSTED ON" prelude; medium-to-long sentences with frequent passives | TL;DR bullet summary of "we're sharing / we're proposing / we hope" *before* any prose | Labelled framework diagrams with capitalised category names; step-numbered methodology sections | Moderate; framework introductions ("PQC Migration Levels") stated assertively, every section bounded by what the framework can claim | Forward-looking paragraph on what the program will enable next; optional formal disclaimer in italics | Refuses single-author byline for migration / security / capacity-efficiency posts; if the work is institutional, the byline must reflect it | +| **Cloudflare** | Network-engineering-academic; short-to-medium high-info-density sentences; ISO date format and explicit minute read; first-person plural with occasional first-person singular in author asides | CVE date or network-event framing ("On April 29, 2026, a Linux kernel local privilege escalation vulnerability was publicly disclosed…") | Kernel-level C in the body, sequence diagrams of cryptographic handshakes, RFC + IETF draft citations as hyperlinks | High where uncertain (probabilistic framing), low where load-bearing (specific commit hash, FIPS standard number) | "Discuss on Hacker News" link, tag list, related posts; explicit acknowledgements paragraph naming all responders | Never re-publishes a working exploit before upstream patch is widely deployed; explicitly defers exploit mechanics to upstream researcher write-up | +| **Jane Street** | Technical-essayistic; single-author byline; longest sentences in corpus; willing to use first-person singular alongside editorial "we"; OCaml/compilers/hardware vocabulary | Essayistic, meta-discursive — situates the post within a larger conversation; occasional self-aware aside | Long code listings (full OCaml function definitions), trace visualisations, performance graphs | High where the assumed-reader bar is high; allows author confessional | Reflection, optional pointer to a tech talk, prev/next pair; no CTA, no Hacker News link, no marketing | Refuses to dilute author voice for a corporate template; refuses to attach a sales footer; refuses to genericise OCaml-specific vocabulary | + +## Extended publisher signatures + +Sketched from sampled material — useful when targeting a non-matrix publisher. + +- **Canva** — single-author postmortem byline with a personal LinkedIn link and a "post incident review" pre-header. Register is institutional-restrained: *"This is our first publicly shared incident report. We're doing this as part of our commitment to transparency, accountability, and continuous improvement."* Strong vendor-naming discipline (verbatim Cloudflare quote with attribution). + +- **Docker** — single-engineer voice with personal asides in a series format ("This is issue 1 of a new series"). Mixes citation-heavy explainer prose with first-person reflection ("The simplest mental model I've found"). Never names a single competitor product without immediately citing public coverage that already named it. + +- **Slack** — multi-author institutional voice for security/investigation posts with named agent personas (Director/Expert/Critic) treated as discrete artifacts. Knowledge-pyramid diagrams. Three-persona structured-output rubrics named explicitly. + +- **Tailscale** — wider personal register (founder-and-engineer mix) with regular monthly-update digests. Tolerates first-person singular more freely than most corporate engineering blogs. + +- **Pinterest** and **Dropbox** — tend toward Meta-shape multi-author posts but with lighter framework branding. + +- **Netflix** — "In a previous post…" continuation lede; mid-length; lessons-from-the-trenches close. + +- **Stripe** — strategic-essay register (sibling genre — refuse to author with this skill unless explicitly labeled as strategic). + +- **AWS sub-blogs** (Architecture, Database, Compute, Machine Learning) — each has slight register variation; the Architecture sub-blog tolerates more first-person plural than the parent main blog. + +## When the publisher is not in the matrix + +Pick the closest match across the six axes. If unclear, default to: + +- **Single-engineer voice** → GitHub or Jane Street, depending on whether the post is engineer-narrative (GitHub) or essayistic (Jane Street). +- **Multi-author institutional** → Meta or Cloudflare, depending on whether the post is framework-shaped (Meta) or response-shaped (Cloudflare). +- **Product-marketing-adjacent** → Vercel customer voice (and strongly consider whether the post should be authored at all by this skill — sibling genre). +- **Tutorial / how-to** → AWS register. +- **Operational deep-dive** → Datadog register. diff --git a/.agents/skills/writing-tech-post/references/security-and-reliability.md b/.agents/skills/writing-tech-post/references/security-and-reliability.md new file mode 100644 index 000000000..cc1a366c3 --- /dev/null +++ b/.agents/skills/writing-tech-post/references/security-and-reliability.md @@ -0,0 +1,102 @@ +# Security and Reliability + +The security/reliability specialty surface's contract: threat-model opening, layered-defense walkthrough, coordinated-disclosure four-panel, probabilistic register for adversary capabilities, CVE + upstream-commit citation contract, and the three sub-variants (preparedness, industry-migration, educational-explainer). + +## Contents + +- [Opening: threat model first, not the fix](#opening-threat-model-first-not-the-fix) +- [Four-panel canonical structure](#four-panel-canonical-structure) +- [Coordinated-disclosure contract (four moves)](#coordinated-disclosure-contract-four-moves) +- [Probabilistic register for adversary capabilities](#probabilistic-register-for-adversary-capabilities) +- [Evidence obligations](#evidence-obligations) +- [Three sub-variants](#three-sub-variants) +- [Closing move: follow-up + disclaimer](#closing-move-follow-up--disclaimer) +- [Silent failure modes (security slop)](#silent-failure-modes-security-slop) + +## Opening: threat model first, not the fix + +The security genre opens with *adversary capability + asset + impact horizon* before describing the work. + +Exemplars: + +- **Cloudflare Copy Fail** — opens with `CVE-2026-31431`, the kernel subsystem (`AF_ALG` and the crypto API), and the exploit primitive (*"4-byte write past boundary"*). +- **GitHub PQ SSH** — opens with the SNDL (Store Now, Decrypt Later) attacker who stores ciphertext today to decrypt later. +- **Meta PQC** — opens with the 10–15-year quantum timeline and SNDL paragraph before any Meta-specific work. +- **Docker horror stories** — structures the entire post as six numbered threat categories with sub-shapes. + +Opening with the fix instead of the threat model reads as marketing; the reader is told what to deploy without being told what they are defending against. + +## Four-panel canonical structure + +Not every panel is used by every post; preventive posts skip disclosure timing, response posts foreground it. + +1. **Threat-model opening** — adversary capability + asset + impact horizon. +2. **Background / system context** — what the reader needs to know about the system being defended. +3. **Layered-defense walkthrough** — explicit enumeration of the defenses, each named. +4. **Disclosure-timing section** (for CVE responses) — UTC timeline with parallel workstreams. +5. **Mitigations close** — follow-up + disclaimer (see closing move). + +## Coordinated-disclosure contract (four moves) + +When responding to a public CVE, four moves are non-negotiable. + +1. **Threat-model opens before the fix.** GitHub PQ-SSH opens with SNDL adversary capability, not with the cipher rollout. Cloudflare Copy Fail opens with the CVE number and exploit primitive. + +2. **UTC timeline non-negotiable.** Cloudflare's table starts at `2026-04-29 16:00 — Copy Fail publicly disclosed` (the moment the secret leaves the coordinated-disclosure circle) and runs through patched-LTS rollout. The timeline shows the gap between public disclosure and protection, names parallel workstreams, and proves the response was not retroactive narrative. + +3. **Upstream attribution is the canonical source.** Cloudflare defers exploit mechanics with *"A comprehensive write-up can be found in the original Xint Code disclosure post"* and links the upstream kernel-tree commit `a664bf3d603d`. The post participates in responsible disclosure by pointing at upstream sources rather than re-publishing a self-contained exploit. + +4. **Scope caveats made explicit.** GitHub PQ-SSH: *"This only affects SSH access and doesn't impact HTTPS access at all… does not affect GitHub Enterprise Cloud with data residency in the United States region. Only FIPS-approved cryptography may be used within the US region, and this post-quantum algorithm isn't approved by FIPS."* Cloudflare PQ-IPsec openly admits the Palo Alto Networks interop gap. + +Naming the boundary stops the reader from generalising past it. + +## Probabilistic register for adversary capabilities + +Threat-modeling uses conditional verbs. The hedge is not weakness; it is calibration. + +- Meta: *"sensitive information could be eventually at risk even if quantum computers are still years away."* +- GitHub: *"an attacker could save encrypted sessions now and, if a suitable quantum computer is built in the future, decrypt them later."* + +**Mature form:** *"Research indicates that quantum computers will eventually break conventional public-key cryptography… Although experts estimate this could happen within 10–15 years, sophisticated adversaries could collect encrypted data today"* — produces alarm without panic. + +**FUD anti-pattern:** *"Quantum computers will break the Internet's encryption and your secrets are at risk right now."* + +## Evidence obligations + +- **CVE number** — explicit. *"CVE-2026-31431."* +- **Upstream commit link** — Cloudflare cites `a664bf3d603d`. +- **Named adversary capabilities** — Shor's algorithm, SNDL, 4-byte write past boundary. +- **Behavioural-detection validation timestamps** — when monitoring/lab tests confirmed the mitigation. +- **Named external researchers** — Cloudflare credits *"the Linux upstream maintainers and Copy Fail researchers"*. +- **Explicit scope caveats** — see coordinated-disclosure rule 4. + +## Three sub-variants + +1. **Preparedness / non-incident response** — Cloudflare Copy Fail. No customer impact; structure mirrors postmortem; "outcome" reports the *absence* of impact (*"By the end of the rollout, every machine in our fleet was protected by either a patched kernel or a bpf-lsm program"*). Demonstrates preparedness paid off. + +2. **Industry migration / maturity-model framework** — Meta PQC (PQ-Unaware → PQ-Enabled ladder), Cloudflare PQ-IPsec. Industry-wide framework instead of an immediate response. Must populate the framework with the publisher's own trajectory (no framework-without-instance). + +3. **Educational explainer / failure analysis** — Docker horror stories (six numbered categories anchored in dated cited incidents), Datadog eBPF hardening 5-year retrospective. Pedagogical register; cites public coverage of named incidents. + +## Closing move: follow-up + disclaimer + +Notably more modest than launch-post closes. + +- **Cloudflare Copy Fail** closes with three remediation items and *"at Cloudflare we're always learning and improving."* +- **Meta PQC** closes with *"Sharing our strategy and learnings doesn't mean the process is complete"* followed by formal disclaimer paragraph. + +The **disclaimer paragraph** is a genre signal: the post is sharing a framework, not warrantying an outcome. Meta closes its PQC post with an italicised paragraph stating the article *"does not constitute professional, technical, or legal advice, nor does it constitute a guarantee of any particular security outcome."* + +**Acknowledgements** of named external researchers + a multi-engineer team are part of the credibility contract. A risk story by one named author looks fragile; one credited across a full response team looks load-tested. + +## Silent failure modes (security slop) + +- **FUD-style threat framing** — *"the next quantum apocalypse,"* *"the AI threat is reshaping…"* +- **Over-disclosure of working PoC** — re-publishing the exploit before upstream patch is widely deployed. +- **Under-disclosure paraphrased into uselessness** — *"there was a vulnerability that has been addressed."* +- **"We have you covered" closures** without naming residual gaps. +- **Missing coordinated-disclosure timeline** — for response posts. +- **Generic "industry best practices"** — without naming the specific defenses, scopes, residual gaps. +- **Capability without scope caveat** — generalising past what the deployment actually covers. +- **Single-author byline on a CVE response** — distributed responsibility expected; multi-engineer byline is part of the credibility contract. +- **Missing upstream attribution** — exploit mechanics described as if discovered by the publisher when external researchers found them. diff --git a/.agents/skills/writing-tech-post/references/voice-and-disclosure.md b/.agents/skills/writing-tech-post/references/voice-and-disclosure.md new file mode 100644 index 000000000..79bc99a6b --- /dev/null +++ b/.agents/skills/writing-tech-post/references/voice-and-disclosure.md @@ -0,0 +1,140 @@ +# Voice and Disclosure + +House voice management plus the four disclosure contracts: blameless register (postmortems), coordinated-disclosure four-panel (security), paper-link-first attribution (AI), and "what we'd do differently" honesty (migrations). + +## Contents + +- [Six diagnostic axes of publisher voice](#six-diagnostic-axes-of-publisher-voice) +- [Blameless register (three rules)](#blameless-register-three-rules) +- [Coordinated-disclosure four-panel (security)](#coordinated-disclosure-four-panel-security) +- [Paper-link-first attribution (AI/agent)](#paper-link-first-attribution-aiagent) +- ["What we'd do differently" honesty (migration / retrospective)](#what-wed-do-differently-honesty-migration--retrospective) +- [Probabilistic register for threat-modeling](#probabilistic-register-for-threat-modeling) +- [Vendor-naming conventions](#vendor-naming-conventions) +- [Acknowledgements as distributed-responsibility signal](#acknowledgements-as-distributed-responsibility-signal) +- [Disclaimer paragraphs as genre signal](#disclaimer-paragraphs-as-genre-signal) +- [Charity toward the predecessor system](#charity-toward-the-predecessor-system) +- [Voice anti-patterns (consolidated)](#voice-anti-patterns-consolidated) + +## Six diagnostic axes of publisher voice + +Publisher voice is the residue: what persists across posts at a publisher when authors, archetypes, and subject matter change. Triangulate on six axes (no single axis is the voice): + +1. **Byline weight.** Single-author / two-author / multi-author / multi-author + named acknowledgements paragraph. +2. **Sentence length distribution.** Short imperative / medium technical / long essayistic. +3. **Vendor / product mention density.** How often the publisher's own products surface in prose (low for engineering-deep-dive; high for product-changelog voice). +4. **Evidence reflex.** Which evidence form the publisher reaches for first (Datadog: percentile graphs; Vercel customer voice: block quotes; Jane Street: long code listings; Cloudflare: kernel-level C + sequence diagrams). +5. **Opening register.** Lede preference — felt experience / scale-then-number / threat-model / paper-link-first / shipping-status. +6. **Closing register.** Engineering reflection / hard CTA / forward-work / open-source repo / explicit acknowledgements / "Discuss on Hacker News" link. + +Full per-publisher matrix lives in `publisher-voice-matrix.md`. + +## Blameless register (three rules) + +The postmortem genre's signature voice contract. Reverse the polarity (active where blame would land, passive where credit is claimed) and the genre fails. + +1. **System-subject sentences.** Subjects in the causality narrative are systems, not people. *"The affected asset was a JavaScript file responsible for displaying the editor's object panel"* (Canva `050:61`) — not *"the engineer who deployed that bundle."* *"On start-up of v248, systemd-networkd flushes all IP rules it does not know about"* (Datadog `034:48`) — not *"the systemd maintainers chose to flush."* + +2. **Passive voice where it isolates fault, active voice where it claims learning.** *"An automated upgrade was triggered at 6:00 UTC"* (passive — isolates fault from any individual decision). *"We have built substantially more robust persistent disk storage"* (active — claims the learning as institutional). Google's general active-voice preference relaxes specifically in postmortem causality sections. + +3. **First-person plural for ownership.** Postmortems use "we" — not "Datadog" or "the SRE team" — when claiming responsibility *and* learning. Canva's *"We attempted to work around this issue… Unfortunately, it didn't mitigate"* (`050:76`) names the failed mitigation without distancing from it. Third-person ("Datadog experienced an outage") appears only in the summary. + +**Tone calibration: between detached and contrite.** Blameless does not require false neutrality. Datadog's *"this incident reminded us"* and *"we have to accept"* acknowledges the event was significant. Overly neutral postmortems read as detached; overly contrite ones read as performative. + +**The "read aloud and replace 'we' with the publisher's name" test.** If the sentence still scans, the register is correct. If it sounds awkward, the "we" was hiding individual blame. + +## Coordinated-disclosure four-panel (security) + +Four moves the security genre encodes. Not optional; reading conventions of the genre. + +1. **Threat-model opens before the fix.** GitHub PQ-SSH opens with SNDL adversary capability, not with the cipher rollout. Cloudflare Copy Fail opens with the CVE number, kernel subsystem (`AF_ALG`), and exploit primitive ("4-byte write past the boundary"), then sequences how mitigation followed. + +2. **UTC timeline non-negotiable for public CVE response.** Cloudflare's table starts at `2026-04-29 16:00 — Copy Fail publicly disclosed` (the moment the secret leaves the coordinated-disclosure circle) and runs through patched-LTS rollout. The timeline anchors disclosure ethics: it shows the gap between public disclosure and protection, names parallel workstreams, and proves the response was not retroactive narrative. + +3. **Upstream attribution is the canonical source.** Cloudflare defers exploit mechanics with *"A comprehensive write-up can be found in the original Xint Code disclosure post"* and links the upstream kernel-tree commit `a664bf3d603d`. The post participates in responsible disclosure by pointing at upstream sources rather than re-publishing a self-contained exploit. + +4. **Scope caveats made explicit.** GitHub PQ-SSH: *"This only affects SSH access and doesn't impact HTTPS access at all… does not affect GitHub Enterprise Cloud with data residency in the United States region. Only FIPS-approved cryptography may be used within the US region."* Cloudflare PQ-IPsec openly admits the Palo Alto Networks interop gap. Naming the boundary stops the reader from generalising past it. + +## Paper-link-first attribution (AI/agent) + +MLE-STAR places the arXiv link inside a "Quick links" block above the first body paragraph. Slack's investigation-agent post cites Meta-Prompting and Multi-Persona Self-Collaboration in its second scroll. Google's scaling-agents post cites four named benchmarks in its second paragraph. + +The convention signals that the post participates in a research conversation, even when the host blog is product-facing. + +**Slop variant:** name-drop the paper title without summarising the result, or worse, claim the result without linking the paper. + +**Strong form:** pair the citation with a one-sentence summary of the load-bearing claim the citation supports. + +## "What we'd do differently" honesty (migration / retrospective) + +Migration writing retains the messiness: + +- Meta WebRTC explicitly describes *"thousands of duplicate symbol errors,"* *"hundreds of thousands of lines were modified across thousands of files"* (`199:96`). +- Slack EMR discloses orphaned processes, custom SSH operators, audit complexity — rather than claiming "we migrated 700 jobs with zero issues." +- Datadog's evaluation-platform publishes a *deliberate short-term regression* (11% pass-rate drop) as a credibility signal. +- Canva's API-gateway report admits *"we'd underestimated the impact of the bug and didn't expedite deploying the fix"* — a Canva-side contributing factor alongside the upstream contribution. + +Triumphal language ("successfully," "smoothly," "without issue") at any density is a regression. The "successfully / smoothly / without issue" lint catches drift into triumphal register. + +## Probabilistic register for threat-modeling + +Sentences like *"sensitive information could be eventually at risk even if quantum computers are still years away"* (Meta) and *"an attacker could save encrypted sessions now and, if a suitable quantum computer is built in the future, decrypt them later"* (GitHub) signal calibrated uncertainty. The verbs are conditional. + +The hedge is not weakness; it is calibration. + +**Mature form:** *"Research indicates that quantum computers will eventually break conventional public-key cryptography… Although experts estimate this could happen within 10–15 years, sophisticated adversaries could collect encrypted data today"* — produces alarm without panic. + +**FUD anti-pattern:** *"Quantum computers will break the Internet's encryption and your secrets are at risk right now."* + +**Capability vs limitation register switch.** Capability claims declarative present: *"MLE-STAR uses web search to retrieve relevant and potentially state-of-the-art approaches"*. Limitations hedged future-conditional: *"LLM-generated Python scripts carry the risk of introducing data leakage."* Capability and limitation in the same voice = over-claiming. + +## Vendor-naming conventions + +- **Name the partner when shared responsibility exists, and quote them verbatim.** Canva quotes Cloudflare's full statement about the stale traffic-management rule with attribution (`050:59`), then immediately follows with a Canva-side contributing factor. The verbatim quote establishes the partner's accountability without Canva making claims on the partner's behalf. + +- **Name your own products as instruments, not features.** Datadog network-latency mentions Network Performance Monitoring because it is what the engineers used — not as a CTA. Vendor density is moderate and instrumental. Vercel customer posts invert this: Vercel/Next.js/Preview Deployments/performance analytics/Vercel CLI appear in close succession as a capability catalog. + +- **Name competitor and third-party products specifically when they are load-bearing.** Cloudflare PQ-IPsec names Cisco 8000 Series Secure Routers (version 26.1.1+), Fortinet FortiOS 7.6.6+, Palo Alto Networks RFC 9370 implementation, including the interop gap. Specificity is the credibility move. + +- **Genericise only when security requires it, and disclose the genericisation.** A postmortem that cannot name its own systems should disclose why ("for security reasons, we are not naming the specific kernel module") rather than silently substituting "a critical microservice" or "an upstream provider." + +- **Vendor names live at R5 (implementation), not R1/R2 (framing).** The single rule that separates an engineering post from a customer story. Apply the **"remove the vendor's name" test** as a pre-publish gate (full text in `depth-and-abstraction.md`). + +## Acknowledgements as distributed-responsibility signal + +A risk story by one named author looks fragile; one credited across a full response or migration team looks load-tested. + +- Cloudflare Copy Fail names 26 engineers plus *"the Linux upstream maintainers and Copy Fail researchers."* +- Meta's PQC framework lists colleagues across Transport Security, WhatsApp, Facebook/Messenger, Infrastructure, Reality Labs, Hardware, and Payments. + +Multi-engineer bylines on incident and security posts act as partial defense against the ghost-written-byline anti-pattern. + +## Disclaimer paragraphs as genre signal + +Meta closes its PQC post with an italicised paragraph stating the article *"does not constitute professional, technical, or legal advice, nor does it constitute a guarantee of any particular security outcome."* The disclaimer signals the post is sharing a framework, not warrantying an outcome. + +Launch posts routinely warranty throughput, latency, and cost claims. The security genre explicitly refuses to. + +## Charity toward the predecessor system + +Migration writing characterises the prior state with charity, not contempt: + +- Datadog: *"The shared database limps along for as long as it possibly can — and then some."* +- Meta WebRTC: *"Permanently forking a big open-source project can result in a common industry trap. It starts with good intentions."* +- Slack: *"It worked. But it came with some potential problems."* + +The charity is a craft move: migration posts are often read by the engineers who built the prior system, and contemptuous register burns credibility internally and externally. + +## Voice anti-patterns (consolidated) + +- **Hedged "we want to share some thoughts on…" lede** — burns the first paragraph on apologia. Replace with a stakes-and-problem two-step (GitHub) or a CVE-date framing (Cloudflare). +- **"We're excited to announce…" template** — banned at Datadog by editorial reflex; reads as marketing on any engineering post. +- **Blame-by-implication** — *"While our autoscaling capability was outpaced…"* is weaker than *"Our autoscaling was outpaced."* The "While"/"Although" construction signals more concern for reputation than learning. +- **Paper-name-dropping without summary** — cite the paper, then summarise the load-bearing claim in one sentence. +- **False-precision metrics** — *"99.3% accuracy"* is meaningless without the dataset, the baseline, and the class-imbalance disclosure. +- **AI-eval-as-anecdote** — a single transcript or screenshot is proof-of-concept, not a production claim. +- **Mismatched closing CTA** — a 4,000-word postmortem that ends with "Sign up for a free trial" alienates both audiences. Postmortems close with engineering reflection; security responses close with follow-up work + hedge; migrations close with dated remaining work; AI posts close with a callable artifact. +- **Ghost-written-byline** — a post carrying an engineer's name but written by a marketer ventriloquising the engineer. The prose loses the texture of decisions-made-under-uncertainty. +- **Triumphal migration post** — *"We migrated 700 jobs with zero issues"* reads as luck or untruth. +- **"We have you covered" security close** — promises completeness without naming defenses, scopes, and residual gaps. Strong specimens close with explicit follow-up lists and hedge clauses. +- **Over-redacted postmortem** — *"a critical microservice," "an upstream provider."* Readers notice. Disclose why redaction is necessary, do not silently genericise. diff --git a/.agents/skills/writing-tech-post/scripts/lint-post.py b/.agents/skills/writing-tech-post/scripts/lint-post.py new file mode 100755 index 000000000..454a8c512 --- /dev/null +++ b/.agents/skills/writing-tech-post/scripts/lint-post.py @@ -0,0 +1,390 @@ +#!/usr/bin/env python3 +"""lint-post.py — read-only slop-signature scanner for engineering blog drafts. + +Automates a subset of the pre-publish gates from +references/pre-publish-checklist.md. Treats every finding as a blocker (exit +non-zero); warnings live in the manual checklist. + +Usage: + python3 lint-post.py [--config ] + python3 lint-post.py --help + +Scans: + 1. Triumphal-vocabulary density ("successfully", "smoothly", "without issue") + 2. Hedged-lede patterns (first 200 words) + 3. "We're excited to announce" template + 4. Blame-by-implication patterns ("While our", "Although the team") + 5. Uncaptioned figures (image syntax without surrounding caption) + 6. Evidence-free percent claims (% not followed by metric reference within N lines) + 7. Code blocks over 30 lines without elision marker (`// ...` or `# ...`) + 8. Headline-vs-body callback (first 200 + last 200 words share zero nouns) + +Output: + Human-readable report on stdout. Exits 0 on a clean draft; exits 1 with a + findings count on any blocker. + +Read-only. Never writes to the draft. Never invokes git, package managers, or +network. +""" +from __future__ import annotations + +import argparse +import re +import sys +from dataclasses import dataclass, field +from pathlib import Path + +# --------------------------------------------------------------------------- +# Configuration +# --------------------------------------------------------------------------- + +TRIUMPHAL_WORDS = ( + "successfully", + "smoothly", + "without issue", + "without issues", + "seamlessly", + "flawlessly", + "with zero issues", +) +TRIUMPHAL_MAX = 2 # long posts tolerate up to two occurrences; short posts up to one + +HEDGED_LEDE_PATTERNS = ( + r"\bin this (?:post|article),?\s+we['’]ll\s+(?:explore|share|walk|cover)", + r"\bwe wanted to share\b", + r"\bwe['’]?d like to share\b", + r"\bthis article is written for\b", + r"\bin this blog post,?\s+we['’]ll\b", + r"\bthis post (?:will|aims to|attempts to)\b", +) + +EXCITED_TEMPLATES = ( + r"\bwe['’]?re excited to announce\b", + r"\bwe are excited to announce\b", + r"\bwe['’]?re thrilled to (?:announce|share|introduce)\b", + r"\bwe are thrilled to (?:announce|share|introduce)\b", +) + +BLAME_BY_IMPLICATION = ( + r"\bwhile our\b", + r"\bwhile the team\b", + r"\balthough the team\b", + r"\balthough our\b", + r"\bunfortunately,?\s+(?:the|our)\s+(?:engineer|developer|on-?call)\b", +) + +PERCENT_CLAIM = re.compile(r"(\b\d+(?:\.\d+)?\s?%)") +EVIDENCE_KEYWORDS = ( + "p50", + "p90", + "p95", + "p99", + "p999", + "percentile", + "baseline", + "benchmark", + "chart", + "table", + "figure", + "graph", + "histogram", + "distribution", + "sample size", + "sample", + "ablation", + "measured", + "evaluated", + "vs ", + "before", + "after", + "regression", +) + +LEDE_WINDOW_WORDS = 200 +EVIDENCE_LOOKAHEAD_LINES = 6 +CODE_BLOCK_MAX_LINES = 30 +ELISION_MARKERS = ("// ...", "# ...", "/* ... */", "// elided", "") + + +@dataclass +class Finding: + rule: str + severity: str # "blocker" | "warning" + location: str + snippet: str + + def render(self) -> str: + return f" [{self.severity.upper()}] {self.rule} @ {self.location}\n > {self.snippet}" + + +@dataclass +class Report: + draft: Path + findings: list[Finding] = field(default_factory=list) + + @property + def blockers(self) -> list[Finding]: + return [f for f in self.findings if f.severity == "blocker"] + + @property + def warnings(self) -> list[Finding]: + return [f for f in self.findings if f.severity == "warning"] + + def render(self) -> str: + if not self.findings: + return f"✅ {self.draft} — clean (0 findings)\n" + lines = [f"❌ {self.draft} — {len(self.blockers)} blocker(s), {len(self.warnings)} warning(s)\n"] + for finding in self.findings: + lines.append(finding.render()) + return "\n".join(lines) + "\n" + + +# --------------------------------------------------------------------------- +# Helpers +# --------------------------------------------------------------------------- + +def _first_n_words(text: str, n: int) -> str: + return " ".join(text.split()[:n]) + + +def _last_n_words(text: str, n: int) -> str: + return " ".join(text.split()[-n:]) + + +def _line_for_offset(text: str, offset: int) -> int: + return text.count("\n", 0, offset) + 1 + + +def _extract_lede(body: str, n: int = LEDE_WINDOW_WORDS) -> str: + # Skip frontmatter if present. + if body.startswith("---"): + end = body.find("\n---", 3) + if end != -1: + body = body[end + 4 :] + return _first_n_words(body.strip(), n) + + +def _extract_closer(body: str, n: int = LEDE_WINDOW_WORDS) -> str: + return _last_n_words(body.strip(), n) + + +def _strip_code_blocks(text: str) -> str: + return re.sub(r"```.*?```", "", text, flags=re.DOTALL) + + +def _strip_markdown_links(text: str) -> str: + return re.sub(r"\[([^\]]+)\]\([^)]+\)", r"\1", text) + + +def _nouns(text: str) -> set[str]: + text = _strip_markdown_links(text).lower() + tokens = re.findall(r"\b[a-z][a-z0-9-]{3,}\b", text) + stop = { + "the", "this", "that", "with", "from", "into", "your", "they", "their", + "have", "were", "been", "would", "could", "should", "about", "what", + "when", "where", "while", "after", "before", "between", "through", + "post", "blog", "article", "section", "paragraph", "team", "company", + } + return {tok for tok in tokens if tok not in stop} + + +# --------------------------------------------------------------------------- +# Checks +# --------------------------------------------------------------------------- + +def check_triumphal(body: str, report: Report, max_occurrences: int = TRIUMPHAL_MAX) -> None: + stripped = _strip_code_blocks(body) + total = 0 + for word in TRIUMPHAL_WORDS: + for match in re.finditer(rf"\b{re.escape(word)}\b", stripped, re.IGNORECASE): + total += 1 + if total > max_occurrences: + line = _line_for_offset(body, match.start()) + report.findings.append( + Finding( + rule="triumphal-vocabulary", + severity="blocker", + location=f"line {line}", + snippet=match.group(0), + ) + ) + + +def check_hedged_lede(body: str, report: Report) -> None: + lede = _extract_lede(body) + for pattern in HEDGED_LEDE_PATTERNS: + match = re.search(pattern, lede, re.IGNORECASE) + if match: + report.findings.append( + Finding( + rule="hedged-lede", + severity="blocker", + location="first 200 words", + snippet=match.group(0), + ) + ) + + +def check_excited_template(body: str, report: Report) -> None: + for pattern in EXCITED_TEMPLATES: + for match in re.finditer(pattern, body, re.IGNORECASE): + line = _line_for_offset(body, match.start()) + report.findings.append( + Finding( + rule="exciting-announcement-template", + severity="blocker", + location=f"line {line}", + snippet=match.group(0), + ) + ) + + +def check_blame_by_implication(body: str, report: Report) -> None: + for pattern in BLAME_BY_IMPLICATION: + for match in re.finditer(pattern, body, re.IGNORECASE): + line = _line_for_offset(body, match.start()) + report.findings.append( + Finding( + rule="blame-by-implication", + severity="blocker", + location=f"line {line}", + snippet=match.group(0), + ) + ) + + +def check_percent_claims(body: str, report: Report) -> None: + lines = body.splitlines() + for idx, line in enumerate(lines): + for match in PERCENT_CLAIM.finditer(line): + window = "\n".join( + lines[max(0, idx - 2) : min(len(lines), idx + 1 + EVIDENCE_LOOKAHEAD_LINES)] + ).lower() + if not any(keyword in window for keyword in EVIDENCE_KEYWORDS): + report.findings.append( + Finding( + rule="evidence-free-percent-claim", + severity="blocker", + location=f"line {idx + 1}", + snippet=match.group(0), + ) + ) + + +def check_code_block_length(body: str, report: Report) -> None: + inside = False + start_line = 0 + buf: list[str] = [] + for idx, line in enumerate(body.splitlines(), start=1): + if line.lstrip().startswith("```"): + if inside: + if len(buf) > CODE_BLOCK_MAX_LINES: + has_elision = any(marker in "\n".join(buf) for marker in ELISION_MARKERS) + if not has_elision: + report.findings.append( + Finding( + rule="long-code-block-without-elision", + severity="blocker", + location=f"lines {start_line}-{idx}", + snippet=f"{len(buf)} lines; no elision marker", + ) + ) + inside = False + buf = [] + else: + inside = True + start_line = idx + buf = [] + elif inside: + buf.append(line) + + +def check_uncaptioned_figures(body: str, report: Report) -> None: + # Detect Markdown image syntax not followed within 2 lines by a caption. + lines = body.splitlines() + image_re = re.compile(r"!\[[^\]]*\]\([^)]+\)") + for idx, line in enumerate(lines): + if image_re.search(line): + # Look ahead 2 lines for an italics caption or "Figure N" / "Table N". + window = "\n".join(lines[idx : min(len(lines), idx + 3)]) + if not re.search(r"\*[^*]+\*|Figure\s+\d|Table\s+\d", window): + report.findings.append( + Finding( + rule="uncaptioned-figure", + severity="blocker", + location=f"line {idx + 1}", + snippet=image_re.search(line).group(0), + ) + ) + + +def check_callback_coupling(body: str, report: Report) -> None: + lede = _extract_lede(body, 200) + closer = _extract_closer(body, 200) + lede_nouns = _nouns(lede) + closer_nouns = _nouns(closer) + if not lede_nouns or not closer_nouns: + return + overlap = lede_nouns & closer_nouns + if len(overlap) == 0: + report.findings.append( + Finding( + rule="callback-coupling-failure", + severity="blocker", + location="first 200 + last 200 words", + snippet="zero shared nouns between lede and closer", + ) + ) + elif len(overlap) < 2: + report.findings.append( + Finding( + rule="weak-callback-coupling", + severity="warning", + location="first 200 + last 200 words", + snippet=f"only one shared noun: {', '.join(sorted(overlap))}", + ) + ) + + +# --------------------------------------------------------------------------- +# Entry point +# --------------------------------------------------------------------------- + +def lint(draft_path: Path) -> Report: + body = draft_path.read_text(encoding="utf-8") + report = Report(draft=draft_path) + check_triumphal(body, report) + check_hedged_lede(body, report) + check_excited_template(body, report) + check_blame_by_implication(body, report) + check_percent_claims(body, report) + check_code_block_length(body, report) + check_uncaptioned_figures(body, report) + check_callback_coupling(body, report) + return report + + +def main(argv: list[str] | None = None) -> int: + parser = argparse.ArgumentParser( + description=( + "Read-only slop-signature scanner for engineering blog drafts. " + "Automates a subset of writing-tech-post's pre-publish gates." + ) + ) + parser.add_argument("draft", type=Path, help="Path to a Markdown draft.") + args = parser.parse_args(argv) + + if not args.draft.exists(): + print(f"ERROR: draft not found: {args.draft}", file=sys.stderr) + return 2 + if args.draft.is_dir(): + print(f"ERROR: draft is a directory: {args.draft}", file=sys.stderr) + return 2 + + report = lint(args.draft) + print(report.render()) + return 1 if report.blockers else 0 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/.agents/skills/writing-tech-post/scripts/validate-metadata.py b/.agents/skills/writing-tech-post/scripts/validate-metadata.py new file mode 100644 index 000000000..fd501e1f2 --- /dev/null +++ b/.agents/skills/writing-tech-post/scripts/validate-metadata.py @@ -0,0 +1,53 @@ +import re +import sys +import argparse + +def validate_metadata(name, description): + errors = [] + + # 1. Validate Name Length + if not (1 <= len(name) <= 64): + errors.append(f"NAME ERROR: '{name}' is {len(name)} characters. Must be between 1-64.") + + # 2. Validate Name Characters (lowercase, numbers, single hyphens) + # Regex: Starts/ends with alphanumeric, allows single hyphens in between + if not re.match(r"^[a-z0-9]+(-[a-z0-9]+)*$", name): + errors.append( + f"NAME ERROR: '{name}' contains invalid characters. " + "Use only lowercase letters, numbers, and single hyphens. " + "No consecutive hyphens, and cannot start/end with a hyphen." + ) + + # 3. Validate Description Length + if len(description) > 1024: + errors.append( + f"DESCRIPTION ERROR: Description is {len(description)} characters. " + "Must be 1,024 characters or fewer." + ) + + # 4. Check for Third-Person Perspective (Basic Heuristic) + first_person_words = {"i", "me", "my", "we", "our", "you", "your"} + desc_words = set(re.findall(r'\b\w+\b', description.lower())) + found_forbidden = first_person_words.intersection(desc_words) + if found_forbidden: + errors.append( + f"STYLE WARNING: Description contains first/second person terms: {found_forbidden}. " + "Use third-person imperative (e.g., 'Creates...', 'Updates...')." + ) + + if errors: + print("\n".join(errors), file=sys.stderr) + sys.exit(1) + else: + print("SUCCESS: Metadata is valid and optimized for discovery.") + sys.exit(0) + +if __name__ == "__main__": + parser = argparse.ArgumentParser( + description="Validate agent skill metadata (name and description) against the agentskills.io spec." + ) + parser.add_argument("--name", required=True, help="Skill name (1-64 chars, lowercase, numbers, single hyphens).") + parser.add_argument("--description", required=True, help="Skill description (max 1024 chars, third-person).") + args = parser.parse_args() + validate_metadata(args.name, args.description) + diff --git a/docs/articles/2026-05/01-agh-network-protocol.mdx b/docs/articles/2026-05/01-agh-network-protocol.mdx new file mode 100644 index 000000000..af19be934 --- /dev/null +++ b/docs/articles/2026-05/01-agh-network-protocol.mdx @@ -0,0 +1,235 @@ +--- +title: "agh-network/v0: an open protocol for agent-to-agent coordination" +description: "MCP connects agents to tools. agh-network/v0 connects agents to each other — six message kinds, deterministic envelopes, and a reference NATS profile any ACP-compatible runtime can implement." +date: 2026-05-20 +category: protocol +tags: + - protocol + - agh-network + - agent-coordination + - nats + - launch +author: pedronauck +archetype: + primary: ai-agent + absorbed: launch +audience: peer-engineer-deep +depth: + opening: R1 + body: R2-R4 + closing: R1 + traversal: spiral +status: publishable +--- + +MCP gave agents a wire to their tools. Two agents in two terminals still have no wire to each other. They can both call a database, run a script, hit an API — and never address each other. That gap is the whole point of `agh-network/v0`, the AGH agent-to-agent protocol whose spec lives at [agh.network/protocol](https://agh.network/protocol). + +The protocol is small on purpose. Six message kinds. One envelope. One reference NATS profile. An optional Ed25519 trust profile that does not require a certificate authority. Any ACP-compatible runtime — Claude Code, OpenClaw, Hermes — can implement it without importing the AGH SDK. + +AGH Runtime ships the reference implementation today. The protocol ships separately, as a wire contract any runtime can speak. The sections below walk the envelope, the six kinds, the NATS subject mapping, the signing rules, the three conformance tiers, and the minimal subset a non-AGH runtime needs to participate. + +## The gap MCP leaves + +MCP solved one direction. An agent in a terminal can reach a database server, a file server, a prompt server, or a tool server — anything one MCP server can wrap. Two agents in two terminals can each reach the same MCP servers and never address each other. + +That second wire is what most agent stacks paper over. One team writes a custom Slack relay. Another tails stdout from one CLI into another. A third hands the second agent the first agent's API key and prays the audit trail makes sense later. None of those compose. None survive the next runtime. + +The shape of the missing wire is not exotic. A peer needs to announce itself, get found, take work, report progress, return a receipt, and transfer a capability another agent can run. Those are the discrete verbs an agent network needs. `agh-network/v0` defines exactly that set — six message kinds — and stops there. The point is not a planner. The point is the boundary that lets independent runtimes hand off work without inventing a new integration for every pair. + +Two contrasts make the boundary concrete (internal contrast; no external benchmark): + +- **Without the protocol.** Each pairing is bespoke glue. Sender semantics, retry rules, capability discovery, and audit shape get re-derived per integration. Switching runtimes means rewriting the glue. +- **With `agh-network/v0`.** Both peers emit the same envelope grammar over the same routing rules. A new runtime joins by speaking the protocol, not by importing an SDK or registering with a central broker. + +The next question is how that envelope is actually shaped on the wire. + +## What an agh-network/v0 envelope looks like + +Every message is a UTF-8 JSON object. The required top-level fields are fixed: `protocol`, `id`, `workspace_id`, `kind`, `channel`, `from`, `ts`, `body`. Conversation-bearing kinds add `surface` plus `thread_id` or `direct_id`. Lifecycle-bearing kinds add `work_id`. The full grammar is in the [envelope reference](https://agh.network/protocol/envelope). + +A minimal `say` posted inside a public thread carries this much: + +```json +{ + "protocol": "agh-network/v0", + "id": "msg_demo_1776366000", + "workspace_id": "ws_alpha", + "kind": "say", + "channel": "builders", + "surface": "thread", + "thread_id": "thread_release_check", + "from": "ops-coordinator.session-42", + "to": null, + "ts": 1776366000, + "body": { + "text": "Release branch staging is ready for smoke checks.", + "intent": "notice" + }, + "proof": null +} +``` + +Four fields make that envelope portable. `protocol` pins the version. `workspace_id` scopes every routing decision, so the same channel slug in two workspaces is two different channels. `from` is a peer identifier matching `[a-z0-9][a-z0-9._-]{0,127}` — no DNS, no CA, no global registry. `proof` is an explicit `null` for v0; a v1 receiver knows the field exists and is intentionally empty rather than absent. + +Receivers validate in a fixed order: parse JSON → check required fields → check `expires_at` and replay-age → check kind-specific surface and work rules → check the body — then route. The reference implementation rejects messages whose `expires_at` has passed and rejects messages older than 300 seconds when `expires_at` is absent. Two peers built from these rules can hand a message back and forth without any other agreement. + +What sits in `kind` is the next question. + +## Six kinds, one verb each + +`agh-network/v0` defines exactly six kinds. Each carries one verb. Nothing else is on the wire today, and adding a kind is a protocol-version change — not a runtime feature flag. + +| Kind | Verb | +| ------------ | ---------------------------------------------------------- | +| `greet` | "I'm here, here is my Peer Card." | +| `whois` | "Who matches this query?" | +| `say` | "Talk inside this thread or direct room." | +| `capability` | "Take this reusable artifact." | +| `receipt` | "I accept, reject, or cancel this work." | +| `trace` | "Here is progress, results, or a failure on this work." | + +`greet` broadcasts a Peer Card — display name, supported profiles, advertised capabilities, accepted trust modes. Receivers refresh local peer presence. The reference runtime repeats `greet` every 30 seconds as a presence heartbeat and expires peers that miss two intervals. + +`whois` walks the directory. A request can ask by Peer ID, capability, artifact type, or trust mode. The response carries a Peer Card and a `reply_to` field that points at the request envelope. Discovery never sets `surface`, `thread_id`, `direct_id`, or `work_id`; envelopes that try are rejected. + +`say` is the only kind built for conversation. It carries `body.text`, an optional `intent`, and optional artifact descriptors. A `say` without a `work_id` is free-form chat. A `say` with a `work_id` opens lifecycle-bearing work bound to one conversation container. + +`capability` transfers one full capability artifact — `id`, `summary`, `outcome`, optional outline, constraints, examples, and a runtime-computed digest. The receiver inspects, adapts, or runs it. `capability` is the canonical artifact term across AGH; the protocol does not invent a synonym such as recipe, workflow, or playbook. + +`receipt` is the admission control surface. Statuses are fixed: `accepted`, `rejected`, `duplicate`, `expired`, `unsupported`, `canceled`. A receipt is the only honest way for a peer to take responsibility for a `work_id`, and the only way another peer learns the work was refused. + +`trace` carries progress. State (`working`, `waiting`, `completed`, `failed`, `canceled`), step labels, and result or artifact references travel inside the body. Trace events make work visible without a separate telemetry pipeline. + +Six kinds is enough surface for delegation; one more would be a different contract. What carries the bytes is the next question. + +## NATS as the reference profile + +The core protocol is transport-independent. The reference v0 profile binds it to NATS Core because NATS already gives subject hierarchy, broadcast, and direct routing without inventing a new fabric. Another transport — WebSocket, QUIC, libp2p — is protocol-legal; the v0 binding is the first one published. + +Subjects follow one shape: + +```text +agh.network.v0...broadcast +agh.network.v0...peer. +``` + +Broadcast carries `greet`, broadcast `whois`, public-thread `say`, and broadcast `capability`. Directed traffic carries targeted `whois`, addressed `say`, `receipt`, and `trace` on the receiver's direct subject. + +```text + broadcast subject direct subject + agh.network.v0.ws_alpha.builders agh.network.v0.ws_alpha.builders + .broadcast .peer. + ▲ ▲ + greet, whois, │ │ receipt, trace, + public say │ │ targeted say, + │ │ capability transfer + ┌────────┴─────────┐ ┌─────────┴────────┐ + │ AGH Runtime A │ ─── envelope on NATS ─→ │ Non-AGH runtime B│ + │ (peer alpha) │ ←── envelope on NATS ── │ (peer bravo) │ + └──────────────────┘ └──────────────────┘ +``` + +The same envelope crosses both runtimes; the NATS subject decides whether it reaches everyone in the channel or only the addressed peer. + +The route token derives from the peer identity. For an unverified v0 peer, the token is the first 32 lowercase hex characters of `SHA-256(peer_id)` — no central registry, no allocation step. For a v1 verified handle in `nickname@fingerprint` form, the token is the fingerprint itself. Each peer subscribes to two subjects per joined channel: the channel broadcast and its own peer-direct. + +A tight slice of the reference subject mapping makes the contract concrete: + +```go +// internal/network/transport.go — minimal slice; validation elided. +func DirectSubject(workspaceID, channel, peerID string) (string, error) { + // ... ValidateWorkspaceID + ValidateChannel checks omitted + token, err := RouteToken(peerID) + if err != nil { + return "", err + } + return subjectPrefix + "." + workspaceID + "." + channel + ".peer." + token, nil +} +``` + +`subjectPrefix` is the constant `"agh.network.v0"`. `RouteToken` is the SHA-256 derivation above. There is nothing else between the envelope's `to` field and the NATS subject. + +NATS gives best-effort delivery, not exactly-once. Receivers deduplicate by envelope `id` inside a bounded replay window. Application acceptance does not piggy-back on NATS publish success; it lives in the `receipt` kind. JetStream is optional — a deployment may layer it for persistence and replay, but JetStream acknowledgements do not replace `receipt`. + +The split is deliberate. Transport delivery means the broker accepted bytes. Protocol acceptance means a peer took responsibility for a `work_id`. Conflating the two is how agent systems ship phantom work. + +How a peer establishes that the bytes came from a real sender is the next question. + +## Signing without a CA + +Trust is a separate profile. The v0 core leaves `proof` opaque; it does not require any verification. The v1 baseline trust profile — `agh-network.trust.ed25519-jcs/v1` — adds Ed25519 signatures over RFC 8785 JSON Canonicalization Scheme bytes. No certificate authority. No revocation list. No DNS round-trip. + +The identity shape is self-certifying. A verified handle is `nickname@fingerprint`, where the fingerprint is the first 32 lowercase hex characters of `SHA-256(public_key)`. A receiver who sees `patch-worker@56475aa75463474c0285df5dbf2bcab7` already knows which key to expect. The `proof` block carries the algorithm, key fingerprint, raw public key, and the Ed25519 signature over the JCS-canonicalized envelope with `proof.sig` omitted. + +This shape buys two properties. First, the signature covers every routing field — `channel`, `from`, `to`, `surface`, `thread_id`, `direct_id`, `work_id`, `ts`, and the full body — so an attacker between peers cannot alter routing without invalidating the signature. Second, there is no global trust root to compromise. A receiver decides policy after verification; the protocol classifies the proof as `verified`, `unverified`, or `rejected` and stops there. + +One defense ships with the model: a `nickname@fingerprint`-shaped identity arriving without a valid `proof` is `rejected`, not `unverified`. Stripping the proof off a verified-format identity is not a downgrade path. + +The current AGH Runtime ships v0 only — `proof` is preserved as opaque JSON, and the trust profile is documented for v1 implementers. What "supporting `agh-network/v0`" actually means in conformance terms is the next question. + +## Three conformance tiers + +Conformance is graded, not binary. Three tiers, each strictly additive: + +| Tier | What it includes | Required of | +| --------- | ----------------------------------------------------------------- | --------------------------------------------------------------------------------- | +| **Core** | Envelope + the six kinds + lifecycle + delivery + extension rules. | Any runtime, bridge, test harness, or library that exchanges AGH Network messages. | +| **Trust** | Core + the Ed25519 + JCS baseline. | Implementations that claim cryptographic sender verification. | +| **Full** | Trust + the NATS v0 transport binding. | Peers that interoperate over `agh-network/v0` on NATS. | + +Every tier requires all six kinds. The split is not "which kinds to ship" but "which profiles to add": + +| Tier | greet | whois | say | capability | receipt | trace | Adds | +| ----- | ----- | ----- | ---- | ---------- | ------- | ----- | --------------------- | +| Core | MUST | MUST | MUST | MUST | MUST | MUST | envelope grammar | +| Trust | MUST | MUST | MUST | MUST | MUST | MUST | Ed25519+JCS proofs | +| Full | MUST | MUST | MUST | MUST | MUST | MUST | NATS subject binding | + +Every tier requires all six kinds; the differentiator is the additional profile each tier layers on. A Core Peer validates envelopes, routes them, and emits them. A Trust Peer additionally verifies baseline proofs and advertises trust support in its Peer Card. A Full Peer additionally binds to the NATS subject grammar above. + +Roles can be claimed narrowly. A harness that only validates incoming envelopes publishes itself as `Core Receiver`. A signed-only sender publishes itself as `Trust Sender`. A bridge that translates one runtime's messages into AGH Network envelopes publishes `Core Peer + NATS Peer`. The [conformance page](https://agh.network/protocol/conformance) lists every legal combination plus the self-certification record an implementation publishes — claimed level, fixture-suite commit, passed groups, and any deviations. + +This grading is what makes the protocol adoptable in pieces. The next question is what those pieces look like inside a runtime that does not start from AGH. + +## Adopting the protocol from a non-AGH runtime + +The minimal participant is a Core Sender. The [minimal sender guide](https://agh.network/protocol/guide/minimal-sender) builds one in any language; the Go version fits in roughly 60 lines and produces a single valid `say` envelope. Starting there is deliberate: the wire format is concrete before NATS or signing enters the picture. + +A useful peer comes next. To interoperate over the reference profile, a runtime adds: + +1. NATS subscriptions for `agh.network.v0...broadcast` and `agh.network.v0...peer.` for each joined channel. +2. A periodic `greet` broadcast that carries the local Peer Card. +3. Handlers for `whois` requests that match the local capability set. +4. `say` ingress that maps inbound messages into the local session model. + +Lifecycle-bearing work is one more step: emit `receipt` admissions, send `trace` events as state changes, and bind every directed `say` and `capability` to a single conversation container with a stable `work_id`. The container — `surface:"thread"` with `thread_id` or `surface:"direct"` with `direct_id` — is the thing receipts and traces correlate against. + +A non-AGH runtime that goes this far gains one operational property: its messages and AGH-originated messages are indistinguishable on the wire. The same `receipt` rules, the same delivery posture, the same audit shape. AGH Runtime gets no privileged path; the boundary is the envelope. + +The reference runtime's CLI is also the agent-manageable control point. `agh network send`, `agh network peers`, `agh network directs resolve`, and `agh network inbox` are the verbs an agent or operator uses to send, discover, open a direct room, and read incoming traffic — each available with `-o json` for structured callers. A third-party runtime that publishes the equivalent surface earns the same property: agents drive their own coordination through stable verbs, not through a UI. + +What is still alpha is the next question. + +## What is still alpha + + + `agh-network/v0` is alpha. The wire shape may still evolve before v1. The six kinds, the envelope grammar, and the workspace-qualified routing are stable enough to build against today, but each of the items below names a real boundary an implementer should plan around. + + +- **Trust is documented but not enforced by the reference runtime.** The Ed25519 + JCS profile is normative; AGH Runtime v0 routes `proof` as opaque JSON and does not classify trust state at the routing layer. Third-party trust implementations are valid protocol-side; cross-implementation verification is not yet a shipped runtime gate. +- **No standalone conformance suite ships today.** Repository tests cover the AGH Runtime v0 implementation. A transport-agnostic fixture suite for third-party self-certification is documented in the conformance page but not yet a runnable artifact in a separate repository. +- **NATS Core is the only reference transport binding.** Other carriers are protocol-legal because the core is transport-independent, but no second binding is published in v0. +- **JetStream is optional and operational.** The baseline binding is NATS Core. Persistence, replay, and dead-letter handling live in the JetStream profile and stay outside core acceptance semantics. +- **Peer Cards advertise capability claims, not proofs.** A receiver still validates actual behavior; the card is a hint, not a contract. + +The next protocol revision keeps the v0 wire shape and turns the trust profile on by default. The six kinds do not change. + +## An implementer's path + +A protocol is only as open as the smallest peer that can speak it. `agh-network/v0` starts at one valid envelope and grows in declared tiers, with the reference runtime, the spec, and the minimal-sender guide all pointing at the same wire. + +- **Read the spec.** The full normative reference is at [agh.network/protocol](https://agh.network/protocol). The envelope, the message kinds, the NATS binding, the trust profile, and the conformance levels each have their own page. +- **Run the minimal sender.** [agh.network/protocol/guide/minimal-sender](https://agh.network/protocol/guide/minimal-sender) walks through a Core Sender in any language. Output is one envelope on stdout; no transport or signing required. +- **Inspect the reference runtime.** [github.com/compozy](https://github.com/compozy) ships the daemon, the NATS-backed v0 binding, the CLI, and the protocol fixtures. The envelope code lives under `internal/network`, the only package in the runtime that imports NATS. + +Two agents that can speak this protocol are no longer trapped in two terminals, no longer routed by a custom relay, and no longer dependent on a shared SDK. MCP gave agents a wire to their tools. `agh-network/v0` gives agents a wire to each other. diff --git a/docs/articles/2026-05/02-memory-as-real-work.mdx b/docs/articles/2026-05/02-memory-as-real-work.mdx new file mode 100644 index 000000000..be4b20116 --- /dev/null +++ b/docs/articles/2026-05/02-memory-as-real-work.mdx @@ -0,0 +1,370 @@ +--- +title: "Memory as real work: Markdown scopes, dreaming, and why agents need a workplace" +description: "AGH gives memory three scopes, keeps it as Markdown on disk, indexes it with a typed taxonomy, and runs a gated consolidation loop. Memory is a first-class primitive, not a vector-DB afterthought." +date: 2026-05-20 +category: runtime +tags: + - runtime + - memory + - consolidation + - launch +author: pedronauck +archetype: + primary: launch +audience: engineer-adopter +depth: + opening: R1 + body: R2-R3 with R4 dive + closing: R1 + traversal: staircase +status: publishable +--- + +## The session that forgets + +A working agent ends a session knowing things. It learned that the deploy script +refuses to run from a dirty worktree. It learned that the QA lab on port 2125 is +the operator's; the one on 2127 is for parallel scenarios. It learned that the +user prefers verbs in commit messages, not nouns. + +Then the session stops. Everything the agent absorbed lives in transcript bytes +that the next session will not read. The user opens a new tab tomorrow and +re-explains the deploy script. The agent re-derives the port convention from +context, getting it wrong twice. The commit message comes back in the old +voice. + +This is not a documentation problem. It is a runtime problem. Real agent work +is iterative — the same workspace, the same operator, the same set of +preferences and rules and pinned references — and an agent that cannot carry +forward what it just learned is an agent that cannot do real work. The web is +full of "AI memory" pitches that mean a vector database wired behind a chat +box. That is not what an agent at a desk needs. An agent at a desk needs the +notebook on the desk: human-readable, file-backed, the same shape across +sessions, addressable by both the agent and the operator who sits next to it. + +AGH ships memory as a runtime primitive on those terms. Three scopes. Markdown +on disk. A typed four-term taxonomy. A consolidation loop gated by three +conditions before it ever wakes up. The next sections walk through each piece +and end on the command an operator runs to inspect the whole thing. + +## Three scopes, one merge order + +Where memory lives is the first question. The runtime answers it with three +scopes, each a directory of Markdown files, each addressed by a stable name in +the contract. + +```mermaid +flowchart LR + G["global
~/.agh/memory/"] -->|merged into| S + W["workspace
<workspace>/.agh/memory/"] -->|merged into| S + A["agent
tier: global | workspace"] -->|merged into| S + S["session view"] + classDef base fill:#1d1d22,stroke:#3a3a44,color:#e6e6ec; + classDef out fill:#2a1e15,stroke:#e8572a,color:#ffe8d6; + class G,W,A base; + class S out; +``` + +*Three scopes feed one session view. Agent overrides workspace; workspace +overrides global. Higher-precedence entries shadow lower ones by name.* + +The scopes are not a suggestion in the source — they are a closed enum. +`internal/memory/contract/enums.go:13-16` declares `ScopeGlobal`, +`ScopeWorkspace`, and `ScopeAgent` as the only legal values. Anything outside +this set fails validation at parse time. Agent scope adds a second axis: an +`AgentTier` of `workspace` or `global`, so an agent definition can carry +private preferences either for one project or across the whole AGH install. + +The merge order is the load-bearing part. When the same memory file name +appears in more than one scope, agent wins over workspace, workspace wins over +global, and the loser is shadowed — still on disk, still inspectable, but +hidden from the active session unless the operator asks for it with +`--include-shadowed`. That precedence is what lets an agent override a +workspace convention for one project without rewriting the global note, and +lets a workspace override a personal default without editing global truth. + +The default *write* scope is not picked at random either. The runtime resolves +it from the memory type: +`user` and `feedback` land in `global`, `project` and `reference` land in +`workspace` (`internal/memory/contract/enums.go:119-130`). The agent never has +to think "where does this go" for the common cases. + +That answers where memory lives. The next question is what each entry actually +looks like on disk. + +## Why Markdown, not a vector database + +A vector database is the obvious move for an "agent memory" feature. AGH +explicitly does not make that move. Memory is plain Markdown — one entry per +file, YAML frontmatter on top, free-form Markdown below — sitting in a flat +directory the operator can `ls`, `grep`, and `git diff` like any other text in +the repository. + +``` +~/.agh/memory/ +├── MEMORY.md # human-readable index, one line per entry +├── user_role.md +├── feedback_testing.md +└── reference_dashboards.md + +/.agh/memory/ +├── MEMORY.md +├── project_migration_plan.md +└── reference_api_spec.md +``` + +*Memory on disk: one Markdown file per entry, one index file per scope. The +index lives at `internal/memory/store.go:27` as `indexFilename = "MEMORY.md"`.* + +Each entry carries a validated header. `internal/memory/contract/types.go:16-27` +defines it: `name`, `description`, `type`, `scope`, optional `agent_name` and +`agent_tier`, optional `provenance` block. `internal/memory/document.go:11-28` +parses and validates it before any read or write path touches the file. +Frontmatter that does not match the schema is rejected — not silently +normalized — so an operator who hand-edits a file gets an explicit error +instead of a corrupted record. + +The shape choice is deliberate. Markdown on disk gives the runtime four +properties a vector store cannot: + +1. **Inspectable.** An operator opens a file and reads it. There is nothing to + decode and no embedding to interpret. When an agent does something + surprising, the memory directory is one of the first places to look. +2. **Diffable and version-controllable.** A workspace's `.agh/memory/` can sit + inside the repository. The same review tools that catch a bad commit catch + a bad memory write. +3. **Editable by hand.** The operator can fix a bad entry with a text editor. + The agent reads it the next session. +4. **Portable.** A workspace clone carries its memory. A new machine inherits + global memory by copying one directory. + +A vector database trades all of that for similarity search. AGH keeps the +Markdown and adds the index separately — covered in the next section — so the +file remains the truth and the index remains a derived view. + +## A typed index over plain files + +Storage is not retrieval. An agent reaching for "what does the operator prefer +about commit messages?" cannot scan every Markdown file on disk for every +prompt. The runtime needs a typed index over the directory, and that index has +to be opinionated enough to be useful. + +AGH closes the type vocabulary to four values. `Type` in +`internal/memory/contract/enums.go:95-99`: + +```go +const ( + TypeUser Type = "user" + TypeFeedback Type = "feedback" + TypeProject Type = "project" + TypeReference Type = "reference" +) +``` + +Each value is load-bearing. `user` covers persona, role, preferences, +knowledge. `feedback` covers rules and corrections — the "don't mock the +database in these tests" guidance the operator gave once and would rather not +give again. `project` covers ongoing work: who is doing what, why, by when. +`reference` covers pointers into external systems — Linear projects, Grafana +dashboards, Slack channels — so the agent knows where to look without +guessing. + +The four-value taxonomy is what lets recall be deterministic. When an agent +asks the runtime for memories at the start of a session, AGH walks the three +scopes in precedence order, applies the type filter, materializes a session +view, and records why each entry surfaced into the recall trace. The trace is +inspectable by `agh memory recall trace get` and the rebuild path is +`agh memory reindex` — both visible through the CLI in +`internal/cli/memory.go`. There is no opaque ranker. The same query produces +the same answer until the underlying files or scopes change. + +The closed type set has a second job: it constrains the *write* side. A new +entry without a valid `type` does not get written. An agent attempting to +invent a fifth type fails at the controller decision point +(`OpReject` in `internal/memory/contract/enums.go:155`) before anything +touches disk. + +That handles storage and retrieval at rest. The interesting work happens when +the runtime decides to fold transient session signal back into durable memory +— what AGH calls dreaming. + +## Dreaming: three gates before consolidation runs + +Consolidation is the move where a runtime looks at recent session activity, +distills it, and writes the distilled signal back into memory as a curated +entry. Done eagerly, it burns model tokens and grinds the disk. Done lazily, +it never catches up. AGH gates it with a cascade — three conditions, evaluated +cheapest to most expensive, every condition must pass before a dream session +spawns. + +The cascade lives in `internal/memory/dream.go`: + +```go +// ShouldRun evaluates the time and session gates in that order. +func (s *Service) ShouldRun() (bool, error) { + // ... + lastConsolidatedAt, err := s.lock.LastConsolidatedAt() + if err != nil { + return false, err + } + if !s.timeGatePasses(lastConsolidatedAt) { + // gate 1: enough wall-clock time has elapsed since the last run + return false, nil + } + + completedSessions := 0 + if s.minSessions > 0 { + completedSessions, err = s.countSessionsSince(lastConsolidatedAt) + // ... + } + if completedSessions < s.minSessions { + // gate 2: enough sessions have completed in that window + return false, nil + } + return true, nil +} +// ... +// gate 3: acquire the file lock; concurrent daemons cannot race +priorMtime, ok, err := s.acquireLock() +``` + +*The cascade is ordered by cost. The time gate is a single stat on the lock +file. The session gate walks a directory of session metadata. The lock gate +performs an atomic file-system acquire. Each gate is a cheaper "no" before a +more expensive one ever runs.* + +The defaults sit at the top of the same file (`dream.go:21-23`): +`defaultMinHours = 24` and `defaultMinSessions = 3`. Both are tunable through +the runtime config under `[memory.dream]`. The lock path resolves to a single +file under the global memory directory; the runtime uses `flock`-style +exclusive acquire so a second daemon instance — for example, a parallel QA +lab — cannot start a competing consolidation against the same store +(`ErrLockUnavailable` at `dream.go:28` carries that signal explicitly). + +When all three gates pass, the runtime spawns a dedicated session. The default +agent is `dreaming-curator` (`consolidation/runtime.go:62`), a one-shot +session of type `SessionTypeDream` that reads the recent session corpus, +proposes new memory candidates through the controller, and exits. The +controller's decision values — `OpAdd`, `OpUpdate`, `OpDelete`, `OpReject`, +`OpNoop` in `internal/memory/contract/enums.go:150-156` — are the only ways a +dream session can change disk state. There is no shortcut around the +controller. + +Two pieces of the dreaming path are deliberately partial. The catalog-driven +signal gate that selects which candidates a dream run promotes +(`dreamSignalGateResult` in `dream.go:283-311`) is scaffolded but expects a +populated catalog to do its sharpest work; without it, the runtime still +spawns dream sessions and still writes curated entries, but the candidate +ranking is conservative. The promotion CLI (`agh memory promote`) for moving +an entry between scopes or agent tiers is shipped and works +(`internal/cli/memory.go:131`) but lives alongside automatic promotion logic +that is still being tuned against real-session data. Both gaps are visible in +the source; neither blocks day-to-day use. + +That answers when consolidation fires. The remaining question is how a human +— or another agent — actually touches the memory the runtime holds. + +## Operator surface: `agh memory` + +Every primitive in this post is reachable from one CLI surface. `agh memory` +is the operator-facing verb set, mounted in `internal/cli/memory.go:116-143`, +and every command supports `-o json` for agent consumption: + +```bash +$ agh memory list --scope workspace +NAME TYPE AGE +project_migration_plan project 2h +reference_api_spec reference 1d +project_release_freeze project 3d + +$ agh memory show feedback_testing.md --scope global +--- +name: feedback-testing +description: Integration tests must hit a real database, not mocks. +type: feedback +--- +Reason: prior incident where mock/prod divergence masked a broken +migration. +How to apply: applies to every integration suite under `internal/`. + +$ agh memory list --scope agent --agent reviewer --agent-tier workspace +NAME TYPE AGE +feedback_review_tone feedback 12h +``` + +*The same verbs work for the runtime, the operator, and another agent +reaching across UDS. Output is structured under `-o json` for downstream +tools.* + +The verb set covers the full lifecycle. `list`, `show`, `write`, `edit`, +`delete` for direct manipulation. `search` for content queries. +`reindex` to rebuild the derived catalog. `history` to audit past +operations. `health` to inspect the catalog and provider state. `promote` +to move an entry between scopes or tiers. `recall` to fetch and trace the +deterministic recall path. `dream` to inspect or trigger consolidation +manually. `scope-show` to resolve which scope and tier a query lands in. +The HTTP surface mirrors the verb set at `/api/memory/*` — +`packages/site/content/runtime/api-reference/memory.mdx` is generated from +the OpenAPI schema and stays in sync with the implementation. + +Agent-manageable is the load-bearing property. Every memory operation an +operator can do through the CLI is reachable by another AGH agent over UDS +with structured input and structured output. There is no UI-only path; the +web view is one consumer of the same surface, not the authority for it. + +## Where memory is still partial + +Truthful posts name their gaps. Memory is shipped, but the runtime has three +specific places where execution is intentionally incomplete: + +- **Catalog signal-driven dream selection.** The dream signal gate + (`internal/memory/dream.go:367-390`) accepts candidate counts and minimum + thresholds, and the cascade respects them, but the upstream catalog that + feeds the signal is still expanding to cover every recall and edit path. + Until the catalog is fully populated, the dream signal gate defaults to a + conservative active-but-permissive posture. +- **Cross-workspace memory promotion automation.** Manual promotion via + `agh memory promote` works and is the canonical path. Automatic promotion + from `workspace` to `global` based on recurrence across workspaces is + scaffolded but not yet wired into the dream runtime. +- **Provider abstraction beyond local.** The `internal/memory/provider/local` + package is the only memory provider currently registered. The provider + interface exists; alternative backends (remote, encrypted, team-shared) + are planned and not shipped. + +None of the gaps block the primary loop: an agent writes a memory, the next +session reads it, the operator can inspect and edit it, and the dreaming loop +keeps the store curated. + +## Try it on your next session + +The fastest way to feel the difference is to give the next session something +to remember. From a workspace where AGH is running: + +```bash +# Write a workspace memory the next session will read by default. +agh memory write \ + --scope workspace \ + --type project \ + --name "Deploy script invariants" \ + --content @- <<'EOF' +The deploy script refuses to run from a dirty worktree. +Always commit or stash before invoking `make deploy`. +EOF + +# Confirm it landed. +agh memory list --scope workspace + +# Show the file as the agent will see it next session. +agh memory show "Deploy script invariants.md" --scope workspace +``` + +The next session in the workspace starts with that note in its recall view. +The session after that one starts with it too. The agent that ended yesterday +knowing things keeps knowing them tomorrow. The notebook stays on the desk. + +The full reference for every verb, every scope, and every gate config lives +in the AGH runtime docs at `agh.network/runtime/memory`. The Markdown is on +disk. The cascade is in `internal/memory/dream.go`. Inspect it, edit it, +break it, and tell the runtime where the model of an agent at a desk still +falls short. diff --git a/docs/articles/2026-05/03-one-daemon-four-surfaces.mdx b/docs/articles/2026-05/03-one-daemon-four-surfaces.mdx new file mode 100644 index 000000000..6395604c3 --- /dev/null +++ b/docs/articles/2026-05/03-one-daemon-four-surfaces.mdx @@ -0,0 +1,199 @@ +--- +title: "One daemon, four surfaces: how AGH stays agent-manageable" +description: "CLI, HTTP/SSE, UDS, and a web UI all read and write the same daemon. Surface parity is a product invariant — the contract is generated, not paraphrased." +date: 2026-05-20 +category: runtime +tags: + - runtime + - architecture + - cli + - http + - uds + - launch +author: pedronauck +archetype: + primary: launch +audience: engineer-adopter +depth: + opening: R1 + body: R2-R3 with R4 dives + closing: R1 + traversal: spiral +status: publishable +--- + +An operator starts a session through the web UI, watches the agent finish a long task, then opens a terminal and asks a second agent to keep going from there. The second agent reads its own scratchpad, runs `agh session list`, and finds an empty list. The state the operator just watched in the browser lives only in browser memory. The CLI agent cannot see it, cannot resume it, cannot inspect it. The work the operator just witnessed was never durable enough to cross one terminal tab. + +That is the failure shape AGH refuses. If a runtime is going to host AI agents as first-class operators, then every piece of state the agent might need has to be reachable from every surface the agent might be standing on. AGH treats this as a product invariant, not a roadmap aspiration: one local daemon, four operator surfaces — CLI, HTTP/SSE, UDS, and web UI — all reading and writing the same SQLite-backed truth through the same Go handler. The rest of this post walks through how the daemon enforces that parity, why the contract is generated instead of paraphrased, and what an operator can do with a runtime that refuses to hide state behind any one window. + +## What state-fragmentation costs operators + +AGH is positioned as a local-first agent operating system, and the word `operating system` is load-bearing. An operating system is the thing every program agrees to talk to. The moment one program — say, the web UI — keeps state the others cannot see, the system stops being an operating system and becomes a collection of apps that happen to share a logo. + +The agent-manageability claim in the AGH copy system makes the consequence explicit: *user-visible runtime capabilities must expose stable machine-readable control surfaces for agents*. The `internal/CLAUDE.md` invariants restate it from the implementation side. *No partial-surface completions. Any change touching a public surface closes the loop end-to-end in one pass: contract → HTTP handler → UDS handler → CLI client → CLI command → extension/config/docs surfaces → tests → docs.* A capability that ships only through the UI is an incomplete capability — a half-feature that breaks the day an agent needs it. + +## One daemon, four windows into it + +AGH is a single Go binary. When `agh daemon start` runs, that one process opens four operator surfaces over the same in-memory handler set and the same SQLite-backed event ledger. The CLI talks to the daemon over a Unix domain socket. The web UI talks to the daemon over HTTP and Server-Sent Events. Local subprocess agents talk to the daemon over the same Unix socket the CLI uses. The daemon never serves a request that bypasses its shared handlers. + +```mermaid +flowchart LR + subgraph daemon[AGH daemon · cmd/agh] + core[internal/api/core · BaseHandlers] + db[(runtime.db · event ledger)] + sessions[internal/session · Manager] + core --- sessions + sessions --- db + end + cli[CLI · agh session list] + web[Web UI · TanStack Query] + agent[Local agent subprocess] + cli -- "UDS · GET /api/sessions" --> core + agent -- "UDS · GET /api/sessions" --> core + web -- "HTTP · GET /api/sessions" --> core + core -- "SSE · named events" --> web +``` + +*Figure 1 — Every operator surface terminates at the same `BaseHandlers` instance; SQLite and the session Manager are the daemon's only sources of truth.* + +The diagram is also a scope contract: nothing in the rest of this post lives outside that center box. There is no second service, no shadow store, no per-surface cache the daemon pretends not to know about. + +## The CLI: scripting register, structured output + +The first surface is the one operators reach for when they want a script. Every CLI verb has the same shape: parse flags, call a typed method on a Unix-socket client, render the response with the operator's chosen format. The body of `agh session list` is small enough to read in one screen. + +```go +// internal/cli/session.go (excerpt — see file:160 for full RunE) +cmd := &cobra.Command{ + Use: sessionListKey, + Short: "List sessions", + Example: ` agh session list + agh session list --all --workspace checkout-api`, + RunE: func(cmd *cobra.Command, _ []string) error { + client, err := clientFromDeps(deps) + if err != nil { + return err + } + sessions, err := client.ListSessions(cmd.Context(), SessionListQuery{ + Workspace: workspaceFilter, + }) + // ... format-aware rendering + return writeCommandOutput(cmd, sessionListBundle(sessions, deps.now)) + }, +} +``` + +*Figure 2 — The CLI verb has no business logic; it delegates to `client.ListSessions`, which speaks the same HTTP wire format the web UI uses.* + +`client.ListSessions` is the load-bearing line. The implementation in `internal/cli/client.go:1916` issues `GET /api/sessions` against the Unix-socket-backed client; the daemon answers that request with the same `BaseHandlers.ListSessions` method that serves the browser. The `-o json` flag — present on every state-reading verb — is the agent-manageable escape hatch: agents pipe structured output to other agents without parsing TTY decoration. + +## HTTP and SSE: long-poll over a generated contract + +The second surface is the one the web UI lives on. AGH terminates HTTP on a loopback port, registers every handler against Gin, and exposes a streaming variant for any state that changes faster than the operator can hit refresh. The SSE writer is six lines plus envelope plumbing. + +```go +// internal/api/core/session_stream.go:33 (excerpt) +for _, event := range events { + afterSequence = event.Sequence + if err := WriteSSE(writer, SSEMessage{ + ID: strconv.FormatInt(event.Sequence, 10), + Name: event.Type, + Data: SessionEventPayloadFromEvent(event, info), + }); err != nil { + return afterSequence, err + } +} +``` + +*Figure 3 — Each SSE frame carries the event's persisted sequence number as its `ID`, its domain type as its `Name`, and the typed payload as `Data`. Reconnecting consumers replay from `Last-Event-ID`.* + +Three properties of that loop matter for parity. The event has already been durably appended to `runtime.db` before the broadcaster sees it — the SSE stream is a projection over the append-only ledger, not an in-memory bus. The `Name` field is the canonical domain event type (`session.stopped`, `prompt.tool_call`, and so on), so consumers branch on the same vocabulary the daemon uses internally. And the `ID` is the database sequence: a web tab that drops its connection reconnects with `Last-Event-ID: 4217` and the daemon resumes from event 4218, no replay storm, no missed transitions. + +The HTTP surface is bound to loopback by default and gated by an origin-aware middleware; nothing about the parity argument depends on exposing the daemon to a network. The point is that *if* a future operator decides to remote a session over WireGuard or an SSH tunnel, the wire format is already the one the local browser uses. + +## UDS: local subprocess control, no port + +The third surface is the one most agents never see by name, which is the point. The daemon listens on a Unix domain socket — `internal/api/udsapi/server.go:723` calls `listenConfig.Listen(ctx, "unix", socketPath)` — and registers the same handlers against a parallel Gin engine. The two registration files are essentially mirrors of each other. + +```go +// internal/api/httpapi/routes.go:72 // internal/api/udsapi/routes.go:65 +func registerSessionRoutes(...) { func registerSessionRoutes(...) { + sessions := api.Group("/sessions") sessions := api.Group("/sessions") + sessions.GET("", handlers.ListSessions) sessions.GET("", handlers.ListSessions) + sessions.POST("", handlers.CreateSession) sessions.POST("", handlers.CreateSession) + // ... // ... +} } +``` + +*Figure 4 — The HTTP and UDS registrars wire the identical `BaseHandlers` method into the same path. Transports differ in authentication and listener, not in business logic.* + +Three things become cheap once the local control plane is a socket instead of a port. Parallel QA runs no longer fight for `:2123`; each isolated `AGH_HOME` gets its own socket path, and an operator can drive five daemons from one shell. Local subprocess agents — Claude Code, OpenClaw, or Hermes spawned by the runtime — authenticate by peer credential rather than by handing tokens around. And the daemon never has to negotiate TLS for a request that never leaves the host. + +The UDS surface is also where the daemon exposes a small set of task-claim verbs that intentionally do not exist on HTTP. `ClaimTaskRun`, `CompleteTaskRun`, and their peers are agent-only operations that take a `claim_token`, and AGH keeps those tokens off any network-reachable surface by registering them only against the local socket. The parity invariant says *most state moves through both transports*; the security invariant says *some state, by design, only moves through the one transport an attacker on the LAN cannot reach*. Both are explicit, and both live in the same registration file an operator can read. + +## The web UI inspects, it does not own + +The fourth surface is the one most product screenshots will show, and it is deliberately the most boring participant. The web UI is a TanStack Query consumer of the same `/api/sessions` route the CLI calls. There is no UI-only state store standing between the browser and the daemon. + +```tsx +// web/src/systems/session/hooks/use-sessions.ts +export function useSessions(workspace: string | null = null, options?: UseSessionsOptions) { + return useQuery({ + ...sessionsListOptions(workspace), + enabled: options?.enabled ?? true, + }); +} + +// web/src/systems/session/adapters/session-api.ts:64 +export async function fetchSessions(workspace?: string, signal?: AbortSignal) { + const { data, error, response } = await apiClient.GET("/api/sessions", { + params: workspace ? { query: { workspace } } : undefined, + signal, + }); + // ... + return requireResponseData(data, response, "Failed to fetch sessions").sessions; +} +``` + +*Figure 5 — The `useSessions` hook resolves to a typed `GET /api/sessions` call against the same daemon route the CLI hits over UDS.* + +The `apiClient` is not a hand-written wrapper. `web/src/lib/api-client.ts:13` constructs it from `createClient()`, where `aghPaths` is the path table imported from the generated TypeScript declaration of the daemon's OpenAPI schema. The browser cannot ask for an endpoint the daemon does not document; the daemon cannot ship a public endpoint that the browser cannot see. The web UI's job is to render and to mutate, not to keep its own version of the truth. + +## Codegen enforces parity, not docs + +Four implementations of the same contract is exactly the situation where prose documentation drifts and a feature ends up working in two places, broken in a third, and aspirational in the fourth. AGH side-steps that drift by treating the contract as code. + +```go +// magefile.go:32 +openAPISpecPath = "openapi/agh.json" +webOpenAPITypePath = "web/src/generated/agh-openapi.d.ts" +// ... +webOpenAPIArtifacts = []openapits.Artifact{ + {SpecPath: openAPISpecPath, OutputPath: webOpenAPITypePath}, + // ... +} +``` + +*Figure 6 — `openapi/agh.json` is the single source of truth for the HTTP/UDS contract; `make codegen` regenerates the TypeScript declaration that the web client is typed against, and `make codegen-check` fails the verify gate if the two have drifted.* + +A backend engineer who adds a route to `internal/api/core/handlers.go` regenerates `openapi/agh.json` and the matching TypeScript declaration in the same change. A web engineer who relies on the new route gets a typed `paths["/api/new-route"]` entry; if the route is removed, the typecheck for every consuming hook breaks. `make verify` — the monorepo's commit gate — runs `codegen-check` before any test stage. The contract cannot quietly diverge because the build refuses to ship a divergent contract. + +The CLI client benefits from the same discipline from the Go side: `unixSocketClient.ListSessions` calls the same path the OpenAPI schema describes, so the same handler-level test fixtures protect the CLI's wire format and the web UI's typed client at once. + +## Pick a verb. Run it everywhere. + +Surface parity is the kind of property that is invisible when it works and catastrophic when it doesn't. The cheapest way to feel it is to pick one verb and run it through every door. + +| Verb | CLI | HTTP | UDS | Web UI | +| --------------------------------- | -------------------------------------- | ------------------------------------------ | ------------------------------------------ | ---------------------------- | +| List sessions | `agh session list -o json` | `GET /api/sessions` | `GET /api/sessions` (over socket) | Session table | +| Create a session | `agh session create --agent claude` | `POST /api/sessions` | `POST /api/sessions` | New-session dialog | +| Stream session events | `agh session events --follow` | `GET /workspaces/:w/sessions/:id/stream` | `GET /workspaces/:w/sessions/:id/stream` | Live transcript view | +| Stop a session | `agh session stop ` | `POST /workspaces/:w/sessions/:id/stop` | `POST /workspaces/:w/sessions/:id/stop` | Stop action on session row | +| Claim a task run (agent-only) | n/a | n/a | `POST /workspaces/:w/task-runs/:id/claim` | n/a | + +*Table 1 — The same daemon answers every cell except the last row, which is intentionally UDS-only because the `claim_token` it returns must never traverse a network surface.* + +That last row is the honest exception. Most verbs an operator or an agent will reach for live on every public surface; a small, named set is locked to UDS for security reasons and the table says so. No partial-surface completions, no hidden UI control, no "this only works in the browser" workflows. + +For an operator picking up AGH today: install the daemon, run `agh session list -o json` from a script, open the same workspace in the web UI, and watch the table populate from the same handler the script just called. For a runtime developer: add one route in `internal/api/core/`, regenerate the OpenAPI schema, and find that the CLI client, the UDS server, the HTTP server, and the web's typed `apiClient` all know about it before the next `make verify` finishes. Agent-manageability is not a promise written into the README; it is the shape the build refuses to break. diff --git a/docs/articles/2026-05/04-detached-lifetime.mdx b/docs/articles/2026-05/04-detached-lifetime.mdx new file mode 100644 index 000000000..78a36f9cd --- /dev/null +++ b/docs/articles/2026-05/04-detached-lifetime.mdx @@ -0,0 +1,157 @@ +--- +title: "Detached lifetime: how AGH keeps agent work running after the HTTP request closes" +description: "An agent runtime that ties execution to the request that started it can never host real work. AGH detaches execution from request lifetime using context.WithoutCancel, with one owner per goroutine and explicit cancel verbs." +date: 2026-05-20 +category: runtime +tags: + - runtime + - architecture + - concurrency + - go + - context +author: pedronauck +archetype: + primary: ai-agent + absorbed: launch + hybrid-note: "AI/agent post in architecture register; design pattern, not incident retrospective" +audience: peer-engineer-deep +depth: + opening: R1 + body: R3-R4 yo-yo + closing: R2 + traversal: yo-yo +status: publishable +--- + +The agent stops streaming mid-tool-call. Refresh the tab and the work is gone — not paused, gone. The model already decided to run the tool. The tool already opened a subprocess. The subprocess is still writing to disk. But the goroutine that was going to read its output, persist the result, and emit the next event has been cancelled — because the browser that triggered the prompt navigated away. + +Nothing in the agent is broken. The agent never got a chance to finish. + +This is the shape every web-bound agent runtime ships with first: the HTTP request that opens a prompt is also the context that cancels it. When the client disconnects, the context fires `Done`, the cancel propagates down through every `select { case <-ctx.Done(): return }` the engineer wrote in good faith, and the entire conversation collapses. Tabs are cheap; agent turns are not. AGH refuses this coupling. The HTTP handler in `internal/api/httpapi/prompt.go` derives the prompt's execution context with `context.WithoutCancel`, so the work outlives the request that started it. The writer loop streams to whoever is listening; the executor keeps going whether or not anyone is. + +The rest of this post is about why that distinction is load-bearing for agent runtimes, what AGH does to make it real, and the four rules that follow once the lifetime split is taken seriously. + +## Why request context is the wrong cancellation parent + +A request and a turn are different units of work. An HTTP request lasts as long as the client keeps the socket open: seconds at most for chat, often much less. A prompt turn lasts as long as the model takes to decide it is done — tool calls, retries, model thinking time, subprocess waits, network sends. Tying them together is a category error. It assumes the client owns the work, when in fact the daemon owns it; the client is one of several views onto it. + +The first time the assumption breaks is dramatic — a closed tab kills a 90-second tool call mid-execution — but the subtler failures are worse. Browser navigation issues a transport abort that closes the response stream; the default Go HTTP stack propagates that abort to the request's context; every downstream goroutine reads `ctx.Done()` and shuts itself down. The agent never sees an error. It sees what it has been told is a normal cancellation, and it cleans up. From the outside, this looks like the agent quietly losing its place. From inside the runtime, it looks like the system doing exactly what its lifetime graph said to do. + +The fix is not "handle the cancel more carefully." The fix is to stop treating the request as the parent. + +## Detaching with `context.WithoutCancel` + +`context.WithoutCancel` is the Go primitive that severs cancellation from a context while preserving everything else — values, instrumentation, request IDs. Calling it returns a context that ignores its parent's `Done` channel. Deadlines vanish too; the standing directive in `internal/CLAUDE.md` flags this as a footgun and requires re-attaching one when the work has a real time budget. + +AGH applies the primitive at the boundary between the HTTP layer and the session layer: + +```go +// internal/api/httpapi/prompt.go +func (h *Handlers) promptSession(c *gin.Context) { + // ... validation, lookup ... + promptCtx, cancelPrompt := context.WithCancel( + context.WithoutCancel(c.Request.Context()), + ) + // ... + events, err := h.Sessions.Prompt(promptCtx, sessionID, message) + // ... + for { + select { + case <-c.Request.Context().Done(): + h.drainPromptEventsAsync(c.Request.Context(), events, cancelOnReturn) + cancelOnReturn = nil + return + // ... event delivery ... + } + } +} +``` + +Two contexts live in the same handler. `c.Request.Context()` belongs to the HTTP request — it cancels when the client disconnects, and it governs the writer loop's appetite for emitting events. `promptCtx` is the detached one — it inherits the request's values but ignores its cancellation. The session manager runs the prompt under `promptCtx`. When the client goes away, the writer's `select` fires, the loop returns, and the prompt keeps executing. The drain goroutine consumes whatever events the executor still emits, so no channel deadlocks behind a closed reader. + +The reading: the line where the contexts diverge is the line where the agent stops being a request handler and starts being a job. + +## One goroutine, one owner + +Detached lifetime makes a different problem urgent. A goroutine that ignores its parent's cancel is also a goroutine that will not stop on its own. Fire-and-forget detachment is how runtimes leak. The runtime needs an explicit supervisor or it ships with a slow OOM. + +AGH's session manager solves this by giving every prompt turn its own execution context, wired into a fallback lifecycle context the manager owns and cancels on shutdown: + +```go +// internal/session/manager_prompt.go +func (m *Manager) promptExecutionContext( + ctx context.Context, +) (context.Context, context.CancelFunc) { + var base context.Context + if ctx != nil { + base = context.WithoutCancel(ctx) + } + if base == nil { + base = m.fallbackLifecycleContext() + } + executionCtx, cancel := context.WithCancel(base) + lifecycleStop := context.AfterFunc(m.fallbackLifecycleContext(), cancel) + return executionCtx, func() { + lifecycleStop() + cancel() + } +} +``` + +The execution context has three cancellation paths. The handler can call `cancelPromptExecution` directly. The manager's fallback lifecycle context fires on daemon shutdown via `context.AfterFunc`. The cancel endpoint can reach the same handle through the session's stored cancel function. The returned `CancelFunc` releases the `AfterFunc` registration when the turn ends normally — without that release, every completed prompt would leak a registration on the lifecycle context. + +The rule the code enforces: every goroutine the session manager spawns is tracked by a manager-owned `sync.WaitGroup` and joined during shutdown. The `agh-code-guidelines` skill states it explicitly — *"No fire-and-forget goroutines. Track with `sync.WaitGroup` (or equivalent owner-side primitive) and join on shutdown."* `internal/CLAUDE.md` adds the architectural form: *"Goroutines spawned by `internal/session/manager_*.go` MUST be tracked by Manager-owned WaitGroup and joined in Manager shutdown."* + +The combination — detached lifetime *plus* explicit supervisor — is what makes the pattern safe. Detachment without supervision is leaking. Supervision without detachment is request-bound execution dressed up. + +## Three places AGH applies the same shape + +The pattern is not a one-off in the prompt path. It recurs across every surface where work outlives the trigger that started it. + +- **Prompts.** The HTTP and UDS prompt handlers in `internal/api/httpapi/prompt.go` and `internal/api/udsapi/prompt.go` both derive their execution context with `context.WithoutCancel`. When the operator closes the web tab or the CLI process exits, the daemon keeps streaming events into the session's event store. A reconnecting client picks up from `after_seq` and replays. + +- **AGH Network channel sends.** Outbound sends on `agh-network/v0` channels detach the same way. A peer-to-peer envelope can take longer than any single request that triggered it; tying the send to the request would corrupt the receipt path. The daemon owns the send; the request owns the acknowledgement that the send was queued. + +- **Automation jobs.** `internal/automation/manager.go` runs scheduled triggers, webhooks, and cron dispatches under a runtime context derived from `context.WithoutCancel(ctx)`. The automation runtime has its own cancel signal — the configuration changes, the trigger is deleted, the daemon stops. None of those reasons should be tangled with whatever request set up the trigger. + +Three different surfaces, one pattern: the work is owned by the daemon, not by the requester. + +## Explicit cancel verbs are the user contract + +If the request can no longer cancel the work, something else has to. The operator needs a way to stop a running prompt, and that way has to be a named verb — not a closed tab, not a transport abort, not a hopeful expectation that disconnect-means-stop. + +AGH ships that verb. The route lives at: + +```text +POST /api/workspaces/:workspace_id/sessions/:session_id/prompt/cancel +``` + +The handler is six lines (`internal/api/httpapi/sessions.go:52`): require the session, call `Sessions.CancelPrompt(ctx, sessionID)`, return 200. The session manager's `CancelPrompt` in `internal/session/manager_prompt.go` looks up the live session, calls the stored cancel handle (`session.cancelCurrentPrompt()`), then asks the driver to interrupt the underlying ACP subprocess and any scoped tool calls in flight. Idempotent: cancelling an already-finished prompt returns nil. + +The same verb exists on the UDS surface for CLI parity. `agh session cancel ` is the operator path; the HTTP route is the web path; both end at the same `Manager.CancelPrompt`. The standing directive on supervision codifies the rule: *"`CancelPrompt`, session stop, and timeout collapse into ONE idempotent cancellation path."* + +Transport-level aborts are a fact about the wire, not about the work. A WebSocket dropping, a TCP reset, a refresh, a `curl` aborted with Ctrl-C — none of them carry intent. They carry network state. An agent runtime that conflates network state with operator intent has built a coordination layer on top of accident. Named cancel verbs make intent legible. They also make it auditable: every cancellation appears in the event store with `actor_id`, `release_reason`, and `claim_token_hash` populated, so an operator (or a peer agent) can answer the question "who stopped this turn?" later. + +## What this implies for any agent runtime + +Detached lifetime is not an AGH idiosyncrasy. It is the consequence of taking three observations seriously at the same time. + +The agent is the thing doing the work. The request is the thing asking for it. Conflating the two is the same category error as treating a database query as the same thing as the JDBC connection that issued it. + +Tool calls are slow and side-effecting. A tool that writes a file, sends a message, or charges a card cannot be cancelled the way a SELECT can be cancelled. Cancelling the parent context can leave the side effect half-done. Detached execution at least makes the inconsistency loud — the work is still going, the operator can see it, and a named cancel verb lets the operator decide whether to roll it back. + +Multiple surfaces watch the same work. AGH exposes a prompt's events through HTTP/SSE for the web UI, through UDS for the CLI, and through the network protocol for peer agents. If any one watcher could cancel the work by leaving, the others would live in a runtime where their view of progress was decided by someone else's network. The daemon has to be the owner. Watchers are watchers. + +## Four rules to take with you + +The pattern decomposes into four rules. They are load-bearing, and they are short. + +1. **The executor and the writer are not the same goroutine.** Derive the executor's context from `context.WithoutCancel`. Bind the writer loop to the request context. When the client leaves, the writer dies and the executor does not notice. + +2. **Every detached goroutine has one owner with a known stop signal.** Track it in a `sync.WaitGroup` the owner controls. Wire it to a lifecycle context the owner cancels on shutdown. Use `context.AfterFunc` to register cancel propagation without spawning supervisor goroutines. + +3. **`context.WithoutCancel` discards deadlines.** Re-attach a timeout when the work has a real time budget. The drain path in `prompt.go` does this explicitly with a 30-second `detachedPromptDrainTimeout` so leftover events do not hold the manager open forever. + +4. **Cancellation is a verb, not a side effect.** Expose explicit cancel endpoints with their own routes, their own audit trail, and their own idempotency guarantees. Transport-level aborts are not operator intent; named verbs are. + +A runtime that follows the four rules can host work whose lifetime is the daemon's, not the request's. That is the shape an agent operating system needs. AGH ships it today. diff --git a/docs/articles/2026-05/05-flat-depth-model.mdx b/docs/articles/2026-05/05-flat-depth-model.mdx new file mode 100644 index 000000000..fbae7cbf3 --- /dev/null +++ b/docs/articles/2026-05/05-flat-depth-model.mdx @@ -0,0 +1,146 @@ +--- +title: "Designing a dark UI without shadows: AGH's flat depth model" +description: "Drop shadows muddy dark UIs. AGH replaces them with a warm surface ramp, translucent hairlines, and inset focus rings — enforced by tokens and a lint rule." +date: 2026-05-20 +category: design +tags: + - design-system + - dark-mode + - tokens + - tailwind + - lint +author: pedronauck +archetype: + primary: launch +audience: engineer-adopter +depth: + opening: R1 + body: R2-R3 with R5 dive + closing: R1 + traversal: staircase +status: publishable +--- + +## Why shadows fail in dark UIs + +A drop shadow assumes light. Place an element above a white page and the shadow reads as physical depth — the eye expects the world to behave that way. Place the same shadow over a near-black canvas at `#131211` and the eye gets ambiguous information: the shadow lives in the same luminance band as the surface it tries to lift, so it reads as smear rather than separation. Stack three or four of those shadows across cards, popovers, headers, and dropdowns, and the screen turns into a haze. + +AGH is a local operator console for long-running agent work — the kind of UI that an engineer stares at for an entire afternoon. Haze on that surface costs more than aesthetics. It costs scan speed, focus visibility, and the operator's ability to tell apart a card, a hovered row, and a selected row at a glance. The flat depth model exists to remove that cost. + +AGH's flat depth model substitutes shadow-based elevation with four parts: a warm surface ramp, translucent hairlines, inset focus rings, and a token contract that holds them as the single source of truth — protected by a lint rule that rejects every inline workaround. + +## The warm surface ramp + +Depth in AGH comes from the canvas, not from light. The runtime ships six luminance steps that climb from the rail to the most elevated surface, all sharing one warm brown-black hue. The eye reads the climb as ordered planes because the hue stays constant while luminance steps remain perceptually even. + +The six steps live in `packages/ui/src/tokens.css`: + +| Token | Hex | Intended use | +| --- | --- | --- | +| `--color-rail` | `#0c0b0b` | Workspace rail; the deepest plane behind every shell | +| `--color-canvas` | `#131211` | Default page background | +| `--color-canvas-soft` | `#1a1918` | Cards, popovers, sidebar surfaces | +| `--color-canvas-tint` | `#1c1b1a` | Subtle separation from `canvas-soft` | +| `--color-elevated` | `#232220` | Active and prominent surfaces | +| `--color-disabled` | `#4a4847` | Disabled foreground only — not a surface plane | + +*Six warm-dark luminance steps form the depth ladder. The hue stays constant; luminance climbs in perceptually even intervals so the ramp reads as ordered planes when surfaces stack.* + +The ramp does the work a shadow used to do. A card on `canvas-soft` separates from a page on `canvas` because the lighter plane sits higher in the luminance order. A modal-style surface on `elevated` reads higher still. The reader never has to interpret blur radius, x/y offset, or shadow color — luminance alone carries the order. + +`DESIGN.md` §2 holds this rationale: *"Color carries state. The warm ramp separates shell, page, grouped surfaces, and active surfaces without decorative depth."* Hue choice is also operator-protective — extended sessions feel less harsh on warm-dark than on cool slate or pure black. + +## Hairlines, not borders + +The ramp separates planes. Hairlines edge them. AGH binds three hairline tokens, all translucent white at low alpha so the line layers consistently across every step of the ramp: + +| Token | Value | Intended use | +| --- | --- | --- | +| `--color-line-soft` | `rgba(255, 255, 255, 0.03)` | Quiet seams, table-row dividers | +| `--color-line` | `rgba(255, 255, 255, 0.055)` | Default surface edge — cards, popovers, inputs | +| `--color-line-strong` | `rgba(255, 255, 255, 0.09)` | Active rings, selected rims, emphasized seams | + +*Three translucent-white alpha tiers form the hairline system, baselined against the warm ramp. Because the line is alpha-on-canvas, it stays optically consistent whether the underlying surface is `--color-rail` or `--color-elevated`.* + +A solid `#222` border behaves differently on `canvas` than on `elevated` — it muddies one plane while disappearing on another. A translucent-white hairline lifts the same amount from every surface beneath it, so a card edge on the deepest rail and a card edge on the brightest elevated plane both read as the same kind of seam. Hairlines replace borders everywhere except the form-control focus ring, which graduates to the strong tier when the control is active. + +## Focus rings live inside the surface + +Keyboard navigation is the load-bearing case for visible depth. Most dark UIs solve it with an outer glow — a soft drop shadow in an accent color. The flat depth model refuses to introduce a glow just for focus, so the focus ring lives inside the surface. + +`tokens.css` declares four ring shapes, each routed through the hairline tier: + +| Token | Shape | +| --- | --- | +| `--shadow-focus-ring` | `0 0 0 1px var(--color-line-strong)` | +| `--shadow-focus-ring-soft` | `0 0 0 1px var(--color-line-soft)` | +| `--shadow-focus-ring-inset` | `inset 0 0 0 1px var(--color-line-strong)` | +| `--shadow-focus-ring-inset-soft` | `inset 0 0 0 1px var(--color-line-soft)` | + +*Four focus-ring tokens; each is a 1px solid hairline routed through `box-shadow`. The inset variants paint inside the element bounds — used by cards, accordions, and list items where an outer ring would crash into a neighbor.* + +The ring uses `box-shadow` only as a delivery mechanism for a stable line; the underlying value is still a hairline color from the same alpha tier. Two consequences follow. First, the focus ring never adds blur to the surface — it stays a sharp 1px edge that scales identically across the six ramp steps. Second, surfaces that pack tightly (lists, accordions, segmented controls) can wear the inset variant and never collide with the next row's outer ring. + +## Tokens are the only source of truth + +The pattern has one rule: every plane, every seam, every ring resolves through a `--color-*`, `--shadow-*`, or `--text-*` token declared in `packages/ui/src/tokens.css`. Raw hex is acceptable only in the token source itself or in generated artifacts that cannot read CSS variables. + +`tokens.css` exports inside Tailwind v4's `@theme` block, which means Tailwind generates the bare utility (`bg-canvas-soft`, `text-fg`, `shadow-focus-ring`) *and* exposes the variable globally (`var(--color-canvas-soft)`). Consumers never see two ways to spell the same value — the utility and the variable are the same compiled output. + +`DESIGN.md` (repo root) is the generated counterpart. It mirrors the token tables verbatim between `` markers, so the spec table for "Surface ramp" cannot drift from the runtime tokens. `make codegen` regenerates `DESIGN.md` from `tokens.css`; `make codegen-check` blocks any commit that lets the two views disagree. `COPY.md` is the verbal counterpart of the same contract — `DESIGN.md` governs how a surface looks; `COPY.md` governs what it is allowed to say. + +The contract holds because the substitution is mechanical. If a card's background needs to lift one plane, the change is `bg-canvas` → `bg-canvas-soft`, not a hand-picked hex. If a hairline needs to climb a tier, the change is `border-line` → `border-line-strong`, not a guessed alpha. The system reduces every visual decision to a token name. + +## A lint rule, not a style guide + +A style guide is advisory. A lint rule is unbuildable-if-violated. AGH's design system runs through `lint-plugins/compozy-design-system.mjs`, an oxlint plugin that scans JSX for the patterns the token system is supposed to make unreachable. + +The eyebrow rule is the cleanest specimen. The canonical eyebrow contract is **Inter UC 11px / 600 / -0.005em**, bound to `--text-eyebrow` and `--tracking-eyebrow` in `tokens.css`. Any consumer that needs an uppercase metadata label renders through `` from `@agh/ui` or applies the `.eyebrow` utility on a structural HTML element (`
`, `