feat: capture context-intelligence design knowledge into the mode (eval-driven) by colombod · Pull Request #33 · microsoft/amplifier-bundle-context-intelligence

colombod · 2026-06-07T16:27:50Z

What this adds

Captures the institutional design knowledge for context-intelligence tooling into the mode itself, so it produces deeper, more complete designs and drives its design pipeline more robustly.

Design depth

Bounded-navigation discipline moved to a single authoritative home (context/navigation-budget-discipline.md) and referenced by session-navigator via @mention — one source of truth for keeping disk navigation within a context budget (no duplication).
Tool-design skill enriched (R1/R2/R3): module-vs-CLI selection by consumer, narrow-domain specialization, and progressive discovery — each pointing to its authoritative home.
New evaluation-methodology skill: metric design (quality/efficiency/efficacy), precursor-over-artifact metrics, and A/B + statistical-N discipline.
Thin strategy file + mode wiring that surface this knowledge only when the mode is active (lean always-on preserved).

Interactive & autonomous driving

The design facilitator opens with its actual Phase-0 question, re-anchors on off-script replies instead of breaking role, and owns its pipeline end-to-end.
A new autonomous/seeded entry path: when the activation message already carries the goal, the facilitator treats it as a pre-answered Phase 0 and proceeds — enabling non-interactive / recipe-driven use.

How the evaluation framework shaped this

The work was driven and validated by an outcome-eval harness with three scenarios (a pre-seeded design run, a multi-turn simulated user, and a one-shot). The evals did more than verify at the end — the baselines reshaped the scope, showing the mode's design depth was the highest-leverage gap. Every change maps to a scenario that measures it, so we built what we set out to build and can show it.

Measured evidence

| Scenario | Before (baseline) | After |
| --- | --- | --- |
| Seeded design run | timed out before producing an evaluation plan | converges (exit 0, ~4 min): deeper Phase-2 design (per-consumer module-vs-CLI table citing the navigation-budget discipline) plus a complete Phase-3 evaluation plan (precision/recall, success criteria, DTU validation) |

Single-source/no-duplication and mode-gated injection verified by inspection; lean always-on preserved.
Bundle test suite: 657 passed.

Notes

Markdown/YAML prompt-and-config only; validation is via the eval scenarios (prompt content is validated behaviorally, not by unit tests).

DTU live validation evidence (added post-review)

The branch was validated in an isolated Digital Twin Universe (DTU) install. The bundle was served from a Gitea mirror of this branch's working-tree snapshot (d94e000) — i.e. a clean-room install of the actual PR content, not GitHub main. Provider credentials were injected via DTU passthrough (no secrets written to disk). All four behavioral seams were proven with genuine live Anthropic round-trips, not just file inspection.

| # | Validation | Method | Result |
| --- | --- | --- | --- |
| a | Basic `amplifier run` round-trip | `amplifier run "Reply with exactly: SMOKE_OK_4F2A"` | PASS: model returned `SMOKE_OK_4F2A`, exit 0 (live anthropic, ~4s) |
| b | Mode activates + gating enforced | agent-initiated `mode set` is denied (`advertised:false` / `default_action:block` enforced); interactive `/mode context-intelligence` succeeds (prompt switches to `[context-intelligence]>`) | PASS |
| c | Always-on leanness (mode-gated content) | default no-mode session: `load_skill context-intelligence-evaluation-methodology` → "not found"; after `/mode context-intelligence` → loads | PASS: heavy skill available only when the mode is active |
| d | `session-navigator` obeys the `@mention`'d navigation discipline (critical) | delegate to `session-navigator` to locate a non-existent session ID against a seeded corpus | PASS: navigator explicitly cited "Rule 1 (probe first)", ran exactly one bounded probe, and hard-stopped with "does not exist", with no runaway search. Proves the loading `@mention` of `navigation-budget-discipline.md` actually injects the 6 bounded-navigation rules into the agent at runtime, confirming the 58-line inline→`@mention` refactor preserves the discipline. |

Net: clean-room install loads from the branch, the mode correctly gates its heavy content, the always-on surface stays lean, and the refactored session-navigator @mention resolves and is obeyed live.
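
The probe-first, hard-stop behavior observed in check d can be illustrated with a small Python sketch. This is only an approximation of the shape of that behavior: the function name, the `max_probes` budget, and the one-directory-per-session layout are assumptions made for illustration, not the contents of navigation-budget-discipline.md.

```python
from pathlib import Path
import tempfile


def locate_session(roots, session_id, max_probes=1):
    """Look for a session directory using at most `max_probes` existence checks.

    Hypothetical helper. Returns (path_or_None, probes_used).
    """
    probes_used = 0
    for root in roots:
        if probes_used >= max_probes:
            break  # hard stop: budget spent, do not widen the search
        probes_used += 1
        candidate = Path(root) / session_id
        if candidate.is_dir():  # cheap probe, no directory scan
            return candidate, probes_used
    # Not found within budget: report "does not exist" to the caller.
    return None, probes_used


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "session-a").mkdir()  # made-up seeded corpus
    found, used = locate_session([root], "session-a")
    missing, used_missing = locate_session([root], "no-such-session")
```

In this sketch a missing ID costs exactly one probe and returns `None`, rather than falling back to a recursive search of the corpus.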

Caveats (honest):

Worker turns resolved to claude-sonnet-4-5 via the bundle's routing (still a live Anthropic call); the host default claude-opus-4-8 was independently confirmed reachable (HTTP 200 on /v1/models).
A non-fatal, pre-existing warning appears at session start ("Removing corrupt skills cache (no metadata): …context-intelligence…"); it self-heals (skills load fine, per checks b and c) and is not introduced by this PR.

Move the 6 defensive-navigation rules verbatim from session-navigator into context/navigation-budget-discipline.md (authoritative source). session-navigator now @mentions it (loading) and re-points its three in-document references; the always-on awareness file gets a single non-loading pointer row. No rule content changed; always-on behavior untouched (lean default preserved).

Enrich tool-design with R1 (module vs CLI by consumer, pointing to Standing Rule 3), R2 (narrow-domain specialization), R3 (progressive discovery → navigation discipline), plus an event-semantics guard. Add a new context-intelligence-evaluation-methodology skill (metric design, precursor metrics, A/B + statistical-N; points to eval-design and digital-twin-universe, never restating DTU-as-default or artifact-as-success). Add a thin context-intelligence-strategy.md pointer table (non-loading references; names the event-semantics principle once). Wire the strategy file via the mode's contributes.context and the eval skill via contributes.skills. Extend the eval-design catalog with structural scenarios 8-10 and behavioral Scenario C. Always-on behavior untouched.

6a: add PRE/POST-delegation constraints to the mode's file-not-found routing row (no preamble before delegate(); relay the facilitator's Part-A question verbatim).
6b: add a Phase-0 RE-ANCHOR rule so off-script user replies are treated as signal fragments and the opening question is re-asked, instead of breaking role.
6c: add a 'Pipeline ownership' standing rule to the facilitator countering the hooks-skills-visibility leak of brainstorming/using-superpowers mandates: no /brainstorm or /systems-design punt; the pipeline is self-contained from Phase 0.
No design-philosophy change; edge-case hardening only. Always-on behavior untouched.

7a: add a seeded-path routing row to the mode: when the activation message already contains a clear goal and domain-concepts.md is absent, delegate with seed_statement="<verbatim user goal>" (context_depth=none).
7b: add a facilitator 'Seeded entry' variant at the top of Phase 0: treat the seed as the pre-answered Part A, skip the opening question, run the Part-B probe, then open with a data-grounded candidate framed on the seed.
Additive new path (does not change the interactive path). Always-on behavior untouched.
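
The routing row in 7a is prompt text rather than code, but the decision it describes can be sketched in Python. The function name and the dict return shape below are hypothetical; only `seed_statement` and `context_depth` come from the commit message. Whether a goal is "clear" is a model judgment in the real mode; this sketch stands in for it with a plain non-empty check.

```python
from typing import Optional


def route_activation(activation_message: str, domain_concepts_exists: bool) -> Optional[dict]:
    """Return delegate arguments for the goal-already-provided path, or None.

    Hypothetical illustration of the 7a routing row.
    """
    has_goal = bool(activation_message.strip())  # stand-in for "contains a clear goal"
    if has_goal and not domain_concepts_exists:
        return {
            "seed_statement": activation_message,  # passed through verbatim
            "context_depth": "none",
        }
    return None  # fall through to the other routing rows


seeded = route_activation("Design a tool for summarizing session events", False)
interactive = route_activation("", False)
existing_concepts = route_activation("Design a tool for summarizing session events", True)
```

Only the first call takes the seeded path; an empty activation message or an existing domain-concepts.md falls through to the interactive rows.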

… user intent
The pipeline-ownership rule was over-strict: it blanket-forbade recommending /brainstorm. Reworded to resist only automatic mid-flow derailment while following the user when they explicitly choose to brainstorm or switch workflows.

…are discoverable
The mode contributes skills but listed load_skill under `warn` with default_action: block, so hooks-mode warned the contributed skills would be undiscoverable by the LLM while the mode is active. Move load_skill to tools.safe and reconcile the routing-first prose (routing-first stays a prompt discipline, not a tool gate).

… 'goal already provided'
- Tighten the facilitator pipeline-ownership standing rule (crisper, same intent).
- Rename the 'seeded' Phase-0 entry to plain 'goal already provided' in both the mode routing table and the facilitator, justified by the unattended/recipe use case. (seed_statement kept as the internal parameter name.)
Generated with Amplifier
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

@bkrabach

Address PR #33 review feedback (CHANGES_REQUESTED, @bkrabach) on keeping force-loaded and agent-facing surfaces token-conservative.
- context/context-intelligence-awareness.md: 87 -> 4 lines. This file is force-loaded into every top-level session via the behavior's context.include, so it now only announces that the bundle exists and captures session events. Removed the Delegation table (graph-analyst's meta.description already owns that guidance), the Navigation Discipline pointer (already covered by the mode's strategy file + session-navigator @mention), the Configuration env-var table (already in README.md), and the Upload Tool how-to (already in the upload module README + --help). No content relocated -- all four sections already have canonical homes.
- skills: trim non-user-invocable descriptions (evaluation-methodology 646->277, eval-design 607->251, tool-design 735->308 chars) to agent-facing "what + when to load", dropping operational tails (phase number, context_depth, sub-session calling protocol) and cross-skill pointers. workflow-pattern-analysis (user-invocable): drop the internal dual-agent-loop clause, keep its Triggers list.
Validated static (no dead refs, README/module-README coverage intact, 27 recipe tests pass) and live in DTU ci-pr33 (default no-mode session force-loads only the 4-line awareness; graph-analyst still routed-to, session-navigator @mention intact, mode still activates and contributes, trimmed skills load with new descriptions).
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

… per re-review
Follow-up to 37bf809 after a re-review of the landed state by the bundle-design and foundation experts plus the simplicity/ROB lenses.
- awareness.md: drop the dangling "keeps them available for later analysis" clause -- it implied a consumer path the force-loaded file deliberately does not provide. Now states purpose only ("captures this session's events for later analysis"); graph-analyst's meta.description owns routing.
- skill descriptions: the trimmed non-user-invocable descriptions were content taxonomies with no load trigger. Added a "Use when..." opener so a discovering agent knows WHEN to load (evaluation-methodology, eval-design, tool-design), and cut the remaining implementation-detail leaks from the description fields (Gitea-rewrite config; shared-library/thin-wrapper; model_role) -- those stay in the skill bodies, not the advertisement.
- session-navigation: gave it a "Use when..." trigger too (behavior-level and always-visible); was opaque internal vocabulary with no load signal.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

colombod · 2026-06-10T18:40:12Z

Re: force-load discipline — change, impact, testing, and how it maps to your guidance

Thanks @bkrabach — this was the right call and it sharpened the whole surface. Here is the full picture across the two commits that address your three comments (37bf809 the trim, ae948f9 refinements from a re-review).

What changed

context/context-intelligence-awareness.md: 87 → 4 lines. It now only announces that the bundle exists and that it captures session events — no plumbing (no graph_query/blob_read, no JSONL paths), no tool/agent names, no delegation/load guidance. Removed: the Delegation table, the Navigation Discipline pointer, the Configuration env-var table, and the Upload Tool how-to.
Skill descriptions trimmed to agent-facing advertisements (evaluation-methodology, eval-design, tool-design, workflow-pattern-analysis), and given an explicit "Use when…" load trigger; implementation detail (Gitea-rewrite config, shared-library/thin-wrapper, model_role) was pulled out of the description fields into the skill bodies. session-navigation got a "Use when…" trigger too (it is behavior-level and always-visible).

Impact

The four removed awareness sections were deleted, not relocated — each already has a canonical home: config → README.md; upload how-to → modules/tool-context-intelligence-upload/README.md (+ --help); navigation discipline → the mode-contributed strategy file + the session-navigator @mention; delegation → graph-analyst's own meta.description. So nothing was lost, and the always-on per-session token cost drops to ~4 lines.

How it maps to the principles you pointed out

"awareness.md is force-loaded → only the highest-level 'what it is', nothing else." It now carries exactly that and announces capability, not architecture.
"don't even give guidance on what to delegate/load — let the descriptions own their advertisement / whatever is loaded owns its own truth." The Delegation table is gone; graph-analyst's description ("MUST be used… ALWAYS delegate to this agent first") is now the single source of truth, and removing the duplicate eliminates a future drift point.
"reduce skill descriptions to what an agent needs to know when/why to load — no human-reader or mode framing." The non-user-invocable descriptions are now a "Use when…" trigger + a tight scope statement, with the calling protocol (phase, context_depth, sub-session) left in the bodies.

How / where it was tested

Static: awareness 87→4; description leaks removed (description-line leak count = 0); frontmatter parses; no dead references; README + module-README coverage confirmed intact; the workflow-pattern-analysis recipe tests (27) pass.
Live in an isolated Digital Twin (clean reinstall from this branch): a default, no-mode top-level session force-loads only the 4-line awareness — ## Delegation, ## Configuration, ## Upload Tool, Navigation Discipline, graph_query, blob_read, and the env-var names all grep to 0. No-regression confirmed in the same environment: graph-analyst is still routed-to via its own description, the session-navigator @mention of navigation-budget-discipline.md still resolves at runtime, the mode still activates and contributes its skills/context, and the trimmed skills load with their new descriptions.
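
The "grep to 0" check above can be approximated by a short Python sketch. The marker list is taken from the comment; the function name and both sample texts are made up for illustration and are not the actual awareness file or the script used in the PR. Counting follows `grep -c` semantics (number of matching lines, not number of occurrences).

```python
# Markers that must not appear in the force-loaded surface (from the comment above).
FORBIDDEN_MARKERS = [
    "## Delegation",
    "## Configuration",
    "## Upload Tool",
    "Navigation Discipline",
    "graph_query",
    "blob_read",
]


def count_leaks(text: str) -> dict:
    """Count matching lines per marker, like running `grep -c` once per marker."""
    lines = text.splitlines()
    return {marker: sum(marker in line for line in lines) for marker in FORBIDDEN_MARKERS}


# Made-up stand-ins, not the real file contents.
lean_sample = "This bundle captures session events for later analysis.\n"
leaky_sample = "## Delegation\nUse graph_query, then graph_query again, or blob_read.\n"

lean_counts = count_leaks(lean_sample)
leaky_counts = count_leaks(leaky_sample)
lean_is_clean = sum(lean_counts.values()) == 0
```

A check like this passes only when every marker count is zero, which is the condition the static and live checks report.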

Note

After landing the first commit I re-ran it past the bundle-design and foundation experts (plus a simplicity/"is-it-real" pass). That surfaced two real gaps I then fixed in ae948f9: the awareness file had a dangling "…available for later analysis" clause that implied a consumer path it deliberately doesn't provide (now just states purpose), and the trimmed skill descriptions were content taxonomies with no load trigger (now lead with "Use when…"). Re-requesting review.

bkrabach · 2026-06-11T01:49:16Z

Has the bundle validation recipe been run since all of the changes? It should produce updated .dot diagrams for the bundle and some of its contents, as well as identify any other warnings/issues. Once that has been done, lgtm.

Regenerated bundle structural diagram and PNG render via foundation's generate-bundle-docs recipe to clear the stale bundle.dot warning from the bundle validator. The source_hash now matches the current bundle composition.
Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

colombod · 2026-06-11T10:55:57Z

Note on the `validate-bundle-repo` `unadvertised_but_referenced` finding (false positive)

The v3.6.0 bundle validator flags the context-intelligence mode (advertised: false) as "referenced by name" in context/safe-extraction-patterns.md and context/agents/session-storage-knowledge.md. This is a false positive — the mode is intentionally an expert-behind-gate design and must stay advertised: false.

Why it fires: the advertising-rule check scans for /<mode-name> and name="<mode-name>". The mode name context-intelligence is identical to:

the bundle namespace (@context-intelligence:...),
the on-disk session subdirectory (.../sessions/{id}/context-intelligence/), and
skill-name prefixes (context-intelligence-graph-query, context-intelligence-session-navigation).

So the /context-intelligence pattern matches filesystem path separators, not slash-command invocations. Every flagged line is a path or @namespace: ref — never a mode-invocation cue. The genuine mode-by-name references live in the mode's own contributes.context files, which load only when the mode is already active, and were correctly NOT flagged.

Naming suggestion (to stop tripping the rule while keeping advertised: false): give the mode a name distinct from the bundle namespace / directory — e.g. context-intelligence-design (or ci-design). With a distinct name, /context-intelligence-design no longer collides with the .../context-intelligence/ paths or the @context-intelligence: namespace, so the validator stops flagging it. This would require updating mode-name references in dependent skills/recipes (e.g. workflow-pattern-analysis, which is "designed to run inside the context-intelligence mode").

Alternative (upstream validator fix): add a collision allowlist so a mode whose name equals its own bundle namespace/directory is exempt from the /<mode-name> path-separator match.
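
The collision can be reproduced with a few lines of Python. The regex below is an assumption about how a `/<mode-name>` and `name="<mode-name>"` scan could be written; it is not the validator's actual source. The storage path string is the one quoted above.

```python
import re


def naive_mode_matcher(mode_name: str) -> re.Pattern:
    """Hypothetical matcher for /<mode-name> or name="<mode-name>"."""
    escaped = re.escape(mode_name)
    return re.compile(f'/{escaped}|name="{escaped}"')


storage_path = ".../sessions/{id}/context-intelligence/"

current = naive_mode_matcher("context-intelligence")
renamed = naive_mode_matcher("context-intelligence-design")

# The current mode name matches a plain filesystem path segment.
path_flagged_today = current.search(storage_path) is not None

# With a distinct mode name, the same path no longer matches.
path_flagged_after_rename = renamed.search(storage_path) is not None

# A real slash-command for the renamed mode would still be detected.
slash_command_detected = renamed.search("/context-intelligence-design") is not None
```

Under these assumptions the path segment is flagged only while the mode shares its name with the directory, which is the collision described above.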

colombod · 2026-06-11T11:32:47Z

✅ Validation complete — green light on the validation parts

@bkrabach — ran foundation:recipes/validate-bundle-repo (v3.6.0) against this branch in full mode (installed hatchling + pip locally so the build dry-run could actually run). Summary:

| Check | Result |
| --- | --- |
| Bundles load (BundleRegistry) | ✅ 6/6, all classified `good` |
| Python packaging build (dry-run) | ✅ PASS |
| Behavior hygiene | ✅ PASS |
| Behavior reference hygiene | ✅ PASS |
| Context-sink compliance | ✅ PASS (root = 0 ctx tokens) |
| Tool placement | ✅ PASS (no specialized tools leaked to root) |
| YAML structure lint | ✅ PASS (no silent-failure patterns) |
| `bundle.dot` freshness | ✅ fresh (regenerated + pushed in `bc2bd8e`) |

Only repo change in this pass: regenerated bundle.dot / bundle.png (commit bc2bd8e) to clear the stale-diagram warning. Nothing else needed.

On the one remaining validator ERROR (unadvertised_but_referenced on the context-intelligence mode): confirmed false positive. The validator's /<mode-name> matcher collides with the bundle's own name — every flagged hit in context/safe-extraction-patterns.md and context/agents/session-storage-knowledge.md is a filesystem path (.../sessions/{id}/context-intelligence/), an @context-intelligence: namespace ref, or a skill-name prefix — not a mode invocation. The mode is intentionally gated, so it stays advertised: false and is not renamed.

The validator's LLM also raised a §4 concern about the workflow-pattern-analysis skill description naming the mode as a run target — but that phrasing does not exist on this branch (verified: grep -c "Designed to run inside the context-intelligence mode" skills/workflow-pattern-analysis/SKILL.md → 0). That came from an older published copy, not this PR. So nothing to change there either.

Net: structurally + hygienically clean, packaging builds, diagram fresh. Good to go from the validation side. 🟢

The behavior previously declared the mode with a `modes: include:` block, which is not a valid Amplifier bundle field; foundation discovers modes by scanning the `modes/` directory by convention, so that block was silently ignored. The bundle also shipped no modes infrastructure, so the `context-intelligence` mode would not be registered unless the host happened to compose the modes bundle.
Adopt the canonical self-contained pattern (as in amplifier-bundle-superpowers):
- Remove the invalid `modes: include:` block.
- Include the modes BEHAVIOR (amplifier-bundle-modes#subdirectory=behaviors/modes.yaml), not the full bundle (which would override session.orchestrator). This brings hooks-mode, tool-mode, and the `modes` namespace.
- Add a hooks-mode hook with search_paths ["@Context-Intelligence:modes"] so this bundle's modes/ directory is explicitly registered even when the host does not otherwise provide modes infrastructure.
The mode intentionally stays advertised: false (expert-behind-gate); this is unchanged. Also regenerate bundle.dot + bundle.png to reflect the new composition.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

@main

The behavior now wires hooks-mode (for mode registration) in addition to hook-context-intelligence, so hooks[0] is no longer the CI hook. The bundle validation tests asserted against data["hooks"][0] positionally, which broke 5 assertions and silently mis-targeted a few others (source @main, graph_store checks were validating hooks-mode instead of the CI hook). Replace positional access with a _ci_hook() helper that locates the hook-context-intelligence spec by module name. This preserves the real contract (CI hook present + thin-forwarder config + source @main + no graph_store/enable_graph) while tolerating a longer hooks list.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
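
A name-based lookup of the kind this commit describes might look like the sketch below. The repository's real `_ci_hook()` is not shown on this page, so the dict shape (`module`, `source`, `config` keys), the placeholder source URL, and the sample data are all assumptions for illustration.

```python
def _ci_hook(data: dict) -> dict:
    """Return the hook-context-intelligence spec regardless of its list position."""
    matches = [
        hook for hook in data.get("hooks", [])
        if hook.get("module") == "hook-context-intelligence"
    ]
    assert len(matches) == 1, f"expected exactly one CI hook, found {len(matches)}"
    return matches[0]


# Hypothetical parsed behavior YAML: hooks-mode now sits at index 0.
data = {
    "hooks": [
        {"module": "hooks-mode"},
        {
            "module": "hook-context-intelligence",
            "source": "git+https://example.invalid/hook-context-intelligence@main",  # placeholder URL
            "config": {},
        },
    ]
}

ci = _ci_hook(data)
positional = data["hooks"][0]  # what the old tests were inspecting
```

Asserting on `ci` rather than `positional` keeps checks such as "source ends with @main" and "no graph_store key" pointed at the intended hook even if more hooks are added later.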

Per review (bkrabach) and confirmed by foundation-expert + bundle-design-expert: the force-loaded context/context-intelligence-awareness.md carried no actionable, always-on instruction that another mechanism doesn't already surface:
- graph-analyst / session-navigator advertise themselves in the delegate catalog (meta.description: "ALWAYS delegate to this agent first").
- The context-intelligence skills advertise themselves in the per-turn skills visibility list, each with a "Use when…" trigger.
- The context-intelligence mode is intentionally advertised: false (user-invoked only); a breadcrumb would undermine that gate.
- hook-context-intelligence is passive event capture; the LLM takes no action on it.
A justified *-awareness.md must carry a routing discriminator, an anti-pattern guard ("ALWAYS delegate / do NOT drive the CLI"), or a prerequisite. This file carried none, so it was pure always-front-loaded overhead.
- Delete context/context-intelligence-awareness.md
- Remove the now-empty context: block from behaviors/context-intelligence.yaml
- Regenerate bundle.dot / bundle.png to match the new composition
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>

colombod · 2026-06-12T17:09:03Z

Removed it. You were right to question it — I had both foundation-expert and bundle-design-expert pressure-test it independently, and neither found a justification.

A force-loaded *-awareness.md earns its always-on cost only if it carries one of: a routing discriminator (when to pick this vs a competing path), an anti-pattern guard ("ALWAYS delegate / do NOT drive the CLI directly"), or a prerequisite (install/env check the schema can't express). This file had none:

graph-analyst / session-navigator self-advertise in the delegate catalog (meta.description — "ALWAYS delegate to this agent first").
The context-intelligence skills self-advertise in the per-turn skills-visibility list, each with a "Use when…" trigger.
The context-intelligence mode is intentionally advertised: false (user-invoked only) — a breadcrumb would actively undermine that gate.
hook-context-intelligence is passive event capture the LLM never acts on.

Changes (commit 9f53249):

Deleted context/context-intelligence-awareness.md
Removed the now-empty context: block from behaviors/context-intelligence.yaml
Regenerated bundle.dot / bundle.png via validate-bundle-repo

The behavior now force-loads nothing of its own. Validation is green (behavior hygiene, context-sink 0%, tool placement, build, YAML lint all pass) except the known unadvertised_but_referenced finding, which remains a false positive: the flagged hits in context/safe-extraction-patterns.md and context/agents/session-storage-knowledge.md are the @context-intelligence: namespace and the .../sessions/{id}/context-intelligence/ storage path — not mode invocations. The genuine mode-by-name references live only in mode-contributed context that loads when the mode is active. Mode stays advertised: false per the expert-behind-gate design.

bkrabach

lgtm

Colombo D and others added 8 commits June 7, 2026 09:28

Merge origin/main: resolve load_skill conflicts and align routing prose

b0078f0

bkrabach requested changes Jun 10, 2026

Comment thread skills/context-intelligence-evaluation-methodology/SKILL.md Outdated

Comment thread context/context-intelligence-awareness.md Outdated

colombod requested a review from bkrabach June 10, 2026 17:44

Colombo D and others added 2 commits June 12, 2026 14:49

bkrabach requested changes Jun 12, 2026

Comment thread context/context-intelligence-awareness.md Outdated

bkrabach approved these changes Jun 12, 2026

colombod merged commit 5509b39 into main Jun 12, 2026
9 checks passed

colombod deleted the feat/context-intelligence-mode-knowledge branch June 12, 2026 20:21

colombod mentioned this pull request Jun 14, 2026

feat: composable context-intelligence behaviors — logging / design / full #27

Closed

colombod merged 14 commits into main from feat/context-intelligence-mode-knowledge

2 participants
