feat: capture context-intelligence design knowledge into the mode (eval-driven) #33
Move the 6 defensive-navigation rules verbatim from session-navigator into context/navigation-budget-discipline.md (authoritative source). session-navigator now @mentions it (loading) and re-points its three in-document references; the always-on awareness file gets a single non-loading pointer row. No rule content changed; always-on behavior untouched (lean default preserved).
Enrich tool-design with R1 (module vs CLI by consumer, pointing to Standing Rule 3), R2 (narrow-domain specialization), R3 (progressive discovery → navigation discipline), plus an event-semantics guard. Add a new context-intelligence-evaluation-methodology skill (metric design, precursor metrics, A/B + statistical-N; points to eval-design and digital-twin-universe, never restating DTU-as-default or artifact-as-success). Add a thin context-intelligence-strategy.md pointer table (non-loading references; names the event-semantics principle once). Wire the strategy file via the mode's contributes.context and the eval skill via contributes.skills. Extend the eval-design catalog with structural scenarios 8-10 and behavioral Scenario C. Always-on behavior untouched.
6a: add PRE/POST-delegation constraints to the mode's file-not-found routing row
(no preamble before delegate(); relay the facilitator's Part-A question verbatim).
6b: add a Phase-0 RE-ANCHOR rule so off-script user replies are treated as signal
fragments and the opening question is re-asked, instead of breaking role.
6c: add a 'Pipeline ownership' standing rule to the facilitator countering the
hooks-skills-visibility leak of brainstorming/using-superpowers mandates — no
/brainstorm or /systems-design punt; the pipeline is self-contained from Phase 0.
No design-philosophy change; edge-case hardening only. Always-on behavior untouched.
7a: add a seeded-path routing row to the mode — when the activation message already
contains a clear goal and domain-concepts.md is absent, delegate with
seed_statement="<verbatim user goal>" (context_depth=none).
7b: add a facilitator 'Seeded entry' variant at the top of Phase 0 — treat the seed as
the pre-answered Part A, skip the opening question, run the Part-B probe, then open
with a data-grounded candidate framed on the seed.
Additive new path (does not change the interactive path). Always-on behavior untouched.
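A minimal sketch of the routing decision described in 7a, for readers who want to see the two file-not-found paths side by side. The function name `route_activation`, the dict return shape, and the "non-empty string means a clear goal" check are assumptions for illustration; in the mode itself this is a prompt-level routing row, not code.

```python
from typing import Optional


def route_activation(user_goal: Optional[str], domain_concepts_exists: bool) -> Optional[dict]:
    """Return hypothetical delegate() keyword arguments, or None if another row applies."""
    if domain_concepts_exists:
        # Rows for an existing domain-concepts.md are not modeled in this sketch.
        return None
    if user_goal and user_goal.strip():
        # Goal already provided: pass the user's goal through verbatim as the seed.
        return {"seed_statement": user_goal, "context_depth": "none"}
    # Interactive path: no seed, so the facilitator asks its opening question.
    return {}


seeded = route_activation("Design a session search tool", domain_concepts_exists=False)
interactive = route_activation(None, domain_concepts_exists=False)
print(seeded)
print(interactive)
```

The seeded branch is additive: with no goal in the activation message, the sketch falls through to the same interactive path as before.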
… user intent The pipeline-ownership rule was over-strict — it blanket-forbade recommending /brainstorm. Reworded to resist only automatic mid-flow derailment while following the user when they explicitly choose to brainstorm or switch workflows.
…are discoverable The mode contributes skills but listed load_skill under `warn` with default_action: block, so hooks-mode warned the contributed skills would be undiscoverable by the LLM while the mode is active. Move load_skill to tools.safe and reconcile the routing-first prose (routing-first stays a prompt discipline, not a tool gate).
… 'goal already provided'
- Tighten the facilitator pipeline-ownership standing rule (crisper, same intent).
- Rename the 'seeded' Phase-0 entry to plain 'goal already provided' in both the mode routing table and the facilitator, justified by the unattended/recipe use case. (seed_statement kept as the internal parameter name.)
Generated with Amplifier
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Address PR #33 review feedback (CHANGES_REQUESTED, @bkrabach) on keeping force-loaded and agent-facing surfaces token-conservative.
- context/context-intelligence-awareness.md: 87 -> 4 lines. This file is force-loaded into every top-level session via the behavior's context.include, so it now only announces that the bundle exists and captures session events. Removed the Delegation table (graph-analyst's meta.description already owns that guidance), the Navigation Discipline pointer (already covered by the mode's strategy file + session-navigator @mention), the Configuration env-var table (already in README.md), and the Upload Tool how-to (already in the upload module README + --help). No content relocated -- all four sections already have canonical homes.
- skills: trim non-user-invocable descriptions (evaluation-methodology 646->277, eval-design 607->251, tool-design 735->308 chars) to agent-facing "what + when to load", dropping operational tails (phase number, context_depth, sub-session calling protocol) and cross-skill pointers. workflow-pattern-analysis (user-invocable): drop the internal dual-agent-loop clause, keep its Triggers list.
Validated static (no dead refs, README/module-README coverage intact, 27 recipe tests pass) and live in DTU ci-pr33 (default no-mode session force-loads only the 4-line awareness; graph-analyst still routed-to, session-navigator @mention intact, mode still activates and contributes, trimmed skills load with new descriptions).
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
… per re-review
Follow-up to 37bf809 after a re-review of the landed state by the bundle-design and foundation experts plus the simplicity/ROB lenses.
- awareness.md: drop the dangling "keeps them available for later analysis" clause -- it implied a consumer path the force-loaded file deliberately does not provide. Now states purpose only ("captures this session's events for later analysis"); graph-analyst's meta.description owns routing.
- skill descriptions: the trimmed non-user-invocable descriptions were content taxonomies with no load trigger. Added a "Use when..." opener so a discovering agent knows WHEN to load (evaluation-methodology, eval-design, tool-design), and cut the remaining implementation-detail leaks from the description fields (Gitea-rewrite config; shared-library/thin-wrapper; model_role) -- those stay in the skill bodies, not the advertisement.
- session-navigation: gave it a "Use when..." trigger too (behavior-level and always-visible); was opaque internal vocabulary with no load signal.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
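The two description rules from these commits (keep the advertisement short, and open with a load trigger) could be checked mechanically. The sketch below is one way to do that; the function name `check_description`, the 320-character budget, and the sample descriptions are all hypothetical and are not taken from the PR or from any Amplifier tooling.

```python
MAX_DESCRIPTION_CHARS = 320  # hypothetical budget chosen for this sketch


def check_description(name: str, description: str, limit: int = MAX_DESCRIPTION_CHARS) -> list[str]:
    """Return a list of problems found in one skill description (empty if clean)."""
    problems = []
    if len(description) > limit:
        problems.append(f"{name}: {len(description)} chars exceeds {limit}")
    # A discovering agent needs a load trigger, so require a "Use when" opener.
    if not description.lstrip().lower().startswith("use when"):
        problems.append(f"{name}: missing 'Use when...' opener")
    return problems


# Hypothetical descriptions, for illustration only.
samples = {
    "eval-design": "Use when designing evaluation scenarios for a tool.",
    "tool-design": "Taxonomy of tool shapes and consumers.",
}
for skill, text in samples.items():
    print(skill, check_description(skill, text))
```

A check like this only covers form; whether the trigger is a good one still needs review by a person or an eval scenario.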
Re: force-load discipline: change, impact, testing, and how it maps to your guidance
Thanks @bkrabach, this was the right call and it sharpened the whole surface. Here is the full picture across the two commits that address your three comments.
What changed
Impact
How it maps to the principles you pointed out
How / where it was tested
Note
After landing the first commit I re-ran it past the bundle-design and foundation experts (plus a simplicity/"is-it-real" pass). That surfaced two real gaps I then fixed in
Has the bundle validation recipe been run since all of the changes? It should produce updated .dot diagrams for the bundle and some of its contents, as well as identify any other warnings/issues. Once that has been done, lgtm.
Regenerated bundle structural diagram and PNG render via foundation's generate-bundle-docs recipe to clear the stale bundle.dot warning from the bundle validator. The source_hash now matches the current bundle composition. Generated with [Amplifier](https://github.com/microsoft/amplifier) Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Note on the
✅ Validation complete: green light on the validation parts
@bkrabach, ran
Only repo change in this pass: regenerated
On the one remaining validator ERROR (
The validator's LLM also raised a §4 concern about the
Net: structurally + hygienically clean, packaging builds, diagram fresh. Good to go from the validation side. 🟢
The behavior previously declared the mode with a `modes: include:` block, which is not a valid Amplifier bundle field. Foundation discovers modes by scanning the `modes/` directory by convention, so that block was silently ignored. The bundle also shipped no modes infrastructure, so the `context-intelligence` mode would not be registered unless the host happened to compose the modes bundle.
Adopt the canonical self-contained pattern (as in amplifier-bundle-superpowers):
- Remove the invalid `modes: include:` block.
- Include the modes BEHAVIOR (amplifier-bundle-modes#subdirectory=behaviors/modes.yaml), not the full bundle (which would override session.orchestrator). This brings hooks-mode, tool-mode, and the `modes` namespace.
- Add a hooks-mode hook with search_paths ["@Context-Intelligence:modes"] so this bundle's modes/ directory is explicitly registered even when the host does not otherwise provide modes infrastructure.
The mode intentionally stays advertised: false (expert-behind-gate), unchanged.
Also regenerate bundle.dot + bundle.png to reflect the new composition.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
The behavior now wires hooks-mode (for mode registration) in addition to hook-context-intelligence, so hooks[0] is no longer the CI hook. The bundle validation tests asserted against data["hooks"][0] positionally, which broke 5 assertions and silently mis-targeted a few others (source @main, graph_store checks were validating hooks-mode instead of the CI hook).
Replace positional access with a _ci_hook() helper that locates the hook-context-intelligence spec by module name. This preserves the real contract (CI hook present + thin-forwarder config + source @main + no graph_store/enable_graph) while tolerating a longer hooks list.
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Per review (bkrabach) and confirmed by foundation-expert + bundle-design-expert:
the force-loaded context/context-intelligence-awareness.md carried no actionable,
always-on instruction that another mechanism doesn't already surface:
- graph-analyst / session-navigator advertise themselves in the delegate catalog
(meta.description: "ALWAYS delegate to this agent first").
- The context-intelligence skills advertise themselves in the per-turn skills
visibility list, each with a "Use when…" trigger.
- The context-intelligence mode is intentionally advertised: false (user-invoked
only); a breadcrumb would undermine that gate.
- hook-context-intelligence is passive event capture — the LLM takes no action on it.
A justified *-awareness.md must carry a routing discriminator, an anti-pattern
guard ("ALWAYS delegate / do NOT drive the CLI"), or a prerequisite. This file
carried none, so it was pure always-front-loaded overhead.
- Delete context/context-intelligence-awareness.md
- Remove the now-empty context: block from behaviors/context-intelligence.yaml
- Regenerate bundle.dot / bundle.png to match the new composition
🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)
Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Removed it. You were right to question it — I had both A force-loaded
Changes (commit
The behavior now force-loads nothing of its own. Validation is green (behavior hygiene, context-sink 0%, tool placement, build, YAML lint all pass) except the known
What this adds
Captures the institutional design knowledge for context-intelligence tooling into the mode itself, so it produces deeper, more complete designs and drives its design pipeline more robustly.
Design depth
context/navigation-budget-discipline.md) and referenced by session-navigator via @mention: one source of truth for keeping disk navigation within a context budget (no duplication).
Interactive & autonomous driving
How the evaluation framework shaped this
The work was driven and validated by an outcome-eval harness with three scenarios (a pre-seeded design run, a multi-turn simulated user, and a one-shot). The evals did more than verify at the end — the baselines reshaped the scope, showing the mode's design depth was the highest-leverage gap. Every change maps to a scenario that measures it, so we built what we set out to build and can show it.
Measured evidence
Notes
Markdown/YAML prompt-and-config only; validation is via the eval scenarios (prompt content is validated behaviorally, not by unit tests).
DTU live validation evidence (added post-review)
The branch was validated in an isolated Digital Twin Universe (DTU) install. The bundle was served from a Gitea mirror of this branch's working-tree snapshot (d94e000), i.e. a clean-room install of the actual PR content, not GitHub main. Provider credentials were injected via DTU passthrough (no secrets written to disk). All four behavioral seams were proven with genuine live Anthropic round-trips, not just file inspection.
- `amplifier run` round-trip: `amplifier run "Reply with exactly: SMOKE_OK_4F2A"` → `SMOKE_OK_4F2A`, exit 0 (live anthropic, ~4s)
- `mode set` is denied (`advertised:false` / `default_action:block` enforced); interactive `/mode context-intelligence` succeeds (prompt switches to `[context-intelligence]>`)
- `load_skill context-intelligence-evaluation-methodology` → "not found"; after `/mode context-intelligence` → loads
- `session-navigator` obeys the `@mention`'d navigation discipline (critical); `session-navigator` to locate a non-existent session ID against a seeded corpus; `@mention` of `navigation-budget-discipline.md` actually injects the 6 bounded-navigation rules into the agent at runtime, confirming the 58-line inline→`@mention` refactor preserves the discipline.
Net: clean-room install loads from the branch, the mode correctly gates its heavy content, the always-on surface stays lean, and the refactored `session-navigator` `@mention` resolves and is obeyed live.
Caveats (honest):
- `claude-sonnet-4-5` via the bundle's routing (still a live Anthropic call); the host default `claude-opus-4-8` was independently confirmed reachable (HTTP 200 on `/v1/models`).
- Removing corrupt skills cache (no metadata): …context-intelligence…); it self-heals (skills load fine, per checks c/b) and is not introduced by this PR.