
feat: capture context-intelligence design knowledge into the mode (eval-driven) #33

Merged
colombod merged 14 commits into main from feat/context-intelligence-mode-knowledge
Jun 12, 2026

colombod (Collaborator) commented Jun 7, 2026

What this adds

Captures the institutional design knowledge for context-intelligence tooling into the mode itself, so it produces deeper, more complete designs and drives its design pipeline more robustly.

Design depth

  • Bounded-navigation discipline moved to a single authoritative home (context/navigation-budget-discipline.md) and referenced by session-navigator via @mention — one source of truth for keeping disk navigation within a context budget (no duplication).
  • Tool-design skill enriched (R1/R2/R3): module-vs-CLI selection by consumer, narrow-domain specialization, and progressive discovery — each pointing to its authoritative home.
  • New evaluation-methodology skill: metric design (quality/efficiency/efficacy), precursor-over-artifact metrics, and A/B + statistical-N discipline.
  • Thin strategy file + mode wiring that surface this knowledge only when the mode is active (lean always-on preserved).
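The skill's own text isn't reproduced in this PR. As an illustration only, one common way to apply "A/B + statistical-N discipline" to eval runs is to run a pooled two-proportion z-test on per-run pass/fail outcomes and to estimate the number of runs per arm before running anything. This is a minimal sketch built on standard formulas and Python's `statistics.NormalDist`. The names `compare_pass_rates` and `runs_per_arm` are hypothetical and are not part of the bundle.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, sigma 1


def compare_pass_rates(pass_a: int, n_a: int, pass_b: int, n_b: int) -> tuple[float, float]:
    """Pooled two-proportion z-test. Returns (z, two-sided p-value) for arm B vs arm A."""
    p_a, p_b = pass_a / n_a, pass_b / n_b
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:  # both arms all-pass or all-fail: no evidence of a difference
        return 0.0, 1.0
    z = (p_b - p_a) / se
    p_value = 2 * (1 - _STD_NORMAL.cdf(abs(z)))
    return z, p_value


def runs_per_arm(p_a: float, p_b: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate runs needed per arm to detect a pass-rate change p_a -> p_b (two-sided)."""
    if p_a == p_b:
        raise ValueError("p_a and p_b must differ")
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = _STD_NORMAL.inv_cdf(power)
    p_bar = (p_a + p_b) / 2
    numerator = (
        z_alpha * sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * sqrt(p_a * (1 - p_a) + p_b * (1 - p_b))
    ) ** 2
    return ceil(numerator / (p_a - p_b) ** 2)
```

For example, suppose a baseline passes 5 of 20 seeded runs and a variant passes 15 of 20. `compare_pass_rates(5, 20, 15, 20)` gives z of about 3.16 with p < 0.01. Detecting a move from a 50% to an 80% pass rate at the usual alpha = 0.05 and power = 0.8 needs roughly 39 runs per arm, which is why a handful of runs rarely settles a comparison.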

Interactive & autonomous driving

  • The design facilitator opens with its actual Phase-0 question, re-anchors on off-script replies instead of breaking role, and owns its pipeline end-to-end.
  • A new autonomous/seeded entry path: when the activation message already carries the goal, the facilitator treats it as a pre-answered Phase 0 and proceeds — enabling non-interactive / recipe-driven use.

How the evaluation framework shaped this

The work was driven and validated by an outcome-eval harness with three scenarios (a pre-seeded design run, a multi-turn simulated user, and a one-shot). The evals did more than verify at the end — the baselines reshaped the scope, showing the mode's design depth was the highest-leverage gap. Every change maps to a scenario that measures it, so we built what we set out to build and can show it.

Measured evidence

| Before (baseline) | After |
|---|---|
| Seeded design run timed out before producing an evaluation plan | Converges (exit 0, ~4 min): deeper Phase-2 design (per-consumer module-vs-CLI table citing the navigation-budget discipline) plus a complete Phase-3 evaluation plan (precision/recall, success criteria, DTU validation) |
  • Single-source/no-duplication and mode-gated injection verified by inspection; lean always-on preserved.
  • Bundle test suite: 657 passed.

Notes

Markdown/YAML prompt-and-config only; validation is via the eval scenarios (prompt content is validated behaviorally, not by unit tests).


DTU live validation evidence (added post-review)

The branch was validated in an isolated Digital Twin Universe (DTU) install. The bundle was served from a Gitea mirror of this branch's working-tree snapshot (d94e000) — i.e. a clean-room install of the actual PR content, not GitHub main. Provider credentials were injected via DTU passthrough (no secrets written to disk). All four behavioral seams were proven with genuine live Anthropic round-trips, not just file inspection.

| # | Validation | Method | Result |
|---|---|---|---|
| a | Basic amplifier run round-trip | amplifier run "Reply with exactly: SMOKE_OK_4F2A" | PASS: model returned SMOKE_OK_4F2A, exit 0 (live Anthropic, ~4s) |
| b | Mode activates + gating enforced | Agent-initiated mode set is denied (advertised:false / default_action:block enforced); interactive /mode context-intelligence succeeds (prompt switches to [context-intelligence]>) | PASS |
| c | Always-on leanness (mode-gated content) | Default no-mode session: load_skill context-intelligence-evaluation-methodology → "not found"; after /mode context-intelligence → loads | PASS: heavy skill available only when the mode is active |
| d | session-navigator obeys the @mention'd navigation discipline (critical) | Delegate to session-navigator to locate a non-existent session ID against a seeded corpus | PASS: navigator explicitly cited "Rule 1 (probe first)", ran exactly one bounded probe, and hard-stopped with "does not exist" (no runaway search). This proves that the loading @mention of navigation-budget-discipline.md actually injects the 6 bounded-navigation rules into the agent at runtime, confirming that the 58-line inline→@mention refactor preserves the discipline. |

Net: clean-room install loads from the branch, the mode correctly gates its heavy content, the always-on surface stays lean, and the refactored session-navigator @mention resolves and is obeyed live.
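The six rules live in context/navigation-budget-discipline.md and aren't quoted here. Check d observed three behaviours: probe first, make one bounded probe, and hard-stop on a miss. The following is a minimal Python sketch of that pattern, offered as an illustration only. It assumes a hypothetical on-disk layout of `<root>/sessions/<id>/`, which mirrors the `.../sessions/{id}/` paths mentioned later in this thread. `ProbeBudget`, `BudgetExhausted` and `locate_session` are illustrative names, not the navigator's API.

```python
from dataclasses import dataclass
from pathlib import Path


class BudgetExhausted(RuntimeError):
    """Raised instead of continuing to search once the probe budget is spent."""


@dataclass
class ProbeBudget:
    max_probes: int
    used: int = 0

    def spend(self) -> None:
        if self.used >= self.max_probes:
            raise BudgetExhausted(f"probe budget of {self.max_probes} exhausted")
        self.used += 1


def locate_session(root: Path, session_id: str, budget: ProbeBudget) -> str:
    """Probe the one path a session ID maps to; never fall back to a recursive walk."""
    budget.spend()  # exactly one bounded probe per call
    candidate = root / "sessions" / session_id
    if candidate.is_dir():
        return f"found: {candidate}"
    # Hard stop: a miss on the direct probe is reported, not searched around.
    return f"session {session_id} does not exist"
```

The point of the design is that a miss costs one `is_dir()` call rather than a tree walk, and any attempt to keep looking past the budget fails loudly.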

Caveats (honest):

  • Worker turns resolved to claude-sonnet-4-5 via the bundle's routing (still a live Anthropic call); the host default claude-opus-4-8 was independently confirmed reachable (HTTP 200 on /v1/models).
  • A non-fatal, pre-existing warning appears at session start (Removing corrupt skills cache (no metadata): …context-intelligence…); it self-heals (skills load fine, per checks b and c) and is not introduced by this PR.

Colombo D and others added 8 commits June 7, 2026 09:28
Move the 6 defensive-navigation rules verbatim from session-navigator into
context/navigation-budget-discipline.md (authoritative source). session-navigator
now @mentions it (loading) and re-points its three in-document references; the
always-on awareness file gets a single non-loading pointer row. No rule content
changed; always-on behavior untouched (lean default preserved).
Enrich tool-design with R1 (module vs CLI by consumer, pointing to Standing Rule 3),
R2 (narrow-domain specialization), R3 (progressive discovery → navigation discipline),
plus an event-semantics guard. Add a new context-intelligence-evaluation-methodology
skill (metric design, precursor metrics, A/B + statistical-N; points to eval-design and
digital-twin-universe, never restating DTU-as-default or artifact-as-success). Add a thin
context-intelligence-strategy.md pointer table (non-loading references; names the
event-semantics principle once). Wire the strategy file via the mode's contributes.context
and the eval skill via contributes.skills. Extend the eval-design catalog with structural
scenarios 8-10 and behavioral Scenario C. Always-on behavior untouched.
6a: add PRE/POST-delegation constraints to the mode's file-not-found routing row
    (no preamble before delegate(); relay the facilitator's Part-A question verbatim).
6b: add a Phase-0 RE-ANCHOR rule so off-script user replies are treated as signal
    fragments and the opening question is re-asked, instead of breaking role.
6c: add a 'Pipeline ownership' standing rule to the facilitator countering the
    hooks-skills-visibility leak of brainstorming/using-superpowers mandates — no
    /brainstorm or /systems-design punt; the pipeline is self-contained from Phase 0.

No design-philosophy change; edge-case hardening only. Always-on behavior untouched.
7a: add a seeded-path routing row to the mode — when the activation message already
    contains a clear goal and domain-concepts.md is absent, delegate with
    seed_statement="<verbatim user goal>" (context_depth=none).
7b: add a facilitator 'Seeded entry' variant at the top of Phase 0 — treat the seed as
    the pre-answered Part A, skip the opening question, run the Part-B probe, then open
    with a data-grounded candidate framed on the seed.

Additive new path (does not change the interactive path). Always-on behavior untouched.
… user intent

The pipeline-ownership rule was over-strict — it blanket-forbade recommending
/brainstorm. Reworded to resist only automatic mid-flow derailment while
following the user when they explicitly choose to brainstorm or switch workflows.
…are discoverable

The mode contributes skills but listed load_skill under `warn` with
default_action: block, so hooks-mode warned the contributed skills would be
undiscoverable by the LLM while the mode is active. Move load_skill to
tools.safe and reconcile the routing-first prose (routing-first stays a prompt
discipline, not a tool gate).
… 'goal already provided'

- Tighten the facilitator pipeline-ownership standing rule (crisper, same intent).
- Rename the 'seeded' Phase-0 entry to plain 'goal already provided' in both the mode
  routing table and the facilitator, justified by the unattended/recipe use case.
  (seed_statement kept as the internal parameter name.)

Generated with Amplifier

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Review thread (outdated): skills/context-intelligence-evaluation-methodology/SKILL.md
Review thread (outdated): context/context-intelligence-awareness.md
Address PR #33 review feedback (CHANGES_REQUESTED, @bkrabach) on keeping
force-loaded and agent-facing surfaces token-conservative.

- context/context-intelligence-awareness.md: 87 -> 4 lines. This file is
  force-loaded into every top-level session via the behavior's
  context.include, so it now only announces that the bundle exists and
  captures session events. Removed the Delegation table (graph-analyst's
  meta.description already owns that guidance), the Navigation Discipline
  pointer (already covered by the mode's strategy file + session-navigator
  @mention), the Configuration env-var table (already in README.md), and the
  Upload Tool how-to (already in the upload module README + --help). No
  content relocated -- all four sections already have canonical homes.
- skills: trim non-user-invocable descriptions (evaluation-methodology
  646->277, eval-design 607->251, tool-design 735->308 chars) to agent-facing
  "what + when to load", dropping operational tails (phase number,
  context_depth, sub-session calling protocol) and cross-skill pointers.
  workflow-pattern-analysis (user-invocable): drop the internal dual-agent-loop
  clause, keep its Triggers list.

Validated static (no dead refs, README/module-README coverage intact, 27
recipe tests pass) and live in DTU ci-pr33 (default no-mode session
force-loads only the 4-line awareness; graph-analyst still routed-to,
session-navigator @mention intact, mode still activates and contributes,
trimmed skills load with new descriptions).

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
colombod requested a review from bkrabach June 10, 2026 17:44
… per re-review

Follow-up to 37bf809 after a re-review of the landed state by the
bundle-design and foundation experts plus the simplicity/ROB lenses.

- awareness.md: drop the dangling "keeps them available for later analysis"
  clause -- it implied a consumer path the force-loaded file deliberately does
  not provide. Now states purpose only ("captures this session's events for
  later analysis"); graph-analyst's meta.description owns routing.
- skill descriptions: the trimmed non-user-invocable descriptions were content
  taxonomies with no load trigger. Added a "Use when..." opener so a discovering
  agent knows WHEN to load (evaluation-methodology, eval-design, tool-design),
  and cut the remaining implementation-detail leaks from the description fields
  (Gitea-rewrite config; shared-library/thin-wrapper; model_role) -- those stay
  in the skill bodies, not the advertisement.
- session-navigation: gave it a "Use when..." trigger too (behavior-level and
  always-visible); was opaque internal vocabulary with no load signal.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
colombod (Collaborator, Author)

Re: force-load discipline — change, impact, testing, and how it maps to your guidance

Thanks @bkrabach — this was the right call and it sharpened the whole surface. Here is the full picture across the two commits that address your three comments (37bf809 the trim, ae948f9 refinements from a re-review).

What changed

  • context/context-intelligence-awareness.md: 87 → 4 lines. It now only announces that the bundle exists and that it captures session events — no plumbing (no graph_query/blob_read, no JSONL paths), no tool/agent names, no delegation/load guidance. Removed: the Delegation table, the Navigation Discipline pointer, the Configuration env-var table, and the Upload Tool how-to.
  • Skill descriptions trimmed to agent-facing advertisements (evaluation-methodology, eval-design, tool-design, workflow-pattern-analysis), and given an explicit "Use when…" load trigger; implementation detail (Gitea-rewrite config, shared-library/thin-wrapper, model_role) was pulled out of the description fields into the skill bodies. session-navigation got a "Use when…" trigger too (it is behavior-level and always-visible).

Impact

  • The four removed awareness sections were deleted, not relocated — each already has a canonical home: config → README.md; upload how-to → modules/tool-context-intelligence-upload/README.md (+ --help); navigation discipline → the mode-contributed strategy file + the session-navigator @mention; delegation → graph-analyst's own meta.description. So nothing was lost, and the always-on per-session token cost drops to ~4 lines.

How it maps to the principles you pointed out

  • "awareness.md is force-loaded → only the highest-level 'what it is', nothing else." It now carries exactly that and announces capability, not architecture.
  • "don't even give guidance on what to delegate/load — let the descriptions own their advertisement / whatever is loaded owns its own truth." The Delegation table is gone; graph-analyst's description ("MUST be used… ALWAYS delegate to this agent first") is now the single source of truth, and removing the duplicate eliminates a future drift point.
  • "reduce skill descriptions to what an agent needs to know when/why to load — no human-reader or mode framing." The non-user-invocable descriptions are now a "Use when…" trigger + a tight scope statement, with the calling protocol (phase, context_depth, sub-session) left in the bodies.

How / where it was tested

  • Static: awareness 87→4; description leaks removed (description-line leak count = 0); frontmatter parses; no dead references; README + module-README coverage confirmed intact; the workflow-pattern-analysis recipe tests (27) pass.
  • Live in an isolated Digital Twin (clean reinstall from this branch): a default, no-mode top-level session force-loads only the 4-line awareness — ## Delegation, ## Configuration, ## Upload Tool, Navigation Discipline, graph_query, blob_read, and the env-var names all grep to 0. No-regression confirmed in the same environment: graph-analyst is still routed-to via its own description, the session-navigator @mention of navigation-budget-discipline.md still resolves at runtime, the mode still activates and contributes its skills/context, and the trimmed skills load with their new descriptions.
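A small static check can approximate the "description-line leak count" mentioned above. This sketch assumes each SKILL.md opens with a `---`-delimited frontmatter block containing a single-line `description:` field. The leak-term list is illustrative, drawn from the details this PR moved out of descriptions (Gitea-rewrite config, shared-library/thin-wrapper, model_role, context_depth). `audit_description` is a hypothetical helper, not the tooling actually used.

```python
LEAK_TERMS = ("Gitea", "shared-library", "thin-wrapper", "model_role", "context_depth")


def frontmatter_description(text: str) -> str:
    """Extract a single-line description: value from ----delimited frontmatter."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("missing frontmatter")
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; the skill body is not inspected
        if line.startswith("description:"):
            return line.split(":", 1)[1].strip().strip('"')
    raise ValueError("no description field in frontmatter")


def audit_description(text: str) -> dict:
    desc = frontmatter_description(text)
    leaks = [term for term in LEAK_TERMS if term in desc]
    return {"leak_count": len(leaks), "leaks": leaks, "has_trigger": desc.startswith("Use when")}
```

Only the advertisement is audited. Implementation detail that stays in the skill body is deliberately ignored, which matches the split this PR makes.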

Note

After landing the first commit I re-ran it past the bundle-design and foundation experts (plus a simplicity/"is-it-real" pass). That surfaced two real gaps I then fixed in ae948f9: the awareness file had a dangling "…available for later analysis" clause that implied a consumer path it deliberately doesn't provide (now just states purpose), and the trimmed skill descriptions were content taxonomies with no load trigger (now lead with "Use when…"). Re-requesting review.

bkrabach (Collaborator)

Has the bundle validation recipe been run since all of the changes? It should produce updated .dot diagrams for the bundle and some of its contents, as well as identify any other warnings/issues. Once that has been done, lgtm.

Regenerated bundle structural diagram and PNG render via foundation's
generate-bundle-docs recipe to clear the stale bundle.dot warning from
the bundle validator. The source_hash now matches the current bundle
composition.

Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
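Regenerating the diagram clears the warning for a reason worth illustrating. A freshness check of this kind typically hashes the bundle's composition inputs and compares the result with the hash recorded when the diagram was generated. The sketch below rests on that assumption and is not foundation's actual source_hash algorithm. Which files count as "composition" (the `bundle.md` and `behaviors/*.yaml` globs) and how the hash is stored in bundle.dot are both hypothetical here.

```python
import hashlib
from pathlib import Path


def composition_hash(root: Path, patterns: tuple[str, ...] = ("bundle.md", "behaviors/*.yaml")) -> str:
    """SHA-256 over the sorted relative paths and contents of the composition files."""
    h = hashlib.sha256()
    files = sorted({p for pat in patterns for p in root.glob(pat) if p.is_file()})
    for path in files:
        h.update(path.relative_to(root).as_posix().encode())
        h.update(b"\0")  # separator so path/content boundaries can't be confused
        h.update(path.read_bytes())
        h.update(b"\0")
    return h.hexdigest()


def diagram_is_stale(root: Path, recorded_hash: str) -> bool:
    return composition_hash(root) != recorded_hash
```

Any edit to a composition file, such as adding the hooks-mode wiring later in this PR, changes the hash, so the recorded value goes stale until the diagram is regenerated.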
colombod (Collaborator, Author)

Note on the validate-bundle-repo unadvertised_but_referenced finding (false positive)

The v3.6.0 bundle validator flags the context-intelligence mode (advertised: false) as "referenced by name" in context/safe-extraction-patterns.md and context/agents/session-storage-knowledge.md. This is a false positive — the mode is intentionally an expert-behind-gate design and must stay advertised: false.

Why it fires: the advertising-rule check scans for /<mode-name> and name="<mode-name>". The mode name context-intelligence is identical to:

  • the bundle namespace (@context-intelligence:...),
  • the on-disk session subdirectory (.../sessions/{id}/context-intelligence/), and
  • skill-name prefixes (context-intelligence-graph-query, context-intelligence-session-navigation).

So the /context-intelligence pattern matches filesystem path separators, not slash-command invocations. Every flagged line is a path or @namespace: ref — never a mode-invocation cue. The genuine mode-by-name references live in the mode's own contributes.context files, which load only when the mode is already active, and were correctly NOT flagged.

Naming suggestion (to stop tripping the rule while keeping advertised: false): give the mode a name distinct from the bundle namespace / directory — e.g. context-intelligence-design (or ci-design). With a distinct name, /context-intelligence-design no longer collides with the .../context-intelligence/ paths or the @context-intelligence: namespace, so the validator stops flagging it. This would require updating mode-name references in dependent skills/recipes (e.g. workflow-pattern-analysis, which is "designed to run inside the context-intelligence mode").

Alternative (upstream validator fix): add a collision allowlist so a mode whose name equals its own bundle namespace/directory is exempt from the /<mode-name> path-separator match.
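The following minimal sketch makes the collision, the rename and the proposed allowlist concrete. It models the naive check as a plain search for "/" + mode name, which is an assumption about the validator's matcher rather than its actual code. `advertising_findings` and its exemption rule are hypothetical. With the exemption on, a self-named mode ignores hits where the name continues as a path segment or a longer identifier (.../context-intelligence/, skills/context-intelligence-graph-query) but still flags a bare /context-intelligence invocation.

```python
import re


def advertising_findings(
    mode_name: str,
    advertised: bool,
    bundle_namespace: str,
    lines: list[str],
    exempt_self_named: bool = True,
) -> list[int]:
    """Return 1-based line numbers that look like invocations of an unadvertised mode."""
    if advertised:
        return []
    naive = re.compile("/" + re.escape(mode_name))
    self_named = exempt_self_named and mode_name == bundle_namespace
    findings: list[int] = []
    for lineno, line in enumerate(lines, start=1):
        for match in naive.finditer(line):
            nxt = line[match.end():match.end() + 1]
            # Allowlist: for a self-named mode, a hit followed by "/" or a name character
            # is a path segment or a longer identifier, not a slash-command.
            if self_named and (nxt in ("/", "-", "_") or nxt.isalnum()):
                continue
            findings.append(lineno)
            break  # one finding per line is enough
    return findings
```

Run against a path line, a skill-prefix line and a genuine invocation, the naive matcher flags all three lines. The allowlist keeps only the invocation. Renaming the mode to context-intelligence-design flags none of them, because none of the lines contain "/context-intelligence-design".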

colombod (Collaborator, Author)

✅ Validation complete — green light on the validation parts

@bkrabach — ran foundation:recipes/validate-bundle-repo (v3.6.0) against this branch in full mode (installed hatchling + pip locally so the build dry-run could actually run). Summary:

| Check | Result |
|---|---|
| Bundles load (BundleRegistry) | ✅ 6/6, all classified good |
| Python packaging build (dry-run) | ✅ PASS |
| Behavior hygiene | ✅ PASS |
| Behavior reference hygiene | ✅ PASS |
| Context-sink compliance | ✅ PASS (root = 0 ctx tokens) |
| Tool placement | ✅ PASS (no specialized tools leaked to root) |
| YAML structure lint | ✅ PASS (no silent-failure patterns) |
| bundle.dot freshness | ✅ fresh (regenerated + pushed in bc2bd8e) |

Only repo change in this pass: regenerated bundle.dot / bundle.png (commit bc2bd8e) to clear the stale-diagram warning. Nothing else needed.

On the one remaining validator ERROR (unadvertised_but_referenced on the context-intelligence mode): confirmed false positive. The validator's /<mode-name> matcher collides with the bundle's own name — every flagged hit in context/safe-extraction-patterns.md and context/agents/session-storage-knowledge.md is a filesystem path (.../sessions/{id}/context-intelligence/), an @context-intelligence: namespace ref, or a skill-name prefix — not a mode invocation. The mode is intentionally gated, so it stays advertised: false and is not renamed.

The validator's LLM also raised a §4 concern about the workflow-pattern-analysis skill description naming the mode as a run target, but that phrasing does not exist on this branch (verified: grep -c "Designed to run inside the context-intelligence mode" skills/workflow-pattern-analysis/SKILL.md → 0). It came from an older published copy, not this PR, so nothing needs to change there either.

Net: structurally + hygienically clean, packaging builds, diagram fresh. Good to go from the validation side. 🟢

Colombo D and others added 2 commits June 12, 2026 14:49
The behavior previously declared the mode with a `modes: include:` block,
which is not a valid Amplifier bundle field — foundation discovers modes by
scanning the `modes/` directory by convention, so that block was silently
ignored. The bundle also shipped no modes infrastructure, so the
`context-intelligence` mode would not be registered unless the host happened
to compose the modes bundle.

Adopt the canonical self-contained pattern (as in amplifier-bundle-superpowers):
- Remove the invalid `modes: include:` block.
- Include the modes BEHAVIOR (amplifier-bundle-modes#subdirectory=behaviors/modes.yaml),
  not the full bundle (which would override session.orchestrator). This brings
  hooks-mode, tool-mode, and the `modes` namespace.
- Add a hooks-mode hook with search_paths ["@Context-Intelligence:modes"] so this
  bundle's modes/ directory is explicitly registered even when the host does not
  otherwise provide modes infrastructure.

The mode intentionally stays advertised: false (expert-behind-gate) — unchanged.

Also regenerate bundle.dot + bundle.png to reflect the new composition.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
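The explicit search_paths entry matters for a reason this sketch illustrates. With convention-based discovery, a mode is registered only if some search path points at the directory holding it, and nothing in the behavior file's other keys (such as the removed `modes: include:` block) is consulted. The sketch assumes one Markdown file per mode named `<mode-name>.md` and first-path-wins precedence. `discover_modes` is hypothetical and is not hooks-mode's implementation.

```python
from pathlib import Path


def discover_modes(search_paths: list[Path]) -> dict[str, Path]:
    """Map mode name -> file by scanning each search path's *.md files (first path wins)."""
    modes: dict[str, Path] = {}
    for base in search_paths:
        if not base.is_dir():
            continue  # a missing search path contributes nothing
        for mode_file in sorted(base.glob("*.md")):
            modes.setdefault(mode_file.stem, mode_file)
    return modes
```

If the host's search paths never include this bundle's modes/ directory, the context-intelligence mode simply isn't found. Adding the bundle's own path is what makes registration independent of the host.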
The behavior now wires hooks-mode (for mode registration) in addition to
hook-context-intelligence, so hooks[0] is no longer the CI hook. The bundle
validation tests asserted against data["hooks"][0] positionally, which broke
5 assertions and silently mis-targeted a few others (the source @main and
graph_store checks were validating hooks-mode instead of the CI hook).

Replace positional access with a _ci_hook() helper that locates the
hook-context-intelligence spec by module name. This preserves the real
contract (CI hook present + thin-forwarder config + source @main + no
graph_store/enable_graph) while tolerating a longer hooks list.

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
Review thread (outdated): context/context-intelligence-awareness.md
Per review (bkrabach) and confirmed by foundation-expert + bundle-design-expert:
the force-loaded context/context-intelligence-awareness.md carried no actionable,
always-on instruction that another mechanism doesn't already surface:

- graph-analyst / session-navigator advertise themselves in the delegate catalog
  (meta.description: "ALWAYS delegate to this agent first").
- The context-intelligence skills advertise themselves in the per-turn skills
  visibility list, each with a "Use when…" trigger.
- The context-intelligence mode is intentionally advertised: false (user-invoked
  only); a breadcrumb would undermine that gate.
- hook-context-intelligence is passive event capture — the LLM takes no action on it.

A justified *-awareness.md must carry a routing discriminator, an anti-pattern
guard ("ALWAYS delegate / do NOT drive the CLI"), or a prerequisite. This file
carried none, so it was pure always-front-loaded overhead.

- Delete context/context-intelligence-awareness.md
- Remove the now-empty context: block from behaviors/context-intelligence.yaml
- Regenerate bundle.dot / bundle.png to match the new composition

🤖 Generated with [Amplifier](https://github.com/microsoft/amplifier)

Co-Authored-By: Amplifier <240397093+microsoft-amplifier@users.noreply.github.com>
colombod (Collaborator, Author)

Removed it. You were right to question it — I had both foundation-expert and bundle-design-expert pressure-test it independently, and neither found a justification.

A force-loaded *-awareness.md earns its always-on cost only if it carries one of: a routing discriminator (when to pick this vs a competing path), an anti-pattern guard ("ALWAYS delegate / do NOT drive the CLI directly"), or a prerequisite (install/env check the schema can't express). This file had none:

  • graph-analyst / session-navigator self-advertise in the delegate catalog (meta.description — "ALWAYS delegate to this agent first").
  • The context-intelligence skills self-advertise in the per-turn skills-visibility list, each with a "Use when…" trigger.
  • The context-intelligence mode is intentionally advertised: false (user-invoked only) — a breadcrumb would actively undermine that gate.
  • hook-context-intelligence is passive event capture the LLM never acts on.

Changes (commit 9f53249):

  • Deleted context/context-intelligence-awareness.md
  • Removed the now-empty context: block from behaviors/context-intelligence.yaml
  • Regenerated bundle.dot / bundle.png via validate-bundle-repo

The behavior now force-loads nothing of its own. Validation is green (behavior hygiene, context-sink 0%, tool placement, build, YAML lint all pass) except the known unadvertised_but_referenced finding, which remains a false positive: the flagged hits in context/safe-extraction-patterns.md and context/agents/session-storage-knowledge.md are the @context-intelligence: namespace and the .../sessions/{id}/context-intelligence/ storage path — not mode invocations. The genuine mode-by-name references live only in mode-contributed context that loads when the mode is active. Mode stays advertised: false per the expert-behind-gate design.

bkrabach (Collaborator) left a comment

lgtm

colombod merged commit 5509b39 into main Jun 12, 2026
9 checks passed
colombod deleted the feat/context-intelligence-mode-knowledge branch June 12, 2026 20:21