feat(governance): block agent edits to goals/priorities/directives (Layer 1) #765
kokevidaurre wants to merge 2 commits into main
Agents can no longer Edit/Write/MultiEdit governance files. Until now the authority rule was documented as a convention but not enforced: agents have been observed resetting goals to fit their own logic and contradicting directives, and the drift accumulates over runs.

Layer 1 of the governance epic (#764):

- templates/guardrail.json: a PreToolUse matcher for Edit|Write|MultiEdit blocks paths matching */goals.md, */priorities.md, */directives.md, and */SQUAD.md. It exits with code 2 and a clear message redirecting the agent to the proposal channel.
- templates/proposed/README.md: documents the proposal channel pattern. Agents write suggestions to .squads/proposed/<file>-<date>-<slug>.md; the founder reviews them and merges accepted ones into the canonical files.
- docs/governance.md: authority-by-file table, the rationale for the split, how enforcement works, and the founder override.

The founder's own Claude Code sessions don't pass through the agent guardrail, so direct edits work normally for governance owners.

Tested locally: the hook blocks goals.md (exit 2) and allows state.md (exit 0).

Co-Authored-By: Claude <noreply@anthropic.com>
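The proposal-channel naming convention, .squads/proposed/<file>-<date>-<slug>.md, can be illustrated with a small helper. This is a hypothetical sketch, not code from this PR: `proposal_path` is an invented name, the YYYYMMDD date format is borrowed from the hook's `date +%Y%m%d`, and the slug normalization rule is an assumption.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def proposal_path(governance_file: str, slug: str, on: date) -> str:
    """Build .squads/proposed/<file>-<date>-<slug>.md for a governance file.

    Hypothetical helper: the date uses YYYYMMDD (as in the hook's
    `date +%Y%m%d`); the slug normalization below is an assumption.
    """
    stem = PurePosixPath(governance_file).stem  # "squads/eng/goals.md" -> "goals"
    # Collapse every run of characters that is not a lowercase letter or digit into "-".
    clean_slug = re.sub(r"[^a-z0-9]+", "-", slug.lower()).strip("-")
    return f".squads/proposed/{stem}-{on:%Y%m%d}-{clean_slug}.md"


print(proposal_path("squads/eng/goals.md", "Ship Q3 roadmap!", date(2025, 1, 15)))
# .squads/proposed/goals-20250115-ship-q3-roadmap.md
```

Using the file stem rather than the full name keeps the result free of a doubled .md extension.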
Codecov Report: All modified and coverable lines are covered by tests.
Code Review
This pull request introduces a squad governance model that restricts autonomous agents from directly editing core governance files, such as goals and priorities, by implementing a guardrail hook. The changes include new documentation for the governance model and a proposal workflow for agents. However, the guardrail implementation in templates/guardrail.json has several technical issues, including a bypass for files in the root directory, incorrect JSON key extraction for tool inputs, and inconsistencies in the suggested naming convention for proposals.
"hooks": [
  {
    "type": "command",
    "command": "bash -c 'path=$(echo \"$CLAUDE_TOOL_INPUT\" | python3 -c \"import sys,json; d=json.load(sys.stdin); print(d.get(\\\"file_path\\\",\\\"\\\"))\" 2>/dev/null || true); case \"$path\" in */goals.md|*/priorities.md|*/directives.md|*/SQUAD.md) echo \"BLOCKED: $path is a governance file. Only the founder can edit goals/priorities/directives/SQUAD identity. Propose changes by writing to .squads/proposed/<filename>-$(date +%Y%m%d).md instead — the founder reviews and merges.\" >&2; exit 2;; esac'",
The guardrail implementation has several technical issues that could lead to bypasses or a poor user experience:

- Root Path Bypass: The glob patterns (e.g., `*/goals.md`) only match files in subdirectories. Governance files located in the root directory (e.g., `goals.md`, `SQUAD.md`) will not be blocked, which contradicts the documentation in `docs/governance.md` implying `**/` coverage.
- Key Mismatch: The script extracts `file_path`, but many agent tools (including standard Claude Code tools) use the key `path`. This would cause the guardrail to miss the target file entirely.
- MultiEdit Support: The `MultiEdit` tool typically operates on a list of files. This one-liner only checks a single top-level field, allowing governance files to be modified if they are part of a batch edit.
- Naming Inconsistency: The suggested filename format in the error message results in `goals.md-YYYYMMDD.md`, whereas the convention in `templates/proposed/README.md` uses a slug and avoids double extensions (e.g., `goals-YYYYMMDD-slug.md`).
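The root-path point can be checked with Python's `fnmatch.fnmatchcase` as a stand-in for bash `case` matching. In both, `*` also matches `/`, so this approximates the hook's glob behavior rather than running the hook itself. Whether the gap matters in practice depends on whether the tool passes bare relative names or absolute paths, which this sketch does not settle; `blocked` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

# Patterns from the original hook's `case` statement.
ORIGINAL_PATTERNS = ["*/goals.md", "*/priorities.md", "*/directives.md", "*/SQUAD.md"]


def blocked(path: str, patterns: list[str]) -> bool:
    """True if any glob pattern matches the whole path, like a bash `case` arm."""
    return any(fnmatchcase(path, pattern) for pattern in patterns)


for p in ["/repo/.squads/eng/goals.md", "goals.md", "/repo/.squads/eng/state.md"]:
    print(f"{p!r}: {'blocked' if blocked(p, ORIGINAL_PATTERNS) else 'allowed'}")
```

A bare `goals.md` slips through because every pattern requires a `/` before the file name. Adding the bare name as its own alternative, as the review's suggestion does, closes that gap.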
Suggested change:
    "command": "bash -c 'path=$(echo \"$CLAUDE_TOOL_INPUT\" | python3 -c \"import sys,json; d=json.load(sys.stdin); print(d.get(\\\"path\\\", d.get(\\\"file_path\\\", \\\"\\\")))\" 2>/dev/null || true); case \"$path\" in goals.md|*/goals.md|priorities.md|*/priorities.md|directives.md|*/directives.md|SQUAD.md|*/SQUAD.md) echo \"BLOCKED: $path is a governance file. Only the founder can edit goals/priorities/directives/SQUAD identity. Propose changes by writing to .squads/proposed/$(basename \"$path\" .md)-$(date +%Y%m%d)-slug.md instead — the founder reviews and merges.\" >&2; exit 2;; esac'",
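For comparison, the four review points can also be handled in a short Python function instead of a shell one-liner, which makes the logic easier to read and unit-test: bare root-level names, either key, per-item paths in batch edits, and a stem-based proposal name. This is a sketch under stated assumptions, not a drop-in hook. The payload shapes it accepts (a path under `file_path` or `path`, optionally nested in a `tool_input` object, plus per-item paths under `edits`) are assumptions to verify against the Claude Code hooks documentation, and `check`, `candidate_paths`, `run_hook`, and `GOVERNANCE_FILES` are hypothetical names.

```python
import json
import sys
from pathlib import PurePosixPath

# Governance basenames, compared case-insensitively so that e.g. "Goals.MD"
# on a case-insensitive filesystem is still caught.
GOVERNANCE_FILES = {"goals.md", "priorities.md", "directives.md", "squad.md"}


def candidate_paths(payload: dict) -> list[str]:
    """Collect every path the tool call might write to.

    Assumption: the path sits under "file_path" or "path", either at the top
    level or inside a nested "tool_input" object; batch edits may also carry
    per-item paths under "edits".
    """
    inner = payload.get("tool_input", payload)
    if not isinstance(inner, dict):
        return []
    paths = [inner.get("file_path"), inner.get("path")]
    for edit in inner.get("edits") or []:
        if isinstance(edit, dict):
            paths += [edit.get("file_path"), edit.get("path")]
    return [p for p in paths if isinstance(p, str) and p]


def check(payload: dict) -> tuple[int, str]:
    """Return (exit_code, stderr_message); exit code 2 blocks the tool call."""
    for p in candidate_paths(payload):
        name = PurePosixPath(p).name  # same result for "goals.md" and "a/b/goals.md"
        if name.lower() in GOVERNANCE_FILES:
            stem = PurePosixPath(p).stem  # "goals", so no doubled ".md"
            return 2, (
                f"BLOCKED: {p} is a governance file. Only the founder can edit it. "
                f"Propose changes in .squads/proposed/{stem}-<YYYYMMDD>-<slug>.md instead."
            )
    return 0, ""


def run_hook() -> int:
    """Read the JSON payload from stdin, report any block on stderr, return the exit code."""
    code, message = check(json.load(sys.stdin))
    if message:
        print(message, file=sys.stderr)
    return code
```

A hook entry would then invoke a script that calls `sys.exit(run_hook())`; where that script lives and how templates/guardrail.json references it is left open here.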
Empirical basis: a one-shot audit across 19 squads found 3 outright false "Achieved" claims (referencing PRs/issues that don't exist) and 66 entries with no checkable reference at all. Two-thirds of goals.md content is unverifiable today.

Layer 1 (the guardrail) blocks unauthorized writes. This commit adds the format spec the founder will enforce on accepted proposals:

- docs/governance.md: new section "goals.md format — every claim must cite evidence" with the required ref types (PR / commit / file / issue), good examples, anti-patterns, and a validation pointer.
- templates/proposed/README.md: a callout that goals.md proposals without refs will be rejected.

The validator script lives in hq for now (scripts/validate-goals.sh) and graduates to "squads coherence" in a later release (Layer 3 of #764).

Co-Authored-By: Claude <noreply@anthropic.com>
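The validator itself (scripts/validate-goals.sh in hq) is not part of this PR, so the following is only a hypothetical Python sketch of the kind of check the format spec implies: flag goals.md list items that cite none of the four reference types. The regexes, the assumption that every claim is a `- ` or `* ` bullet, and the name `uncited_entries` are illustrative; the authoritative rules are the ones in docs/governance.md.

```python
import re

# Illustrative patterns for the four ref types (PR / commit / file / issue).
# These are assumptions for the sketch, not the spec in docs/governance.md.
REF_PATTERNS = [
    re.compile(r"#\d+"),  # PR or issue number, e.g. "#765"
    re.compile(r"https://github\.com/[\w.-]+/[\w.-]+/(?:pull|issues)/\d+"),  # PR/issue URL
    # Commit SHA: 7-40 lowercase hex chars containing at least one letter and one digit.
    re.compile(r"\b(?=[0-9a-f]*[a-f])(?=[0-9a-f]*\d)[0-9a-f]{7,40}\b"),
    re.compile(r"\b[\w.-]+(?:/[\w.-]+)+\.\w+\b"),  # repo-relative file path, e.g. "docs/governance.md"
]


def uncited_entries(goals_md: str) -> list[str]:
    """Return the bullet lines of a goals.md file that cite no checkable reference."""
    missing = []
    for line in goals_md.splitlines():
        item = line.strip()
        # Assumption: every claim is a markdown bullet starting with "- " or "* ".
        if not item.startswith(("- ", "* ")):
            continue
        if not any(pattern.search(item) for pattern in REF_PATTERNS):
            missing.append(item)
    return missing


SAMPLE = """# Goals
- Achieved: guardrail shipped (#765)
- Achieved: onboarding flow rewritten
- In progress: authority table in docs/governance.md
* Achieved: cache bug fixed in 3f9a2c1
"""
print(uncited_entries(SAMPLE))  # ['- Achieved: onboarding flow rewritten']
```

The commit-SHA pattern requires both a digit and a letter so that ordinary words such as "defaced" and plain numbers are not mistaken for hashes. A format check alone cannot catch the false "Achieved" claims from the audit, because those cite PRs or issues that don't exist; a real validator would also need to confirm that each cited reference resolves.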
Summary
Layer 1 of the founder governance epic (#764) — the highest-leverage piece.
Agents can no longer Edit/Write governance files (`goals.md`, `priorities.md`, `directives.md`, `SQUAD.md`). The authority rule existed as a convention but was unenforced; agents have been observed resetting goals to fit their own logic and contradicting directives. Drift accumulates over runs.

Changes

- `templates/guardrail.json`: extended PreToolUse with an `Edit|Write|MultiEdit` matcher that blocks governance-file paths. Exit code 2 with a clear redirect message.
- `templates/proposed/README.md`: documents the suggestion channel pattern. Agents write proposals to `.squads/proposed/<file>-<date>-<slug>.md` instead of editing canonical files.
- `docs/governance.md`: authority-by-file table, enforcement mechanism, founder override.

How it works

The founder's own Claude Code sessions don't pass through the agent guardrail, so direct edits work normally.
Tested
- `*/goals.md` (exit 2)
- `*/state.md`, `*/learnings/*` (exit 0)

Out of scope (separate PRs in #764)

- `.squads/proposed/` scaffold in `squads init` (this PR adds the README template only)
- `squads coherence` command

Test plan

- `goals.md`: confirm it's blocked with the redirect message
- `state.md`: confirm it succeeds
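The manual test plan above can also be automated. The sketch below copies the hook body from templates/guardrail.json as written in this PR (JSON-unescaped, with the BLOCKED message shortened), runs it with `bash -c`, and passes the simulated tool input through the `CLAUDE_TOOL_INPUT` environment variable that the hook reads. It assumes `bash` and `python3` are on PATH; `run_hook` and the sample paths are hypothetical.

```python
import json
import os
import subprocess

# Hook body from templates/guardrail.json in this PR, JSON-unescaped and with
# the BLOCKED message shortened. It reads the tool input from $CLAUDE_TOOL_INPUT.
HOOK = (
    r'path=$(echo "$CLAUDE_TOOL_INPUT" | python3 -c "import sys,json; '
    r'd=json.load(sys.stdin); print(d.get(\"file_path\",\"\"))" 2>/dev/null || true); '
    r'case "$path" in */goals.md|*/priorities.md|*/directives.md|*/SQUAD.md) '
    r'echo "BLOCKED: $path is a governance file." >&2; exit 2;; esac'
)


def run_hook(file_path: str) -> int:
    """Run the hook for a simulated Edit/Write on file_path and return its exit code."""
    env = {**os.environ, "CLAUDE_TOOL_INPUT": json.dumps({"file_path": file_path})}
    result = subprocess.run(["bash", "-c", HOOK], env=env, capture_output=True, text=True)
    return result.returncode


for path in ["/repo/.squads/eng/goals.md", "/repo/.squads/eng/state.md",
             "/repo/.squads/eng/learnings/2025-01.md"]:
    print(path, run_hook(path))
```

Swapping in the one-liner from the review's suggested change would let the same harness cover bare names such as `goals.md`.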