feat(governance): block agent edits to goals/priorities/directives (Layer 1) #765

Open
kokevidaurre wants to merge 2 commits into main from feature/governance-guardrail

@kokevidaurre
Contributor

Summary

Layer 1 of the founder governance epic (#764) — the highest-leverage piece.

Agents can no longer Edit/Write/MultiEdit governance files (goals.md, priorities.md, directives.md, SQUAD.md). The authority rule existed as a convention but was never enforced; agents have been observed resetting goals to fit their own logic and contradicting directives, so drift accumulates across runs.

Changes

  • templates/guardrail.json — extended PreToolUse with an Edit|Write|MultiEdit matcher that blocks governance-file paths. Exit code 2 with a clear redirect message.
  • templates/proposed/README.md — documents the suggestion channel pattern. Agents write proposals to .squads/proposed/<file>-<date>-<slug>.md instead of editing canonical files.
  • docs/governance.md — authority-by-file table, enforcement mechanism, founder override.
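
The proposal naming pattern above (.squads/proposed/<file>-<date>-<slug>.md) can be sketched as a small helper. This is an illustration only: the function name `proposal_path`, its parameters, and the slug normalisation are assumptions, not part of this PR.

```python
import re
from datetime import date
from pathlib import PurePosixPath


def proposal_path(file_stem: str, slug: str, on: date) -> str:
    """Build a proposal path of the form .squads/proposed/<file>-<date>-<slug>.md.

    Hypothetical helper: file_stem is the governance file name without ".md"
    (optionally with a squad suffix), slug is a short free-text description.
    """
    # Lowercase the slug and collapse anything non-alphanumeric into single hyphens.
    clean_slug = re.sub(r"[^a-z0-9]+", "-", slug.lower()).strip("-")
    name = f"{file_stem}-{on:%Y%m%d}-{clean_slug}.md"
    return str(PurePosixPath(".squads/proposed") / name)


print(proposal_path("goals-engineering", "Add mobile", date(2026, 4, 25)))
```

Passing the stem without its extension avoids a double ".md" in the resulting file name.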

How it works

agent: Edit /path/.agents/memory/engineering/goals.md
hook:  exit 2 — "BLOCKED: ... is a governance file. Propose changes by writing to .squads/proposed/..."
agent: Write /path/.squads/proposed/goals-engineering-20260425-add-mobile.md
hook:  exit 0 (allowed)
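
The trace above can be approximated in Python to make the decision rule explicit. This is a sketch of the matching logic only, not the shipped hook (which is a bash one-liner in templates/guardrail.json); the function name `check_path` and the shortened message are assumptions.

```python
from fnmatch import fnmatchcase
from typing import Tuple

# The four patterns named in this PR. As in a shell `case` pattern,
# fnmatch's "*" also matches "/" characters.
GOVERNANCE_PATTERNS = ("*/goals.md", "*/priorities.md", "*/directives.md", "*/SQUAD.md")


def check_path(path: str) -> Tuple[int, str]:
    """Return (exit_code, message): 2 blocks the tool call, 0 allows it."""
    if any(fnmatchcase(path, pattern) for pattern in GOVERNANCE_PATTERNS):
        # Shortened stand-in for the hook's real redirect message.
        message = (
            f"BLOCKED: {path} is a governance file. "
            "Propose changes by writing to .squads/proposed/ instead."
        )
        return 2, message
    return 0, ""


for candidate in (
    "/path/.agents/memory/engineering/goals.md",
    "/path/.squads/proposed/goals-engineering-20260425-add-mobile.md",
):
    code, _ = check_path(candidate)
    print(code, candidate)
```

The proposal file is allowed because its name only starts with "goals"; the patterns require the final path component to be exactly one of the four governance file names.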

The founder's own Claude Code sessions don't pass through the agent guardrail, so direct edits work normally.

Tested

  • ✅ Hook blocks */goals.md (exit 2)
  • ✅ Hook allows */state.md, */learnings/* (exit 0)
  • ✅ JSON validates

Out of scope (separate PRs in #764)

  • Layer 2: actual .squads/proposed/ scaffold in squads init (this PR adds the README template only)
  • Layer 3: squads coherence command
  • Layer 4: goal-linked GitHub issue template

Test plan

  • CI green
  • Manual: run an agent, ask it to edit goals.md, confirm it's blocked with the redirect message
  • Manual: ask the same agent to edit state.md, confirm it succeeds

Agents now cannot Edit/Write/MultiEdit governance files. Today the
authority rule is documented as convention but unenforced — agents
have been observed resetting goals to fit their own logic and
contradicting directives. Drift accumulates over runs.

Layer 1 of the governance epic (#764):

- templates/guardrail.json: PreToolUse matcher for Edit|Write|MultiEdit
  blocks paths matching */goals.md, */priorities.md, */directives.md,
  */SQUAD.md. Exit code 2 with a clear message redirecting to the
  proposal channel.

- templates/proposed/README.md: documents the proposal channel pattern.
  Agents write suggestions to .squads/proposed/<file>-<date>-<slug>.md;
  founder reviews and merges accepted ones into canonical files.

- docs/governance.md: authority-by-file table, why the split, how
  enforcement works, founder override.

The founder's own Claude Code sessions don't pass through the agent
guardrail, so direct edits work normally for governance owners.

Tested locally: hook blocks goals.md (exit 2), allows state.md (exit 0).

Co-Authored-By: Claude <noreply@anthropic.com>
@codecov-commenter
⚠️ Please install the Codecov app to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

✅ All modified and coverable lines are covered by tests.


@gemini-code-assist (bot) left a comment

Code Review

This pull request introduces a squad governance model that restricts autonomous agents from directly editing core governance files, such as goals and priorities, by implementing a guardrail hook. The changes include new documentation for the governance model and a proposal workflow for agents. However, the guardrail implementation in templates/guardrail.json has several technical issues, including a bypass for files in the root directory, incorrect JSON key extraction for tool inputs, and inconsistencies in the suggested naming convention for proposals.

Comment thread templates/guardrail.json
"hooks": [
{
"type": "command",
"command": "bash -c 'path=$(echo \"$CLAUDE_TOOL_INPUT\" | python3 -c \"import sys,json; d=json.load(sys.stdin); print(d.get(\\\"file_path\\\",\\\"\\\"))\" 2>/dev/null || true); case \"$path\" in */goals.md|*/priorities.md|*/directives.md|*/SQUAD.md) echo \"BLOCKED: $path is a governance file. Only the founder can edit goals/priorities/directives/SQUAD identity. Propose changes by writing to .squads/proposed/<filename>-$(date +%Y%m%d).md instead — the founder reviews and merges.\" >&2; exit 2;; esac'",
high

The guardrail implementation has several technical issues that could lead to bypasses or poor user experience:

  1. Root Path Bypass: The glob patterns (e.g., */goals.md) only match files in subdirectories. Governance files located in the root directory (e.g., goals.md, SQUAD.md) will not be blocked, which contradicts the documentation in docs/governance.md implying **/ coverage.
  2. Key Mismatch: The script extracts file_path, but many agent tools (including standard Claude Code tools) use the key path. This would cause the guardrail to miss the target file entirely.
  3. MultiEdit Support: The MultiEdit tool typically operates on a list of files. This one-liner only checks a single top-level field, allowing governance files to be modified if they are part of a batch edit.
  4. Naming Inconsistency: The suggested filename format in the error message results in goals.md-YYYYMMDD.md, whereas the convention in templates/proposed/README.md uses a slug and avoids double extensions (e.g., goals-YYYYMMDD-slug.md).
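
One way to address points 1, 2 and 4 together is to match on the final path component rather than on a glob, and to try more than one input key. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the key names `path` and `file_path` are taken from this thread rather than verified against any tool's schema.

```python
import json
from pathlib import PurePosixPath

GOVERNANCE_NAMES = {"goals.md", "priorities.md", "directives.md", "SQUAD.md"}


def extract_path(raw_tool_input: str) -> str:
    """Pull a target path out of the tool-input JSON, or "" if none is found.

    Fails open on unparseable input, like the `|| true` in the one-liner.
    """
    try:
        data = json.loads(raw_tool_input)
    except (json.JSONDecodeError, TypeError):
        return ""
    if not isinstance(data, dict):
        return ""
    # Key names as discussed in this review; which one a tool sends is an assumption.
    value = data.get("path") or data.get("file_path") or ""
    return value if isinstance(value, str) else ""


def is_governance_file(path: str) -> bool:
    """Match on the file name so root-level files are covered too."""
    return PurePosixPath(path).name in GOVERNANCE_NAMES


def proposal_stem(path: str) -> str:
    """Name without the extension, so the proposal avoids a double '.md'."""
    return PurePosixPath(path).stem


print(is_governance_file(extract_path('{"file_path": "goals.md"}')))
print(proposal_stem("/repo/.agents/memory/engineering/goals.md"))
```

Comparing the basename against a fixed set also blocks a file named goals.md anywhere in the tree, which is broader than a path-anchored rule; whether that is desirable depends on the repository layout.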
Suggested change
"command": "bash -c 'path=$(echo \"$CLAUDE_TOOL_INPUT\" | python3 -c \"import sys,json; d=json.load(sys.stdin); print(d.get(\\\"file_path\\\",\\\"\\\"))\" 2>/dev/null || true); case \"$path\" in */goals.md|*/priorities.md|*/directives.md|*/SQUAD.md) echo \"BLOCKED: $path is a governance file. Only the founder can edit goals/priorities/directives/SQUAD identity. Propose changes by writing to .squads/proposed/<filename>-$(date +%Y%m%d).md instead — the founder reviews and merges.\" >&2; exit 2;; esac'",
"command": "bash -c 'path=$(echo \"$CLAUDE_TOOL_INPUT\" | python3 -c \"import sys,json; d=json.load(sys.stdin); print(d.get(\\\"path\\\", d.get(\\\"file_path\\\", \\\"\\\")))\" 2>/dev/null || true); case \"$path\" in goals.md|*/goals.md|priorities.md|*/priorities.md|directives.md|*/directives.md|SQUAD.md|*/SQUAD.md) echo \"BLOCKED: $path is a governance file. Only the founder can edit goals/priorities/directives/SQUAD identity. Propose changes by writing to .squads/proposed/$(basename \\\"$path\\\" .md)-$(date +%Y%m%d)-slug.md instead — the founder reviews and merges.\" >&2; exit 2;; esac'",

Empirical: a one-shot audit across 19 squads found 3 outright false
"Achieved" claims (referencing PRs/issues that don't exist) and 66
entries with no checkable reference at all. Two-thirds of goals.md
content is unverifiable today.

Layer 1 (the guardrail) blocks unauthorized writes. This commit adds
the format spec the founder will enforce on accepted proposals:

- docs/governance.md: new section "goals.md format — every claim must
  cite evidence" with required ref types (PR / commit / file / issue),
  good examples, anti-patterns, and validation pointer.

- templates/proposed/README.md: callout that goals.md proposals
  without refs will be rejected.

Validator script lives in hq for now (scripts/validate-goals.sh);
graduates to "squads coherence" in a later release (Layer 3 of #764).
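
For illustration, a reference check of the kind described above could look like the following. This is not scripts/validate-goals.sh, which is not shown in this PR; the reference syntaxes (a "#123" number for a PR or issue, a 7 to 40 character hex commit SHA, a path ending in a known extension) and all names are assumptions. The actual required formats are the ones defined in docs/governance.md.

```python
import re
from typing import Iterable, List

# Hypothetical reference syntaxes, one per ref type named in the commit message.
REF_PATTERNS = {
    "pr_or_issue": re.compile(r"#\d+\b"),
    # Hex SHA with at least one a-f letter, so plain dates like 20260425 do not count.
    "commit": re.compile(r"\b(?=\d*[a-f])[0-9a-f]{7,40}\b"),
    "file": re.compile(r"\b[\w./-]+\.(?:md|json|sh|ts|py)\b"),
}


def ref_types(entry: str) -> List[str]:
    """Return the names of the reference types found in one goals.md entry."""
    return [name for name, pattern in REF_PATTERNS.items() if pattern.search(entry)]


def unverifiable(entries: Iterable[str]) -> List[str]:
    """Return the entries that cite no checkable reference at all."""
    return [entry for entry in entries if not ref_types(entry)]


sample = [
    "Achieved: guardrail shipped (#765)",
    "Achieved: grew the community",
]
print(unverifiable(sample))
```

A check like this only establishes that a reference is present. Confirming that the cited PR, commit, file, or issue actually exists requires a lookup against the repository, which is the failure mode the audit found in the false "Achieved" claims.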

Co-Authored-By: Claude <noreply@anthropic.com>