Skip to content

fix: sidebar prompt injection defense (v0.13.4.0)#611

Merged
garrytan merged 3 commits into
mainfrom
garrytan/extension-prompt-injection-defense
Mar 29, 2026
Merged

fix: sidebar prompt injection defense (v0.13.4.0)#611
garrytan merged 3 commits into
mainfrom
garrytan/extension-prompt-injection-defense

Conversation

@garrytan

Copy link
Copy Markdown
Owner

Summary

Three security layers for the Chrome sidebar extension, which has bash access via Claude:

  • XML prompt framing with trust boundaries — user messages wrapped in <user-message> tags, XML special chars escaped to prevent tag injection. System prompt explicitly instructs Claude to treat content as data.
  • Bash command allowlist — system prompt restricts bash to browse binary commands only ($B goto, $B click, $B snapshot). All other commands (curl, rm, cat) are forbidden. Prevents prompt injection from escalating to arbitrary code execution.
  • Opus default — sidebar now uses the most injection-resistant model by default.

Bug fix: sidebar-agent.ts was silently rebuilding its own Claude args from scratch, ignoring --model, --allowedTools, and all other server-side arg changes. Fixed to use queued args from server.ts.

Design doc: docs/designs/ML_PROMPT_INJECTION_KILLER.md covers the follow-up ML classifier PR (DeBERTa via @huggingface/transformers, BrowseSafe-bench red team harness, and the ambitious Bun-native 5ms inference vision).

Test Coverage

12 new tests in browse/test/sidebar-security.test.ts:

  • XML escaping (tag closing attacks, ampersands, clean passthrough)
  • Command allowlist (system prompt contains restrictions)
  • Opus model default
  • Trust boundary instructions
  • Sidebar-agent arg plumbing fix

All existing tests pass. Zero regressions.

Pre-Landing Review

CEO review (SCOPE EXPANSION): 6 proposals, 5 accepted, 1 deferred.
Eng review: 1 issue (ONNX + compiled Bun compat), resolved with @huggingface/transformers v4.
Codex review: 15 findings, 4 critical, all resolved (command allowlist instead of removing Bash, arg plumbing fix, XML escaping, salted payload hashes).

Test plan

  • 12 sidebar security tests pass
  • Full test suite passes (0 failures)

🤖 Generated with Claude Code

garrytan and others added 2 commits March 28, 2026 18:20
…t, arg plumbing

Three security fixes for the Chrome sidebar:

1. XML-framed prompts with trust boundaries and escape of < > & in user
   messages to prevent tag injection attacks.

2. Bash command allowlist in system prompt — only browse binary commands
   ($B goto, $B click, etc.) allowed. All other bash commands forbidden.

3. Fix sidebar-agent.ts ignoring queued args — server-side --model and
   --allowedTools changes were silently dropped because the agent rebuilt
   args from scratch instead of using the queue entry.

Also defaults sidebar to Opus (harder to manipulate).

12 new tests covering XML escaping, command allowlist, Opus default,
trust boundary instructions, and arg plumbing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
ML prompt injection defense design doc + P0 TODO for follow-up PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@github-actions

github-actions Bot commented Mar 29, 2026

Copy link
Copy Markdown

E2E Evals: ✅ PASS

8/8 tests passed | $1.06 total cost | 12 parallel runners

Suite Result Status Cost
e2e-browse 2/2 $0.13
e2e-deploy 2/2 $0.24
e2e-qa-workflow 1/1 $0.43
llm-judge 1/1 $0.02
e2e-deploy 2/2 $0.24

12x ubicloud-standard-2 (Docker: pre-baked toolchain + deps) | wall clock ≈ slowest suite

loadSession() was restoring worktreePath and claudeSessionId from prior
crashes. The worktree directory no longer existed (deleted on cleanup)
and --resume with a dead session ID caused claude to fail silently.

Now validates worktree exists on load and clears stale claude session
IDs on every server restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@garrytan garrytan merged commit ea7dbc9 into main Mar 29, 2026
18 checks passed
arturmilachjr pushed a commit to arturmilachjr/gstack that referenced this pull request Mar 30, 2026
VERSION was bumped to 0.13.4.0 in garrytan#611 but package.json was left at 0.13.3.0,
causing the gen-skill-docs version-match test to fail.

https://claude.ai/code/session_013TrjT8R7grdziMP1ffRivA
mathiasmora2232 pushed a commit to mathiasmora2232/gstack that referenced this pull request May 30, 2026
* fix: sidebar prompt injection defense — XML framing, command allowlist, arg plumbing

Three security fixes for the Chrome sidebar:

1. XML-framed prompts with trust boundaries and escape of < > & in user
   messages to prevent tag injection attacks.

2. Bash command allowlist in system prompt — only browse binary commands
   ($B goto, $B click, etc.) allowed. All other bash commands forbidden.

3. Fix sidebar-agent.ts ignoring queued args — server-side --model and
   --allowedTools changes were silently dropped because the agent rebuilt
   args from scratch instead of using the queue entry.

Also defaults sidebar to Opus (harder to manipulate).

12 new tests covering XML escaping, command allowlist, Opus default,
trust boundary instructions, and arg plumbing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump version and changelog (v0.13.4.0)

ML prompt injection defense design doc + P0 TODO for follow-up PR.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: clear stale worktree and claude session on sidebar reconnect

loadSession() was restoring worktreePath and claudeSessionId from prior
crashes. The worktree directory no longer existed (deleted on cleanup)
and --resume with a dead session ID caused claude to fail silently.

Now validates worktree exists on load and clears stale claude session
IDs on every server restart.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant