Skip to content

fix: quarantine prompt-injection rule graduation#268

Merged
Gradata merged 1 commit into
mainfrom
gra-2088-graduation-safety
Jun 7, 2026
Merged

fix: quarantine prompt-injection rule graduation#268
Gradata merged 1 commit into
mainfrom
gra-2088-graduation-safety

Conversation

@Gradata

@Gradata Gradata commented Jun 7, 2026

Copy link
Copy Markdown
Owner

Summary

  • Add a conservative graduation-time prompt-injection quarantine gate before lesson candidates are promoted to durable rules / AGENTS.md.
  • Reuse the existing hook injection guard when available, then fall back to graduation-local regex patterns for direct overrides, system-prompt hijacks, tool overrides, and credential exfiltration.
  • Fail closed if the detector errors so suspicious candidates stay pending with an observable graduation_quarantine:* kill reason.

Paperclip issue UUID: 139431c6-0f40-445d-a937-34f5a3289982
Paperclip issue: GRA-2088

Verification

env -u BRAIN_DIR -u GRADATA_BRAIN python3 -m pytest tests/test_session_close_write_through_gate.py tests/security/test_prompt_injection_poc.py -q

Result:

............................                                             [100%]
28 passed in 0.33s

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@coderabbitai

coderabbitai Bot commented Jun 7, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 80c49af5-23c5-47a8-b02a-4bbd09651844

📥 Commits

Reviewing files that changed from the base of the PR and between 69ad787 and ec174fc.

📒 Files selected for processing (2)
  • Gradata/src/gradata/enhancements/self_improvement/_graduation.py
  • Gradata/tests/test_session_close_write_through_gate.py
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
  • GitHub Check: pytest macos-latest / py3.12
  • GitHub Check: pytest windows-latest / py3.12
  • GitHub Check: pytest macos-latest / py3.11
  • GitHub Check: pytest windows-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.11
  • GitHub Check: pytest ubuntu-latest / py3.12
  • GitHub Check: pytest (py3.11)
  • GitHub Check: pytest (py3.12)
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

  • Gradata/src/gradata/enhancements/self_improvement/_graduation.py
Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

  • Gradata/tests/test_session_close_write_through_gate.py
🧠 Learnings (1)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
🔇 Additional comments (2)
Gradata/src/gradata/enhancements/self_improvement/_graduation.py (1)

70-90: LGTM!

Gradata/tests/test_session_close_write_through_gate.py (1)

96-118: LGTM!


📝 Walkthrough

Summary

  • Security Fix: Added a conservative graduation-time prompt-injection quarantine gate that prevents suspicious lesson candidates from being promoted to durable rules
  • Smart Fallback: Reuses existing gradata.hooks._injection_guard when available; falls back to graduation-local regex detectors covering direct overrides, system-prompt hijacks, tool overrides, and credential exfiltration
  • Fail-Closed Design: Detector errors result in candidates remaining pending with observable kill reasons prefixed with graduation_quarantine:*
  • Improved Error Handling: Distinguishes between import failures (falls back to regex) and runtime detector failures (returns "prompt_injection_detection_error" immediately)
  • Test Coverage: Added new test verifying fail-closed behavior when sanitizer raises RuntimeError; test suite passes (28 tests in 0.33s)
  • No Breaking Changes: Only internal logic and error handling modified; no public API signature changes

Walkthrough

This PR enhances error handling in the graduation quarantine system for prompt-injection detection. When the injection detector cannot be imported, the logic falls back to raw-text regex matching. When detection itself raises an exception, it immediately returns a quarantine error reason with logging instead of risking continued execution with unsanitized text. A new test validates fail-closed behavior when sanitization fails.

Changes

Fail-closed error handling in injection quarantine

Layer / File(s) Summary
Injection detector error handling in quarantine
Gradata/src/gradata/enhancements/self_improvement/_graduation.py
_graduation_quarantine_reason() now distinguishes ImportError (skips detector, uses raw-text regex fallback) from runtime exceptions during sanitization (logs warning, returns immediate quarantine reason), ensuring fail-closed behavior when the injection guard is unavailable or fails.
Error handling validation tests
Gradata/tests/test_session_close_write_through_gate.py
Adds test_run_graduation_quarantines_when_injection_guard_errors to verify that when the injection sanitizer raises RuntimeError, suspicious pattern candidates remain quarantined (not promoted to rules), include the specific "prompt_injection_detection_error" kill reason, and do not create AGENTS.md export.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

  • Gradata/gradata#267: Both PRs modify the prompt-injection quarantine logic in self_improvement/_graduation.py via _graduation_quarantine_reason() (using gradata.hooks._injection_guard with fallback/quarantine behavior) and add/extend session-close graduation tests that ensure quarantined PATTERN candidates do not graduate or export to AGENTS.md.
  • Gradata/gradata#223: Both PRs center on the prompt-injection injection-guard (gradata.hooks._injection_guard), with the main PR adding fail-closed error handling around importing/sanitizing it for quarantine/graduation decisions, while the retrieved PR introduces that sanitization/detection layer and wires it into jit_inject.

Suggested labels

bug, security

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: quarantine prompt-injection rule graduation' directly and clearly describes the main change: adding a quarantine mechanism for prompt-injection during rule graduation.
Description check ✅ Passed The description is directly related to the changeset, providing clear context about the quarantine gate, fallback patterns, fail-closed behavior, and includes verification results.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch gra-2088-graduation-safety

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.22.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

�[32m✔�[39m �[1mOpengrep OSS�[0m
�[32m✔�[39m Basic security coverage for first-party code vulnerabilities.

�[1m Loading rules from local config...�[0m
[00.25][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal


Comment @coderabbitai help to get the list of available commands and usage tips.

@Gradata Gradata force-pushed the gra-2088-graduation-safety branch from 1dc8af4 to ec174fc Compare June 7, 2026 18:37

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.

@coderabbitai coderabbitai Bot added bug Something isn't working security labels Jun 7, 2026
@Gradata Gradata merged commit acde088 into main Jun 7, 2026
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working security

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant