fix: quarantine prompt-injection rule graduation by Gradata · Pull Request #267 · Gradata/gradata

Gradata · 2026-06-07T18:15:44Z

Summary

Add a conservative prompt-injection quarantine at the PATTERN -> RULE graduation boundary.
Reuse the existing hook injection guard plus credential/tool-override regexes.
Quarantined candidates remain observable in lessons.md via `Pending approval: yes` and `Kill reason: graduation_quarantine:<reason>`; they are not exported to AGENTS.md.
Add regression coverage for malicious and benign rule candidates.

Paperclip issue UUID: 139431c6-0f40-445d-a937-34f5a3289982
Paperclip issue: GRA-2088

Verification

python3 -m pytest tests/test_session_close_write_through_gate.py -q
# 7 passed in 0.23s

/home/olive/.local/bin/uvx ruff check src/gradata/enhancements/self_improvement/_graduation.py tests/test_session_close_write_through_gate.py
# All checks passed!

python3 -m pytest tests/test_session_close_write_through_gate.py tests/test_graduation_notification.py tests/test_rule_graduated_events.py -q
# 20 passed in 12.27s


coderabbitai · 2026-06-07T18:15:54Z

📝 Walkthrough

Summary

Security Enhancement: Introduces a conservative quarantine gate for prompt-injection candidates at the PATTERN → RULE graduation boundary
Detection Mechanism: Reuses existing hook injection guard and applies credential/tool-override regexes to identify risky candidates
Quarantine Behavior: Marks candidates as pending_approval = True with kill_reason prefixed with graduation_quarantine:<reason>, preventing export to AGENTS.md while remaining observable in lessons.md
Implementation: Adds _graduation_quarantine_reason() helper function to _graduation.py with compiled regex patterns for injection detection
Test Coverage: New regression tests verify both that malicious prompt-injection candidates are blocked and that benign candidates still graduate normally
Import Refactoring: Adjusts imports to remove specific threshold constants, relying on graduation_thresholds() fields
No Breaking Changes: No alterations to exported or public function signatures

Walkthrough

This PR adds a safety gate to the graduation flow that quarantines lesson candidates suspected of prompt-injection attacks during PATTERN → RULE promotion. A new quarantine detector checks lesson metadata; malicious candidates are marked pending and blocked from promotion, while benign patterns graduate normally.
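As a rough illustration of the detector described above, here is a minimal Python sketch of a regex-only quarantine check. The pattern names, the regexes, and `example_quarantine_reason` are hypothetical: this thread does not show the contents of `_GRADUATION_INJECTION_PATTERNS`, and the real helper reportedly consults `gradata.hooks._injection_guard` before any regex fallback.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration only; the PR's actual
# _GRADUATION_INJECTION_PATTERNS are not shown in this thread.
_EXAMPLE_PATTERNS = (
    (
        "instruction_override",
        re.compile(r"ignore\s+(?:all\s+)?(?:previous|prior)\s+instructions", re.IGNORECASE),
    ),
    (
        "credential_request",
        re.compile(
            r"\b(?:reveal|print|send|exfiltrate)\b.*\b(?:api[_ -]?key|password|secret|token)",
            re.IGNORECASE,
        ),
    ),
    (
        "tool_override",
        re.compile(r"\b(?:disable|bypass|override)\b.*\b(?:tool|hook|guard|safety)", re.IGNORECASE),
    ),
)


def example_quarantine_reason(text: str) -> Optional[str]:
    """Return the name of the first matching pattern, or None if the text looks benign."""
    for name, pattern in _EXAMPLE_PATTERNS:
        if pattern.search(text):
            return name
    return None
```

A non-`None` result would become the `<reason>` part of `graduation_quarantine:<reason>`. Regex lists like this are intentionally conservative and will miss paraphrased attacks, which is presumably why the PR prefers the shared guard when it is available.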

Changes

Graduation Quarantine Gating

Layer / File(s)	Summary
Injection detection patterns and helper `Gradata/src/gradata/enhancements/self_improvement/_graduation.py`	`re` import is added and `_graduation_quarantine_reason()` helper with compiled `_GRADUATION_INJECTION_PATTERNS` detects suspicious prompt-injection text in lesson metadata using `gradata.hooks._injection_guard` with regex fallback.
Graduation quarantine integration `Gradata/src/gradata/enhancements/self_improvement/_graduation.py`	During PATTERN → RULE promotion eligibility, `_graduation_quarantine_reason()` is called; if a reason is returned, the lesson is marked `pending_approval`, assigned a quarantine-prefixed `kill_reason`, logged as a warning, and skipped from promotion without hook installation.
Quarantine and normal graduation tests `Gradata/tests/test_session_close_write_through_gate.py`	Two new tests verify that malicious PATTERN candidates are quarantined (pending with kill_reason, no RULE/AGENTS.md entry) and that benign candidates still graduate normally (promoted to RULE and exported as AGENTS.md bullets).
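The integration row above can be sketched roughly as follows. This is not the code from `_graduation.py`: `ExampleLesson` and `promote_eligible` are hypothetical stand-ins, the detector is passed in as a callable, and confidence-threshold eligibility and hook installation are left out. Only the field names `pending_approval` and `kill_reason` and the `graduation_quarantine:` prefix come from the PR text.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Optional

logger = logging.getLogger(__name__)


@dataclass
class ExampleLesson:
    """Hypothetical stand-in for a lesson record; field names follow the PR text."""

    description: str
    state: str = "PATTERN"
    pending_approval: bool = False
    kill_reason: str = ""


def promote_eligible(
    lessons: List[ExampleLesson],
    quarantine_reason: Callable[[str], Optional[str]],
) -> List[ExampleLesson]:
    """Promote PATTERN lessons to RULE unless the detector returns a reason."""
    promoted = []
    for lesson in lessons:
        if lesson.state != "PATTERN":
            continue
        reason = quarantine_reason(lesson.description)
        if reason:
            # Quarantine: keep the lesson observable, but do not promote it.
            lesson.pending_approval = True
            lesson.kill_reason = f"graduation_quarantine:{reason}"
            logger.warning("Quarantined graduation candidate: %s", reason)
            continue
        lesson.state = "RULE"
        promoted.append(lesson)
    return promoted
```

Because a quarantined lesson stays in the `PATTERN` state with `pending_approval` set, it would still appear in lessons.md but never reach whatever export step reads only `RULE` lessons.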

Estimated Code Review Effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly Related PRs

Gradata/gradata#223: Both PRs use gradata.hooks._injection_guard for prompt-injection detection. #223 (feat(hooks): add prompt-injection sanitization layer to jit_inject, GRA-1295) wires the guard into hooks/jit_inject, while this PR applies it to graduation promotion to quarantine suspicious candidates.
Gradata/gradata#183: Both PRs modify self-improvement graduation logic around PATTERN → RULE; #183 centralizes graduation thresholds via graduation_thresholds(), and this PR uses the same threshold wiring while adding the quarantine check.
Gradata/gradata#249: Both PRs modify the session_close._run_graduation promotion flow for PATTERN → RULE—this PR adds a quarantine/pending gate while #249 adds persistence and AGENTS.md auto-export for the same stage.

Suggested Labels

security, bug

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'fix: quarantine prompt-injection rule graduation' accurately summarizes the main change: adding a quarantine mechanism for prompt-injection candidates during PATTERN→RULE graduation.
Description check	✅ Passed	The description is directly related to the changeset, clearly explaining the quarantine feature, implementation approach, observability in lessons.md, and exclusion from AGENTS.md exports.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 OpenGrep (1.22.0)

OpenGrep fatal error (exit code 2):
┌──────────────┐
│ Opengrep CLI │
└──────────────┘

✔ Opengrep OSS
✔ Basic security coverage for first-party code vulnerabilities.

Loading rules from local config...
[00.16][ERROR]: Error: exception Glob.Lexer.Syntax_error("malformed glob pattern: missing ']'")
Raised at Glob__Lexer.syntax_error in file "libs/glob/Lexer.mll", line 8, characters 2-26
Called from Glob__Lexer.__ocaml_lex_token_rec in file "libs/glob/Lexer.mll", line 29, characters 26-53
Cal


coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Gradata/src/gradata/enhancements/self_improvement/_graduation.py`:
- Around line 70-78: The current broad except hides errors from
gradata.hooks._injection_guard functions (sanitize, is_suspicious) and silently
falls back to the limited _GRADUATION_INJECTION_PATTERNS; change the handler so
ImportError is caught and handled as before, but any other Exception is logged
with logger.warning(..., exc_info=True) and then fail closed (return a
quarantine reason, e.g., the original exception message or a generic
"prompt_injection_detection_error") or re-run a local normalized fallback using
the same normalization as sanitize before checking
_GRADUATION_INJECTION_PATTERNS; update the try/except around the
sanitize/is_suspicious calls and reference sanitize, is_suspicious,
_GRADUATION_INJECTION_PATTERNS, and logger.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6228c646-09d0-4230-87a2-d7225dffe5e9

📥 Commits

Reviewing files that changed from the base of the PR and between f05b53d and f795d4d.

📒 Files selected for processing (2)

Gradata/src/gradata/enhancements/self_improvement/_graduation.py
Gradata/tests/test_session_close_write_through_gate.py

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)

GitHub Check: pytest (py3.11)
GitHub Check: pytest windows-latest / py3.12
GitHub Check: pytest (py3.12)
GitHub Check: pytest windows-latest / py3.11
GitHub Check: pytest macos-latest / py3.11
GitHub Check: pytest ubuntu-latest / py3.12
GitHub Check: pytest macos-latest / py3.12
GitHub Check: pytest ubuntu-latest / py3.11

🧰 Additional context used

📓 Path-based instructions (2)

Gradata/tests/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/tests/**/*.py: Set BRAIN_DIR environment variable via tmp_path in conftest.py for test isolation — ensure _paths.py module cache refreshes when calling Brain.init() directly inside tests
Add unit tests in tests/test_*.py for every CI push without LLM calls (deterministic); mark integration tests with @pytest.mark.integration and skip them by default (they hit real LLM APIs)

Files:

Gradata/tests/test_session_close_write_through_gate.py

Gradata/src/**/*.py

📄 CodeRabbit inference engine (Gradata/AGENTS.md)

Gradata/src/**/*.py: Prefer sentence-transformers for local embeddings, google-genai for Gemini embeddings, cryptography for AES-GCM encrypted system.db, bm25s for BM25 rule ranking, and mem0ai for external memory adapters — guard all optional dependency imports with try / except ImportError at the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bare except: pass — use typed exceptions or at minimum logger.warning(...) with exc_info=True to avoid silent failure in a memory product
Never import from out-of-scope sibling directories ../Sprites/ or ../Hausgem/ within gradata/* code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to ../Sprites/, ../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from inside gradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes

Files:

Gradata/src/gradata/enhancements/self_improvement/_graduation.py
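One of the guidelines above asks for an atomic-write helper when writing JSON files. The repository's own helper is not shown in this thread, so the following is only a generic sketch of the usual write-to-temp-then-rename technique; `atomic_write_json` is a hypothetical name.

```python
import contextlib
import json
import os
import tempfile
from pathlib import Path


def atomic_write_json(path, data) -> None:
    """Write JSON to a temp file in the target directory, then rename it into place."""
    path = Path(path)
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(data, handle)
            handle.flush()
            os.fsync(handle.fileno())  # push the bytes to disk before the rename
        os.replace(tmp_name, path)  # replaces the destination in one step
    except BaseException:
        # Clean up the temp file; the original file is left untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Creating the temp file in the same directory keeps the rename on one filesystem, so a crash mid-write leaves either the old file or the new one rather than a partial file.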

🧠 Learnings (4)

📚 Learning: 2026-04-17T17:18:07.439Z

Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.

Applied to files:

Gradata/tests/test_session_close_write_through_gate.py

📚 Learning: 2026-05-01T15:50:32.772Z

Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Applies to Gradata/tests/**/*.py : Add unit tests in `tests/test_*.py` for every CI push without LLM calls (deterministic); mark integration tests with `pytest.mark.integration` and skip them by default (they hit real LLM APIs)

Applied to files:

Gradata/tests/test_session_close_write_through_gate.py

📚 Learning: 2026-05-01T15:50:32.772Z

Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Ensure the 4 deterministic guarantees have tests: (1) Correction in → rule extracted out, (2) Rule retrieved/applied in subsequent session, (3) Contradicting evidence lowers confidence, (4) Stale rules decay below threshold

Applied to files:

Gradata/tests/test_session_close_write_through_gate.py

🔇 Additional comments (2)

Gradata/src/gradata/enhancements/self_improvement/_graduation.py (1)

13-56: LGTM!

Also applies to: 399-409

Gradata/tests/test_session_close_write_through_gate.py (1)

72-115: LGTM!

coderabbitai · 2026-06-07T18:19:48Z

+    try:
+        from gradata.hooks._injection_guard import is_suspicious, sanitize
+
+        normalized = sanitize(text)
+        suspicious, reason = is_suspicious(normalized)
+        if suspicious:
+            return reason or "prompt_injection_pattern"
+    except Exception:
+        normalized = text


⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Don’t silently downgrade the quarantine detector on _injection_guard failures.

Line 77 catches every detector error and falls back to _GRADUATION_INJECTION_PATTERNS, but that fallback is much narrower than gradata.hooks._injection_guard.is_suspicious(). If sanitize()/is_suspicious() ever raises, phrases like ignore previous instructions stop being quarantined and can graduate into durable RULE/AGENTS output again. Narrow this to ImportError, and on unexpected failures either log with exc_info=True and fail closed, or re-run an equivalent normalized fallback locally.

As per coding guidelines, use typed exceptions or at minimum logger.warning(..., exc_info=True) to avoid silent failure in a memory product; based on learnings from Gradata/src/gradata/hooks/_injection_guard.py:111-162, sanitize() + is_suspicious() provide broader normalized detection than the local regex fallback.
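A minimal sketch of the fail-closed variant suggested above, under stated assumptions: the fallback regex is a hypothetical stand-in for `_GRADUATION_INJECTION_PATTERNS`, `quarantine_reason_fail_closed` is an illustrative name, and the `sanitize(text)` / `is_suspicious(normalized) -> (suspicious, reason)` call shapes are taken from the diff quoted in this comment. It is not the fix that was merged.

```python
import logging
import re
from typing import Optional

logger = logging.getLogger(__name__)

# Hypothetical stand-in for the PR's _GRADUATION_INJECTION_PATTERNS.
_FALLBACK_PATTERNS = (
    re.compile(r"ignore\s+previous\s+instructions", re.IGNORECASE),
)


def quarantine_reason_fail_closed(text: str) -> Optional[str]:
    """Return a quarantine reason, treating unexpected guard failures as suspicious."""
    normalized = text
    try:
        from gradata.hooks._injection_guard import is_suspicious, sanitize
    except ImportError:
        # Guard module unavailable: rely on the narrower local regexes below.
        is_suspicious = sanitize = None

    if sanitize is not None and is_suspicious is not None:
        try:
            normalized = sanitize(text)
            suspicious, reason = is_suspicious(normalized)
        except Exception:
            # Fail closed: a broken detector must not let a candidate graduate.
            logger.warning("Injection guard failed during graduation", exc_info=True)
            return "prompt_injection_detection_error"
        if suspicious:
            return reason or "prompt_injection_pattern"

    for pattern in _FALLBACK_PATTERNS:
        if pattern.search(normalized):
            return "prompt_injection_pattern"
    return None
```

The design choice here is that only a missing module downgrades to the regex fallback; any other exception quarantines the candidate and leaves a traceback in the log for a human to review.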


Source: Coding guidelines

fix: quarantine prompt-injection rule graduation

f795d4d

greptile-apps Bot reviewed Jun 7, 2026


coderabbitai Bot added the bug and security labels Jun 7, 2026

coderabbitai Bot requested changes Jun 7, 2026


Gradata merged commit 69ad787 into main Jun 7, 2026
9 checks passed

Gradata deleted the gra-2088-graduation-safety branch June 7, 2026 18:34

coderabbitai Bot mentioned this pull request Jun 7, 2026

fix: quarantine prompt-injection rule graduation #268

Merged

Gradata merged 1 commit into main from gra-2088-graduation-safety
