fix: quarantine prompt-injection rule graduation#267
Conversation
There was a problem hiding this comment.
Your free trial has ended. If you'd like to continue receiving code reviews, you can add a payment method here.
📝 WalkthroughSummary
WalkthroughThis PR adds a safety gate to the graduation flow that quarantines lesson candidates suspected of prompt-injection attacks during PATTERN → RULE promotion. A new quarantine detector checks lesson metadata; malicious candidates are marked pending and blocked from promotion, while benign patterns graduate normally. ChangesGraduation Quarantine Gating
Estimated Code Review Effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly Related PRs
Suggested Labels
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Warning There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure. 🔧 OpenGrep (1.22.0)OpenGrep fatal error (exit code 2): �[32m✔�[39m �[1mOpengrep OSS�[0m �[1m Loading rules from local config...�[0m Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@Gradata/src/gradata/enhancements/self_improvement/_graduation.py`:
- Around line 70-78: The current broad except hides errors from
gradata.hooks._injection_guard functions (sanitize, is_suspicious) and silently
falls back to the limited _GRADUATION_INJECTION_PATTERNS; change the handler so
ImportError is caught and handled as before, but any other Exception is logged
with logger.warning(..., exc_info=True) and then fail closed (return a
quarantine reason, e.g., the original exception message or a generic
"prompt_injection_detection_error") or re-run a local normalized fallback using
the same normalization as sanitize before checking
_GRADUATION_INJECTION_PATTERNS; update the try/except around the
sanitize/is_suspicious calls and reference sanitize, is_suspicious,
_GRADUATION_INJECTION_PATTERNS, and logger.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6228c646-09d0-4230-87a2-d7225dffe5e9
📒 Files selected for processing (2)
Gradata/src/gradata/enhancements/self_improvement/_graduation.pyGradata/tests/test_session_close_write_through_gate.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: pytest (py3.11)
- GitHub Check: pytest windows-latest / py3.12
- GitHub Check: pytest (py3.12)
- GitHub Check: pytest windows-latest / py3.11
- GitHub Check: pytest macos-latest / py3.11
- GitHub Check: pytest ubuntu-latest / py3.12
- GitHub Check: pytest macos-latest / py3.12
- GitHub Check: pytest ubuntu-latest / py3.11
🧰 Additional context used
📓 Path-based instructions (2)
Gradata/tests/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/tests/**/*.py: SetBRAIN_DIRenvironment variable viatmp_pathin conftest.py for test isolation — ensure_paths.pymodule cache refreshes when callingBrain.init()directly inside tests
Add unit tests intests/test_*.pyfor every CI push without LLM calls (deterministic); mark integration tests with@pytest.mark.integrationand skip them by default (they hit real LLM APIs)
Files:
Gradata/tests/test_session_close_write_through_gate.py
Gradata/src/**/*.py
📄 CodeRabbit inference engine (Gradata/AGENTS.md)
Gradata/src/**/*.py: Prefersentence-transformersfor local embeddings,google-genaifor Gemini embeddings,cryptographyfor AES-GCM encrypted system.db,bm25sfor BM25 rule ranking, andmem0aifor external memory adapters — guard all optional dependency imports withtry / except ImportErrorat the call site, never at module level
Maintain strict layering: Layer 0 (Primitives: _types.py, _db.py, _events.py, _paths.py, _file_lock.py; Patterns: contrib/patterns/) must never import from Layer 1 (Enhancements: enhancements/, rules/) or Layer 2 (Public API: brain.py, cli.py, daemon.py, mcp_server.py)
Never use bareexcept: pass— use typed exceptions or at minimumlogger.warning(...)withexc_info=Trueto avoid silent failure in a memory product
Never import from out-of-scope sibling directories../Sprites/or../Hausgem/withingradata/*code — that is a layering bug
Never leak private-sibling paths into public docs/code — no references to../Sprites/,../Hausgem/, email addresses, OneDrive paths, or Sprites-specific examples from insidegradata/*
Use atomic-write helper when writing JSON files to prevent corruption from mid-write crashes
Files:
Gradata/src/gradata/enhancements/self_improvement/_graduation.py
🧠 Learnings (4)
📓 Common learnings
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
📚 Learning: 2026-04-17T17:18:07.439Z
Learnt from: Gradata
Repo: Gradata/gradata PR: 0
File: :0-0
Timestamp: 2026-04-17T17:18:07.439Z
Learning: In PR `#102` (gradata/gradata), Round 2 addressed: cli.py env-first brain resolution (GRADATA_BRAIN > --brain-dir > cwd), _tenant.py corrupt .tenant_id overwrite, _env_int default clamping to minimum, and _events.py tenant-scoped fallback SELECT for dedup. All ruff and 99 tests green after these fixes.
Applied to files:
Gradata/tests/test_session_close_write_through_gate.py
📚 Learning: 2026-05-01T15:50:32.772Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Applies to Gradata/tests/**/*.py : Add unit tests in `tests/test_*.py` for every CI push without LLM calls (deterministic); mark integration tests with `pytest.mark.integration` and skip them by default (they hit real LLM APIs)
Applied to files:
Gradata/tests/test_session_close_write_through_gate.py
📚 Learning: 2026-05-01T15:50:32.772Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: Gradata/AGENTS.md:0-0
Timestamp: 2026-05-01T15:50:32.772Z
Learning: Ensure the 4 deterministic guarantees have tests: (1) Correction in → rule extracted out, (2) Rule retrieved/applied in subsequent session, (3) Contradicting evidence lowers confidence, (4) Stale rules decay below threshold
Applied to files:
Gradata/tests/test_session_close_write_through_gate.py
🔇 Additional comments (2)
Gradata/src/gradata/enhancements/self_improvement/_graduation.py (1)
13-56: LGTM!Also applies to: 399-409
Gradata/tests/test_session_close_write_through_gate.py (1)
72-115: LGTM!
| try: | ||
| from gradata.hooks._injection_guard import is_suspicious, sanitize | ||
|
|
||
| normalized = sanitize(text) | ||
| suspicious, reason = is_suspicious(normalized) | ||
| if suspicious: | ||
| return reason or "prompt_injection_pattern" | ||
| except Exception: | ||
| normalized = text |
There was a problem hiding this comment.
Don’t silently downgrade the quarantine detector on _injection_guard failures.
Line 77 catches every detector error and falls back to _GRADUATION_INJECTION_PATTERNS, but that fallback is much narrower than gradata.hooks._injection_guard.is_suspicious(). If sanitize()/is_suspicious() ever raises, phrases like ignore previous instructions stop being quarantined and can graduate into durable RULE/AGENTS output again. Narrow this to ImportError, and on unexpected failures either log with exc_info=True and fail closed, or re-run an equivalent normalized fallback locally.
As per coding guidelines, use typed exceptions or at minimum logger.warning(..., exc_info=True) to avoid silent failure in a memory product; based on learnings from Gradata/src/gradata/hooks/_injection_guard.py:111-162, sanitize() + is_suspicious() provide broader normalized detection than the local regex fallback.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@Gradata/src/gradata/enhancements/self_improvement/_graduation.py` around
lines 70 - 78, The current broad except hides errors from
gradata.hooks._injection_guard functions (sanitize, is_suspicious) and silently
falls back to the limited _GRADUATION_INJECTION_PATTERNS; change the handler so
ImportError is caught and handled as before, but any other Exception is logged
with logger.warning(..., exc_info=True) and then fail closed (return a
quarantine reason, e.g., the original exception message or a generic
"prompt_injection_detection_error") or re-run a local normalized fallback using
the same normalization as sanitize before checking
_GRADUATION_INJECTION_PATTERNS; update the try/except around the
sanitize/is_suspicious calls and reference sanitize, is_suspicious,
_GRADUATION_INJECTION_PATTERNS, and logger.
Source: Coding guidelines
Summary
lessons.mdviaPending approval: yesandKill reason: graduation_quarantine:<reason>; they are not exported toAGENTS.md.Paperclip issue UUID: 139431c6-0f40-445d-a937-34f5a3289982
Paperclip issue: GRA-2088
Verification