feat(rule-to-hook): close auto-promotion loop in injection paths (Phase 5) #81
Deploying gradata-dashboard
| Latest commit: | 0d30351 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://04b3acb1.gradata-dashboard.pages.dev |
| Branch Preview URL: | https://wt-phase5-rule-to-hook-autop.gradata-dashboard.pages.dev |
…xt injection

Phase 5 rule-to-hook auto-promotion closes the loop between graduation and enforcement. Hook generation was already wired in graduate() (it installs a .js hook when a deterministic PATTERN promotes to RULE), but the injection pipelines kept re-injecting the same rule as text, wasting context. This change adds the missing demotion step and a public promote/demote API.

- inject_brain_rules (SessionStart) and jit_inject (UserPromptSubmit) now skip rules whose metadata.how_enforced == "hooked" (or legacy "[hooked]" description prefix). Uses the existing is_hook_enforced helper.
- New public promote()/demote() in rule_to_hook: classify + generate + install (or inverse remove) as a single entry point for CLI, tests, and external tooling. Graduation keeps its own inline path for the fast case.
- cmd_rule_remove now round-trips metadata.how_enforced back to "injected" (via a Metadata JSON rewrite) and drops the trailing metadata block when --purge is used, so demotion fully re-enables text injection.
- 15 new tests in test_rule_to_hook_promotion.py covering classification, promote/demote inverses, metadata round-trip, injection-pipeline filtering, graduation auto-promotion, and the full promote -> skip -> demote -> resume loop.

Determinism classification is unchanged: the existing regex-matched DETERMINISTIC_PATTERNS table (em-dash, file-size, secret-scan, destructive-block, root-file, auto-test, read-before-edit, fstring) is the single source of truth. Anything not matching is routed to prompt injection, same as before.

Tests: 2416 pass, 23 skip (unchanged). Ruff clean on all changed files.
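The skip logic in the two injection paths can be sketched roughly as follows. This is a minimal illustration, not the project's code: the `Rule` dataclass and `rules_to_inject` are hypothetical stand-ins, and the body of `is_hook_enforced` is an assumption based only on the two conditions named in the commit message (the `metadata.how_enforced == "hooked"` check and the legacy `[hooked]` description prefix).

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical stand-in for the project's rule/lesson type."""
    description: str
    metadata: dict = field(default_factory=dict)


def is_hook_enforced(rule: Rule) -> bool:
    """Assumed behavior of the helper named in the commit message."""
    if rule.metadata.get("how_enforced") == "hooked":
        return True
    # Legacy marker: a "[hooked]" prefix on the description.
    return rule.description.startswith("[hooked]")


def rules_to_inject(rules: list[Rule]) -> list[Rule]:
    """Keep only rules that still need to be injected as text."""
    return [r for r in rules if not is_hook_enforced(r)]


# Example rule texts are made up for illustration.
rules = [
    Rule("Never use em-dashes", {"how_enforced": "hooked"}),
    Rule("[hooked] Keep files under 500 lines"),
    Rule("Prefer short sentences", {"how_enforced": "injected"}),
]
injected = rules_to_inject(rules)
```

Only the third rule survives the filter here, which is the context saving the commit describes: a rule already enforced by a hook is not repeated in the prompt.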
… sessions + zero reversals)

Applies the polyclaude council verdict for Phase 5: promotion to a hook is now EMPIRICAL, not theoretical. confidence >= 0.99 is necessary but no longer sufficient. A rule must also have:

- fire_count >= 10
- correction_event_ids spanning >= 3 distinct sessions
- zero RULE_PATCH_REVERTED or HOOK_DEMOTED events in the last 30 days

Gate wiring:

- rule_to_hook.py: new _passes_empirical_gate() + count_distinct_sessions() + count_human_reversals() helpers. promote() takes an optional lesson= kwarg and short-circuits with the gate reason when the lesson flunks.
- self_improvement.py: the inline try_generate() path inside graduate() runs the same gate before rendering; failing lessons stay as text injection and log a debug line.

Event wiring:

- New module constants RULE_PATCH_REVERTED and HOOK_DEMOTED.
- cli.py:cmd_rule_remove emits RULE_PATCH_REVERTED on any demote (text-metadata flip or hook-file removal) plus HOOK_DEMOTED when a .js file is actually deleted.
- rule_to_hook.demote() also mirrors the existing RULE_TO_HOOK_REMOVED emit as a HOOK_DEMOTED event so the reversal counter sees it.

AST / determinism class registry hook:

- _types.Lesson gains determinism_class: str | None = None. Unused today; reserved for a future AST-transform registry (shell substitution, path enforcement, etc.). Deferred to a separate PR per council.

Tests:

- 6 new tests: gate blocks at fire_count<10, at distinct-sessions<3, and when a recent reversal exists; gate passes when all criteria met; determinism_class field exists + defaults None; HOOK_DEMOTED emitted on demote alongside RULE_TO_HOOK_REMOVED.
- Existing test_graduate_promotes_and_installs_hook_for_em_dash and test_graduation_auto_promotes_deterministic_rule bumped to satisfy the new gate (fire_count 10, 3-session proof chain, seeded events.jsonl).

2422 pass / 23 skip. Ruff clean on the five new helpers + determinism_class.

Co-Authored-By: Gradata <noreply@gradata.ai>
eba3127 to 0d30351
…() (Phase 1) (#77)

* feat(self-healing): close the auto-heal loop

brain.auto_heal() reads recent RULE_FAILURE events, runs review_rule_failures, gates each candidate through retroactive_test, and applies the survivors via brain.patch_rule. Batches are deduped by (category, original_description) and hard-capped at max_patches (default 5) so a single session can't rewrite a rule multiple times.

Before this commit, PR #21 emitted RULE_FAILURE but nothing consumed it. The full flow required the caller to manually call review_rule_failures + brain.patch_rule as the E2E test does.

Adds auto_heal_failures(brain, ...) orchestrator in self_healing.py and Brain.auto_heal public method that delegates to it. 6 new tests covering empty input, passing candidate, retro-test failure, in-batch dedup, max_patches cap, and event-log read when no events are passed.

* feat(self-healing): wire auto_heal into brain.correct() by default

When brain.correct() detects a RULE_FAILURE it now invokes auto_heal_failures inline, capped at one patch per call so a single correction can rewrite at most the rule it just failed. Skipped in dry_run, approval_required, and renter modes; opt-out via correct(auto_heal=False).

Adds auto_heal kwarg to brain.correct + brain_correct in _core.py and surfaces the heal summary on event["auto_healed"] when a patch lands. 3 new tests covering the default-on path, the opt-out path, and the dry_run skip.

Closes the diagnose-then-do-nothing gap that PR #21 left open.

* fix(self-healing): default auto_heal=False per council; add PatchReceipt for visibility

Polyclaude council verdict (4/4 FALSE) on Phase 1: default auto_heal=True is too hot. Ship False default plus a visible PatchReceipt so any auto-heal that does fire is loud, not silent. Flip to True deferred until (a) a pending_patches review queue and (b) ablation showing rule-quality lift on a held-out broken-rule corpus.

- brain.correct() and _core.brain_correct() default auto_heal to False
- auto_heal_failures() returns PatchReceipt dicts: rule_id, old/new confidence, patch_diff, revert_command (legacy fields kept for BC)
- _core emits one stderr line per successful auto-heal patch so silent rule edits cannot sneak through
- tests: opt-in explicit auto_heal=True on existing integration tests, plus new test_default_off_no_patch and test_patch_receipt_shape

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(self-healing): address CodeRabbit review on #77 (docstring + type ann + test fixture)

- Document auto_heal parameter on Brain.correct (defaults to False post-council)
- Add forward-ref Brain type annotation on auto_heal_failures via TYPE_CHECKING
- Hoist brain_with_rule to module-scope fixture and remove 5 duplicated class-level copies
- Split weak test_auto_heal_triggers_on_rule_failure into two deterministic tests: one asserts rule_failure_detected only, the second drives auto_heal_failures with known-good delta words and asserts patched==1 plus auto_heal: reason prefix

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(self-healing): address CodeRabbit on inline auto-heal coverage; rebase onto main

Rewrite TestSelfHealingE2E::test_inline_auto_heal_emits_rule_patched and TestCorrectAutoHealIntegration::test_auto_heal_emits_patch_when_retroactive_test_passes to exercise ONLY the inline `brain.correct(auto_heal=True)` path. Remove the manual `review_rule_failures()` + `brain.patch_rule()` fallback so a regression in the inline orchestration fails the test instead of being masked by the fallback emitting RULE_PATCHED on its own.

Also address three smaller CR items:

- `_core.py`: forward `auto_heal` through the cloud branch by falling back to the local pipeline when the flag is set (cloud client does not yet carry the flag), so `auto_heal=True` is no longer silently dropped in cloud-connected brains.
- `_core.py`: route the auto-heal notice through `_log.warning` instead of `print(..., file=sys.stderr)` to keep the SDK logging pipeline consistent.
- `brain.py`: invalidate `_rule_cache` after `Brain.auto_heal()` applies patches, so later `apply_brain_rules()` calls see the patched prompt.

Rebased onto origin/main (includes #76 cloud split, #80 distribution, #81 rule-to-hook auto-promotion). Rebase was conflict-free -- zero file overlap between this branch and the three landed PRs.

Co-Authored-By: Gradata <noreply@gradata.ai>

* fix(self-healing): use contextlib.suppress for cache invalidation (Ruff SIM105)

CodeRabbit's 3rd review on PR #77 + CI Ruff job both flagged the try/except/pass block at brain.py:548-552 as SIM105 violation. Replaced with contextlib.suppress(Exception) (contextlib already imported at line 51). Behavior unchanged.

Co-Authored-By: Gradata <noreply@gradata.ai>

---------

Co-authored-by: Gradata <noreply@gradata.ai>
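The batch handling described in the first commit of this squash (dedup by `(category, original_description)`, hard cap at `max_patches`) might look something like the sketch below. The function name `select_candidates` and the dict shape of a candidate are hypothetical, and the sketch assumes the cap is applied to the deduplicated list before patching; the real `auto_heal_failures` may order these steps differently.

```python
def select_candidates(candidates: list[dict], max_patches: int = 5) -> list[dict]:
    """Drop in-batch duplicates, then stop once the cap is reached.

    Hypothetical helper: each candidate is assumed to be a dict with
    "category" and "original_description" keys.
    """
    seen: set[tuple[str, str]] = set()
    selected: list[dict] = []
    for cand in candidates:
        key = (cand["category"], cand["original_description"])
        if key in seen:
            continue  # same rule already queued in this batch
        seen.add(key)
        selected.append(cand)
        if len(selected) >= max_patches:
            break  # hard cap reached
    return selected


# Seven distinct rules, with the first one repeated three times.
batch = [{"category": "style", "original_description": "rule 0"}] * 3 + [
    {"category": "style", "original_description": f"rule {i}"} for i in range(1, 7)
]
picked = select_candidates(batch)
```

With the default cap, `picked` holds five entries and "rule 0" appears once, so one session cannot rewrite the same rule repeatedly.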
…test LOC) (#90)

Deletes dead code flagged in the autoresearch leanness audit after grep-verifying that no runtime import path exists. All 2453 tests pass.

Source files removed (2101 LOC):

- src/gradata/contrib/enhancements/outcome_feedback.py (1 LOC, docstring stub)
- src/gradata/enhancements/super_meta_rules.py (197 LOC, no importers; SuperMetaRule dataclass + SQL table live in meta_rules.py / meta_rules_storage.py and remain wired)
- src/gradata/enhancements/pubsub_pipeline.py (49 LOC, test-only)
- src/gradata/rules/budget.py (43 LOC, test-only)
- src/gradata/rules/rw_lock.py (54 LOC, test-only)
- src/gradata/cloud/wiki_store.py (451 LOC, only cloud/__init__.py re-export + test)
- src/gradata/enhancements/rule_verifier.py (243 LOC, only manifest string + test reference)
- src/gradata/enhancements/rule_evolution.py (434 LOC, only manifest string + test references; contradiction_detector.py covers the live path via self_improvement.py:545)
- src/gradata/security/privacy_model.py (113 LOC, test + docs only; _core.py / brain.py / _export_brain.py grep-clean)
- src/gradata/benchmarks/swe_bench.py (516 LOC, docstring example + test only, no CLI/docs runtime reference)

Test files removed (1042 LOC): matching tests for each module plus targeted pruning of rule_evolution test classes (TestRuleConflicts, TestRuleRelationEnum, rule_evolution imports in TestIntegration) from tests/test_steals.py and the TestRuleABTesting block in tests/test_adaptations.py.

Registry + docstring updates:

- contrib/enhancements/install_manifest.py: drop rule_verifier from rule-integrity module components
- _manifest_helpers.py: drop rule_evolution from _core_modules
- enhancements/__init__.py: drop rule_verifier docstring line
- cloud/__init__.py: drop WikiStore lazy re-export
- enhancements/meta_rules_storage.py: docstring no longer points at the deleted super_meta_rules.py

NOT DELETED (verified live via PRs #77/#81/#86):

- enhancements/rule_ranker.py, self_healing.py, rule_canary.py, rule_to_hook.py (all have runtime callers)
- middleware/ (flagged empty in the audit but actually contains _core.py + 4 adapters; kept)
- src/gradata/graphify-out/ (did not exist in this tree)

Tests: 2453 passed, 24 skipped (test_integration_full.py ignored per task spec).

Co-authored-by: Gradata <noreply@gradata.ai>
Summary
PR #26 + #30 shipped hook manifest management. Most of Phase 5 was already built (graduation calls `classify_rule()` + sets `metadata.how_enforced="hooked"`; PreToolUse already honored it). Three real gaps closed here:

- `inject_brain_rules.py` (SessionStart) was NOT filtering hook-enforced rules, so they kept wasting context.
- `jit_inject.py` (UserPromptSubmit per-call injection) had the same gap.
- Rule removal (`cmd_rule_remove`) did not reset the `metadata.how_enforced` field, only the legacy `[hooked]` description prefix.

Follow-up commit `eba3127` adds the council empirical promotion gate (below).

What ships

- `src/gradata/enhancements/rule_to_hook.py`: public `promote()`/`demote()` API + empirical gate helpers (`count_distinct_sessions`, `count_human_reversals`, `_passes_empirical_gate`) + `RULE_PATCH_REVERTED`/`HOOK_DEMOTED` event constants.
- `src/gradata/enhancements/self_improvement.py`: inline graduation path now runs the empirical gate before installing a hook.
- `src/gradata/hooks/inject_brain_rules.py`: SessionStart skips hook-enforced rules.
- `src/gradata/hooks/jit_inject.py`: JIT skips hook-enforced rules.
- `src/gradata/cli.py:cmd_rule_remove`: round-trips `metadata.how_enforced` back to `"injected"`, emits `RULE_PATCH_REVERTED` + `HOOK_DEMOTED` on demote.
- `src/gradata/_types.py:Lesson.determinism_class`: reserved field for future AST-class routing (unused today).

Council empirical gate (commit `eba3127`)

Confidence >= 0.99 is necessary but no longer sufficient. Promotion now requires all of:

- `lesson.fire_count >= 10`
- `correction_event_ids` spanning >= 3 distinct sessions
- zero `RULE_PATCH_REVERTED` or `HOOK_DEMOTED` events tied to the rule_id in the last 30 days

Failing lessons stay as text injection with a debug log.
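As a rough sketch of how the gate criteria above combine, assuming a simplified lesson and event shape: the `Lesson` dataclass here is a hypothetical reduction (it stores one session id per correction event directly, whereas the real code derives sessions from `correction_event_ids`), events are assumed to be dicts with `type`, `rule_id`, and `ts` keys, and `passes_empirical_gate` is an illustrative name, not the project's `_passes_empirical_gate` signature.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

MIN_CONFIDENCE = 0.99
MIN_FIRE_COUNT = 10
MIN_DISTINCT_SESSIONS = 3
REVERSAL_WINDOW = timedelta(days=30)
REVERSAL_EVENTS = {"RULE_PATCH_REVERTED", "HOOK_DEMOTED"}


@dataclass
class Lesson:
    """Hypothetical, simplified lesson record."""
    rule_id: str
    confidence: float
    fire_count: int
    # Assumed shape: the session id of each correction event.
    correction_sessions: list[str] = field(default_factory=list)


def passes_empirical_gate(lesson: Lesson, events: list[dict], now: datetime) -> tuple[bool, str]:
    """Return (passed, reason); every criterion must hold."""
    if lesson.confidence < MIN_CONFIDENCE:
        return False, "confidence below 0.99"
    if lesson.fire_count < MIN_FIRE_COUNT:
        return False, "fire_count below 10"
    if len(set(lesson.correction_sessions)) < MIN_DISTINCT_SESSIONS:
        return False, "fewer than 3 distinct sessions"
    cutoff = now - REVERSAL_WINDOW
    for ev in events:
        recent = ev["ts"] >= cutoff
        if ev["type"] in REVERSAL_EVENTS and ev["rule_id"] == lesson.rule_id and recent:
            return False, "reversal in the last 30 days"
    return True, "ok"


now = datetime(2025, 1, 31, tzinfo=timezone.utc)  # arbitrary illustrative date
lesson = Lesson("r1", confidence=0.995, fire_count=12,
                correction_sessions=["s1", "s2", "s3", "s1"])
recent_reversal = {"type": "HOOK_DEMOTED", "rule_id": "r1", "ts": now - timedelta(days=5)}
old_reversal = {"type": "HOOK_DEMOTED", "rule_id": "r1", "ts": now - timedelta(days=45)}
```

In this sketch, `lesson` passes with no events or with only `old_reversal`, and is blocked by `recent_reversal`. Returning the reason alongside the boolean mirrors the commit's note that `promote()` short-circuits with the gate reason.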
Determinism classification
Reused existing `DETERMINISTIC_PATTERNS` regex table (8 templates: em-dash, file-size, secret-scan, destructive-block, root-file, auto-test, read-before-edit, fstring-block). Single source of truth.

Deferred to a separate PR
Per council verdict, out of scope for this PR:

- AST transforms such as shell substitution (`pip -> uv`) and path-enforcement.
- `determinism_class` registry (the `Lesson.determinism_class` field lands here as a typed hook but is not consumed).

Tests
2422 pass (+6 gate tests). Ruff clean on the new helpers. Existing graduation tests bumped to satisfy the empirical gate (fire_count 10, 3-session proof chain, seeded `events.jsonl`).

Out of scope (flagged for follow-up)
- `middleware/_core.py:RuleSource._ScoredLesson` doesn't carry `how_enforced`, so the middleware path still injects hooked rules. Separate refactor.

Commits

- `55f5283` feat(rule-to-hook): close auto-promotion loop in injection paths
- `eba3127` fix(rule-to-hook): empirical promotion gate per council (fire_count + sessions + zero reversals)

Co-Authored-By: Gradata <noreply@gradata.ai>
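For readers unfamiliar with the regex-table approach mentioned under "Determinism classification", a minimal sketch follows. The three regexes are invented for illustration; the actual `DETERMINISTIC_PATTERNS` entries are not shown in this PR, and `classify` is a hypothetical stand-in for `classify_rule()`.

```python
import re
from typing import Optional

# Illustrative patterns only; the project's real regexes are not shown here.
EXAMPLE_PATTERNS = [
    ("em-dash", re.compile(r"\bem[- ]?dash", re.IGNORECASE)),
    ("file-size", re.compile(r"\bfiles?\b.*\b(under|over|exceed)\b.*\blines\b", re.IGNORECASE)),
    ("secret-scan", re.compile(r"\b(secret|api key|credential)s?\b", re.IGNORECASE)),
]


def classify(description: str) -> Optional[str]:
    """Return the first matching template name, or None for text injection."""
    for name, pattern in EXAMPLE_PATTERNS:
        if pattern.search(description):
            return name
    return None
```

A description that matches no template returns `None`, which corresponds to the PR's fallback: anything not matching stays on the prompt-injection path.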