feat: S101 - v0.5.0 release prep, brain.scope API, integration tests, rule-to-hook, formula v3 #23
Co-Authored-By: Gradata <noreply@gradata.ai>
- Brain.scope(): convenience method delegating to apply_brain_rules with domain/task_type params; also adds a max_rules param to apply_brain_rules
- detect_cross_domain_candidates() in meta_rules: groups rules by description, returns universal candidates appearing in 3+ distinct domains
- suggest_scope_narrowing() in self_healing: narrows wildcard RuleScope fields using misfire context, returns None if already specific

15 new TDD tests in tests/test_scoped_brain.py. Full suite: 1834 passed.

Co-Authored-By: Gradata <noreply@gradata.ai>
…orrections Closes 3 integration testing gaps: atomic DB integrity under sequential writes, graceful degradation on empty brains, and correction-of-correction persistence. Co-Authored-By: Gradata <noreply@gradata.ai>
…orcement Adds rule_to_hook.py to the enhancements layer. Graduated RULE/META-RULE lessons above 0.90 confidence are classified via regex against 14 deterministic patterns (em-dash, file size, secret scan, test trigger, destructive commands, etc.) and promoted from prompt injection to enforcement hooks. Non-deterministic rules (tone, judgment, audience) stay as prompt injection unchanged. 14 tests pass; full suite 1848 passed, 23 skipped. Co-Authored-By: Gradata <noreply@gradata.ai>
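As a rough illustration of the promotion step this commit describes, the sketch below classifies rule descriptions against a small regex table and keeps only graduated lessons above the confidence threshold. The three patterns, the `Candidate` type, both function names, and the lesson dict keys (`state`, `confidence`, `description`) are assumptions made for the example; they are not the 14 patterns or the actual API in `rule_to_hook.py`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical pattern table: illustrative stand-ins only, not the
# 14 deterministic patterns shipped in rule_to_hook.py.
ILLUSTRATIVE_PATTERNS = {
    "em_dash": re.compile(r"\bem[- ]dash", re.IGNORECASE),
    "file_size": re.compile(r"\b(file|module)s?\b.*\b\d+\s*lines\b", re.IGNORECASE),
    "secret_scan": re.compile(r"\b(secret|api key|credential)s?\b", re.IGNORECASE),
}

GRADUATED_STATES = ("RULE", "META-RULE", "META_RULE")


@dataclass(frozen=True)
class Candidate:
    """A lesson judged deterministic enough to enforce with a hook."""
    description: str
    pattern_name: str
    confidence: float


def classify(description: str) -> Optional[str]:
    """Return the name of the first matching pattern, or None."""
    for name, pattern in ILLUSTRATIVE_PATTERNS.items():
        if pattern.search(description):
            return name
    return None


def hook_candidates(lessons: list, min_confidence: float = 0.90) -> list:
    """Keep graduated lessons above the threshold that match a pattern."""
    found = []
    for lesson in lessons:
        if lesson.get("state", "").upper() not in GRADUATED_STATES:
            continue
        confidence = float(lesson.get("confidence", 0.0))
        # Assumption: "above 0.90" is read as a strict comparison.
        if confidence <= min_confidence:
            continue
        name = classify(lesson.get("description", ""))
        if name is not None:
            found.append(Candidate(lesson["description"], name, confidence))
    return found
```

Descriptions that match no pattern (tone, judgment, audience) simply fall through and would stay as prompt injection.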
- Severity improvement weight 25→20 pts (rebalanced for new components)
- Active lessons weight 8→5 pts (quantity != quality, per expert consensus)
- NEW: cross-domain universality (0-5 pts, rewards universal pattern discovery)
- NEW: severity trend (0-3 pts, corrections getting less severe = deeper learning)
- 1,848 tests passing

Based on synthesis of 750 expert posts across 3 MiroFish sims (blind discovery, taxonomy audit, pattern detection red team) + ablation baseline showing brain rules beat config 69.9% of the time.

Co-Authored-By: Gradata <noreply@gradata.ai>
375x812-*.png were auto-generated by a responsive testing tool and accidentally committed in PR #20.

Co-Authored-By: Gradata <noreply@gradata.ai>
Caution: Review failed
The pull request is closed.
📝 Walkthrough
Releases v0.5.0: adds rule-to-hook graduation, cross-domain rule detection, scope-narrowing self-healing, the Brain.scope API and max_rules, manifest scoring updates, new enhancement modules, and extensive tests and docs; version strings and changelog updated.
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~40 minutes
🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
    import json
    from collections import defaultdict
Stdlib imports inside function body
json and collections.defaultdict are stdlib modules with no circular-import risk. Moving them to the module-level top-of-file imports keeps the file consistent with every other module in enhancements/ and avoids the minor per-call import lookup overhead.
(Move import json and from collections import defaultdict to the module-level imports alongside hashlib, re, etc.)
Path: src/gradata/enhancements/meta_rules.py
Line: 495-496
Actionable comments posted: 12
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@CHANGELOG.md`:
- Around line 5-21: The markdown headings "### Added", "### Fixed", and "###
Changed" are missing the required blank line after each heading (MD022); update
the CHANGELOG.md so that there is exactly one empty line immediately following
each of these subsection headings ("### Added", "### Fixed", "### Changed")
before the list items to satisfy markdownlint. Ensure you add the blank line for
each of these three headings and keep the rest of the content unchanged.
In `@docs/superpowers/plans/2026-04-10-s101-master-plan.md`:
- Around line 100-101: The doc contains a hardcoded Windows absolute path string
referencing ablation_experiment.py
("C:/Users/olive/SpritesWork/brain/scripts/ablation_experiment.py"), which
breaks portability; change those occurrences (including the other similar
entries around lines 126–127) to a repository-relative path (e.g.,
scripts/ablation_experiment.py or ./scripts/ablation_experiment.py) or a generic
placeholder (<repo_root>/scripts/ablation_experiment.py), and update any similar
hardcoded Windows-style entries so the plan uses relative/portable paths instead
of developer-specific absolutes.
In `@src/gradata/_manifest_quality.py`:
- Around line 446-449: The code currently uses severity_ratio directly when
computing the score in the manifest quality logic; clamp severity_ratio into the
[0,1] range before applying it (e.g., replace direct use of severity_ratio with
a clamped value like min(max(severity_ratio, 0.0), 1.0)) so score +=
clamped_severity_ratio * 20; leave the existing else branch that reduces
max_achievable when severity_ratio is None unchanged; update references around
the severity_ratio usage in this scoring block inside the _manifest_quality
flow.
- Around line 410-412: The new v3 fields cross_domain_rules, total_rules, and
severity_trend_improving are defaulting to 0/False and not being passed into
_compound_score(), causing underreported scores; either (A) update the call site
in _manifest_metrics.py where _compound_score(...) is invoked (around the
current 359-369 block) to pass the actual cross_domain_rules, total_rules, and
severity_trend_improving values from the manifest/metrics you compute, and
adjust max_achievable accordingly, or (B) if those real values are not yet
available, set conservative computed defaults before calling _compound_score()
(e.g., compute total_rules and cross_domain_rules from existing rule lists and
derive severity_trend_improving from recent severity history) so the call to
_compound_score(name, severity, stability_score, cross_domain_rules,
total_rules, severity_trend_improving, ...) receives proper non-zero inputs;
ensure the same fix is applied to the other occurrence of the call around lines
526-540.
In `@src/gradata/enhancements/meta_rules.py`:
- Around line 471-515: In detect_cross_domain_candidates, only consider
graduated RULE lessons before grouping: add an early filter in the loop (before
extracting domain/normalised) that skips any lesson that is not a rule and/or
not graduated (e.g. continue unless lesson.type == "RULE" and lesson.state ==
"GRADUATED" — or the project’s equivalent attributes, e.g. lesson.kind/status),
so groups (the dict keyed by normalised) only accumulates (domain, confidence)
for graduated RULE lessons.
In `@src/gradata/enhancements/rule_to_hook.py`:
- Around line 60-72: Validate confidence inputs for both classify_rule and
find_hook_candidates: ensure the confidence argument is within [0.0, 1.0] before
any promotion logic (e.g., before returning a HookCandidate or comparing against
min_confidence). If the value is outside bounds, either clamp it to 0.0/1.0 or
(preferred) raise a ValueError with a clear message so callers cannot promote
invalid confidences; update references in classify_rule, find_hook_candidates,
and any uses of HookCandidate or min_confidence/DETERMINISTIC_PATTERNS to
perform this validation early.
- Around line 98-100: The loop in find_hook_candidates (in rule_to_hook.py) is
reading lesson.get("status", "") instead of the canonical lesson key "state",
causing RULE/META-RULE lessons to be skipped; update that check to use
lesson.get("state", "").upper() and keep the same membership test against
("RULE", "META-RULE", "META_RULE") so the function recognizes SDK lesson
payloads without an adapter step.
In `@src/gradata/enhancements/self_healing.py`:
- Around line 372-385: The loop currently treats any context-provided value as a
narrowing even when that value equals the field default (e.g., stakes="normal"),
causing false-positive narrowings; update the loop in the self-healing logic to
skip applying/note a narrowing when the context value equals the default by
adding a guard (e.g., if context_val == default_val: continue) before the branch
that sets narrowed[field_name] and did_narrow, so did_narrow is only set when
the context provides a non-default specific value that actually tightens a
wildcard (variables: defaults, misfire_context, context_val, default_val,
current, narrowed, did_narrow).
- Around line 337-364: Move the RuleScope import out of the function and into
the module's TYPE_CHECKING block so the type is known to static checkers, then
update suggest_scope_narrowing's annotations to use unquoted names (RuleScope
and RuleScope | None) and remove the in-function "from gradata._scope import
RuleScope" import; ensure the module imports typing.TYPE_CHECKING (and add "from
__future__ import annotations" at top if the codebase requires postponed
evaluation) so at runtime the import is skipped but type checkers see RuleScope.
In `@tests/test_failure_modes.py`:
- Around line 28-31: The test test_search_with_no_index is asserting that
brain.search(...) returns either a list or dict, but per brain.search its return
type is list[dict] and all paths return a list; update the assertion in
test_search_with_no_index to assert that results is a list (e.g., assert
isinstance(results, list)) and optionally tighten further by asserting the
element type (e.g., assert all(isinstance(r, dict) for r in results)) so the
test matches the behavior of brain.search.
In `@tests/test_rule_to_hook.py`:
- Around line 11-48: Consolidate the repetitive methods in TestClassifyRule by
replacing the nine nearly identical test_... methods with a single parametrized
test that calls classify_rule for each input; use pytest.mark.parametrize to
supply tuples of (input_text, expected_determinism, expected_enforcement) and
assert result.determinism == DeterminismCheck... and, when provided,
result.enforcement == EnforcementType...; keep the class name TestClassifyRule
and reference classify_rule, DeterminismCheck, and EnforcementType so each case
(e.g., "Never use em dashes in prose" → DeterminismCheck.REGEX_PATTERN,
EnforcementType.HOOK) is covered by the parameter list.
In `@tests/test_scoped_brain.py`:
- Around line 159-163: Replace the fragile source-inspection assertion in
tests/test_scoped_brain.py with a behavioral check: call
self_healing.suggest_scope_narrowing and verify it uses RuleScope by importing
RuleScope from gradata._scope and asserting the returned value or constructed
scope is an instance of RuleScope (or that the suggestion output references a
RuleScope object), rather than searching for an import string in the function
source; update the test function
test_suggest_scope_narrowing_imports_rulescope_from_gradata_scope to perform
this runtime assertion against suggest_scope_narrowing.
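The `self_healing.py` item above (around lines 372-385) asks that a context value equal to the field default not count as a narrowing. A minimal sketch of that guard follows; `Scope` is a hypothetical stand-in for `gradata._scope.RuleScope`, and its fields and defaults are assumed for illustration.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional


@dataclass(frozen=True)
class Scope:
    """Hypothetical stand-in for RuleScope; defaults act as wildcards."""
    domain: str = ""
    task_type: str = ""
    stakes: str = "normal"


def suggest_narrowing(scope: Scope, misfire_context: dict) -> Optional[Scope]:
    """Return a tightened copy of scope, or None if nothing narrows."""
    changes = {}
    for field in fields(scope):
        default_val = field.default
        current = getattr(scope, field.name)
        context_val = misfire_context.get(field.name)
        if current != default_val:
            continue  # Field is already specific; leave it alone.
        if context_val is None or context_val == default_val:
            continue  # Guard: a default-valued context tightens nothing.
        changes[field.name] = context_val
    return replace(scope, **changes) if changes else None
```

With this guard, a misfire context of `{"stakes": "normal"}` returns `None` instead of reporting a narrowing that changes nothing.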
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b5cce382-8167-4663-b13f-dec64eabcd81
⛔ Files ignored due to path filters (3)
- 375x812-desktop.png is excluded by !**/*.png
- 375x812-mobile.png is excluded by !**/*.png
- 375x812-tablet.png is excluded by !**/*.png
📒 Files selected for processing (16)
- CHANGELOG.md
- docs/superpowers/plans/2026-04-10-s101-master-plan.md
- docs/superpowers/specs/2026-04-10-s101-session-plan.md
- pyproject.toml
- src/gradata/__init__.py
- src/gradata/_manifest_quality.py
- src/gradata/brain.py
- src/gradata/enhancements/__init__.py
- src/gradata/enhancements/meta_rules.py
- src/gradata/enhancements/rule_to_hook.py
- src/gradata/enhancements/self_healing.py
- tests/test_atomic_writes.py
- tests/test_cascading_corrections.py
- tests/test_failure_modes.py
- tests/test_rule_to_hook.py
- tests/test_scoped_brain.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
src/gradata/**/*.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (`from __future__ import annotations` required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
- src/gradata/__init__.py
- src/gradata/enhancements/__init__.py
- src/gradata/enhancements/self_healing.py
- src/gradata/enhancements/meta_rules.py
- src/gradata/brain.py
- src/gradata/_manifest_quality.py
- src/gradata/enhancements/rule_to_hook.py
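The configured rule that confidence values must be in [0.0, 1.0] could be enforced with a small early check, roughly as sketched here. `validate_confidence` is a hypothetical helper name, not an existing SDK function, and raising (rather than clamping) follows the preference stated in the `rule_to_hook.py` review item.

```python
import math


def validate_confidence(confidence: float) -> float:
    """Return confidence as a float, or raise if it is not in [0.0, 1.0]."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        raise TypeError(f"confidence must be a number, got {type(confidence).__name__}")
    value = float(confidence)
    if math.isnan(value) or not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence must be in [0.0, 1.0], got {confidence!r}")
    return value
```

Calling this at the top of a promotion function keeps an out-of-range value from ever being compared against a minimum-confidence threshold.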
tests/**
⚙️ CodeRabbit configuration file
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.
Files:
- tests/test_atomic_writes.py
- tests/test_rule_to_hook.py
- tests/test_cascading_corrections.py
- tests/test_failure_modes.py
- tests/test_scoped_brain.py
🪛 GitHub Actions: CI
src/gradata/enhancements/self_healing.py
[error] 338-338: pyright error: "RuleScope" is not defined (reportUndefinedVariable)
[error] 340-340: pyright error: "RuleScope" is not defined (reportUndefinedVariable)
src/gradata/brain.py
[warning] 896-896: pyright warning: Import "gradata.enhancements.memory_extraction" could not be resolved (reportMissingImports)
🪛 GitHub Actions: SDK CI
src/gradata/enhancements/self_healing.py
[error] 338-338: ruff F821: Undefined name RuleScope in type annotation rule_scope: "RuleScope".
[error] 340-340: ruff F821: Undefined name RuleScope in return annotation ) -> "RuleScope | None".
[error] 338-338: ruff UP037: Remove quotes from type annotation rule_scope: "RuleScope".
[error] 340-340: ruff UP037: Remove quotes from type annotation ) -> "RuleScope | None".
[error] 362-363: ruff I001: Import block is un-sorted or un-formatted; imports starting at from dataclasses import asdict and from gradata._scope import RuleScope need organizing.
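One common way to clear the F821/UP037 pair reported above is a `TYPE_CHECKING`-only import combined with postponed annotation evaluation. The sketch below shows the shape of that fix; the function body is placeholder logic written for this example, not the real narrowing algorithm.

```python
from __future__ import annotations

from dataclasses import replace
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Seen by pyright and ruff only; skipped at runtime, so there is
    # no import cost and no circular-import risk.
    from gradata._scope import RuleScope


def suggest_scope_narrowing(
    rule_scope: RuleScope,
    misfire_context: dict,
) -> RuleScope | None:
    """Placeholder body: copy context values onto matching fields."""
    changes = {
        key: value
        for key, value in misfire_context.items()
        if value and hasattr(rule_scope, key)
    }
    # replace() builds the new instance from the object's own class, so
    # the body needs no runtime reference to the RuleScope name.
    return replace(rule_scope, **changes) if changes else None
```

If the real body constructs `RuleScope(...)` directly, it would still need a runtime import; `dataclasses.replace` or `type(rule_scope)(...)` are two ways to avoid that, assuming `RuleScope` is a dataclass.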
🪛 LanguageTool
docs/superpowers/specs/2026-04-10-s101-session-plan.md
[uncategorized] ~73-~73: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...tern detection and matching #### BG-1: Open Source Scenario Research - Research agent (web...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
[uncategorized] ~75-~75: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...lerates adoption, nobody cares, delayed open source - Precedents: Redis, Elastic, MongoDB, ...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
docs/superpowers/plans/2026-04-10-s101-master-plan.md
[style] ~7-~7: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...el. Wave 2 depends on MiroFish results. Wave 3 depends on Wave 2. Tech Stack: P...
(ENGLISH_WORD_REPEAT_BEGINNING_RULE)
[uncategorized] ~141-~141: If this is a compound adjective that modifies the following noun, use a hyphen.
Context: ...~15 min total) --- ### Task 7: BG-1 — Open Source Scenario Research Output: `docs/su...
(EN_COMPOUND_ADJECTIVE_INTERNAL)
🪛 markdownlint-cli2 (0.22.0)
CHANGELOG.md
[warning] 5-5: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 13-13: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
[warning] 19-19: Headings should be surrounded by blank lines
Expected: 1; Actual: 0; Below
(MD022, blanks-around-headings)
🔇 Additional comments (10)
tests/test_rule_to_hook.py (1)
51-85: LGTM!
`TestFindHookCandidates` has good coverage: confidence filtering, status filtering, determinism filtering, empty input, and meta-rule inclusion. Assertions check specific values rather than just truthiness.

tests/test_atomic_writes.py (1)
5-26: LGTM!
Integration tests appropriately validate:
- Rapid sequential writes don't corrupt the database
- Single corrections are persisted
- Lessons file is created and not truncated
The use of `>= N` assertions is acceptable for integration tests verifying minimum persistence guarantees.

tests/test_cascading_corrections.py (1)
5-18: LGTM!
Tests appropriately validate cascading correction scenarios:
- Chained corrections (using previous `final` as new `draft`)
- Contradictory corrections that reverse draft/final relationships
Both scenarios correctly verify that all correction events are persisted.

src/gradata/brain.py (2)
474-503: LGTM!
The `max_rules` parameter addition is clean:
- Defaults to 10 (consistent with `apply_rules()` in rule_engine.py)
- Properly forwarded to the underlying `apply_rules()` call
- No breaking change to existing callers

505-533: LGTM!
The new `scope()` method is well-designed:
- Clear convenience wrapper around `apply_brain_rules()`
- Properly builds context dict from named parameters
- Task string fallback to `"general"` when both domain and task_type are empty
- Handles empty `agent_type` by converting to `None`
- Good docstring with Args/Returns documentation

tests/test_failure_modes.py (1)
5-26: LGTM!
Graceful degradation tests properly verify that core API methods return expected types under empty/minimal state:
- `apply_brain_rules()` → `str`
- `prove()` → `dict`
- `manifest()` → `dict`
- `correct()` with empty draft records event
- DB initialization creates `system.db`

docs/superpowers/plans/2026-04-10-s101-master-plan.md (1)
1-175: LGTM overall: well-structured execution plan.
The plan clearly defines:
- Wave dependencies (Wave 1 parallel → Wave 2 sequential → Wave 3 sequential)
- Prerequisites (API keys)
- Verification steps with expected outcomes
- Budget estimates

docs/superpowers/specs/2026-04-10-s101-session-plan.md (1)
1-133: LGTM!
Well-structured session plan with:
- Clear workstream definitions
- Budget breakdown per task
- Success criteria for ablation (>70% vs bare LLM, >55% vs config)
- ASCII dependency diagram for wave scheduling
- Prerequisites listed

tests/test_scoped_brain.py (2)
92-103: LGTM: good use of `pytest.approx` for floating-point comparison.
The test correctly uses `pytest.approx(expected, abs=1e-4)` for comparing `avg_confidence`, following the coding guidelines for floating-point assertions.

9-34: LGTM!
Comprehensive test coverage for all three features:
- `brain.scope()`: Tests various parameter combinations
- `detect_cross_domain_candidates()`: Tests threshold behavior, domain deduplication, confidence averaging
- `suggest_scope_narrowing()`: Tests wildcard narrowing, specific scope handling, empty context, partial narrowing
Also applies to: 51-115, 120-157
### Added
- Self-healing engine: rule failure detection + auto-patching (PR #21)
- Cloud backend: Supabase schema, FastAPI sync endpoint, Railway deploy config (PR #22)
- Wiki-aware rule injection: semantic boost from qmd wiki pages
- Notification system: `brain.on_notification()` API with 5 event formatters
- Supabase wiki store: pgvector semantic search for cloud rule injection
- 19 Python hooks + installer + profile system (PR #20)

### Fixed
- 210 ruff errors across 106 files
- Bandit false positives suppressed with explanations
- Flaky graduation test stabilized for Python 3.12
- CodeRabbit review findings across PRs #20, #21, #22

### Changed
- CI: added ruff>=0.4 to dev dependencies
- CI: fixed sdk-ci.yml paths (removed stale working-directory)
Add blank lines after the subsection headings.
markdownlint is right here: ### Added, ### Fixed, and ### Changed each need an empty line below them, or this section will keep failing MD022.
💡 Suggested fix
### Added
+
- Self-healing engine: rule failure detection + auto-patching (PR `#21`)
@@
### Fixed
+
- 210 ruff errors across 106 files
@@
### Changed
+
- CI: added ruff>=0.4 to dev dependencies
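For a changelog with many sections, the same MD022 fix can be applied mechanically. The following is a minimal sketch under two stated simplifications: it only handles ATX (`#`) headings, and it does not track fenced code blocks, so a `#` comment inside a fence would be treated as a heading. The function name is made up for this example.

```python
import re

# ATX heading: one to six '#' characters followed by whitespace or end of line.
HEADING = re.compile(r"^#{1,6}(\s|$)")


def add_blank_line_after_headings(markdown: str) -> str:
    """Insert one empty line below any heading directly followed by content."""
    lines = markdown.split("\n")
    out = []
    for index, line in enumerate(lines):
        out.append(line)
        has_next = index + 1 < len(lines)
        if HEADING.match(line) and has_next and lines[index + 1].strip():
            out.append("")
    return "\n".join(out)
```

The transformation is idempotent: a heading that already has a blank line below it is left untouched, so rerunning it on a fixed file produces no diff.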
- Create: `C:/Users/olive/SpritesWork/brain/scripts/ablation_experiment.py`
Hardcoded Windows paths reduce portability.
The paths C:/Users/olive/SpritesWork/brain/scripts/ are developer-specific and will fail for other team members or CI environments. Consider using relative paths from the repository root.
📝 Suggested fix
**Files:**
-- Create: `C:/Users/olive/SpritesWork/brain/scripts/ablation_experiment.py`
+- Create: `scripts/ablation_experiment.py`
@@
**Files:**
-- Create: `C:/Users/olive/SpritesWork/brain/scripts/sim_seeds_s101.py`
-- Modify: `C:/Users/olive/SpritesWork/brain/scripts/mirofish_sim_v2.py` (add import)
+- Create: `scripts/sim_seeds_s101.py`
+- Modify: `scripts/mirofish_sim_v2.py` (add import)
Also applies to: 126-127
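If the scripts themselves also need to locate files, resolving paths from the repository root avoids baking in a developer's home directory. The sketch below assumes the scripts live inside a checkout whose root contains `pyproject.toml`; both helper names are hypothetical.

```python
from pathlib import Path


def find_repo_root(start: Path, marker: str = "pyproject.toml") -> Path:
    """Walk upward from start until a directory containing marker is found."""
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no {marker} found at or above {start}")


def repo_path(start: Path, relative: str) -> Path:
    """Turn a repository-relative path into an absolute one."""
    return find_repo_root(start) / relative
```

A script could then call `repo_path(Path(__file__).parent, "scripts/ablation_experiment.py")` and get the same result on any machine or CI runner.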
    cross_domain_rules: int = 0,
    total_rules: int = 0,
    severity_trend_improving: bool = False,
The new v3 components aren’t wired into current callers yet.
cross_domain_rules, total_rules, and severity_trend_improving default to 0/0/False, and the existing _compound_score() call in src/gradata/_manifest_metrics.py:359-369 still omits all three. That means current manifests lose the new 5+3 points by default, with no max_achievable adjustment, so scores are systematically underreported until the metrics are plumbed through.
💡 Two safe ways to fix this
-    cross_domain_rules: int = 0,
-    total_rules: int = 0,
-    severity_trend_improving: bool = False,
+    cross_domain_rules: int | None = None,
+    total_rules: int | None = None,
+    severity_trend_improving: bool | None = None,
@@
-    if total_rules > 0 and cross_domain_rules > 0:
+    if total_rules is None or cross_domain_rules is None:
+        max_achievable -= 5
+    elif total_rules > 0 and cross_domain_rules > 0:
         universality = min(1.0, cross_domain_rules / max(3, total_rules * 0.2))
         score += universality * 5
@@
-    if severity_trend_improving:
+    if severity_trend_improving is None:
+        max_achievable -= 3
+    elif severity_trend_improving:
         score += 3.0
Or: update the _manifest_metrics.py call site to pass real values for all three new inputs immediately.
Also applies to: 526-540
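To make the effect of the `None`-aware variant concrete, the two new components can be isolated into one helper, as sketched here. The formula and point values are taken from the suggested diff; the function name and the `(earned, available)` return shape are assumptions for illustration and do not mirror `_compound_score()`.

```python
from typing import Optional


def v3_bonus(
    cross_domain_rules: Optional[int],
    total_rules: Optional[int],
    severity_trend_improving: Optional[bool],
) -> tuple:
    """Return (points_earned, points_available) for the two v3 components."""
    earned = 0.0
    available = 8.0  # 5 for universality + 3 for severity trend

    if total_rules is None or cross_domain_rules is None:
        available -= 5  # Not measured: remove it from the denominator.
    elif total_rules > 0 and cross_domain_rules > 0:
        universality = min(1.0, cross_domain_rules / max(3, total_rules * 0.2))
        earned += universality * 5

    if severity_trend_improving is None:
        available -= 3
    elif severity_trend_improving:
        earned += 3.0

    return earned, available
```

For example, 2 cross-domain rules out of 20 gives `2 / max(3, 4.0) = 0.5`, so 2.5 universality points; with an improving trend the total is 5.5 of 8. When all three inputs are `None`, both numbers are 0, so an unwired caller is neither rewarded nor penalized.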
     if severity_ratio is not None:
-        score += severity_ratio * 25
+        score += severity_ratio * 20
     else:
-        max_achievable -= 25
+        max_achievable -= 20
Clamp severity_ratio before scoring it.
This component now trusts the caller entirely, but the SDK guideline requires severity scoring inputs to stay in [0,1]. An out-of-range value will overweight or underweight the new v3 score.
💡 Suggested fix
     if severity_ratio is not None:
-        score += severity_ratio * 20
+        score += max(0.0, min(1.0, severity_ratio)) * 20
     else:
         max_achievable -= 20
As per coding guidelines, "Severity scoring must clamp to [0,1]."
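The clamp can also be pulled into a named helper so the intent is visible at the call site. This is a small sketch with hypothetical names (`clamp01`, `severity_points`); the weight of 20 comes from the v3 rebalance described in this PR.

```python
from typing import Optional


def clamp01(value: float) -> float:
    """Limit value to the closed interval [0.0, 1.0]."""
    return max(0.0, min(1.0, value))


def severity_points(severity_ratio: Optional[float], weight: float = 20.0) -> tuple:
    """Return (points_earned, points_available) for the severity component."""
    if severity_ratio is None:
        # Unknown ratio: contribute nothing and shrink the achievable maximum.
        return 0.0, 0.0
    return clamp01(severity_ratio) * weight, weight
```

NaN inputs are not rejected by this expression, so callers that might produce NaN would need a separate check before scoring.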
def detect_cross_domain_candidates(
    lessons: list,
    min_domains: int = 3,
) -> list[dict]:
    """Find rules that appear in 3+ distinct domains - universal candidates.

    Groups graduated rules by their normalised description. Any description
    that appears (across distinct domains) in at least *min_domains* different
    domains is returned as a cross-domain candidate.

    Args:
        lessons: Iterable of :class:`~gradata._types.Lesson` objects. Only
            lessons with a non-empty ``domain`` field in their ``scope_json``
            are considered.
        min_domains: Minimum distinct domain count to qualify (default 3).

    Returns:
        List of dicts, each with keys:
        - ``"description"`` (str): The normalised rule description.
        - ``"domains"`` (list[str]): Distinct domains where the rule appears.
        - ``"avg_confidence"`` (float): Mean confidence across all matching
          lessons.
        - ``"count"`` (int): Total number of matching lessons.
    """
    import json
    from collections import defaultdict

    # Map normalised_description -> list of (domain, confidence) pairs
    groups: dict[str, list[tuple[str, float]]] = defaultdict(list)

    for lesson in lessons:
        # Extract domain from scope_json
        domain = ""
        if lesson.scope_json:
            try:
                scope_data = json.loads(lesson.scope_json)
                domain = scope_data.get("domain", "") or ""
            except (json.JSONDecodeError, TypeError):
                domain = ""

        if not domain:
            continue  # Skip lessons without a domain

        normalised = lesson.description.strip()
        groups[normalised].append((domain, lesson.confidence))
Filter this to graduated RULE lessons before grouping.
Right now any lesson with a matching description and domain is counted. That means repeated INSTINCT/PATTERN entries can be promoted as “cross-domain rules”, which will overstate universality and distort the new compound-score component.
💡 Suggested fix
def detect_cross_domain_candidates(
- lessons: list,
+ lessons: list[Lesson],
min_domains: int = 3,
) -> list[dict]:
@@
import json
from collections import defaultdict
+ from gradata._types import LessonState
@@
for lesson in lessons:
+ if lesson.state != LessonState.RULE:
+ continue
# Extract domain from scope_json
domain = ""🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/meta_rules.py` around lines 471 - 515, In
detect_cross_domain_candidates, only consider graduated RULE lessons before
grouping: add an early filter in the loop (before extracting domain/normalised)
that skips any lesson that is not a rule and/or not graduated (e.g. continue
unless lesson.type == "RULE" and lesson.state == "GRADUATED" — or the project’s
equivalent attributes, e.g. lesson.kind/status), so groups (the dict keyed by
normalised) only accumulates (domain, confidence) for graduated RULE lessons.
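A minimal sketch of the filtered grouping described in this comment. The `Lesson` and `LessonState` classes here are hypothetical stand-ins for the project's real types in `gradata._types`, whose exact fields are not shown in this thread; the extra `AttributeError` guard is also an assumption, added for scope JSON that is valid but not an object.

```python
import json
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class LessonState(Enum):  # hypothetical stand-in for gradata._types.LessonState
    INSTINCT = "INSTINCT"
    PATTERN = "PATTERN"
    RULE = "RULE"


@dataclass
class Lesson:  # hypothetical stand-in; the real Lesson may carry more fields
    description: str
    confidence: float
    state: LessonState
    scope_json: str = ""


def detect_cross_domain_candidates(lessons, min_domains=3):
    """Group graduated RULE lessons by description and keep multi-domain ones."""
    groups = defaultdict(list)  # description -> [(domain, confidence), ...]
    for lesson in lessons:
        if lesson.state != LessonState.RULE:
            continue  # INSTINCT/PATTERN entries must not count towards universality
        domain = ""
        if lesson.scope_json:
            try:
                domain = json.loads(lesson.scope_json).get("domain", "") or ""
            except (json.JSONDecodeError, TypeError, AttributeError):
                domain = ""
        if not domain:
            continue  # Skip lessons without a domain
        groups[lesson.description.strip()].append((domain, lesson.confidence))

    candidates = []
    for description, pairs in groups.items():
        domains = sorted({d for d, _ in pairs})
        if len(domains) >= min_domains:
            candidates.append({
                "description": description,
                "domains": domains,
                "avg_confidence": sum(c for _, c in pairs) / len(pairs),
                "count": len(pairs),
            })
    return candidates
```

With the filter in place, a description repeated only as PATTERN entries across three domains no longer produces a candidate, while the same description graduated to RULE in three domains does.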
def suggest_scope_narrowing(
    rule_scope: "RuleScope",
    misfire_context: dict,
) -> "RuleScope | None":
    """Suggest a narrowed RuleScope based on a misfire context.

    Inspects each field of *rule_scope*. If a field is a wildcard (holds its
    default value) **and** the *misfire_context* contains a specific value for
    that field, the returned scope sets that field to the context value,
    narrowing the scope to the observed failure context.

    If every field already matches the context (no narrowing possible), or the
    context provides no scope-relevant keys, returns ``None``.

    Args:
        rule_scope: The current :class:`~gradata._scope.RuleScope` attached to
            the rule.
        misfire_context: Dict describing where the rule fired incorrectly.
            Supported keys: ``domain``, ``task_type``, ``audience``,
            ``channel``, ``stakes``, ``agent_type``, ``namespace``.

    Returns:
        A new narrowed :class:`~gradata._scope.RuleScope`, or ``None`` if the
        scope already matches the context or no narrowing is applicable.
    """
    from dataclasses import asdict
    from gradata._scope import RuleScope
🧩 Analysis chain
🏁 Script executed:
# First, check if the file exists and get its structure
head -20 src/gradata/enhancements/self_healing.py
# Get line count to understand file size
wc -l src/gradata/enhancements/self_healing.py
# Now read the section around lines 337-364 with context
sed -n '1,50p' src/gradata/enhancements/self_healing.py
Repository: Gradata/gradata
Length of output: 2707
🏁 Script executed:
# Check imports at the top of the file
sed -n '1,100p' src/gradata/enhancements/self_healing.py | cat -n
Repository: Gradata/gradata
Length of output: 4081
🏁 Script executed:
# Read lines 337-380 to see the full function definition
sed -n '337,380p' src/gradata/enhancements/self_healing.py | cat -n
Repository: Gradata/gradata
Length of output: 2164
Import RuleScope under TYPE_CHECKING to resolve type checker errors.
The function signature references RuleScope with quoted annotations ("RuleScope" and "RuleScope | None"), but RuleScope is not available at module scope; it is only imported inside the function body (line 363). This causes Ruff (F821) and Pyright to report undefined-name errors. Move the import to the existing TYPE_CHECKING block and use unquoted annotations:
Fix
 if TYPE_CHECKING:
+    from gradata._scope import RuleScope
     from gradata._types import Lesson
Then update the function signature:
 def suggest_scope_narrowing(
-    rule_scope: "RuleScope",
+    rule_scope: RuleScope,
     misfire_context: dict,
-) -> "RuleScope | None":
+) -> RuleScope | None:
🧰 Tools
🪛 GitHub Actions: CI
[error] 338-338: pyright error: "RuleScope" is not defined (reportUndefinedVariable)
[error] 340-340: pyright error: "RuleScope" is not defined (reportUndefinedVariable)
🪛 GitHub Actions: SDK CI
[error] 338-338: ruff F821: Undefined name RuleScope in type annotation rule_scope: "RuleScope".
[error] 340-340: ruff F821: Undefined name RuleScope in return annotation ) -> "RuleScope | None".
[error] 338-338: ruff UP037: Remove quotes from type annotation rule_scope: "RuleScope".
[error] 340-340: ruff UP037: Remove quotes from type annotation ) -> "RuleScope | None".
[error] 362-363: ruff I001: Import block is un-sorted or un-formatted; imports starting at from dataclasses import asdict and from gradata._scope import RuleScope need organizing.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/self_healing.py` around lines 337 - 364, Move the
RuleScope import out of the function and into the module's TYPE_CHECKING block
so the type is known to static checkers, then update suggest_scope_narrowing's
annotations to use unquoted names (RuleScope and RuleScope | None) and remove
the in-function "from gradata._scope import RuleScope" import; ensure the module
imports typing.TYPE_CHECKING (and add "from __future__ import annotations" at
top if the codebase requires postponed evaluation) so at runtime the import is
skipped but type checkers see RuleScope.
    for field_name, default_val in defaults.items():
        context_val = misfire_context.get(field_name, "")
        if not context_val:
            continue  # Context provides no signal for this field

        rule_val = current[field_name]
        if rule_val == default_val:
            # Field is wildcard in rule but context has a specific value → narrow
            narrowed[field_name] = context_val
            did_narrow = True
        # If rule already has a specific value equal to the context, no change needed

    if not did_narrow:
        return None  # Scope already fully constrained or context gives no new info
Don't count default-valued context as a narrowing.
stakes defaults to "normal", so a misfire context like {"stakes": "normal"} sets did_narrow = True even though the returned scope is identical to the input. That violates the function contract and will emit false-positive narrowing suggestions.
💡 Suggested fix
         rule_val = current[field_name]
-        if rule_val == default_val:
+        if rule_val == default_val and context_val != default_val:
             # Field is wildcard in rule but context has a specific value → narrow
             narrowed[field_name] = context_val
             did_narrow = True
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/self_healing.py` around lines 372 - 385, The loop
currently treats any context-provided value as a narrowing even when that value
equals the field default (e.g., stakes="normal"), causing false-positive
narrowings; update the loop in the self-healing logic to skip applying/note a
narrowing when the context value equals the default by adding a guard (e.g., if
context_val == default_val: continue) before the branch that sets
narrowed[field_name] and did_narrow, so did_narrow is only set when the context
provides a non-default specific value that actually tightens a wildcard
(variables: defaults, misfire_context, context_val, default_val, current,
narrowed, did_narrow).
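A minimal sketch of the guarded loop, under stated assumptions: `RuleScope` below is a hypothetical three-field stand-in for `gradata._scope.RuleScope` (the real class has more fields), and deriving `defaults` from `dataclasses.fields` is one plausible approach, not necessarily what `self_healing.py` does.

```python
from dataclasses import asdict, dataclass, fields


@dataclass(frozen=True)
class RuleScope:  # hypothetical stand-in for gradata._scope.RuleScope
    domain: str = ""
    task_type: str = ""
    stakes: str = "normal"


def suggest_scope_narrowing(rule_scope, misfire_context):
    """Return a narrowed copy of rule_scope, or None if nothing changes."""
    defaults = {f.name: f.default for f in fields(RuleScope)}
    current = asdict(rule_scope)
    narrowed = dict(current)
    did_narrow = False

    for field_name, default_val in defaults.items():
        context_val = misfire_context.get(field_name, "")
        # Skip missing values and values equal to the default: neither one
        # tightens a wildcard field, so neither counts as a narrowing.
        if not context_val or context_val == default_val:
            continue
        if current[field_name] == default_val:
            narrowed[field_name] = context_val
            did_narrow = True

    return RuleScope(**narrowed) if did_narrow else None
```

In this sketch a context of `{"stakes": "normal"}` yields `None`, while `{"stakes": "high"}` yields a scope with only `stakes` changed; fields the rule already pins to a specific value are left untouched.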
    def test_search_with_no_index(self, tmp_path):
        brain = init_brain(tmp_path)
        results = brain.search("anything")
        assert isinstance(results, (list, dict))
Assertion is overly permissive: brain.search() always returns a list.
Per src/gradata/brain.py:927-938, search() has return type list[dict] and all code paths return a list. The dict alternative in the assertion is never reached.
🔧 Proposed fix
     def test_search_with_no_index(self, tmp_path):
         brain = init_brain(tmp_path)
         results = brain.search("anything")
-        assert isinstance(results, (list, dict))
+        assert isinstance(results, list)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_failure_modes.py` around lines 28 - 31, The test
test_search_with_no_index is asserting that brain.search(...) returns either a
list or dict, but per brain.search its return type is list[dict] and all paths
return a list; update the assertion in test_search_with_no_index to assert that
results is a list (e.g., assert isinstance(results, list)) and optionally
tighten further by asserting the element type (e.g., assert all(isinstance(r,
dict) for r in results)) so the test matches the behavior of brain.search.
class TestClassifyRule:
    def test_em_dash_rule_is_deterministic(self):
        result = classify_rule("Never use em dashes in prose", 0.95)
        assert result.determinism == DeterminismCheck.REGEX_PATTERN
        assert result.enforcement == EnforcementType.HOOK

    def test_file_size_rule_is_deterministic(self):
        result = classify_rule("Keep files under 500 lines", 0.92)
        assert result.determinism == DeterminismCheck.FILE_CHECK

    def test_secret_rule_is_deterministic(self):
        result = classify_rule("Never commit secrets or API keys", 0.98)
        assert result.determinism == DeterminismCheck.COMMAND_BLOCK

    def test_test_rule_is_deterministic(self):
        result = classify_rule("Run tests after code changes", 0.91)
        assert result.determinism == DeterminismCheck.TEST_TRIGGER

    def test_read_before_edit_is_deterministic(self):
        result = classify_rule("Always read a file before editing it", 0.93)
        assert result.determinism == DeterminismCheck.FILE_CHECK

    def test_destructive_command_is_deterministic(self):
        result = classify_rule("Never force push to main", 0.96)
        assert result.determinism == DeterminismCheck.COMMAND_BLOCK

    def test_tone_rule_is_not_deterministic(self):
        result = classify_rule("Be concise and direct", 0.91)
        assert result.determinism == DeterminismCheck.NOT_DETERMINISTIC
        assert result.enforcement == EnforcementType.PROMPT_INJECTION

    def test_judgment_rule_is_not_deterministic(self):
        result = classify_rule("Lead with the answer, not the reasoning", 0.90)
        assert result.determinism == DeterminismCheck.NOT_DETERMINISTIC

    def test_audience_rule_is_not_deterministic(self):
        result = classify_rule("Match formality to the audience", 0.92)
        assert result.determinism == DeterminismCheck.NOT_DETERMINISTIC
🧹 Nitpick | 🔵 Trivial
Consider parametrizing TestClassifyRule tests for better maintainability.
The 9 test methods follow a repetitive pattern that could be consolidated using @pytest.mark.parametrize. This would reduce code duplication and make it easier to add new test cases.
♻️ Proposed parametrized refactor
+import pytest
+from gradata.enhancements.rule_to_hook import (
+    DeterminismCheck,
+    EnforcementType,
+    classify_rule,
+    find_hook_candidates,
+)
+
+
 class TestClassifyRule:
-    def test_em_dash_rule_is_deterministic(self):
-        result = classify_rule("Never use em dashes in prose", 0.95)
-        assert result.determinism == DeterminismCheck.REGEX_PATTERN
-        assert result.enforcement == EnforcementType.HOOK
-
-    def test_file_size_rule_is_deterministic(self):
-        result = classify_rule("Keep files under 500 lines", 0.92)
-        assert result.determinism == DeterminismCheck.FILE_CHECK
-
-    # ... remaining similar tests ...
+    @pytest.mark.parametrize("description,confidence,expected_determinism,expected_enforcement", [
+        ("Never use em dashes in prose", 0.95, DeterminismCheck.REGEX_PATTERN, EnforcementType.HOOK),
+        ("Keep files under 500 lines", 0.92, DeterminismCheck.FILE_CHECK, EnforcementType.HOOK),
+        ("Never commit secrets or API keys", 0.98, DeterminismCheck.COMMAND_BLOCK, EnforcementType.HOOK),
+        ("Run tests after code changes", 0.91, DeterminismCheck.TEST_TRIGGER, EnforcementType.HOOK),
+        ("Always read a file before editing it", 0.93, DeterminismCheck.FILE_CHECK, EnforcementType.HOOK),
+        ("Never force push to main", 0.96, DeterminismCheck.COMMAND_BLOCK, EnforcementType.HOOK),
+        ("Be concise and direct", 0.91, DeterminismCheck.NOT_DETERMINISTIC, EnforcementType.PROMPT_INJECTION),
+        ("Lead with the answer, not the reasoning", 0.90, DeterminismCheck.NOT_DETERMINISTIC, EnforcementType.PROMPT_INJECTION),
+        ("Match formality to the audience", 0.92, DeterminismCheck.NOT_DETERMINISTIC, EnforcementType.PROMPT_INJECTION),
+    ])
+    def test_classify_rule_determinism(self, description, confidence, expected_determinism, expected_enforcement):
+        result = classify_rule(description, confidence)
+        assert result.determinism == expected_determinism
+        assert result.enforcement == expected_enforcement
As per coding guidelines: "parametrized tests preferred for boundary conditions."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_rule_to_hook.py` around lines 11 - 48, Consolidate the repetitive
methods in TestClassifyRule by replacing the nine nearly identical test_...
methods with a single parametrized test that calls classify_rule for each input;
use pytest.mark.parametrize to supply tuples of (input_text,
expected_determinism, expected_enforcement) and assert result.determinism ==
DeterminismCheck... and, when provided, result.enforcement ==
EnforcementType...; keep the class name TestClassifyRule and reference
classify_rule, DeterminismCheck, and EnforcementType so each case (e.g., "Never
use em dashes in prose" → DeterminismCheck.REGEX_PATTERN, EnforcementType.HOOK)
is covered by the parameter list.
def test_suggest_scope_narrowing_imports_rulescope_from_gradata_scope():
    import inspect
    from gradata.enhancements import self_healing
    source = inspect.getsource(self_healing.suggest_scope_narrowing)
    assert "from gradata._scope import RuleScope" in source
🧹 Nitpick | 🔵 Trivial
Source inspection test is fragile; consider alternative validation.
Inspecting source code for import statements is brittle; refactoring the import location would break this test without affecting functionality. Consider testing the actual behavior instead.
💡 Alternative approach
-def test_suggest_scope_narrowing_imports_rulescope_from_gradata_scope():
-    import inspect
-    from gradata.enhancements import self_healing
-    source = inspect.getsource(self_healing.suggest_scope_narrowing)
-    assert "from gradata._scope import RuleScope" in source
+def test_suggest_scope_narrowing_returns_rulescope_type():
+    """Verify returned object is actually a RuleScope from gradata._scope."""
+    from gradata._scope import RuleScope
+    from gradata.enhancements.self_healing import suggest_scope_narrowing
+    rule_scope = RuleScope(domain="", task_type="")
+    result = suggest_scope_narrowing(rule_scope, {"domain": "sales"})
+    assert result is not None
+    assert type(result).__module__ == "gradata._scope"
+    assert type(result).__name__ == "RuleScope"
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_scoped_brain.py` around lines 159 - 163, Replace the fragile
source-inspection assertion in tests/test_scoped_brain.py with a behavioral
check: call self_healing.suggest_scope_narrowing and verify it uses RuleScope by
importing RuleScope from gradata._scope and asserting the returned value or
constructed scope is an instance of RuleScope (or that the suggestion output
references a RuleScope object), rather than searching for an import string in
the function source; update the test function
test_suggest_scope_narrowing_imports_rulescope_from_gradata_scope to perform
this runtime assertion against suggest_scope_narrowing.
Co-Authored-By: Gradata <noreply@gradata.ai>
- CHANGELOG: add blank lines after subsection headings (MD022)
- _manifest_quality: clamp severity_ratio to [0,1], use None defaults for new v3 components with max_achievable adjustment
- meta_rules: move json + defaultdict imports to module level (Greptile)
- rule_to_hook: validate confidence bounds in [0,1] (CodeRabbit)
- self_healing: don't count default stakes="normal" as scope narrowing
- test_failure_modes: search() returns list, not dict

Co-Authored-By: Gradata <noreply@gradata.ai>
Summary
1,858 tests passing, 23 skipped.
Test plan
Generated with Gradata
Greptile Summary
This PR delivers the S101 milestone: v0.5.0 release prep, including `brain.scope()` for scoped rule injection into sub-agents, `detect_cross_domain_candidates()` and `suggest_scope_narrowing()` for meta-rule evolution, the `rule_to_hook` graduation pipeline, a rebalanced v3 compound score formula informed by MiroFish expert-panel sims, and three suites of integration tests (atomic writes, failure modes, cascading corrections).

The implementation is well-structured throughout: `from __future__ import annotations` is consistently applied, TYPE_CHECKING patterns correctly gate runtime imports, and the self-healing and scope-narrowing logic is clearly documented and tested.

Key changes:
- `brain.scope(domain, task_type, agent_type)`: thin wrapper over `apply_brain_rules` that builds a context dict from named parameters; 14 tests covering all code paths.
- `rule_to_hook.py`: new module with `classify_rule` / `find_hook_candidates`; 14 tests; clean StrEnum + dataclass design.
- `_manifest_quality.py`: v3 formula adds cross-domain universality (Component 9, 5 pts) and severity trend (Component 10, 3 pts), while reducing active-lessons weight from 8 → 5 pts. Total still sums to 100.
- `detect_cross_domain_candidates()` in `meta_rules.py`: groups lessons by normalised description × distinct domain, returns candidates matching 3+ domains.
- `suggest_scope_narrowing()` in `self_healing.py`: narrows wildcard fields in a `RuleScope` based on misfire context; correctly leaves non-wildcard fields unchanged.
- `_manifest_quality.py` is missing `from __future__ import annotations` (required by the project's custom style rule for all `*.py` SDK files).

Confidence Score: 4/5
Safe to merge after adding `from __future__ import annotations` to `_manifest_quality.py` and updating the CHANGELOG; neither issue causes a runtime failure.
1,858 tests passing, the new APIs (brain.scope, detect_cross_domain_candidates, suggest_scope_narrowing) all have thorough test coverage, the v3 formula math is correct (components still sum to 100), and the rule_to_hook module is clean and well-isolated. The two flagged issues are style/documentation concerns that don't affect runtime behaviour on the required Python ≥3.11 baseline. Score reflects one targeted fix remaining before the release tag.
`src/gradata/_manifest_quality.py` (missing future import) and `CHANGELOG.md` (incomplete 0.5.0 entry).

Important Files Changed
- `scope()` convenience method wrapping `apply_brain_rules` with named domain/task_type parameters; logic is correct and tests pass for all parameter combinations.
- `from __future__ import annotations` header required by project style rule.
- `detect_cross_domain_candidates()` grouping lessons by description × domain; correctly deduplicates same-domain entries and computes avg_confidence across all matching lessons.
- `suggest_scope_narrowing()` using dataclass introspection; correctly leaves non-wildcard fields unchanged and returns None when no narrowing is possible.

Sequence Diagram
    sequenceDiagram
        participant SA as Sub-Agent
        participant B as Brain.scope()
        participant ABR as apply_brain_rules()
        participant RE as RuleEngine
        participant RL as lessons.md
        SA->>B: scope(domain, task_type)
        B->>ABR: apply_brain_rules(task, context, agent_type)
        ABR->>RL: parse_lessons()
        RL-->>ABR: lessons[]
        ABR->>RE: apply_rules(lessons, build_scope(ctx), max_rules)
        RE-->>ABR: filtered and ranked rules
        ABR-->>B: format_rules_for_prompt()
        B-->>SA: rule string for prompt injection
        Note over SA,RL: Rule-to-Hook graduation path
        SA->>B: end_session()
        B->>B: find_hook_candidates(lessons, min_confidence=0.90)
        B->>B: classify_rule(description, confidence)
        alt deterministic pattern match
            B-->>SA: HookCandidate(enforcement=HOOK)
        else requires LLM judgment
            B-->>SA: HookCandidate(enforcement=PROMPT_INJECTION)
        end

Comments Outside Diff (2)
src/gradata/_manifest_quality.py, line 9
`from __future__ import annotations`
`_manifest_quality.py` is a source SDK file but does not include `from __future__ import annotations`, violating project Rule 6. The same omission exists in all four new test files added in this PR: `tests/test_rule_to_hook.py` (line 1), `tests/test_atomic_writes.py` (line 1), `tests/test_cascading_corrections.py` (line 1), and `tests/test_failure_modes.py` (line 1).
The project rule applies to all `*.py` files. `test_scoped_brain.py` (also new in this PR) correctly includes the import at line 2, so the omission here looks accidental.
Rule Used: # Code Review Rules
Rule 1: Never use print() ... (source)
src/gradata/_manifest_quality.py, line 507-513
The anti-gaming block (lines ~495-501) reduces `slope_pts` to 30% of its value when more than 60% of corrections are front-loaded. The low-absolute-error bonus immediately below can raise `slope_pts` back to 10.0 (or 5.0) via `max(slope_pts, …)`, effectively undoing the penalty for any brain that now has near-zero recent corrections.
For example: `slope_pts` computed as 15.0 → anti-gaming drops it to 4.5 → low-error bonus lifts it to 10.0.
If the intention is that "genuinely low recent error always wins", the current behaviour is correct. But if the anti-gaming guard should apply even to low-error brains, the bonus should only fire when no anti-gaming was triggered. Consider documenting the interaction or guarding the bonus.
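One way to guard the bonus can be sketched as follows. This is an illustration only: the function name `apply_slope_adjustments` and its parameters are hypothetical and do not appear in `_manifest_quality.py`; only the 60% threshold, the 30% multiplier, and the 10.0 / 5.0 floor values come from the comment above.

```python
def apply_slope_adjustments(slope_pts, front_loaded_ratio, low_error_floor=None):
    """Apply the anti-gaming penalty, then the low-error bonus only if unpenalised.

    slope_pts:          raw slope points before adjustments
    front_loaded_ratio: share of corrections that are front-loaded (0.0 to 1.0)
    low_error_floor:    bonus floor (e.g. 10.0 or 5.0), or None when the brain
                        does not qualify for the low-absolute-error bonus
    """
    penalised = front_loaded_ratio > 0.60
    if penalised:
        slope_pts *= 0.30  # anti-gaming: keep only 30% of the slope points

    # Guard: the bonus can no longer undo the penalty it follows.
    if low_error_floor is not None and not penalised:
        slope_pts = max(slope_pts, low_error_floor)

    return slope_pts
```

With this guard, the example above ends at 4.5 instead of 10.0. If the opposite policy is intended (low recent error always wins), dropping the `not penalised` condition restores the current behaviour.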
Reviews (2): Last reviewed commit: "fix: add RuleScope import to self_healin..."
Context used:
Rule 1: Never use print() ... (source)