feat: Sim 9 analytics — CUSUM, Beta scoring, kill threshold fix, dynamic similarity#16
Conversation
Co-Authored-By: Gradata <noreply@gradata.ai>
…nfidence Replace naive misfires/fires ratio with Beta distribution lower bound for domain scoping. Add effective_confidence = FSRS * Beta for ranking. Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…al, looser for style Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Adds average edit distance per session and Mann-Kendall trend analysis to brain.convergence() return value, enabling detection of whether AI output is getting closer to user expectations over time. Co-Authored-By: Gradata <noreply@gradata.ai>
Chaos brain exposed: random noise with no Mann-Kendall trend was mapped to "converged." Now uses coefficient of variation: CV < 0.5 = genuinely stable (converged), CV >= 0.5 = noisy (no_signal). Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Lightweight adjacency list (dict-of-dicts, persisted as JSON) that tracks conflict and co-occurrence edges between rules. Data layer for future rule injection filtering and meta-rule clustering. Co-Authored-By: Gradata <noreply@gradata.ai>
…L, clear fired rules - Replace duplicate _mann_kendall() + _normal_cdf() (~55 LOC) with delegation to _stats.trend_analysis() - Push per-category grouping to SQL (was fetching all JSON blobs into Python) - Remove unused json import from brain_convergence - Init desc="" before try block (was using fragile locals() introspection) - Clear _fired_rules after attribution (was unbounded growth) - Apply no_signal/insufficient_data logic to per-category trends (was copy-paste divergence) - Simplify _SEVERITY_SECONDS dict to single constant (only "moderate" was used) - Type _fired_rules as list[Lesson] - Clarify beta_domain_reliability docstring (fires vs misfires semantics) Co-Authored-By: Gradata <noreply@gradata.ai>
… on temp dirs SQLite WAL mode keeps file handles open during TemporaryDirectory cleanup on Windows. Using ignore_cleanup_errors=True (Python 3.12+) fixes all 5 previously flaky tests. Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…rust Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…icate Co-Authored-By: Gradata <noreply@gradata.ai>
…ld docs - CUSUM: docstring now correctly states it uses successive differences, not mean deviations. Sensitive to step changes, not gradual trends. - is_rule_disabled_for_domain: docstring now correctly states the 0.5 success rate threshold (not 30% misfire rate). Co-Authored-By: Gradata <noreply@gradata.ai>
- Add from __future__ import annotations to 6 files (_stats.py, _config.py, 4 test files) for consistent annotation style - Guard beta_domain_reliability against misfires > fires data corruption (clamp misfires to min(misfires, fires) before computing Beta) - Per-category no_signal vs converged already fixed in prior commit Co-Authored-By: Gradata <noreply@gradata.ai>
| if fires == 0 and misfires == 0: | ||
| return 1.0 | ||
| misfires = min(misfires, fires) # enforce invariant: misfires <= fires | ||
| successes = fires - misfires | ||
| alpha = max(1, successes + 1) | ||
| beta_param = misfires + 1 | ||
| return round(_beta_ppf_05(alpha, beta_param), 4) |
There was a problem hiding this comment.
fires == 0 guard fires before the misfires clamp
The early-return if fires == 0 and misfires == 0 is evaluated before misfires = min(misfires, fires). So corrupted data like {"fires": 0, "misfires": 5} — which is exactly what the clamp exists to sanitise — bypasses the neutral return, then has its misfires zeroed to 0, and falls through to _beta_ppf_05(1, 1) → 0.2 instead of the intended 1.0.
In is_rule_disabled_for_domain the fires < 3 guard masks this for the kill-switch path, but a direct call like effective_confidence(0.92, fires=0, misfires=5) would silently apply a 0.2× multiplier and suppress the rule with no warning.
Fix: move the neutral guard to after the clamp, and widen it to any zero-fires case:
| if fires == 0 and misfires == 0: | |
| return 1.0 | |
| misfires = min(misfires, fires) # enforce invariant: misfires <= fires | |
| successes = fires - misfires | |
| alpha = max(1, successes + 1) | |
| beta_param = misfires + 1 | |
| return round(_beta_ppf_05(alpha, beta_param), 4) | |
| misfires = min(misfires, fires) # enforce invariant: misfires <= fires | |
| if fires == 0: # no activations → no domain signal, stay neutral | |
| return 1.0 | |
| successes = fires - misfires |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gradata/rules/rule_engine.py
Line: 447-453
Comment:
**`fires == 0` guard fires before the misfires clamp**
The early-return `if fires == 0 and misfires == 0` is evaluated _before_ `misfires = min(misfires, fires)`. So corrupted data like `{"fires": 0, "misfires": 5}` — which is exactly what the clamp exists to sanitise — bypasses the neutral return, then has its `misfires` zeroed to 0, and falls through to `_beta_ppf_05(1, 1) → 0.2` instead of the intended `1.0`.
In `is_rule_disabled_for_domain` the `fires < 3` guard masks this for the kill-switch path, but a direct call like `effective_confidence(0.92, fires=0, misfires=5)` would silently apply a 0.2× multiplier and suppress the rule with no warning.
Fix: move the neutral guard to after the clamp, and widen it to any zero-fires case:
```suggestion
misfires = min(misfires, fires) # enforce invariant: misfires <= fires
if fires == 0: # no activations → no domain signal, stay neutral
return 1.0
successes = fires - misfires
```
How can I resolve this? If you propose a fix, please make it concise.When a lesson promotes from INSTINCT to PATTERN or PATTERN to RULE, the EventBus now emits a "lesson.graduated" event with a human-readable message -- the "aha moment" for users seeing Gradata learn their patterns. Co-Authored-By: Gradata <noreply@gradata.ai>
…ives Export graduated rules (PATTERN + RULE) as portable packages for team distribution. Absorbed rules enter as INSTINCT so the recipient brain must validate them through its own correction cycle. Co-Authored-By: Gradata <noreply@gradata.ai>
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (7)
📝 WalkthroughWalkthroughAdds proof-generation, team rule-sharing, and a persisted RuleGraph; introduces Beta-based domain reliability and conflict filtering in the rule engine, CUSUM changepoint detection, per-category similarity thresholds, graduation events, adjusted kill thresholds, and many tests plus an Changes
Sequence Diagram(s)sequenceDiagram
participant Session
participant Brain
participant RuleEngine as Rule Engine
participant RuleGraph as Rule Graph
participant Stats as Stats Module
Session->>Brain: correct(category, correction)
Brain->>RuleEngine: apply_rules(fired_rules, scope, graph=RuleGraph)
RuleEngine->>RuleGraph: conflict_count(rule_id, selected_rule_id)
RuleGraph-->>RuleEngine: conflict frequency
RuleEngine->>RuleGraph: add_co_occurrence(selected_rule_ids)
RuleGraph-->>RuleEngine: ack
Brain->>Brain: attribute misfire / record conflict
Brain->>RuleGraph: add_conflict(rule_a, rule_b)
Brain->>RuleGraph: save()
Session->>Brain: end_session()
Brain->>Stats: convergence(by_category=True)
Stats->>Stats: cusum_changepoints(edit_distances)
Stats-->>Brain: changepoints, trend, p_value
Brain->>Brain: prove()
Brain-->>Session: proof dict
sequenceDiagram
participant BrainA as Brain A (Source)
participant Share as Share
participant Package as Package
participant BrainB as Brain B (Recipient)
participant Absorb as Absorb
BrainA->>Share: brain_share()
Share->>Share: filter graduated lessons, aggregate proof
Share-->>Package: package{brain_id, rules, proof...}
Package->>BrainB: deliver
BrainB->>Absorb: brain_absorb(package)
Absorb->>Absorb: deduplicate by similarity threshold
Absorb->>Absorb: create INSTINCT lessons (conf=0.40, agent_type="shared")
Absorb-->>BrainB: absorbed/skipped/total counts
Estimated code review effort🎯 4 (Complex) | ⏱️ ~65 minutes Possibly related PRs
Suggested labels
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 19
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
src/gradata/brain.py (1)
436-440:⚠️ Potential issue | 🟠 Major
apply_brain_rulesdoesn't passgraphparameter for conflict filtering.The
apply_rulesfunction now accepts an optionalgraph: RuleGraphparameter for conflict filtering and co-occurrence tracking (seerule_engine.pychanges). However,apply_brain_rulesdoesn't passself._rule_graph:return format_rules_for_prompt(apply_rules(lessons, build_scope(ctx)))This means conflict-based rule filtering is silently disabled when rules are applied via
brain.apply_brain_rules(). If this is intentional, consider documenting it. Otherwise:Proposed fix
- return format_rules_for_prompt(apply_rules(lessons, build_scope(ctx))) + return format_rules_for_prompt(apply_rules(lessons, build_scope(ctx), graph=self._rule_graph))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/brain.py` around lines 436 - 440, The call in apply_brain_rules currently invokes apply_rules(lessons, build_scope(ctx)) and therefore omits the new optional graph parameter; update that call to pass the instance RuleGraph (self._rule_graph) so conflict filtering and co-occurrence tracking are enabled (e.g., apply_rules(lessons, build_scope(ctx), self._rule_graph) or apply_rules(lessons, build_scope(ctx), graph=self._rule_graph)); ensure you reference the apply_brain_rules function and the self._rule_graph attribute when making the change.src/gradata/_brain_manifest.py (1)
1-27:⚠️ Potential issue | 🟡 MinorMissing
from __future__ import annotationsimport.This file is in
src/gradata/and per coding guidelines requires the annotations import for type safety.Proposed fix
""" Brain Manifest Generator (SDK Layer). ... """ +from __future__ import annotations import json from datetime import UTC, datetimeAs per coding guidelines: "src/gradata/**/*.py: ... type safety (from future import annotations required)".
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_brain_manifest.py` around lines 1 - 27, The module is missing the required future annotations import; add the line "from __future__ import annotations" as the very first import in this file (before any other imports) to satisfy the codebase type-safety rule and ensure postponed evaluation of annotations for public API symbols in this module (e.g., used by generate_manifest, validate_manifest, MANIFEST_PATH, BrainContext, get_connection).src/gradata/_core.py (1)
1-6: 🧹 Nitpick | 🔵 TrivialFile exceeds 500-line guideline (1267 lines).
Per coding guidelines, files should stay under 500 lines. While this module intentionally consolidates heavy methods from
Brainto keep that class small, consider further splitting by domain:
_core_correct.pyfor correction logic_core_convergence.pyfor convergence/efficiency/prove_core_sharing.pyfor share/absorbThis is architectural advice for future maintainability.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_core.py` around lines 1 - 6, The file _core.py is too large; split its domain responsibilities into smaller modules: create _core_correct.py containing correction-related functions (correct, end_session, auto_evolve), _core_convergence.py containing convergence/prove/efficiency logic (prove, converge, any efficiency-related helpers), and _core_sharing.py containing share/absorb and sharing helpers; move the corresponding functions and their private helpers into those files, update imports/exports so existing callers (Brain and other modules) import the same symbols from src.gradata._core (add re-exporting imports in _core.py or update Brain to import directly from the new modules), and ensure any shared utility functions keep their visibility (either keep them in a small common helper module or import them where needed) so behavior and tests remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gradata/_brain_manifest.py`:
- Around line 194-197: The validator is requiring a "proof" key that
generate_manifest() does not produce, causing false "Missing required key:
proof" errors; update validate_manifest() to remove "proof" from required_keys
(leave Brain.manifest() to still append proof via m["proof"] = self.prove()), so
validation accepts manifests produced by generate_manifest() and
write_manifest() while Brain.manifest() can continue to include proof when
available.
In `@src/gradata/_core.py`:
- Line 1207: Replace the locally defined INITIAL_CONFIDENCE = 0.40 with the
shared constant already imported from the self_improvement module: remove the
local assignment and use the imported INITIAL_CONFIDENCE symbol everywhere in
this file (ensure any references in functions or classes that currently rely on
the local value simply reference the imported INITIAL_CONFIDENCE). This keeps a
single source of truth and prevents drift if the constant changes in
self_improvement.
- Around line 392-402: The type hint in _graduation_message references Lesson
but Lesson isn't imported under the TYPE_CHECKING guard; update the conditional
imports to include Lesson (add Lesson to the TYPE_CHECKING block where other
forward-only imports live) so the symbol is defined for static type checkers and
the pyright error is resolved.
- Around line 1257-1258: The code currently always writes back with
lessons_path.write_text(format_lessons(existing), encoding="utf-8") even when no
changes were made (absorbed == 0), causing unnecessary I/O; modify the
surrounding logic (the block that uses variables absorbed and existing and calls
format_lessons) to detect no-op cases (e.g., absorbed == 0 and no new rules were
appended to existing) and skip the write_text call in that case so the file is
only rewritten when existing was actually modified.
In `@src/gradata/_stats.py`:
- Around line 59-87: In cusum_changepoints, validate inputs before computing
mean/variance: ensure threshold is finite and > 0 (reject non-finite or <= 0)
and ensure all elements of data are finite numbers (no NaN/inf); if validation
fails return [] or raise ValueError per project convention. Update checks around
variables threshold, limit and data so the function rejects bad thresholds and
non-finite samples up front (before computing mean/variance/std_dev) to prevent
poisoned math and spurious changepoints.
In `@src/gradata/rules/rule_engine.py`:
- Around line 625-640: The conflict threshold (currently hardcoded as 3 in the
conflict filtering loop that uses graph.conflict_count(rule_id, sel_id)) should
be made configurable: introduce a named constant (e.g., CONFLICT_THRESHOLD = 3)
or a function parameter to the rule selection routine (the code around scored,
selected_ids, and _make_rule_id in rule_engine.py), replace the literal "3" with
that symbol, and ensure any callers that should control sensitivity can pass a
value or use the constant; update any relevant docstring or default to
CONFLICT_THRESHOLD so behavior remains the same unless tuned.
In `@src/gradata/rules/rule_graph.py`:
- Around line 71-76: save() currently writes directly to self._path and can
leave a corrupt file if interrupted; change it to write JSON to a temporary file
in the same directory (e.g., use a tmp name like
self._path.with_suffix(self._path.suffix + ".tmp") or tempfile in same dir),
flush/close, then atomically replace the target with os.replace or
pathlib.Path.replace. Use json.dumps(self._edges, indent=2) and ensure
encoding="utf-8" when writing the temp file; update save() to perform the
write-to-temp then atomic rename so subsequent reads of the constructor won't
see a half-written file.
- Around line 21-24: The persisted _edges currently use caller-supplied,
non-deterministic numeric IDs which break history; fix by generating stable,
namespaced IDs before persisting: replace the hash(description) % 10000 usage in
the caller (the code that calls add_conflict/add_co_occurrence) with
deterministic IDs (e.g., SHA256 hex of the description) or with a sequential
generator and always prefix IDs with a namespace like "rule:" and "correction:"
so keys stored in self._edges remain stable across restarts; ensure the same
deterministic ID creation is used everywhere that constructs node IDs and passed
into RuleGraph.add_conflict()/add_co_occurrence so persisted entries in _edges
map to the new runtime IDs.
In `@tests/test_beta_scoring.py`:
- Around line 64-66: The test test_effective_confidence_no_domain_data currently
asserts exact float equality against effective_confidence(fsrs_confidence=0.92,
domain_fires=0, domain_misfires=0); replace the exact equality with a
floating-point tolerant assertion using pytest.approx (e.g., assert score ==
pytest.approx(0.92)) to avoid brittle comparisons when testing the
effective_confidence function.
- Around line 22-24: Update the test test_beta_reliability_no_data to use
pytest.approx for float comparison: in the test function that calls
beta_domain_reliability(fires=0, misfires=0) replace the exact equality
assertion with an assertion using pytest.approx(1.0) and ensure pytest is
imported at the top of the test module if not already present so the test uses
pytest.approx instead of ==.
In `@tests/test_cusum.py`:
- Around line 37-47: Move the module import to the top of the test file:
relocate the line "from gradata.brain import Brain" so it's grouped with the
other imports at the top of tests/test_cusum.py rather than placed between test
functions; ensure test_convergence_includes_changepoints still instantiates
Brain(str(tmp_path)) and calls brain.convergence() unchanged after the import
move.
In `@tests/test_dynamic_similarity.py`:
- Around line 12-18: Replace the hardcoded 0.35 in the tests with the
SIMILARITY_THRESHOLD constant from _config: ensure tests test_default_threshold
and test_stylistic_categories_are_loose use SIMILARITY_THRESHOLD when asserting
get_similarity_threshold("UNKNOWN_CATEGORY") == SIMILARITY_THRESHOLD and when
comparing stylistic categories (for cat in ["DRAFTING","TONE","STYLE"]: assert
get_similarity_threshold(cat) <= SIMILARITY_THRESHOLD); also add the import for
SIMILARITY_THRESHOLD if it’s missing.
In `@tests/test_edit_distance_convergence.py`:
- Around line 10-24: The helper _make_brain currently uses tempfile.mkdtemp()
which leaves orphaned temp dirs; change _make_brain to accept a Path-like
directory parameter (or a pytest tmp_path fixture) instead of creating its own
temp dir, use that Path to write "lessons.md" and construct Brain(dir), and
update all tests in this file that call _make_brain to accept the pytest
tmp_path fixture and pass tmp_path (or tmp_path / "subdir") into _make_brain so
pytest will handle cleanup automatically; keep the function name _make_brain and
the Brain.emit calls unchanged.
- Around line 87-98: The test test_edit_distance_values_are_rounded currently
compares floats with exact equality (val == round(val, 4)); change the assertion
to use pytest.approx so floating-point rounding imprecision is tolerated — for
each val in result["edit_distance_per_session"] assert val ==
pytest.approx(round(val, 4), rel=0, abs=1e-4) (or equivalently assert val ==
pytest.approx(round(val,4), abs=1e-4)) and add an import for pytest if needed;
keep the rest of the test and the call to brain.convergence() unchanged.
In `@tests/test_graduation_notification.py`:
- Around line 24-28: The test currently uses a conditional guard (if graduated:)
which lets the test pass when no graduation event is present; change the logic
in tests/test_graduation_notification.py to assert the presence of graduation
events explicitly by adding assert len(graduated) > 0 (or assert graduated)
immediately before inspecting graduated[0] and its "message" contents; likewise
replace other conditional guards that check if events: with explicit assertions
that the events list is non-empty before performing payload checks (or mark
those tests pytest.mark.xfail if the behavior is known to be flaky).
In `@tests/test_kill_thresholds.py`:
- Around line 12-20: test_machine_kill_limits_also_inverted only checks INFANT <
STABLE which misses regressions in intermediate buckets; update it to assert
monotonic increase across MACHINE_KILL_LIMITS like
test_kill_limits_monotonically_increasing does for KILL_LIMITS: define the same
order = ["INFANT","ADOLESCENT","MATURE","STABLE"] and loop/parametrize to assert
MACHINE_KILL_LIMITS[order[i]] < MACHINE_KILL_LIMITS[order[i+1]] (or use
pytest.mark.parametrize to create per-pair assertions) so all adjacent
thresholds are validated rather than only endpoints.
In `@tests/test_rule_graph_integration.py`:
- Around line 41-43: The assertion graph.node_count >= 0 is vacuous; change the
test to assert meaningful outcomes of co-occurrence tracking: for example,
assert graph.node_count > 0 to ensure nodes were added or assert that the
expected co-occurrence edge exists (e.g., check for an edge between the two rule
node identifiers or that an edge with the co-occurrence attribute/count is
present). Update the test in tests/test_rule_graph_integration.py to reference
graph.node_count, graph.has_edge or graph.edges (and the specific node IDs or
edge attribute name used by your rule graph implementation) to validate that
co-occurrence was actually recorded.
- Around line 46-77: The test exposes a mismatch: rule IDs created by
brain_correct (which uses built-in hash()) differ from IDs used by
apply_rules/_make_rule_id (which uses hashlib.sha256), so conflicts stored by
RuleGraph aren't recognized; fix by unifying ID creation—update brain_correct to
call _make_rule_id (or refactor both to a single helper) so both brain_correct
and apply_rules use identical hashing, and update any tests that seed conflicts
to use _make_rule_id (or the chosen unified helper) to ensure conflict seeding
matches production ID format.
In `@tests/test_sharing.py`:
- Around line 150-152: Replace the direct float equality assertion for
tone_lessons[0].confidence with a pytest.approx comparison: update the assertion
that checks tone_lessons[0].confidence to use pytest.approx(0.40). Ensure pytest
is imported in the test module (add "import pytest" if missing) and keep the
existing checks for len(tone_lessons) and tone_lessons[0].state.value unchanged.
---
Outside diff comments:
In `@src/gradata/_brain_manifest.py`:
- Around line 1-27: The module is missing the required future annotations
import; add the line "from __future__ import annotations" as the very first
import in this file (before any other imports) to satisfy the codebase
type-safety rule and ensure postponed evaluation of annotations for public API
symbols in this module (e.g., used by generate_manifest, validate_manifest,
MANIFEST_PATH, BrainContext, get_connection).
In `@src/gradata/_core.py`:
- Around line 1-6: The file _core.py is too large; split its domain
responsibilities into smaller modules: create _core_correct.py containing
correction-related functions (correct, end_session, auto_evolve),
_core_convergence.py containing convergence/prove/efficiency logic (prove,
converge, any efficiency-related helpers), and _core_sharing.py containing
share/absorb and sharing helpers; move the corresponding functions and their
private helpers into those files, update imports/exports so existing callers
(Brain and other modules) import the same symbols from src.gradata._core (add
re-exporting imports in _core.py or update Brain to import directly from the new
modules), and ensure any shared utility functions keep their visibility (either
keep them in a small common helper module or import them where needed) so
behavior and tests remain unchanged.
In `@src/gradata/brain.py`:
- Around line 436-440: The call in apply_brain_rules currently invokes
apply_rules(lessons, build_scope(ctx)) and therefore omits the new optional
graph parameter; update that call to pass the instance RuleGraph
(self._rule_graph) so conflict filtering and co-occurrence tracking are enabled
(e.g., apply_rules(lessons, build_scope(ctx), self._rule_graph) or
apply_rules(lessons, build_scope(ctx), graph=self._rule_graph)); ensure you
reference the apply_brain_rules function and the self._rule_graph attribute when
making the change.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 3050b317-2a72-4a99-8fb7-4c118e791b85
📒 Files selected for processing (24)
pyproject.tomlsrc/gradata/_brain_manifest.pysrc/gradata/_config.pysrc/gradata/_core.pysrc/gradata/_stats.pysrc/gradata/brain.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/rules/rule_engine.pysrc/gradata/rules/rule_graph.pytests/test_adaptations.pytests/test_beta_scoring.pytests/test_core_behavioral.pytests/test_cusum.pytests/test_dynamic_similarity.pytests/test_edit_distance_convergence.pytests/test_graduation_notification.pytests/test_integration_workflow.pytests/test_kill_thresholds.pytests/test_machine_mode.pytests/test_manifest_prove.pytests/test_prove.pytests/test_rule_graph.pytests/test_rule_graph_integration.pytests/test_sharing.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*
📄 CodeRabbit inference engine (CLAUDE.md)
Always read a file before editing it. Always prefer editing over creating new files.
Files:
pyproject.tomlsrc/gradata/_brain_manifest.pytests/test_adaptations.pysrc/gradata/_config.pytests/test_dynamic_similarity.pytests/test_cusum.pytests/test_kill_thresholds.pytests/test_rule_graph_integration.pytests/test_machine_mode.pytests/test_rule_graph.pytests/test_integration_workflow.pytests/test_edit_distance_convergence.pysrc/gradata/_stats.pysrc/gradata/enhancements/self_improvement.pytests/test_core_behavioral.pytests/test_sharing.pysrc/gradata/rules/rule_engine.pysrc/gradata/brain.pytests/test_graduation_notification.pytests/test_manifest_prove.pytests/test_beta_scoring.pytests/test_prove.pysrc/gradata/rules/rule_graph.pysrc/gradata/_core.py
**/*.{py,sh,js,ts,json,env*}
📄 CodeRabbit inference engine (CLAUDE.md)
Never hardcode secrets. Never commit .env files. Pre-commit hook blocks both.
Files:
src/gradata/_brain_manifest.pytests/test_adaptations.pysrc/gradata/_config.pytests/test_dynamic_similarity.pytests/test_cusum.pytests/test_kill_thresholds.pytests/test_rule_graph_integration.pytests/test_machine_mode.pytests/test_rule_graph.pytests/test_integration_workflow.pytests/test_edit_distance_convergence.pysrc/gradata/_stats.pysrc/gradata/enhancements/self_improvement.pytests/test_core_behavioral.pytests/test_sharing.pysrc/gradata/rules/rule_engine.pysrc/gradata/brain.pytests/test_graduation_notification.pytests/test_manifest_prove.pytests/test_beta_scoring.pytests/test_prove.pysrc/gradata/rules/rule_graph.pysrc/gradata/_core.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Keep files under 500 lines. Validate input at system boundaries.
Files:
src/gradata/_brain_manifest.pytests/test_adaptations.pysrc/gradata/_config.pytests/test_dynamic_similarity.pytests/test_cusum.pytests/test_kill_thresholds.pytests/test_rule_graph_integration.pytests/test_machine_mode.pytests/test_rule_graph.pytests/test_integration_workflow.pytests/test_edit_distance_convergence.pysrc/gradata/_stats.pysrc/gradata/enhancements/self_improvement.pytests/test_core_behavioral.pytests/test_sharing.pysrc/gradata/rules/rule_engine.pysrc/gradata/brain.pytests/test_graduation_notification.pytests/test_manifest_prove.pytests/test_beta_scoring.pytests/test_prove.pysrc/gradata/rules/rule_graph.pysrc/gradata/_core.py
src/gradata/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use AGPL-3.0 licensing. Source code resides in src/gradata/.
Files:
src/gradata/_brain_manifest.pysrc/gradata/_config.pysrc/gradata/_stats.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/rules/rule_engine.pysrc/gradata/brain.pysrc/gradata/rules/rule_graph.pysrc/gradata/_core.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
src/gradata/_brain_manifest.pysrc/gradata/_config.pysrc/gradata/_stats.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/rules/rule_engine.pysrc/gradata/brain.pysrc/gradata/rules/rule_graph.pysrc/gradata/_core.py
tests/**
⚙️ CodeRabbit configuration file
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.
Files:
tests/test_adaptations.pytests/test_dynamic_similarity.pytests/test_cusum.pytests/test_kill_thresholds.pytests/test_rule_graph_integration.pytests/test_machine_mode.pytests/test_rule_graph.pytests/test_integration_workflow.pytests/test_edit_distance_convergence.pytests/test_core_behavioral.pytests/test_sharing.pytests/test_graduation_notification.pytests/test_manifest_prove.pytests/test_beta_scoring.pytests/test_prove.py
🧠 Learnings (1)
📚 Learning: 2026-04-06T15:53:08.573Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T15:53:08.573Z
Learning: Correction pipeline: correct → extract behavioral instruction → graduate (INSTINCT→PATTERN→RULE) → inject into next session. Meta-rules emerge when patterns cluster across domains.
Applied to files:
src/gradata/_core.py
🪛 GitHub Actions: CI
src/gradata/_stats.py
[warning] 116-116: pyright: Import "scipy.stats" could not be resolved (reportMissingImports)
[warning] 126-126: pyright: Import "scipy.stats" could not be resolved (reportMissingImports)
src/gradata/brain.py
[warning] 727-727: pyright: Import "gradata.enhancements.memory_extraction" could not be resolved (reportMissingImports)
src/gradata/_core.py
[error] 393-393: pyright: "Lesson" is not defined (reportUndefinedVariable)
🔇 Additional comments (21)
src/gradata/_config.py (1)
70-84: LGTM!The per-category similarity thresholds are well-structured with sensible defaults:
- Factual categories (ACCURACY, FACTUAL) have stricter thresholds (0.60)
- Stylistic categories (TONE, STYLE) have looser thresholds (0.30)
- Unknown categories fall back to the global
SIMILARITY_THRESHOLD(0.35)The case-insensitive lookup via
.upper()is a good defensive choice.tests/test_rule_graph.py (2)
7-65: LGTM!Excellent unit test coverage for
RuleGraph:
- Bidirectional conflict creation and counting
- Co-occurrence tracking across multiple nodes
- Edge cases (no conflict, empty conflicts, nonexistent nodes)
- Persistence via save/load roundtrip
- Graceful handling of corrupted JSON files
1-5:⚠️ Potential issue | 🟡 MinorMissing
from __future__ import annotationsimport.Proposed fix
"""Tests for rule graph — conflict and co-occurrence edges.""" +from __future__ import annotations + from pathlib import Path from gradata.rules.rule_graph import RuleGraph> Likely an incorrect or invalid review comment.src/gradata/brain.py (3)
74-79: LGTM!Clean initialization:
_fired_rulesproperly typed with forward reference forLessonRuleGraphinstantiated with persistent path for conflict/co-occurrence tracking
369-394: LGTM!New public API methods (
prove,share,absorb) follow the established delegation pattern to_corefunctions. Docstrings clearly explain the purpose and state transitions (e.g., shared rules enter as INSTINCT in the receiving brain).
833-842: LGTM!Good defensive error handling for
prove()failures. The fallback proof object correctly indicates the error state withproven: Falseandconfidence_level: "error".src/gradata/rules/rule_engine.py (4)
420-453: LGTM with minor observation.The Beta distribution scoring implementation is sound:
_beta_ppf_05uses normal approximation with proper clamping to [0, 1]beta_domain_reliabilitycorrectly handles the neutral (0, 0) case- The
misfires = min(misfires, fires)invariant enforcement on line 449 is a good defensive measureMinor: Line 451's
max(1, successes + 1)is alwayssuccesses + 1sincesuccesses >= 0after line 450. Themax(1, ...)is redundant but harmless.
456-467: LGTM!
effective_confidencecleanly combines global FSRS confidence with per-domain Beta reliability without double-counting, as documented. The multiplication of two [0,1] values stays in range, and the 4-decimal rounding is appropriate for confidence scores.
470-483: LGTM!The updated
is_rule_disabled_for_domainnow uses the principled Beta distribution approach instead of naïve ratio checking. The 0.5 threshold (95% confident success rate is below 50%) is a reasonable conservative gate.
659-662: LGTM!Co-occurrence tracking correctly fires only when multiple rules are selected, avoiding self-edges.
tests/test_rule_graph_integration.py (1)
1-4:⚠️ Potential issue | 🟡 MinorMissing
from __future__ import annotationsimport.Per coding guidelines, test files should follow the same conventions. Add the import for consistency.
Proposed fix
"""Tests for RuleGraph integration with the correction pipeline.""" +from __future__ import annotations + from gradata.brain import Brain from gradata.rules.rule_graph import RuleGraph> Likely an incorrect or invalid review comment.tests/test_integration_workflow.py (1)
1-11: LGTM!Well-structured integration test suite with appropriate skip conditions for API key availability. The tests cover the key workflows (correction, convergence, efficiency) and validate expected response structures.
tests/test_manifest_prove.py (1)
1-44: LGTM!Comprehensive tests for proof integration in manifest. Good coverage of success path, failure fallback, and disk persistence. The
patch.objectusage correctly tests the error handling path.tests/test_sharing.py (1)
1-261: LGTM overall!Comprehensive test coverage for the share/absorb functionality including edge cases like invalid correction types, duplicate detection, and round-trip behavior.
tests/test_prove.py (1)
1-93: LGTM!Well-designed test suite with a clean
_mock_convergencehelper that provides sensible defaults. Good coverage of proof strength tiers (strong, moderate, insufficient) and evidence structure validation.src/gradata/_core.py (6)
68-73: LGTM for rule graph conflict tracking.The hash-based ID generation provides reasonable uniqueness for analytics purposes. The 4-digit modulo creates a manageable ID space while accepting some collision possibility.
229-231: LGTM for dynamic similarity thresholds.Using per-category thresholds from configuration allows stricter matching for factual/accuracy categories while being more lenient for style categories.
323-337: LGTM for misfire attribution and rule graph persistence.Good memory management by clearing
_fired_rulesafter attribution. The silent exception handling for graph persistence is appropriate since it's non-critical analytics data.
462-474: LGTM for graduation notification emission.Well-structured event payload with appropriate exception handling to prevent notification failures from breaking the session end flow.
870-887: LGTM for Mann-Kendall delegation.Clean refactor to delegate statistical analysis to
_stats.trend_analysis(). The p < 0.05 significance threshold is appropriate.
1066-1143: LGTM for proof generation logic.Well-structured proof document with clear confidence level tiers. The evidence dictionary provides comprehensive metrics for debugging and auditing.
| required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths", "proof"] | ||
| for k in required_keys: | ||
| if k not in manifest: | ||
| issues.append(f"Missing required key: {k}") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if generate_manifest() produces a "proof" key anywhere
rg -n "proof" src/gradata/_brain_manifest.py
# Check how validate_manifest is called in tests
rg -n "validate_manifest" tests/Repository: Gradata/gradata
Length of output: 581
🏁 Script executed:
#!/bin/bash
# First, let's check the validate_manifest function signature
sed -n '180,210p' src/gradata/_brain_manifest.py
# Check the generate_manifest function to see what keys it returns
sed -n '46,100p' src/gradata/_brain_manifest.pyRepository: Gradata/gradata
Length of output: 2895
🏁 Script executed:
#!/bin/bash
# Check the test context around lines 315-316 and 328-329
sed -n '310,335p' tests/test_brain.pyRepository: Gradata/gradata
Length of output: 1000
🏁 Script executed:
#!/bin/bash
# Check if there's a default manifest or fixture being validated
rg -B 5 -A 2 "validate_manifest\(\)" tests/test_brain.pyRepository: Gradata/gradata
Length of output: 543
🏁 Script executed:
#!/bin/bash
# Look for the complete generate_manifest function to see all returned keys
ast-grep --pattern $'def generate_manifest($$$) {
$$$
return $_
}'Repository: Gradata/gradata
Length of output: 41
🏁 Script executed:
#!/bin/bash
# See the rest of generate_manifest() function
sed -n '46,163p' src/gradata/_brain_manifest.pyRepository: Gradata/gradata
Length of output: 4936
🏁 Script executed:
#!/bin/bash
# Find Brain.manifest() method in brain.py
rg -A 20 "def manifest\(" src/gradata/brain.pyRepository: Gradata/gradata
Length of output: 957
🏁 Script executed:
#!/bin/bash
# Find write_manifest function
rg -A 10 "def write_manifest" src/gradata/_brain_manifest.pyRepository: Gradata/gradata
Length of output: 507
🏁 Script executed:
#!/bin/bash
# Check if there are any direct calls to write_manifest(generate_manifest()) in the codebase
rg "write_manifest.*generate_manifest" src/ tests/Repository: Gradata/gradata
Length of output: 167
🏁 Script executed:
#!/bin/bash
# Check all calls to generate_manifest to see if it's ever used outside of Brain.manifest()
rg -B 2 -A 2 "generate_manifest\(" src/ tests/Repository: Gradata/gradata
Length of output: 2373
validate_manifest() requires "proof" key that generate_manifest() never produces.
generate_manifest() returns a manifest without a "proof" key. Only Brain.manifest() adds m["proof"] = self.prove() after calling generate_manifest() and before calling write_manifest().
This creates two problematic paths:
write_manifest(generate_manifest())writes a manifest lacking "proof", causingvalidate_manifest()to report "Missing required key: proof"write_manifest()with no arguments internally callsgenerate_manifest(), producing the same result
Since validate_manifest() reads the manifest from disk and checks for all required keys, only manifests created through Brain.manifest() will pass validation. Manifests from direct generate_manifest() calls will fail.
Either add "proof" generation to generate_manifest(), remove "proof" from required keys, or document that validation only applies to manifests created via Brain.manifest().
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_brain_manifest.py` around lines 194 - 197, The validator is
requiring a "proof" key that generate_manifest() does not produce, causing false
"Missing required key: proof" errors; update validate_manifest() to remove
"proof" from required_keys (leave Brain.manifest() to still append proof via
m["proof"] = self.prove()), so validation accepts manifests produced by
generate_manifest() and write_manifest() while Brain.manifest() can continue to
include proof when available.
| from gradata.enhancements.self_improvement import format_lessons, parse_lessons | ||
| from gradata.enhancements.similarity import best_similarity | ||
|
|
||
| INITIAL_CONFIDENCE = 0.40 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider reusing INITIAL_CONFIDENCE from self_improvement module.
This locally defines INITIAL_CONFIDENCE = 0.40, but the same constant is imported from self_improvement at line 175. Using the shared constant prevents drift if the value changes.
- INITIAL_CONFIDENCE = 0.40
+ from gradata.enhancements.self_improvement import INITIAL_CONFIDENCE📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| INITIAL_CONFIDENCE = 0.40 | |
| # Remove the line entirely and use the module-level imported INITIAL_CONFIDENCE |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_core.py` at line 1207, Replace the locally defined
INITIAL_CONFIDENCE = 0.40 with the shared constant already imported from the
self_improvement module: remove the local assignment and use the imported
INITIAL_CONFIDENCE symbol everywhere in this file (ensure any references in
functions or classes that currently rely on the local value simply reference the
imported INITIAL_CONFIDENCE). This keeps a single source of truth and prevents
drift if the constant changes in self_improvement.
| # Write back | ||
| lessons_path.write_text(format_lessons(existing), encoding="utf-8") |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Minor: Consider skipping write when nothing changed.
The file is written even when absorbed == 0 and no rules were added. This is harmless but unnecessary I/O.
# Write back
- lessons_path.write_text(format_lessons(existing), encoding="utf-8")
+ if absorbed > 0:
+ lessons_path.write_text(format_lessons(existing), encoding="utf-8")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| # Write back | |
| lessons_path.write_text(format_lessons(existing), encoding="utf-8") | |
| # Write back | |
| if absorbed > 0: | |
| lessons_path.write_text(format_lessons(existing), encoding="utf-8") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_core.py` around lines 1257 - 1258, The code currently always
writes back with lessons_path.write_text(format_lessons(existing),
encoding="utf-8") even when no changes were made (absorbed == 0), causing
unnecessary I/O; modify the surrounding logic (the block that uses variables
absorbed and existing and calls format_lessons) to detect no-op cases (e.g.,
absorbed == 0 and no new rules were appended to existing) and skip the
write_text call in that case so the file is only rewritten when existing was
actually modified.
| def cusum_changepoints(data: list[int] | list[float], threshold: float = 1.0) -> list[int]: | ||
| """CUSUM changepoint detection for abrupt level shifts. | ||
|
|
||
| Accumulates consecutive differences (data[i] - data[i-1]). When | ||
| the cumulative sum exceeds threshold * std_dev, a changepoint is | ||
| recorded and the accumulator resets. | ||
|
|
||
| Note: uses successive differences, NOT deviations from the global | ||
| mean. Sensitive to sharp step changes (e.g. corrections dropping | ||
| from 10 to 2) but insensitive to slow gradual trends — use | ||
| trend_analysis / Mann-Kendall for those. | ||
|
|
||
| Args: | ||
| data: Time series of values. | ||
| threshold: Sensitivity in standard deviations. Default 1.0. | ||
|
|
||
| Returns: | ||
| Sorted list of indices where significant level shifts occurred. | ||
| """ | ||
| n = len(data) | ||
| if n < 3: | ||
| return [] | ||
|
|
||
| mean = sum(data) / n | ||
| variance = sum((x - mean) ** 2 for x in data) / n | ||
| if variance == 0: | ||
| return [] | ||
| std_dev = variance ** 0.5 | ||
| limit = threshold * std_dev |
There was a problem hiding this comment.
Validate CUSUM inputs before computing the control limit.
A negative or non-finite threshold turns limit into nonsense; with threshold < 0, the check at Line 98 can fire on the first iteration even when there is no changepoint. Non-finite samples also silently poison the variance math. Reject those inputs up front.
🛡️ Proposed fix
def cusum_changepoints(data: list[int] | list[float], threshold: float = 1.0) -> list[int]:
"""CUSUM changepoint detection for abrupt level shifts.
@@
Returns:
Sorted list of indices where significant level shifts occurred.
"""
+ if not math.isfinite(threshold) or threshold < 0:
+ raise ValueError("threshold must be a finite non-negative number")
+ if any(not math.isfinite(x) for x in data):
+ raise ValueError("data must contain only finite numeric values")
+
n = len(data)
if n < 3:
return []As per coding guidelines, **/*.py: Validate input at system boundaries.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_stats.py` around lines 59 - 87, In cusum_changepoints, validate
inputs before computing mean/variance: ensure threshold is finite and > 0
(reject non-finite or <= 0) and ensure all elements of data are finite numbers
(no NaN/inf); if validation fails return [] or raise ValueError per project
convention. Update checks around variables threshold, limit and data so the
function rejects bad thresholds and non-finite samples up front (before
computing mean/variance/std_dev) to prevent poisoned math and spurious
changepoints.
| graduated = [e for e in events if e.get("new_state") == "PATTERN"] | ||
| if graduated: | ||
| assert "message" in graduated[0] | ||
| msg = graduated[0]["message"].lower() | ||
| assert "learned it" in msg or "corrected" in msg |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider making graduation assertions mandatory rather than conditional.
The if graduated: guard means this test passes even if no graduation event fires. This could mask regressions where graduation stops working. Consider either:
- Adding
assert len(graduated) > 0before the payload checks, or - Using
pytest.mark.xfailif graduation is expected to be flaky
Same applies to the other tests with if events: guards (lines 46, 66, 89).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_graduation_notification.py` around lines 24 - 28, The test
currently uses a conditional guard (if graduated:) which lets the test pass when
no graduation event is present; change the logic in
tests/test_graduation_notification.py to assert the presence of graduation
events explicitly by adding assert len(graduated) > 0 (or assert graduated)
immediately before inspecting graduated[0] and its "message" contents; likewise
replace other conditional guards that check if events: with explicit assertions
that the events list is non-empty before performing payload checks (or mark
those tests pytest.mark.xfail if the behavior is known to be flaky).
| def test_machine_kill_limits_also_inverted(): | ||
| assert MACHINE_KILL_LIMITS["STABLE"] > MACHINE_KILL_LIMITS["INFANT"] | ||
|
|
||
|
|
||
| def test_kill_limits_monotonically_increasing(): | ||
| order = ["INFANT", "ADOLESCENT", "MATURE", "STABLE"] | ||
| for i in range(len(order) - 1): | ||
| assert KILL_LIMITS[order[i]] < KILL_LIMITS[order[i + 1]], \ | ||
| f"{order[i]} ({KILL_LIMITS[order[i]]}) should be < {order[i+1]} ({KILL_LIMITS[order[i+1]]})" |
There was a problem hiding this comment.
Check machine thresholds monotonically, not just endpoints.
test_machine_kill_limits_also_inverted() only asserts INFANT < STABLE, so a regression in ADOLESCENT or MATURE would still pass. Reuse the ordered assertion for MACHINE_KILL_LIMITS, ideally via a parametrized helper shared with the human thresholds.
As per coding guidelines, tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness, parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_kill_thresholds.py` around lines 12 - 20,
test_machine_kill_limits_also_inverted only checks INFANT < STABLE which misses
regressions in intermediate buckets; update it to assert monotonic increase
across MACHINE_KILL_LIMITS like test_kill_limits_monotonically_increasing does
for KILL_LIMITS: define the same order =
["INFANT","ADOLESCENT","MATURE","STABLE"] and loop/parametrize to assert
MACHINE_KILL_LIMITS[order[i]] < MACHINE_KILL_LIMITS[order[i+1]] (or use
pytest.mark.parametrize to create per-pair assertions) so all adjacent
thresholds are validated rather than only endpoints.
|
|
||
| # Both rules should be applied, creating a co-occurrence edge | ||
| assert graph.node_count >= 0 # May or may not have edges depending on relevance |
There was a problem hiding this comment.
Assertion graph.node_count >= 0 is always true.
node_count returns a non-negative integer by definition (it counts nodes). This assertion doesn't verify that co-occurrence tracking actually occurred.
Consider asserting that nodes were actually added, or that specific co-occurrence edges exist:
- assert graph.node_count >= 0 # May or may not have edges depending on relevance
+ # Verify rules were tracked (even if not applied due to relevance filtering)
+ # If both rules pass scope, they should create co-occurrence
+ assert graph.node_count >= 0 # Basic sanity
+ # Consider: assert graph.node_count > 0 or check specific edges🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_rule_graph_integration.py` around lines 41 - 43, The assertion
graph.node_count >= 0 is vacuous; change the test to assert meaningful outcomes
of co-occurrence tracking: for example, assert graph.node_count > 0 to ensure
nodes were added or assert that the expected co-occurrence edge exists (e.g.,
check for an edge between the two rule node identifiers or that an edge with the
co-occurrence attribute/count is present). Update the test in
tests/test_rule_graph_integration.py to reference graph.node_count,
graph.has_edge or graph.edges (and the specific node IDs or edge attribute name
used by your rule graph implementation) to validate that co-occurrence was
actually recorded.
| def test_conflict_filtering_drops_conflicting_rules(): | ||
| """apply_rules drops lower-ranked rules that conflict with higher-ranked ones.""" | ||
| from gradata._types import Lesson, LessonState | ||
| from gradata._scope import RuleScope | ||
| from gradata.rules.rule_engine import apply_rules, _make_rule_id | ||
|
|
||
| # Build two lessons and derive their actual rule IDs | ||
| lesson_a = Lesson( | ||
| date="2026-04-06", state=LessonState.RULE, confidence=0.95, | ||
| category="DRAFTING", description="Use active voice", | ||
| ) | ||
| lesson_b = Lesson( | ||
| date="2026-04-06", state=LessonState.RULE, confidence=0.90, | ||
| category="TONE", description="Be concise", | ||
| ) | ||
| rule_id_a = _make_rule_id(lesson_a) | ||
| rule_id_b = _make_rule_id(lesson_b) | ||
|
|
||
| graph = RuleGraph() | ||
| # Simulate 3 past conflicts between these rules | ||
| for _ in range(3): | ||
| graph.add_conflict(rule_id_a, rule_id_b) | ||
|
|
||
| rules = [lesson_a, lesson_b] | ||
| scope = RuleScope() | ||
|
|
||
| # Without graph — both rules appear | ||
| results_no_graph = apply_rules(rules, scope) | ||
| # With graph — conflicting lower-ranked rule should be dropped | ||
| results_with_graph = apply_rules(rules, scope, graph=graph) | ||
|
|
||
| assert len(results_no_graph) >= len(results_with_graph) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Verify the hash mismatch between rule_engine.py and _core.py
echo "=== rule_engine.py _make_rule_id ==="
rg -n "def _make_rule_id|hashlib|hash\(" src/gradata/rules/rule_engine.py
echo ""
echo "=== _core.py rule_id computation ==="
rg -n "rule_id\s*=" src/gradata/_core.py | head -20Repository: Gradata/gradata
Length of output: 367
🏁 Script executed:
# Get full _make_rule_id implementation
sed -n '285,310p' src/gradata/rules/rule_engine.pyRepository: Gradata/gradata
Length of output: 855
🏁 Script executed:
# Get context around _core.py line 70 to understand the function
sed -n '60,85p' src/gradata/_core.pyRepository: Gradata/gradata
Length of output: 1199
🏁 Script executed:
# Check how apply_rules function is implemented and which rule ID method it uses
rg -n "def apply_rules" src/gradata/rules/rule_engine.py -A 30Repository: Gradata/gradata
Length of output: 1626
🏁 Script executed:
# Check RuleGraph.add_conflict method to see what format it expects
rg -n "def add_conflict" -A 10Repository: Gradata/gradata
Length of output: 988
🏁 Script executed:
# Get more of apply_rules to see how graph is used
sed -n '486,550p' src/gradata/rules/rule_engine.pyRepository: Gradata/gradata
Length of output: 2875
🏁 Script executed:
# Search for graph usage in apply_rules
rg -n "graph\." src/gradata/rules/rule_engine.py | grep -A2 -B2 "apply_rules"Repository: Gradata/gradata
Length of output: 41
🏁 Script executed:
# Search for conflict filtering logic in rule_engine
rg -n "conflict|filter" src/gradata/rules/rule_engine.pyRepository: Gradata/gradata
Length of output: 1195
🏁 Script executed:
# Get the full conflict filtering section to confirm the hash function
sed -n '625,645p' src/gradata/rules/rule_engine.pyRepository: Gradata/gradata
Length of output: 965
🏁 Script executed:
# Check test assertion logic to see if it passes for other reasons
sed -n '46,77p' tests/test_rule_graph_integration.pyRepository: Gradata/gradata
Length of output: 1293
🏁 Script executed:
# Verify no utility function bridges the two hash methods
rg -n "def.*rule_id|hash" src/gradata/_core.py src/gradata/rules/rule_engine.py | head -30Repository: Gradata/gradata
Length of output: 1063
Production rule ID mismatch undermines conflict filtering validation.
The test uses _make_rule_id() consistently throughout (both creating conflicts and calling apply_rules), so it will pass. However, this masks a production bug: brain_correct() in _core.py (line 70) computes rule IDs using built-in hash(), while apply_rules() in rule_engine.py (line 634) uses _make_rule_id() with hashlib.sha256.
Conflicts recorded during actual corrections (via brain_correct) won't be recognized during rule application because the ID formats differ. The test should seed conflicts using the same method that brain_correct() uses, or both functions should be unified to use the same hashing algorithm.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_rule_graph_integration.py` around lines 46 - 77, The test exposes
a mismatch: rule IDs created by brain_correct (which uses built-in hash())
differ from IDs used by apply_rules/_make_rule_id (which uses hashlib.sha256),
so conflicts stored by RuleGraph aren't recognized; fix by unifying ID
creation—update brain_correct to call _make_rule_id (or refactor both to a
single helper) so both brain_correct and apply_rules use identical hashing, and
update any tests that seed conflicts to use _make_rule_id (or the chosen unified
helper) to ensure conflict seeding matches production ID format.
| assert len(tone_lessons) == 1 | ||
| assert tone_lessons[0].state.value == "INSTINCT" | ||
| assert tone_lessons[0].confidence == 0.40 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Use pytest.approx for float comparison.
Per test guidelines, floating-point comparisons should use pytest.approx.
assert len(tone_lessons) == 1
assert tone_lessons[0].state.value == "INSTINCT"
- assert tone_lessons[0].confidence == 0.40
+ assert tone_lessons[0].confidence == pytest.approx(0.40)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| assert len(tone_lessons) == 1 | |
| assert tone_lessons[0].state.value == "INSTINCT" | |
| assert tone_lessons[0].confidence == 0.40 | |
| assert len(tone_lessons) == 1 | |
| assert tone_lessons[0].state.value == "INSTINCT" | |
| assert tone_lessons[0].confidence == pytest.approx(0.40) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_sharing.py` around lines 150 - 152, Replace the direct float
equality assertion for tone_lessons[0].confidence with a pytest.approx
comparison: update the assertion that checks tone_lessons[0].confidence to use
pytest.approx(0.40). Ensure pytest is imported in the test module (add "import
pytest" if missing) and keep the existing checks for len(tone_lessons) and
tone_lessons[0].state.value unchanged.
Co-Authored-By: Gradata <noreply@gradata.ai>
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@Dockerfile`:
- Line 15: The current ENTRYPOINT runs a short-lived Python one-liner that
instantiates Brain and prints a message, so the container exits; replace it with
a command that launches a long-lived process. Update the Dockerfile ENTRYPOINT
to call a blocking/run method on the Brain class (e.g., Brain('/brain').run() or
.serve()) or invoke a module-level long-running entrypoint (e.g., python -m
gradata.brain) so PID 1 remains active; if Brain lacks a blocking method, add a
blocking start() or serve() method to the Brain class that starts the
service/worker loop and call that from ENTRYPOINT.
- Around line 1-15: Add a Docker HEALTHCHECK instruction to the Dockerfile so
orchestrators can detect unhealthy instances; modify the file that currently
sets ENTRYPOINT to run Brain('/brain') by appending a HEALTHCHECK that invokes a
lightweight check (e.g., a short python command or script that imports
gradata.brain and performs a minimal sanity call against Brain(BRAIN_DIR)) and
returns appropriate exit codes, and configure sensible options (interval,
timeout, start-period, retries). Target the existing ENTRYPOINT/ENV BRAIN_DIR
and ensure the health command uses the same BRAIN_DIR environment variable and
exits non-zero on failure so orchestrators mark the container unhealthy.
- Around line 1-15: The Dockerfile currently runs the container as root; create
a non-root runtime user and ensure /brain ownership: add a user (e.g.,
"gradata") and group via useradd/usermod during image build, chown the WORKDIR
(/app) and the default brain directory (/brain) to that user, and switch to that
user with the USER instruction before ENTRYPOINT so the ENTRYPOINT ("python",
"-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata
brain ready at /brain')") runs unprivileged; keep ENV BRAIN_DIR=/brain and
ensure any files copied into /app are accessible by the new user.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 252b155c-c8a2-44ca-bca1-e7343a9bfb0f
📒 Files selected for processing (2)
.dockerignoreDockerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
**/*
📄 CodeRabbit inference engine (CLAUDE.md)
Always read a file before editing it. Always prefer editing over creating new files.
Files:
Dockerfile
🪛 Checkov (3.2.513)
Dockerfile
[low] 1-15: Ensure that HEALTHCHECK instructions have been added to container images
(CKV_DOCKER_2)
[low] 1-15: Ensure that a user for the container has been created
(CKV_DOCKER_3)
🪛 Hadolint (2.14.0)
Dockerfile
[info] 11-11: Multiple consecutive RUN instructions. Consider consolidation.
(DL3059)
🪛 Trivy (0.69.3)
Dockerfile
[error] 1-1: Image user should not be 'root'
Specify at least 1 USER command in Dockerfile with non-root user as argument
Rule: DS-0002
(IaC/Dockerfile)
[info] 1-1: No HEALTHCHECK defined
Add HEALTHCHECK instruction in your Dockerfile
Rule: DS-0026
(IaC/Dockerfile)
🔇 Additional comments (3)
.dockerignore (3)
1-6: LGTM: Standard development and Python artifact exclusions.These entries appropriately exclude version control, temporary files, and Python caches from the Docker build context.
8-13: LGTM: Standard exclusions for documentation, tests, and build artifacts.Excluding
docs/andtests/is standard practice for production container images. The build artifacts (*.egg-info,dist/,build/) and JavaScript dependencies (node_modules/) are appropriately excluded.
7-7: Remove or clarify thebrain/exclusion in.dockerignore— the directory does not exist in the codebase.The
brain/directory referenced in.dockerignoredoes not exist in the repository. In the codebase, "brain" refers to thegradata.brainmodule and theBrainclass used for learning and rule management, not a filesystem directory. The exclusion is harmless but serves no purpose unlessbrain/is expected to be created at runtime during the build process. Either remove it or add a comment clarifying what it excludes (e.g.,# Runtime brain state directory, if created).
| FROM python:3.12-slim | ||
|
|
||
| WORKDIR /app | ||
|
|
||
| COPY pyproject.toml README.md ./ | ||
| COPY src/ src/ | ||
|
|
||
| RUN pip install --no-cache-dir -e . | ||
|
|
||
| # Create a default brain directory | ||
| RUN mkdir -p /brain | ||
|
|
||
| ENV BRAIN_DIR=/brain | ||
|
|
||
| ENTRYPOINT ["python", "-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata brain ready at /brain')"] |
There was a problem hiding this comment.
Add a HEALTHCHECK for orchestration visibility.
If this image is intended to stay running, add HEALTHCHECK so orchestrators can detect unhealthy instances.
Example HEALTHCHECK
+HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
+ CMD python -c "from pathlib import Path; p=Path('/brain'); raise SystemExit(0 if p.exists() else 1)"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| FROM python:3.12-slim | |
| WORKDIR /app | |
| COPY pyproject.toml README.md ./ | |
| COPY src/ src/ | |
| RUN pip install --no-cache-dir -e . | |
| # Create a default brain directory | |
| RUN mkdir -p /brain | |
| ENV BRAIN_DIR=/brain | |
| ENTRYPOINT ["python", "-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata brain ready at /brain')"] | |
| FROM python:3.12-slim | |
| WORKDIR /app | |
| COPY pyproject.toml README.md ./ | |
| COPY src/ src/ | |
| RUN pip install --no-cache-dir -e . | |
| # Create a default brain directory | |
| RUN mkdir -p /brain | |
| ENV BRAIN_DIR=/brain | |
| HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \ | |
| CMD python -c "from pathlib import Path; p=Path('/brain'); raise SystemExit(0 if p.exists() else 1)" | |
| ENTRYPOINT ["python", "-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata brain ready at /brain')"] |
🧰 Tools
🪛 Checkov (3.2.513)
[low] 1-15: Ensure that HEALTHCHECK instructions have been added to container images
(CKV_DOCKER_2)
[low] 1-15: Ensure that a user for the container has been created
(CKV_DOCKER_3)
🪛 Hadolint (2.14.0)
[info] 11-11: Multiple consecutive RUN instructions. Consider consolidation.
(DL3059)
🪛 Trivy (0.69.3)
[error] 1-1: Image user should not be 'root'
Specify at least 1 USER command in Dockerfile with non-root user as argument
Rule: DS-0002
(IaC/Dockerfile)
[info] 1-1: No HEALTHCHECK defined
Add HEALTHCHECK instruction in your Dockerfile
Rule: DS-0026
(IaC/Dockerfile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Dockerfile` around lines 1 - 15, Add a Docker HEALTHCHECK instruction to the
Dockerfile so orchestrators can detect unhealthy instances; modify the file that
currently sets ENTRYPOINT to run Brain('/brain') by appending a HEALTHCHECK that
invokes a lightweight check (e.g., a short python command or script that imports
gradata.brain and performs a minimal sanity call against Brain(BRAIN_DIR)) and
returns appropriate exit codes, and configure sensible options (interval,
timeout, start-period, retries). Target the existing ENTRYPOINT/ENV BRAIN_DIR
and ensure the health command uses the same BRAIN_DIR environment variable and
exits non-zero on failure so orchestrators mark the container unhealthy.
Container runs as root; add a non-root runtime user.
Running as root increases blast radius for container escapes and file-system abuse. Add useradd + USER and ensure /brain ownership.
Harden runtime user
FROM python:3.12-slim
WORKDIR /app
@@
-RUN mkdir -p /brain
+RUN useradd --create-home --uid 10001 appuser \
+ && mkdir -p /brain \
+ && chown -R appuser:appuser /app /brain
ENV BRAIN_DIR=/brain
+USER appuser🧰 Tools
🪛 Checkov (3.2.513)
[low] 1-15: Ensure that HEALTHCHECK instructions have been added to container images
(CKV_DOCKER_2)
[low] 1-15: Ensure that a user for the container has been created
(CKV_DOCKER_3)
🪛 Hadolint (2.14.0)
[info] 11-11: Multiple consecutive RUN instructions. Consider consolidation.
(DL3059)
🪛 Trivy (0.69.3)
[error] 1-1: Image user should not be 'root'
Specify at least 1 USER command in Dockerfile with non-root user as argument
Rule: DS-0002
(IaC/Dockerfile)
[info] 1-1: No HEALTHCHECK defined
Add HEALTHCHECK instruction in your Dockerfile
Rule: DS-0026
(IaC/Dockerfile)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Dockerfile` around lines 1 - 15, The Dockerfile currently runs the container
as root; create a non-root runtime user and ensure /brain ownership: add a user
(e.g., "gradata") and group via useradd/usermod during image build, chown the
WORKDIR (/app) and the default brain directory (/brain) to that user, and switch
to that user with the USER instruction before ENTRYPOINT so the ENTRYPOINT
("python", "-c", "from gradata.brain import Brain; b = Brain('/brain');
print(f'Gradata brain ready at /brain')") runs unprivileged; keep ENV
BRAIN_DIR=/brain and ensure any files copied into /app are accessible by the new
user.
|
|
||
| ENV BRAIN_DIR=/brain | ||
|
|
||
| ENTRYPOINT ["python", "-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata brain ready at /brain')"] |
There was a problem hiding this comment.
Entrypoint exits immediately, so this “deployment” container won’t stay up.
python -c "…; print(...)" finishes and the container stops. For a deployable image, run a long-lived process (service/worker) as PID 1.
Suggested direction
-ENTRYPOINT ["python", "-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata brain ready at /brain')"]
+# Keep init check separate (build/test), and run an actual long-lived runtime command.
+# Example:
+# CMD ["python", "-m", "gradata.mcp_server"]🧰 Tools
🪛 Checkov (3.2.513)
[low] 1-15: Ensure that HEALTHCHECK instructions have been added to container images
(CKV_DOCKER_2)
[low] 1-15: Ensure that a user for the container has been created
(CKV_DOCKER_3)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@Dockerfile` at line 15, The current ENTRYPOINT runs a short-lived Python
one-liner that instantiates Brain and prints a message, so the container exits;
replace it with a command that launches a long-lived process. Update the
Dockerfile ENTRYPOINT to call a blocking/run method on the Brain class (e.g.,
Brain('/brain').run() or .serve()) or invoke a module-level long-running
entrypoint (e.g., python -m gradata.brain) so PID 1 remains active; if Brain
lacks a blocking method, add a blocking start() or serve() method to the Brain
class that starts the service/worker loop and call that from ENTRYPOINT.
Meta-rule discovery, grouping, and synthesis algorithms are now cloud-only. Open-source SDK keeps data models, formatting, ranking, validation, and storage API intact so cloud-generated meta-rules work seamlessly. - discover_meta_rules() → returns [] - refresh_meta_rules() → validates existing, skips rediscovery - merge_into_meta() → returns placeholder MetaRule - Removed: theme clusters, grouping algos, principle synthesis - Added TIER_UNIVERSAL constant, fixed Pyright Lesson import - 6 discovery-dependent tests skipped with cloud marker - 1438 tests pass, 17 skipped, 0 failures Co-Authored-By: Gradata <noreply@gradata.ai>
| rule_id = f"{rule.category}:{hash(rule.description) % 10000:04d}" | ||
| correction_id = f"{correction_category}:{hash(correction_desc) % 10000:04d}" |
There was a problem hiding this comment.
Conflict IDs use unstable
hash() — IDs never match _make_rule_id's SHA-256
_make_rule_id in rule_engine.py (line 298) uses hashlib.sha256 to build a stable, process-safe identifier:
desc_hash = int(hashlib.sha256(lesson.description.encode()).hexdigest(), 16)
return f"{lesson.category}:{desc_hash % 10000:04d}"The two lines here use Python's built-in hash(), which (a) uses a completely different algorithm and (b) is randomised by PYTHONHASHSEED — IDs change on every interpreter restart.
Net effect: every conflict edge written by _attribute_domain_fires has IDs that will never match those looked up by conflict_count inside apply_rules. The conflict-filtering step ("Step 5.5") is silently a no-op, both within a single run and definitely across restarts once the graph is persisted to rule_graph.json.
Fix — mirror _make_rule_id's formula using a small shared helper (avoids importing rule_engine here):
| rule_id = f"{rule.category}:{hash(rule.description) % 10000:04d}" | |
| correction_id = f"{correction_category}:{hash(correction_desc) % 10000:04d}" | |
| import hashlib as _hl | |
| def _sid(cat: str, desc: str) -> str: | |
| h = int(_hl.sha256(desc.encode()).hexdigest(), 16) | |
| return f"{cat}:{h % 10000:04d}" | |
| rule_id = _sid(rule.category, rule.description) | |
| correction_id = _sid(correction_category, correction_desc) |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gradata/_core.py
Line: 71-72
Comment:
**Conflict IDs use unstable `hash()` — IDs never match `_make_rule_id`'s SHA-256**
`_make_rule_id` in `rule_engine.py` (line 298) uses `hashlib.sha256` to build a stable, process-safe identifier:
```python
desc_hash = int(hashlib.sha256(lesson.description.encode()).hexdigest(), 16)
return f"{lesson.category}:{desc_hash % 10000:04d}"
```
The two lines here use Python's built-in `hash()`, which (a) uses a completely different algorithm and (b) is randomised by `PYTHONHASHSEED` — IDs change on every interpreter restart.
**Net effect:** every conflict edge written by `_attribute_domain_fires` has IDs that will *never* match those looked up by `conflict_count` inside `apply_rules`. The conflict-filtering step ("Step 5.5") is silently a no-op, both within a single run and definitely across restarts once the graph is persisted to `rule_graph.json`.
Fix — mirror `_make_rule_id`'s formula using a small shared helper (avoids importing `rule_engine` here):
```suggestion
import hashlib as _hl
def _sid(cat: str, desc: str) -> str:
h = int(_hl.sha256(desc.encode()).hexdigest(), 16)
return f"{cat}:{h % 10000:04d}"
rule_id = _sid(rule.category, rule.description)
correction_id = _sid(correction_category, correction_desc)
```
How can I resolve this? If you propose a fix, please make it concise.| return [f"Invalid JSON: {e}"] | ||
|
|
||
| required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths"] | ||
| required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths", "proof"] |
There was a problem hiding this comment.
Adding
"proof" to mandatory keys breaks existing manifests
validate_manifest() now returns "Missing required key: proof" for any brain.manifest.json written before this PR. Since the key is only populated by the new brain.manifest() call path, any brain that hasn't been re-manifested will silently fail validation — including marketplace-shared brains and CI artefacts.
Consider keeping "proof" as a soft check (warn instead of error) or splitting it off from required_keys:
| required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths", "proof"] | |
| required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths"] | |
| optional_keys = ["proof"] | |
| for k in required_keys: |
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gradata/_brain_manifest.py
Line: 194
Comment:
**Adding `"proof"` to mandatory keys breaks existing manifests**
`validate_manifest()` now returns `"Missing required key: proof"` for any `brain.manifest.json` written before this PR. Since the key is only populated by the new `brain.manifest()` call path, any brain that hasn't been re-manifested will silently fail validation — including marketplace-shared brains and CI artefacts.
Consider keeping `"proof"` as a soft check (warn instead of error) or splitting it off from `required_keys`:
```suggestion
required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths"]
optional_keys = ["proof"]
for k in required_keys:
```
How can I resolve this? If you propose a fix, please make it concise.
Summary
effective_confidence = FSRS * Beta. Clean separation — FSRS owns global graduation, Beta owns per-domain reliability.Source
Sim 9-16 synthesis (16,000 posts, 1,600 agents, 8 Kenoodl meta-analyses)
Test plan
pytest tests/test_cusum.py -v— 6 changepoint testspytest tests/test_beta_scoring.py -v— 9 Beta scoring testspytest tests/test_kill_thresholds.py -v— 3 kill threshold testspytest tests/test_dynamic_similarity.py -v— 3 dynamic threshold testspytest tests/test_machine_mode.py -v— 17 machine mode tests (updated)pytest tests/ -q— full regression suite (1395 pass)pyright src/gradata/— 0 errorsGenerated with Gradata
Greptile Summary
This PR ships four well-scoped analytics improvements (CUSUM changepoint detection, Bayesian Beta domain scoring, inverted kill thresholds, dynamic similarity thresholds) plus three new public API methods (
prove,share,absorb) and a conflict-tracking rule graph. The test coverage is solid across all new features. One critical logic bug was found: the conflict IDs written by_attribute_domain_firesuse Python's built-inhash(), while the filtering code inapply_rulesqueries those same edges using SHA-256 (_make_rule_id). These two ID spaces are always disjoint, so the new conflict-filtering feature is silently a no-op._stats.py; docstring now accurately describes the successive-difference algorithm (prior PR thread resolved). Wired intoconvergence()correctly.beta_domain_reliability/effective_confidence/is_rule_disabled_for_domainare well-designed; prior thread concerns about the misfires clamp and guard ordering are addressed.CATEGORY_SIMILARITY_THRESHOLDSandget_similarity_threshold()cleanly integrate into the correction pipeline.RuleGraphpersists cleanly, but the conflict edge IDs recorded during correction are computed with a different hash than those used during rule injection (see inline comment).validate_manifestbackward compatibility: Adding"proof"to the mandatory key list will break validation for any existing manifest file written before this PR.Confidence Score: 3/5
The PR has one confirmed P0 logic bug (conflict IDs always miss) and one P1 backward-compat break (
validate_manifest). Core features (Beta scoring, CUSUM, kill thresholds, similarity) are correct and well-tested.The conflict-tracking feature shipped in this PR is silently a no-op due to the hash mismatch, which is a runtime correctness issue even if its impact is limited to the new rule-graph path. The manifest validation regression affects any user running
validate_manifeston existing brains. Both need fixes before merge to avoid shipping broken/regressive behaviour.src/gradata/_core.py(lines 71-72, hash mismatch) andsrc/gradata/_brain_manifest.py(line 194, mandatory proof key) need attention before merge.Important Files Changed
cusum_changepoints— correct standard-CUSUM algorithm with accurate docstring (prior thread concern resolved).trend_analysisdelegation is clean.beta_domain_reliability,effective_confidence) and conflict-graph filtering; scoring logic is correct but conflict IDs will never match those written by_core.py.RuleGraphclass — clean, well-tested, bidirectional conflict/co-occurrence tracking with graceful JSON persistence and corruption handling.hash()vs SHA-256 ID mismatch that makes conflict recording a no-op.proofto mandatory validation keys — breaks backward compatibility for existing manifests that predate this PR.CATEGORY_SIMILARITY_THRESHOLDSandget_similarity_threshold()— clean, well-motivated defaults.prove(),share(),absorb()public API and wires_rule_graphinto Brain init; proof embedded in manifest cleanly.TIER_UNIVERSALconstant added consistently.Flowchart
%%{init: {'theme': 'neutral'}}%% flowchart TD A[brain.correct\ndraft → final] --> B[_attribute_domain_fires] B --> C{CONTRADICTING?} C -- yes --> D[domain_scores misfires++] C -- yes --> E[rule_graph.add_conflict\nhash-based ID ⚠️] A --> F[Dynamic similarity\nget_similarity_threshold cat] F --> G{best_sim >= threshold?} G -- yes --> H[Reinforce existing lesson] G -- no --> I[Create new lesson] A --> J[brain.end_session] J --> K[Graduate lessons\nINSTINCT→PATTERN→RULE] K --> L[lesson.graduated event\n+ _graduation_message] K --> M[Meta-rule refresh\nnoop in OSS] N[brain.apply_rules] --> O[effective_confidence\nFSRS × Beta] N --> P[conflict_count filter\nSHA-256 ID ✓] P -- conflict_count >= 3 --> Q[Drop conflicting rule] E -. IDs never match .-> P R[brain.convergence] --> S[CUSUM changepoints] R --> T[edit_distance_trend] R --> U[CV-based converged vs no_signal] V[brain.prove] --> R V --> W[effort_ratio] X[brain.share] --> Y[Export PATTERN+RULE lessons] Z[brain.absorb] --> AA[Import as INSTINCT\nagent_type=shared]Comments Outside Diff (3)
src/gradata/_stats.py, line 1 (link)from __future__ import annotationsin modified SDK and test filesSeveral files touched by this PR are missing the required
from __future__ import annotationsheader. Nineteen existing test files and all other modified SDK modules (_core.py,self_improvement.py,rule_engine.py) already include it.Files that need it added as the very first non-comment line:
src/gradata/_stats.py(line 1) — introduceslist[int] | list[float]annotationsrc/gradata/_config.py(line 1) — introducesstr | Path | Noneannotation inreload_configtests/test_cusum.py(line 1)tests/test_beta_scoring.py(line 1)tests/test_dynamic_similarity.py(line 1)tests/test_kill_thresholds.py(line 1)Rule Used: # Code Review Rules
Rule 1: Never use print() ... (source)
Prompt To Fix With AI
src/gradata/_core.py, line 983-984 (link)"no_signal"— noisy categories are silently mislabeled"converged"The top-level trend correctly distinguishes a genuinely flat series from random noise using a coefficient-of-variation check (line 963:
trend = "converged" if cv < 0.5 else "no_signal"). The per-category block does not — it unconditionally falls through to"converged"whenever Mann-Kendall finds no trend.This matters because at line 185 the caller checks:
and at line 188 silently skips extraction for that category. A noisy category (high variance, no clear trend → should be
"no_signal") is therefore treated as converged, and its corrections are never extracted — a silent data loss path.Apply the same CV gate used for the top-level trend:
Prompt To Fix With AI
src/gradata/rules/rule_engine.py, line 549-556 (link)bus.emitnot wrapped in try-exceptCustom rule 4 requires all EventBus handler calls to be wrapped in a try-except block to prevent a failing handler from breaking the rule-injection pipeline. The
bus.emit("rule_scoped_out", {...})call here is unguarded.Rule Used: # Code Review Rules
Rule 1: Never use print() ... (source)
Prompt To Fix With AI
Prompt To Fix All With AI
Reviews (11): Last reviewed commit: "feat: stub meta-rules for open source — ..." | Re-trigger Greptile
Context used:
Rule 1: Never use print() ... (source)