Skip to content

feat: Sim 9 analytics — CUSUM, Beta scoring, kill threshold fix, dynamic similarity#16

Merged
Gradata merged 22 commits into
mainfrom
feat/sim9-analytics
Apr 7, 2026
Merged

feat: Sim 9 analytics — CUSUM, Beta scoring, kill threshold fix, dynamic similarity#16
Gradata merged 22 commits into
mainfrom
feat/sim9-analytics

Conversation

@Gradata

@Gradata Gradata commented Apr 6, 2026

Copy link
Copy Markdown
Owner

Summary

  • CUSUM changepoint detection: pure Python, wired into convergence() — "your brain clicked at session 14"
  • Bayesian Beta domain scoring: replaces naive misfire ratio with proper uncertainty bounds. Unified score: effective_confidence = FSRS * Beta. Clean separation — FSRS owns global graduation, Beta owns per-domain reliability.
  • Inverted kill thresholds: mature rules now get longer grace periods, infants die fast (was backwards — Sim 11 unanimous)
  • Dynamic similarity thresholds: ACCURACY/FACTUAL use 0.60, TONE/STYLE use 0.30 (was flat 0.35 — Sim 11)
  • Sim 17 holdout design: blind A/B sim definition for marketing proof

Source

Sim 9-16 synthesis (16,000 posts, 1,600 agents, 8 Kenoodl meta-analyses)

Test plan

  • pytest tests/test_cusum.py -v — 6 changepoint tests
  • pytest tests/test_beta_scoring.py -v — 9 Beta scoring tests
  • pytest tests/test_kill_thresholds.py -v — 3 kill threshold tests
  • pytest tests/test_dynamic_similarity.py -v — 3 dynamic threshold tests
  • pytest tests/test_machine_mode.py -v — 17 machine mode tests (updated)
  • pytest tests/ -q — full regression suite (1395 pass)
  • pyright src/gradata/ — 0 errors

Generated with Gradata

Greptile Summary

This PR ships four well-scoped analytics improvements (CUSUM changepoint detection, Bayesian Beta domain scoring, inverted kill thresholds, dynamic similarity thresholds) plus three new public API methods (prove, share, absorb) and a conflict-tracking rule graph. The test coverage is solid across all new features. One critical logic bug was found: the conflict IDs written by _attribute_domain_fires use Python's built-in hash(), while the filtering code in apply_rules queries those same edges using SHA-256 (_make_rule_id). These two ID spaces are always disjoint, so the new conflict-filtering feature is silently a no-op.

  • CUSUM: Clean pure-Python implementation in _stats.py; docstring now accurately describes the successive-difference algorithm (prior PR thread resolved). Wired into convergence() correctly.
  • Beta scoring: beta_domain_reliability / effective_confidence / is_rule_disabled_for_domain are well-designed; prior thread concerns about the misfires clamp and guard ordering are addressed.
  • Kill thresholds: Correctly inverted — infant rules now die faster (8 vs. old 15), stable rules live longest (20 vs. old 8).
  • Dynamic similarity: CATEGORY_SIMILARITY_THRESHOLDS and get_similarity_threshold() cleanly integrate into the correction pipeline.
  • Rule graph: RuleGraph persists cleanly, but the conflict edge IDs recorded during correction are computed with a different hash than those used during rule injection (see inline comment).
  • validate_manifest backward compatibility: Adding "proof" to the mandatory key list will break validation for any existing manifest file written before this PR.

Confidence Score: 3/5

The PR has one confirmed P0 logic bug (conflict IDs always miss) and one P1 backward-compat break (validate_manifest). Core features (Beta scoring, CUSUM, kill thresholds, similarity) are correct and well-tested.

The conflict-tracking feature shipped in this PR is silently a no-op due to the hash mismatch, which is a runtime correctness issue even if its impact is limited to the new rule-graph path. The manifest validation regression affects any user running validate_manifest on existing brains. Both need fixes before merge to avoid shipping broken/regressive behaviour.

src/gradata/_core.py (lines 71-72, hash mismatch) and src/gradata/_brain_manifest.py (line 194, mandatory proof key) need attention before merge.

Important Files Changed

Filename Overview
src/gradata/_stats.py Adds cusum_changepoints — correct standard-CUSUM algorithm with accurate docstring (prior thread concern resolved). trend_analysis delegation is clean.
src/gradata/rules/rule_engine.py Adds Beta domain scoring (beta_domain_reliability, effective_confidence) and conflict-graph filtering; scoring logic is correct but conflict IDs will never match those written by _core.py.
src/gradata/rules/rule_graph.py New RuleGraph class — clean, well-tested, bidirectional conflict/co-occurrence tracking with graceful JSON persistence and corruption handling.
src/gradata/_core.py Key pipeline improvements (dynamic similarity, edit-distance tracking, graduation notifications, share/absorb/prove); contains the hash() vs SHA-256 ID mismatch that makes conflict recording a no-op.
src/gradata/_brain_manifest.py Adds proof to mandatory validation keys — breaks backward compatibility for existing manifests that predate this PR.
src/gradata/_config.py Adds CATEGORY_SIMILARITY_THRESHOLDS and get_similarity_threshold() — clean, well-motivated defaults.
src/gradata/enhancements/self_improvement.py Kill-threshold inversion is correct and matches the spec (infant=8, stable=20 for human; 16/30 for machine). Machine limits remain strictly higher than human.
src/gradata/enhancements/meta_rules.py Meta-rule discovery/synthesis correctly stubbed out as Cloud-only no-ops; data model and storage API fully preserved for compatibility.
src/gradata/brain.py Adds prove(), share(), absorb() public API and wires _rule_graph into Brain init; proof embedded in manifest cleanly.
src/gradata/enhancements/super_meta_rules.py Super-meta-rule discovery correctly stubbed as Cloud-only; TIER_UNIVERSAL constant added consistently.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[brain.correct\ndraft → final] --> B[_attribute_domain_fires]
    B --> C{CONTRADICTING?}
    C -- yes --> D[domain_scores misfires++]
    C -- yes --> E[rule_graph.add_conflict\nhash-based ID ⚠️]
    A --> F[Dynamic similarity\nget_similarity_threshold cat]
    F --> G{best_sim >= threshold?}
    G -- yes --> H[Reinforce existing lesson]
    G -- no --> I[Create new lesson]
    A --> J[brain.end_session]
    J --> K[Graduate lessons\nINSTINCT→PATTERN→RULE]
    K --> L[lesson.graduated event\n+ _graduation_message]
    K --> M[Meta-rule refresh\nnoop in OSS]
    N[brain.apply_rules] --> O[effective_confidence\nFSRS × Beta]
    N --> P[conflict_count filter\nSHA-256 ID ✓]
    P -- conflict_count >= 3 --> Q[Drop conflicting rule]
    E -. IDs never match .-> P
    R[brain.convergence] --> S[CUSUM changepoints]
    R --> T[edit_distance_trend]
    R --> U[CV-based converged vs no_signal]
    V[brain.prove] --> R
    V --> W[effort_ratio]
    X[brain.share] --> Y[Export PATTERN+RULE lessons]
    Z[brain.absorb] --> AA[Import as INSTINCT\nagent_type=shared]
Loading

Comments Outside Diff (3)

  1. src/gradata/_stats.py, line 1 (link)

    P2 Missing from __future__ import annotations in modified SDK and test files

    Several files touched by this PR are missing the required from __future__ import annotations header. Nineteen existing test files and all other modified SDK modules (_core.py, self_improvement.py, rule_engine.py) already include it.

    Files that need it added as the very first non-comment line:

    • src/gradata/_stats.py (line 1) — introduces list[int] | list[float] annotation
    • src/gradata/_config.py (line 1) — introduces str | Path | None annotation in reload_config
    • tests/test_cusum.py (line 1)
    • tests/test_beta_scoring.py (line 1)
    • tests/test_dynamic_similarity.py (line 1)
    • tests/test_kill_thresholds.py (line 1)

    Rule Used: # Code Review Rules

    Rule 1: Never use print() ... (source)

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/gradata/_stats.py
    Line: 1
    
    Comment:
    **Missing `from __future__ import annotations` in modified SDK and test files**
    
    Several files touched by this PR are missing the required `from __future__ import annotations` header. Nineteen existing test files and all other modified SDK modules (`_core.py`, `self_improvement.py`, `rule_engine.py`) already include it.
    
    Files that need it added as the very first non-comment line:
    
    - `src/gradata/_stats.py` (line 1) — introduces `list[int] | list[float]` annotation
    - `src/gradata/_config.py` (line 1) — introduces `str | Path | None` annotation in `reload_config`
    - `tests/test_cusum.py` (line 1)
    - `tests/test_beta_scoring.py` (line 1)
    - `tests/test_dynamic_similarity.py` (line 1)
    - `tests/test_kill_thresholds.py` (line 1)
    
    
    
    **Rule Used:** # Code Review Rules
    
    ## Rule 1: Never use print() ... ([source](https://app.greptile.com/review/custom-context?memory=dee613fe-ca52-4382-b9d7-fad6d0b079ec))
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Claude Code

  2. src/gradata/_core.py, line 983-984 (link)

    P1 Per-category trend never emits "no_signal" — noisy categories are silently mislabeled "converged"

    The top-level trend correctly distinguishes a genuinely flat series from random noise using a coefficient-of-variation check (line 963: trend = "converged" if cv < 0.5 else "no_signal"). The per-category block does not — it unconditionally falls through to "converged" whenever Mann-Kendall finds no trend.

    This matters because at line 185 the caller checks:

    category_converged = cat_convergence.get("trend") == "converged"

    and at line 188 silently skips extraction for that category. A noisy category (high variance, no clear trend → should be "no_signal") is therefore treated as converged, and its corrections are never extracted — a silent data loss path.

    Apply the same CV gate used for the top-level trend:

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/gradata/_core.py
    Line: 983-984
    
    Comment:
    **Per-category trend never emits `"no_signal"` — noisy categories are silently mislabeled `"converged"`**
    
    The top-level trend correctly distinguishes a genuinely flat series from random noise using a coefficient-of-variation check (line 963: `trend = "converged" if cv < 0.5 else "no_signal"`). The per-category block does not — it unconditionally falls through to `"converged"` whenever Mann-Kendall finds no trend.
    
    This matters because at line 185 the caller checks:
    ```python
    category_converged = cat_convergence.get("trend") == "converged"
    ```
    and at line 188 silently skips extraction for that category. A noisy category (high variance, no clear trend → should be `"no_signal"`) is therefore treated as converged, and its corrections are never extracted — a silent data loss path.
    
    Apply the same CV gate used for the top-level trend:
    
    
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Claude Code

  3. src/gradata/rules/rule_engine.py, line 549-556 (link)

    P1 bus.emit not wrapped in try-except

    Custom rule 4 requires all EventBus handler calls to be wrapped in a try-except block to prevent a failing handler from breaking the rule-injection pipeline. The bus.emit("rule_scoped_out", {...}) call here is unguarded.

    Rule Used: # Code Review Rules

    Rule 1: Never use print() ... (source)

    Prompt To Fix With AI
    This is a comment left during a code review.
    Path: src/gradata/rules/rule_engine.py
    Line: 549-556
    
    Comment:
    **`bus.emit` not wrapped in try-except**
    
    Custom rule 4 requires all EventBus handler calls to be wrapped in a try-except block to prevent a failing handler from breaking the rule-injection pipeline. The `bus.emit("rule_scoped_out", {...})` call here is unguarded.
    
    
    
    **Rule Used:** # Code Review Rules
    
    ## Rule 1: Never use print() ... ([source](https://app.greptile.com/review/custom-context?memory=dee613fe-ca52-4382-b9d7-fad6d0b079ec))
    
    How can I resolve this? If you propose a fix, please make it concise.

    Fix in Claude Code

Fix All in Claude Code

Prompt To Fix All With AI
This is a comment left during a code review.
Path: src/gradata/_core.py
Line: 71-72

Comment:
**Conflict IDs use unstable `hash()` — IDs never match `_make_rule_id`'s SHA-256**

`_make_rule_id` in `rule_engine.py` (line 298) uses `hashlib.sha256` to build a stable, process-safe identifier:
```python
desc_hash = int(hashlib.sha256(lesson.description.encode()).hexdigest(), 16)
return f"{lesson.category}:{desc_hash % 10000:04d}"
```
The two lines here use Python's built-in `hash()`, which (a) uses a completely different algorithm and (b) is randomised by `PYTHONHASHSEED` — IDs change on every interpreter restart.

**Net effect:** every conflict edge written by `_attribute_domain_fires` has IDs that will *never* match those looked up by `conflict_count` inside `apply_rules`. The conflict-filtering step ("Step 5.5") is silently a no-op, both within a single run and definitely across restarts once the graph is persisted to `rule_graph.json`.

Fix — mirror `_make_rule_id`'s formula using a small shared helper (avoids importing `rule_engine` here):
```suggestion
                import hashlib as _hl
                def _sid(cat: str, desc: str) -> str:
                    h = int(_hl.sha256(desc.encode()).hexdigest(), 16)
                    return f"{cat}:{h % 10000:04d}"
                rule_id = _sid(rule.category, rule.description)
                correction_id = _sid(correction_category, correction_desc)
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/gradata/_brain_manifest.py
Line: 194

Comment:
**Adding `"proof"` to mandatory keys breaks existing manifests**

`validate_manifest()` now returns `"Missing required key: proof"` for any `brain.manifest.json` written before this PR. Since the key is only populated by the new `brain.manifest()` call path, any brain that hasn't been re-manifested will silently fail validation — including marketplace-shared brains and CI artefacts.

Consider keeping `"proof"` as a soft check (warn instead of error) or splitting it off from `required_keys`:
```suggestion
    required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths"]
    optional_keys = ["proof"]
    for k in required_keys:
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: src/gradata/_core.py
Line: 114-122

Comment:
**Unused `old_state` parameter in `_graduation_message`**

The function accepts `old_state: str` and it is passed as `_graduation_message(old_s, l)`, but the body only inspects `lesson.state.value` (the *new* state). The parameter is dead weight and misleads readers into thinking the message might vary by the previous state.

Either use it (e.g. to say "promoted from INSTINCT to PATTERN") or remove it from the signature:
```suggestion
def _graduation_message(lesson: "Lesson") -> str:
    """Generate a user-facing graduation notification message."""
    if lesson.state.value == "PATTERN":
        return (f"You've corrected this {lesson.fire_count} times — "
                f"Gradata learned it: \"{lesson.description[:80]}\"")
    elif lesson.state.value == "RULE":
        return (f"Graduated to RULE: \"{lesson.description[:80]}\" — "
                f"this correction is now permanent ({lesson.confidence:.0%} confidence)")
    return f"Lesson updated: {lesson.description[:80]}"
```
And update the call site to `"message": _graduation_message(l)`.

How can I resolve this? If you propose a fix, please make it concise.

Reviews (11): Last reviewed commit: "feat: stub meta-rules for open source — ..." | Re-trigger Greptile

Greptile also left 2 inline comments on this PR.

Context used:

  • Rule used - # Code Review Rules

Rule 1: Never use print() ... (source)

Oliver Le and others added 5 commits April 6, 2026 16:17
Co-Authored-By: Gradata <noreply@gradata.ai>
…nfidence

Replace naive misfires/fires ratio with Beta distribution lower bound
for domain scoping. Add effective_confidence = FSRS * Beta for ranking.

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…al, looser for style

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
@Gradata Gradata changed the base branch from feat/sim9-improvements to main April 6, 2026 23:41
Comment thread src/gradata/_stats.py
Comment thread src/gradata/rules/rule_engine.py
Oliver Le and others added 6 commits April 6, 2026 16:51
Adds average edit distance per session and Mann-Kendall trend analysis
to brain.convergence() return value, enabling detection of whether AI
output is getting closer to user expectations over time.

Co-Authored-By: Gradata <noreply@gradata.ai>
Chaos brain exposed: random noise with no Mann-Kendall trend was
mapped to "converged." Now uses coefficient of variation: CV < 0.5
= genuinely stable (converged), CV >= 0.5 = noisy (no_signal).

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
Lightweight adjacency list (dict-of-dicts, persisted as JSON) that tracks
conflict and co-occurrence edges between rules. Data layer for future
rule injection filtering and meta-rule clustering.

Co-Authored-By: Gradata <noreply@gradata.ai>
…L, clear fired rules

- Replace duplicate _mann_kendall() + _normal_cdf() (~55 LOC) with delegation to _stats.trend_analysis()
- Push per-category grouping to SQL (was fetching all JSON blobs into Python)
- Remove unused json import from brain_convergence
- Init desc="" before try block (was using fragile locals() introspection)
- Clear _fired_rules after attribution (was unbounded growth)
- Apply no_signal/insufficient_data logic to per-category trends (was copy-paste divergence)
- Simplify _SEVERITY_SECONDS dict to single constant (only "moderate" was used)
- Type _fired_rules as list[Lesson]
- Clarify beta_domain_reliability docstring (fires vs misfires semantics)

Co-Authored-By: Gradata <noreply@gradata.ai>
Comment thread src/gradata/rules/rule_engine.py
Oliver Le and others added 7 commits April 6, 2026 17:35
… on temp dirs

SQLite WAL mode keeps file handles open during TemporaryDirectory cleanup
on Windows. Using ignore_cleanup_errors=True (Python 3.12+) fixes all 5
previously flaky tests.

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…rust

Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
…icate

Co-Authored-By: Gradata <noreply@gradata.ai>
…ld docs

- CUSUM: docstring now correctly states it uses successive differences,
  not mean deviations. Sensitive to step changes, not gradual trends.
- is_rule_disabled_for_domain: docstring now correctly states the 0.5
  success rate threshold (not 30% misfire rate).

Co-Authored-By: Gradata <noreply@gradata.ai>
- Add from __future__ import annotations to 6 files (_stats.py, _config.py,
  4 test files) for consistent annotation style
- Guard beta_domain_reliability against misfires > fires data corruption
  (clamp misfires to min(misfires, fires) before computing Beta)
- Per-category no_signal vs converged already fixed in prior commit

Co-Authored-By: Gradata <noreply@gradata.ai>
Comment on lines +447 to +453
if fires == 0 and misfires == 0:
return 1.0
misfires = min(misfires, fires) # enforce invariant: misfires <= fires
successes = fires - misfires
alpha = max(1, successes + 1)
beta_param = misfires + 1
return round(_beta_ppf_05(alpha, beta_param), 4)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 fires == 0 guard fires before the misfires clamp

The early-return if fires == 0 and misfires == 0 is evaluated before misfires = min(misfires, fires). So corrupted data like {"fires": 0, "misfires": 5} — which is exactly what the clamp exists to sanitise — bypasses the neutral return, then has its misfires zeroed to 0, and falls through to _beta_ppf_05(1, 1) → 0.2 instead of the intended 1.0.

In is_rule_disabled_for_domain the fires < 3 guard masks this for the kill-switch path, but a direct call like effective_confidence(0.92, fires=0, misfires=5) would silently apply a 0.2× multiplier and suppress the rule with no warning.

Fix: move the neutral guard to after the clamp, and widen it to any zero-fires case:

Suggested change
if fires == 0 and misfires == 0:
return 1.0
misfires = min(misfires, fires) # enforce invariant: misfires <= fires
successes = fires - misfires
alpha = max(1, successes + 1)
beta_param = misfires + 1
return round(_beta_ppf_05(alpha, beta_param), 4)
misfires = min(misfires, fires) # enforce invariant: misfires <= fires
if fires == 0: # no activations → no domain signal, stay neutral
return 1.0
successes = fires - misfires
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gradata/rules/rule_engine.py
Line: 447-453

Comment:
**`fires == 0` guard fires before the misfires clamp**

The early-return `if fires == 0 and misfires == 0` is evaluated _before_ `misfires = min(misfires, fires)`. So corrupted data like `{"fires": 0, "misfires": 5}` — which is exactly what the clamp exists to sanitise — bypasses the neutral return, then has its `misfires` zeroed to 0, and falls through to `_beta_ppf_05(1, 1) → 0.2` instead of the intended `1.0`.

In `is_rule_disabled_for_domain` the `fires < 3` guard masks this for the kill-switch path, but a direct call like `effective_confidence(0.92, fires=0, misfires=5)` would silently apply a 0.2× multiplier and suppress the rule with no warning.

Fix: move the neutral guard to after the clamp, and widen it to any zero-fires case:

```suggestion
    misfires = min(misfires, fires)  # enforce invariant: misfires <= fires
    if fires == 0:  # no activations → no domain signal, stay neutral
        return 1.0
    successes = fires - misfires
```

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

Oliver Le and others added 2 commits April 6, 2026 18:56
When a lesson promotes from INSTINCT to PATTERN or PATTERN to RULE,
the EventBus now emits a "lesson.graduated" event with a human-readable
message -- the "aha moment" for users seeing Gradata learn their patterns.

Co-Authored-By: Gradata <noreply@gradata.ai>
…ives

Export graduated rules (PATTERN + RULE) as portable packages for team
distribution. Absorbed rules enter as INSTINCT so the recipient brain
must validate them through its own correction cycle.

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai

coderabbitai Bot commented Apr 7, 2026

Copy link
Copy Markdown

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 6a2cb04a-6e2b-4c6f-8a14-2e372211527e

📥 Commits

Reviewing files that changed from the base of the PR and between 61ecdfc and 54138d9.

📒 Files selected for processing (7)
  • src/gradata/_core.py
  • src/gradata/enhancements/meta_rules.py
  • src/gradata/enhancements/super_meta_rules.py
  • tests/test_llm_synthesizer.py
  • tests/test_meta_rule_generalization.py
  • tests/test_meta_rules.py
  • tests/test_multi_brain_simulation.py

📝 Walkthrough

Walkthrough

Adds proof-generation, team rule-sharing, and a persisted RuleGraph; introduces Beta-based domain reliability and conflict filtering in the rule engine, CUSUM changepoint detection, per-category similarity thresholds, graduation events, adjusted kill thresholds, and many tests plus an integration pytest marker.

Changes

Cohort / File(s) Summary
Core logic & Brain API
src/gradata/_core.py, src/gradata/brain.py
Added proof/share/absorb APIs, lesson graduation messaging, rule-graph conflict recording/persistence, convergence/statistics changes (delegation to stats module), and efficiency/estimation updates.
Rule graph & engine
src/gradata/rules/rule_graph.py, src/gradata/rules/rule_engine.py
New RuleGraph persisted adjacency for conflicts/co_occurs; rule engine adds Beta-based domain reliability, effective_confidence, domain disablement via Beta lower-bound, optional graph param for conflict filtering and co-occurrence tracking.
Config & thresholds
src/gradata/_config.py, src/gradata/enhancements/self_improvement.py
Added CATEGORY_SIMILARITY_THRESHOLDS + get_similarity_threshold; adjusted KILL_LIMITS and MACHINE_KILL_LIMITS constants (maturity/machine thresholds changed).
Statistics
src/gradata/_stats.py
Added cusum_changepoints(...) and replaced internal Mann–Kendall with delegated trend analysis.
Manifest & validation
src/gradata/_brain_manifest.py, src/gradata/brain.py, pyproject.toml
Manifest now requires proof; Brain.manifest() includes proof (with error fallback); pyproject adds integration pytest marker.
Meta-rule feature gating
src/gradata/enhancements/meta_rules.py, src/gradata/enhancements/super_meta_rules.py
Replaced local meta/super-meta synthesis with cloud-gated no-op implementations and simplified refresh/validation paths; signatures accept **kwargs.
Tests — new & updated
tests/... (tests/test_beta_scoring.py, tests/test_rule_graph.py, tests/test_rule_graph_integration.py, tests/test_cusum.py, tests/test_dynamic_similarity.py, tests/test_edit_distance_convergence.py, tests/test_graduation_notification.py, tests/test_manifest_prove.py, tests/test_prove.py, tests/test_sharing.py, tests/test_integration_workflow.py, tests/test_kill_thresholds.py, tests/test_machine_mode.py, plus updates in tests/test_adaptations.py, tests/test_core_behavioral.py, tests/test_llm_synthesizer.py, tests/test_meta_rule_generalization.py, tests/test_meta_rules.py, tests/test_multi_brain_simulation.py)
Added many integration/unit tests for RuleGraph, Beta scoring, CUSUM, similarity thresholds, proof/share/absorb, graduation events, convergence/edit-distance metrics; several tests marked skipped or adjusted for cloud dependency; temporary-directory cleanup flag added.
Packaging / containers
.dockerignore, Dockerfile
Added .dockerignore and a minimal Dockerfile that installs package and instantiates Brain on startup.

Sequence Diagram(s)

sequenceDiagram
    participant Session
    participant Brain
    participant RuleEngine as Rule Engine
    participant RuleGraph as Rule Graph
    participant Stats as Stats Module

    Session->>Brain: correct(category, correction)
    Brain->>RuleEngine: apply_rules(fired_rules, scope, graph=RuleGraph)
    RuleEngine->>RuleGraph: conflict_count(rule_id, selected_rule_id)
    RuleGraph-->>RuleEngine: conflict frequency
    RuleEngine->>RuleGraph: add_co_occurrence(selected_rule_ids)
    RuleGraph-->>RuleEngine: ack
    Brain->>Brain: attribute misfire / record conflict
    Brain->>RuleGraph: add_conflict(rule_a, rule_b)
    Brain->>RuleGraph: save()
    Session->>Brain: end_session()
    Brain->>Stats: convergence(by_category=True)
    Stats->>Stats: cusum_changepoints(edit_distances)
    Stats-->>Brain: changepoints, trend, p_value
    Brain->>Brain: prove()
    Brain-->>Session: proof dict
Loading
sequenceDiagram
    participant BrainA as Brain A (Source)
    participant Share as Share
    participant Package as Package
    participant BrainB as Brain B (Recipient)
    participant Absorb as Absorb

    BrainA->>Share: brain_share()
    Share->>Share: filter graduated lessons, aggregate proof
    Share-->>Package: package{brain_id, rules, proof...}
    Package->>BrainB: deliver
    BrainB->>Absorb: brain_absorb(package)
    Absorb->>Absorb: deduplicate by similarity threshold
    Absorb->>Absorb: create INSTINCT lessons (conf=0.40, agent_type="shared")
    Absorb-->>BrainB: absorbed/skipped/total counts
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

Possibly related PRs

Suggested labels

feature, breaking-change

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 51.03% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The PR title accurately summarizes the main changes: four analytics-driven improvements (CUSUM changepoint detection, Beta domain scoring, kill threshold fixes, and dynamic similarity thresholds) from simulation runs 9-16.
Description check ✅ Passed The PR description comprehensively describes the four main analytics improvements (CUSUM changepoint detection, Bayesian Beta domain scoring, inverted kill thresholds, dynamic similarity thresholds) and supporting infrastructure changes, with clear alignment to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sim9-analytics

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot added the feature label Apr 7, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 19

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
src/gradata/brain.py (1)

436-440: ⚠️ Potential issue | 🟠 Major

apply_brain_rules doesn't pass graph parameter for conflict filtering.

The apply_rules function now accepts an optional graph: RuleGraph parameter for conflict filtering and co-occurrence tracking (see rule_engine.py changes). However, apply_brain_rules doesn't pass self._rule_graph:

return format_rules_for_prompt(apply_rules(lessons, build_scope(ctx)))

This means conflict-based rule filtering is silently disabled when rules are applied via brain.apply_brain_rules(). If this is intentional, consider documenting it. Otherwise:

Proposed fix
-        return format_rules_for_prompt(apply_rules(lessons, build_scope(ctx)))
+        return format_rules_for_prompt(apply_rules(lessons, build_scope(ctx), graph=self._rule_graph))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/brain.py` around lines 436 - 440, The call in apply_brain_rules
currently invokes apply_rules(lessons, build_scope(ctx)) and therefore omits the
new optional graph parameter; update that call to pass the instance RuleGraph
(self._rule_graph) so conflict filtering and co-occurrence tracking are enabled
(e.g., apply_rules(lessons, build_scope(ctx), self._rule_graph) or
apply_rules(lessons, build_scope(ctx), graph=self._rule_graph)); ensure you
reference the apply_brain_rules function and the self._rule_graph attribute when
making the change.
src/gradata/_brain_manifest.py (1)

1-27: ⚠️ Potential issue | 🟡 Minor

Missing from __future__ import annotations import.

This file is in src/gradata/ and per coding guidelines requires the annotations import for type safety.

Proposed fix
 """
 Brain Manifest Generator (SDK Layer).
 ...
 """
+from __future__ import annotations

 import json
 from datetime import UTC, datetime

As per coding guidelines: "src/gradata/**/*.py: ... type safety (from future import annotations required)".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_brain_manifest.py` around lines 1 - 27, The module is missing
the required future annotations import; add the line "from __future__ import
annotations" as the very first import in this file (before any other imports) to
satisfy the codebase type-safety rule and ensure postponed evaluation of
annotations for public API symbols in this module (e.g., used by
generate_manifest, validate_manifest, MANIFEST_PATH, BrainContext,
get_connection).
src/gradata/_core.py (1)

1-6: 🧹 Nitpick | 🔵 Trivial

File exceeds 500-line guideline (1267 lines).

Per coding guidelines, files should stay under 500 lines. While this module intentionally consolidates heavy methods from Brain to keep that class small, consider further splitting by domain:

  • _core_correct.py for correction logic
  • _core_convergence.py for convergence/efficiency/prove
  • _core_sharing.py for share/absorb

This is architectural advice for future maintainability.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_core.py` around lines 1 - 6, The file _core.py is too large;
split its domain responsibilities into smaller modules: create _core_correct.py
containing correction-related functions (correct, end_session, auto_evolve),
_core_convergence.py containing convergence/prove/efficiency logic (prove,
converge, any efficiency-related helpers), and _core_sharing.py containing
share/absorb and sharing helpers; move the corresponding functions and their
private helpers into those files, update imports/exports so existing callers
(Brain and other modules) import the same symbols from src.gradata._core (add
re-exporting imports in _core.py or update Brain to import directly from the new
modules), and ensure any shared utility functions keep their visibility (either
keep them in a small common helper module or import them where needed) so
behavior and tests remain unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/gradata/_brain_manifest.py`:
- Around line 194-197: The validator is requiring a "proof" key that
generate_manifest() does not produce, causing false "Missing required key:
proof" errors; update validate_manifest() to remove "proof" from required_keys
(leave Brain.manifest() to still append proof via m["proof"] = self.prove()), so
validation accepts manifests produced by generate_manifest() and
write_manifest() while Brain.manifest() can continue to include proof when
available.

In `@src/gradata/_core.py`:
- Line 1207: Replace the locally defined INITIAL_CONFIDENCE = 0.40 with the
shared constant already imported from the self_improvement module: remove the
local assignment and use the imported INITIAL_CONFIDENCE symbol everywhere in
this file (ensure any references in functions or classes that currently rely on
the local value simply reference the imported INITIAL_CONFIDENCE). This keeps a
single source of truth and prevents drift if the constant changes in
self_improvement.
- Around line 392-402: The type hint in _graduation_message references Lesson
but Lesson isn't imported under the TYPE_CHECKING guard; update the conditional
imports to include Lesson (add Lesson to the TYPE_CHECKING block where other
forward-only imports live) so the symbol is defined for static type checkers and
the pyright error is resolved.
- Around line 1257-1258: The code currently always writes back with
lessons_path.write_text(format_lessons(existing), encoding="utf-8") even when no
changes were made (absorbed == 0), causing unnecessary I/O; modify the
surrounding logic (the block that uses variables absorbed and existing and calls
format_lessons) to detect no-op cases (e.g., absorbed == 0 and no new rules were
appended to existing) and skip the write_text call in that case so the file is
only rewritten when existing was actually modified.

In `@src/gradata/_stats.py`:
- Around line 59-87: In cusum_changepoints, validate inputs before computing
mean/variance: ensure threshold is finite and > 0 (reject non-finite or <= 0)
and ensure all elements of data are finite numbers (no NaN/inf); if validation
fails return [] or raise ValueError per project convention. Update checks around
variables threshold, limit and data so the function rejects bad thresholds and
non-finite samples up front (before computing mean/variance/std_dev) to prevent
poisoned math and spurious changepoints.

In `@src/gradata/rules/rule_engine.py`:
- Around line 625-640: The conflict threshold (currently hardcoded as 3 in the
conflict filtering loop that uses graph.conflict_count(rule_id, sel_id)) should
be made configurable: introduce a named constant (e.g., CONFLICT_THRESHOLD = 3)
or a function parameter to the rule selection routine (the code around scored,
selected_ids, and _make_rule_id in rule_engine.py), replace the literal "3" with
that symbol, and ensure any callers that should control sensitivity can pass a
value or use the constant; update any relevant docstring or default to
CONFLICT_THRESHOLD so behavior remains the same unless tuned.

In `@src/gradata/rules/rule_graph.py`:
- Around line 71-76: save() currently writes directly to self._path and can
leave a corrupt file if interrupted; change it to write JSON to a temporary file
in the same directory (e.g., use a tmp name like
self._path.with_suffix(self._path.suffix + ".tmp") or tempfile in same dir),
flush/close, then atomically replace the target with os.replace or
pathlib.Path.replace. Use json.dumps(self._edges, indent=2) and ensure
encoding="utf-8" when writing the temp file; update save() to perform the
write-to-temp then atomic rename so subsequent reads of the constructor won't
see a half-written file.
- Around line 21-24: The persisted _edges currently use caller-supplied,
non-deterministic numeric IDs which break history; fix by generating stable,
namespaced IDs before persisting: replace the hash(description) % 10000 usage in
the caller (the code that calls add_conflict/add_co_occurrence) with
deterministic IDs (e.g., SHA256 hex of the description) or with a sequential
generator and always prefix IDs with a namespace like "rule:" and "correction:"
so keys stored in self._edges remain stable across restarts; ensure the same
deterministic ID creation is used everywhere that constructs node IDs and passed
into RuleGraph.add_conflict()/add_co_occurrence so persisted entries in _edges
map to the new runtime IDs.

In `@tests/test_beta_scoring.py`:
- Around line 64-66: The test test_effective_confidence_no_domain_data currently
asserts exact float equality against effective_confidence(fsrs_confidence=0.92,
domain_fires=0, domain_misfires=0); replace the exact equality with a
floating-point tolerant assertion using pytest.approx (e.g., assert score ==
pytest.approx(0.92)) to avoid brittle comparisons when testing the
effective_confidence function.
- Around line 22-24: Update the test test_beta_reliability_no_data to use
pytest.approx for float comparison: in the test function that calls
beta_domain_reliability(fires=0, misfires=0) replace the exact equality
assertion with an assertion using pytest.approx(1.0) and ensure pytest is
imported at the top of the test module if not already present so the test uses
pytest.approx instead of ==.

In `@tests/test_cusum.py`:
- Around line 37-47: Move the module import to the top of the test file:
relocate the line "from gradata.brain import Brain" so it's grouped with the
other imports at the top of tests/test_cusum.py rather than placed between test
functions; ensure test_convergence_includes_changepoints still instantiates
Brain(str(tmp_path)) and calls brain.convergence() unchanged after the import
move.

In `@tests/test_dynamic_similarity.py`:
- Around line 12-18: Replace the hardcoded 0.35 in the tests with the
SIMILARITY_THRESHOLD constant from _config: ensure tests test_default_threshold
and test_stylistic_categories_are_loose use SIMILARITY_THRESHOLD when asserting
get_similarity_threshold("UNKNOWN_CATEGORY") == SIMILARITY_THRESHOLD and when
comparing stylistic categories (for cat in ["DRAFTING","TONE","STYLE"]: assert
get_similarity_threshold(cat) <= SIMILARITY_THRESHOLD); also add the import for
SIMILARITY_THRESHOLD if it’s missing.

In `@tests/test_edit_distance_convergence.py`:
- Around line 10-24: The helper _make_brain currently uses tempfile.mkdtemp()
which leaves orphaned temp dirs; change _make_brain to accept a Path-like
directory parameter (or a pytest tmp_path fixture) instead of creating its own
temp dir, use that Path to write "lessons.md" and construct Brain(dir), and
update all tests in this file that call _make_brain to accept the pytest
tmp_path fixture and pass tmp_path (or tmp_path / "subdir") into _make_brain so
pytest will handle cleanup automatically; keep the function name _make_brain and
the Brain.emit calls unchanged.
- Around line 87-98: The test test_edit_distance_values_are_rounded currently
compares floats with exact equality (val == round(val, 4)); change the assertion
to use pytest.approx so floating-point rounding imprecision is tolerated — for
each val in result["edit_distance_per_session"] assert val ==
pytest.approx(round(val, 4), rel=0, abs=1e-4) (or equivalently assert val ==
pytest.approx(round(val,4), abs=1e-4)) and add an import for pytest if needed;
keep the rest of the test and the call to brain.convergence() unchanged.

In `@tests/test_graduation_notification.py`:
- Around line 24-28: The test currently uses a conditional guard (if graduated:)
which lets the test pass when no graduation event is present; change the logic
in tests/test_graduation_notification.py to assert the presence of graduation
events explicitly by adding assert len(graduated) > 0 (or assert graduated)
immediately before inspecting graduated[0] and its "message" contents; likewise
replace other conditional guards that check if events: with explicit assertions
that the events list is non-empty before performing payload checks (or mark
those tests pytest.mark.xfail if the behavior is known to be flaky).

In `@tests/test_kill_thresholds.py`:
- Around line 12-20: test_machine_kill_limits_also_inverted only checks INFANT <
STABLE which misses regressions in intermediate buckets; update it to assert
monotonic increase across MACHINE_KILL_LIMITS like
test_kill_limits_monotonically_increasing does for KILL_LIMITS: define the same
order = ["INFANT","ADOLESCENT","MATURE","STABLE"] and loop/parametrize to assert
MACHINE_KILL_LIMITS[order[i]] < MACHINE_KILL_LIMITS[order[i+1]] (or use
pytest.mark.parametrize to create per-pair assertions) so all adjacent
thresholds are validated rather than only endpoints.

In `@tests/test_rule_graph_integration.py`:
- Around line 41-43: The assertion graph.node_count >= 0 is vacuous; change the
test to assert meaningful outcomes of co-occurrence tracking: for example,
assert graph.node_count > 0 to ensure nodes were added or assert that the
expected co-occurrence edge exists (e.g., check for an edge between the two rule
node identifiers or that an edge with the co-occurrence attribute/count is
present). Update the test in tests/test_rule_graph_integration.py to reference
graph.node_count, graph.has_edge or graph.edges (and the specific node IDs or
edge attribute name used by your rule graph implementation) to validate that
co-occurrence was actually recorded.
- Around line 46-77: The test exposes a mismatch: rule IDs created by
brain_correct (which uses built-in hash()) differ from IDs used by
apply_rules/_make_rule_id (which uses hashlib.sha256), so conflicts stored by
RuleGraph aren't recognized; fix by unifying ID creation—update brain_correct to
call _make_rule_id (or refactor both to a single helper) so both brain_correct
and apply_rules use identical hashing, and update any tests that seed conflicts
to use _make_rule_id (or the chosen unified helper) to ensure conflict seeding
matches production ID format.

In `@tests/test_sharing.py`:
- Around line 150-152: Replace the direct float equality assertion for
tone_lessons[0].confidence with a pytest.approx comparison: update the assertion
that checks tone_lessons[0].confidence to use pytest.approx(0.40). Ensure pytest
is imported in the test module (add "import pytest" if missing) and keep the
existing checks for len(tone_lessons) and tone_lessons[0].state.value unchanged.

---

Outside diff comments:
In `@src/gradata/_brain_manifest.py`:
- Around line 1-27: The module is missing the required future annotations
import; add the line "from __future__ import annotations" as the very first
import in this file (before any other imports) to satisfy the codebase
type-safety rule and ensure postponed evaluation of annotations for public API
symbols in this module (e.g., used by generate_manifest, validate_manifest,
MANIFEST_PATH, BrainContext, get_connection).

In `@src/gradata/_core.py`:
- Around line 1-6: The file _core.py is too large; split its domain
responsibilities into smaller modules: create _core_correct.py containing
correction-related functions (correct, end_session, auto_evolve),
_core_convergence.py containing convergence/prove/efficiency logic (prove,
converge, any efficiency-related helpers), and _core_sharing.py containing
share/absorb and sharing helpers; move the corresponding functions and their
private helpers into those files, update imports/exports so existing callers
(Brain and other modules) import the same symbols from src.gradata._core (add
re-exporting imports in _core.py or update Brain to import directly from the new
modules), and ensure any shared utility functions keep their visibility (either
keep them in a small common helper module or import them where needed) so
behavior and tests remain unchanged.

In `@src/gradata/brain.py`:
- Around line 436-440: The call in apply_brain_rules currently invokes
apply_rules(lessons, build_scope(ctx)) and therefore omits the new optional
graph parameter; update that call to pass the instance RuleGraph
(self._rule_graph) so conflict filtering and co-occurrence tracking are enabled
(e.g., apply_rules(lessons, build_scope(ctx), self._rule_graph) or
apply_rules(lessons, build_scope(ctx), graph=self._rule_graph)); ensure you
reference the apply_brain_rules function and the self._rule_graph attribute when
making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3050b317-2a72-4a99-8fb7-4c118e791b85

📥 Commits

Reviewing files that changed from the base of the PR and between d079dd7 and 8849dfb.

📒 Files selected for processing (24)
  • pyproject.toml
  • src/gradata/_brain_manifest.py
  • src/gradata/_config.py
  • src/gradata/_core.py
  • src/gradata/_stats.py
  • src/gradata/brain.py
  • src/gradata/enhancements/self_improvement.py
  • src/gradata/rules/rule_engine.py
  • src/gradata/rules/rule_graph.py
  • tests/test_adaptations.py
  • tests/test_beta_scoring.py
  • tests/test_core_behavioral.py
  • tests/test_cusum.py
  • tests/test_dynamic_similarity.py
  • tests/test_edit_distance_convergence.py
  • tests/test_graduation_notification.py
  • tests/test_integration_workflow.py
  • tests/test_kill_thresholds.py
  • tests/test_machine_mode.py
  • tests/test_manifest_prove.py
  • tests/test_prove.py
  • tests/test_rule_graph.py
  • tests/test_rule_graph_integration.py
  • tests/test_sharing.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*

📄 CodeRabbit inference engine (CLAUDE.md)

Always read a file before editing it. Always prefer editing over creating new files.

Files:

  • pyproject.toml
  • src/gradata/_brain_manifest.py
  • tests/test_adaptations.py
  • src/gradata/_config.py
  • tests/test_dynamic_similarity.py
  • tests/test_cusum.py
  • tests/test_kill_thresholds.py
  • tests/test_rule_graph_integration.py
  • tests/test_machine_mode.py
  • tests/test_rule_graph.py
  • tests/test_integration_workflow.py
  • tests/test_edit_distance_convergence.py
  • src/gradata/_stats.py
  • src/gradata/enhancements/self_improvement.py
  • tests/test_core_behavioral.py
  • tests/test_sharing.py
  • src/gradata/rules/rule_engine.py
  • src/gradata/brain.py
  • tests/test_graduation_notification.py
  • tests/test_manifest_prove.py
  • tests/test_beta_scoring.py
  • tests/test_prove.py
  • src/gradata/rules/rule_graph.py
  • src/gradata/_core.py
**/*.{py,sh,js,ts,json,env*}

📄 CodeRabbit inference engine (CLAUDE.md)

Never hardcode secrets. Never commit .env files. Pre-commit hook blocks both.

Files:

  • src/gradata/_brain_manifest.py
  • tests/test_adaptations.py
  • src/gradata/_config.py
  • tests/test_dynamic_similarity.py
  • tests/test_cusum.py
  • tests/test_kill_thresholds.py
  • tests/test_rule_graph_integration.py
  • tests/test_machine_mode.py
  • tests/test_rule_graph.py
  • tests/test_integration_workflow.py
  • tests/test_edit_distance_convergence.py
  • src/gradata/_stats.py
  • src/gradata/enhancements/self_improvement.py
  • tests/test_core_behavioral.py
  • tests/test_sharing.py
  • src/gradata/rules/rule_engine.py
  • src/gradata/brain.py
  • tests/test_graduation_notification.py
  • tests/test_manifest_prove.py
  • tests/test_beta_scoring.py
  • tests/test_prove.py
  • src/gradata/rules/rule_graph.py
  • src/gradata/_core.py
**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Keep files under 500 lines. Validate input at system boundaries.

Files:

  • src/gradata/_brain_manifest.py
  • tests/test_adaptations.py
  • src/gradata/_config.py
  • tests/test_dynamic_similarity.py
  • tests/test_cusum.py
  • tests/test_kill_thresholds.py
  • tests/test_rule_graph_integration.py
  • tests/test_machine_mode.py
  • tests/test_rule_graph.py
  • tests/test_integration_workflow.py
  • tests/test_edit_distance_convergence.py
  • src/gradata/_stats.py
  • src/gradata/enhancements/self_improvement.py
  • tests/test_core_behavioral.py
  • tests/test_sharing.py
  • src/gradata/rules/rule_engine.py
  • src/gradata/brain.py
  • tests/test_graduation_notification.py
  • tests/test_manifest_prove.py
  • tests/test_beta_scoring.py
  • tests/test_prove.py
  • src/gradata/rules/rule_graph.py
  • src/gradata/_core.py
src/gradata/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use AGPL-3.0 licensing. Source code resides in src/gradata/.

Files:

  • src/gradata/_brain_manifest.py
  • src/gradata/_config.py
  • src/gradata/_stats.py
  • src/gradata/enhancements/self_improvement.py
  • src/gradata/rules/rule_engine.py
  • src/gradata/brain.py
  • src/gradata/rules/rule_graph.py
  • src/gradata/_core.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/_brain_manifest.py
  • src/gradata/_config.py
  • src/gradata/_stats.py
  • src/gradata/enhancements/self_improvement.py
  • src/gradata/rules/rule_engine.py
  • src/gradata/brain.py
  • src/gradata/rules/rule_graph.py
  • src/gradata/_core.py
tests/**

⚙️ CodeRabbit configuration file

tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.

Files:

  • tests/test_adaptations.py
  • tests/test_dynamic_similarity.py
  • tests/test_cusum.py
  • tests/test_kill_thresholds.py
  • tests/test_rule_graph_integration.py
  • tests/test_machine_mode.py
  • tests/test_rule_graph.py
  • tests/test_integration_workflow.py
  • tests/test_edit_distance_convergence.py
  • tests/test_core_behavioral.py
  • tests/test_sharing.py
  • tests/test_graduation_notification.py
  • tests/test_manifest_prove.py
  • tests/test_beta_scoring.py
  • tests/test_prove.py
🧠 Learnings (1)
📚 Learning: 2026-04-06T15:53:08.573Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T15:53:08.573Z
Learning: Correction pipeline: correct → extract behavioral instruction → graduate (INSTINCT→PATTERN→RULE) → inject into next session. Meta-rules emerge when patterns cluster across domains.

Applied to files:

  • src/gradata/_core.py
🪛 GitHub Actions: CI
src/gradata/_stats.py

[warning] 116-116: pyright: Import "scipy.stats" could not be resolved (reportMissingImports)


[warning] 126-126: pyright: Import "scipy.stats" could not be resolved (reportMissingImports)

src/gradata/brain.py

[warning] 727-727: pyright: Import "gradata.enhancements.memory_extraction" could not be resolved (reportMissingImports)

src/gradata/_core.py

[error] 393-393: pyright: "Lesson" is not defined (reportUndefinedVariable)

🔇 Additional comments (21)
src/gradata/_config.py (1)

70-84: LGTM!

The per-category similarity thresholds are well-structured with sensible defaults:

  • Factual categories (ACCURACY, FACTUAL) have stricter thresholds (0.60)
  • Stylistic categories (TONE, STYLE) have looser thresholds (0.30)
  • Unknown categories fall back to the global SIMILARITY_THRESHOLD (0.35)

The case-insensitive lookup via .upper() is a good defensive choice.

tests/test_rule_graph.py (2)

7-65: LGTM!

Excellent unit test coverage for RuleGraph:

  • Bidirectional conflict creation and counting
  • Co-occurrence tracking across multiple nodes
  • Edge cases (no conflict, empty conflicts, nonexistent nodes)
  • Persistence via save/load roundtrip
  • Graceful handling of corrupted JSON files

1-5: ⚠️ Potential issue | 🟡 Minor

Missing from __future__ import annotations import.

Proposed fix
 """Tests for rule graph — conflict and co-occurrence edges."""
+from __future__ import annotations
+
 from pathlib import Path

 from gradata.rules.rule_graph import RuleGraph
			> Likely an incorrect or invalid review comment.
src/gradata/brain.py (3)

74-79: LGTM!

Clean initialization:

  • _fired_rules properly typed with forward reference for Lesson
  • RuleGraph instantiated with persistent path for conflict/co-occurrence tracking

369-394: LGTM!

New public API methods (prove, share, absorb) follow the established delegation pattern to _core functions. Docstrings clearly explain the purpose and state transitions (e.g., shared rules enter as INSTINCT in the receiving brain).


833-842: LGTM!

Good defensive error handling for prove() failures. The fallback proof object correctly indicates the error state with proven: False and confidence_level: "error".

src/gradata/rules/rule_engine.py (4)

420-453: LGTM with minor observation.

The Beta distribution scoring implementation is sound:

  • _beta_ppf_05 uses normal approximation with proper clamping to [0, 1]
  • beta_domain_reliability correctly handles the neutral (0, 0) case
  • The misfires = min(misfires, fires) invariant enforcement on line 449 is a good defensive measure

Minor: Line 451's max(1, successes + 1) is always successes + 1 since successes >= 0 after line 450. The max(1, ...) is redundant but harmless.


456-467: LGTM!

effective_confidence cleanly combines global FSRS confidence with per-domain Beta reliability without double-counting, as documented. The multiplication of two [0,1] values stays in range, and the 4-decimal rounding is appropriate for confidence scores.


470-483: LGTM!

The updated is_rule_disabled_for_domain now uses the principled Beta distribution approach instead of naïve ratio checking. The 0.5 threshold (95% confident success rate is below 50%) is a reasonable conservative gate.


659-662: LGTM!

Co-occurrence tracking correctly fires only when multiple rules are selected, avoiding self-edges.

tests/test_rule_graph_integration.py (1)

1-4: ⚠️ Potential issue | 🟡 Minor

Missing from __future__ import annotations import.

Per coding guidelines, test files should follow the same conventions. Add the import for consistency.

Proposed fix
 """Tests for RuleGraph integration with the correction pipeline."""
+from __future__ import annotations
+
 from gradata.brain import Brain
 from gradata.rules.rule_graph import RuleGraph
			> Likely an incorrect or invalid review comment.
tests/test_integration_workflow.py (1)

1-11: LGTM!

Well-structured integration test suite with appropriate skip conditions for API key availability. The tests cover the key workflows (correction, convergence, efficiency) and validate expected response structures.

tests/test_manifest_prove.py (1)

1-44: LGTM!

Comprehensive tests for proof integration in manifest. Good coverage of success path, failure fallback, and disk persistence. The patch.object usage correctly tests the error handling path.

tests/test_sharing.py (1)

1-261: LGTM overall!

Comprehensive test coverage for the share/absorb functionality including edge cases like invalid correction types, duplicate detection, and round-trip behavior.

tests/test_prove.py (1)

1-93: LGTM!

Well-designed test suite with a clean _mock_convergence helper that provides sensible defaults. Good coverage of proof strength tiers (strong, moderate, insufficient) and evidence structure validation.

src/gradata/_core.py (6)

68-73: LGTM for rule graph conflict tracking.

The hash-based ID generation provides reasonable uniqueness for analytics purposes. The 4-digit modulo creates a manageable ID space while accepting some collision possibility.


229-231: LGTM for dynamic similarity thresholds.

Using per-category thresholds from configuration allows stricter matching for factual/accuracy categories while being more lenient for style categories.


323-337: LGTM for misfire attribution and rule graph persistence.

Good memory management by clearing _fired_rules after attribution. The silent exception handling for graph persistence is appropriate since it's non-critical analytics data.


462-474: LGTM for graduation notification emission.

Well-structured event payload with appropriate exception handling to prevent notification failures from breaking the session end flow.


870-887: LGTM for Mann-Kendall delegation.

Clean refactor to delegate statistical analysis to _stats.trend_analysis(). The p < 0.05 significance threshold is appropriate.


1066-1143: LGTM for proof generation logic.

Well-structured proof document with clear confidence level tiers. The evidence dictionary provides comprehensive metrics for debugging and auditing.

Comment on lines +194 to 197
required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths", "proof"]
for k in required_keys:
if k not in manifest:
issues.append(f"Missing required key: {k}")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if generate_manifest() produces a "proof" key anywhere
rg -n "proof" src/gradata/_brain_manifest.py

# Check how validate_manifest is called in tests
rg -n "validate_manifest" tests/

Repository: Gradata/gradata

Length of output: 581


🏁 Script executed:

#!/bin/bash
# First, let's check the validate_manifest function signature
sed -n '180,210p' src/gradata/_brain_manifest.py

# Check the generate_manifest function to see what keys it returns
sed -n '46,100p' src/gradata/_brain_manifest.py

Repository: Gradata/gradata

Length of output: 2895


🏁 Script executed:

#!/bin/bash
# Check the test context around lines 315-316 and 328-329
sed -n '310,335p' tests/test_brain.py

Repository: Gradata/gradata

Length of output: 1000


🏁 Script executed:

#!/bin/bash
# Check if there's a default manifest or fixture being validated
rg -B 5 -A 2 "validate_manifest\(\)" tests/test_brain.py

Repository: Gradata/gradata

Length of output: 543


🏁 Script executed:

#!/bin/bash
# Look for the complete generate_manifest function to see all returned keys
ast-grep --pattern $'def generate_manifest($$$) {
  $$$
  return $_
}'

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

#!/bin/bash
# See the rest of generate_manifest() function
sed -n '46,163p' src/gradata/_brain_manifest.py

Repository: Gradata/gradata

Length of output: 4936


🏁 Script executed:

#!/bin/bash
# Find Brain.manifest() method in brain.py
rg -A 20 "def manifest\(" src/gradata/brain.py

Repository: Gradata/gradata

Length of output: 957


🏁 Script executed:

#!/bin/bash
# Find write_manifest function
rg -A 10 "def write_manifest" src/gradata/_brain_manifest.py

Repository: Gradata/gradata

Length of output: 507


🏁 Script executed:

#!/bin/bash
# Check if there are any direct calls to write_manifest(generate_manifest()) in the codebase
rg "write_manifest.*generate_manifest" src/ tests/

Repository: Gradata/gradata

Length of output: 167


🏁 Script executed:

#!/bin/bash
# Check all calls to generate_manifest to see if it's ever used outside of Brain.manifest()
rg -B 2 -A 2 "generate_manifest\(" src/ tests/

Repository: Gradata/gradata

Length of output: 2373


validate_manifest() requires "proof" key that generate_manifest() never produces.

generate_manifest() returns a manifest without a "proof" key. Only Brain.manifest() adds m["proof"] = self.prove() after calling generate_manifest() and before calling write_manifest().

This creates two problematic paths:

  1. write_manifest(generate_manifest()) writes a manifest lacking "proof", causing validate_manifest() to report "Missing required key: proof"
  2. write_manifest() with no arguments internally calls generate_manifest(), producing the same result

Since validate_manifest() reads the manifest from disk and checks for all required keys, only manifests created through Brain.manifest() will pass validation. Manifests from direct generate_manifest() calls will fail.

Either add "proof" generation to generate_manifest(), remove "proof" from required keys, or document that validation only applies to manifests created via Brain.manifest().

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_brain_manifest.py` around lines 194 - 197, The validator is
requiring a "proof" key that generate_manifest() does not produce, causing false
"Missing required key: proof" errors; update validate_manifest() to remove
"proof" from required_keys (leave Brain.manifest() to still append proof via
m["proof"] = self.prove()), so validation accepts manifests produced by
generate_manifest() and write_manifest() while Brain.manifest() can continue to
include proof when available.

Comment thread src/gradata/_core.py
Comment thread src/gradata/_core.py
from gradata.enhancements.self_improvement import format_lessons, parse_lessons
from gradata.enhancements.similarity import best_similarity

INITIAL_CONFIDENCE = 0.40

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider reusing INITIAL_CONFIDENCE from self_improvement module.

This locally defines INITIAL_CONFIDENCE = 0.40, but the same constant is imported from self_improvement at line 175. Using the shared constant prevents drift if the value changes.

-    INITIAL_CONFIDENCE = 0.40
+    from gradata.enhancements.self_improvement import INITIAL_CONFIDENCE
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
INITIAL_CONFIDENCE = 0.40
# Remove the line entirely and use the module-level imported INITIAL_CONFIDENCE
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_core.py` at line 1207, Replace the locally defined
INITIAL_CONFIDENCE = 0.40 with the shared constant already imported from the
self_improvement module: remove the local assignment and use the imported
INITIAL_CONFIDENCE symbol everywhere in this file (ensure any references in
functions or classes that currently rely on the local value simply reference the
imported INITIAL_CONFIDENCE). This keeps a single source of truth and prevents
drift if the constant changes in self_improvement.

Comment thread src/gradata/_core.py
Comment on lines +1257 to +1258
# Write back
lessons_path.write_text(format_lessons(existing), encoding="utf-8")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Minor: Consider skipping write when nothing changed.

The file is written even when absorbed == 0 and no rules were added. This is harmless but unnecessary I/O.

     # Write back
-    lessons_path.write_text(format_lessons(existing), encoding="utf-8")
+    if absorbed > 0:
+        lessons_path.write_text(format_lessons(existing), encoding="utf-8")
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Write back
lessons_path.write_text(format_lessons(existing), encoding="utf-8")
# Write back
if absorbed > 0:
lessons_path.write_text(format_lessons(existing), encoding="utf-8")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_core.py` around lines 1257 - 1258, The code currently always
writes back with lessons_path.write_text(format_lessons(existing),
encoding="utf-8") even when no changes were made (absorbed == 0), causing
unnecessary I/O; modify the surrounding logic (the block that uses variables
absorbed and existing and calls format_lessons) to detect no-op cases (e.g.,
absorbed == 0 and no new rules were appended to existing) and skip the
write_text call in that case so the file is only rewritten when existing was
actually modified.

Comment thread src/gradata/_stats.py
Comment on lines +59 to +87
def cusum_changepoints(data: list[int] | list[float], threshold: float = 1.0) -> list[int]:
"""CUSUM changepoint detection for abrupt level shifts.

Accumulates consecutive differences (data[i] - data[i-1]). When
the cumulative sum exceeds threshold * std_dev, a changepoint is
recorded and the accumulator resets.

Note: uses successive differences, NOT deviations from the global
mean. Sensitive to sharp step changes (e.g. corrections dropping
from 10 to 2) but insensitive to slow gradual trends — use
trend_analysis / Mann-Kendall for those.

Args:
data: Time series of values.
threshold: Sensitivity in standard deviations. Default 1.0.

Returns:
Sorted list of indices where significant level shifts occurred.
"""
n = len(data)
if n < 3:
return []

mean = sum(data) / n
variance = sum((x - mean) ** 2 for x in data) / n
if variance == 0:
return []
std_dev = variance ** 0.5
limit = threshold * std_dev

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Validate CUSUM inputs before computing the control limit.

A negative or non-finite threshold turns limit into nonsense; with threshold < 0, the check at Line 98 can fire on the first iteration even when there is no changepoint. Non-finite samples also silently poison the variance math. Reject those inputs up front.

🛡️ Proposed fix
 def cusum_changepoints(data: list[int] | list[float], threshold: float = 1.0) -> list[int]:
     """CUSUM changepoint detection for abrupt level shifts.
@@
     Returns:
         Sorted list of indices where significant level shifts occurred.
     """
+    if not math.isfinite(threshold) or threshold < 0:
+        raise ValueError("threshold must be a finite non-negative number")
+    if any(not math.isfinite(x) for x in data):
+        raise ValueError("data must contain only finite numeric values")
+
     n = len(data)
     if n < 3:
         return []

As per coding guidelines, **/*.py: Validate input at system boundaries.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/gradata/_stats.py` around lines 59 - 87, In cusum_changepoints, validate
inputs before computing mean/variance: ensure threshold is finite and > 0
(reject non-finite or <= 0) and ensure all elements of data are finite numbers
(no NaN/inf); if validation fails return [] or raise ValueError per project
convention. Update checks around variables threshold, limit and data so the
function rejects bad thresholds and non-finite samples up front (before
computing mean/variance/std_dev) to prevent poisoned math and spurious
changepoints.

Comment on lines +24 to +28
graduated = [e for e in events if e.get("new_state") == "PATTERN"]
if graduated:
assert "message" in graduated[0]
msg = graduated[0]["message"].lower()
assert "learned it" in msg or "corrected" in msg

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Consider making graduation assertions mandatory rather than conditional.

The if graduated: guard means this test passes even if no graduation event fires. This could mask regressions where graduation stops working. Consider either:

  1. Adding assert len(graduated) > 0 before the payload checks, or
  2. Using pytest.mark.xfail if graduation is expected to be flaky

Same applies to the other tests with if events: guards (lines 46, 66, 89).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_graduation_notification.py` around lines 24 - 28, The test
currently uses a conditional guard (if graduated:) which lets the test pass when
no graduation event is present; change the logic in
tests/test_graduation_notification.py to assert the presence of graduation
events explicitly by adding assert len(graduated) > 0 (or assert graduated)
immediately before inspecting graduated[0] and its "message" contents; likewise
replace other conditional guards that check if events: with explicit assertions
that the events list is non-empty before performing payload checks (or mark
those tests pytest.mark.xfail if the behavior is known to be flaky).

Comment on lines +12 to +20
def test_machine_kill_limits_also_inverted():
assert MACHINE_KILL_LIMITS["STABLE"] > MACHINE_KILL_LIMITS["INFANT"]


def test_kill_limits_monotonically_increasing():
order = ["INFANT", "ADOLESCENT", "MATURE", "STABLE"]
for i in range(len(order) - 1):
assert KILL_LIMITS[order[i]] < KILL_LIMITS[order[i + 1]], \
f"{order[i]} ({KILL_LIMITS[order[i]]}) should be < {order[i+1]} ({KILL_LIMITS[order[i+1]]})"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Check machine thresholds monotonically, not just endpoints.

test_machine_kill_limits_also_inverted() only asserts INFANT < STABLE, so a regression in ADOLESCENT or MATURE would still pass. Reuse the ordered assertion for MACHINE_KILL_LIMITS, ideally via a parametrized helper shared with the human thresholds.

As per coding guidelines, tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness, parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_kill_thresholds.py` around lines 12 - 20,
test_machine_kill_limits_also_inverted only checks INFANT < STABLE which misses
regressions in intermediate buckets; update it to assert monotonic increase
across MACHINE_KILL_LIMITS like test_kill_limits_monotonically_increasing does
for KILL_LIMITS: define the same order =
["INFANT","ADOLESCENT","MATURE","STABLE"] and loop/parametrize to assert
MACHINE_KILL_LIMITS[order[i]] < MACHINE_KILL_LIMITS[order[i+1]] (or use
pytest.mark.parametrize to create per-pair assertions) so all adjacent
thresholds are validated rather than only endpoints.

Comment on lines +41 to +43

# Both rules should be applied, creating a co-occurrence edge
assert graph.node_count >= 0 # May or may not have edges depending on relevance

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Assertion graph.node_count >= 0 is always true.

node_count returns a non-negative integer by definition (it counts nodes). This assertion doesn't verify that co-occurrence tracking actually occurred.

Consider asserting that nodes were actually added, or that specific co-occurrence edges exist:

-    assert graph.node_count >= 0  # May or may not have edges depending on relevance
+    # Verify rules were tracked (even if not applied due to relevance filtering)
+    # If both rules pass scope, they should create co-occurrence
+    assert graph.node_count >= 0  # Basic sanity
+    # Consider: assert graph.node_count > 0 or check specific edges
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_rule_graph_integration.py` around lines 41 - 43, The assertion
graph.node_count >= 0 is vacuous; change the test to assert meaningful outcomes
of co-occurrence tracking: for example, assert graph.node_count > 0 to ensure
nodes were added or assert that the expected co-occurrence edge exists (e.g.,
check for an edge between the two rule node identifiers or that an edge with the
co-occurrence attribute/count is present). Update the test in
tests/test_rule_graph_integration.py to reference graph.node_count,
graph.has_edge or graph.edges (and the specific node IDs or edge attribute name
used by your rule graph implementation) to validate that co-occurrence was
actually recorded.

Comment on lines +46 to +77
def test_conflict_filtering_drops_conflicting_rules():
"""apply_rules drops lower-ranked rules that conflict with higher-ranked ones."""
from gradata._types import Lesson, LessonState
from gradata._scope import RuleScope
from gradata.rules.rule_engine import apply_rules, _make_rule_id

# Build two lessons and derive their actual rule IDs
lesson_a = Lesson(
date="2026-04-06", state=LessonState.RULE, confidence=0.95,
category="DRAFTING", description="Use active voice",
)
lesson_b = Lesson(
date="2026-04-06", state=LessonState.RULE, confidence=0.90,
category="TONE", description="Be concise",
)
rule_id_a = _make_rule_id(lesson_a)
rule_id_b = _make_rule_id(lesson_b)

graph = RuleGraph()
# Simulate 3 past conflicts between these rules
for _ in range(3):
graph.add_conflict(rule_id_a, rule_id_b)

rules = [lesson_a, lesson_b]
scope = RuleScope()

# Without graph — both rules appear
results_no_graph = apply_rules(rules, scope)
# With graph — conflicting lower-ranked rule should be dropped
results_with_graph = apply_rules(rules, scope, graph=graph)

assert len(results_no_graph) >= len(results_with_graph)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify the hash mismatch between rule_engine.py and _core.py
echo "=== rule_engine.py _make_rule_id ==="
rg -n "def _make_rule_id|hashlib|hash\(" src/gradata/rules/rule_engine.py

echo ""
echo "=== _core.py rule_id computation ==="
rg -n "rule_id\s*=" src/gradata/_core.py | head -20

Repository: Gradata/gradata

Length of output: 367


🏁 Script executed:

# Get full _make_rule_id implementation
sed -n '285,310p' src/gradata/rules/rule_engine.py

Repository: Gradata/gradata

Length of output: 855


🏁 Script executed:

# Get context around _core.py line 70 to understand the function
sed -n '60,85p' src/gradata/_core.py

Repository: Gradata/gradata

Length of output: 1199


🏁 Script executed:

# Check how apply_rules function is implemented and which rule ID method it uses
rg -n "def apply_rules" src/gradata/rules/rule_engine.py -A 30

Repository: Gradata/gradata

Length of output: 1626


🏁 Script executed:

# Check RuleGraph.add_conflict method to see what format it expects
rg -n "def add_conflict" -A 10

Repository: Gradata/gradata

Length of output: 988


🏁 Script executed:

# Get more of apply_rules to see how graph is used
sed -n '486,550p' src/gradata/rules/rule_engine.py

Repository: Gradata/gradata

Length of output: 2875


🏁 Script executed:

# Search for graph usage in apply_rules
rg -n "graph\." src/gradata/rules/rule_engine.py | grep -A2 -B2 "apply_rules"

Repository: Gradata/gradata

Length of output: 41


🏁 Script executed:

# Search for conflict filtering logic in rule_engine
rg -n "conflict|filter" src/gradata/rules/rule_engine.py

Repository: Gradata/gradata

Length of output: 1195


🏁 Script executed:

# Get the full conflict filtering section to confirm the hash function
sed -n '625,645p' src/gradata/rules/rule_engine.py

Repository: Gradata/gradata

Length of output: 965


🏁 Script executed:

# Check test assertion logic to see if it passes for other reasons
sed -n '46,77p' tests/test_rule_graph_integration.py

Repository: Gradata/gradata

Length of output: 1293


🏁 Script executed:

# Verify no utility function bridges the two hash methods
rg -n "def.*rule_id|hash" src/gradata/_core.py src/gradata/rules/rule_engine.py | head -30

Repository: Gradata/gradata

Length of output: 1063


Production rule ID mismatch undermines conflict filtering validation.

The test uses _make_rule_id() consistently throughout (both creating conflicts and calling apply_rules), so it will pass. However, this masks a production bug: brain_correct() in _core.py (line 70) computes rule IDs using built-in hash(), while apply_rules() in rule_engine.py (line 634) uses _make_rule_id() with hashlib.sha256.

Conflicts recorded during actual corrections (via brain_correct) won't be recognized during rule application because the ID formats differ. The test should seed conflicts using the same method that brain_correct() uses, or both functions should be unified to use the same hashing algorithm.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_rule_graph_integration.py` around lines 46 - 77, The test exposes
a mismatch: rule IDs created by brain_correct (which uses built-in hash())
differ from IDs used by apply_rules/_make_rule_id (which uses hashlib.sha256),
so conflicts stored by RuleGraph aren't recognized; fix by unifying ID
creation—update brain_correct to call _make_rule_id (or refactor both to a
single helper) so both brain_correct and apply_rules use identical hashing, and
update any tests that seed conflicts to use _make_rule_id (or the chosen unified
helper) to ensure conflict seeding matches production ID format.

Comment thread tests/test_sharing.py
Comment on lines +150 to +152
assert len(tone_lessons) == 1
assert tone_lessons[0].state.value == "INSTINCT"
assert tone_lessons[0].confidence == 0.40

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

Use pytest.approx for float comparison.

Per test guidelines, floating-point comparisons should use pytest.approx.

     assert len(tone_lessons) == 1
     assert tone_lessons[0].state.value == "INSTINCT"
-    assert tone_lessons[0].confidence == 0.40
+    assert tone_lessons[0].confidence == pytest.approx(0.40)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
assert len(tone_lessons) == 1
assert tone_lessons[0].state.value == "INSTINCT"
assert tone_lessons[0].confidence == 0.40
assert len(tone_lessons) == 1
assert tone_lessons[0].state.value == "INSTINCT"
assert tone_lessons[0].confidence == pytest.approx(0.40)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_sharing.py` around lines 150 - 152, Replace the direct float
equality assertion for tone_lessons[0].confidence with a pytest.approx
comparison: update the assertion that checks tone_lessons[0].confidence to use
pytest.approx(0.40). Ensure pytest is imported in the test module (add "import
pytest" if missing) and keep the existing checks for len(tone_lessons) and
tone_lessons[0].state.value unchanged.

Co-Authored-By: Gradata <noreply@gradata.ai>
@coderabbitai coderabbitai Bot added bug Something isn't working and removed feature labels Apr 7, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@Dockerfile`:
- Line 15: The current ENTRYPOINT runs a short-lived Python one-liner that
instantiates Brain and prints a message, so the container exits; replace it with
a command that launches a long-lived process. Update the Dockerfile ENTRYPOINT
to call a blocking/run method on the Brain class (e.g., Brain('/brain').run() or
.serve()) or invoke a module-level long-running entrypoint (e.g., python -m
gradata.brain) so PID 1 remains active; if Brain lacks a blocking method, add a
blocking start() or serve() method to the Brain class that starts the
service/worker loop and call that from ENTRYPOINT.
- Around line 1-15: Add a Docker HEALTHCHECK instruction to the Dockerfile so
orchestrators can detect unhealthy instances; modify the file that currently
sets ENTRYPOINT to run Brain('/brain') by appending a HEALTHCHECK that invokes a
lightweight check (e.g., a short python command or script that imports
gradata.brain and performs a minimal sanity call against Brain(BRAIN_DIR)) and
returns appropriate exit codes, and configure sensible options (interval,
timeout, start-period, retries). Target the existing ENTRYPOINT/ENV BRAIN_DIR
and ensure the health command uses the same BRAIN_DIR environment variable and
exits non-zero on failure so orchestrators mark the container unhealthy.
- Around line 1-15: The Dockerfile currently runs the container as root; create
a non-root runtime user and ensure /brain ownership: add a user (e.g.,
"gradata") and group via useradd/usermod during image build, chown the WORKDIR
(/app) and the default brain directory (/brain) to that user, and switch to that
user with the USER instruction before ENTRYPOINT so the ENTRYPOINT ("python",
"-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata
brain ready at /brain')") runs unprivileged; keep ENV BRAIN_DIR=/brain and
ensure any files copied into /app are accessible by the new user.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 252b155c-c8a2-44ca-bca1-e7343a9bfb0f

📥 Commits

Reviewing files that changed from the base of the PR and between 8849dfb and 61ecdfc.

📒 Files selected for processing (2)
  • .dockerignore
  • Dockerfile
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (1)
**/*

📄 CodeRabbit inference engine (CLAUDE.md)

Always read a file before editing it. Always prefer editing over creating new files.

Files:

  • Dockerfile
🪛 Checkov (3.2.513)
Dockerfile

[low] 1-15: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[low] 1-15: Ensure that a user for the container has been created

(CKV_DOCKER_3)

🪛 Hadolint (2.14.0)
Dockerfile

[info] 11-11: Multiple consecutive RUN instructions. Consider consolidation.

(DL3059)

🪛 Trivy (0.69.3)
Dockerfile

[error] 1-1: Image user should not be 'root'

Specify at least 1 USER command in Dockerfile with non-root user as argument

Rule: DS-0002

Learn more

(IaC/Dockerfile)


[info] 1-1: No HEALTHCHECK defined

Add HEALTHCHECK instruction in your Dockerfile

Rule: DS-0026

Learn more

(IaC/Dockerfile)

🔇 Additional comments (3)
.dockerignore (3)

1-6: LGTM: Standard development and Python artifact exclusions.

These entries appropriately exclude version control, temporary files, and Python caches from the Docker build context.


8-13: LGTM: Standard exclusions for documentation, tests, and build artifacts.

Excluding docs/ and tests/ is standard practice for production container images. The build artifacts (*.egg-info, dist/, build/) and JavaScript dependencies (node_modules/) are appropriately excluded.


7-7: Remove or clarify the brain/ exclusion in .dockerignore — the directory does not exist in the codebase.

The brain/ directory referenced in .dockerignore does not exist in the repository. In the codebase, "brain" refers to the gradata.brain module and the Brain class used for learning and rule management, not a filesystem directory. The exclusion is harmless but serves no purpose unless brain/ is expected to be created at runtime during the build process. Either remove it or add a comment clarifying what it excludes (e.g., # Runtime brain state directory, if created).

Comment thread Dockerfile
Comment on lines +1 to +15
FROM python:3.12-slim

WORKDIR /app

COPY pyproject.toml README.md ./
COPY src/ src/

RUN pip install --no-cache-dir -e .

# Create a default brain directory
RUN mkdir -p /brain

ENV BRAIN_DIR=/brain

ENTRYPOINT ["python", "-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata brain ready at /brain')"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Add a HEALTHCHECK for orchestration visibility.

If this image is intended to stay running, add HEALTHCHECK so orchestrators can detect unhealthy instances.

Example HEALTHCHECK
+HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
+  CMD python -c "from pathlib import Path; p=Path('/brain'); raise SystemExit(0 if p.exists() else 1)"
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
FROM python:3.12-slim
WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/
RUN pip install --no-cache-dir -e .
# Create a default brain directory
RUN mkdir -p /brain
ENV BRAIN_DIR=/brain
ENTRYPOINT ["python", "-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata brain ready at /brain')"]
FROM python:3.12-slim
WORKDIR /app
COPY pyproject.toml README.md ./
COPY src/ src/
RUN pip install --no-cache-dir -e .
# Create a default brain directory
RUN mkdir -p /brain
ENV BRAIN_DIR=/brain
HEALTHCHECK --interval=30s --timeout=5s --start-period=10s --retries=3 \
CMD python -c "from pathlib import Path; p=Path('/brain'); raise SystemExit(0 if p.exists() else 1)"
ENTRYPOINT ["python", "-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata brain ready at /brain')"]
🧰 Tools
🪛 Checkov (3.2.513)

[low] 1-15: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[low] 1-15: Ensure that a user for the container has been created

(CKV_DOCKER_3)

🪛 Hadolint (2.14.0)

[info] 11-11: Multiple consecutive RUN instructions. Consider consolidation.

(DL3059)

🪛 Trivy (0.69.3)

[error] 1-1: Image user should not be 'root'

Specify at least 1 USER command in Dockerfile with non-root user as argument

Rule: DS-0002

Learn more

(IaC/Dockerfile)


[info] 1-1: No HEALTHCHECK defined

Add HEALTHCHECK instruction in your Dockerfile

Rule: DS-0026

Learn more

(IaC/Dockerfile)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 1 - 15, Add a Docker HEALTHCHECK instruction to the
Dockerfile so orchestrators can detect unhealthy instances; modify the file that
currently sets ENTRYPOINT to run Brain('/brain') by appending a HEALTHCHECK that
invokes a lightweight check (e.g., a short python command or script that imports
gradata.brain and performs a minimal sanity call against Brain(BRAIN_DIR)) and
returns appropriate exit codes, and configure sensible options (interval,
timeout, start-period, retries). Target the existing ENTRYPOINT/ENV BRAIN_DIR
and ensure the health command uses the same BRAIN_DIR environment variable and
exits non-zero on failure so orchestrators mark the container unhealthy.

⚠️ Potential issue | 🟠 Major

Container runs as root; add a non-root runtime user.

Running as root increases blast radius for container escapes and file-system abuse. Add useradd + USER and ensure /brain ownership.

Harden runtime user
 FROM python:3.12-slim
 
 WORKDIR /app
@@
-RUN mkdir -p /brain
+RUN useradd --create-home --uid 10001 appuser \
+ && mkdir -p /brain \
+ && chown -R appuser:appuser /app /brain
 
 ENV BRAIN_DIR=/brain
+USER appuser
🧰 Tools
🪛 Checkov (3.2.513)

[low] 1-15: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[low] 1-15: Ensure that a user for the container has been created

(CKV_DOCKER_3)

🪛 Hadolint (2.14.0)

[info] 11-11: Multiple consecutive RUN instructions. Consider consolidation.

(DL3059)

🪛 Trivy (0.69.3)

[error] 1-1: Image user should not be 'root'

Specify at least 1 USER command in Dockerfile with non-root user as argument

Rule: DS-0002

Learn more

(IaC/Dockerfile)


[info] 1-1: No HEALTHCHECK defined

Add HEALTHCHECK instruction in your Dockerfile

Rule: DS-0026

Learn more

(IaC/Dockerfile)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` around lines 1 - 15, The Dockerfile currently runs the container
as root; create a non-root runtime user and ensure /brain ownership: add a user
(e.g., "gradata") and group via useradd/usermod during image build, chown the
WORKDIR (/app) and the default brain directory (/brain) to that user, and switch
to that user with the USER instruction before ENTRYPOINT so the ENTRYPOINT
("python", "-c", "from gradata.brain import Brain; b = Brain('/brain');
print(f'Gradata brain ready at /brain')") runs unprivileged; keep ENV
BRAIN_DIR=/brain and ensure any files copied into /app are accessible by the new
user.

Comment thread Dockerfile

ENV BRAIN_DIR=/brain

ENTRYPOINT ["python", "-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata brain ready at /brain')"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Entrypoint exits immediately, so this “deployment” container won’t stay up.

python -c "…; print(...)" finishes and the container stops. For a deployable image, run a long-lived process (service/worker) as PID 1.

Suggested direction
-ENTRYPOINT ["python", "-c", "from gradata.brain import Brain; b = Brain('/brain'); print(f'Gradata brain ready at /brain')"]
+# Keep init check separate (build/test), and run an actual long-lived runtime command.
+# Example:
+# CMD ["python", "-m", "gradata.mcp_server"]
🧰 Tools
🪛 Checkov (3.2.513)

[low] 1-15: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[low] 1-15: Ensure that a user for the container has been created

(CKV_DOCKER_3)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Dockerfile` at line 15, The current ENTRYPOINT runs a short-lived Python
one-liner that instantiates Brain and prints a message, so the container exits;
replace it with a command that launches a long-lived process. Update the
Dockerfile ENTRYPOINT to call a blocking/run method on the Brain class (e.g.,
Brain('/brain').run() or .serve()) or invoke a module-level long-running
entrypoint (e.g., python -m gradata.brain) so PID 1 remains active; if Brain
lacks a blocking method, add a blocking start() or serve() method to the Brain
class that starts the service/worker loop and call that from ENTRYPOINT.

Meta-rule discovery, grouping, and synthesis algorithms are now cloud-only.
Open-source SDK keeps data models, formatting, ranking, validation, and
storage API intact so cloud-generated meta-rules work seamlessly.

- discover_meta_rules() → returns []
- refresh_meta_rules() → validates existing, skips rediscovery
- merge_into_meta() → returns placeholder MetaRule
- Removed: theme clusters, grouping algos, principle synthesis
- Added TIER_UNIVERSAL constant, fixed Pyright Lesson import
- 6 discovery-dependent tests skipped with cloud marker
- 1438 tests pass, 17 skipped, 0 failures

Co-Authored-By: Gradata <noreply@gradata.ai>
@Gradata Gradata merged commit 041e63f into main Apr 7, 2026
5 of 6 checks passed
@Gradata Gradata deleted the feat/sim9-analytics branch April 7, 2026 09:26
@coderabbitai coderabbitai Bot added feature breaking-change and removed bug Something isn't working labels Apr 7, 2026
Comment thread src/gradata/_core.py
Comment on lines +71 to +72
rule_id = f"{rule.category}:{hash(rule.description) % 10000:04d}"
correction_id = f"{correction_category}:{hash(correction_desc) % 10000:04d}"

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Conflict IDs use unstable hash() — IDs never match _make_rule_id's SHA-256

_make_rule_id in rule_engine.py (line 298) uses hashlib.sha256 to build a stable, process-safe identifier:

desc_hash = int(hashlib.sha256(lesson.description.encode()).hexdigest(), 16)
return f"{lesson.category}:{desc_hash % 10000:04d}"

The two lines here use Python's built-in hash(), which (a) uses a completely different algorithm and (b) is randomised by PYTHONHASHSEED — IDs change on every interpreter restart.

Net effect: every conflict edge written by _attribute_domain_fires has IDs that will never match those looked up by conflict_count inside apply_rules. The conflict-filtering step ("Step 5.5") is silently a no-op, both within a single run and definitely across restarts once the graph is persisted to rule_graph.json.

Fix — mirror _make_rule_id's formula using a small shared helper (avoids importing rule_engine here):

Suggested change
rule_id = f"{rule.category}:{hash(rule.description) % 10000:04d}"
correction_id = f"{correction_category}:{hash(correction_desc) % 10000:04d}"
import hashlib as _hl
def _sid(cat: str, desc: str) -> str:
h = int(_hl.sha256(desc.encode()).hexdigest(), 16)
return f"{cat}:{h % 10000:04d}"
rule_id = _sid(rule.category, rule.description)
correction_id = _sid(correction_category, correction_desc)
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gradata/_core.py
Line: 71-72

Comment:
**Conflict IDs use unstable `hash()` — IDs never match `_make_rule_id`'s SHA-256**

`_make_rule_id` in `rule_engine.py` (line 298) uses `hashlib.sha256` to build a stable, process-safe identifier:
```python
desc_hash = int(hashlib.sha256(lesson.description.encode()).hexdigest(), 16)
return f"{lesson.category}:{desc_hash % 10000:04d}"
```
The two lines here use Python's built-in `hash()`, which (a) uses a completely different algorithm and (b) is randomised by `PYTHONHASHSEED` — IDs change on every interpreter restart.

**Net effect:** every conflict edge written by `_attribute_domain_fires` has IDs that will *never* match those looked up by `conflict_count` inside `apply_rules`. The conflict-filtering step ("Step 5.5") is silently a no-op, both within a single run and definitely across restarts once the graph is persisted to `rule_graph.json`.

Fix — mirror `_make_rule_id`'s formula using a small shared helper (avoids importing `rule_engine` here):
```suggestion
                import hashlib as _hl
                def _sid(cat: str, desc: str) -> str:
                    h = int(_hl.sha256(desc.encode()).hexdigest(), 16)
                    return f"{cat}:{h % 10000:04d}"
                rule_id = _sid(rule.category, rule.description)
                correction_id = _sid(correction_category, correction_desc)
```

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

return [f"Invalid JSON: {e}"]

required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths"]
required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths", "proof"]

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Adding "proof" to mandatory keys breaks existing manifests

validate_manifest() now returns "Missing required key: proof" for any brain.manifest.json written before this PR. Since the key is only populated by the new brain.manifest() call path, any brain that hasn't been re-manifested will silently fail validation — including marketplace-shared brains and CI artefacts.

Consider keeping "proof" as a soft check (warn instead of error) or splitting it off from required_keys:

Suggested change
required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths", "proof"]
required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths"]
optional_keys = ["proof"]
for k in required_keys:
Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gradata/_brain_manifest.py
Line: 194

Comment:
**Adding `"proof"` to mandatory keys breaks existing manifests**

`validate_manifest()` now returns `"Missing required key: proof"` for any `brain.manifest.json` written before this PR. Since the key is only populated by the new `brain.manifest()` call path, any brain that hasn't been re-manifested will silently fail validation — including marketplace-shared brains and CI artefacts.

Consider keeping `"proof"` as a soft check (warn instead of error) or splitting it off from `required_keys`:
```suggestion
    required_keys = ["schema_version", "metadata", "quality", "database", "rag", "paths"]
    optional_keys = ["proof"]
    for k in required_keys:
```

How can I resolve this? If you propose a fix, please make it concise.

Fix in Claude Code

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant