feat: SDK P0 hardening + security — 14 features, 1600+ tests#17
Conversation
Standalone inspection functions take primitive paths (no Brain dependency). Brain gets thin wrappers: rules(), explain(), export_data(). Includes minimal YAML serializer (no PyYAML). 20 tests, 0 regressions. Co-Authored-By: Gradata <noreply@gradata.ai>
- Use `with sqlite3.connect()` context manager in explain_rule (fixes connection leak) - Rename `format` param to `output_format` to avoid shadowing Python builtin - Escape embedded double quotes in _yaml_val before wrapping - Add docstring note for reserved events_path parameter - Remove `if rules:` guard in test — fixture guarantees rules exist - Rename `l` loop variables to `lesson` for readability Co-Authored-By: Gradata <noreply@gradata.ai>
…graduation ceiling Add CorrectionScope enum (UNIVERSAL/DOMAIN/PROJECT/ONE_OFF) and wire it through brain.correct() -> brain_correct() -> lesson scope_json. ONE_OFF scoped lessons are blocked from graduating past INSTINCT. Scope is persisted in lessons.md via format_lessons/parse_lessons roundtrip. 8 new tests, 1466 total passing. Co-Authored-By: Gradata <noreply@gradata.ai>
…storage Extraction pipeline runs on full text for accurate behavioral instructions, then PII is redacted before writing to events.jsonl, pending_approvals, and FTS index. Covers API keys (OpenAI, GitHub, Slack, AWS, Google, GitLab), credit cards, SSNs, phones, and emails. 24 tests (unit + integration). Co-Authored-By: Gradata <noreply@gradata.ai>
…/reject, graduated_rules Three new Brain methods for reviewing graduated rules at session end: - pending_promotions() lists PATTERN/RULE rules for batch review - approve_promotion(rule_id) endorses with event trail (no-op on data) - reject_promotion(rule_id) demotes to INSTINCT with confidence 0.40 Also adds graduated_rules detail list to end_session() result dict. Co-Authored-By: Gradata <noreply@gradata.ai>
Track which corrections led to which rules via rule_provenance table. brain.trace(rule_id) queries provenance, falls back to events.jsonl scan. Co-Authored-By: Gradata <noreply@gradata.ai>
Document the 3-fire (INSTINCT->PATTERN) and 5-fire (PATTERN->RULE) requirements in the docstring. Add debug warning when a single-session confidence jump exceeds PATTERN_THRESHOLD. No behavior change. Co-Authored-By: Gradata <noreply@gradata.ai>
… injection Raw confidence floats (e.g. [RULE:0.95]) in prompt-injected rules leak internal state that could enable prompt injection attacks. Replace with tier labels ([RULE], [PATTERN], [INSTINCT]) in all prompt-facing output. Internal APIs (brain.prove(), brain.rules()) keep raw floats. Co-Authored-By: Gradata <noreply@gradata.ai>
Each brain gets a unique 32-byte salt (.brain_salt) that jitters PATTERN_THRESHOLD (0.60) and RULE_THRESHOLD (0.90) by +/-5% via HMAC-SHA256, preventing attackers from predicting exact promotion boundaries for any specific brain. Co-Authored-By: Gradata <noreply@gradata.ai>
Pads function execution to a minimum duration with random jitter, preventing timing side-channels that could leak internal state. Co-Authored-By: Gradata <noreply@gradata.ai>
Replaces confidence-ranked sort with tier-bucketed shuffle to prevent adversaries from inferring confidence rankings via injection order. Rules grouped by tier (RULE > PATTERN > INSTINCT), shuffled within each tier using secrets (production) or seeded Random (testing). Co-Authored-By: Gradata <noreply@gradata.ai>
…tection Tracks per-endpoint call rates via QueryBudget class. Wired into Brain.apply_brain_rules() to log warnings when rate is exceeded. Co-Authored-By: Gradata <noreply@gradata.ai>
sign_manifest() produces a signed copy without mutating the original. verify_manifest() uses hmac.compare_digest to prevent timing attacks. Co-Authored-By: Gradata <noreply@gradata.ai>
Adds RuleMetadata (what/why/who/when/where/how + utility/safety scores) to _types.py and attaches it to Lesson via default_factory. Scores are clamped to [0, 1]. Each Lesson gets an independent metadata instance. Co-Authored-By: Gradata <noreply@gradata.ai>
Creates signed provenance records for every correction (user_id, correction_hash, session, timestamp + HMAC). Wired into brain_correct() with fail-safe try/except so corrections are never blocked. Co-Authored-By: Gradata <noreply@gradata.ai>
… TOCTOU, timestamp Co-Authored-By: Gradata <noreply@gradata.ai>
Co-Authored-By: Gradata <noreply@gradata.ai>
📝 WalkthroughWalkthroughAdds provenance/audit tracing, PII redaction, per-brain salts and salted thresholds, score obfuscation, query-budgeting, manifest/correction signing, rule inspection/approval APIs, correction-scope tagging, schema additions, new types/metadata, rule-id/formatting changes, and extensive tests. Changes
Sequence Diagram(s)sequenceDiagram
participant Client
participant Brain
participant Core as CoreModule
participant Safety as SafetyModule
participant Security as SecurityModule
participant DB as Database
Client->>Brain: correct(draft, final, scope)
Brain->>Core: brain_correct(draft, final, scope)
Core->>Safety: redact_pii(draft & final)
Safety-->>Core: redacted_text, report
Core->>Security: compute SHA256(correction), create_provenance_record(..., salt)
Security-->>Core: provenance_dict (HMAC)
Core->>DB: insert correction event (redacted, provenance, correction_scope)
Core->>DB: write lesson / update FTS (redacted truncated text)
Core-->>Brain: correction_result (scope, provenance)
Brain-->>Client: result
sequenceDiagram
participant Client
participant Brain
participant Audit as AuditModule
participant FS as FileSystem
participant DB as Database
Client->>Brain: trace(rule_id)
Brain->>Audit: trace_rule(db_path, events_path, lessons_path, rule_id)
Audit->>FS: load lessons.md, compute _make_rule_id
Audit->>DB: query rule_provenance(rule_id)
DB-->>Audit: provenance_rows
Audit->>FS: scan events.jsonl for correction IDs
FS-->>Audit: correction_events
Audit->>DB: query lesson_transitions(description[:100], category)
DB-->>Audit: transitions
Audit-->>Brain: consolidated trace_result
Brain-->>Client: result
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 30
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/gradata/enhancements/self_improvement.py (1)
656-691:⚠️ Potential issue | 🟠 MajorThe salt only affects
graduate(), not the inline transitions.
update_confidence()in this same module still promotes/demotes with fixedPATTERN_THRESHOLD/RULE_THRESHOLDchecks on Lines 631-645. Any caller that relies on those inline state changes can bypass the per-brain jitter added here.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/enhancements/self_improvement.py` around lines 656 - 691, The issue: salt jitter is applied only inside graduate() via eff_pattern_threshold/eff_rule_threshold but update_confidence() still uses the global PATTERN_THRESHOLD/RULE_THRESHOLD, letting callers bypass per-brain jitter; fix by making update_confidence use the same salted thresholds or accept them as parameters. Concretely, modify update_confidence (or its callers) so it takes eff_pattern_threshold and eff_rule_threshold (or a salt param) and replace checks against PATTERN_THRESHOLD and RULE_THRESHOLD inside update_confidence with the provided eff_* values (or compute salt_threshold(...) inside update_confidence using the provided salt) so all inline promotions/demotions use the per-brain jitter consistently with graduate().
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gradata/_core.py`:
- Around line 81-82: Normalize and validate the incoming scope parameter in
graduate() before storing it in scope_json: ensure scope is coerced to a trimmed
str (or None), lowercased, and validated against the allowed set of correction
scopes (reject/raise ValueError or map to a default for unknown values) so that
checks like scope_json["correction_scope"] == "one_off" can't be bypassed by
casing/typos or non-string inputs; apply the same normalization/validation for
other places that build scope_json (the other occurrences around the places
noted) so persistence only ever stores a normalized string or null.
- Around line 406-416: The current correction_hash uses ambiguous string
concatenation (sha256(f"{draft}|{final}")) which can collide; instead serialize
the payload structurally (e.g., JSON or another deterministic serializer) of
[draft, final] before hashing to preserve tuple boundaries. Update the logic
that computes correction_hash (the variable named correction_hash in the block
that calls create_provenance_record) to import and use a deterministic
serializer (e.g., json.dumps with consistent separators/encoding) on the
draft/final pair, then hash the resulting bytes with _hashlib.sha256 and call
create_provenance_record as before; leave user_id, session and salt retrieval
(context.get(...), session or 0, getattr(brain, "_brain_salt", "")) unchanged.
In `@src/gradata/_types.py`:
- Around line 110-132: Lesson.metadata (RuleMetadata) is not
serialized/deserialized so custom metadata is lost on save/load; update the
Lesson (where metadata is stored), the serialization/deserialization in
src/gradata/enhancements/self_improvement.py, and the inspector
_lesson_to_dict() in src/gradata/inspection.py to include metadata: ensure
RuleMetadata (class RuleMetadata) is converted to a dict via its to_dict() when
saving/exporting, that loading code reconstructs or assigns a RuleMetadata
instance (or dict) back to Lesson.metadata when reading, and that
_lesson_to_dict() exposes the metadata field so it appears in
inspection/exports.
In `@src/gradata/audit.py`:
- Around line 170-215: The trace lookup currently queries lesson_transitions by
mutable target.description which breaks when descriptions change; update
trace_rule()/this code to query lesson_transitions by an immutable rule_id (use
_make_rule_id(target) or the existing rule_id variable) instead of
target.description, i.e. replace the WHERE clause to use WHERE rule_id = ? (and
bind the rule id) and ensure the code that writes lesson_transitions (e.g.
_core.brain_end_session or wherever transitions are inserted) stores the same
rule_id; alternatively, if you cannot add rule_id to the table immediately,
apply the same 100-character truncation used on write when querying (truncate
_make_rule_id(target) or target.description the same way) so reads match stored
values.
In `@src/gradata/brain.py`:
- Around line 812-826: The read-modify-write in reject_promotion() is not
protected by the store lock and can race with other writers; wrap the entire
sequence that computes lessons_path, calls _load_lessons_from_path(), mutates
target (setting state/confidence) and calls
write_lessons_safe(format_lessons(...)) in the same lock used by
_resolve_pending() (use the same lock acquisition/release pattern and lock
object used there) so the full cycle is atomic; ensure you still return the same
error when target is missing and release the lock on all code paths.
- Around line 429-431: The rate-limiter currently only logs when
self._query_budget.is_rate_exceeded("apply_rules") is true but continues to call
the same logic (e.g., loading lessons and returning the full prompt). Change the
over-budget branch in apply_rules so it enforces the limit: after
self._query_budget.record("apply_rules") and the is_rate_exceeded check, either
short-circuit (raise a specific BudgetExceededError or return an explicit error
response) or degrade the response by skipping lesson loading (avoid calling
load_lessons / whatever method populates the prompt block) and returning a
reduced prompt/placeholder; ensure the code paths in apply_rules and any callers
handle the BudgetExceededError or reduced response consistently.
- Around line 710-840: The new inspection/promotion methods (rules, explain,
trace, export_data, pending_promotions, approve_promotion, reject_promotion)
should be extracted out of the Brain class into a separate mixin or companion
module (e.g., BrainInspectionMixin or brain_inspection.py) to keep Brain under
500 lines; move the implementations there, keep their signatures identical, and
have them access the same attributes (self.db_path, self.dir, self.ctx,
self.bus, logger) so callers can use Brain + mixin via multiple inheritance or
composition; update any imports that referenced gradata.inspection helpers
inside these methods to remain local to the new module, and update tests and any
instantiation sites to use the new mixin/module while preserving behavior and
emitted events.
- Around line 746-793: pending_promotions currently returns all on-disk rules
and approve_promotion only emits an in-memory event, so approvals aren't durable
and the rule keeps appearing; modify approve_promotion (which uses _make_rule_id
and _load_lessons_from_path) to persist the review decision: load lessons,
locate the target rule, mark it as reviewed/approved (e.g. set a reviewed flag
or update state/confidence on the lesson object) and write the lessons back to
the lessons_path so disk reflects the approval, then emit the audit event with
self.emit(...) (not just self.bus.emit) so it's recorded, and ensure
pending_promotions filters out reviewed/approved lessons (or read from
end_session()["graduated_rules"] if you prefer that flow) so approved promotions
no longer appear.
In `@src/gradata/enhancements/self_improvement.py`:
- Around line 722-733: The current early continue in the lesson.scope_json block
causes one_off lessons to skip the rest of the lifecycle logic (including
zero-confidence kills, idle→UNTESTABLE transitions, and PATTERN demotions);
instead, detect correction_scope == "one_off" and set a local flag (e.g.,
block_promotion) or adjust logic so only promotion paths are suppressed for
LessonState.INSTINCT and LessonState.PATTERN, while allowing the subsequent
decay/kill/idle-to-UNTESTABLE and demotion code to run normally; update the
handling around lesson.scope_json / LessonState checks to remove the continue
and condition promotion code to consult the new flag.
- Around line 709-717: Replace direct attribute access of
lesson._pre_session_confidence with a safe getattr() call and validate its type
before using it: fetch pre_conf = getattr(lesson, "_pre_session_confidence",
None), ensure pre_conf is an int/float (or cast safely) and only compute jump =
lesson.confidence - pre_conf when pre_conf is numeric; keep the existing
eff_pattern_threshold and _log.warning call unchanged but use pre_conf-guarded
logic so Pyright no longer flags attr-defined issues in the Lesson handling.
In `@src/gradata/inspection.py`:
- Around line 33-47: The current _lesson_to_dict relies on _make_rule_id which
yields a tiny CATEGORY:NNNN space causing collisions; update _lesson_to_dict to
use a true stable identifier instead of _make_rule_id by either (A)
adding/persisting a dedicated stable id field on the Lesson dataclass (e.g.,
lesson.rule_id) and returning that, or (B) replacing the short _make_rule_id
output with a longer deterministic digest (e.g., sha256 hex of
category+description+date) computed in the rule creation path (not each
serialization) so IDs are stable and globally unique; modify _lesson_to_dict to
return the new field (rule_id or the longer digest) and remove reliance on the
short _make_rule_id to prevent future collisions.
In `@src/gradata/rules/rule_engine.py`:
- Around line 649-652: The prompt collapses PATTERN and RULE because
truncate_score(lesson.confidence) returns "H" for >=0.6; preserve maturity by
computing tier_label but then differentiating RULE lessons before building
instruction: after calling truncate_score(), if lesson.type (or
lesson.source/category) indicates a RULE, append or replace the label with a
distinct marker (e.g., "H(RULE)" or "R") so RULEs remain visually distinct from
PATTERNs; update the same logic used where instruction is built (the code that
sets tier_label and constructs instruction near truncate_score(...) and the
similar block around the lines referenced 691-692) to apply this override
consistently.
In `@src/gradata/safety.py`:
- Around line 18-21: The current token matchers for "github_token" and
"slack_token" miss newer token prefixes; update the regex for the "github_token"
entry to also match fine-grained PATs starting with "github_pat_" in addition to
"ghp_", and expand the "slack_token" regex to match tokens starting with
"xoxb-", "xoxp-", "xapp-" and "xwfp-"; locate the tuples named "github_token"
and "slack_token" in src/gradata/safety.py and replace their re.compile patterns
accordingly so all the listed prefixes are covered.
In `@src/gradata/security/__init__.py`:
- Around line 1-17: The module is missing the required future-annotations
import; add the line "from __future__ import annotations" as the very first
statement in this package initializer (before the docstring and other imports)
so it matches sibling modules like query_budget.py and score_obfuscation.py and
ensures forward reference handling for the exported symbols
(obfuscate_instruction, truncate_score, generate_brain_salt,
load_or_create_salt, salt_threshold, QueryBudget, sign_manifest,
verify_manifest, create_provenance_record, verify_provenance).
In `@src/gradata/security/brain_salt.py`:
- Around line 29-38: Ensure the directory exists and make creation atomic to
avoid FileNotFoundError and winners/losers returning partial salts: create the
parent directory for brain_dir (Path(...).mkdir(parents=True, exist_ok=True)),
generate the salt via generate_brain_salt(), write it to a temporary file next
to salt_path (e.g., salt_path.with_suffix(".tmp")) with explicit
write+flush+os.fsync, then atomically rename/replace the temp file to salt_path
(os.replace) so concurrent processes see either no file or the full file; on
FileExistsError/when reading, validate the read content length (expected 64) and
retry a few times with short backoff before failing.
In `@src/gradata/security/correction_provenance.py`:
- Around line 19-44: The create_provenance_record function currently signs
without validating inputs; add input validation at the top of
create_provenance_record to reject empty or whitespace-only user_id and
correction_hash, require salt to be a non-empty string (and not only whitespace)
and require session to be an int >= 0 (or >0 if you prefer); if any check fails
raise a ValueError with a clear message so you never call hmac.new with an
invalid salt or nonsensical fields. Ensure the checks reference the parameters
user_id, correction_hash, session, and salt and run before computing
timestamp/message/signature so invalid data is rejected at the boundary.
In `@src/gradata/security/manifest_signing.py`:
- Around line 15-25: The sign_manifest API should validate inputs at the
boundary: in sign_manifest (and parallel functions like any corresponding
verify_manifest), check that salt is a str and non-empty (e.g., if not
isinstance(salt, str) or not salt.strip(): raise ValueError("salt must be a
non-empty string")), and validate manifest is a dict (raise TypeError/ValueError
if not) before calling _canonical_payload; keep the current behavior of creating
a new dict (signed = dict(manifest)) and then computing signature and signed_at
after validation.
- Around line 34-40: The code currently assumes manifest["signature"] is a
string and will raise on non-string types; update the signature-checking logic
in manifest_signing.py so that after retrieving stored_sig it returns False for
missing or malformed types (e.g., if not isinstance(stored_sig, str) or not
stored_sig), then proceed to compute payload via _canonical_payload(manifest)
and compare using hmac.compare_digest; ensure stored_sig and expected are the
same type (both str) when comparing. This avoids TypeError on ints/dicts and
satisfies the "fail closed" behavior.
In `@src/gradata/security/query_budget.py`:
- Around line 109-115: The _prune method leaves empty deque buckets in
self._calls causing unbounded key growth; after the existing while loop that
popleft()s old timestamps in _prune(endpoint: str), check if calls is empty and
if so remove the mapping (e.g. self._calls.pop(endpoint, None)) to clean up the
empty bucket; reference the _prune method and the self._calls[endpoint] deque
when making this change.
- Around line 24-27: In the __init__ of the QueryBudget limiter (the constructor
shown with parameters window_seconds and max_calls and attribute _calls), add
argument validation to reject non-positive window_seconds and negative max_calls
by raising ValueError with clear messages; keep the rest of the initialization
(setting self.window_seconds, self.max_calls, and self._calls) after the checks
so invalid configurations are prevented at the system boundary.
In `@src/gradata/security/score_obfuscation.py`:
- Around line 21-39: The function truncate_score should enforce that the
incoming confidence is within [0.0, 1.0] before mapping; update truncate_score
to validate the confidence parameter (the confidence argument) and either clamp
it to the range or raise a ValueError when it is out of bounds per project
guidelines — prefer raising an explicit ValueError with a clear message like
"confidence must be in [0.0,1.0]" so callers can't pass invalid scores silently,
then proceed with the existing tier mapping logic (RULE/PATTERN/INSTINCT).
- Around line 57-79: constant_time_pad must validate its timing parameters; add
input checks at the top of constant_time_pad to ensure min_ms and jitter_ms are
numeric and non-negative (e.g. if min_ms < 0 or jitter_ms < 0: raise
ValueError("min_ms and jitter_ms must be non-negative")). Keep the existing
jitter computation and small-threshold branch (jitter_ms >= 0.001) but reject
negative values before using secrets.randbelow and time.sleep to preserve the
padding contract.
In `@tests/test_audit_provenance.py`:
- Around line 117-129: The test test_table_exists_after_migration is ineffective
because the test fixture already creates rule_provenance; modify the test so the
table is not present before calling run_migrations: either delete/drop the table
(connect to brain_dir / "system.db" and execute "DROP TABLE IF EXISTS
rule_provenance" then close) or recreate/start from a fresh DB without the
rule_provenance table prior to invoking run_migrations(), then assert the table
exists afterward; update the test code around the run_migrations call
accordingly.
- Around line 25-41: The SAMPLE_LESSONS fixture uses the wrong key for event IDs
so parse_lessons() never populates Lesson.correction_event_ids; update the
SAMPLE_LESSONS string (the SAMPLE_LESSONS constant) to use the parser's actual
key "Corrections:" instead of "Correction event IDs:" for the entries (lines
that currently read "Correction event IDs: evt_abc123, evt_def456" and
"Correction event IDs: evt_ghi789") so trace_rule() will exercise the
lesson→event lookup path and Lesson.correction_event_ids will be populated.
In `@tests/test_batch_approval.py`:
- Around line 242-248: The test currently only asserts that some rules remain
after brain.end_session(); instead capture the exact pending promotion IDs
before calling end_session() by calling brain.pending_promotions() and
extracting their unique IDs, then call brain.end_session(), retrieve
post-session rules via brain.rules(include_all=True) and extract their IDs, and
assert that the set (or list) of pre-session pending IDs is present in the
post-session rule IDs (e.g., equality or subset as appropriate) so the same
pending promotions survive; update the test_unreviewed_rules_survive_end_session
to perform this ID comparison using the existing brain.pending_promotions(),
brain.end_session(), and brain.rules(include_all=True) helpers.
In `@tests/test_brain_salt.py`:
- Around line 28-45: Add tests that call load_or_create_salt with a non-existent
parent directory (e.g., pass tmp_path / "missing_dir") to ensure the function
creates the directory and writes a 64-char .brain_salt file, and add a
race/partial-read test that creates the directory and writes a truncated
.brain_salt (shorter than 64 chars) before calling load_or_create_salt to verify
the function detects the partial content and returns/writes a full 64-char salt
(covering the FileExistsError/partial-read path in load_or_create_salt).
In `@tests/test_manifest_signing.py`:
- Around line 56-57: Add a new unit test that asserts verify_manifest(...)
returns False when the manifest's "signature" field exists but is of a
non-string type (e.g., int or dict) to ensure fail-closed behavior; update the
tests in tests/test_manifest_signing.py by adding a test function (e.g.,
test_malformed_signature_type_returns_false) that constructs SAMPLE_MANIFEST
with signature set to a non-string value and calls
verify_manifest(SAMPLE_MANIFEST, SALT) asserting it is False, ensuring the
verification logic in verify_manifest handles non-string signature types as
invalid.
In `@tests/test_pii_redaction.py`:
- Around line 157-165: The test test_behavioral_extraction_uses_full_text is too
weak; instead of truthy checks, assert a concrete extraction/summarization
result that depends on the original unredacted draft (variables draft/final)
and/or compare against a control where the draft is pre-redacted. Specifically,
call brain.correct(draft, final) and assert that result["classifications"] (or
result.get("data", {})["severity"]) contains the expected classification token
(e.g., an email/PII label) or that the summary contains "alice@notify.com", and
then assert that running brain.correct on a redacted-draft control does not
produce that same classification or summary; use the function names
test_behavioral_extraction_uses_full_text and brain.correct to locate and update
the test.
In `@tests/test_query_budget.py`:
- Around line 40-80: Tests use real time.sleep causing flakiness; change them to
control time deterministically by mocking time.monotonic (or injecting a clock)
so QueryBudget(window_seconds=...) and its methods record, detect_anomalies, and
count use the mocked clock. Replace sleeps in test_burst_detected,
test_no_false_positive_on_normal_usage, test_below_minimum_calls_no_burst, and
TestWindowExpiry.test_window_expiry with a monkeypatched time.monotonic() that
returns a start value and is advanced programmatically between qb.record(...)
calls to simulate the intended intervals and to trigger window expiry
deterministically.
In `@tests/test_safety_assertion.py`:
- Around line 116-128: The current test test_docstring_mentions_min_applications
uses a loose check ("3" in doc or "MIN_APPLICATIONS" in doc) that can pass
accidentally; update this test to assert the explicit contract in
graduate.__doc__ by checking for the exact transition sentence(s) or both the
numeric and constant forms (e.g., the literal "3 applications" or the constant
name MIN_APPLICATIONS and its documented value) so the test fails if the
required fire-count changes, and likewise ensure tests
test_docstring_mentions_pattern_and_rule_thresholds assert the exact PATTERN and
RULE threshold phrases rather than just presence of the identifiers.
---
Outside diff comments:
In `@src/gradata/enhancements/self_improvement.py`:
- Around line 656-691: The issue: salt jitter is applied only inside graduate()
via eff_pattern_threshold/eff_rule_threshold but update_confidence() still uses
the global PATTERN_THRESHOLD/RULE_THRESHOLD, letting callers bypass per-brain
jitter; fix by making update_confidence use the same salted thresholds or accept
them as parameters. Concretely, modify update_confidence (or its callers) so it
takes eff_pattern_threshold and eff_rule_threshold (or a salt param) and replace
checks against PATTERN_THRESHOLD and RULE_THRESHOLD inside update_confidence
with the provided eff_* values (or compute salt_threshold(...) inside
update_confidence using the provided salt) so all inline promotions/demotions
use the per-brain jitter consistently with graduate().
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: b6c6c455-41bc-4050-800e-d7b4058ed80f
📒 Files selected for processing (30)
.gitignoresrc/gradata/_core.pysrc/gradata/_migrations.pysrc/gradata/_types.pysrc/gradata/audit.pysrc/gradata/brain.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/inspection.pysrc/gradata/rules/rule_engine.pysrc/gradata/safety.pysrc/gradata/security/__init__.pysrc/gradata/security/brain_salt.pysrc/gradata/security/correction_provenance.pysrc/gradata/security/manifest_signing.pysrc/gradata/security/query_budget.pysrc/gradata/security/score_obfuscation.pytests/test_audit_provenance.pytests/test_batch_approval.pytests/test_brain_salt.pytests/test_correction_provenance.pytests/test_injection_order.pytests/test_inspection.pytests/test_manifest_signing.pytests/test_pii_redaction.pytests/test_query_budget.pytests/test_rule_engine_v2.pytests/test_rule_metadata.pytests/test_safety_assertion.pytests/test_scope_tagging.pytests/test_score_obfuscation.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (5)
**/*
📄 CodeRabbit inference engine (CLAUDE.md)
Always read a file before editing it. Always prefer editing over creating new files.
Files:
tests/test_rule_engine_v2.pysrc/gradata/_migrations.pytests/test_manifest_signing.pytests/test_query_budget.pytests/test_correction_provenance.pysrc/gradata/rules/rule_engine.pytests/test_rule_metadata.pysrc/gradata/security/__init__.pysrc/gradata/security/manifest_signing.pytests/test_score_obfuscation.pysrc/gradata/safety.pysrc/gradata/security/correction_provenance.pytests/test_scope_tagging.pytests/test_safety_assertion.pysrc/gradata/security/score_obfuscation.pysrc/gradata/security/query_budget.pytests/test_pii_redaction.pytests/test_batch_approval.pytests/test_brain_salt.pytests/test_injection_order.pytests/test_inspection.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/_types.pysrc/gradata/inspection.pysrc/gradata/audit.pysrc/gradata/_core.pytests/test_audit_provenance.pysrc/gradata/brain.pysrc/gradata/security/brain_salt.py
**/*.{py,sh,js,ts,json,env*}
📄 CodeRabbit inference engine (CLAUDE.md)
Never hardcode secrets. Never commit .env files. Pre-commit hook blocks both.
Files:
tests/test_rule_engine_v2.pysrc/gradata/_migrations.pytests/test_manifest_signing.pytests/test_query_budget.pytests/test_correction_provenance.pysrc/gradata/rules/rule_engine.pytests/test_rule_metadata.pysrc/gradata/security/__init__.pysrc/gradata/security/manifest_signing.pytests/test_score_obfuscation.pysrc/gradata/safety.pysrc/gradata/security/correction_provenance.pytests/test_scope_tagging.pytests/test_safety_assertion.pysrc/gradata/security/score_obfuscation.pysrc/gradata/security/query_budget.pytests/test_pii_redaction.pytests/test_batch_approval.pytests/test_brain_salt.pytests/test_injection_order.pytests/test_inspection.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/_types.pysrc/gradata/inspection.pysrc/gradata/audit.pysrc/gradata/_core.pytests/test_audit_provenance.pysrc/gradata/brain.pysrc/gradata/security/brain_salt.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Keep files under 500 lines. Validate input at system boundaries.
Files:
tests/test_rule_engine_v2.pysrc/gradata/_migrations.pytests/test_manifest_signing.pytests/test_query_budget.pytests/test_correction_provenance.pysrc/gradata/rules/rule_engine.pytests/test_rule_metadata.pysrc/gradata/security/__init__.pysrc/gradata/security/manifest_signing.pytests/test_score_obfuscation.pysrc/gradata/safety.pysrc/gradata/security/correction_provenance.pytests/test_scope_tagging.pytests/test_safety_assertion.pysrc/gradata/security/score_obfuscation.pysrc/gradata/security/query_budget.pytests/test_pii_redaction.pytests/test_batch_approval.pytests/test_brain_salt.pytests/test_injection_order.pytests/test_inspection.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/_types.pysrc/gradata/inspection.pysrc/gradata/audit.pysrc/gradata/_core.pytests/test_audit_provenance.pysrc/gradata/brain.pysrc/gradata/security/brain_salt.py
tests/**
⚙️ CodeRabbit configuration file
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.
Files:
tests/test_rule_engine_v2.pytests/test_manifest_signing.pytests/test_query_budget.pytests/test_correction_provenance.pytests/test_rule_metadata.pytests/test_score_obfuscation.pytests/test_scope_tagging.pytests/test_safety_assertion.pytests/test_pii_redaction.pytests/test_batch_approval.pytests/test_brain_salt.pytests/test_injection_order.pytests/test_inspection.pytests/test_audit_provenance.py
src/gradata/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use AGPL-3.0 licensing. Source code resides in src/gradata/.
Files:
src/gradata/_migrations.pysrc/gradata/rules/rule_engine.pysrc/gradata/security/__init__.pysrc/gradata/security/manifest_signing.pysrc/gradata/safety.pysrc/gradata/security/correction_provenance.pysrc/gradata/security/score_obfuscation.pysrc/gradata/security/query_budget.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/_types.pysrc/gradata/inspection.pysrc/gradata/audit.pysrc/gradata/_core.pysrc/gradata/brain.pysrc/gradata/security/brain_salt.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
src/gradata/_migrations.pysrc/gradata/rules/rule_engine.pysrc/gradata/security/__init__.pysrc/gradata/security/manifest_signing.pysrc/gradata/safety.pysrc/gradata/security/correction_provenance.pysrc/gradata/security/score_obfuscation.pysrc/gradata/security/query_budget.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/_types.pysrc/gradata/inspection.pysrc/gradata/audit.pysrc/gradata/_core.pysrc/gradata/brain.pysrc/gradata/security/brain_salt.py
🪛 GitHub Actions: CI
src/gradata/enhancements/self_improvement.py
[error] 712-712: pyright: Cannot access attribute "_pre_session_confidence" for class "Lesson" (Attribute "_pre_session_confidence" is unknown) (reportAttributeAccessIssue)
src/gradata/brain.py
[warning] 871-871: pyright: Import "gradata.enhancements.memory_extraction" could not be resolved (reportMissingImports)
🔇 Additional comments (6)
.gitignore (1)
144-144: Good addition to keep local stack artifacts out of version control.
Line 144correctly ignores.gstack/, which helps prevent accidental commits of local runtime/config data.tests/test_rule_engine_v2.py (1)
234-234: Good assertion update for obfuscated tier output.Line 234 correctly validates the new prompt format expectation (
[RULE]tokenized tier label).src/gradata/_migrations.py (1)
44-51: Schema/index additions look solid for provenance lookups.
rule_provenancetable creation plusidx_provenance_rule_idis a good minimal baseline for write/read provenance flow.Also applies to: 66-67
tests/test_rule_metadata.py (1)
16-17: Usepytest.approxfor floating point score assertions.These tests compare floats with
==forutility_scoreandsafety_scoreassertions throughout the file. The guideline requires floating point comparisons to usepytest.approx. Update assertions at lines 16–17, 39–40, 46–57, 66–67, 80–81, and 89–90.tests/test_inspection.py (1)
24-287: Solid end-to-end coverage for the inspection surface.The fixture exercises parsing, SQLite enrichment, JSON/YAML export, and the thin
Brainwrappers against isolated temp data.tests/test_injection_order.py (1)
15-136: Deterministic tier/shuffle coverage looks good.This checks tier precedence, seed stability, cross-seed variation, and the empty-input case without relying on incidental ordering.
Security hardening: - Rate limiter now enforces budget (returns empty string, not just warning) - Provenance HMAC validates all inputs at boundary (user_id, hash, salt, session) - Manifest signing validates dict/salt types, non-string signature returns False - Brain salt creation atomic (temp+fsync+os.replace), handles missing dirs + partial files - Credit card regex tightened to 4-4-4-4 format (no false positives on numeric IDs) - GitHub/Slack token patterns expanded (github_pat_, xoxp/xapp/xwfp) - Scope parameter normalized + validated against allowed set - Correction hash uses structural JSON serialization (not ambiguous string concat) - QueryBudget validates init params, prunes empty buckets - truncate_score/constant_time_pad validate input ranges Architecture: - Extracted inspection/promotion methods to BrainInspectionMixin (brain_inspection.py) - approve_promotion now persists reviewed flag to disk via write_lessons_safe - _make_rule_id uses 8-char hex digest (was 4-digit decimal, 16M vs 10K collision space) - _tier_label uses lesson.state.value as ground truth (not confidence re-computation) - one_off scope uses block_promotion flag (decay/kill paths still run) - _pre_session_confidence uses safe getattr + isinstance check - update_confidence accepts salt param for consistent salted thresholds - RuleMetadata now round-trips through format_lessons/parse_lessons - security/__init__.py adds future annotations + exports constant_time_pad Tests (1614 pass, +7 new): - Mocked time.monotonic in query budget tests (no real sleep) - Stricter docstring assertions (exact thresholds + transitions) - Migration test drops table first, SAMPLE_LESSONS uses correct parser key - ID-based batch approval test, malformed signature type test - Brain salt tests for missing dirs + partial files - Stronger PII redaction test with redacted control Co-Authored-By: Gradata <noreply@gradata.ai>
|
|
||
| # ── Lesson Management ────────────────────────────────────────────── | ||
|
|
||
| def forget(self, description: str | None = None, category: str | None = None) -> int: | ||
| """Remove lessons matching description or category.""" | ||
| if not description and not category: | ||
| raise ValueError("Provide at least one of description or category") | ||
| lessons_path = self._find_lessons_path() | ||
| if not lessons_path or not lessons_path.is_file(): | ||
| return 0 | ||
| def forget(self, what: str = "last") -> dict | list[dict]: | ||
| """Human-friendly way to undo lessons. | ||
|
|
||
| Examples: | ||
| brain.forget("last") # most recent lesson | ||
| brain.forget("last 3") # last 3 lessons | ||
| brain.forget("casual tone") # fuzzy match description | ||
| brain.forget("all tone") # everything in TONE category | ||
| """ | ||
| try: | ||
| from gradata.enhancements.self_improvement import parse_lessons, format_lessons | ||
| except ImportError: | ||
| return 0 | ||
| return {"rolled_back": False, "error": "enhancements not installed"} | ||
| from gradata._types import LessonState | ||
| from gradata._db import write_lessons_safe | ||
|
|
||
| lessons_path = self._find_lessons_path() | ||
| if not lessons_path or not lessons_path.is_file(): | ||
| return {"rolled_back": False, "error": "no lessons file"} |
There was a problem hiding this comment.
Breaking API change in
brain.forget() — parameter names and return type changed
The old signature was forget(description: str | None = None, category: str | None = None) -> int. This PR changes it to forget(what: str = "last") -> dict | list[dict].
Any caller using keyword arguments is broken:
brain.forget(description="casual tone") # TypeError: unexpected keyword argument
brain.forget(category="TONE") # TypeError: unexpected keyword argumentThe return type also changed from int (count of removed lessons) to dict | list[dict], breaking callers that do arithmetic on the result (e.g. if brain.forget(...) > 0).
This needs either a deprecation shim or a major-version bump in the changelog. At minimum, a **kwargs guard with a helpful error message would prevent silent breakage:
def forget(self, what: str = "last", **_legacy_kwargs) -> dict | list[dict]:
if _legacy_kwargs:
raise TypeError(
"brain.forget() API changed. Use forget('casual tone') or "
"forget('all TONE') instead of description=/category= kwargs."
)Prompt To Fix With AI
This is a comment left during a code review.
Path: src/gradata/brain.py
Line: 469-490
Comment:
**Breaking API change in `brain.forget()` — parameter names and return type changed**
The old signature was `forget(description: str | None = None, category: str | None = None) -> int`. This PR changes it to `forget(what: str = "last") -> dict | list[dict]`.
Any caller using keyword arguments is broken:
```python
brain.forget(description="casual tone") # TypeError: unexpected keyword argument
brain.forget(category="TONE") # TypeError: unexpected keyword argument
```
The return type also changed from `int` (count of removed lessons) to `dict | list[dict]`, breaking callers that do arithmetic on the result (e.g. `if brain.forget(...) > 0`).
This needs either a deprecation shim or a major-version bump in the changelog. At minimum, a `**kwargs` guard with a helpful error message would prevent silent breakage:
```python
def forget(self, what: str = "last", **_legacy_kwargs) -> dict | list[dict]:
if _legacy_kwargs:
raise TypeError(
"brain.forget() API changed. Use forget('casual tone') or "
"forget('all TONE') instead of description=/category= kwargs."
)
```
How can I resolve this? If you propose a fix, please make it concise.| def pending_promotions(self) -> list[dict]: | ||
| """List rules that have graduated (PATTERN or RULE state). | ||
|
|
||
| Silent during work — call at session end to review what graduated. | ||
| Returns list of rule dicts with id, category, state, confidence, etc. | ||
| """ | ||
| from gradata.inspection import list_rules | ||
| return list_rules( | ||
| db_path=self.db_path, | ||
| lessons_path=self._find_lessons_path() or self.dir / "lessons.md", | ||
| ) |
There was a problem hiding this comment.
pending_promotions() is invisible to approval_required=True lessons
When brain.correct(..., approval_required=True) is called, the lesson is created with pending_approval=True and init_conf=0.0, which means graduate() will always skip it (if lesson.pending_approval: continue). The lesson stays at INSTINCT state indefinitely.
pending_promotions() delegates to list_rules() which only returns ELIGIBLE_STATES (PATTERN + RULE). An approval_required lesson stuck at INSTINCT will never appear in pending_promotions(), making the approval workflow inaccessible.
The method needs to also include INSTINCT-state lessons where pending_approval=True:
def pending_promotions(self) -> list[dict]:
from gradata.inspection import list_rules, _load_lessons_from_path, _lesson_to_dict
lessons_path = self._find_lessons_path() or self.dir / "lessons.md"
graduated = list_rules(db_path=self.db_path, lessons_path=lessons_path)
all_lessons = _load_lessons_from_path(lessons_path)
from gradata._types import LessonState
pending = [
_lesson_to_dict(l) for l in all_lessons
if l.state == LessonState.INSTINCT and l.pending_approval
]
return pending + graduatedPrompt To Fix With AI
This is a comment left during a code review.
Path: src/gradata/brain_inspection.py
Line: 70-80
Comment:
**`pending_promotions()` is invisible to `approval_required=True` lessons**
When `brain.correct(..., approval_required=True)` is called, the lesson is created with `pending_approval=True` and `init_conf=0.0`, which means `graduate()` will always skip it (`if lesson.pending_approval: continue`). The lesson stays at `INSTINCT` state indefinitely.
`pending_promotions()` delegates to `list_rules()` which only returns `ELIGIBLE_STATES` (PATTERN + RULE). An `approval_required` lesson stuck at INSTINCT will **never appear** in `pending_promotions()`, making the approval workflow inaccessible.
The method needs to also include INSTINCT-state lessons where `pending_approval=True`:
```python
def pending_promotions(self) -> list[dict]:
from gradata.inspection import list_rules, _load_lessons_from_path, _lesson_to_dict
lessons_path = self._find_lessons_path() or self.dir / "lessons.md"
graduated = list_rules(db_path=self.db_path, lessons_path=lessons_path)
all_lessons = _load_lessons_from_path(lessons_path)
from gradata._types import LessonState
pending = [
_lesson_to_dict(l) for l in all_lessons
if l.state == LessonState.INSTINCT and l.pending_approval
]
return pending + graduated
```
How can I resolve this? If you propose a fix, please make it concise.| def _yaml_val(v: object) -> str: | ||
| """Format a scalar value for YAML output.""" | ||
| if v is None: | ||
| return "null" | ||
| if isinstance(v, bool): | ||
| return "true" if v else "false" | ||
| if isinstance(v, (int, float)): | ||
| return str(v) | ||
| s = str(v) | ||
| # Escape embedded double quotes before wrapping | ||
| s = s.replace('"', '\\"') | ||
| # Quote strings that could be misinterpreted | ||
| if s == "" or ":" in s or "#" in s or s.startswith(("-", "[", "{")): | ||
| return f'"{s}"' | ||
| return s |
There was a problem hiding this comment.
_yaml_val doesn't escape newlines — produces malformed YAML
If a rule description or root cause contains a newline character (possible on Windows with CRLF, or from multi-line AI output), _yaml_val emits the raw character without proper YAML escaping. A description like "Use active voice.\nAvoid passive constructions." would break the YAML structure regardless of whether it gets double-quoted.
Add newline escaping before the quoting logic:
def _yaml_val(v: object) -> str:
...
s = str(v)
s = s.replace('\\', '\\\\')
s = s.replace('"', '\\"')
has_special = "\n" in s or "\r" in s or "\t" in s
s = s.replace('\n', '\\n').replace('\r', '\\r').replace('\t', '\\t')
if s == "" or ":" in s or "#" in s or s.startswith(("-", "[", "{")) or has_special:
return f'"{s}"'
return sPrompt To Fix With AI
This is a comment left during a code review.
Path: src/gradata/inspection.py
Line: 186-200
Comment:
**`_yaml_val` doesn't escape newlines — produces malformed YAML**
If a rule description or root cause contains a newline character (possible on Windows with CRLF, or from multi-line AI output), `_yaml_val` emits the raw character without proper YAML escaping. A description like `"Use active voice.\nAvoid passive constructions."` would break the YAML structure regardless of whether it gets double-quoted.
Add newline escaping before the quoting logic:
```python
def _yaml_val(v: object) -> str:
...
s = str(v)
s = s.replace('\\', '\\\\')
s = s.replace('"', '\\"')
has_special = "\n" in s or "\r" in s or "\t" in s
s = s.replace('\n', '\\n').replace('\r', '\\r').replace('\t', '\\t')
if s == "" or ":" in s or "#" in s or s.startswith(("-", "[", "{")) or has_special:
return f'"{s}"'
return s
```
How can I resolve this? If you propose a fix, please make it concise.| .env | ||
| .env.local | ||
| .env.*.local | ||
| .gstack/ |
There was a problem hiding this comment.
.brain_salt not added to .gitignore
load_or_create_salt() writes a .brain_salt file into the brain directory. If any user's brain_dir overlaps a git repo, the salt file could be accidentally committed, exposing the HMAC key used for manifest and provenance signing.
| .gstack/ | |
| .gstack/ | |
| .brain_salt |
Prompt To Fix With AI
This is a comment left during a code review.
Path: .gitignore
Line: 144
Comment:
**`.brain_salt` not added to `.gitignore`**
`load_or_create_salt()` writes a `.brain_salt` file into the brain directory. If any user's `brain_dir` overlaps a git repo, the salt file could be accidentally committed, exposing the HMAC key used for manifest and provenance signing.
```suggestion
.gstack/
.brain_salt
```
How can I resolve this? If you propose a fix, please make it concise.There was a problem hiding this comment.
Actionable comments posted: 18
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_rule_engine_v2.py (1)
65-71: 🧹 Nitpick | 🔵 TrivialTest helper uses non-obfuscated instruction format.
The
_make_appliedhelper at line 70 generates instructions with raw confidence ({lesson.confidence:.2f}), whiletest_two_same_category_mergedat line 234 expects the merged output to contain"[RULE]"without the float. This works becausemerge_related_rulespresumably applies obfuscation, but consider updating the helper to match production format for consistency.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_rule_engine_v2.py` around lines 65 - 71, The test helper _make_applied is generating instructions with a raw confidence float, which doesn't match the obfuscated production format expected by test_two_same_category_merged; update _make_applied to produce an obfuscated instruction string (e.g. replace the confidence float with the token "RULE") so it matches merge_related_rules' output format—something like f"[{lesson.state.value}:RULE] {lesson.category}: {lesson.description}"—so tests and production-like data align.
♻️ Duplicate comments (7)
src/gradata/security/correction_provenance.py (1)
33-40:⚠️ Potential issue | 🟠 MajorType-check the provenance inputs before stripping or encoding them.
create_provenance_record()still calls.strip()on truthy non-strings, andverify_provenance()still lets blank/non-string salts reachsalt.encode(). Insrc/gradata/_core.py, provenance creation failures are swallowed, so these bad boundary inputs silently drop the audit record instead of failing cleanly or returningFalse.Proposed hardening
- if not user_id or not user_id.strip(): + if not isinstance(user_id, str) or not user_id.strip(): raise ValueError("user_id must be a non-empty string") - if not correction_hash or not correction_hash.strip(): + if not isinstance(correction_hash, str) or not correction_hash.strip(): raise ValueError("correction_hash must be a non-empty string") if not isinstance(session, int) or session < 0: raise ValueError(f"session must be a non-negative integer, got {session!r}") - if not salt or not salt.strip(): + if not isinstance(salt, str) or not salt.strip(): raise ValueError("salt must be a non-empty string") @@ def verify_provenance(record: dict, salt: str) -> bool: @@ + if not isinstance(record, dict): + return False + if not isinstance(salt, str) or not salt.strip(): + return False try:As per coding guidelines "Validate input at system boundaries."
Also applies to: 55-74
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/security/correction_provenance.py` around lines 33 - 40, The input validators call .strip() or .encode() on values that may be non-strings; update create_provenance_record and verify_provenance to first assert types (e.g. ensure user_id and correction_hash are instances of str, session is int and >=0, and salt is str) before calling .strip() or .encode(), raising ValueError for wrong types or blank strings; specifically, change the checks around symbols user_id, correction_hash, session, and salt so type-checking happens prior to any .strip()/.encode() usage to avoid swallowing invalid inputs upstream.src/gradata/security/manifest_signing.py (1)
32-44:⚠️ Potential issue | 🟠 Major
verify_manifest()still doesn't fail closed on malformed caller input.A non-dict
manifest, blank/non-stringsalt, or a non-serializable payload will still raise out of this verifier instead of returningFalse. That breaks the API contract and bypasses the hardening already added tosign_manifest().Proposed hardening
def verify_manifest(manifest: dict, salt: str) -> bool: @@ + if not isinstance(manifest, dict): + return False + if not isinstance(salt, str) or not salt.strip(): + return False stored_sig = manifest.get("signature") if not isinstance(stored_sig, str) or not stored_sig: return False - payload = _canonical_payload(manifest) + try: + payload = _canonical_payload(manifest) + except (TypeError, ValueError): + return False expected = hmac.new(salt.encode(), payload, hashlib.sha256).hexdigest() return hmac.compare_digest(stored_sig, expected)As per coding guidelines "Validate input at system boundaries."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/security/manifest_signing.py` around lines 32 - 44, verify_manifest currently raises on malformed input instead of returning False; update verify_manifest to validate that manifest is a dict and salt is a non-empty string before proceeding, ensure stored_sig is a non-empty str (as already checked), and wrap the call to _canonical_payload and HMAC calculation in a try/except that returns False on any exception (e.g., non-serializable payload); keep using hmac.compare_digest for timing-safe comparison and mirror the input validation behavior of sign_manifest to preserve the API contract.tests/test_pii_redaction.py (1)
157-170:⚠️ Potential issue | 🟡 MinorThis still doesn't prove extract-before-redact.
Both branches can pass even if extraction runs on already-redacted text, because the positive path only checks for "some classification"/non-unknown severity and the control path never asserts a weaker or different outcome. Assert a concrete classification or summary derived from the raw email/token and show the redacted control does not produce it.
As per coding guidelines "assertions check specific values not just truthiness."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_pii_redaction.py` around lines 157 - 170, The test test_behavioral_extraction_uses_full_text is too weak—both branches can pass without proving extraction runs before redaction; update the assertions to check for a concrete, specific classification/value produced from the unredacted draft (use the keys in result produced by brain.correct, e.g., result["classifications"] or result["data"]["summary"] or a known label your system generates for emails) and then assert the redacted control (result_redacted) does not contain that exact classification or yields a clearly different/weaker value; reference the variables result, result_redacted and the brain.correct call to locate where to replace the generic truthy checks with precise equality/containment assertions that demonstrate the unredacted input produced a specific classification that redaction removes.src/gradata/brain_inspection.py (2)
70-80:⚠️ Potential issue | 🟠 MajorDrive
pending_promotions()offpending_approval.This still returns every PATTERN/RULE from
list_rules(), soapprove_promotion()never actually clears the queue.reject_promotion()also leavespending_approvalset, so the lesson remains unresolved on disk. Filter by the flag and clear it in both resolution paths.Also applies to: 105-106, 148-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/brain_inspection.py` around lines 70 - 80, pending_promotions currently returns every PATTERN/RULE from list_rules so approve_promotion never clears the queue and reject_promotion leaves pending_approval set; change pending_promotions to filter the list_rules result by the lesson/rule flag pending_approval (only return items where pending_approval is true) and update approve_promotion and reject_promotion to clear the pending_approval flag on the resolved lesson/rule (ensure both functions modify the same flag used by pending_promotions and persist the change so the lesson is no longer considered pending).
95-106:⚠️ Potential issue | 🟠 MajorUse a local, locked read-modify-write for promotion decisions.
These mutators call
_find_lessons_path()in read-only mode, which can return the external.claude/lessons.mdfallback, then they read before any lock and rewrite that snapshot. A promotion decision can therefore overwrite the shared external file or drop a concurrentcorrect()/end_session()update. Resolve mutating calls withcreate=Trueand hold the file lock across the full cycle.Also applies to: 138-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/brain_inspection.py` around lines 95 - 106, The promotion code currently finds lessons_path via _find_lessons_path() and reads/writes without holding a lock, risking overwriting shared external files; change the flow in the methods that modify lessons (the block using _find_lessons_path(), _make_rule_id(), setting target.pending_approval, and calling write_lessons_safe(format_lessons(...))) to open the lessons file with create=True and acquire the existing file lock for the entire read-modify-write cycle (hold the lock while loading lessons, finding the target, mutating target.pending_approval, and calling write_lessons_safe), and apply the same locked read-modify-write pattern to the other mutator block referenced (the one around lines 138-152) so all mutations use a local, locked create=True file handle.src/gradata/audit.py (1)
209-216:⚠️ Potential issue | 🟠 MajorDon't join transition history on the current description text.
This trace still queries
lesson_transitionsbytarget.description[:100]. Reinforcement can lengthen or rewrite a lesson description after the transition was recorded, so historical trace output silently loses its transitions even though the row is still in SQLite. Persist/query an immutablerule_idinstead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/audit.py` around lines 209 - 216, The query in audit.py filters lesson_transitions using the mutable target.description[:100] (and target.category), which breaks historical traces when descriptions change; modify the code that builds the SQL and its parameters (the SELECT against lesson_transitions in the block using conn.execute) to instead use an immutable rule identifier (e.g., rule_id) stored on the target and in the lesson_transitions table: change the WHERE clause to use rule_id = ? (and drop description matching), pass target.rule_id as the parameter, and update any surrounding logic that assumed description-based matching to rely on rule_id for locating stored transitions.tests/test_safety_assertion.py (1)
116-135:⚠️ Potential issue | 🟡 MinorTighten the docstring contract assertions.
These OR checks still pass on unrelated text that merely mentions the constant names or transitions, so the test can miss a real documentation drift. Assert the explicit threshold/transition contract instead of allowing either token to satisfy the check.
As per coding guidelines,
tests/**: "assertions check specific values not just truthiness".🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_safety_assertion.py` around lines 116 - 135, The tests test_docstring_mentions_min_applications and test_docstring_mentions_pattern_and_rule_thresholds are too loose—replace the OR checks with strict assertions that graduate.__doc__ contains the explicit threshold and transition text; specifically require the literal "fire_count >= 3" and "fire_count >= 5" (instead of allowing MIN_APPLICATIONS_FOR_PATTERN/MIN_APPLICATIONS_FOR_RULE as alternatives) and require the literal "INSTINCT -> PATTERN" and "PATTERN -> RULE" transitions (remove the OR fallbacks including the unicode arrows), so the docstring must include those exact phrases.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/gradata/_core.py`:
- Around line 155-157: The lesson's stored correction scope isn't updated on
reinforcement so scope_json (scope_data["correction_scope"]) remains the
original value; update the persisted scope whenever a correction is applied, not
only on creation: locate where correction_scope is computed (variable
correction_scope) and where scope_data["correction_scope"] is set and ensure the
code path that handles reinforcement also assigns and saves the new
correction_scope into scope_data/scope_json (same change applied for the other
occurrence around the reinforcement/promotion logic referenced later) so ONE_OFF
gating sees the latest scope.
- Around line 141-147: The current ImportError branch assigns raw draft/final to
draft_redacted/final_redacted which silently leaks PII; instead fail closed by
re-raising or raising a clear RuntimeError when gradata.safety import or
redact_pii_with_report is unavailable so storage paths cannot persist unredacted
text. If you must allow operation to continue, set draft_redacted and
final_redacted to safe placeholders/minimal summaries (e.g.,
"[REDACTION_UNAVAILABLE]") and log the error—update the except ImportError block
around the import/redaction logic (redact_pii_with_report, draft_redacted,
final_redacted) accordingly.
In `@src/gradata/brain.py`:
- Around line 488-527: The forget() implementation reads lessons via
_find_lessons_path() and then unconditionally rewrites that path, which can
overwrite an external fallback lessons file or race with concurrent writers;
change forget() to acquire the lessons lock for the full read-modify-write cycle
and call _find_lessons_path(create=True) (or otherwise ensure a local lessons
file is created) before mutating and calling write_lessons_safe(…); ensure the
lock covers parsing (parse_lessons), the in-memory mutations that set
LessonState.KILLED/confidence, and the final write_lessons_safe(
format_lessons(lessons) ) so we never rewrite a parent fallback file or lose
concurrent updates.
- Around line 501-506: The handler that parses wl ("last" or "last N") currently
sets n = 0 for "last 0" which makes active[-0:] return the full list; update the
parsing logic in the block handling wl == "last" or wl.startswith("last ") (and
any code path that uses n to build targets and then calls write_lessons_safe) to
validate that the parsed count is an integer >= 1 and reject any values < 1
(e.g., return {"rolled_back": False, "error": "invalid count"} or similar)
instead of proceeding; ensure you still allow the default case (no count -> n=1)
but explicitly check parts[1].isdigit() and int(parts[1]) >= 1 before assigning
n and computing targets.
In `@src/gradata/enhancements/self_improvement.py`:
- Around line 973-977: The current check for writing metadata uses truthiness
and drops valid falsy values like 0.0 (see lesson.metadata and md), so replace
the any(v for v in md.values() if v and v != 0.5) gate with a comparison against
the schema's default metadata dict: compute or import the default metadata
(e.g., default_md) and use if any(md.get(k) != default_md.get(k) for k in
set(md) | set(default_md)) to decide whether to append the metadata line (the
lines.append call), ensuring falsy but non-default values (like 0.0) are
preserved.
- Around line 315-322: The parser currently assumes
json.loads(meta_line[len("Metadata:"):].strip()) returns a dict and does
_md_dict.items(), which raises AttributeError for non-object JSON and aborts
parse_lessons; after loading into _md_dict in the try block, validate that
isinstance(_md_dict, dict) before constructing RuleMetadata (from gradata._types
import RuleMetadata as _RM) and only pass the filtered dict to _RM(**...); if
_md_dict is not a dict or a decode error occurs, set metadata_obj = None so
parse_lessons continues.
- Around line 649-667: The promotion logic in the loop (the block checking
lesson.state and calling transition(..., "promote")) ignores the one-off scope,
allowing ONE_OFF lessons to be promoted during update_confidence(); fix this by
gating any promotions on the lesson not being a one-off (i.e., add a check that
lesson.scope != LessonScope.ONE_OFF or equivalent) before calling transition;
keep demotion behavior unchanged and reference the existing symbols
lesson.state, lesson.scope (or the project's ONE_OFF scope constant),
transition(), and graduate() so promotions remain blocked for one-off lessons
until graduate() handles them.
In `@src/gradata/inspection.py`:
- Around line 130-136: The transition lookup uses the full target.description
but brain_end_session() stored lesson_transitions.lesson_desc as
l.description[:100], so queries miss any rules with descriptions >100 chars;
update the binding in the query inside the function (where rows =
conn.execute(..., (target.description, target.category), ...)) to use the same
truncation (target.description[:100], target.category) or, preferably, switch
the join to use the immutable rule identifier if a rule_id field exists (use
rule_id in both insertion and this SELECT) to ensure consistent matching between
stored keys and lookups.
In `@src/gradata/rules/rule_engine.py`:
- Around line 744-770: The post-shuffle code uses rules[0] for the "REMINDER:"
footer but after bucketed shuffling rules[0] is no longer the top-ranked rule
from apply_rules(); to fix, capture the pre-shuffle top AppliedRule (e.g.,
top_rule = rules[0] right after merge_related_rules() and before creating
buckets) and then after the in-tier randomization use that captured top_rule
when composing the recency reminder footer instead of the shuffled rules[0];
reference the existing symbols merge_related_rules, LessonState, AppliedRule and
the rules list so the captured top_rule is available in the scope where the
footer is generated.
In `@src/gradata/safety.py`:
- Around line 52-77: The provenance hashing currently uses raw draft/final
values while persistence stores draft_redacted/final_redacted from
redact_pii_with_report; update the correction_hash computation to use the
redacted payloads returned by redact_pii_with_report instead of the raw inputs
(i.e., when building correction_hash in the logic that sets
draft_redacted/final_redacted in src/gradata/_core.py, compute the hash from
those redacted strings), ensure any helper that computes correction_hash (the
function or inline block that previously hashed draft/final) is changed to
accept or read the redacted values, and add/update the integration test to
assert that stored correction_hash verifies against the persisted redacted
payloads (not the raw ones).
In `@src/gradata/security/brain_salt.py`:
- Around line 76-78: The calculation of fraction uses struct.unpack(">I",
digest[:4]) to get value and then divides by 0xFFFFFFFF producing a possible
1.0; change the divisor to 0x100000000 so fraction = value / 0x100000000 yields
a proper [0, 1) range (update the code computing fraction where 'digest' is
unpacked and 'fraction' is assigned).
- Around line 56-69: salt_threshold currently accepts invalid inputs; add input
validation to ensure safe, predictable outputs by checking parameters at the
start of the function: verify base is a positive float (raise ValueError if base
<= 0), verify salt is a non-empty 64-character hex string (raise ValueError if
empty or invalid hex/length), and verify tier_name is non-empty (raise
ValueError if empty); update salt_threshold to perform these checks before
computing the HMAC-SHA256 jitter so invalid inputs fail fast with clear errors
(refer to the salt_threshold function and its args base, salt, tier_name).
In `@src/gradata/security/query_budget.py`:
- Around line 84-86: Remove the redundant copy of the timestamps list: since the
variable timestamps is already a list (created earlier), drop the unnecessary
ts_list = list(timestamps) and use timestamps directly (or rename timestamps to
ts_list if clearer) in the block where n = len(ts_list) is computed; update
references in this scope (e.g., n = len(...)) to use the existing timestamps
variable to avoid the extra allocation.
In `@src/gradata/security/score_obfuscation.py`:
- Around line 76-84: The timing pad currently returns immediately if fn()
raises, leaking faster failure timings; wrap the call to fn() in a
try/except/finally (or try/finally) so that you always compute elapsed_ms, add
jitter (using jitter_ms and min_ms), sleep for remaining_ms as before, and only
then re-raise any caught exception; keep the same variables (start, elapsed_ms,
jitter, target_ms, remaining_ms) and the same return path for the successful
result, but ensure exceptions are re-thrown after the sleep.
In `@tests/test_batch_approval.py`:
- Around line 201-206: Replace the exact float equality assertion for the rule
confidence with a pytest.approx comparison: update the assertion that checks
matching[0]["confidence"] (in tests/test_batch_approval.py where all_rules,
matching and rule_id are used) to use pytest.approx(0.40) instead of == 0.40 so
the test compares floats with tolerance per project guidelines.
In `@tests/test_brain_learning.py`:
- Around line 160-162: The test test_forget_last_no_lessons uses a permissive
assertion that passes when keys are missing; change it to explicitly assert no
action by checking both flags are False (e.g., assert result.get("forgot",
False) is False and result.get("rolled_back", False) is False) when calling
fresh_brain.forget("last") so missing keys are treated as False and the intent
is unambiguous.
In `@tests/test_brain_salt.py`:
- Around line 13-22: Move the repeated per-test import to the module top-level:
import generate_brain_salt once using "from gradata.security.brain_salt import
generate_brain_salt" at the top of the test file, then remove the inline imports
inside test_returns_64_char_hex and test_100_salts_unique so the tests call
generate_brain_salt() directly; this keeps test behavior the same while
improving clarity and slightly reducing import overhead.
In `@tests/test_pii_redaction.py`:
- Around line 42-52: Extend the tests in test_github_pat and
test_slack_token_redacted to cover the new prefixes added in
src/gradata/safety.py: for GitHub create a fake token starting with
"github_pat_" (in addition to the existing "ghp_") and assert redact_pii(f"...")
contains "[REDACTED_GITHUB_TOKEN]" and does not contain the raw token; for Slack
add cases for "xoxp-", "xapp-", and "xwfp-" (in addition to "xoxb-") and assert
redact_pii returns "[REDACTED_SLACK_TOKEN]" and strips the raw token. Use the
existing helper _build and the redact_pii function to construct inputs and
expectations (update or add assertions in test_github_pat and
test_slack_token_redacted).
---
Outside diff comments:
In `@tests/test_rule_engine_v2.py`:
- Around line 65-71: The test helper _make_applied is generating instructions
with a raw confidence float, which doesn't match the obfuscated production
format expected by test_two_same_category_merged; update _make_applied to
produce an obfuscated instruction string (e.g. replace the confidence float with
the token "RULE") so it matches merge_related_rules' output format—something
like f"[{lesson.state.value}:RULE] {lesson.category}: {lesson.description}"—so
tests and production-like data align.
---
Duplicate comments:
In `@src/gradata/audit.py`:
- Around line 209-216: The query in audit.py filters lesson_transitions using
the mutable target.description[:100] (and target.category), which breaks
historical traces when descriptions change; modify the code that builds the SQL
and its parameters (the SELECT against lesson_transitions in the block using
conn.execute) to instead use an immutable rule identifier (e.g., rule_id) stored
on the target and in the lesson_transitions table: change the WHERE clause to
use rule_id = ? (and drop description matching), pass target.rule_id as the
parameter, and update any surrounding logic that assumed description-based
matching to rely on rule_id for locating stored transitions.
In `@src/gradata/brain_inspection.py`:
- Around line 70-80: pending_promotions currently returns every PATTERN/RULE
from list_rules so approve_promotion never clears the queue and reject_promotion
leaves pending_approval set; change pending_promotions to filter the list_rules
result by the lesson/rule flag pending_approval (only return items where
pending_approval is true) and update approve_promotion and reject_promotion to
clear the pending_approval flag on the resolved lesson/rule (ensure both
functions modify the same flag used by pending_promotions and persist the change
so the lesson is no longer considered pending).
- Around line 95-106: The promotion code currently finds lessons_path via
_find_lessons_path() and reads/writes without holding a lock, risking
overwriting shared external files; change the flow in the methods that modify
lessons (the block using _find_lessons_path(), _make_rule_id(), setting
target.pending_approval, and calling write_lessons_safe(format_lessons(...))) to
open the lessons file with create=True and acquire the existing file lock for
the entire read-modify-write cycle (hold the lock while loading lessons, finding
the target, mutating target.pending_approval, and calling write_lessons_safe),
and apply the same locked read-modify-write pattern to the other mutator block
referenced (the one around lines 138-152) so all mutations use a local, locked
create=True file handle.
In `@src/gradata/security/correction_provenance.py`:
- Around line 33-40: The input validators call .strip() or .encode() on values
that may be non-strings; update create_provenance_record and verify_provenance
to first assert types (e.g. ensure user_id and correction_hash are instances of
str, session is int and >=0, and salt is str) before calling .strip() or
.encode(), raising ValueError for wrong types or blank strings; specifically,
change the checks around symbols user_id, correction_hash, session, and salt so
type-checking happens prior to any .strip()/.encode() usage to avoid swallowing
invalid inputs upstream.
In `@src/gradata/security/manifest_signing.py`:
- Around line 32-44: verify_manifest currently raises on malformed input instead
of returning False; update verify_manifest to validate that manifest is a dict
and salt is a non-empty string before proceeding, ensure stored_sig is a
non-empty str (as already checked), and wrap the call to _canonical_payload and
HMAC calculation in a try/except that returns False on any exception (e.g.,
non-serializable payload); keep using hmac.compare_digest for timing-safe
comparison and mirror the input validation behavior of sign_manifest to preserve
the API contract.
In `@tests/test_pii_redaction.py`:
- Around line 157-170: The test test_behavioral_extraction_uses_full_text is too
weak—both branches can pass without proving extraction runs before redaction;
update the assertions to check for a concrete, specific classification/value
produced from the unredacted draft (use the keys in result produced by
brain.correct, e.g., result["classifications"] or result["data"]["summary"] or a
known label your system generates for emails) and then assert the redacted
control (result_redacted) does not contain that exact classification or yields a
clearly different/weaker value; reference the variables result, result_redacted
and the brain.correct call to locate where to replace the generic truthy checks
with precise equality/containment assertions that demonstrate the unredacted
input produced a specific classification that redaction removes.
In `@tests/test_safety_assertion.py`:
- Around line 116-135: The tests test_docstring_mentions_min_applications and
test_docstring_mentions_pattern_and_rule_thresholds are too loose—replace the OR
checks with strict assertions that graduate.__doc__ contains the explicit
threshold and transition text; specifically require the literal "fire_count >=
3" and "fire_count >= 5" (instead of allowing
MIN_APPLICATIONS_FOR_PATTERN/MIN_APPLICATIONS_FOR_RULE as alternatives) and
require the literal "INSTINCT -> PATTERN" and "PATTERN -> RULE" transitions
(remove the OR fallbacks including the unicode arrows), so the docstring must
include those exact phrases.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0d8f06dd-c4f9-4402-b476-5f4b4cdeebcb
📒 Files selected for processing (23)
src/gradata/_core.pysrc/gradata/audit.pysrc/gradata/brain.pysrc/gradata/brain_inspection.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/inspection.pysrc/gradata/rules/rule_engine.pysrc/gradata/safety.pysrc/gradata/security/__init__.pysrc/gradata/security/brain_salt.pysrc/gradata/security/correction_provenance.pysrc/gradata/security/manifest_signing.pysrc/gradata/security/query_budget.pysrc/gradata/security/score_obfuscation.pytests/test_audit_provenance.pytests/test_batch_approval.pytests/test_brain_learning.pytests/test_brain_salt.pytests/test_manifest_signing.pytests/test_pii_redaction.pytests/test_query_budget.pytests/test_rule_engine_v2.pytests/test_safety_assertion.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Greptile Review
- GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (5)
**/*
📄 CodeRabbit inference engine (CLAUDE.md)
Always read a file before editing it. Always prefer editing over creating new files.
Files:
tests/test_brain_learning.pytests/test_manifest_signing.pytests/test_query_budget.pysrc/gradata/safety.pysrc/gradata/security/brain_salt.pytests/test_brain_salt.pysrc/gradata/rules/rule_engine.pytests/test_batch_approval.pysrc/gradata/security/__init__.pytests/test_pii_redaction.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/security/manifest_signing.pysrc/gradata/inspection.pytests/test_safety_assertion.pytests/test_audit_provenance.pysrc/gradata/security/correction_provenance.pysrc/gradata/security/score_obfuscation.pysrc/gradata/security/query_budget.pysrc/gradata/_core.pysrc/gradata/brain_inspection.pytests/test_rule_engine_v2.pysrc/gradata/audit.pysrc/gradata/brain.py
**/*.{py,sh,js,ts,json,env*}
📄 CodeRabbit inference engine (CLAUDE.md)
Never hardcode secrets. Never commit .env files. Pre-commit hook blocks both.
Files:
tests/test_brain_learning.pytests/test_manifest_signing.pytests/test_query_budget.pysrc/gradata/safety.pysrc/gradata/security/brain_salt.pytests/test_brain_salt.pysrc/gradata/rules/rule_engine.pytests/test_batch_approval.pysrc/gradata/security/__init__.pytests/test_pii_redaction.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/security/manifest_signing.pysrc/gradata/inspection.pytests/test_safety_assertion.pytests/test_audit_provenance.pysrc/gradata/security/correction_provenance.pysrc/gradata/security/score_obfuscation.pysrc/gradata/security/query_budget.pysrc/gradata/_core.pysrc/gradata/brain_inspection.pytests/test_rule_engine_v2.pysrc/gradata/audit.pysrc/gradata/brain.py
**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Keep files under 500 lines. Validate input at system boundaries.
Files:
tests/test_brain_learning.pytests/test_manifest_signing.pytests/test_query_budget.pysrc/gradata/safety.pysrc/gradata/security/brain_salt.pytests/test_brain_salt.pysrc/gradata/rules/rule_engine.pytests/test_batch_approval.pysrc/gradata/security/__init__.pytests/test_pii_redaction.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/security/manifest_signing.pysrc/gradata/inspection.pytests/test_safety_assertion.pytests/test_audit_provenance.pysrc/gradata/security/correction_provenance.pysrc/gradata/security/score_obfuscation.pysrc/gradata/security/query_budget.pysrc/gradata/_core.pysrc/gradata/brain_inspection.pytests/test_rule_engine_v2.pysrc/gradata/audit.pysrc/gradata/brain.py
tests/**
⚙️ CodeRabbit configuration file
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.
Files:
tests/test_brain_learning.pytests/test_manifest_signing.pytests/test_query_budget.pytests/test_brain_salt.pytests/test_batch_approval.pytests/test_pii_redaction.pytests/test_safety_assertion.pytests/test_audit_provenance.pytests/test_rule_engine_v2.py
src/gradata/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use AGPL-3.0 licensing. Source code resides in src/gradata/.
Files:
src/gradata/safety.pysrc/gradata/security/brain_salt.pysrc/gradata/rules/rule_engine.pysrc/gradata/security/__init__.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/security/manifest_signing.pysrc/gradata/inspection.pysrc/gradata/security/correction_provenance.pysrc/gradata/security/score_obfuscation.pysrc/gradata/security/query_budget.pysrc/gradata/_core.pysrc/gradata/brain_inspection.pysrc/gradata/audit.pysrc/gradata/brain.py
⚙️ CodeRabbit configuration file
src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].
Files:
src/gradata/safety.pysrc/gradata/security/brain_salt.pysrc/gradata/rules/rule_engine.pysrc/gradata/security/__init__.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/security/manifest_signing.pysrc/gradata/inspection.pysrc/gradata/security/correction_provenance.pysrc/gradata/security/score_obfuscation.pysrc/gradata/security/query_budget.pysrc/gradata/_core.pysrc/gradata/brain_inspection.pysrc/gradata/audit.pysrc/gradata/brain.py
🧠 Learnings (1)
📚 Learning: 2026-04-06T15:53:08.573Z
Learnt from: CR
Repo: Gradata/gradata PR: 0
File: CLAUDE.md:0-0
Timestamp: 2026-04-06T15:53:08.573Z
Learning: Correction pipeline: correct → extract behavioral instruction → graduate (INSTINCT→PATTERN→RULE) → inject into next session. Meta-rules emerge when patterns cluster across domains.
Applied to files:
src/gradata/enhancements/self_improvement.py
🔇 Additional comments (14)
src/gradata/security/query_budget.py (2)
24-31: LGTM — past review comments addressed.Constructor validation now correctly rejects non-positive
window_secondsand negativemax_calls. The validation messages are clear and follow guidelines.
113-121: LGTM — empty bucket cleanup implemented.The
_prunemethod now removes empty deques fromself._callsafter pruning, preventing unbounded key growth from high endpoint cardinality.tests/test_query_budget.py (2)
40-62: LGTM — deterministic time mocking implemented.The burst detection tests now use a controllable clock via
patch("gradata.security.query_budget.time")instead oftime.sleep(), eliminating flakiness and improving test speed. The pattern of advancing_clock[0]between records is clean and readable.
110-127: Thorough constructor validation coverage.Tests correctly verify rejection of zero/negative
window_seconds, negativemax_calls, and acceptance ofmax_calls=0edge case.tests/test_rule_engine_v2.py (2)
234-234: LGTM — assertion updated for tier-label obfuscation.The test now correctly expects
"[RULE]"(tier label without raw confidence) in merged instructions, aligning with the score obfuscation changes.
390-390: The rule-id suffix length change is already consistent across the codebase.The only assertion validating rule ID format (tests/test_rule_engine_v2.py:390) correctly expects 8 characters. The
_make_rule_id()function is documented to generate this format, and all tests that create rule IDs via that function will automatically produce consistent results. Hardcoded rule_ids in other tests (e.g., test_audit_provenance.py, test_bug_fixes.py) are intentional mock values and not subject to the CATEGORY:HHHHHHHH format requirement.tests/test_brain_learning.py (2)
183-190: Test handles polymorphic return type appropriately.The test correctly handles both
listanddictreturn types forforget("all TONE"), accommodating the flexible API design.
205-209: LGTM — default behavior test added.Good coverage of the implicit "last" default when
forget()is called with no arguments.src/gradata/security/__init__.py (1)
1-34: LGTM — clean public API surface.The module correctly aggregates and re-exports security utilities with:
from __future__ import annotationspresent (addressing prior review)- Consistent
__all__matching all imports- Logical grouping of related functions
tests/test_manifest_signing.py (2)
59-63: LGTM — malformed signature type test added.The test now covers fail-closed behavior for non-string signature types including
int,dict,list,None, andbool, addressing the prior review comment.
20-37: Good test isolation with deepcopy.
test_does_not_mutate_originalcorrectly usescopy.deepcopyto verify thatsign_manifestdoesn't modify the input dictionary.src/gradata/security/brain_salt.py (1)
29-53: LGTM — atomic salt file creation implemented.The implementation now:
- Creates missing parent directories (line 32)
- Validates existing salt length before returning (lines 38-39)
- Uses atomic temp file +
os.replacepattern (lines 43-49)- Falls back gracefully on
OSError(lines 50-52)This addresses the prior review comment about race conditions and missing directories.
tests/test_brain_salt.py (2)
48-63: LGTM — missing directory and partial salt tests added.The tests now cover:
- Creating missing parent directories (lines 48-54)
- Detecting and regenerating truncated/invalid salt files (lines 56-63)
This addresses the prior review comments about untested failure modes.
110-120: Good statistical distribution test.The test verifies that 50 randomly-salted thresholds stay within bounds and produce at least 5 distinct values, providing reasonable confidence that the jitter isn't degenerate.
| # PII redaction — runs AFTER extraction on full text, BEFORE storage | ||
| try: | ||
| from gradata.safety import redact_pii_with_report | ||
| draft_redacted, _ = redact_pii_with_report(draft) | ||
| final_redacted, _ = redact_pii_with_report(final) | ||
| except ImportError: | ||
| draft_redacted, final_redacted = draft, final |
There was a problem hiding this comment.
Don't fail open on redaction import errors.
If gradata.safety is unavailable, this branch stores the raw draft/final in the correction event, pending-approval rows, and FTS index. That turns an import/packaging problem into a silent PII leak; fail closed here or persist placeholders/minimal summaries instead.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_core.py` around lines 141 - 147, The current ImportError branch
assigns raw draft/final to draft_redacted/final_redacted which silently leaks
PII; instead fail closed by re-raising or raising a clear RuntimeError when
gradata.safety import or redact_pii_with_report is unavailable so storage paths
cannot persist unredacted text. If you must allow operation to continue, set
draft_redacted and final_redacted to safe placeholders/minimal summaries (e.g.,
"[REDACTION_UNAVAILABLE]") and log the error—update the except ImportError block
around the import/redaction logic (redact_pii_with_report, draft_redacted,
final_redacted) accordingly.
| lessons_path = self._find_lessons_path() | ||
| if not lessons_path or not lessons_path.is_file(): | ||
| return {"rolled_back": False, "error": "no lessons file"} | ||
| lessons = parse_lessons(lessons_path.read_text(encoding="utf-8")) | ||
| before = len(lessons) | ||
| filtered = [l for l in lessons if not ( | ||
| (description and description.lower() in l.description.lower()) or | ||
| (category and l.category.upper() == category.upper()))] | ||
| removed = before - len(filtered) | ||
| if removed > 0: | ||
| from gradata._db import write_lessons_safe | ||
| write_lessons_safe(lessons_path, format_lessons(filtered)) | ||
| return removed | ||
|
|
||
| what = what.strip() | ||
| wl = what.lower() | ||
|
|
||
| # Resolve target indices | ||
| active = [(i, l) for i, l in enumerate(lessons) | ||
| if l.state in (LessonState.INSTINCT, LessonState.PATTERN, LessonState.RULE)] | ||
| targets: list[int] = [] | ||
|
|
||
| if wl == "last" or wl.startswith("last "): | ||
| parts = wl.split() | ||
| n = int(parts[1]) if len(parts) == 2 and parts[1].isdigit() else 1 | ||
| if not active: | ||
| return {"rolled_back": False, "error": "no active lessons"} | ||
| targets = [i for i, _ in active[-n:]] | ||
|
|
||
| elif wl.startswith("all "): | ||
| cat = what[4:].strip() | ||
| targets = [i for i, l in active if l.category.upper() == cat.upper()] | ||
| if not targets: | ||
| return {"rolled_back": False, "error": f"no active lessons in '{cat}'"} | ||
|
|
||
| else: | ||
| # Fuzzy match on description — single target | ||
| return self.rollback(description=what) | ||
|
|
||
| # Batch kill: mutate in memory, write once | ||
| results = [] | ||
| for idx in targets: | ||
| lesson = lessons[idx] | ||
| old_state, old_conf = lesson.state.value, lesson.confidence | ||
| lesson.state, lesson.confidence = LessonState.KILLED, 0.0 | ||
| results.append({"rolled_back": True, "lesson_index": idx, | ||
| "category": lesson.category, "description": lesson.description, | ||
| "previous_state": old_state, "previous_confidence": old_conf}) | ||
| write_lessons_safe(lessons_path, format_lessons(lessons)) |
There was a problem hiding this comment.
forget() should not rewrite the external fallback lessons file.
Like the promotion mutators, this uses _find_lessons_path() in read-only mode and then rewrites whatever path came back. If the brain has no local lessons.md, a forget call can modify the parent/working-dir .claude/lessons.md; because the read happens before locking, it can also clobber a concurrent correct() / end_session() update. Use create=True for this destructive path and lock the full cycle.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/brain.py` around lines 488 - 527, The forget() implementation
reads lessons via _find_lessons_path() and then unconditionally rewrites that
path, which can overwrite an external fallback lessons file or race with
concurrent writers; change forget() to acquire the lessons lock for the full
read-modify-write cycle and call _find_lessons_path(create=True) (or otherwise
ensure a local lessons file is created) before mutating and calling
write_lessons_safe(…); ensure the lock covers parsing (parse_lessons), the
in-memory mutations that set LessonState.KILLED/confidence, and the final
write_lessons_safe( format_lessons(lessons) ) so we never rewrite a parent
fallback file or lose concurrent updates.
| if wl == "last" or wl.startswith("last "): | ||
| parts = wl.split() | ||
| n = int(parts[1]) if len(parts) == 2 and parts[1].isdigit() else 1 | ||
| if not active: | ||
| return {"rolled_back": False, "error": "no active lessons"} | ||
| targets = [i for i, _ in active[-n:]] |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
rg -n 'def forget|active\[-n:\]|n = int\(parts\[1\]\)' --type=py
python - <<'PY'
active = ["a", "b", "c"]
n = 0
assert active[-n:] == active
print("Verified: active[-0:] returns the full list")
PYRepository: Gradata/gradata
Length of output: 478
🏁 Script executed:
# Read the complete forget() function to check for any existing guards
sed -n '472,530p' src/gradata/brain.py | cat -nRepository: Gradata/gradata
Length of output: 3063
🏁 Script executed:
# Check if there are any tests that cover the forget() function and edge cases
rg -n 'def test.*forget|forget\(' --type=py -A 2 | head -40Repository: Gradata/gradata
Length of output: 2897
🏁 Script executed:
# Verify the slicing behavior more comprehensively and check what "active" contains
python - <<'PY'
# Verify the exact behavior with different n values
test_cases = [
(0, "n=0 case"),
(1, "n=1 case"),
(-1, "n=-1 case"),
]
active = ["lesson_a", "lesson_b", "lesson_c"]
for n, desc in test_cases:
result = active[-n:] if n != 0 else []
actual = active[-n:]
print(f"{desc}: active[-{n}:] = {actual} (should_be={result})")
print("\n--- Key observation ---")
print(f"active[-0:] == active[-0:] -> {active[-0:] == active}")
print(f"This means n=0 has no special meaning and acts like slicing from 0 to end")
PYRepository: Gradata/gradata
Length of output: 418
Add guard to reject invalid last N counts; last 0 currently deletes all active lessons.
When what is "last 0", the code assigns n = 0 and evaluates active[-0:], which in Python is equivalent to the full slice active[:]—meaning all active lessons get marked as KILLED instead of none. This is unguarded input validation at a destructive system boundary (write_lessons_safe is called immediately after).
The fix is to reject counts less than 1:
Fix
if wl == "last" or wl.startswith("last "):
parts = wl.split()
n = int(parts[1]) if len(parts) == 2 and parts[1].isdigit() else 1
+ if n < 1:
+ return {"rolled_back": False, "error": "count must be >= 1"}
if not active:
return {"rolled_back": False, "error": "no active lessons"}
targets = [i for i, _ in active[-n:]]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/brain.py` around lines 501 - 506, The handler that parses wl
("last" or "last N") currently sets n = 0 for "last 0" which makes active[-0:]
return the full list; update the parsing logic in the block handling wl ==
"last" or wl.startswith("last ") (and any code path that uses n to build targets
and then calls write_lessons_safe) to validate that the parsed count is an
integer >= 1 and reject any values < 1 (e.g., return {"rolled_back": False,
"error": "invalid count"} or similar) instead of proceeding; ensure you still
allow the default case (no count -> n=1) but explicitly check parts[1].isdigit()
and int(parts[1]) >= 1 before assigning n and computing targets.
| elif meta_line.startswith("Metadata:"): | ||
| import json as _json_md | ||
| try: | ||
| _md_dict = _json_md.loads(meta_line[len("Metadata:"):].strip()) | ||
| from gradata._types import RuleMetadata as _RM | ||
| metadata_obj = _RM(**{k: v for k, v in _md_dict.items() if k in _RM.__dataclass_fields__}) | ||
| except (ValueError, TypeError, _json_md.JSONDecodeError): | ||
| metadata_obj = None |
There was a problem hiding this comment.
Ignore non-object Metadata: JSON instead of crashing the parser.
json.loads() can return a list/string/number here, and _md_dict.items() will then raise AttributeError, aborting parse_lessons() on a malformed metadata line. Treat anything other than a JSON object as invalid metadata and continue.
Minimal fix
elif meta_line.startswith("Metadata:"):
import json as _json_md
try:
_md_dict = _json_md.loads(meta_line[len("Metadata:"):].strip())
+ if not isinstance(_md_dict, dict):
+ raise ValueError("metadata must be a JSON object")
from gradata._types import RuleMetadata as _RM
metadata_obj = _RM(**{k: v for k, v in _md_dict.items() if k in _RM.__dataclass_fields__})
except (ValueError, TypeError, _json_md.JSONDecodeError):
metadata_obj = NoneAs per coding guidelines "Validate input at system boundaries."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/enhancements/self_improvement.py` around lines 315 - 322, The
parser currently assumes json.loads(meta_line[len("Metadata:"):].strip())
returns a dict and does _md_dict.items(), which raises AttributeError for
non-object JSON and aborts parse_lessons; after loading into _md_dict in the try
block, validate that isinstance(_md_dict, dict) before constructing RuleMetadata
(from gradata._types import RuleMetadata as _RM) and only pass the filtered dict
to _RM(**...); if _md_dict is not a dict or a decode error occurs, set
metadata_obj = None so parse_lessons continues.
| start = time.perf_counter() | ||
| result = fn() | ||
| elapsed_ms = (time.perf_counter() - start) * 1000 | ||
| jitter = (secrets.randbelow(int(jitter_ms * 1000)) / 1000) if jitter_ms >= 0.001 else 0.0 | ||
| target_ms = min_ms + jitter | ||
| remaining_ms = target_ms - elapsed_ms | ||
| if remaining_ms > 0: | ||
| time.sleep(remaining_ms / 1000) | ||
| return result |
There was a problem hiding this comment.
Pad exception paths too.
If fn() raises, control exits before the sleep logic runs, so failures are still measurably faster than success paths. That defeats the constant-time guarantee on the branch attackers care about most.
Minimal fix
if min_ms < 0 or jitter_ms < 0:
raise ValueError("min_ms and jitter_ms must be non-negative")
start = time.perf_counter()
- result = fn()
- elapsed_ms = (time.perf_counter() - start) * 1000
- jitter = (secrets.randbelow(int(jitter_ms * 1000)) / 1000) if jitter_ms >= 0.001 else 0.0
- target_ms = min_ms + jitter
- remaining_ms = target_ms - elapsed_ms
- if remaining_ms > 0:
- time.sleep(remaining_ms / 1000)
- return result
+ try:
+ return fn()
+ finally:
+ elapsed_ms = (time.perf_counter() - start) * 1000
+ jitter = (secrets.randbelow(int(jitter_ms * 1000)) / 1000) if jitter_ms >= 0.001 else 0.0
+ target_ms = min_ms + jitter
+ remaining_ms = target_ms - elapsed_ms
+ if remaining_ms > 0:
+ time.sleep(remaining_ms / 1000)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| start = time.perf_counter() | |
| result = fn() | |
| elapsed_ms = (time.perf_counter() - start) * 1000 | |
| jitter = (secrets.randbelow(int(jitter_ms * 1000)) / 1000) if jitter_ms >= 0.001 else 0.0 | |
| target_ms = min_ms + jitter | |
| remaining_ms = target_ms - elapsed_ms | |
| if remaining_ms > 0: | |
| time.sleep(remaining_ms / 1000) | |
| return result | |
| def constant_time_pad(fn, min_ms: float = 20.0, jitter_ms: float = 5.0): | |
| if min_ms < 0 or jitter_ms < 0: | |
| raise ValueError("min_ms and jitter_ms must be non-negative") | |
| start = time.perf_counter() | |
| try: | |
| return fn() | |
| finally: | |
| elapsed_ms = (time.perf_counter() - start) * 1000 | |
| jitter = (secrets.randbelow(int(jitter_ms * 1000)) / 1000) if jitter_ms >= 0.001 else 0.0 | |
| target_ms = min_ms + jitter | |
| remaining_ms = target_ms - elapsed_ms | |
| if remaining_ms > 0: | |
| time.sleep(remaining_ms / 1000) |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/security/score_obfuscation.py` around lines 76 - 84, The timing
pad currently returns immediately if fn() raises, leaking faster failure
timings; wrap the call to fn() in a try/except/finally (or try/finally) so that
you always compute elapsed_ms, add jitter (using jitter_ms and min_ms), sleep
for remaining_ms as before, and only then re-raise any caught exception; keep
the same variables (start, elapsed_ms, jitter, target_ms, remaining_ms) and the
same return path for the successful result, but ensure exceptions are re-thrown
after the sleep.
| # Check it exists as INSTINCT in all rules | ||
| all_rules = brain.rules(include_all=True) | ||
| matching = [r for r in all_rules if r["id"] == rule_id] | ||
| assert len(matching) == 1 | ||
| assert matching[0]["state"] == "INSTINCT" | ||
| assert matching[0]["confidence"] == 0.40 |
There was a problem hiding this comment.
Use pytest.approx() for the confidence assertion.
This is a float literal comparison in a test file; the repo standard is to avoid exact == on floats.
Minimal fix
- assert matching[0]["confidence"] == 0.40
+ assert matching[0]["confidence"] == pytest.approx(0.40)As per coding guidelines "floating point comparisons use pytest.approx."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_batch_approval.py` around lines 201 - 206, Replace the exact float
equality assertion for the rule confidence with a pytest.approx comparison:
update the assertion that checks matching[0]["confidence"] (in
tests/test_batch_approval.py where all_rules, matching and rule_id are used) to
use pytest.approx(0.40) instead of == 0.40 so the test compares floats with
tolerance per project guidelines.
| def test_forget_last_no_lessons(self, fresh_brain): | ||
| result = fresh_brain.forget("last") | ||
| assert result.get("forgot") is False or result.get("rolled_back") is False |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Assertion may be overly permissive.
The condition result.get("forgot") is False or result.get("rolled_back") is False succeeds if either key is missing (returns None, which is not False). If the intent is to verify no action was taken, consider a more explicit check.
♻️ Clearer assertion
def test_forget_last_no_lessons(self, fresh_brain):
result = fresh_brain.forget("last")
- assert result.get("forgot") is False or result.get("rolled_back") is False
+ # When no lessons exist, forget should indicate nothing was rolled back
+ assert result.get("rolled_back") is False or result.get("forgot") is False
+ # Or if the API guarantees a specific key:
+ # assert result.get("rolled_back", False) is False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_brain_learning.py` around lines 160 - 162, The test
test_forget_last_no_lessons uses a permissive assertion that passes when keys
are missing; change it to explicitly assert no action by checking both flags are
False (e.g., assert result.get("forgot", False) is False and
result.get("rolled_back", False) is False) when calling
fresh_brain.forget("last") so missing keys are treated as False and the intent
is unambiguous.
| def test_returns_64_char_hex(self): | ||
| from gradata.security.brain_salt import generate_brain_salt | ||
| salt = generate_brain_salt() | ||
| assert len(salt) == 64 | ||
| assert all(c in "0123456789abcdef" for c in salt) | ||
|
|
||
| def test_100_salts_unique(self): | ||
| from gradata.security.brain_salt import generate_brain_salt | ||
| salts = {generate_brain_salt() for _ in range(100)} | ||
| assert len(salts) == 100 |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider top-level imports for test clarity.
The pattern of importing inside each test method (from gradata.security.brain_salt import ...) is unusual. While valid, top-level imports would be more conventional and slightly faster.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_brain_salt.py` around lines 13 - 22, Move the repeated per-test
import to the module top-level: import generate_brain_salt once using "from
gradata.security.brain_salt import generate_brain_salt" at the top of the test
file, then remove the inline imports inside test_returns_64_char_hex and
test_100_salts_unique so the tests call generate_brain_salt() directly; this
keeps test behavior the same while improving clarity and slightly reducing
import overhead.
|
@coderabbitai full review |
✅ Actions performedFull review triggered. |
There was a problem hiding this comment.
Actionable comments posted: 19
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
tests/test_rule_engine_v2.py (1)
226-235: 🛠️ Refactor suggestion | 🟠 MajorKeep the no-raw-score guarantee in this regression test.
This proves
[RULE]is present, but it no longer proves the merged instruction stopped leaking raw confidences. The test would still pass if the output became"[RULE] ... 0.95".🧪 Tighten the regression
assert "merged_" in result[0].rule_id assert "Rule A" in result[0].instruction assert "Rule B" in result[0].instruction assert "[RULE]" in result[0].instruction # tier label, no raw float + assert "0.90" not in result[0].instruction + assert "0.95" not in result[0].instruction🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_rule_engine_v2.py` around lines 226 - 235, The test test_two_same_category_merged currently only checks for the "[RULE]" tier label but doesn't assert that raw confidence floats are omitted; update the assertion for the merged result (from merge_related_rules) to explicitly ensure result[0].instruction does not contain raw numeric confidences (e.g., no substrings matching decimal numbers like "0.95" or a regex for digits with a decimal point) while still asserting it contains "Rule A" and "Rule B" and the "[RULE]" label.
♻️ Duplicate comments (20)
src/gradata/security/query_budget.py (1)
39-45:⚠️ Potential issue | 🟠 MajorDon't recreate empty buckets on unknown endpoints.
count()prunes the endpoint and then immediately re-inserts an empty deque viaself._calls[endpoint]. Repeatedcount()/is_rate_exceeded()calls for unknown names will still growself._calls, so the cleanup in_prune()never actually caps endpoint cardinality.🧹 Minimal fix
def count(self, endpoint: str) -> int: """Return the number of calls for *endpoint* inside the current window. Expired entries are pruned as a side-effect. """ self._prune(endpoint) - return len(self._calls[endpoint]) + calls = self._calls.get(endpoint) + return len(calls) if calls is not None else 0🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/security/query_budget.py` around lines 39 - 45, The count() method currently prunes and then accesses self._calls[endpoint], which creates an empty deque for unknown endpoints and prevents cardinality from being capped; change count() to call self._prune(endpoint) then check for existence (e.g. if endpoint not in self._calls: return 0) or use self._calls.get(endpoint) to avoid inserting a new key, so only real endpoints create buckets (refer to methods count, _prune, and the _calls dict).tests/test_brain_learning.py (1)
160-162:⚠️ Potential issue | 🟡 MinorThis no-op assertion is still too permissive.
Using
orhere means the test passes if either key is simply missing, so it does not actually verify the empty-history behavior offorget("last").As per coding guidelines "tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness, parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx."✅ Make the expectation explicit
def test_forget_last_no_lessons(self, fresh_brain): result = fresh_brain.forget("last") - assert result.get("forgot") is False or result.get("rolled_back") is False + assert result.get("forgot", False) is False + assert result.get("rolled_back", False) is False🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_brain_learning.py` around lines 160 - 162, The test test_forget_last_no_lessons is too permissive because it uses an "or" on result.get("forgot") and result.get("rolled_back"); change it to explicitly assert the expected empty-history outcome by checking both keys are present and False (e.g., assert result.get("forgot") is False and result.get("rolled_back") is False) or assert equality against the exact expected dict returned by fresh_brain.forget("last"); reference the fresh_brain.forget("last") call and the 'forgot' and 'rolled_back' keys to locate and update the assertion.src/gradata/security/brain_salt.py (1)
76-78:⚠️ Potential issue | 🟡 MinorOff-by-one:
fractioncan reach exactly 1.0.The comment on line 76 claims the range is
[0, 1), but dividing by0xFFFFFFFFallowsfractionto hit exactly1.0whenvalue == 0xFFFFFFFF. This produces+5.5%jitter instead of the documented+5%maximum.While the practical impact is negligible (1 in 4 billion chance), the code contradicts its documentation.
🔧 Suggested fix
# Map first 4 bytes to a float in [0, 1) (value,) = struct.unpack(">I", digest[:4]) - fraction = value / 0xFFFFFFFF # [0, 1] + fraction = value / 0x100000000 # [0, 1)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/security/brain_salt.py` around lines 76 - 78, The fraction computation can equal 1.0 when value == 0xFFFFFFFF, contradicting the intended [0,1) range; update the divisor used after unpacking (the line with (value,) = struct.unpack(">I", digest[:4]) and fraction = value / 0xFFFFFFFF) to divide by 0x100000000 (or use float(value) / 2**32) so fraction is in [0,1), and update the accompanying comment to reflect the corrected range.src/gradata/security/manifest_signing.py (1)
32-44:⚠️ Potential issue | 🟡 MinorAdd salt validation in
verify_manifestfor consistency withsign_manifest.
sign_manifestvalidates thatsaltis a non-empty string (lines 22-23), butverify_manifestaccepts any salt including empty strings or non-strings. An empty salt would causesalt.encode()to produce empty bytes, computing an HMAC with a weak key and silently returningFalse.🛡️ Suggested fix
def verify_manifest(manifest: dict, salt: str) -> bool: """Verify that *manifest* has a valid HMAC-SHA256 signature. Returns ``False`` if the signature field is missing or invalid. Uses ``hmac.compare_digest`` for timing-safe comparison. """ + if not isinstance(salt, str) or not salt.strip(): + return False stored_sig = manifest.get("signature") if not isinstance(stored_sig, str) or not stored_sig: return FalseAs per coding guidelines: "Validate input at system boundaries."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/security/manifest_signing.py` around lines 32 - 44, The verify_manifest function currently accepts any salt; add the same input validation used in sign_manifest: ensure salt is a str and non-empty before using salt.encode(); if the check fails return False (or mirror sign_manifest's error handling), so update verify_manifest to validate salt (type and non-empty) prior to computing expected HMAC.tests/test_pii_redaction.py (2)
157-170:⚠️ Potential issue | 🟡 MinorProve extract-before-redact with a concrete result, not presence checks.
Both branches only assert that some classification key exists and that the happy path is not
"unknown". That still passes if extraction runs on redacted text. Assert a specific classification or summary derived from the unredacted diff, and show the pre-redacted control does not produce the same output.As per coding guidelines,
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness, parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_pii_redaction.py` around lines 157 - 170, The test test_behavioral_extraction_uses_full_text currently only checks for presence of "classifications" and non-"unknown" severity; update it to assert a specific, concrete classification/summary produced from the unredacted diff (inspect result.get("data") structure and assert a particular field such as the top classification label or summary text from result["data"]["classifications"] or result["data"]["summary"] matches the expected value for draft), then run the same precise assertion against result_redacted (from redacted_draft) and assert that the redacted output does NOT equal the unredacted expected value to prove extraction runs before redaction; keep using the existing variables draft, final, redacted_draft, result, and result_redacted and avoid only truthy checks.
42-52:⚠️ Potential issue | 🟡 MinorAdd direct coverage for the new GitHub/Slack prefixes.
This block still only exercises
ghp_andxoxb-, so the newergithub_pat_,xoxp-,xapp-, andxwfp-branches remain unguarded.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_pii_redaction.py` around lines 42 - 52, Extend the tests for redact_pii to directly cover the newer GitHub and Slack token prefixes so those code paths are exercised: add inputs using the prefixes "github_pat_" (and keep existing "ghp_") in test_github_pat and assert "[REDACTED_GITHUB_TOKEN]" is returned and the original token removed; likewise expand test_slack_token_redacted to include tokens built with "xoxp-", "xapp-", and "xwfp-" (in addition to "xoxb-") and assert "[REDACTED_SLACK_TOKEN]" appears and the raw token is absent; reuse the existing helper _build and the redact_pii function names to construct these new cases.src/gradata/security/score_obfuscation.py (1)
76-84:⚠️ Potential issue | 🟠 MajorPad exception paths too.
If
fn()raises, control exits before the sleep logic runs, so failures stay measurably faster than success paths. That bypasses the constant-time guarantee on the branch attackers care about most.Minimal fix
start = time.perf_counter() - result = fn() - elapsed_ms = (time.perf_counter() - start) * 1000 - jitter = (secrets.randbelow(int(jitter_ms * 1000)) / 1000) if jitter_ms >= 0.001 else 0.0 - target_ms = min_ms + jitter - remaining_ms = target_ms - elapsed_ms - if remaining_ms > 0: - time.sleep(remaining_ms / 1000) - return result + try: + return fn() + finally: + elapsed_ms = (time.perf_counter() - start) * 1000 + jitter = ( + secrets.randbelow(int(jitter_ms * 1000)) / 1000 + if jitter_ms >= 0.001 else 0.0 + ) + target_ms = min_ms + jitter + remaining_ms = target_ms - elapsed_ms + if remaining_ms > 0: + time.sleep(remaining_ms / 1000)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/security/score_obfuscation.py` around lines 76 - 84, The current timing wrapper calls fn() directly so if fn() raises the sleep logic is skipped; change the control flow in the wrapper around fn() (the call to fn(), and the variables jitter_ms and min_ms used to compute target_ms) to run fn() inside a try/except/finally (or try/finally) block: record start before calling fn(), in the finally compute elapsed_ms, jitter, target_ms and remaining_ms, perform time.sleep(remaining_ms / 1000) if remaining_ms > 0, and then either return the result (from the try block) or re-raise the caught exception after the sleep so both success and exception paths are padded equally.src/gradata/rules/rule_engine.py (1)
744-770:⚠️ Potential issue | 🟡 MinorKeep the footer reminder tied to the pre-shuffle top rule.
After the bucket shuffle,
rules[0]is just a random member of the highest tier. TheREMINDER:footer therefore stops reinforcing the best-ranked rule selected byapply_rules().Minimal fix
# Merge related rules to save tokens if merge: rules = merge_related_rules(rules) + top_rule = rules[0] @@ - if rules: - top = rules[0] + if rules: + top = top_rule lines.append("") lines.append(f"REMINDER: {top.lesson.category}: {top.lesson.description}")Also applies to: 798-802
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/rules/rule_engine.py` around lines 744 - 770, Before performing the bucketed shuffle, capture the pre-shuffle top rule (e.g., top_rule = rules[0] if rules) so the final order still reflects the best-ranked rule selected by apply_rules(); after shuffling buckets (or after rebuilding rules), locate that saved top_rule and ensure it is placed at the front of its tier (or moved to rules[0]) while leaving the rest of the bucket order intact. Update the logic around buckets, tier_order, and rules (and references to LessonState / merge_related_rules) to use the saved top_rule to restore the reminder-target rule into position so the REMINDER footer continues to refer to the original top-ranked rule.tests/test_safety_assertion.py (1)
116-135:⚠️ Potential issue | 🟡 MinorAssert the exact docstring contract.
These
orchecks still pass on unrelated mentions ofMIN_APPLICATIONS_FOR_*or the transition names, so a 3→4 or 5→6 requirement change can slip through without failing this guard.As per coding guidelines,
tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness, parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/test_safety_assertion.py` around lines 116 - 135, The current tests (test_docstring_mentions_min_applications and test_docstring_mentions_pattern_and_rule_thresholds) allow loose matches via ORs which can miss exact contract changes; update these to assert the exact required substrings are present in graduate.__doc__ (e.g., require the literal "fire_count >= 3" and the literal "fire_count >= 5" rather than accepting any occurrence of MIN_APPLICATIONS_FOR_*, and require the literal "INSTINCT -> PATTERN" and "PATTERN -> RULE" rather than accepting any alternate tokens); tighten the assertions to check for those exact strings (or exact constant assignments if the doc documents constants), and convert to parametrized assertions if helpful to avoid duplication.src/gradata/_core.py (2)
155-157:⚠️ Potential issue | 🟠 MajorPersist the latest
correction_scopeon reinforcement too.
correction_scopeis computed here and only written in the new-lesson branch below. When a correction reinforces an existing lesson,best_match.scope_jsonstays stale, so later ONE_OFF gating still sees the scope from the first correction instead of the latest one.Also applies to: 286-287
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_core.py` around lines 155 - 157, The computed correction_scope is set into scope_data but only persisted when creating a new lesson; update the reinforcement path to also persist the latest correction_scope so best_match.scope_json is refreshed. Locate where correction_scope is computed (variable correction_scope and scope_data["correction_scope"]) and add code in the existing reinforcement branch (the branch that handles reinforcing an existing lesson / not new-lesson) to write the updated scope into best_match.scope_json (or the same persistence mechanism used in the new-lesson branch) so ONE_OFF gating sees the most recent scope.
141-147:⚠️ Potential issue | 🟠 MajorDon't fail open on redaction import errors.
If
gradata.safetyis missing, this branch feeds unredacteddraft/finalinto stored events,pending_approvals, and FTS. That turns an import or packaging problem into a silent PII leak. Fail closed here or persist a safe placeholder.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/_core.py` around lines 141 - 147, The current ImportError handler for gradata.safety allows unredacted draft/final through (redact_pii_with_report import), which risks silent PII leaks; change the except ImportError block to fail closed by either raising an explicit exception (e.g., RuntimeError including the original ImportError and a clear message referencing redact_pii_with_report/gradata.safety) or, if you must continue, set draft_redacted and final_redacted to a safe non-sensitive placeholder (e.g., "[REDACTION_FAILED]") and log the import failure so downstream storage/FTS/pending_approvals cannot receive raw PII; ensure the chosen behavior is applied where draft_redacted and final_redacted are used.src/gradata/inspection.py (1)
130-136:⚠️ Potential issue | 🟠 MajorUse the stored transition key on read.
brain_end_session()persistslesson_transitions.lesson_descasl.description[:100], but this query binds the full description. Any rule over 100 characters will always come back withtransitions: []. Match the same key here, or move the join to an immutable rule identifier.🛠️ Minimal fix
- (target.description, target.category), + (target.description[:100], target.category),🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/inspection.py` around lines 130 - 136, The query in inspection.py uses the full target.description but brain_end_session() stores lesson_transitions.lesson_desc as l.description[:100], causing mismatches; update the binding in the rows = conn.execute(...) call to use the same stored key (e.g., target.description[:100]) so the WHERE clause matches the persisted lesson_desc (or alternatively join on an immutable rule identifier if available), ensuring the query uses the same truncation logic as the persistence path.src/gradata/enhancements/self_improvement.py (3)
315-322:⚠️ Potential issue | 🟠 MajorIgnore non-object
Metadata:JSON instead of crashing the parser.
json.loads()can return a list, string, or number here, and_md_dict.items()will then raiseAttributeError, abortingparse_lessons()on malformed metadata. Treat anything other than a JSON object as invalid metadata and continue.As per coding guidelines, `**/*.py`: Validate input at system boundaries.🛠️ Minimal fix
elif meta_line.startswith("Metadata:"): import json as _json_md try: _md_dict = _json_md.loads(meta_line[len("Metadata:"):].strip()) + if not isinstance(_md_dict, dict): + raise ValueError("metadata must be a JSON object") from gradata._types import RuleMetadata as _RM metadata_obj = _RM(**{k: v for k, v in _md_dict.items() if k in _RM.__dataclass_fields__}) except (ValueError, TypeError, _json_md.JSONDecodeError): metadata_obj = None🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/enhancements/self_improvement.py` around lines 315 - 322, The parser currently assumes JSON after "Metadata:" is an object and does _md_dict.items(), which crashes if json.loads(meta_line[len("Metadata:"):].strip()) returns a non-dict; update the handling in the block that sets metadata_obj (within parse_lessons) to validate the loaded value: after loading into _md_dict, check isinstance(_md_dict, dict) and only then construct RuleMetadata (from gradata._types as _RM) using _RM(**{k: v for k, v in _md_dict.items() if k in _RM.__dataclass_fields__}); if it's not a dict, set metadata_obj = None. Keep the existing exception handling for JSON decoding/type errors.
649-667:⚠️ Potential issue | 🟠 MajorBlock ONE_OFF promotions inside
update_confidence()too.
correct()calls this function immediately, beforegraduate(). Because this inline promotion block never consultslesson.scope_json, aone_offINSTINCT can still become PATTERN here, and aone_offPATTERN can still become RULE, which breaks the new ceiling semantics.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/enhancements/self_improvement.py` around lines 649 - 667, In update_confidence(), the inline promotion/demotion block (the if/elif that checks lesson.state, confidence thresholds _uc_rule_thr/_uc_pattern_thr, and MIN_APPLICATIONS_FOR_* then calls transition(lesson.state, "promote") or demotes) should skip any lessons marked one_off; add a guard before that promotion/demotion logic that checks lesson.scope_json (or the lesson one_off marker) and returns/continues if the lesson is one_off so one_off INSTINCT/PATTERN cannot be promoted to PATTERN/RULE here. Ensure you reference update_confidence(), the lesson variable, lesson.scope_json (or the one_off flag), and transition() when making the change.
973-977:⚠️ Potential issue | 🟠 MajorDon't drop explicit
0.0metadata scores on write.
0.0is a valid utility/safety score, but this truthiness check skips it, soformat_lessons()/parse_lessons()round-trip those values back to defaults. Compare against the default metadata dict instead of using truthiness.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/enhancements/self_improvement.py` around lines 973 - 977, The current metadata-writing logic drops explicit 0.0 scores because it filters on truthiness (the md.values() check), causing format_lessons()/parse_lessons() to lose valid zero scores; change the check to compare md against the default metadata instead of using truthiness: compute the default metadata dict (or use the known default values, e.g., 0.5) and only append the Metadata line if any(md.get(k) != default_md.get(k) for k in md) or simply if md != default_md; update the block that defines md (from lesson.metadata) and the conditional that builds lines.append(...) accordingly so 0.0 is preserved.src/gradata/brain_inspection.py (2)
70-80:⚠️ Potential issue | 🟠 Major
pending_promotions()still isn't returning the pending set.This delegates to
list_rules()with the default PATTERN/RULE filter, so it hidesapproval_requiredlessons that_core.brain_correct()creates asINSTINCT + pending_approval=True. And becauselist_rules()does not surfacepending_approval, approved rules keep reappearing here. Filter directly on the persisted review flag instead of raw graduation state.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/brain_inspection.py` around lines 70 - 80, The pending_promotions method is returning promotions filtered by graduation state instead of the persisted review flag, so modify pending_promotions (the function that calls list_rules) to filter on the persisted "pending_approval"/review flag rather than the default PATTERN/RULE state; either pass an explicit parameter to list_rules to include records with pending_approval=True (or approval_required) or call list_rules then filter its returned dicts by the pending_approval field so INSTINCT items created by _core.brain_correct with pending_approval=True are included and approved items with pending_approval=False are excluded.
95-106:⚠️ Potential issue | 🟠 MajorLock the full promotion review write path and avoid rewriting the fallback lessons file.
Both mutators call
_find_lessons_path()in read-only mode and then rewrite whatever path comes back. If the brain is using a parent/working-dir fallbacklessons.md, these methods will modify that external file, and the unlocked read-modify-write can clobber concurrentcorrect()/end_session()updates. Use_find_lessons_path(create=True)pluslessons_lock()around the entire cycle.Also applies to: 138-152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/brain_inspection.py` around lines 95 - 106, The current read-modify-write sequence uses _find_lessons_path() (read-only) and then writes whatever path is returned, which can clobber a fallback parent lessons.md and race with concurrent correct()/end_session() updates; change both promotion-review mutators (the block around _make_rule_id(rule_id) and the similar block at lines 138-152) to call _find_lessons_path(create=True) and wrap the entire sequence (path lookup, loading via _load_lessons_from_path, mutation of target.pending_approval, and write_lessons_safe(format_lessons(...))) inside lessons_lock() so the full promotion review write path is locked and you never rewrite an external fallback file without holding the lock.src/gradata/brain.py (2)
488-527:⚠️ Potential issue | 🟠 MajorDon't let
forget()rewrite the external fallback lessons file.This calls
_find_lessons_path()in read-only mode and then rewrites the returned path without holding the lessons lock. If the brain is reading../.claude/lessons.mdor the working-dir fallback,forget()mutates that shared file and can race withcorrect()/end_session(). Use_find_lessons_path(create=True)and lock the full read-modify-write.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/brain.py` around lines 488 - 527, The forget() flow calls _find_lessons_path() without creating/claiming the canonical lessons file and performs a read-modify-write (parse_lessons -> mutate lessons -> write_lessons_safe) without holding the lessons lock, which can overwrite an external fallback file; change the code to call _find_lessons_path(create=True) so the canonical lessons file is returned/created and perform the entire read-modify-write while holding the lessons lock (use the existing lessons-lock helper on the Brain object) to prevent races between forget(), correct(), and end_session(); ensure parse_lessons, mutation, and write_lessons_safe all occur under that lock.
501-506:⚠️ Potential issue | 🔴 CriticalReject
last 0; it currently kills every active lesson.In Python,
active[-0:]is the full slice, sobrain.forget("last 0")marks all active lessons asKILLEDinstead of none. Guardn < 1before buildingtargets.As per coding guidelines, `**/*.py`: Validate input at system boundaries.🛠️ Minimal fix
if wl == "last" or wl.startswith("last "): parts = wl.split() n = int(parts[1]) if len(parts) == 2 and parts[1].isdigit() else 1 + if n < 1: + return {"rolled_back": False, "error": "count must be >= 1"} if not active: return {"rolled_back": False, "error": "no active lessons"} targets = [i for i, _ in active[-n:]]🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/brain.py` around lines 501 - 506, The branch handling the "last" forget syntax accepts `last 0` which makes `n == 0` and causes `active[-0:]` to select all lessons; update the check in the block that parses `wl` (the `if wl == "last" or wl.startswith("last ")` branch) to validate `n` is >= 1 before computing `targets` (the `n` local and `active` slice). If `n < 1`, return an error response (e.g., {"rolled_back": False, "error": "invalid count"}) instead of proceeding to build `targets` so `brain.forget("last 0")` does not kill all active lessons.src/gradata/audit.py (1)
185-216:⚠️ Potential issue | 🟠 MajorDon't anchor traceability to the current description text.
Both
query_provenance(rule_id=rule_id)and thelesson_transitionslookup depend on identifiers derived from the lesson's current description. Reinforcement and wording refinement can change that text after the provenance or transition rows were written, sotrace_rule()silently loses history. Persist and query an immutable lesson ID instead.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/gradata/audit.py` around lines 185 - 216, The code anchors provenance and lesson_transitions lookups to the mutable lesson description; change these to use an immutable lesson identifier instead. Update calls to query_provenance to pass the lesson's immutable id (e.g., target.id or target.lesson_id) and adjust query_provenance to query by lesson_id; in the SQLite query inside trace_rule/audit.py replace the WHERE clause that uses lesson_desc = ? with lesson_id = ? (and bind the immutable id from target), keeping category filtering and the ORDER BY. If target may not have an id, fall back to the current description logic only as a last resort so history is preserved for older records.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.gitignore:
- Line 144: The .gitignore change only added .gstack/ but omitted .brain_salt;
update .gitignore to include the .brain_salt entry (e.g., add a line with
.brain_salt or a pattern like *.brain_salt) under the Secrets section so
per-brain salt files are never tracked, and ensure the change references the
existing .gstack/ entry for context; commit the updated .gitignore so
.brain_salt is ignored going forward.
In `@src/gradata/_core.py`:
- Around line 81-82: The cloud fast-path currently returns
brain._cloud.correct(...) without passing the new scope argument, so scope
semantics are lost when brain._cloud.connected is true; update the code in the
function with signature (min_severity: str = "as-is", scope: str | None = None,
...) to forward the scope through to the cloud call by supplying scope=scope (or
the appropriate named parameter) when invoking brain._cloud.correct so
one_off/project/universal semantics are preserved in cloud mode.
- Around line 544-552: The promotion provenance logic that iterates over
transitions (used in write_provenance and graduated_rules) is brittle because
brain_end_session() and related code key off mutable strings like category +
description[:60], and graduate() can rewrite a PATTERN description before
promotion causing keys to mismatch and skips; fix by changing the tracking key
to an immutable identifier (e.g., a lesson_id stored on the Lesson object) or by
using object identity to record pre/post state in brain_end_session(), update
graduate() to preserve or propagate that immutable lesson_id, and then use that
id in the provenance loop (the code referencing transitions, write_provenance,
_make_rule_id, and graduated_rules) so promotions are detected regardless of
description rewrites; apply the same change to the other similar block handling
promotions (the block around lines 642-654).
In `@src/gradata/audit.py`:
- Around line 47-56: The insert into rule_provenance silently fails on a fresh
DB because the code opens get_connection and swallows schema-related insert
errors; before executing the INSERT in write_provenance (the block that calls
get_connection and inserts into rule_provenance), ensure the DB schema is
created/migrated by invoking the migration routine (e.g., run_migrations or the
existing schema initialization helper) against db_path, then proceed with the
insert and keep the current debug logging for other errors; reference
get_connection, write_provenance, run_migrations, and the rule_provenance table
to locate where to add the migration call.
In `@src/gradata/brain.py`:
- Around line 472-516: The public method Brain.forget has been changed to accept
a single what string which breaks callers using the old description=... or
category=... keywords; add a compatibility shim by updating forget
(Brain.forget) to accept optional keyword args description: str | None = None
and category: str | None = None (or **kwargs and pop them) and map them to the
new behavior: if description is provided call and return
self.rollback(description=description), if category is provided set what = f"all
{category}" (or set wl accordingly) before the existing resolution logic; ensure
the function still references rollback, parse_lessons, LessonState and
write_lessons_safe as before and does not raise TypeError for legacy calls.
In `@src/gradata/inspection.py`:
- Around line 194-200: The _yaml_val() emitter currently only escapes double
quotes, which allows actual newlines, carriage returns, or backslashes to be
emitted as raw multiline scalars; update _yaml_val() (the function that builds s
from v) to first escape backslashes (replace "\" with "\\\\"), then escape
newlines and carriage returns (replace "\n" with "\\n" and "\r" with "\\r"), and
keep the existing escape for double quotes; also modify the quoting condition to
force a quoted scalar whenever the string contains any newline, carriage return,
or backslash (in addition to the existing checks for empty string, ":" , "#" or
starting with "-", "[" or "{") so multiline or backslash-containing values are
always quoted.
In `@src/gradata/rules/rule_engine.py`:
- Around line 297-309: The rule identifier generated by rule_id is too short (8
hex chars) to be a stable unique key for approve_promotion()/reject_promotion();
increase the digest length (e.g., to 16 or 32 hex chars) and/or append a stable
per-lesson discriminator (for example lesson.id, lesson.uuid or a creation
timestamp) to the returned string so identical category+description pairs are
distinguishable; update the digest computation (currently using
hashlib.sha256(f"{lesson.category}:{lesson.description}".encode()).hexdigest()[:8])
to slice a longer prefix and include the chosen stable field in the formatted
return (still preserving the "CATEGORY:..." structure) and adjust any consumers
if they assume 8-char IDs.
In `@src/gradata/safety.py`:
- Around line 14-40: Add an Anthropic API key redaction entry to the
_PII_PATTERNS list: create a tuple like ("anthropic_api_key",
"[REDACTED_ANTHROPIC_KEY]", re.compile(r"sk-ant-[A-Za-z0-9_-]{20,}")) and insert
it among the API key/secret patterns in the _PII_PATTERNS definition so keys
beginning with the "sk-ant-" prefix are detected and replaced.
In `@src/gradata/security/brain_salt.py`:
- Around line 50-52: The fallback branch that calls salt_path.write_text(salt,
encoding="utf-8") can lose data or suffer races; modify the error handler in the
salt-writing routine in brain_salt.py to perform a safe retry/verification: on
OSError attempt to write to a uniquely named temp file (e.g.,
salt_path.with_suffix(".tmp-{pid}-{rand}")), fsync and close it, then
os.replace() to atomically move it into place, and after replace read back
salt_path.read_text() to validate the content equals salt; if validation fails
retry a bounded number of times and raise an error if unsuccessful — reference
the existing salt_path, salt variables and the except OSError block to locate
where to implement these steps.
In `@src/gradata/security/correction_provenance.py`:
- Around line 55-75: verify_provenance currently accepts an empty salt unlike
create_provenance_record; add the same validation at the top of
verify_provenance (e.g., if not salt) and raise a ValueError (or otherwise fail
fast) when salt is empty so callers get a clear error instead of silently
accepting invalid input; update the start of the verify_provenance function to
perform this non-empty salt check before computing the HMAC.
In `@src/gradata/security/query_budget.py`:
- Around line 51-109: The new burst detector detect_anomalies(endpoint) is never
used in production; update the runtime check that currently calls
is_rate_exceeded() (the consumer in brain.py that records "apply_rules" and then
checks is_rate_exceeded) to also call detect_anomalies(endpoint) and treat a
returned {"burst": True} as an exceeded-rate condition. Concretely, modify the
code path in brain.py where apply_rules calls/records the endpoint and checks
is_rate_exceeded() to invoke security.query_budget.detect_anomalies(endpoint)
and short-circuit/flag as rate-exceeded if burst is True (or combine with
is_rate_exceeded result), ensuring the same endpoint identifier is passed
through. Ensure any unit tests or caller semantics that expect boolean behavior
remain compatible by mapping the detect_anomalies result to the existing boolean
check.
In `@src/gradata/security/score_obfuscation.py`:
- Around line 24-40: Replace the hardcoded thresholds in the score-to-tier logic
with the shared constants from the self_improvement module: import
PATTERN_THRESHOLD and RULE_THRESHOLD from gradata.enhancements.self_improvement
(or the module that defines them) and use RULE_THRESHOLD instead of 0.90 and
PATTERN_THRESHOLD instead of 0.60 inside the truncate_score()/tier-mapping
function; keep the existing validation for confidence range and raise the same
ValueError if out of bounds. Ensure the import is added at the top and
references the exact constant names (PATTERN_THRESHOLD, RULE_THRESHOLD) so both
the graduation pipeline and truncate_score() share the same source of truth.
In `@tests/test_batch_approval.py`:
- Around line 26-40: SAMPLE_LESSONS fixture in tests/test_batch_approval.py
never includes a lesson marked as pending approval, so pending_promotions()
tests can miss that code path; update SAMPLE_LESSONS to include at least one
lesson entry with the approval-required/pending_approval state (e.g., add a
lesson block with a tag like [PENDING_APPROVAL] or the exact marker your code
expects), then update the test that calls pending_promotions() to seed/consume
that fixture and assert that the pending lesson is returned; also apply the same
change to the other fixture usages referenced in the file (the region around
lines 93-127) so those tests validate the approval-queue behavior as well.
In `@tests/test_brain_learning.py`:
- Around line 164-176: Update the two tests to assert a single explicit
contract: seed lessons with known, unique identifiers (use _seed_lessons or
create lessons via init_brain so each lesson has a deterministic id or content),
call brain.forget("last") or brain.forget("last 2"), then assert the returned
value(s) exactly match the id(s) of the deleted lesson(s) and confirm those
specific lesson ids are no longer present via a lookup method on brain; modify
test_forget_last_kills_most_recent to check that the single returned id equals
the most recently created lesson id and is absent afterward, and modify
test_forget_last_n to check the returned list contains the two expected ids in
correct order and that both are deleted.
In `@tests/test_correction_provenance.py`:
- Around line 9-30: Add negative-path unit tests in
tests/test_correction_provenance.py alongside TestCreateProvenance to exercise
input validation for create_provenance_record: add tests that call
create_provenance_record with empty user_id, empty correction_hash, empty salt,
and negative session and assert the function raises an error (e.g., ValueError)
for each invalid input; ensure each test is named clearly (e.g.,
test_raises_on_empty_user_id, test_raises_on_empty_correction_hash,
test_raises_on_empty_salt, test_raises_on_negative_session) and uses the same
invocation pattern as the existing tests so regressions in input guards are
detected.
In `@tests/test_inspection.py`:
- Around line 148-160: Select the seeded DRAFTING rule explicitly instead of
using rules[0], e.g. scan the list returned by list_rules() to find the rule
whose metadata or transitions reference "DRAFTING", use that rule's id when
calling explain_rule(rule_id=...), and replace the loose existence assertion
with a concrete check on the returned transitions: assert that
result["transitions"] contains at least one transition object with the expected
state change (for example a transition["from"] == "DRAFTING" or transition["to"]
== "DRAFTING") so the provenance path is actually validated.
In `@tests/test_pii_redaction.py`:
- Around line 147-180: The ImportError fallback path in src/gradata/_core.py
still returns raw draft/final and lacks test coverage; add a new test that
forces the ImportError branch (e.g., via monkeypatching import machinery or
removing the redaction backend so the import in _core.py raises ImportError)
then call brain.correct(draft, final) (reuse the existing patterns in
test_pii_redaction.py) and assert the stored event payloads (result.get("data",
{}).get("draft_text", "")) do not contain raw PII like "alice@notify.com" or the
fake key and instead contain the appropriate redaction tokens (e.g.,
"[REDACTED_EMAIL]", "[REDACTED_OPENAI_KEY]"), ensuring the fallback path also
redacts output.
In `@tests/test_scope_tagging.py`:
- Around line 38-67: Update the tests to exercise the CorrectionScope enum
directly by calling brain.correct with CorrectionScope.<SCOPE> instead of string
literals; e.g., add or modify a test (e.g., in test_explicit_scope_one_off or a
new test function) to import CorrectionScope and pass CorrectionScope.ONE_OFF
(or CorrectionScope.UNIVERSAL) into brain.correct(...) and assert
result.get("correction_scope") matches the expected serialized value, ensuring
brain.correct, CorrectionScope, and the serialization/comparison logic are
exercised.
In `@tests/test_score_obfuscation.py`:
- Around line 153-186: Add a regression test in the TestConstantTimePad suite
that exercises the failure path of constant_time_pad by calling it with a
function that raises (e.g., raise RuntimeError) and verify two things: the
exception is propagated (use an assertion that the call raises the expected
exception) and the call still respects the timing defense by taking at least the
specified min_ms (measure elapsed time around the call and assert elapsed >=
min_ms * 0.95, use jitter_ms=0.0 to avoid noise). Target the TestConstantTimePad
class and the constant_time_pad function when adding this new test case.
---
Outside diff comments:
In `@tests/test_rule_engine_v2.py`:
- Around line 226-235: The test test_two_same_category_merged currently only
checks for the "[RULE]" tier label but doesn't assert that raw confidence floats
are omitted; update the assertion for the merged result (from
merge_related_rules) to explicitly ensure result[0].instruction does not contain
raw numeric confidences (e.g., no substrings matching decimal numbers like
"0.95" or a regex for digits with a decimal point) while still asserting it
contains "Rule A" and "Rule B" and the "[RULE]" label.
---
Duplicate comments:
In `@src/gradata/_core.py`:
- Around line 155-157: The computed correction_scope is set into scope_data but
only persisted when creating a new lesson; update the reinforcement path to also
persist the latest correction_scope so best_match.scope_json is refreshed.
Locate where correction_scope is computed (variable correction_scope and
scope_data["correction_scope"]) and add code in the existing reinforcement
branch (the branch that handles reinforcing an existing lesson / not new-lesson)
to write the updated scope into best_match.scope_json (or the same persistence
mechanism used in the new-lesson branch) so ONE_OFF gating sees the most recent
scope.
- Around line 141-147: The current ImportError handler for gradata.safety allows
unredacted draft/final through (redact_pii_with_report import), which risks
silent PII leaks; change the except ImportError block to fail closed by either
raising an explicit exception (e.g., RuntimeError including the original
ImportError and a clear message referencing
redact_pii_with_report/gradata.safety) or, if you must continue, set
draft_redacted and final_redacted to a safe non-sensitive placeholder (e.g.,
"[REDACTION_FAILED]") and log the import failure so downstream
storage/FTS/pending_approvals cannot receive raw PII; ensure the chosen behavior
is applied where draft_redacted and final_redacted are used.
In `@src/gradata/audit.py`:
- Around line 185-216: The code anchors provenance and lesson_transitions
lookups to the mutable lesson description; change these to use an immutable
lesson identifier instead. Update calls to query_provenance to pass the lesson's
immutable id (e.g., target.id or target.lesson_id) and adjust query_provenance
to query by lesson_id; in the SQLite query inside trace_rule/audit.py replace
the WHERE clause that uses lesson_desc = ? with lesson_id = ? (and bind the
immutable id from target), keeping category filtering and the ORDER BY. If
target may not have an id, fall back to the current description logic only as a
last resort so history is preserved for older records.
In `@src/gradata/brain_inspection.py`:
- Around line 70-80: The pending_promotions method is returning promotions
filtered by graduation state instead of the persisted review flag, so modify
pending_promotions (the function that calls list_rules) to filter on the
persisted "pending_approval"/review flag rather than the default PATTERN/RULE
state; either pass an explicit parameter to list_rules to include records with
pending_approval=True (or approval_required) or call list_rules then filter its
returned dicts by the pending_approval field so INSTINCT items created by
_core.brain_correct with pending_approval=True are included and approved items
with pending_approval=False are excluded.
- Around line 95-106: The current read-modify-write sequence uses
_find_lessons_path() (read-only) and then writes whatever path is returned,
which can clobber a fallback parent lessons.md and race with concurrent
correct()/end_session() updates; change both promotion-review mutators (the
block around _make_rule_id(rule_id) and the similar block at lines 138-152) to
call _find_lessons_path(create=True) and wrap the entire sequence (path lookup,
loading via _load_lessons_from_path, mutation of target.pending_approval, and
write_lessons_safe(format_lessons(...))) inside lessons_lock() so the full
promotion review write path is locked and you never rewrite an external fallback
file without holding the lock.
In `@src/gradata/brain.py`:
- Around line 488-527: The forget() flow calls _find_lessons_path() without
creating/claiming the canonical lessons file and performs a read-modify-write
(parse_lessons -> mutate lessons -> write_lessons_safe) without holding the
lessons lock, which can overwrite an external fallback file; change the code to
call _find_lessons_path(create=True) so the canonical lessons file is
returned/created and perform the entire read-modify-write while holding the
lessons lock (use the existing lessons-lock helper on the Brain object) to
prevent races between forget(), correct(), and end_session(); ensure
parse_lessons, mutation, and write_lessons_safe all occur under that lock.
- Around line 501-506: The branch handling the "last" forget syntax accepts
`last 0` which makes `n == 0` and causes `active[-0:]` to select all lessons;
update the check in the block that parses `wl` (the `if wl == "last" or
wl.startswith("last ")` branch) to validate `n` is >= 1 before computing
`targets` (the `n` local and `active` slice). If `n < 1`, return an error
response (e.g., {"rolled_back": False, "error": "invalid count"}) instead of
proceeding to build `targets` so `brain.forget("last 0")` does not kill all
active lessons.
In `@src/gradata/enhancements/self_improvement.py`:
- Around line 315-322: The parser currently assumes JSON after "Metadata:" is an
object and does _md_dict.items(), which crashes if
json.loads(meta_line[len("Metadata:"):].strip()) returns a non-dict; update the
handling in the block that sets metadata_obj (within parse_lessons) to validate
the loaded value: after loading into _md_dict, check isinstance(_md_dict, dict)
and only then construct RuleMetadata (from gradata._types as _RM) using
_RM(**{k: v for k, v in _md_dict.items() if k in _RM.__dataclass_fields__}); if
it's not a dict, set metadata_obj = None. Keep the existing exception handling
for JSON decoding/type errors.
- Around line 649-667: In update_confidence(), the inline promotion/demotion
block (the if/elif that checks lesson.state, confidence thresholds
_uc_rule_thr/_uc_pattern_thr, and MIN_APPLICATIONS_FOR_* then calls
transition(lesson.state, "promote") or demotes) should skip any lessons marked
one_off; add a guard before that promotion/demotion logic that checks
lesson.scope_json (or the lesson one_off marker) and returns/continues if the
lesson is one_off so one_off INSTINCT/PATTERN cannot be promoted to PATTERN/RULE
here. Ensure you reference update_confidence(), the lesson variable,
lesson.scope_json (or the one_off flag), and transition() when making the
change.
- Around line 973-977: The current metadata-writing logic drops explicit 0.0
scores because it filters on truthiness (the md.values() check), causing
format_lessons()/parse_lessons() to lose valid zero scores; change the check to
compare md against the default metadata instead of using truthiness: compute the
default metadata dict (or use the known default values, e.g., 0.5) and only
append the Metadata line if any(md.get(k) != default_md.get(k) for k in md) or
simply if md != default_md; update the block that defines md (from
lesson.metadata) and the conditional that builds lines.append(...) accordingly
so 0.0 is preserved.
In `@src/gradata/inspection.py`:
- Around line 130-136: The query in inspection.py uses the full
target.description but brain_end_session() stores lesson_transitions.lesson_desc
as l.description[:100], causing mismatches; update the binding in the rows =
conn.execute(...) call to use the same stored key (e.g.,
target.description[:100]) so the WHERE clause matches the persisted lesson_desc
(or alternatively join on an immutable rule identifier if available), ensuring
the query uses the same truncation logic as the persistence path.
In `@src/gradata/rules/rule_engine.py`:
- Around line 744-770: Before performing the bucketed shuffle, capture the
pre-shuffle top rule (e.g., top_rule = rules[0] if rules) so the final order
still reflects the best-ranked rule selected by apply_rules(); after shuffling
buckets (or after rebuilding rules), locate that saved top_rule and ensure it is
placed at the front of its tier (or moved to rules[0]) while leaving the rest of
the bucket order intact. Update the logic around buckets, tier_order, and rules
(and references to LessonState / merge_related_rules) to use the saved top_rule
to restore the reminder-target rule into position so the REMINDER footer
continues to refer to the original top-ranked rule.
In `@src/gradata/security/brain_salt.py`:
- Around line 76-78: The fraction computation can equal 1.0 when value ==
0xFFFFFFFF, contradicting the intended [0,1) range; update the divisor used
after unpacking (the line with (value,) = struct.unpack(">I", digest[:4]) and
fraction = value / 0xFFFFFFFF) to divide by 0x100000000 (or use float(value) /
2**32) so fraction is in [0,1), and update the accompanying comment to reflect
the corrected range.
In `@src/gradata/security/manifest_signing.py`:
- Around line 32-44: The verify_manifest function currently accepts any salt;
add the same input validation used in sign_manifest: ensure salt is a str and
non-empty before using salt.encode(); if the check fails return False (or mirror
sign_manifest's error handling), so update verify_manifest to validate salt
(type and non-empty) prior to computing expected HMAC.
In `@src/gradata/security/query_budget.py`:
- Around line 39-45: The count() method currently prunes and then accesses
self._calls[endpoint], which creates an empty deque for unknown endpoints and
prevents cardinality from being capped; change count() to call
self._prune(endpoint) then check for existence (e.g. if endpoint not in
self._calls: return 0) or use self._calls.get(endpoint) to avoid inserting a new
key, so only real endpoints create buckets (refer to methods count, _prune, and
the _calls dict).
In `@src/gradata/security/score_obfuscation.py`:
- Around line 76-84: The current timing wrapper calls fn() directly so if fn()
raises the sleep logic is skipped; change the control flow in the wrapper around
fn() (the call to fn(), and the variables jitter_ms and min_ms used to compute
target_ms) to run fn() inside a try/except/finally (or try/finally) block:
record start before calling fn(), in the finally compute elapsed_ms, jitter,
target_ms and remaining_ms, perform time.sleep(remaining_ms / 1000) if
remaining_ms > 0, and then either return the result (from the try block) or
re-raise the caught exception after the sleep so both success and exception
paths are padded equally.
In `@tests/test_brain_learning.py`:
- Around line 160-162: The test test_forget_last_no_lessons is too permissive
because it uses an "or" on result.get("forgot") and result.get("rolled_back");
change it to explicitly assert the expected empty-history outcome by checking
both keys are present and False (e.g., assert result.get("forgot") is False and
result.get("rolled_back") is False) or assert equality against the exact
expected dict returned by fresh_brain.forget("last"); reference the
fresh_brain.forget("last") call and the 'forgot' and 'rolled_back' keys to
locate and update the assertion.
In `@tests/test_pii_redaction.py`:
- Around line 157-170: The test test_behavioral_extraction_uses_full_text
currently only checks for presence of "classifications" and non-"unknown"
severity; update it to assert a specific, concrete classification/summary
produced from the unredacted diff (inspect result.get("data") structure and
assert a particular field such as the top classification label or summary text
from result["data"]["classifications"] or result["data"]["summary"] matches the
expected value for draft), then run the same precise assertion against
result_redacted (from redacted_draft) and assert that the redacted output does
NOT equal the unredacted expected value to prove extraction runs before
redaction; keep using the existing variables draft, final, redacted_draft,
result, and result_redacted and avoid only truthy checks.
- Around line 42-52: Extend the tests for redact_pii to directly cover the newer
GitHub and Slack token prefixes so those code paths are exercised: add inputs
using the prefixes "github_pat_" (and keep existing "ghp_") in test_github_pat
and assert "[REDACTED_GITHUB_TOKEN]" is returned and the original token removed;
likewise expand test_slack_token_redacted to include tokens built with "xoxp-",
"xapp-", and "xwfp-" (in addition to "xoxb-") and assert
"[REDACTED_SLACK_TOKEN]" appears and the raw token is absent; reuse the existing
helper _build and the redact_pii function names to construct these new cases.
In `@tests/test_safety_assertion.py`:
- Around line 116-135: The current tests
(test_docstring_mentions_min_applications and
test_docstring_mentions_pattern_and_rule_thresholds) allow loose matches via ORs
which can miss exact contract changes; update these to assert the exact required
substrings are present in graduate.__doc__ (e.g., require the literal
"fire_count >= 3" and the literal "fire_count >= 5" rather than accepting any
occurrence of MIN_APPLICATIONS_FOR_*, and require the literal "INSTINCT ->
PATTERN" and "PATTERN -> RULE" rather than accepting any alternate tokens);
tighten the assertions to check for those exact strings (or exact constant
assignments if the doc documents constants), and convert to parametrized
assertions if helpful to avoid duplication.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: c670fdde-352a-45bf-8a05-a7c086a0e28b
📒 Files selected for processing (32)
.gitignoresrc/gradata/_core.pysrc/gradata/_migrations.pysrc/gradata/_types.pysrc/gradata/audit.pysrc/gradata/brain.pysrc/gradata/brain_inspection.pysrc/gradata/enhancements/self_improvement.pysrc/gradata/inspection.pysrc/gradata/rules/rule_engine.pysrc/gradata/safety.pysrc/gradata/security/__init__.pysrc/gradata/security/brain_salt.pysrc/gradata/security/correction_provenance.pysrc/gradata/security/manifest_signing.pysrc/gradata/security/query_budget.pysrc/gradata/security/score_obfuscation.pytests/test_audit_provenance.pytests/test_batch_approval.pytests/test_brain_learning.pytests/test_brain_salt.pytests/test_correction_provenance.pytests/test_injection_order.pytests/test_inspection.pytests/test_manifest_signing.pytests/test_pii_redaction.pytests/test_query_budget.pytests/test_rule_engine_v2.pytests/test_rule_metadata.pytests/test_safety_assertion.pytests/test_scope_tagging.pytests/test_score_obfuscation.py
| .env | ||
| .env.local | ||
| .env.*.local | ||
| .gstack/ |
There was a problem hiding this comment.
Critical: .brain_salt is missing from .gitignore.
The PR objectives explicitly note that .brain_salt was identified as missing from .gitignore during automated review, yet this change only adds .gstack/. Given that this PR implements per-brain salts for security hardening (graduation jitter, manifest signing, correction provenance), the .brain_salt file likely contains cryptographic secrets that must never be committed to version control.
Without .gitignore protection, .brain_salt could be accidentally committed, exposing security material and undermining the entire salting security model.
🔒 Add `.brain_salt` to the Secrets section
# Secrets
.env
.env.*
*.secret
credentials*
+.brain_salt🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In @.gitignore at line 144, The .gitignore change only added .gstack/ but
omitted .brain_salt; update .gitignore to include the .brain_salt entry (e.g.,
add a line with .brain_salt or a pattern like *.brain_salt) under the Secrets
section so per-brain salt files are never tracked, and ensure the change
references the existing .gstack/ entry for context; commit the updated
.gitignore so .brain_salt is ignored going forward.
| min_severity: str = "as-is", scope: str | None = None, | ||
| ) -> dict: |
There was a problem hiding this comment.
Forward scope through the cloud fast-path.
When brain._cloud.connected is true, this function returns brain._cloud.correct(...) before any of the new scope-tagging/persistence logic runs. one_off/project/universal corrections therefore lose their scope semantics in cloud mode.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_core.py` around lines 81 - 82, The cloud fast-path currently
returns brain._cloud.correct(...) without passing the new scope argument, so
scope semantics are lost when brain._cloud.connected is true; update the code in
the function with signature (min_severity: str = "as-is", scope: str | None =
None, ...) to forward the scope through to the cloud call by supplying
scope=scope (or the appropriate named parameter) when invoking
brain._cloud.correct so one_off/project/universal semantics are preserved in
cloud mode.
| # Write rule provenance for promotions to PATTERN or RULE | ||
| try: | ||
| from gradata.audit import write_provenance | ||
| from gradata.inspection import _make_rule_id | ||
| from datetime import datetime, UTC | ||
| now_prov = datetime.now(UTC).isoformat() | ||
| for l, old_s, new_s in transitions: | ||
| if new_s in ("PATTERN", "RULE"): | ||
| rid = _make_rule_id(l) |
There was a problem hiding this comment.
Description rewrites can suppress promotion provenance.
These blocks rely on transitions, but brain_end_session() keys pre-session state earlier off category + description[:60]. graduate() can rewrite a PATTERN description before promotion, so the old key no longer matches, the promotion never enters transitions, and this provenance / graduated_rules path is skipped entirely. Track pre/post state by object identity or a persisted immutable lesson ID instead of mutable description text.
Also applies to: 642-654
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/_core.py` around lines 544 - 552, The promotion provenance logic
that iterates over transitions (used in write_provenance and graduated_rules) is
brittle because brain_end_session() and related code key off mutable strings
like category + description[:60], and graduate() can rewrite a PATTERN
description before promotion causing keys to mismatch and skips; fix by changing
the tracking key to an immutable identifier (e.g., a lesson_id stored on the
Lesson object) or by using object identity to record pre/post state in
brain_end_session(), update graduate() to preserve or propagate that immutable
lesson_id, and then use that id in the provenance loop (the code referencing
transitions, write_provenance, _make_rule_id, and graduated_rules) so promotions
are detected regardless of description rewrites; apply the same change to the
other similar block handling promotions (the block around lines 642-654).
| try: | ||
| with get_connection(db_path) as conn: | ||
| conn.execute( | ||
| "INSERT INTO rule_provenance " | ||
| "(rule_id, correction_event_id, session, timestamp, user_context) " | ||
| "VALUES (?, ?, ?, ?, ?)", | ||
| (rule_id, correction_event_id, session, timestamp, user_context), | ||
| ) | ||
| except Exception as e: | ||
| _log.debug("write_provenance failed: %s", e) |
There was a problem hiding this comment.
Run migrations before standalone provenance writes.
This is a Layer-0 API, but it only opens the DB and swallows any insert error. On a fresh system.db that has not run run_migrations(), every provenance write becomes a silent no-op. Create or migrate the schema here before inserting.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/audit.py` around lines 47 - 56, The insert into rule_provenance
silently fails on a fresh DB because the code opens get_connection and swallows
schema-related insert errors; before executing the INSERT in write_provenance
(the block that calls get_connection and inserts into rule_provenance), ensure
the DB schema is created/migrated by invoking the migration routine (e.g.,
run_migrations or the existing schema initialization helper) against db_path,
then proceed with the insert and keep the current debug logging for other
errors; reference get_connection, write_provenance, run_migrations, and the
rule_provenance table to locate where to add the migration call.
| def forget(self, what: str = "last") -> dict | list[dict]: | ||
| """Human-friendly way to undo lessons. | ||
|
|
||
| Examples: | ||
| brain.forget("last") # most recent lesson | ||
| brain.forget("last 3") # last 3 lessons | ||
| brain.forget("casual tone") # fuzzy match description | ||
| brain.forget("all tone") # everything in TONE category | ||
| """ | ||
| try: | ||
| from gradata.enhancements.self_improvement import parse_lessons, format_lessons | ||
| except ImportError: | ||
| return 0 | ||
| return {"rolled_back": False, "error": "enhancements not installed"} | ||
| from gradata._types import LessonState | ||
| from gradata._db import write_lessons_safe | ||
|
|
||
| lessons_path = self._find_lessons_path() | ||
| if not lessons_path or not lessons_path.is_file(): | ||
| return {"rolled_back": False, "error": "no lessons file"} | ||
| lessons = parse_lessons(lessons_path.read_text(encoding="utf-8")) | ||
| before = len(lessons) | ||
| filtered = [l for l in lessons if not ( | ||
| (description and description.lower() in l.description.lower()) or | ||
| (category and l.category.upper() == category.upper()))] | ||
| removed = before - len(filtered) | ||
| if removed > 0: | ||
| from gradata._db import write_lessons_safe | ||
| write_lessons_safe(lessons_path, format_lessons(filtered)) | ||
| return removed | ||
|
|
||
| what = what.strip() | ||
| wl = what.lower() | ||
|
|
||
| # Resolve target indices | ||
| active = [(i, l) for i, l in enumerate(lessons) | ||
| if l.state in (LessonState.INSTINCT, LessonState.PATTERN, LessonState.RULE)] | ||
| targets: list[int] = [] | ||
|
|
||
| if wl == "last" or wl.startswith("last "): | ||
| parts = wl.split() | ||
| n = int(parts[1]) if len(parts) == 2 and parts[1].isdigit() else 1 | ||
| if not active: | ||
| return {"rolled_back": False, "error": "no active lessons"} | ||
| targets = [i for i, _ in active[-n:]] | ||
|
|
||
| elif wl.startswith("all "): | ||
| cat = what[4:].strip() | ||
| targets = [i for i, l in active if l.category.upper() == cat.upper()] | ||
| if not targets: | ||
| return {"rolled_back": False, "error": f"no active lessons in '{cat}'"} | ||
|
|
||
| else: | ||
| # Fuzzy match on description — single target | ||
| return self.rollback(description=what) |
There was a problem hiding this comment.
This is a breaking change to Brain.forget()'s public signature.
Replacing the old keyword-based API with a single what string will raise TypeError for existing callers using forget(description=...) or forget(category=...). It also makes the return type data-dependent (dict | list[dict]). Unless this PR is intentionally versioned as a breaking change, keep a compatibility shim for the old parameters.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/gradata/brain.py` around lines 472 - 516, The public method Brain.forget
has been changed to accept a single what string which breaks callers using the
old description=... or category=... keywords; add a compatibility shim by
updating forget (Brain.forget) to accept optional keyword args description: str
| None = None and category: str | None = None (or **kwargs and pop them) and map
them to the new behavior: if description is provided call and return
self.rollback(description=description), if category is provided set what = f"all
{category}" (or set wl accordingly) before the existing resolution logic; ensure
the function still references rollback, parse_lessons, LessonState and
write_lessons_safe as before and does not raise TypeError for legacy calls.
| class TestCreateProvenance: | ||
| def test_returns_dict_with_required_fields(self): | ||
| record = create_provenance_record( | ||
| user_id="oliver", correction_hash="abc123", | ||
| session=5, salt="test-salt", | ||
| ) | ||
| assert isinstance(record, dict) | ||
| assert record["user_id"] == "oliver" | ||
| assert record["session"] == 5 | ||
| assert isinstance(record["hmac"], str) | ||
| assert len(record["hmac"]) == 64 # SHA-256 hex digest | ||
|
|
||
| def test_hmac_is_deterministic_for_same_timestamp(self): | ||
| """Same inputs + same timestamp = same HMAC.""" | ||
| r1 = create_provenance_record( | ||
| user_id="u", correction_hash="h", session=1, salt="s", | ||
| ) | ||
| # Manually rebuild to verify | ||
| import hashlib, hmac, time | ||
| msg = f"u|h|1|{r1['timestamp']}" | ||
| expected = hmac.new(b"s", msg.encode(), hashlib.sha256).hexdigest() | ||
| assert r1["hmac"] == expected |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Cover the rejected-input path for provenance creation.
These tests verify happy-path signing and tamper detection, but they never exercise the empty salt / empty user_id / empty correction_hash / negative session guards. A regression in those boundary checks would currently be invisible.
As per coding guidelines "Validate input at system boundaries."
Also applies to: 33-78
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_correction_provenance.py` around lines 9 - 30, Add negative-path
unit tests in tests/test_correction_provenance.py alongside TestCreateProvenance
to exercise input validation for create_provenance_record: add tests that call
create_provenance_record with empty user_id, empty correction_hash, empty salt,
and negative session and assert the function raises an error (e.g., ValueError)
for each invalid input; ensure each test is named clearly (e.g.,
test_raises_on_empty_user_id, test_raises_on_empty_correction_hash,
test_raises_on_empty_salt, test_raises_on_negative_session) and uses the same
invocation pattern as the existing tests so regressions in input guards are
detected.
| def test_explain_existing_rule(self, brain_dir: Path): | ||
| """explain_rule returns metadata + transitions for an existing rule.""" | ||
| rules = list_rules(db_path=brain_dir / "system.db", | ||
| lessons_path=brain_dir / "lessons.md") | ||
| rule_id = rules[0]["id"] # first RULE/PATTERN | ||
| result = explain_rule(db_path=brain_dir / "system.db", | ||
| events_path=brain_dir / "events.jsonl", | ||
| rule_id=rule_id, | ||
| lessons_path=brain_dir / "lessons.md") | ||
| assert "description" in result | ||
| assert "category" in result | ||
| assert "transitions" in result | ||
|
|
There was a problem hiding this comment.
Assert the seeded transition row, not just that the key exists.
This still passes when explain_rule() returns transitions: [], which is the exact failure mode from a broken transition lookup. Pick the seeded DRAFTING rule explicitly and assert the returned state change so the provenance path is actually covered.
As per coding guidelines, tests/**: assertions check specific values not just truthiness.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_inspection.py` around lines 148 - 160, Select the seeded DRAFTING
rule explicitly instead of using rules[0], e.g. scan the list returned by
list_rules() to find the rule whose metadata or transitions reference
"DRAFTING", use that rule's id when calling explain_rule(rule_id=...), and
replace the loose existence assertion with a concrete check on the returned
transitions: assert that result["transitions"] contains at least one transition
object with the expected state change (for example a transition["from"] ==
"DRAFTING" or transition["to"] == "DRAFTING") so the provenance path is actually
validated.
| def test_pii_redacted_in_stored_event(self, brain): | ||
| """PII in draft/final should be redacted in the emitted event data.""" | ||
| draft = "Send report to alice@notify.com" | ||
| final = "Send report to the client" | ||
| result = brain.correct(draft, final) | ||
| stored_draft = result.get("data", {}).get("draft_text", "") | ||
| # The email in draft should be redacted | ||
| assert "alice@notify.com" not in stored_draft | ||
| assert "[REDACTED_EMAIL]" in stored_draft | ||
|
|
||
| def test_behavioral_extraction_uses_full_text(self, brain): | ||
| """Extraction should produce meaningful classification from unredacted text, | ||
| and redacted input should produce weaker results.""" | ||
| draft = "Contact alice@notify.com for the credentials" | ||
| final = "Contact the client for access credentials" | ||
| result = brain.correct(draft, final) | ||
| # Should still produce valid classifications from the full diff | ||
| assert "classifications" in result | ||
| severity = result.get("data", {}).get("severity", "unknown") | ||
| assert severity != "unknown" | ||
| # Control: pre-redacted draft should still classify but may differ | ||
| redacted_draft = "Contact [REDACTED_EMAIL] for the credentials" | ||
| result_redacted = brain.correct(redacted_draft, final) | ||
| assert "classifications" in result_redacted | ||
|
|
||
| def test_key_redacted_in_event(self, brain): | ||
| """Credential patterns in draft text must not leak to storage.""" | ||
| fake = _build([chr(115), chr(107), "-", "proj-", "K" * 30]) | ||
| draft = f"Use credential {fake} to authenticate" | ||
| final = "Use the provided credentials to authenticate" | ||
| result = brain.correct(draft, final) | ||
| stored_draft = result.get("data", {}).get("draft_text", "") | ||
| assert fake not in stored_draft | ||
| assert "[REDACTED_OPENAI_KEY]" in stored_draft |
There was a problem hiding this comment.
Cover the ImportError fallback branch too.
src/gradata/_core.py:115-130 still emits raw draft[:2000] / final[:2000] in the except ImportError path. These integration tests only hit the happy path, so that PII leak can ship unnoticed. Add a regression that forces the fallback and asserts subscribers still see redacted payloads.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_pii_redaction.py` around lines 147 - 180, The ImportError fallback
path in src/gradata/_core.py still returns raw draft/final and lacks test
coverage; add a new test that forces the ImportError branch (e.g., via
monkeypatching import machinery or removing the redaction backend so the import
in _core.py raises ImportError) then call brain.correct(draft, final) (reuse the
existing patterns in test_pii_redaction.py) and assert the stored event payloads
(result.get("data", {}).get("draft_text", "")) do not contain raw PII like
"alice@notify.com" or the fake key and instead contain the appropriate redaction
tokens (e.g., "[REDACTED_EMAIL]", "[REDACTED_OPENAI_KEY]"), ensuring the
fallback path also redacts output.
| def test_default_scope_is_domain(tmp_path: Path): | ||
| """brain.correct() without scope param defaults to 'domain'.""" | ||
| from tests.conftest import init_brain | ||
|
|
||
| brain = init_brain(tmp_path) | ||
| result = brain.correct("Use em dash — here", "Use colon: here") | ||
| # The correction data should have correction_scope = "domain" | ||
| assert result.get("correction_scope") == "domain" | ||
|
|
||
|
|
||
| # --------------------------------------------------------------------------- | ||
| # Explicit scope override | ||
| # --------------------------------------------------------------------------- | ||
|
|
||
| def test_explicit_scope_universal(tmp_path: Path): | ||
| """brain.correct(scope='universal') sets correction_scope to universal.""" | ||
| from tests.conftest import init_brain | ||
|
|
||
| brain = init_brain(tmp_path) | ||
| result = brain.correct("bad output", "good output", scope="universal") | ||
| assert result.get("correction_scope") == "universal" | ||
|
|
||
|
|
||
| def test_explicit_scope_one_off(tmp_path: Path): | ||
| """brain.correct(scope='one_off') sets correction_scope to one_off.""" | ||
| from tests.conftest import init_brain | ||
|
|
||
| brain = init_brain(tmp_path) | ||
| result = brain.correct("wrong thing", "right thing", scope="one_off") | ||
| assert result.get("correction_scope") == "one_off" |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Drive one integration case through CorrectionScope, not only raw strings.
These tests validate the enum values, but every brain.correct(..., scope=...) call still passes a string literal. The suite will stay green even if CorrectionScope.PROJECT breaks serialization or comparison in the public API. Add at least one case that passes the enum through directly.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_scope_tagging.py` around lines 38 - 67, Update the tests to
exercise the CorrectionScope enum directly by calling brain.correct with
CorrectionScope.<SCOPE> instead of string literals; e.g., add or modify a test
(e.g., in test_explicit_scope_one_off or a new test function) to import
CorrectionScope and pass CorrectionScope.ONE_OFF (or CorrectionScope.UNIVERSAL)
into brain.correct(...) and assert result.get("correction_scope") matches the
expected serialized value, ensuring brain.correct, CorrectionScope, and the
serialization/comparison logic are exercised.
| class TestConstantTimePad: | ||
| """Verify timing-attack defense via constant_time_pad.""" | ||
|
|
||
| def test_padded_takes_at_least_min_ms(self) -> None: | ||
| """Padded function should take at least min_ms milliseconds.""" | ||
| min_ms = 30.0 | ||
| start = time.perf_counter() | ||
| constant_time_pad(lambda: 42, min_ms=min_ms, jitter_ms=0.0) | ||
| elapsed_ms = (time.perf_counter() - start) * 1000 | ||
| assert elapsed_ms >= min_ms * 0.95, ( | ||
| f"Expected >= {min_ms * 0.95:.1f}ms, got {elapsed_ms:.1f}ms" | ||
| ) | ||
|
|
||
| def test_returns_function_result(self) -> None: | ||
| """Should return whatever fn() returns.""" | ||
| result = constant_time_pad(lambda: "hello", min_ms=5.0, jitter_ms=0.0) | ||
| assert result == "hello" | ||
|
|
||
| def test_returns_none_from_void_fn(self) -> None: | ||
| result = constant_time_pad(lambda: None, min_ms=5.0, jitter_ms=0.0) | ||
| assert result is None | ||
|
|
||
| def test_jitter_adds_variance(self) -> None: | ||
| """With jitter, not all durations should be identical.""" | ||
| durations: list[float] = [] | ||
| for _ in range(10): | ||
| start = time.perf_counter() | ||
| constant_time_pad(lambda: 1, min_ms=5.0, jitter_ms=10.0) | ||
| durations.append((time.perf_counter() - start) * 1000) | ||
| # With 10ms jitter range over 10 runs, we expect some variance | ||
| assert max(durations) - min(durations) > 0.5, ( | ||
| f"Expected timing variance from jitter, got spread " | ||
| f"{max(durations) - min(durations):.2f}ms" | ||
| ) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major
Add a failure-path regression for constant_time_pad().
This suite only measures successful calls. Without a case where fn() raises, the timing-defense contract on the exception path stays untested, which is exactly where the current implementation still leaks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/test_score_obfuscation.py` around lines 153 - 186, Add a regression
test in the TestConstantTimePad suite that exercises the failure path of
constant_time_pad by calling it with a function that raises (e.g., raise
RuntimeError) and verify two things: the exception is propagated (use an
assertion that the call raises the expected exception) and the call still
respects the timing defense by taking at least the specified min_ms (measure
elapsed time around the call and assert elapsed >= min_ms * 0.95, use
jitter_ms=0.0 to avoid noise). Target the TestConstantTimePad class and the
constant_time_pad function when adding this new test case.
Summary
SDK P0 hardening (Plan A) + security hardening (Plan C) from OASIS simulation findings.
Driven by: 6 OASIS sims (360 agents, 1,565 comments), 4 academic research documents, 5 Kenoodl bottomlines, exhaustive gap audit.
Plan A — SDK P0 Hardening (6 features)
brain.rules(),brain.explain(),brain.export_data()— rule inspection APIpending_promotions(),approve/reject_promotion()brain.trace(), rule_provenance tablePlan C — Security Hardening (8 features)
Stats
Test plan
Generated with Gradata
Greptile Summary
This PR delivers a large hardening milestone across two parallel tracks: Plan A (SDK P0 — rule inspection API, correction scope tagging, PII redaction, batch approval, SQLite audit trail, safety assertions) and Plan C (security — score obfuscation, per-brain salts, timing-attack defenses, bucketed injection shuffle, query budgeting, manifest signing, rule metadata, and correction provenance). At 3,700+ lines and 170 new tests, the scope is ambitious and the implementation is largely sound — the security primitives in particular are well-designed and correctly wired.
Key findings:
brain.forget()— the parameter signature changed fromforget(description, category) -> inttoforget(what) -> dict | list[dict]. Keyword callers (brain.forget(description="...")) will get aTypeErrorat runtime, and callers that test the integer return value will silently mis-branch. No deprecation shim is provided.pending_promotions()is invisible toapproval_required=Truelessons — these lessons are frozen at INSTINCT withpending_approval=Trueand can never appear inpending_promotions()(which only returns PATTERN/RULE state). The batch-approval workflow is inaccessible for the exact use case it was designed for._yaml_valininspection.pywill produce malformed YAML if any rule description contains a literal newline or carriage return, which is plausible for multi-line AI-generated descriptions on Windows..brain_saltnot in.gitignore— the new per-brain salt file could be accidentally committed if a user's brain directory overlaps a git repo.The security module (
brain_salt,query_budget,score_obfuscation,manifest_signing,correction_provenance) is well-implemented and correctly integrated. Prior review concerns (constant_time_padexport, credit card regex, safety assertion dead code,truncate_scorelabel mismatch) were addressed or are tracked.Confidence Score: 3/5
Not safe to merge until the
brain.forget()API break and thepending_promotions()approval-workflow bug are resolved — both affect publicly callable methods.The security primitives and pipeline integrations are well-executed. But two P1 logic issues remain in the public API:
brain.forget()silently breaks all keyword callers (signature changed without deprecation), andpending_promotions()never surfacesapproval_required=Truelessons, making the batch approval feature non-functional for its primary use case. These are concrete, reproducible bugs — not hypotheticals — and they touch the surface area being advertised in the PR description.src/gradata/brain.py(forget API break),src/gradata/brain_inspection.py(pending_promotions filter),src/gradata/inspection.py(YAML newline escaping)Vulnerabilities
events.jsonlor the lessons file._brain_saltis used as HMAC key in both provenance signing and manifest signing; no log lines emit the salt value.hmac.compare_digestused everywhere —verify_manifestandverify_provenanceboth use constant-time comparison, preventing timing oracle attacks on signature verification.apply_brain_rulesblocks rule injection (returns"") when the sliding-window limit is exceeded, preventing prompt-injection amplification.brain_diris inside a tracked git repo,.brain_saltcould be committed, exposing the HMAC key used for manifest and provenance signing. Impact is limited (attacker would need DB access too), but the fix is trivial (add to.gitignore).Important Files Changed
forget()— butforget()is a breaking API change (parameter names and return type both changed).rules(),explain(),trace(),export_data(),pending_promotions(),approve_promotion(),reject_promotion()—pending_promotions()is broken forapproval_required=Truelessons that stay at INSTINCT.graduate()andupdate_confidence(), ONE_OFF scope blocking, scope_json round-trip serialization, and RuleMetadata parse/format. Logic is correct..brain_salt; HMAC-derived threshold jitter is sound. Well-implemented with atomic rename fallback._prunecorrectly removes stale entries; burst anomaly sub-window edge cases handled cleanly.constant_time_padproperly usessecrets.randbelowfor non-deterministic jitter; all exported from__init__.py.signatureandsigned_atfrom the signed payload;hmac.compare_digestused correctly.verify_provenanceuseshmac.compare_digestfor constant-time comparison.CorrectionScopeenum andRuleMetadatadataclass with 5W1H fields and dual utility/safety scores;__post_init__clamps scores to [0,1].rule_provenancetable and its index viaCREATE IF NOT EXISTS— safe for existing DBs.constant_time_pad(prior review concern resolved).Sequence Diagram
sequenceDiagram participant U as User participant B as Brain participant C as _core.py participant S as safety.py participant P as correction_provenance participant A as audit.py participant G as graduate() participant RE as rule_engine U->>B: brain.correct(draft, final, scope="domain") B->>C: brain_correct(...) C->>C: extract behavioral instruction (full text) C->>S: redact_pii_with_report(draft/final) S-->>C: redacted texts C->>C: tag correction_scope → scope_data C->>C: emit CORRECTION event (redacted) C->>P: create_provenance_record(user_id, hash, session, salt) P-->>C: HMAC-signed provenance C-->>B: event dict U->>B: brain.end_session() B->>C: brain_end_session(salt=_brain_salt) C->>G: graduate(lessons, salt=_brain_salt) G->>G: salt_threshold(PATTERN_THRESHOLD, salt) G->>G: check ONE_OFF scope → block_promotion G->>G: promote/demote lessons G-->>C: active, graduated, transitions C->>A: write_provenance(rule_id, correction_event_id, session) C->>RE: format_rules_for_prompt(rules, bucketed_shuffle) C-->>B: session result U->>B: brain.rules() / brain.explain(rule_id) / brain.trace(rule_id) B->>RE: list_rules / explain_rule / trace_rule RE-->>B: rule dicts (tier label, not raw float)Comments Outside Diff (2)
src/gradata/brain.py, line 474-476 (link)is_rate_exceededreturnsTruebut execution falls straight through to the normal call path — only alogger.warningis emitted. The rate limiter currently has zero effect on real traffic; it is monitoring-only, not enforcement.If the intent is non-blocking (alerting only), the method name and surrounding comment are misleading. If the intent is to actually throttle, the method should either raise or return an empty string:
Prompt To Fix With AI
src/gradata/_core.py, line 107-116 (link)getattr(brain, "_brain_salt", "")returns""if the attribute doesn't exist (unlikely post-__init__, but still possible in test mocks or subclasses).create_provenance_recordaccepts this silently, producing an HMAC keyed with an empty byte string — which is technically valid HMAC but provides no security guarantee whatsoever, and the caller has no way to detect the degraded state.At minimum, log a warning when the salt is empty:
Prompt To Fix With AI
Prompt To Fix All With AI
Reviews (2): Last reviewed commit: "fix: resolve all CodeRabbit review findi..." | Re-trigger Greptile