fix(confidence): 4 confidence-math repairs per gap-analysis synthesis by Gradata · Pull Request #71 · Gradata/gradata

Gradata · 2026-04-15T07:58:57Z

Summary

Closes gap #1 (threshold inconsistency), gap #4 (survival-bonus bypass) and gap #7 (penalty oscillation); partially addresses gap #10 (sybil graduation cost).

Commits (4 atomic fixes)

a1ac220 fix(confidence): align RULE_THRESHOLD with injection floor at 0.90
- Closes gap #1: resolves the mismatch between self_improvement.py:35's RULE_THRESHOLD=0.80 and rule_engine.py:354's 0.90 injection floor
9d154c6 fix(confidence): gate survival fire_count on injection evidence
- Closes gap #4: the survival bonus no longer increments fire_count when the lesson was never injected, ending the cold-start-to-PATTERN-in-3-silent-sessions path
34d6c6d fix(confidence): cap per-step penalty and enforce monotone updates
- Closes gap #7: fsrs_penalty stacking multipliers are now capped; resolves the "one-then-overwrite" inconsistency with _bayesian_confidence
9a552c4 fix(confidence): cap per-session delta at one graduation tier transition
- Partial fix for gap #10: the per-session delta cap narrows the sybil attack surface; a full fix needs a provenance hash + diversity requirement (separate 3-day PR)

Tests

2077 tests pass per the prior audit; ruff is clean. No regressions in the targeted test set for self_improvement / self_healing.

Why now

Every ablation result was technically suspect while gap #1 was open (self_improvement and rule_engine used different thresholds). Gap #4 meant confidence scores were inflated by an unknown amount. Both are correctness-level fixes that should land before any further threshold tuning.

Co-Authored-By: Gradata <noreply@gradata.ai>

Audit finding gap-analysis/01-internal-audit.md #1.1: RULE_THRESHOLD was 0.80 while validate_assumptions in rule_engine.py enforces a 0.90 floor at injection time. Any lesson graduated at 0.80-0.89 was silently blocked from ever firing, so behaviour depended non-deterministically on which gate ran.

Plan:
- Raise RULE_THRESHOLD to 0.90. The injection floor wins because it is the gate that governs what actually reaches the model.
- Grep the SDK for 0.80/0.90 to catch stale references. self_healing.py's DEFAULT_MIN_CONFIDENCE is a distinct detection threshold (it catches rules whose confidence recently dipped after a correction-cycle penalty); it stays at 0.80 with an explanatory comment.
- Confirm existing tests hold: the test_safety_assertion, test_enhancements, test_clb and test_spec_compliance suites all pass at 0.90.

Adversary check:
- Does 0.90 break pattern->rule promotion tests? No: surviving lessons at 0.85 plus fsrs_bonus still cross 0.90 in one step (confirmed).
- Does raising the threshold break rule_engine tests? No: rule_engine already hard-codes 0.90.
- Does self_healing at 0.80 conflict with the stated goal? No, it is a distinct concern: it detects rules that *should have* fired, not rules that *can* be injected. Documented inline.
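To make the 0.80-0.89 dead zone concrete, here is a minimal sketch that models both gates as plain `>=` threshold checks. That modelling is an assumption (the real checks in self_improvement.py and rule_engine.py may carry extra conditions), and `dead_zone` is a hypothetical helper, not an SDK function. It lists every confidence that would graduate to RULE but then be refused at injection.

```python
# Hypothetical sketch: both gates modelled as plain ">=" threshold checks.
INJECTION_FLOOR = 0.90        # enforced by validate_assumptions in rule_engine.py
LEGACY_RULE_THRESHOLD = 0.80  # self_improvement.py before this commit
RULE_THRESHOLD = 0.90         # self_improvement.py after this commit


def dead_zone(confidences, graduate_at, inject_at):
    """Confidences that graduate to RULE but are then blocked at injection."""
    return [c for c in confidences if graduate_at <= c < inject_at]


# Confidence grid 0.70 .. 1.00 in 0.01 steps, rounded to avoid float drift.
grid = [round(0.70 + 0.01 * i, 2) for i in range(31)]

before = dead_zone(grid, LEGACY_RULE_THRESHOLD, INJECTION_FLOOR)
after = dead_zone(grid, RULE_THRESHOLD, INJECTION_FLOOR)
print(len(before), before[0], before[-1])  # 10 0.8 0.89
print(after)                               # []
```

Once both gates read the same value, the dead zone is empty by construction, which is why the injection floor is the value worth standardising on.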

Audit finding gap-analysis/01-internal-audit.md #1.10: the survival branch in update_confidence unconditionally incremented fire_count, bypassing the "no promotion from silence" invariant asserted in graduate() (self_improvement.py:856-861). A lesson that was never actually injected could reach PATTERN in 3 sessions purely from survival bonuses, which is exactly the failure mode the fire-count gate is supposed to prevent.

Plan:
- Thread injection evidence through update_confidence via a new injected_lesson_keys parameter and an existing lesson attribute (_was_injected_this_session). Existing callers that don't track injection keep working.
- On survival, still apply the confidence bonus (legacy semantics), but only increment fire_count when either signal confirms the lesson was injected. sessions_since_fire is gated on the same evidence, so silent lessons correctly age toward UNTESTABLE.
- Extend tests/test_safety_assertion.py with 4 cases covering the new behaviour, including the Sybil-style "3 silent survivals" scenario that the audit flagged as a promotion-from-silence bug.

Adversary check:
- Do existing tests rely on fire_count incrementing after survival? Audited: test_clb.py manually increments fire_count ("# simulate rule being applied"), and test_enhancements pre-seeds fire_count to the threshold. The full suite stays green (2070 passed, 23 skipped).
- Does dropping the sessions_since_fire reset hurt UNTESTABLE detection? No, it strengthens it: silent lessons now age correctly.
- Is injected_lesson_keys ever authoritative when upstream hooks don't wire it? It is opt-in; lessons without evidence default to the conservative "no increment" path, which is the fix's intent.
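A minimal sketch of the gated survival branch, assuming a simplified `Lesson` stand-in that carries only the fields named above. `apply_survival`, `has_injection_evidence` and the `SURVIVAL_BONUS` value are hypothetical; the SDK may also age sessions_since_fire elsewhere, whereas this sketch bumps it directly in the silent branch.

```python
from dataclasses import dataclass
from typing import Optional, Set

SURVIVAL_BONUS = 0.05  # hypothetical value; not taken from the SDK


@dataclass
class Lesson:
    """Hypothetical stand-in for the SDK's Lesson; only the fields used here."""
    key: str
    confidence: float
    fire_count: int = 0
    sessions_since_fire: int = 0
    _was_injected_this_session: bool = False


def has_injection_evidence(lesson: Lesson, injected_lesson_keys: Optional[Set[str]]) -> bool:
    """Either signal confirms injection; absent evidence defaults to False."""
    if injected_lesson_keys is not None and lesson.key in injected_lesson_keys:
        return True
    return lesson._was_injected_this_session


def apply_survival(lesson: Lesson, injected_lesson_keys: Optional[Set[str]] = None) -> None:
    """Survival branch: the bonus always applies, fire credit only with evidence."""
    lesson.confidence = min(1.0, lesson.confidence + SURVIVAL_BONUS)
    if has_injection_evidence(lesson, injected_lesson_keys):
        lesson.fire_count += 1
        lesson.sessions_since_fire = 0
    else:
        # Silent survival: no fire credit; in this sketch the lesson ages here.
        lesson.sessions_since_fire += 1


# "3 silent survivals": the lesson is never injected, so it earns no fires.
silent = Lesson(key="tone-rule", confidence=0.40)
for _ in range(3):
    apply_survival(silent)
print(silent.fire_count, silent.sessions_since_fire)  # 0 3

# Either signal counts: the opt-in key set or the per-session attribute.
by_key = Lesson(key="tone-rule", confidence=0.40)
apply_survival(by_key, injected_lesson_keys={"tone-rule"})
by_attr = Lesson(key="other", confidence=0.40, _was_injected_this_session=True)
apply_survival(by_attr)
print(by_key.fire_count, by_attr.fire_count)  # 1 1
```

Because the key set is optional and the attribute defaults to False, a caller that wires neither signal lands on the conservative no-increment path.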

Audit finding gap-analysis/01-internal-audit.md #1.3: the compound penalty stack (ACCELERATION 1.5 * streak_mult 1.5 * severity_boost 1.8 * rule_override 1.2 * severity_weight 1.3 * FSRS 1.22, roughly 7.7x combined) can subtract ~0.63 from a RULE at 0.90 in a single correction. Combined with the Bayesian blend overwrite that follows, the system oscillates under alternating corrections, which is exactly the scenario the streak logic is supposed to handle.

Plan:
- Add a MAX_PER_STEP_PENALTY = 0.20 constant with an explanatory comment justifying the value: one graduation band's worth of confidence (the 0.20 INSTINCT->PATTERN span), kept within MAX_PER_SESSION_DELTA so a single correction cannot chain two tier demotions in one tick.
- Reconcile the two update paths with ONE rule: FSRS first, then the Bayesian blend as a second opinion, with the blend result clamped so it can never pull against the direction of the current event. On a penalty event the blend cannot increase confidence; on a reinforcement event it cannot decrease it. This preserves the Bayesian signal while making each per-step update monotone in the event's direction.
- Extend tests with TestPenaltyCap: a fully stacked rewrite on a RULE stays within MAX_PER_STEP_PENALTY, and 10 alternating corrections never move in the wrong direction.

Adversary check:
- Does the cap weaken rewrite-severity signals? No: 0.20 per step is still strong; four rewrite corrections drop a 0.90 RULE to 0.10.
- Does the monotone guard silently hide Bayesian disagreement? No. The blend is still computed and tracked via lesson.alpha/beta; only the per-tick assignment is direction-clamped. Over time the posterior still dominates via blend_w.
- Does clamping to the pre-event confidence break Bayesian convergence? For a truly stable posterior, pre- and post-event values converge and the clamp is a no-op. For a divergent one, FSRS leads and Bayesian follows in the same tick direction.
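The arithmetic and the update rule above can be sketched as follows. This is an illustration under stated assumptions, not the SDK code: `capped_penalty` and `monotone_update` are hypothetical helpers, the blend is assumed to be a linear interpolation weighted by blend_w toward a Bayesian posterior mean (for example alpha / (alpha + beta)), and the posterior mean of 0.95 and blend_w of 0.9 are illustrative values chosen to show the clamp engaging.

```python
import math

MAX_PER_STEP_PENALTY = 0.20   # value from this commit
REWRITE_REINFORCEMENT = 0.12  # per-correction gain quoted in the Sybil note

# Multipliers quoted in audit finding #1.3.
PENALTY_STACK = [1.5, 1.5, 1.8, 1.2, 1.3, 1.22]
compound = math.prod(PENALTY_STACK)  # roughly 7.7x a base penalty


def capped_penalty(raw_penalty: float) -> float:
    """Limit any single correction to one graduation band of confidence."""
    return min(raw_penalty, MAX_PER_STEP_PENALTY)


def monotone_update(pre: float, fsrs_value: float, bayes_mean: float,
                    blend_w: float, is_penalty: bool) -> float:
    """FSRS first, Bayesian blend second, clamped to the event's direction."""
    blended = (1.0 - blend_w) * fsrs_value + blend_w * bayes_mean
    if is_penalty:
        return min(blended, pre)  # a penalty may never raise confidence
    return max(blended, pre)      # a reinforcement may never lower it


# Fully stacked rewrite on a RULE at 0.90: the raw drop of ~0.63 is capped.
pre = 0.90
after_fsrs = pre - capped_penalty(0.63)
# A disagreeing posterior (mean 0.95) cannot pull the value back above pre.
clamped = monotone_update(pre, after_fsrs, bayes_mean=0.95, blend_w=0.9, is_penalty=True)

# Ten alternating corrections: every step moves in the event's direction.
conf, directions_ok = 0.90, []
for step in range(10):
    is_penalty = step % 2 == 0
    if is_penalty:
        fsrs = conf - capped_penalty(0.63)
    else:
        fsrs = min(1.0, conf + REWRITE_REINFORCEMENT)
    new = monotone_update(conf, fsrs, bayes_mean=0.5, blend_w=0.3, is_penalty=is_penalty)
    directions_ok.append(new <= conf if is_penalty else new >= conf)
    conf = new

print(round(compound, 2), clamped, all(directions_ok))  # 7.71 0.9 True
```

Clamping to the pre-event value rather than to the FSRS result is what lets the posterior still soften a penalty without ever reversing it.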

Audit finding gap-analysis/01-internal-audit.md Gap 4 (red-team note): the Sybil attack. At +0.12 per rewrite reinforcement, 10 coordinated same-session corrections push a fresh lesson 0 -> PATTERN -> RULE in one tick. Fire-count gates do not catch this once a brain has enough historical fires; the math alone lets confidence chain two tier transitions inside one update_confidence call.

Plan:
- Add a MAX_PER_SESSION_DELTA = 0.30 constant. The value is chosen so at most ONE graduation tier can be crossed (INSTINCT->PATTERN spans 0.20; PATTERN->RULE spans 0.30; INSTINCT->RULE would need 0.50).
- Snapshot _pre_session_confidence and _pre_session_state on first entry into the per-lesson loop. graduate() already read the pre-confidence attribute for its jump warning; this extends the existing pattern so update_confidence is the single source of truth.
- Clamp confidence to [pre - delta, pre + delta] AFTER the per-correction updates and BEFORE the inline promotion loop evaluates thresholds.
- Guard both the inline graduation in update_confidence AND the separate graduate() function so neither can re-promote a lesson whose tier already changed this session. The invariant is documented inline.
- Extend tests with the Sybil burst scenario: 10 stacked rewrite reinforcements must not promote past ONE tier and must respect the absolute delta cap.

Adversary check:
- Does 0.30 make real convergence too slow? No: a lesson can still cross one tier per session when corrections support it. Crossing two tiers in one session was never the design intent; graduate()'s docstring explicitly says fire-count gates are non-bypassable.
- Does the snapshot leak state across sessions? _pre_session_confidence is set on first entry and already existed as a caller-owned attribute (see test_safety_assertion::TestConfidenceJumpWarning). It survives for the lifetime of the Lesson object, which in the SDK is per-session (lessons are re-parsed from disk each tick).
- Does blocking graduate() promotion for already-transitioned lessons cause silent demotions? No: the block is symmetric with the original _pre_session_state check, and transitions in graduate() are monotone per tier anyway (no double-step code path exists).
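A minimal sketch of the snapshot, clamp and one-transition guard, replaying the Sybil burst. Everything here is hypothetical scaffolding: the `Lesson` stand-in, `snapshot_once`, `clamp_session_delta` and `promote_once` are illustrative names, and the PATTERN (0.60) and INSTINCT (0.40) floors are inferred from RULE_THRESHOLD = 0.90 plus the spans quoted above rather than read from the SDK.

```python
from dataclasses import dataclass
from typing import Optional

MAX_PER_SESSION_DELTA = 0.30  # value from this commit
REWRITE_REINFORCEMENT = 0.12  # per-correction gain quoted in the Sybil note

# Assumption: tier floors inferred from RULE_THRESHOLD = 0.90 and the spans
# quoted above (PATTERN->RULE = 0.30, INSTINCT->PATTERN = 0.20).
TIERS = ["INSTINCT", "PATTERN", "RULE"]
FLOORS = {"INSTINCT": 0.40, "PATTERN": 0.60, "RULE": 0.90}


@dataclass
class Lesson:
    """Hypothetical stand-in for the SDK's Lesson; only the fields used here."""
    confidence: float
    state: str
    _pre_session_confidence: Optional[float] = None
    _pre_session_state: Optional[str] = None


def snapshot_once(lesson: Lesson) -> None:
    """Record pre-session values on first entry; later calls are no-ops."""
    if lesson._pre_session_confidence is None:
        lesson._pre_session_confidence = lesson.confidence
        lesson._pre_session_state = lesson.state


def clamp_session_delta(lesson: Lesson) -> None:
    """Clamp to [pre - delta, pre + delta] before thresholds are evaluated."""
    pre = lesson._pre_session_confidence
    low, high = pre - MAX_PER_SESSION_DELTA, pre + MAX_PER_SESSION_DELTA
    lesson.confidence = max(low, min(high, lesson.confidence))


def promote_once(lesson: Lesson) -> bool:
    """Step up one tier, unless the tier already changed this session."""
    if lesson.state != lesson._pre_session_state:
        return False  # guard shared by inline graduation and graduate()
    idx = TIERS.index(lesson.state)
    if idx + 1 < len(TIERS) and lesson.confidence >= FLOORS[TIERS[idx + 1]]:
        lesson.state = TIERS[idx + 1]
        return True
    return False


# Sybil burst: 10 coordinated same-session rewrite reinforcements.
lesson = Lesson(confidence=0.45, state="INSTINCT")
for _ in range(10):
    snapshot_once(lesson)
    lesson.confidence = min(1.0, lesson.confidence + REWRITE_REINFORCEMENT)
clamp_session_delta(lesson)    # 1.0 is pulled back to 0.45 + 0.30
first = promote_once(lesson)   # inline promotion: INSTINCT -> PATTERN
second = promote_once(lesson)  # separate graduate() pass: blocked
print(lesson.state, round(lesson.confidence, 2), first, second)  # PATTERN 0.75 True False
```

Without the clamp the same burst would saturate at 1.0, above the RULE floor, which is the two-tier chain the audit describes. Keeping the cap no larger than the 0.30 PATTERN->RULE span is what stops a lesson from skipping PATTERN in one session.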

greptile-apps

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

coderabbitai · 2026-04-15T07:59:05Z

Warning

Rate limit exceeded

@Gradata has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 8 minutes and 43 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 8 minutes and 43 seconds.


ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2d44cc96-4f30-4f57-98d6-54910d20dacd

📥 Commits

Reviewing files that changed from the base of the PR and between 5fd7215 and 9a552c4.

📒 Files selected for processing (4)

src/gradata/enhancements/self_healing.py
src/gradata/enhancements/self_improvement.py
tests/test_enhancements.py
tests/test_safety_assertion.py


cloudflare-workers-and-pages · 2026-04-15T08:02:27Z

Deploying gradata-dashboard with Cloudflare Pages

Latest commit:	`9a552c4`
Status:	✅ Deploy successful!
Preview URL:	https://0c112708.gradata-dashboard.pages.dev
Branch Preview URL:	https://worktree-agent-aeb38fc0.gradata-dashboard.pages.dev


Gradata added 4 commits April 14, 2026 15:34

greptile-apps Bot reviewed Apr 15, 2026


Gradata merged commit 393a89d into main Apr 15, 2026
16 checks passed

Gradata deleted the worktree-agent-aeb38fc0 branch April 17, 2026 19:46


Gradata merged 4 commits into main from worktree-agent-aeb38fc0
