fix(confidence): 4 confidence-math repairs per gap-analysis synthesis #71
Audit finding gap-analysis/01-internal-audit.md #1.1: RULE_THRESHOLD was 0.80 while validate_assumptions in rule_engine.py enforces a 0.90 floor at injection time. Any lesson graduated in 0.80-0.89 was silently blocked from ever firing, so behaviour was non-deterministic depending on which gate ran.

Plan:
- Raise RULE_THRESHOLD to 0.90 (injection floor wins; it is the gate that governs what actually reaches the model).
- Grep SDK for 0.80/0.90 to catch stale references. self_healing.py's DEFAULT_MIN_CONFIDENCE is a distinct detection threshold (catch rules whose confidence recently dipped after a correction-cycle penalty); left at 0.80 with an explanatory comment.
- Confirm existing tests hold: test_safety_assertion, test_enhancements, test_clb, test_spec_compliance range all pass at 0.90.

Adversary check:
- Does 0.90 break pattern->rule promotion tests? No: surviving lessons at 0.85 + fsrs_bonus still cross 0.90 in one step (confirmed).
- Does raising the threshold break rule_engine tests? No: rule_engine already hard-codes 0.90.
- Does self_healing at 0.80 conflict with the stated goal? Distinct concern: it detects rules that *should have* fired, not rules that *can* be injected. Documented inline.
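The dead zone described in #1.1 can be illustrated with a minimal sketch. Only RULE_THRESHOLD and the 0.80/0.90 values come from the commit message; `INJECTION_FLOOR`, `graduates_to_rule`, `can_inject`, and `dead_zone` are hypothetical names used for illustration and are not the SDK's API.

```python
# Illustrative sketch only; helper names are assumptions, not SDK functions.
RULE_THRESHOLD = 0.90   # graduation gate after the fix (was 0.80)
INJECTION_FLOOR = 0.90  # assumed name for the floor enforced at injection time


def graduates_to_rule(confidence: float, threshold: float = RULE_THRESHOLD) -> bool:
    """True when a lesson's confidence clears the graduation gate."""
    return confidence >= threshold


def can_inject(confidence: float, floor: float = INJECTION_FLOOR) -> bool:
    """True when a lesson's confidence clears the injection-time floor."""
    return confidence >= floor


def dead_zone(confidences, threshold, floor):
    """Confidences that would graduate to RULE yet never be injected."""
    return [
        c for c in confidences
        if graduates_to_rule(c, threshold) and not can_inject(c, floor)
    ]


samples = [0.79, 0.80, 0.85, 0.89, 0.90, 0.95]
before_fix = dead_zone(samples, threshold=0.80, floor=0.90)
after_fix = dead_zone(samples, threshold=0.90, floor=0.90)
print(before_fix)  # lessons stuck between the two gates
print(after_fix)   # empty once both gates agree
```

With both gates at 0.90 the set of graduated-but-uninjectable lessons is empty by construction, which is the property the commit aims for.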
Audit finding gap-analysis/01-internal-audit.md #1.10: the survival branch in update_confidence unconditionally incremented fire_count, bypassing the "no promotion from silence" invariant asserted in graduate() (self_improvement.py:856-861). A lesson that was never actually injected could reach PATTERN in 3 sessions purely from survival bonuses, which is exactly the failure mode the fire-count gate is supposed to prevent.

Plan:
- Thread injection evidence through update_confidence via a new injected_lesson_keys parameter and an existing lesson attribute (_was_injected_this_session). Preserves existing callers that don't track injection.
- On survival, still apply the confidence bonus (legacy semantics) but only increment fire_count when either signal confirms the lesson was injected. sessions_since_fire is also gated on the same evidence, so silent lessons correctly age toward UNTESTABLE.
- Extend tests/test_safety_assertion.py with 4 cases covering the new behaviour, including the Sybil-style "3 silent survivals" scenario that the audit flagged as a promotion-from-silence bug.

Adversary check:
- Existing tests that rely on fire_count post-survival? Audited: test_clb.py manually increments fire_count ("# simulate rule being applied"), test_enhancements pre-seeds fire_count to the threshold. Full suite stays green (2070 passed, 23 skipped).
- Does dropping sessions_since_fire reset hurt UNTESTABLE detection? No, it strengthens it. Silent lessons now age correctly.
- Is injected_lesson_keys ever authoritative when upstream hooks don't wire it? It's opt-in; lessons without evidence default to the conservative "no increment" path, which is the fix's intent.
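A minimal sketch of the survival gating described above, under stated assumptions: the `Lesson` dataclass, `apply_survival`, the `SURVIVAL_BONUS` value, and the choice to increment sessions_since_fire for silent lessons are all hypothetical stand-ins. Only the attribute and parameter names fire_count, sessions_since_fire, _was_injected_this_session, and injected_lesson_keys come from the commit message.

```python
from dataclasses import dataclass
from typing import Optional, Set

SURVIVAL_BONUS = 0.05  # assumed value, for illustration only


@dataclass
class Lesson:
    """Hypothetical stand-in for the SDK's lesson object."""
    key: str
    confidence: float = 0.50
    fire_count: int = 0
    sessions_since_fire: int = 0
    _was_injected_this_session: bool = False


def apply_survival(lesson: Lesson,
                   injected_lesson_keys: Optional[Set[str]] = None) -> None:
    """Apply a survival bonus; count a fire only with injection evidence."""
    # Legacy semantics: the confidence bonus is always applied on survival.
    lesson.confidence = min(1.0, lesson.confidence + SURVIVAL_BONUS)

    # Either signal is enough; callers that pass nothing get the
    # conservative "no increment" path.
    injected = lesson._was_injected_this_session or (
        injected_lesson_keys is not None and lesson.key in injected_lesson_keys
    )
    if injected:
        lesson.fire_count += 1
        lesson.sessions_since_fire = 0
    else:
        # Assumption: silent lessons age by one session per survival.
        lesson.sessions_since_fire += 1


# "3 silent survivals": confidence rises, but no fires are recorded.
silent = Lesson(key="silent")
for _ in range(3):
    apply_survival(silent)

# A lesson confirmed via injected_lesson_keys does record a fire.
fired = Lesson(key="fired")
apply_survival(fired, injected_lesson_keys={"fired"})
```

Because fire_count stays at zero for the silent lesson, a fire-count gate in graduation can still refuse promotion even though confidence has risen.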
Audit finding gap-analysis/01-internal-audit.md #1.3: the compound penalty stack (ACCELERATION 1.5 * streak_mult 1.5 * severity_boost 1.8 * rule_override 1.2 * severity_weight 1.3 * FSRS 1.22) can subtract ~0.63 from a RULE at 0.90 in a single correction. Combined with the Bayesian blend overwrite that follows, the system oscillates under alternating corrections, which is exactly the scenario the streak logic is supposed to handle.

Plan:
- Add MAX_PER_STEP_PENALTY = 0.20 constant with an explanatory comment justifying the value (one graduation band's worth of confidence, matching MAX_PER_SESSION_DELTA so a single correction cannot chain two tier demotions in one tick).
- Reconcile the two update paths. Pick ONE rule: FSRS first, then Bayesian blend as a second opinion, but clamp the blend result so it can never pull in the opposite direction of the current event. Penalty event: blend cannot increase confidence. Reinforcement event: blend cannot decrease it. This preserves the Bayesian signal while making the per-step update strictly monotone.
- Extend tests with TestPenaltyCap: fully-stacked rewrite on a RULE stays within MAX_PER_STEP_PENALTY, and 10 alternating corrections never oscillate in the wrong direction.

Adversary check:
- Does the cap weaken rewrite-severity signals? No: 0.20 per step is still strong; four rewrite corrections drop a 0.90 RULE to 0.10.
- Does the monotone guard silently hide Bayesian disagreement? No: blend is still computed and logged via lesson.alpha/beta, only the per-tick assignment is direction-clamped. Over time the posterior still dominates via blend_w.
- Does clamping to pre-event confidence break Bayesian convergence? For a truly stable posterior, pre/post converge and the clamp is a no-op. For a divergent one, FSRS leads and Bayesian follows in the same tick-direction.
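The cap and the direction clamp can be sketched as below. This is an illustration under assumptions, not the SDK's code: `capped_penalty` and `monotone_blend` are hypothetical helpers, the linear blend formula is assumed, and the numeric inputs are made up. Only MAX_PER_STEP_PENALTY = 0.20, blend_w, and the "blend may not move against the event" rule come from the commit message.

```python
MAX_PER_STEP_PENALTY = 0.20  # value from the commit message


def capped_penalty(raw_penalty: float) -> float:
    """Limit a stacked penalty to the per-step maximum."""
    return min(raw_penalty, MAX_PER_STEP_PENALTY)


def monotone_blend(pre: float, fsrs_result: float, bayes: float,
                   blend_w: float, is_penalty: bool) -> float:
    """Blend FSRS and Bayesian estimates without reversing the event.

    Assumed blend: a weighted average with weight blend_w on the
    Bayesian estimate. The result is clamped against the pre-event
    confidence so a penalty never raises it and a reinforcement never
    lowers it.
    """
    blended = (1.0 - blend_w) * fsrs_result + blend_w * bayes
    if is_penalty:
        blended = min(blended, pre)
    else:
        blended = max(blended, pre)
    return max(0.0, min(1.0, blended))


pre = 0.90
# A fully stacked penalty of ~0.63 is reduced to the per-step cap.
after_fsrs = pre - capped_penalty(0.63)

# An optimistic posterior would otherwise pull confidence above `pre`.
penalised = monotone_blend(pre, after_fsrs, bayes=0.95,
                           blend_w=0.9, is_penalty=True)

# A pessimistic posterior cannot turn a reinforcement into a loss.
reinforced = monotone_blend(0.50, 0.62, bayes=0.10,
                            blend_w=0.9, is_penalty=False)
```

In this sketch a disagreeing posterior leaves the confidence at its pre-event value for that tick rather than reversing the event, which matches the "no-op or same direction" behaviour the adversary check describes.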
Audit finding gap-analysis/01-internal-audit.md Gap 4 (red-team note): the Sybil attack. Given +0.12 per rewrite reinforcement, 10 coordinated same-session corrections push a fresh lesson 0 -> PATTERN -> RULE in one tick. Fire-count gates do not catch this once a brain has enough historical fires; the math alone lets confidence chain two tier transitions inside one update_confidence call.

Plan:
- Add MAX_PER_SESSION_DELTA = 0.30 constant. Value chosen so at most ONE graduation tier can be crossed (INSTINCT->PATTERN spans 0.20; PATTERN->RULE spans 0.30; INSTINCT->RULE would need 0.50).
- Snapshot _pre_session_confidence and _pre_session_state on first entry into the per-lesson loop. The pre-confidence attribute was already read by graduate() for its jump warning; this extends the existing pattern so update_confidence is the single source of truth.
- Clamp confidence to [pre - delta, pre + delta] AFTER per-correction updates, BEFORE the inline promotion loop evaluates thresholds.
- Guard both the inline update_confidence graduation AND the separate graduate() function so neither can re-promote a lesson whose tier already changed this session. Document the invariant inline.
- Extend tests with the Sybil burst scenario: 10 stacked rewrite reinforcements must not promote past ONE tier and must respect the absolute delta cap.

Adversary check:
- Does 0.30 make real convergence too slow? A lesson still crosses one tier per session when corrections support it. Crossing two tiers in one session was never the design intent; graduate()'s docstring explicitly says fire-count gates are non-bypassable.
- Does the snapshot leak state across sessions? _pre_session_confidence is set on first entry; it already existed as a caller-owned attr (see test_safety_assertion::TestConfidenceJumpWarning). It survives for the lifetime of the Lesson object which, in the SDK, is per-session (lessons are re-parsed from disk each tick).
- Does blocking graduate() promotion for already-transitioned lessons cause silent demotions? No: the block is symmetric with the original _pre_session_state check and transitions are monotone per-tier in graduate() anyway (no double-step code path exists).
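The per-session clamp can be sketched with the Sybil burst from the audit. This is a hedged illustration: `clamp_session_delta` is a hypothetical helper, and the loop only mimics the +0.12 reinforcement arithmetic quoted above rather than the real update_confidence path.

```python
MAX_PER_SESSION_DELTA = 0.30  # value from the commit message
REWRITE_REINFORCEMENT = 0.12  # per-correction gain quoted in the audit note


def clamp_session_delta(pre_session_confidence: float,
                        confidence: float) -> float:
    """Clamp confidence to [pre - delta, pre + delta] within [0, 1]."""
    lo = max(0.0, pre_session_confidence - MAX_PER_SESSION_DELTA)
    hi = min(1.0, pre_session_confidence + MAX_PER_SESSION_DELTA)
    return max(lo, min(hi, confidence))


# Sybil burst: 10 coordinated reinforcements on a fresh lesson.
pre_session_confidence = 0.0  # snapshot taken on first entry this session
confidence = pre_session_confidence
for _ in range(10):
    confidence = min(1.0, confidence + REWRITE_REINFORCEMENT)

unclamped = confidence  # saturates at 1.0 without the cap
clamped = clamp_session_delta(pre_session_confidence, confidence)
print(unclamped, clamped)
```

Applying the clamp after the per-correction updates and before any threshold check, as the plan states, means the promotion logic only ever sees a confidence that moved at most 0.30 from the session snapshot.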
Deploying gradata-dashboard

| Latest commit: | 9a552c4 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://0c112708.gradata-dashboard.pages.dev |
| Branch Preview URL: | https://worktree-agent-aeb38fc0.gradata-dashboard.pages.dev |
Summary
Closes gap #1 (threshold inconsistency), gap #4 (survival-bonus bypass), gap #7 (penalty oscillation), partial gap #10 (sybil graduation cost).
Commits (4 atomic fixes)
- a1ac220 fix(confidence): align RULE_THRESHOLD with injection floor at 0.90
  (self_improvement.py:35's RULE_THRESHOLD=0.80 vs rule_engine.py:354's 0.90 injection floor)
- 9d154c6 fix(confidence): gate survival fire_count on injection evidence
- 34d6c6d fix(confidence): cap per-step penalty and enforce monotone updates
  (fsrs_penalty stacking multipliers now capped; resolves "one-then-overwrite" inconsistency with _bayesian_confidence)
- 9a552c4 fix(confidence): cap per-session delta at one graduation tier transition

Tests
2077 pass per prior audit, ruff clean. No regressions in the tight test set for self_improvement / self_healing.
Why now
Every ablation result was technically suspect while gap #1 was open (different threshold in self_improvement vs rule_engine). Gap #4 meant confidence scores were inflated by an unknown amount. Both are correctness-level fixes that should land before any further threshold tuning.

Co-Authored-By: Gradata noreply@gradata.ai