fix(confidence): 4 confidence-math repairs per gap-analysis synthesis #71

Merged
Gradata merged 4 commits into main from worktree-agent-aeb38fc0
Apr 15, 2026

Gradata commented Apr 15, 2026

Summary

Closes gap #1 (threshold inconsistency), gap #4 (survival-bonus bypass), gap #7 (penalty oscillation), partial gap #10 (sybil graduation cost).

Commits (4 atomic fixes)

  • a1ac220 fix(confidence): align RULE_THRESHOLD with injection floor at 0.90
  • 9d154c6 fix(confidence): gate survival fire_count on injection evidence
  • 34d6c6d fix(confidence): cap per-step penalty and enforce monotone updates
    • Closes gap #7: fsrs_penalty stacking multipliers are now capped; resolves the "one-then-overwrite" inconsistency with _bayesian_confidence
  • 9a552c4 fix(confidence): cap per-session delta at one graduation tier transition

Tests

2077 tests pass per the prior audit, and ruff is clean. No regressions in the tight test set for self_improvement / self_healing.

Why now

Every ablation result was technically suspect while gap #1 was open (different threshold in self_improvement vs rule_engine). Gap #4 meant confidence scores were inflated by an unknown amount. Both are correctness-level fixes that should land before any further threshold tuning.

Co-Authored-By: Gradata noreply@gradata.ai

Gradata added 4 commits April 14, 2026 15:34
fix(confidence): align RULE_THRESHOLD with injection floor at 0.90

Audit finding gap-analysis/01-internal-audit.md #1.1: RULE_THRESHOLD
was 0.80 while validate_assumptions in rule_engine.py enforces a 0.90
floor at injection time. Any lesson graduated in 0.80-0.89 was silently
blocked from ever firing — non-deterministic behaviour depending on
which gate ran.

Plan:
- Raise RULE_THRESHOLD to 0.90 (injection floor wins; it is the gate
  that governs what actually reaches the model).
- Grep SDK for 0.80/0.90 to catch stale references. self_healing.py's
  DEFAULT_MIN_CONFIDENCE is a distinct detection threshold (catch rules
  whose confidence recently dipped after a correction-cycle penalty);
  left at 0.80 with an explanatory comment.
- Confirm existing tests hold: test_safety_assertion, test_enhancements,
  test_clb, test_spec_compliance range all pass at 0.90.

Adversary check:
- Does 0.90 break pattern->rule promotion tests? No: surviving lessons
  at 0.85 + fsrs_bonus still cross 0.90 in one step (confirmed).
- Does raising the threshold break rule_engine tests? No: rule_engine
  already hard-codes 0.90.
- Does self_healing at 0.80 conflict with the stated goal? Distinct
  concern: it detects rules that *should have* fired, not rules that
  *can* be injected. Documented inline.
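
The mismatch this commit closes can be illustrated with a minimal sketch. Only the 0.80 and 0.90 values come from the commit message; the function names `can_graduate`, `can_inject`, and `is_stranded` are hypothetical stand-ins, not the SDK's actual API.

```python
# Hypothetical sketch; only the 0.80 / 0.90 values come from the commit message.
OLD_RULE_THRESHOLD = 0.80  # graduation gate before this fix
NEW_RULE_THRESHOLD = 0.90  # graduation gate after this fix
INJECTION_FLOOR = 0.90     # floor enforced at injection time


def can_graduate(confidence: float, rule_threshold: float) -> bool:
    """Graduation gate: promote to RULE at or above the threshold."""
    return confidence >= rule_threshold


def can_inject(confidence: float) -> bool:
    """Injection gate: only rules at or above the floor reach the model."""
    return confidence >= INJECTION_FLOOR


def is_stranded(confidence: float, rule_threshold: float) -> bool:
    """True when a lesson graduates to RULE but can never be injected."""
    return can_graduate(confidence, rule_threshold) and not can_inject(confidence)


if __name__ == "__main__":
    for threshold in (OLD_RULE_THRESHOLD, NEW_RULE_THRESHOLD):
        print(threshold, is_stranded(0.85, threshold))
```

With both gates at 0.90, no confidence value can pass graduation and then fail injection, which is the property the fix is after.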

fix(confidence): gate survival fire_count on injection evidence

Audit finding gap-analysis/01-internal-audit.md #1.10: the survival
branch in update_confidence unconditionally incremented fire_count,
bypassing the "no promotion from silence" invariant asserted in
graduate() (self_improvement.py:856-861). A lesson that was never
actually injected could reach PATTERN in 3 sessions purely from
survival bonuses — exactly the failure mode the fire-count gate is
supposed to prevent.

Plan:
- Thread injection evidence through update_confidence via a new
  injected_lesson_keys parameter and an existing lesson attribute
  (_was_injected_this_session). Preserves existing callers that
  don't track injection.
- On survival, still apply the confidence bonus (legacy semantics)
  but only increment fire_count when either signal confirms the
  lesson was injected. sessions_since_fire is also gated on the
  same evidence, so silent lessons correctly age toward UNTESTABLE.
- Extend tests/test_safety_assertion.py with 4 cases covering the
  new behaviour, including the Sybil-style "3 silent survivals"
  scenario that the audit flagged as a promotion-from-silence bug.

Adversary check:
- Existing tests that rely on fire_count post-survival? Audited:
  test_clb.py manually increments fire_count ("# simulate rule being
  applied"), test_enhancements pre-seeds fire_count to the threshold.
  Full suite stays green (2070 passed, 23 skipped).
- Does dropping sessions_since_fire reset hurt UNTESTABLE detection?
  No — it strengthens it. Silent lessons now age correctly.
- Is injected_lesson_keys ever authoritative when upstream hooks
  don't wire it? It's opt-in; lessons without evidence default to
  the conservative "no increment" path, which is the fix's intent.
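
A rough sketch of the gating described in this commit, under stated assumptions: the `Lesson` dataclass, the `apply_survival` helper, and the `SURVIVAL_BONUS` value are hypothetical, and incrementing `sessions_since_fire` on the silent path is a guess at what "age toward UNTESTABLE" means. Only `injected_lesson_keys`, `_was_injected_this_session`, `fire_count`, and `sessions_since_fire` are names taken from the commit message.

```python
from dataclasses import dataclass
from typing import Optional, Set

SURVIVAL_BONUS = 0.05  # assumed value, for illustration only


@dataclass
class Lesson:
    """Hypothetical stand-in for the SDK's lesson object."""
    key: str
    confidence: float = 0.40
    fire_count: int = 0
    sessions_since_fire: int = 0
    _was_injected_this_session: bool = False


def apply_survival(lesson: Lesson, injected_lesson_keys: Optional[Set[str]] = None) -> None:
    """Apply a survival update, counting a fire only with injection evidence."""
    # Legacy semantics: the confidence bonus still applies on survival.
    lesson.confidence = min(1.0, lesson.confidence + SURVIVAL_BONUS)

    # Either signal is enough to confirm the lesson was injected.
    injected = lesson._was_injected_this_session or (
        injected_lesson_keys is not None and lesson.key in injected_lesson_keys
    )
    if injected:
        lesson.fire_count += 1
        lesson.sessions_since_fire = 0
    else:
        # No evidence: the lesson stays silent and keeps aging (assumption).
        lesson.sessions_since_fire += 1
```

Callers that pass nothing for `injected_lesson_keys` fall through to the conservative branch, so three silent survivals leave `fire_count` at 0.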

fix(confidence): cap per-step penalty and enforce monotone updates

Audit finding gap-analysis/01-internal-audit.md #1.3: the compound
penalty stack (ACCELERATION 1.5 * streak_mult 1.5 * severity_boost 1.8
* rule_override 1.2 * severity_weight 1.3 * FSRS 1.22) can subtract
~0.63 from a RULE at 0.90 in a single correction. Combined with the
Bayesian blend overwrite that follows, the system oscillates under
alternating corrections — exactly the scenario the streak logic is
supposed to handle.
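
As a quick check of the arithmetic, the six multipliers quoted above can be multiplied out. The dictionary keys below are labels chosen for this sketch, and the base penalty is not stated in the audit finding, so the value derived here is only what the ~0.63 figure would imply if the factors compose multiplicatively.

```python
import math

# Multipliers as quoted in the audit finding; keys are illustrative labels.
multipliers = {
    "acceleration": 1.5,
    "streak_mult": 1.5,
    "severity_boost": 1.8,
    "rule_override": 1.2,
    "severity_weight": 1.3,
    "fsrs": 1.22,
}

stack = math.prod(multipliers.values())  # roughly 7.71
implied_base = 0.63 / stack              # roughly 0.082, inferred, not from the source

print(f"compound multiplier: {stack:.3f}")
print(f"implied base penalty: {implied_base:.4f}")
print(f"RULE at 0.90 after one correction: {0.90 - 0.63:.2f}")
```

A fully stacked correction is therefore nearly eight times the base penalty, which is how a RULE at 0.90 can land near 0.27 after one event.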

Plan:
- Add MAX_PER_STEP_PENALTY = 0.20 constant with an explanatory comment
  justifying the value (one graduation band's worth of confidence,
  matching MAX_PER_SESSION_DELTA so a single correction cannot chain
  two tier demotions in one tick).
- Reconcile the two update paths. Pick ONE rule: FSRS first, then
  Bayesian blend as a second opinion — but clamp the blend result so
  it can never pull in the opposite direction of the current event.
  Penalty event: blend cannot increase confidence. Reinforcement
  event: blend cannot decrease it. This preserves the Bayesian signal
  while making the per-step update strictly monotone.
- Extend tests with TestPenaltyCap: fully-stacked rewrite on a RULE
  stays within MAX_PER_STEP_PENALTY, and 10 alternating corrections
  never oscillate in the wrong direction.

Adversary check:
- Does the cap weaken rewrite-severity signals? No: 0.20 per step is
  still strong — four rewrite corrections drop a 0.90 RULE to 0.10.
- Does the monotone guard silently hide Bayesian disagreement? No —
  blend is still computed and logged via lesson.alpha/beta, only the
  per-tick assignment is direction-clamped. Over time the posterior
  still dominates via blend_w.
- Does clamping to pre-event confidence break Bayesian convergence?
  For a truly stable posterior, pre/post converge — the clamp is a
  no-op. For a divergent one, FSRS leads and Bayesian follows in the
  same tick-direction.
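
The cap and the direction clamp might look roughly like the sketch below. `MAX_PER_STEP_PENALTY` and its value come from the commit message; `apply_penalty`, `monotone_blend`, and the linear blend formula are hypothetical simplifications, and the real code may combine the two steps differently.

```python
MAX_PER_STEP_PENALTY = 0.20  # value from the commit message


def apply_penalty(confidence: float, raw_penalty: float) -> float:
    """Subtract a penalty, capped at MAX_PER_STEP_PENALTY and floored at 0."""
    capped = min(raw_penalty, MAX_PER_STEP_PENALTY)
    return max(0.0, confidence - capped)


def monotone_blend(
    pre: float, post_fsrs: float, bayes: float, blend_w: float, is_penalty: bool
) -> float:
    """Blend the FSRS result with a Bayesian estimate without reversing direction.

    `pre` is the confidence before the event. The linear blend is an assumed form.
    """
    blended = (1.0 - blend_w) * post_fsrs + blend_w * bayes
    if is_penalty:
        # Penalty event: the blend may not end above the pre-event confidence.
        return min(blended, pre)
    # Reinforcement event: the blend may not end below the pre-event confidence.
    return max(blended, pre)


if __name__ == "__main__":
    c = 0.90
    for _ in range(4):
        c = apply_penalty(c, raw_penalty=0.63)  # fully stacked rewrite
    print(round(c, 2))
```

Four capped corrections take a 0.90 RULE down in steps of 0.20, matching the figure in the adversary check above.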

fix(confidence): cap per-session delta at one graduation tier transition

Audit finding gap-analysis/01-internal-audit.md Gap 4 (red-team note):
the Sybil attack. Given +0.12 per rewrite reinforcement, 10 coordinated
same-session corrections push a fresh lesson 0 -> PATTERN -> RULE in
one tick. Fire-count gates do not catch this once a brain has enough
historical fires; the math alone lets confidence chain two tier
transitions inside one update_confidence call.

Plan:
- Add MAX_PER_SESSION_DELTA = 0.30 constant. Value chosen so at most
  ONE graduation tier can be crossed (INSTINCT->PATTERN spans 0.20;
  PATTERN->RULE spans 0.30; INSTINCT->RULE would need 0.50).
- Snapshot _pre_session_confidence and _pre_session_state on first
  entry into the per-lesson loop. The pre-confidence attribute was
  already read by graduate() for its jump warning — this extends the
  existing pattern so update_confidence is the single source of truth.
- Clamp confidence to [pre - delta, pre + delta] AFTER per-correction
  updates, BEFORE the inline promotion loop evaluates thresholds.
- Guard both the inline update_confidence graduation AND the separate
  graduate() function so neither can re-promote a lesson whose tier
  already changed this session. Document the invariant inline.
- Extend tests with the Sybil burst scenario: 10 stacked rewrite
  reinforcements must not promote past ONE tier and must respect
  the absolute delta cap.

Adversary check:
- Does 0.30 make real convergence too slow? A lesson still crosses
  one tier per session when corrections support it. Crossing two
  tiers in one session was never the design intent — graduate()'s
  docstring explicitly says fire-count gates are non-bypassable.
- Does the snapshot leak state across sessions? _pre_session_confidence
  is set on first entry; it already existed as a caller-owned attr
  (see test_safety_assertion::TestConfidenceJumpWarning). It survives
  for the lifetime of the Lesson object which, in the SDK, is
  per-session (lessons are re-parsed from disk each tick).
- Does blocking graduate() promotion for already-transitioned lessons
  cause silent demotions? No — the block is symmetric with the
  original _pre_session_state check and transitions are monotone
  per-tier in graduate() anyway (no double-step code path exists).
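
The per-session clamp can be sketched as follows. `MAX_PER_SESSION_DELTA` and the +0.12 reinforcement are taken from the commit message; `clamp_session` and `run_burst` are hypothetical helpers, and the burst loop is a simplified stand-in for the per-correction updates inside update_confidence.

```python
MAX_PER_SESSION_DELTA = 0.30  # value from the commit message
REINFORCEMENT = 0.12          # per-rewrite reinforcement quoted in the audit finding


def clamp_session(pre_session_confidence: float, confidence: float) -> float:
    """Clamp confidence to [pre - delta, pre + delta], kept within [0, 1]."""
    lo = max(0.0, pre_session_confidence - MAX_PER_SESSION_DELTA)
    hi = min(1.0, pre_session_confidence + MAX_PER_SESSION_DELTA)
    return min(hi, max(lo, confidence))


def run_burst(pre_session_confidence: float, n_corrections: int) -> float:
    """Apply n stacked reinforcements, then clamp before thresholds are checked."""
    confidence = pre_session_confidence
    for _ in range(n_corrections):
        confidence = min(1.0, confidence + REINFORCEMENT)
    return clamp_session(pre_session_confidence, confidence)


if __name__ == "__main__":
    # Ten coordinated reinforcements on a fresh lesson.
    print(round(run_burst(0.0, 10), 2))
```

Because INSTINCT->RULE needs 0.50 of movement and the clamp allows 0.30, a single session can cross at most one tier regardless of how many corrections are stacked.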

coderabbitai Bot commented Apr 15, 2026

📥 Commits

Reviewing files that changed from the base of the PR and between 5fd7215 and 9a552c4.

📒 Files selected for processing (4)
  • src/gradata/enhancements/self_healing.py
  • src/gradata/enhancements/self_improvement.py
  • tests/test_enhancements.py
  • tests/test_safety_assertion.py

@cloudflare-workers-and-pages

Deploying gradata-dashboard with Cloudflare Pages

Latest commit: 9a552c4
Status: ✅  Deploy successful!
Preview URL: https://0c112708.gradata-dashboard.pages.dev
Branch Preview URL: https://worktree-agent-aeb38fc0.gradata-dashboard.pages.dev

Gradata merged commit 393a89d into main Apr 15, 2026
16 checks passed
Gradata deleted the worktree-agent-aeb38fc0 branch April 17, 2026 19:46