Skip to content

feat(corrections): hardening — provenance hash + adversarial blocklist + semantic diff#85

Merged
Gradata merged 3 commits into
mainfrom
feat/correction-hardening-v2
Apr 15, 2026
Merged

feat(corrections): hardening — provenance hash + adversarial blocklist + semantic diff#85
Gradata merged 3 commits into
mainfrom
feat/correction-hardening-v2

Conversation

@Gradata

@Gradata Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner

Summary

Three correction-layer hardening commits (authored 2026-04-14) that were sitting on a stale local main and never got PR'd. Cherry-picked clean onto current main; 2547 tests pass locally.

1. `feat(corrections): provenance hash` (bbb28c7 ← cc53acb)

SHA-256 hash + source-kind classification on every correction. Blocks silent graduation of text pasted from external sources (emails, clipboards, imports) into RULE-state injections. Implements defence #5 from the gap analysis against Greshake et al. 2023 ("Not What You've Signed Up For", arXiv:2302.12173) — LLMs can't reliably distinguish data from instructions, so imperative text pasted into corrections becomes persistent context poisoning once graduated.

Tags added to CORRECTION events: `requires_review:true`, `source_kind:`.

2. `feat(corrections): adversarial-phrase blocklist` (273cbd9 ← b8f1498)

Light-touch prompt-injection defence at ingest time. Scans `draft` and `final` for canonical injection openers ("ignore previous instructions", "jailbreak", "you are now", "system prompt", …). Hits set `requires_review=True` so the approval gate blocks graduation until human promotion.

Flags, does not reject. False-positive cost is one click; false-negative cost is a persistent poisoned RULE.

3. `refactor(diff_engine): semantic + surface edit distance` (4534abc ← b009dc0)

Blends Levenshtein with embedding-cosine distance for severity scoring: `blended = 0.3 · lev_normalized + 0.7 · semantic`. Solves the polarity-flip problem — "helpful" → "helpfully" (low severity) vs "helpful" → "unhelpful" (high severity) have nearly identical Levenshtein distance but opposite semantic distance. Preference-learning grounding: Rafailov et al. 2023 (DPO) treats before/after pairs as preference signal.

Opt-in via `compute_diff(..., use_semantic=True)` or injected `embedder=`. Graceful fallback to surface-only when embedder unavailable.

Merge conflicts resolved

  • `_core.py`: merged `applies_to` tagging (feat(correct): add applies_to= binding token per sim21 trust-crisis scenario #57) with provenance fields — both additive
  • `diff_engine.py`: kept the existing `_analyze_line_opcodes` tuple API (upstream) and layered the semantic-blend logic on top (b009dc0) rather than the attempted helper-function split, which would've referenced `_extract_changed_sections` / `_compute_summary_stats` functions that were never shipped

Test plan

  • `pytest tests/test_diff_engine.py` (21 pass)
  • `pytest tests/` full sweep — 2547 pass, 24 skipped
  • CI green on 3.11 / 3.12 / 3.13
  • CodeRabbit / review

Co-Authored-By: Gradata noreply@gradata.ai

Gradata added 3 commits April 15, 2026 08:39
Add SHA-256 provenance hash + source-context classification to every
correction so text pasted from external sources (emails, clipboards,
arbitrary imports) cannot silently graduate into RULE-state injections.

Implements defence #5 from the gap analysis (red-team A1 indirect prompt
injection). Threat model per Greshake et al. 2023, "Not What You've Signed
Up For" (https://arxiv.org/abs/2302.12173): LLMs cannot reliably distinguish
data from instructions, so any imperative text copied into a correction
becomes persistent context poisoning once graduated.

- new module src/gradata/security/correction_hash.py
  - compute_correction_hash(before, after, source_context) -> 64-char SHA-256
    over length-prefixed canonical payload (collision-resistant for
    concatenation attacks like 'ab'+'c' vs 'a'+'bc')
  - classify_source_context(ctx) -> (source_kind, requires_review); accepts
    dict/str/None; fail-safe default: unknown sources require review
  - build_provenance() one-shot helper
  - SOURCE_USER_EDIT / SOURCE_EXTERNAL_PASTE / SOURCE_UNKNOWN constants +
    alias table (paste, clipboard, imported, untrusted, ...)
- _core.brain_correct attaches provenance_hash, source_kind, requires_review
  to the CORRECTION event data, tags, and return value; escalates
  approval_required=True whenever requires_review=True so the existing
  pending_approvals gate blocks graduation until an explicit promote
  action
- tests/test_correction_hash.py: 30 tests covering determinism, context
  ordering, concat-collision protection, alias normalization, fail-safe
  default, unicode, and end-to-end brain.correct() integration
Light-touch prompt-injection defence at correction-ingest time. Scans
`draft` and `final` for canonical injection openers ("ignore previous
instructions", "jailbreak", "you are now", "system prompt", ...); if any
hit, flags the correction `requires_review=True` so the existing
approval gate blocks graduation until a human promote.

Flag, do not reject — users legitimately write about these concepts when
documenting red-team work or teaching. Cost of a false positive is one
click; cost of a false negative is a persistent poisoned RULE.

References:
- Greshake et al. 2023, "Not What You've Signed Up For": indirect prompt
  injection threat model. https://arxiv.org/abs/2302.12173
- Wallace et al. 2019, "Universal Adversarial Triggers for Attacking and
  Analyzing NLP": transferable trigger sequences.
  https://arxiv.org/abs/1908.07125

- new module src/gradata/security/adversarial_blocklist.py
  - ADVERSARIAL_PHRASES seed list (audit-friendly, <30 entries)
  - scan_for_adversarial_phrases() case-insensitive, whitespace-tolerant
    regex match; returns canonical lowercase hits, dedup, order preserved
  - contains_adversarial_phrases() boolean shortcut
  - scan_correction() scans both draft and final (attacker may land payload
    on either side)
- _core.brain_correct attaches `adversarial_hits` list to event data/tags;
  escalates `requires_review` on any hit (which in turn forces
  approval_required=True via the provenance gate)
- tests/test_adversarial_blocklist.py: 22 tests covering classic openers,
  role hijacks, jailbreak jargon, case-insensitivity, whitespace tolerance,
  dedup, benign-text negatives, and brain.correct() integration
Levenshtein conflates morphological rewrites with polarity flips: "helpful"
-> "helpfully" and "helpful" -> "unhelpful" have near-identical surface edit
distance but opposite semantic severity. Preference-learning lit (Rafailov
et al. 2023, DPO) treats before/after pairs as preference signal, so a
semantic delta on the pair is a principled cheap proxy for severity.

Blend rather than replace: 0.3 * lev_normalized + 0.7 * semantic.
Levenshtein still catches Oliver's high-volume surface-style edits (em
dash, tone); semantic catches the corrections where meaning actually
changed. Weights are configurable per-call; the default is justified in
an inline comment.

- new compute_semantic_distance(before, after, embedder=None) -> float|None
  returns clamped cosine distance in [0,1]; lazy-loads
  sentence-transformers/all-MiniLM-L6-v2 (matches existing _embed.py
  LOCAL_MODEL) with graceful fallback to None if the extra is missing
- new combine_distances(lev, semantic, weights) with validation
- compute_diff() gains `use_semantic`, `embedder`, `surface_weight`,
  `semantic_weight` kwargs; default `use_semantic=False` so every existing
  caller is zero-change
- DiffResult gains `semantic_distance` and `blended_distance` (both
  Optional[float], default None) so the dataclass stays backwards-compatible
- when semantic is available, severity is classified from `blended_distance`;
  otherwise from the existing surface (edit_distance / compression) logic,
  so failures in the embedder never break the pipeline

Perf (local, 384-dim MiniLM on CPU):
- surface-only compute_diff: <0.1 ms/call (unchanged)
- warm compute_diff(use_semantic=True): ~20 ms/call
- cold first call (model load): ~30 s, amortized across process lifetime

Callers that run correct() in a hot loop should either pass a cached
embedder or leave use_semantic=False. Opt-in by design.

- tests/test_diff_engine.py: 16 new tests covering identical/orthogonal/
  opposite vectors, morphology-vs-polarity signal, weight validation,
  backwards-compat default, monkeypatched fallback when dep is missing,
  and custom weight propagation. Uses injected fake embedder so tests
  don't depend on sentence-transformers.

@greptile-apps greptile-apps Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gradata has reached the 50-review limit for trial accounts. To continue receiving code reviews, upgrade your plan.

@coderabbitai

coderabbitai Bot commented Apr 15, 2026

Copy link
Copy Markdown
📝 Walkthrough
  • Provenance hash for corrections: Adds SHA-256 provenance hashing and source-kind classification (user_edit, external_paste, unknown) to prevent silent graduation of externally pasted imperative text; marks corrections with requires_review:true and source_kind:<kind> tags.

  • Adversarial phrase blocklist: Implements lightweight ingest-time scanning for canonical prompt-injection openers (e.g., "ignore previous instructions", "jailbreak", "you are now"); flags matching corrections by setting requires_review=True without rejection.

  • Semantic diff engine: Enhances diff severity computation with embedding-based cosine distance, blending surface-level (Levenshtein) and semantic distances (0.3·surface + 0.7·semantic by default) to better capture meaning changes vs. morphological edits; gracefully falls back to surface-only when embedder unavailable.

  • New public APIs in diff_engine.py: compute_semantic_distance(), combine_distances(), and extended compute_diff() signature with optional use_semantic, embedder, surface_weight, and semantic_weight parameters; DiffResult gains semantic_distance and blended_distance optional fields.

  • New public APIs in security modules: correction_hash.py exports compute_correction_hash(), classify_source_context(), build_provenance(); adversarial_blocklist.py exports ADVERSARIAL_PHRASES, scan_for_adversarial_phrases(), contains_adversarial_phrases(), scan_correction().

  • Integration into core correction flow: brain_correct() now computes and attaches provenance metadata, runs adversarial scanning, and escalates approval_required when requires_review=True.

  • Comprehensive test coverage: 2,547 tests pass locally (24 skipped); new test modules validate hash determinism, source classification, adversarial detection, and semantic distance blending; integration tests confirm end-to-end behavior in correction workflows.

Walkthrough

This pull request introduces correction provenance metadata and adversarial phrase detection to the brain_correct function, computes source classification and review flags, and extends the diff engine with optional semantic-distance computation via embeddings for improved severity assessment.

Changes

Cohort / File(s) Summary
Provenance & Adversarial Security
src/gradata/security/adversarial_blocklist.py, src/gradata/security/correction_hash.py
New modules for lightweight adversarial-phrase detection (regex-based, whitespace-tolerant) and correction-hash provenance (SHA-256 hashing, source classification, review gating). Phrase detection scans both before/after text with deduplication; source classification maps external paste and unknown sources to review-required status.
Security Module Exports
src/gradata/security/__init__.py
Re-exports adversarial-phrase and correction-hash utilities (ADVERSARIAL_PHRASES, scan_correction, build_provenance, classify_source_context, etc.) into the security package namespace.
Core Correction Logic
src/gradata/_core.py
Integrates provenance computation and adversarial scanning into brain_correct: computes provenance_hash, source_kind, and requires_review flag; scans for adversarial phrases and forces review if hits detected; escalates approval_required when review is needed; emits provenance fields and tags in correction data and event payload.
Semantic Distance Enhancements
src/gradata/enhancements/diff_engine.py
Extends DiffResult with optional semantic_distance and blended_distance fields; adds compute_semantic_distance (embedder-based cosine distance), combine_distances (weighted blending), and optional semantic parameters to compute_diff with graceful fallback when embedder unavailable.
Adversarial Blocklist Tests
tests/test_adversarial_blocklist.py
Comprehensive test coverage for phrase detection (case-insensitivity, whitespace tolerance, deduplication), scan_correction behavior, blocklist size constraints, and integration with brain-correction workflow (verifies requires_review escalation and adversarial_phrase:true tags).
Correction Hash Tests
tests/test_correction_hash.py
Validates hash determinism, context classification (source mapping, review gating), collision prevention, and integration with brain-correction pipeline; confirms backward compatibility and enforces review for unknown/external sources.
Diff Engine Tests
tests/test_diff_engine.py
Adds tests for semantic-distance computation (via fake embedder), weight-based distance blending, backward compatibility (semantic fields None by default), and graceful fallback when embedder unavailable.

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant brain_correct
    participant ProvBuilder as build_provenance
    participant Adversarial as scan_correction
    participant Graduation
    
    Caller->>brain_correct: call with draft, final, context
    activate brain_correct
    
    brain_correct->>ProvBuilder: compute_correction_hash, classify_source_context
    activate ProvBuilder
    ProvBuilder-->>brain_correct: provenance_hash, source_kind, requires_review
    deactivate ProvBuilder
    
    brain_correct->>Adversarial: scan_correction(before, after)
    activate Adversarial
    Adversarial-->>brain_correct: adversarial_hits[]
    deactivate Adversarial
    
    alt adversarial_hits present
        brain_correct->>brain_correct: escalate requires_review=True
    end
    
    alt requires_review=True
        brain_correct->>brain_correct: escalate approval_required
        brain_correct->>brain_correct: emit tags: requires_review:true, source_kind:..., adversarial_phrase:true
    end
    
    brain_correct-->>Caller: emit correction event with provenance, adversarial_hits
    deactivate brain_correct
    
    Caller->>Graduation: graduation phase
    activate Graduation
    Graduation->>Graduation: check approval_required (untrusted if requires_review)
    deactivate Graduation
Loading
sequenceDiagram
    participant Caller
    participant compute_diff
    participant Surface as Surface Metric
    participant Embedder
    participant Blend as combine_distances
    
    Caller->>compute_diff: compute_diff(draft, final, use_semantic, embedder)
    activate compute_diff
    
    compute_diff->>Surface: compute surface distance (edit/compression)
    activate Surface
    Surface-->>compute_diff: surface_distance
    deactivate Surface
    
    alt use_semantic=True or embedder provided
        compute_diff->>Embedder: compute_semantic_distance(draft, final)
        activate Embedder
        alt embedder available
            Embedder-->>compute_diff: semantic_distance (cosine)
        else embedder unavailable
            Embedder-->>compute_diff: None (fallback)
        end
        deactivate Embedder
        
        alt semantic_distance computed
            compute_diff->>Blend: combine_distances(surface, semantic, weights)
            activate Blend
            Blend-->>compute_diff: blended_distance
            deactivate Blend
            compute_diff->>compute_diff: severity from blended_distance
        else semantic unavailable
            compute_diff->>compute_diff: severity from surface only
        end
    else semantic disabled
        compute_diff->>compute_diff: severity from surface only, semantic fields=None
    end
    
    compute_diff-->>Caller: DiffResult(severity, semantic_distance?, blended_distance?)
    deactivate compute_diff
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Possibly related PRs

Suggested labels

feature, security

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 32.56% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and concisely summarizes the three main changes: provenance hashing, adversarial blocklist, and semantic diff enhancements for correction-layer hardening.
Description check ✅ Passed The description provides detailed context for all changes, including security rationale, implementation details, test results, and merge conflict resolution notes directly related to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/correction-hardening-v2

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/test_diff_engine.py`:
- Around line 131-155: Replace direct absolute-difference float checks in
TestCombineDistances with pytest.approx to follow test guidelines: update
assertions in test_semantic_dominates_default, test_weights_configurable,
test_default_weights_sum_to_one and any other comparisons using abs(... ) < 1e-6
to use pytest.approx (e.g., blended == pytest.approx(0.7),
DEFAULT_SURFACE_WEIGHT + DEFAULT_SEMANTIC_WEIGHT == pytest.approx(1.0)); keep
the ValueError check and equality checks that are exact (like
combine_distances(0.0,0.0) == 0.0) as-is but apply pytest.approx for all
floating comparisons referencing combine_distances, DEFAULT_SURFACE_WEIGHT,
DEFAULT_SEMANTIC_WEIGHT, and the blended variables.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 73c73f33-da56-4c0d-a454-4a9b0ea043a8

📥 Commits

Reviewing files that changed from the base of the PR and between f892871 and 4534abc.

📒 Files selected for processing (8)
  • src/gradata/_core.py
  • src/gradata/enhancements/diff_engine.py
  • src/gradata/security/__init__.py
  • src/gradata/security/adversarial_blocklist.py
  • src/gradata/security/correction_hash.py
  • tests/test_adversarial_blocklist.py
  • tests/test_correction_hash.py
  • tests/test_diff_engine.py
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Python 3.12
  • GitHub Check: Cloudflare Pages
🧰 Additional context used
📓 Path-based instructions (2)
src/gradata/**/*.py

⚙️ CodeRabbit configuration file

src/gradata/**/*.py: This is the core SDK. Check for: type safety (from future import annotations required), no print()
statements (use logging), all functions accepting BrainContext where DB access occurs, no hardcoded paths. Severity
scoring must clamp to [0,1]. Confidence values must be in [0.0, 1.0].

Files:

  • src/gradata/security/__init__.py
  • src/gradata/_core.py
  • src/gradata/security/adversarial_blocklist.py
  • src/gradata/security/correction_hash.py
  • src/gradata/enhancements/diff_engine.py
tests/**

⚙️ CodeRabbit configuration file

tests/**: Test files. Verify: no hardcoded paths, assertions check specific values not just truthiness,
parametrized tests preferred for boundary conditions, floating point comparisons use pytest.approx.

Files:

  • tests/test_adversarial_blocklist.py
  • tests/test_diff_engine.py
  • tests/test_correction_hash.py
🔇 Additional comments (29)
src/gradata/security/__init__.py (1)

5-58: LGTM!

The re-exports are well-organized with alphabetically sorted __all__ entries. The new security utilities (adversarial_blocklist and correction_hash) are properly exposed through the package's public API.

src/gradata/_core.py (3)

176-193: LGTM! Good fail-safe design.

The defensive try/except with fallback to requires_review=True and source_kind="unknown" ensures that provenance computation failures don't silently allow untrusted corrections to graduate. This aligns with the security-first approach described in the threat model.


195-213: LGTM! Proper layered defense integration.

The adversarial scan correctly:

  1. Initializes adversarial_hits before the try block (avoiding NameError on exception)
  2. Only sets requires_review=True when hits are found and review wasn't already required
  3. Escalates approval_required when requires_review is true, ensuring the existing approval gate handles both provenance and adversarial concerns

215-256: LGTM!

The new provenance and adversarial metadata is consistently propagated through the data payload, tags, and event object. The conditional tag additions (lines 240-245) correctly avoid adding empty or false-value tags.

src/gradata/security/adversarial_blocklist.py (4)

46-96: LGTM!

The phrase list and pattern compilation are well-designed:

  • re.escape() prevents regex injection vulnerabilities
  • \s+ allows flexible whitespace matching for evasion resistance
  • Module-level compilation ensures the pattern is built once at import

99-117: LGTM!

The scan function handles edge cases well:

  • Gracefully returns [] for empty/None input
  • Preserves first-occurrence order while deduplicating
  • Normalizes whitespace for consistent canonical forms

120-124: LGTM!

Efficient boolean shortcut that avoids the overhead of collecting all matches when only presence detection is needed.


127-142: LGTM!

Good design decision to scan both the before and after text, as noted in the docstring — an attacker could paste injected content as the draft and lightly edit to produce the final.

tests/test_adversarial_blocklist.py (5)

23-78: LGTM!

Comprehensive test coverage including:

  • Parametrized tests for case/whitespace variants
  • Detection of different phrase categories
  • Deduplication behavior
  • Order preservation
  • Edge cases (empty, None, benign text)

81-90: LGTM!

Appropriate coverage for the boolean shortcut function with explicit is True/is False assertions.


93-114: LGTM!

Good coverage of the scan_correction function including both-sides scanning and cross-side deduplication.


117-123: LGTM!

Good sanity checks ensuring the phrase list stays auditable and maintains the lowercase canonical invariant.


148-161: No action needed. The test is complete and contains the required assertion on line 161.

The test test_benign_correction_not_flagged_on_blocklist already includes assert event["data"]["requires_review"] is False as shown in your own code snippet. The test properly verifies both that adversarial hits are empty and that the requires_review flag remains false. It follows the coding guidelines by using the tmp_path fixture and checking specific values rather than just truthiness.

			> Likely an incorrect or invalid review comment.
tests/test_diff_engine.py (3)

18-63: LGTM!

Existing test class with no material changes.


70-128: LGTM!

Good test design with the _fake_embedder helper enabling fast, deterministic tests without the sentence-transformers dependency. The test cases cover the key semantic distance behaviors including edge cases (zero vectors, opposite vectors clamping).


158-221: LGTM!

Excellent test coverage for the semantic diff feature including:

  • Backwards compatibility verification
  • Embedder injection
  • Graceful fallback when dependency unavailable
  • Semantic flip vs morphology distinction
  • Weight propagation
src/gradata/security/correction_hash.py (4)

32-65: LGTM!

Well-designed source vocabulary with comprehensive aliases and fail-safe default to unknown for unrecognized sources.


68-83: LGTM!

Robust canonicalization with deterministic JSON serialization (sort_keys=True) and graceful fallback for non-serializable values.


86-117: LGTM!

Secure content-addressed hashing with proper collision resistance via length-prefixing and null-byte separators. The format {len}:{content}\x00 ensures that different concatenations of the same total characters produce different hashes.


120-181: LGTM!

Well-designed fail-safe classification:

  • Backwards compatible: missing source defaults to user_edit (no review tax for existing callers)
  • Thorough normalization handles case, hyphens, and spaces
  • Priority key lookup (source_kind > source > origin) provides flexibility
  • Unknown sources require review (attackers can't bypass by inventing source names)
src/gradata/enhancements/diff_engine.py (5)

91-92: LGTM!

Optional fields with None defaults ensure backwards compatibility for callers not using semantic features.


255-291: LGTM!

Good lazy-loading pattern with:

  • Global cache to avoid repeated model loads
  • Graceful degradation on missing dependency or load failure
  • Debug-level logging for diagnostics without cluttering output
  • Conversion from numpy arrays to plain Python lists for stdlib compatibility

294-309: LGTM!

Correct cosine distance computation with proper handling of zero-norm vectors. The strict=False parameter in zip() requires Python 3.10+, which aligns with the project's Python 3.11+ requirement noted in the skill export.


312-346: LGTM!

Proper semantic distance computation with:

  • Graceful fallback to None when embedder unavailable
  • Clamping to [0.0, 1.0] as required by coding guidelines
  • Edge case handling for empty strings and embedder failures

349-508: LGTM!

Well-structured implementation with:

  • Weight validation enforcing sum-to-one constraint
  • Proper clamping in combine_distances to [0.0, 1.0]
  • Clean fallback path when semantic embedding fails
  • Consistent severity classification from either blended or surface distance
tests/test_correction_hash.py (4)

20-70: LGTM!

Comprehensive hash function tests covering:

  • Determinism
  • Sensitivity to all input components
  • Dictionary key order independence
  • Length-prefix collision resistance
  • Edge cases (empty, None, unicode)

73-134: LGTM!

Thorough classification tests including:

  • Backwards compatibility (None/empty → user_edit, no review)
  • Alias mapping
  • Fail-safe behavior (unknown → requires review)
  • Case insensitivity
  • Alternate dict keys (source_kind, origin)

137-199: LGTM!

Excellent integration tests verifying the full pipeline behavior:

  • User edits pass through without review gate
  • External pastes are flagged
  • Backwards compatibility for callers without source context
  • Fail-safe against attackers using unrecognized source names

202-227: LGTM!

Good coverage of build_provenance including:

  • Output shape verification
  • Attack bypass prevention (Line 215-222)
  • Hash stability guarantee

Comment thread tests/test_diff_engine.py
Comment on lines +131 to +155
class TestCombineDistances:
def test_default_weights_sum_to_one(self):
assert abs(DEFAULT_SURFACE_WEIGHT + DEFAULT_SEMANTIC_WEIGHT - 1.0) < 1e-6

def test_blend_identical_is_zero(self):
assert combine_distances(0.0, 0.0) == 0.0

def test_blend_both_max_is_one(self):
assert combine_distances(1.0, 1.0) == 1.0

def test_semantic_dominates_default(self):
"""Default 0.7 weight on semantic → semantic=1, surface=0 → 0.7 blend."""
blended = combine_distances(0.0, 1.0)
assert abs(blended - 0.7) < 1e-6

def test_weights_configurable(self):
blended = combine_distances(
1.0, 0.0,
surface_weight=0.5, semantic_weight=0.5,
)
assert abs(blended - 0.5) < 1e-6

def test_weights_must_sum_to_one(self):
with pytest.raises(ValueError):
combine_distances(0.5, 0.5, surface_weight=0.3, semantic_weight=0.3)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick | 🔵 Trivial

LGTM!

Good coverage of combine_distances including the weight validation.

Minor suggestion: Consider using pytest.approx for floating-point comparisons for consistency with test guidelines (e.g., assert blended == pytest.approx(0.7)), though the current abs() < 1e-6 pattern is functionally correct.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_diff_engine.py` around lines 131 - 155, Replace direct
absolute-difference float checks in TestCombineDistances with pytest.approx to
follow test guidelines: update assertions in test_semantic_dominates_default,
test_weights_configurable, test_default_weights_sum_to_one and any other
comparisons using abs(... ) < 1e-6 to use pytest.approx (e.g., blended ==
pytest.approx(0.7), DEFAULT_SURFACE_WEIGHT + DEFAULT_SEMANTIC_WEIGHT ==
pytest.approx(1.0)); keep the ValueError check and equality checks that are
exact (like combine_distances(0.0,0.0) == 0.0) as-is but apply pytest.approx for
all floating comparisons referencing combine_distances, DEFAULT_SURFACE_WEIGHT,
DEFAULT_SEMANTIC_WEIGHT, and the blended variables.

@Gradata

Gradata commented Apr 15, 2026

Copy link
Copy Markdown
Owner Author

CI note — SDK Test (Python 3.11) isolated failure

`Test (Python 3.11)` fails on `tests/test_rule_to_hook.py::TestRuleToHookEvents::test_emits_installed_event_on_success` with "self-test did not block positive example: 'hello — world'".

Not caused by these commits:

  • Passes on 3.12 and 3.13 in the same workflow
  • Passes locally on 3.12 (Windows)
  • Passes locally in isolation and in combined file ordering
  • Neither `rule_to_hook.py` nor `test_rule_to_hook.py` are touched by any of the 3 commits
  • `classify_rule` / `try_generate` don't import `diff_engine` or `security/`
  • Prior SDK Test runs on main for 3.11 are green — but this is the first PR since several others merged overnight, so the runner environment may have shifted (new dep versions, etc.)

Reading as an isolated 3.11-specific environmental issue (likely em-dash encoding under Linux-Py3.11 after a transitive dep update). All other checks pass. Merging with admin override and tracking as follow-up.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant