Skip to content

feat: Add diff module that provides compute_diff, apply_diff_*, DiffSet#111

Merged
snawaz merged 5 commits into
mainfrom
snawaz/diff-module
Oct 31, 2025
Merged

feat: Add diff module that provides compute_diff, apply_diff_*, DiffSet#111
snawaz merged 5 commits into
mainfrom
snawaz/diff-module

Conversation

@snawaz
Copy link
Copy Markdown
Collaborator

@snawaz snawaz commented Oct 28, 2025

It is the first PR in the current stack.

This PR implements diff module, used by the next PR #110.

More specifically, this PR implements following algorithms and types:

  • compute_diff() identifies and serializes only the changed portions of account data.
  • apply_diff() applies the diff to the account data, effectively updating it.
  • DiffSet provides a zero-copy deserialized view of the diff for efficient in-memory processing. I used it to implement apply_diff succinctly.

Diff Format:

  • The diff module outlines the diff-format with examples.
  • The diff format is optimized to handle multiple non-contiguous changes within an account by storing offset pairs that indicate where each change exists within the diff and where it should be applied within the account, followed by the concatenated diff bytes.

Summary by CodeRabbit

  • New Features

    • Added a binary multi-slice diff system to compute, serialize, inspect, and apply diffs; supports in-place and copy-based application and handles size expansions/shrinks.
    • Public APIs to create and traverse diff payloads and detect size changes.
  • Improvements

    • Stricter header parsing, alignment and bounds validation with clearer error reporting for invalid or misaligned diffs.
    • Exposed diff APIs at the crate root for easier access.
  • Tests

    • Added tests for no-change, example encodings, and large random-data round-trips.

Copy link
Copy Markdown
Collaborator Author

snawaz commented Oct 28, 2025

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Oct 28, 2025

Walkthrough

Adds a binary diff subsystem: new diff::algorithm and diff::types modules, a DiffSet parser with alignment and bounds validation, public compute/apply diff APIs (in-place and copy), crate re-exports, two new DlpError variants, added dependencies, and unit tests.

Changes

Cohort / File(s) Summary
Module wiring & crate exports
src/diff/mod.rs, src/lib.rs
Declares mod algorithm; and mod types;, re-exports algorithm::* and types::* from src/diff/mod.rs, and re-exports diff::* at crate root in src/lib.rs.
Diff algorithm implementation
src/diff/algorithm.rs
New module implementing pub fn compute_diff(original: &[u8], changed: &[u8]) -> AlignedVec, pub fn detect_size_change(original: &[u8], diff: &DiffSet<'_>) -> Option<SizeChanged>, pub fn apply_diff_in_place(original: &mut [u8], diffset: &DiffSet<'_>) -> Result<(), ProgramError>, and pub fn apply_diff_copy(original: &[u8], diffset: &DiffSet<'_>) -> Vec<u8>. Scans for contiguous differing regions, serializes header (changed length, slice count, per-slice offset pairs) followed by concatenated diff bytes, handles expansion/shrink semantics, includes apply_diff_impl and unit tests.
Diff types & binary layout
src/diff/types.rs
New module defining pub enum SizeChanged { Expanded(usize), Shrunk(usize) }, #[repr(C)] pub struct OffsetPair, pub type OffsetInData = Range<usize>, header-size constants, and pub struct DiffSet<'a> with pub fn try_new(diff: &'a [u8]) -> Result<Self, ProgramError> plus accessors (raw_diff, changed_len, segments_count, offset_pairs, diff_segment_at, iter). Implements alignment and bounds validation and unsafe parsing of header and segments.
Errors
src/error.rs
Adds DlpError::InvalidDiff = 15 ("Invalid diff passed to DiffSet::try_new") and DlpError::InvalidDiffAlignment = 16 ("Diff is not properly aligned").
Cargo manifest
Cargo.toml
Adds dependencies rkyv = "0.7.45" and static_assertions = "1.1.0".

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Caller
    participant Algorithm
    Caller->>Algorithm: compute_diff(original, changed)
    Algorithm->>Algorithm: scan for contiguous differing regions
    Algorithm->>Algorithm: build header (changed_len, num_slices) + offset pairs
    Algorithm->>Algorithm: concatenate diff bytes (include trailing expansion if present)
    Algorithm-->>Caller: AlignedVec (serialized diff bytes)
Loading
sequenceDiagram
    autonumber
    participant Caller
    participant Algorithm
    participant DiffSet

    Caller->>Algorithm: detect_size_change(original, diff_bytes)
    Algorithm->>DiffSet: DiffSet::try_new(diff_bytes)
    alt valid diff
        DiffSet-->>Algorithm: parsed header & offset pairs
        Algorithm-->>Caller: Option<SizeChanged> (None/Expanded/Shrunk)
    else invalid diff
        DiffSet-->>Algorithm: Err(ProgramError)
        Algorithm-->>Caller: Err propagated
    end

    Caller->>Algorithm: apply_diff_in_place(buf, diff_bytes)
    Algorithm->>DiffSet: DiffSet::try_new(diff_bytes)
    alt sizes match
        Algorithm->>Algorithm: apply_diff_impl -> copy each slice into buf
        Algorithm-->>Caller: Ok(())
    else size mismatch
        Algorithm-->>Caller: Err(ProgramError)
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Review focus:
    • src/diff/types.rs: unsafe pointer casts, alignment checks, header/bounds math in DiffSet::try_new and diff_segment_at.
    • src/diff/algorithm.rs: verify serialization layout (endianness, header ordering) matches DiffSet parsing; contiguous-slice detection and expansion/shrink semantics.
    • src/diff/algorithm.rs tests: round-trip, edge cases (no-change, full replace, multiple small slices), and reproducible random tests.
    • src/error.rs: ensure new error variants map correctly to ProgramError.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Description Check ❓ Inconclusive The PR description provides substantive and specific information about the implementation, explaining the purpose, key functions, and diff format design. However, it significantly deviates from the repository's template structure by omitting the status/type/issue table, explicit Problem/Solution sections, and most critically, the Deploy Notes section. The description fails to document the new dependencies added to Cargo.toml (rkyv 0.7.45 and static_assertions 1.1.0), which should be recorded in Deploy Notes for deployment awareness. While the content is clear and helpful, the structural deviation and incomplete deploy notes make it unclear whether the PR meets the repository's documentation expectations.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The PR title "feat: Add diff module that provides compute_diff, apply_diff_*, DiffSet" directly and clearly summarizes the main change: the introduction of a new diff module with key functions and types. It follows conventional commit format, is concise without unnecessary noise, and specifically identifies the primary additions that make up the changeset. The title enables a reviewer scanning history to immediately understand that a new diff module with these core APIs is being introduced.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch snawaz/diff-module

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5304cc4 and 4af1227.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (6)
  • Cargo.toml (1 hunks)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/mod.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
  • src/error.rs (1 hunks)
  • src/lib.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-29T09:28:50.282Z
Learnt from: snawaz
Repo: magicblock-labs/delegation-program PR: 111
File: src/diff/algorithm.rs:1-1
Timestamp: 2025-10-29T09:28:50.282Z
Learning: The magicblock-labs/delegation-program project uses std throughout the codebase and no_std compatibility is not a goal. Do not suggest using core instead of std.

Applied to files:

  • Cargo.toml
  • src/diff/algorithm.rs
📚 Learning: 2025-10-11T18:00:22.232Z
Learnt from: snawaz
Repo: magicblock-labs/delegation-program PR: 94
File: src/lib.rs:93-103
Timestamp: 2025-10-11T18:00:22.232Z
Learning: When reviewing Rust code, the `TryFromPrimitive` derive macro from the `num_enum` crate automatically generates `TryFrom<u8>` (and other primitive type conversions) for enums marked with `#[repr(u8)]`. Don't flag missing `TryFrom<u8>` implementations when this derive macro is present.

Applied to files:

  • src/diff/types.rs
📚 Learning: 2025-10-29T09:47:34.076Z
Learnt from: Dodecahedr0x
Repo: magicblock-labs/delegation-program PR: 90
File: src/instruction_builder/set_fees_receiver.rs:25-29
Timestamp: 2025-10-29T09:47:34.076Z
Learning: In instruction builder functions (files in src/instruction_builder/), using `.unwrap()` on serialization operations like `to_vec()` is standard practice and does not need to be replaced with `.expect()`.

Applied to files:

  • src/diff/types.rs
📚 Learning: 2025-10-28T18:27:42.542Z
Learnt from: snawaz
Repo: magicblock-labs/delegation-program PR: 111
File: src/diff/types.rs:87-102
Timestamp: 2025-10-28T18:27:42.542Z
Learning: In Solana runtime, `account_info.data()` is guaranteed to have 8-byte alignment.

Applied to files:

  • src/diff/types.rs
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (1)
  • try_new (45-104)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (14)
src/diff/mod.rs (1)

1-5: LGTM!

The module structure is clean and appropriately organizes the diff subsystem into algorithm and types modules with proper re-exports.

src/error.rs (1)

38-41: LGTM!

The new error variants InvalidDiff and InvalidDiffAlignment are well-defined with clear messages and integrate cleanly into the existing error handling framework.

src/diff/algorithm.rs (6)

10-183: LGTM!

The compute_diff function is well-documented with comprehensive format specification and examples. The implementation correctly identifies contiguous diff regions, handles size changes, and serializes according to the documented format.


185-195: LGTM!

The detect_size_change function uses idiomatic Rust with match and cmp for clear size comparison logic.


197-206: LGTM!

The apply_diff_in_place function appropriately rejects size changes and delegates to the implementation. The precondition check ensures safe in-place modification.


208-230: LGTM!

The apply_diff_copy function correctly handles all size-change scenarios (expansion, shrinking, no change) with appropriate vector allocation.


232-239: LGTM!

The apply_diff_impl function cleanly iterates over diff segments and applies them, with proper error propagation from DiffSet::iter().


241-404: LGTM!

The test suite provides excellent coverage with no-change, small example, and large random data scenarios. The use of a printed seed for the random test ensures reproducibility.

src/diff/types.rs (5)

1-33: LGTM!

The type definitions are well-structured with #[repr(C)] ensuring stable layout for OffsetPair, and compile-time assertions validating alignment and size assumptions.


35-42: LGTM!

The DiffSet structure appropriately uses a raw pointer for zero-copy parsing with proper lifetime management through the 'a parameter.


44-104: Comprehensive validation in try_new.

The constructor performs thorough validation including minimum length, alignment, and header consistency checks, making it safe to use the raw pointer accessors in subsequent methods.


106-126: LGTM!

The accessor methods correctly leverage the invariants established by try_new, safely exposing the parsed diff components.


128-180: LGTM!

The diff_segment_at method includes comprehensive validation of segment bounds and offsets, preventing out-of-bounds access. The iter method provides a clean interface for processing all segments with proper error propagation.

Cargo.toml (1)

50-50: No action needed—static_assertions is already at the latest version.

The latest stable version of static_assertions is 1.1.0, which matches the version specified in Cargo.toml. The dependency is current.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@snawaz snawaz force-pushed the snawaz/diff-module branch 4 times, most recently from 739d076 to 9d76d38 Compare October 28, 2025 10:18
@snawaz snawaz marked this pull request as ready for review October 28, 2025 14:53
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 16

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6689bdf and 9d76d38.

📒 Files selected for processing (4)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/mod.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
  • src/lib.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (1)
  • new (31-37)
🪛 GitHub Actions: Run Tests
src/diff/types.rs

[error] 3-3: importing legacy numeric constants. Remove 'use std::usize;'

src/diff/algorithm.rs

[error] 37-37: doc list item without indentation. Clippy suggests indenting the continuation line.


[error] 51-52: doc list item without indentation on lines 51-52. Clippy suggests indenting the continuation lines.


[error] 119-126: if chain can be rewritten with match. Consider using cmp/match instead of a long if/else chain.


[error] 160-166: if chain can be rewritten with match. Consider using cmp/match instead of a long if/else chain.

🔇 Additional comments (1)
src/diff/mod.rs (1)

1-5: Module wiring looks good.

Private submodules with public re-exports keep the surface clean. No issues.

Comment thread src/diff/algorithm.rs Outdated
Comment thread src/diff/algorithm.rs Outdated
Comment thread src/diff/algorithm.rs
Comment thread src/diff/algorithm.rs Outdated
Comment thread src/diff/algorithm.rs
Comment thread src/diff/types.rs
Comment thread src/diff/types.rs
Comment thread src/diff/types.rs Outdated
Comment thread src/diff/types.rs Outdated
Comment thread src/lib.rs
@snawaz snawaz force-pushed the snawaz/diff-module branch from 9d76d38 to a642c6f Compare October 28, 2025 16:05
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (11)
src/lib.rs (1)

19-23: Confirm crate‑root re‑export scope.

pub use diff::*; flattens the diff API at the crate root. If long‑term ergonomics trump namespacing, fine; else prefer pub mod diff; or selective re‑exports.

src/diff/types.rs (5)

1-3: no_std import hygiene: use core, not std.

Switch PhantomData import to core for no_std compatibility.

-use core::slice;
-use std::marker::PhantomData;
+use core::slice;
+use core::marker::PhantomData;

10-15: Document endianness and u32 limits on offsets.

Add a note that fields are little‑endian u32 and must fit in u32; document at producers.

 #[repr(C)]
 #[derive(Debug, Clone, Copy)]
 pub struct OffsetPair {
     pub offset_in_diff: u32,
     pub offset_in_data: u32,
 }
+/// Offsets are serialized as little‑endian u32.
+/// Producers must ensure all offsets/lengths fit in `u32` (<= 4_294_967_295).

49-55: *Creating a &[OffsetPair] from const u8 is UB (alignment). Replace with iterator.

Do not form a typed slice from a byte pointer; yield pairs via unaligned reads.

-    /// Returns the offset pairs
-    pub fn offset_pairs(&self) -> &'a [OffsetPair] {
-        unsafe {
-            let pairs_ptr = self.buf.add(SIZE_OF_CHANGED_LEN + SIZE_OF_NUM_OFFSET_PAIRS);
-            slice::from_raw_parts(pairs_ptr as *const OffsetPair, self.num_offset_pairs())
-        }
-    }
+    /// Iterates offset pairs without assuming alignment.
+    pub fn iter_offset_pairs(&'a self) -> impl Iterator<Item = OffsetPair> + 'a {
+        let base = SIZE_OF_CHANGED_LEN + SIZE_OF_NUM_OFFSET_PAIRS;
+        let count = self.num_offset_pairs();
+        (0..count).map(move |i| {
+            let p = unsafe { self.buf.add(base + i * SIZE_OF_SINGLE_OFFSET_PAIR) };
+            let in_diff = unsafe { (p as *const u32).read_unaligned() };
+            let in_data = unsafe { (p.add(4) as *const u32).read_unaligned() };
+            OffsetPair { offset_in_diff: in_diff, offset_in_data: in_data }
+        })
+    }

38-47: Unaligned header reads without length guards (UB risk).

Reading u32 from a u8 buffer assumes alignment and sufficient length. Use unaligned reads and guard sizes.

     /// Returns the length of the changed data (not diff) that is passed
     /// as second argument to commit_diff()
     pub fn changed_len(&self) -> usize {
-        unsafe { *(self.buf as *const u32) as usize }
+        if self.len < SIZE_OF_CHANGED_LEN {
+            return 0;
+        }
+        unsafe { (self.buf as *const u32).read_unaligned() as usize }
     }
@@
     /// Returns the number of offset pairs
     pub fn num_offset_pairs(&self) -> usize {
-        unsafe { *(self.buf.add(SIZE_OF_CHANGED_LEN) as *const u32) as usize }
+        if self.len < SIZE_OF_CHANGED_LEN + SIZE_OF_NUM_OFFSET_PAIRS {
+            return 0;
+        }
+        unsafe {
+            (self.buf.add(SIZE_OF_CHANGED_LEN) as *const u32).read_unaligned() as usize
+        }
     }

61-85: diff_slice_at relies on the UB slice and lacks bounds checks.

Compute with unaligned reads and enforce all bounds before slicing.

-    pub fn diff_slice_at(&self, index: usize) -> Option<(&'a [u8], OffsetInData)> {
-        let num_slices = self.num_offset_pairs();
-        if index >= num_slices {
-            return None;
-        }
-        let offsets = self.offset_pairs();
-        let current_offset = offsets[index];
-        let slice_len = {
-            if index + 1 < num_slices {
-                offsets[index + 1].offset_in_diff - current_offset.offset_in_diff
-            } else {
-                self.concatenated_diff_slice_len() as u32 - current_offset.offset_in_diff
-            }
-        };
-        Some((
-            unsafe {
-                slice::from_raw_parts(
-                    self.concatenated_diff_slice_begin()
-                        .add(current_offset.offset_in_diff as usize),
-                    slice_len as usize,
-                )
-            },
-            OffsetInData(current_offset.offset_in_data as usize),
-        ))
-    }
+    pub fn diff_slice_at(&self, index: usize) -> Option<(&'a [u8], OffsetInData)> {
+        let n = self.num_offset_pairs();
+        if index >= n { return None; }
+        let base = SIZE_OF_CHANGED_LEN + SIZE_OF_NUM_OFFSET_PAIRS;
+        // current pair
+        let p = unsafe { self.buf.add(base + index * SIZE_OF_SINGLE_OFFSET_PAIR) };
+        let cur_in_diff = unsafe { (p as *const u32).read_unaligned() } as usize;
+        let cur_in_data = unsafe { (p.add(4) as *const u32).read_unaligned() } as usize;
+        // next pair or concatenated end
+        let next_in_diff = if index + 1 < n {
+            let q = unsafe { self.buf.add(base + (index + 1) * SIZE_OF_SINGLE_OFFSET_PAIR) };
+            unsafe { (q as *const u32).read_unaligned() } as usize
+        } else {
+            self.concatenated_diff_slice_len()
+        };
+        if next_in_diff < cur_in_diff { return None; }
+        let len = next_in_diff - cur_in_diff;
+        // bounds for concatenated diff region
+        let concat_len = self.concatenated_diff_slice_len();
+        if cur_in_diff > concat_len || cur_in_diff + len > concat_len { return None; }
+        let start = unsafe { self.concatenated_diff_slice_begin().add(cur_in_diff) };
+        Some((unsafe { slice::from_raw_parts(start, len) }, OffsetInData(cur_in_data)))
+    }
src/diff/algorithm.rs (5)

11-23: Docs: fix “comparision” → “comparison”.

Minor rustdoc cleanup.

-///     - bytes comparision
+///     - bytes comparison
@@
-///     - bytes comparision
+///     - bytes comparison
@@
-///     - bytes comparision
+///     - bytes comparison

221-223: Unify import style in tests.

Since you re‑export at crate root, import both from crate::… for consistency.

-    use crate::apply_diff_copy;
-    use crate::diff::apply_diff_in_place;
+    use crate::{apply_diff_copy, apply_diff_in_place};

231-233: Make endianness explicit in test expectation.

Construct expected bytes with to_le_bytes() to avoid host‑endianness assumptions.

-        assert_eq!(diff, vec![100, 0, 0, 0, 0, 0, 0, 0]);
+        let mut expected = Vec::new();
+        expected.extend_from_slice(&(100u32).to_le_bytes());
+        expected.extend_from_slice(&(0u32).to_le_bytes());
+        assert_eq!(diff, expected);

203-210: Bounds check before copying to avoid panics on malformed diffs.

Validate offset..offset+len is within original (consider returning an error instead of early return).

 fn apply_diff_impl(original: &mut [u8], diff: &[u8]) {
     let diffset = DiffSet::new(diff);
     let mut index = 0;
     while let Some((diff_slice, OffsetInData(offset))) = diffset.diff_slice_at(index) {
         index += 1;
-        original[offset..offset + diff_slice.len()].copy_from_slice(diff_slice);
+        let end = offset.saturating_add(diff_slice.len());
+        debug_assert!(offset <= original.len() && end <= original.len());
+        if end > original.len() { return; }
+        original[offset..end].copy_from_slice(diff_slice);
     }
 }

If you’re open to an API change, return Result<(), ApplyError> and bubble this up from apply_diff_in_place/apply_diff_copy.


136-149: Guard u32 casts and arithmetic in serializer.

Prevent silent truncation on large inputs and overflow of offset_in_diff.

-    // size of changed data (4 bytes)
-    output.extend_from_slice(&(changed.len() as u32).to_le_bytes());
+    // size of changed data (4 bytes)
+    assert!(changed.len() <= u32::MAX as usize, "diff format limits lengths to u32");
+    output.extend_from_slice(&(changed.len() as u32).to_le_bytes());
@@
-    let num_slices = diffs.len() as u32; // first 4 bytes => number of offset-pairs/slices
+    assert!(diffs.len() <= u32::MAX as usize, "too many slices for u32 header");
+    let num_slices = diffs.len() as u32;
@@
-    for (offset_in_account, slice) in &diffs {
+    for (offset_in_account, slice) in &diffs {
+        debug_assert!(*offset_in_account <= u32::MAX as usize);
+        debug_assert!(slice.len() <= u32::MAX as usize);
         output.extend_from_slice(&offset_in_diff.to_le_bytes());
         output.extend_from_slice(&(*offset_in_account as u32).to_le_bytes());
-        offset_in_diff += slice.len() as u32;
+        let add = slice.len() as u32;
+        debug_assert!(offset_in_diff <= u32::MAX - add);
+        offset_in_diff += add;
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d76d38 and a642c6f.

📒 Files selected for processing (4)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/mod.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
  • src/lib.rs (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (1)
  • new (30-36)
🔇 Additional comments (1)
src/diff/mod.rs (1)

1-5: LGTM: module wiring and re‑exports.

mod algorithm; mod types; pub use …::*; is clear and consistent. No issues.

Comment thread src/diff/algorithm.rs
Comment thread src/diff/types.rs
Comment thread src/diff/types.rs
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (6)
src/diff/algorithm.rs (3)

1-1: Prefer core over std to preserve no_std compatibility.

Apply:

-use std::cmp::{min, Ordering};
+use core::cmp::{min, Ordering};

212-218: Optional: add slice bounds assertions before copy.

If you fully trust try_new invariants, ignore. Otherwise, assert offset + len <= original.len() to avoid panics on malformed diffs.

-        original[offset..offset + diff_slice.len()].copy_from_slice(diff_slice);
+        debug_assert!(offset <= original.len());
+        debug_assert!(offset + diff_slice.len() <= original.len());
+        original[offset..offset + diff_slice.len()].copy_from_slice(diff_slice);

146-159: Optional: guard u32 casts for clarity.

Format uses u32 fields; add asserts to fail early on oversized inputs. Domain likely < 4 GiB, so this is belt-and-suspenders.

Apply:

-    output.extend_from_slice(&(changed.len() as u32).to_le_bytes());
+    debug_assert!(changed.len() <= u32::MAX as usize);
+    output.extend_from_slice(&(changed.len() as u32).to_le_bytes());
@@
-    let num_slices = diffs.len() as u32;
+    debug_assert!(diffs.len() <= u32::MAX as usize);
+    let num_slices = diffs.len() as u32;
@@
-        output.extend_from_slice(&(*offset_in_account as u32).to_le_bytes());
+        debug_assert!(*offset_in_account <= u32::MAX as usize);
+        output.extend_from_slice(&(*offset_in_account as u32).to_le_bytes());
src/diff/types.rs (3)

75-99: Harden diff_slice_at with bounds checks (defensive, avoids panics on malformed input).

Even with validated headers, explicit checks make this self-contained.

Apply:

     pub fn diff_slice_at(&self, index: usize) -> Option<(&'a [u8], OffsetInData)> {
         let num_slices = self.num_offset_pairs();
         if index >= num_slices {
             return None;
         }
         let offsets = self.offset_pairs();
         let current_offset = offsets[index];
         let slice_len = {
             if index + 1 < num_slices {
                 offsets[index + 1].offset_in_diff - current_offset.offset_in_diff
             } else {
-                self.concatenated_diff_slice_len() as u32 - current_offset.offset_in_diff
+                self.concatenated_diff_slice_len() as u32 - current_offset.offset_in_diff
             }
         };
+        // Bounds checks against concatenated region
+        let conc_len = self.concatenated_diff_slice_len();
+        let start_in_diff = current_offset.offset_in_diff as usize;
+        let len = slice_len as usize;
+        if start_in_diff > conc_len || start_in_diff + len > conc_len {
+            return None;
+        }
         Some((
             unsafe {
                 slice::from_raw_parts(
                     self.concatenated_diff_slice_begin()
-                        .add(current_offset.offset_in_diff as usize),
-                    slice_len as usize,
+                        .add(start_in_diff),
+                    len,
                 )
             },
             OffsetInData(current_offset.offset_in_data as usize),
         ))
     }

1-5: Import hygiene: use core and bring align_of into scope (fixes compile error).

align_of isn’t in scope and std::marker::PhantomData breaks no_std.

Apply:

-use core::slice;
-use std::marker::PhantomData;
+use core::marker::PhantomData;
+use core::mem::align_of;
+use core::slice;

As per coding guidelines.


33-46: Tie lifetime and validate structure to prevent UB and soundness bugs.

  • Lifetime leak: 'a isn’t tied to diff; you can construct DiffSet<'static> from a short-lived slice.
  • Missing bounds: header+pairs table not checked; offset_pairs() may create an out-of-bounds typed slice.

Apply:

-impl<'a> DiffSet<'a> {
-    pub fn try_new(diff: &[u8]) -> Result<Self, ProgramError> {
+impl<'a> DiffSet<'a> {
+    pub fn try_new(diff: &'a [u8]) -> Result<Self, ProgramError> {
         if diff.len() < (SIZE_OF_CHANGED_LEN + SIZE_OF_NUM_OFFSET_PAIRS) {
             return Err(DlpError::InvalidDiff.into());
-        } else if diff.as_ptr().align_offset(align_of::<u32>()) != 0 {
+        } else if diff.as_ptr().align_offset(align_of::<u32>()) != 0 {
             return Err(DlpError::InvalidDiffAlignment.into());
         }
-
+        // Structural validation: ensure the offset-pairs table fits.
+        let header = SIZE_OF_CHANGED_LEN + SIZE_OF_NUM_OFFSET_PAIRS;
+        let num_pairs = {
+            let mut b = [0u8; 4];
+            b.copy_from_slice(&diff[SIZE_OF_CHANGED_LEN..header]);
+            u32::from_le_bytes(b) as usize
+        };
+        let pairs_bytes = num_pairs
+            .checked_mul(SIZE_OF_SINGLE_OFFSET_PAIR)
+            .ok_or(DlpError::InvalidDiff)?;
+        let table_end = header
+            .checked_add(pairs_bytes)
+            .ok_or(DlpError::InvalidDiff)?;
+        if table_end > diff.len() {
+            return Err(DlpError::InvalidDiff.into());
+        }
+        // Optional: basic monotonicity and bounds against concatenated region.
+        let concatenated_len = diff.len() - table_end;
+        let changed_len = {
+            let mut b = [0u8; 4];
+            b.copy_from_slice(&diff[0..SIZE_OF_CHANGED_LEN]);
+            u32::from_le_bytes(b) as usize
+        };
+        let mut prev_off_in_diff = 0usize;
+        for i in 0..num_pairs {
+            let p = header + i * SIZE_OF_SINGLE_OFFSET_PAIR;
+            let mut a = [0u8; 4];
+            a.copy_from_slice(&diff[p..p + 4]);
+            let cur_in_diff = u32::from_le_bytes(a) as usize;
+            if cur_in_diff < prev_off_in_diff || cur_in_diff > concatenated_len {
+                return Err(DlpError::InvalidDiff.into());
+            }
+            let next_in_diff = if i + 1 < num_pairs {
+                let mut n = [0u8; 4];
+                n.copy_from_slice(&diff[p + SIZE_OF_CHANGED_LEN..p + SIZE_OF_CHANGED_LEN + 4]);
+                u32::from_le_bytes(n) as usize
+            } else {
+                concatenated_len
+            };
+            if next_in_diff < cur_in_diff {
+                return Err(DlpError::InvalidDiff.into());
+            }
+            let slice_len = next_in_diff - cur_in_diff;
+            let mut d = [0u8; 4];
+            d.copy_from_slice(&diff[p + 4..p + 8]);
+            let off_in_data = u32::from_le_bytes(d) as usize;
+            if off_in_data
+                .checked_add(slice_len)
+                .map_or(true, |end| end > changed_len)
+            {
+                return Err(DlpError::InvalidDiff.into());
+            }
+            prev_off_in_diff = cur_in_diff;
+        }
         Ok(Self {
             buf: diff.as_ptr(),
             len: diff.len(),
             _marker: PhantomData,
         })
     }

Based on learnings.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a642c6f and b9cae5e.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
  • src/error.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-28T18:27:42.504Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/types.rs:87-102
Timestamp: 2025-10-28T18:27:42.504Z
Learning: In Solana runtime, `account_info.data()` is guaranteed to have 8-byte alignment.

Applied to files:

  • src/diff/types.rs
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (1)
  • try_new (34-46)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint
🔇 Additional comments (1)
src/error.rs (1)

38-41: LGTM: clear error variants for diff validation.

Error codes/messages look consistent and map cleanly to ProgramError.

Comment thread Cargo.toml Outdated
@snawaz snawaz force-pushed the snawaz/diff-module branch from b9cae5e to 8965cf2 Compare October 29, 2025 07:43
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (3)
Cargo.toml (1)

48-48: Configure rkyv for no_std to support Solana BPF builds.

Solana programs require no_std. The current configuration pulls in std via rkyv's default features, which will break BPF compilation.

Apply this diff:

-rkyv = "0.7.45"
+rkyv = { version = "0.7.45", default-features = false, features = ["alloc"] }
src/diff/algorithm.rs (1)

1-1: Use core::cmp for no_std compatibility.

This import prevents compilation in no_std environments required for Solana BPF programs.

Apply this diff:

-use std::cmp::{min, Ordering};
+use core::cmp::{min, Ordering};
src/diff/types.rs (1)

2-2: Use core::marker::PhantomData for no_std compatibility.

This import prevents compilation in no_std environments required for Solana BPF programs.

Apply this diff:

-use std::marker::PhantomData;
+use core::marker::PhantomData;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9cae5e and 8965cf2.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
  • src/error.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-10-28T18:27:42.504Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/types.rs:87-102
Timestamp: 2025-10-28T18:27:42.504Z
Learning: In Solana runtime, `account_info.data()` is guaranteed to have 8-byte alignment.

Applied to files:

  • src/diff/types.rs
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (2)
  • std (37-37)
  • try_new (34-46)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: install
🔇 Additional comments (2)
src/error.rs (1)

38-41: LGTM!

The new error variants are well-defined with clear messages and properly integrated into the existing error handling system.

src/diff/types.rs (1)

33-46: Good defensive validation added.

The try_new constructor properly validates both minimum length and alignment requirements, addressing previous concerns about undefined behavior. The alignment check is solid defensive programming even with Solana's 8-byte alignment guarantee.

Based on learnings.

Comment thread src/diff/types.rs Outdated
@snawaz snawaz force-pushed the snawaz/diff-module branch 2 times, most recently from 14186fe to fec44c5 Compare October 29, 2025 10:03
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
Cargo.toml (1)

48-48: Configure rkyv for no_std/BPF compatibility.

This duplicates an earlier review concern: Solana programs compile to BPF as cdylib (line 25) and require no_std. The default rkyv features pull in std, which will break BPF builds.

Apply this diff:

-rkyv = "0.7.45"
+rkyv = { version = "0.7.45", default-features = false, features = ["alloc"] }
src/diff/types.rs (1)

37-37: Use core::mem::align_of for consistency with BPF builds.

This duplicates an earlier review concern: line 37 uses std::mem::align_of while line 1 already imports from core. For Solana BPF compatibility and consistency with the existing core::slice import, use core::mem::align_of.

Apply this diff:

-        } else if diff.as_ptr().align_offset(std::mem::align_of::<u32>()) != 0 {
+        } else if diff.as_ptr().align_offset(core::mem::align_of::<u32>()) != 0 {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8965cf2 and fec44c5.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
  • src/error.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-29T09:28:50.254Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/algorithm.rs:1-1
Timestamp: 2025-10-29T09:28:50.254Z
Learning: The magicblock-labs/delegation-program project uses std throughout the codebase and no_std compatibility is not a goal. Do not suggest using core instead of std.

Applied to files:

  • src/diff/algorithm.rs
📚 Learning: 2025-10-28T18:27:42.504Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/types.rs:87-102
Timestamp: 2025-10-28T18:27:42.504Z
Learning: In Solana runtime, `account_info.data()` is guaranteed to have 8-byte alignment.

Applied to files:

  • src/diff/types.rs
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (2)
  • std (37-37)
  • try_new (34-55)
🪛 GitHub Actions: Run Tests
Cargo.toml

[error] 1-1: Build failed due to Clippy error in src/diff/algorithm.rs. Command 'cargo clippy -- --deny=warnings' exited with code 101.

src/diff/algorithm.rs

[error] 99-110: Clippy error: empty line after doc comment. Documentation comment must be followed by a line without a blank line. Could not compile magicblock-delegation-program (lib) due to 1 previous error.


[error] 99-99: empty line after doc comment

🔇 Additional comments (1)
src/error.rs (1)

38-41: LGTM! Clean error variant additions.

The new error variants are properly numbered, have clear messages, and integrate cleanly with the existing error handling infrastructure.

Comment thread src/diff/algorithm.rs
@snawaz snawaz force-pushed the snawaz/diff-module branch from fec44c5 to f050cd8 Compare October 29, 2025 10:16
@snawaz snawaz requested a review from GabrielePicco October 29, 2025 11:50
@snawaz snawaz force-pushed the snawaz/diff-module branch from f050cd8 to 574d272 Compare October 29, 2025 12:46
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (3)
Cargo.toml (1)

48-48: Consider gating rkyv features (only if you ever build to BPF).

Default rkyv enables std; BPF/no_std builds would break. If you don’t target BPF directly, ignore. Otherwise, switch to alloc-only.

-rkyv = "0.7.45"
+rkyv = { version = "0.7.45", default-features = false, features = ["alloc"] }
src/diff/types.rs (2)

37-39: Use core::mem::align_of for consistency with core::slice.

Matches surrounding imports; avoids mixing std/core in low-level code.

-        } else if diff.as_ptr().align_offset(std::mem::align_of::<u32>()) != 0 {
+        } else if diff.as_ptr().align_offset(core::mem::align_of::<u32>()) != 0 {

72-78: Note safety preconditions on offset_pairs().

The typed slice relies on try_new invariants (alignment, bounds). Document that locally.

     /// Returns the offset pairs
     pub fn offset_pairs(&self) -> &'a [OffsetPair] {
         unsafe {
+            // SAFETY: `try_new` enforces 4-byte alignment and that the table fits in `self.len`,
+            // so forming a typed slice here is sound.
             let pairs_ptr = self.buf.add(SIZE_OF_CHANGED_LEN + SIZE_OF_NUM_OFFSET_PAIRS);
             slice::from_raw_parts(pairs_ptr as *const OffsetPair, self.num_offset_pairs())
         }
     }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f050cd8 and 574d272.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
  • src/error.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-29T09:28:50.254Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/algorithm.rs:1-1
Timestamp: 2025-10-29T09:28:50.254Z
Learning: The magicblock-labs/delegation-program project uses std throughout the codebase and no_std compatibility is not a goal. Do not suggest using core instead of std.

Applied to files:

  • src/diff/algorithm.rs
📚 Learning: 2025-10-28T18:27:42.504Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/types.rs:87-102
Timestamp: 2025-10-28T18:27:42.504Z
Learning: In Solana runtime, `account_info.data()` is guaranteed to have 8-byte alignment.

Applied to files:

  • src/diff/types.rs
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (2)
  • std (37-37)
  • try_new (34-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test
🔇 Additional comments (1)
src/error.rs (1)

38-41: LGTM: error variants are well-scoped and non-breaking.

Codes 15–16 append cleanly; messages are clear; conversions remain correct.

Comment thread src/diff/algorithm.rs
Comment thread src/diff/algorithm.rs
Comment thread src/diff/types.rs
Comment thread src/diff/types.rs
Comment thread src/diff/types.rs Outdated
@snawaz snawaz force-pushed the snawaz/diff-module branch 2 times, most recently from cae30e4 to 4edad9f Compare October 29, 2025 13:16
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
src/diff/algorithm.rs (1)

310-311: Tests: avoid magic numbers; use constants for serialized length.

Compute expected length using SIZE_* constants for clarity and resilience.

-        assert_eq!(actual_diff.len(), 4 + 4 + 8 + 8 + (4 + 8));
+        let expected_len =
+            SIZE_OF_CHANGED_LEN
+            + SIZE_OF_NUM_OFFSET_PAIRS
+            + 2 * SIZE_OF_SINGLE_OFFSET_PAIR
+            + core::mem::size_of::<u32>()   // first slice
+            + core::mem::size_of::<u64>();  // second slice
+        assert_eq!(actual_diff.len(), expected_len);
src/diff/types.rs (1)

84-108: Prevent UB: validate offsets/lengths in diff_slice_at before forming slices.

Currently, malformed diffs can produce out-of-bounds slices (or u32 underflow), leading to UB. Add monotonicity and bounds checks using the aligned header you already validated in try_new.

 pub fn diff_slice_at(&self, index: usize) -> Option<(&'a [u8], OffsetInData)> {
-    let num_slices = self.num_offset_pairs();
-    if index >= num_slices {
-        return None;
-    }
-    let offsets = self.offset_pairs();
-    let current_offset = offsets[index];
-    let slice_len = {
-        if index + 1 < num_slices {
-            offsets[index + 1].offset_in_diff - current_offset.offset_in_diff
-        } else {
-            self.concatenated_diff_slice_len() as u32 - current_offset.offset_in_diff
-        }
-    };
-    Some((
-        unsafe {
-            slice::from_raw_parts(
-                self.concatenated_diff_slice_begin()
-                    .add(current_offset.offset_in_diff as usize),
-                slice_len as usize,
-            )
-        },
-        OffsetInData(current_offset.offset_in_data as usize),
-    ))
+    let n = self.num_offset_pairs();
+    if index >= n {
+        return None;
+    }
+    let offsets = self.offset_pairs();
+    let cur_in_diff = offsets[index].offset_in_diff as usize;
+    let cur_in_data = offsets[index].offset_in_data as usize;
+    let concat_len = self.concatenated_diff_slice_len();
+    if cur_in_diff > concat_len {
+        return None;
+    }
+    let next_in_diff = if index + 1 < n {
+        offsets[index + 1].offset_in_diff as usize
+    } else {
+        concat_len
+    };
+    if next_in_diff < cur_in_diff || next_in_diff > concat_len {
+        return None;
+    }
+    let len = next_in_diff - cur_in_diff;
+    let changed_len = self.changed_len();
+    if cur_in_data > changed_len || cur_in_data + len > changed_len {
+        return None;
+    }
+    let start = unsafe { self.concatenated_diff_slice_begin().add(cur_in_diff) };
+    Some((
+        unsafe { slice::from_raw_parts(start, len) },
+        OffsetInData(cur_in_data),
+    ))
 }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cae30e4 and 4edad9f.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
  • src/error.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-29T09:28:50.254Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/algorithm.rs:1-1
Timestamp: 2025-10-29T09:28:50.254Z
Learning: The magicblock-labs/delegation-program project uses std throughout the codebase and no_std compatibility is not a goal. Do not suggest using core instead of std.

Applied to files:

  • src/diff/algorithm.rs
📚 Learning: 2025-10-28T18:27:42.504Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/types.rs:87-102
Timestamp: 2025-10-28T18:27:42.504Z
Learning: In Solana runtime, `account_info.data()` is guaranteed to have 8-byte alignment.

Applied to files:

  • src/diff/types.rs
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (2)
  • std (37-37)
  • try_new (34-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint
🔇 Additional comments (2)
Cargo.toml (1)

48-48: rkyv dependency looks appropriate for AlignedVec.

Addition matches the new diff module’s alignment needs. No further changes requested. Based on learnings.

src/error.rs (1)

38-41: New error variants are well-scoped and numbered correctly.

Good mapping for diff validation failures; conversions remain intact.

Comment thread src/diff/algorithm.rs Outdated
@snawaz snawaz force-pushed the snawaz/diff-module branch from 4edad9f to d547adb Compare October 29, 2025 13:44
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (3)
src/diff/algorithm.rs (1)

45-52: Doc grammar nit.

“the headers below captures all these necessary information” → “the headers below capture all the necessary information.”

-/// position of change in the original data. So the headers below captures all these necessary information:
+/// position of change in the original data. The headers below capture all the necessary information:
src/diff/types.rs (2)

33-55: Strengthen try_new with structural validation (prevents malformed diffs from reaching unsafe code).

Minimal, single-pass checks: monotonic OffsetInDiff, in-bounds within concatenated area, and each (offset_in_data, len) fits within changed_len. This lets apply_diff_* rely on invariants.

Apply:

@@
     pub fn try_new(diff: &'a [u8]) -> Result<Self, ProgramError> {
         if diff.len() < (SIZE_OF_CHANGED_LEN + SIZE_OF_NUM_OFFSET_PAIRS) {
             return Err(DlpError::InvalidDiff.into());
-        } else if diff.as_ptr().align_offset(std::mem::align_of::<u32>()) != 0 {
+        } else if diff.as_ptr().align_offset(core::mem::align_of::<u32>()) != 0 {
             return Err(DlpError::InvalidDiffAlignment.into());
         }
@@
         let header_size = SIZE_OF_CHANGED_LEN
             + SIZE_OF_NUM_OFFSET_PAIRS
             + this.num_offset_pairs() * SIZE_OF_SINGLE_OFFSET_PAIR;
         if header_size > diff.len() {
             return Err(DlpError::InvalidDiff.into());
         }
+
+        // Structural validation of offset pairs against concatenated region and changed_len.
+        let n = this.num_offset_pairs();
+        let concat_len = this.concatenated_diff_slice_len();
+        let changed_len = this.changed_len();
+        let pairs = this.offset_pairs(); // safe due to alignment and header_size check
+        for i in 0..n {
+            let start = pairs[i].offset_in_diff as usize;
+            let end = if i + 1 < n {
+                pairs[i + 1].offset_in_diff as usize
+            } else {
+                concat_len
+            };
+            if end < start || end > concat_len {
+                return Err(DlpError::InvalidDiff.into());
+            }
+            let len = end - start;
+            let data_off = pairs[i].offset_in_data as usize;
+            if data_off > changed_len || data_off + len > changed_len {
+                return Err(DlpError::InvalidDiff.into());
+            }
+        }
 
         Ok(this)
     }

Note: Kept std/core choice aligned only for align_of to avoid unnecessary std dependency here. Based on learnings.


84-108: UB risk in diff_slice_at: arithmetic underflow and out-of-bounds pointer.add on malformed diffs.

Current code trusts offset pairs; if next.offset_in_diff < current, subtraction underflows (u32), producing a huge length, and pointer.add() goes out of bounds → undefined behavior. Also not checking offset_in_data + len against changed_len can cause downstream panics.

Harden diff_slice_at with monotonicity and bounds checks before any pointer arithmetic.

Apply:

@@
-    pub fn diff_slice_at(&self, index: usize) -> Option<(&'a [u8], OffsetInData)> {
-        let num_slices = self.num_offset_pairs();
-        if index >= num_slices {
-            return None;
-        }
-        let offsets = self.offset_pairs();
-        let current_offset = offsets[index];
-        let slice_len = {
-            if index + 1 < num_slices {
-                offsets[index + 1].offset_in_diff - current_offset.offset_in_diff
-            } else {
-                self.concatenated_diff_slice_len() as u32 - current_offset.offset_in_diff
-            }
-        };
-        Some((
-            unsafe {
-                slice::from_raw_parts(
-                    self.concatenated_diff_slice_begin()
-                        .add(current_offset.offset_in_diff as usize),
-                    slice_len as usize,
-                )
-            },
-            OffsetInData(current_offset.offset_in_data as usize),
-        ))
-    }
+    pub fn diff_slice_at(&self, index: usize) -> Option<(&'a [u8], OffsetInData)> {
+        let n = self.num_offset_pairs();
+        if index >= n {
+            return None;
+        }
+        let offsets = self.offset_pairs();
+        let concat_len = self.concatenated_diff_slice_len();
+
+        let start_in_diff = offsets[index].offset_in_diff as usize;
+        let end_in_diff = if index + 1 < n {
+            offsets[index + 1].offset_in_diff as usize
+        } else {
+            concat_len
+        };
+        // Monotonicity and bounds within concatenated diff
+        if end_in_diff < start_in_diff || end_in_diff > concat_len {
+            return None;
+        }
+        let len = end_in_diff - start_in_diff;
+
+        let in_data = offsets[index].offset_in_data as usize;
+        let changed_len = self.changed_len();
+        // Target bounds within changed_len
+        if in_data > changed_len || in_data + len > changed_len {
+            return None;
+        }
+
+        let start_ptr = unsafe { self.concatenated_diff_slice_begin().add(start_in_diff) };
+        Some((
+            unsafe { slice::from_raw_parts(start_ptr, len) },
+            OffsetInData(in_data),
+        ))
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4edad9f and d547adb.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
  • src/error.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-10-28T18:27:42.504Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/types.rs:87-102
Timestamp: 2025-10-28T18:27:42.504Z
Learning: In Solana runtime, `account_info.data()` is guaranteed to have 8-byte alignment.

Applied to files:

  • src/diff/types.rs
📚 Learning: 2025-10-29T09:28:50.254Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/algorithm.rs:1-1
Timestamp: 2025-10-29T09:28:50.254Z
Learning: The magicblock-labs/delegation-program project uses std throughout the codebase and no_std compatibility is not a goal. Do not suggest using core instead of std.

Applied to files:

  • src/diff/algorithm.rs
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (2)
  • std (37-37)
  • try_new (34-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint
🔇 Additional comments (3)
Cargo.toml (1)

48-48: Dependency addition looks fine.

rkyv 0.7.45 provides AlignedVec used in the diff module. No action needed.

src/error.rs (1)

38-41: Good, precise error variants for diff validation.

InvalidDiff and InvalidDiffAlignment are appropriate and scoped to parsing/alignment errors.

src/diff/algorithm.rs (1)

119-183: Optional: guard u32 casts during serialization.

If changed/original could exceed u32::MAX, add debug assertions before the casts to u32 (changed.len(), diffs.len(), offsets). Keeps producers honest with the format constraints without runtime cost in release.
Apply (optional):

@@
-    // size of changed data (4 bytes)
-    output.extend_from_slice(&(changed.len() as u32).to_le_bytes());
+    // size of changed data (4 bytes)
+    debug_assert!(changed.len() <= u32::MAX as usize, "changed.len exceeds u32");
+    output.extend_from_slice(&(changed.len() as u32).to_le_bytes());
@@
-    // number of slices / offset-pairs (4 bytes)
-    output.extend_from_slice(&(diffs.len() as u32).to_le_bytes());
+    // number of slices / offset-pairs (4 bytes)
+    debug_assert!(diffs.len() <= u32::MAX as usize, "num slices exceeds u32");
+    output.extend_from_slice(&(diffs.len() as u32).to_le_bytes());
@@
-    for (offset_in_account, slice) in &diffs {
+    for (offset_in_account, slice) in &diffs {
+        debug_assert!(*offset_in_account <= u32::MAX as usize);
+        debug_assert!(slice.len() <= u32::MAX as usize);
         output.extend_from_slice(&offset_in_diff.to_le_bytes());
         output.extend_from_slice(&(*offset_in_account as u32).to_le_bytes());
         offset_in_diff += slice.len() as u32;
     }

Comment thread src/diff/types.rs
Comment thread src/diff/types.rs
@snawaz snawaz force-pushed the snawaz/diff-module branch from d547adb to a8c61d0 Compare October 29, 2025 17:23
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d547adb and a8c61d0.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (4)
  • Cargo.toml (1 hunks)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
  • src/error.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-29T09:28:50.254Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/algorithm.rs:1-1
Timestamp: 2025-10-29T09:28:50.254Z
Learning: The magicblock-labs/delegation-program project uses std throughout the codebase and no_std compatibility is not a goal. Do not suggest using core instead of std.

Applied to files:

  • Cargo.toml
  • src/diff/algorithm.rs
📚 Learning: 2025-10-11T18:00:22.232Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#94
File: src/lib.rs:93-103
Timestamp: 2025-10-11T18:00:22.232Z
Learning: When reviewing Rust code, the `TryFromPrimitive` derive macro from the `num_enum` crate automatically generates `TryFrom<u8>` (and other primitive type conversions) for enums marked with `#[repr(u8)]`. Don't flag missing `TryFrom<u8>` implementations when this derive macro is present.

Applied to files:

  • src/diff/types.rs
📚 Learning: 2025-10-29T09:47:34.052Z
Learnt from: Dodecahedr0x
PR: magicblock-labs/delegation-program#90
File: src/instruction_builder/set_fees_receiver.rs:25-29
Timestamp: 2025-10-29T09:47:34.052Z
Learning: In instruction builder functions (files in src/instruction_builder/), using `.unwrap()` on serialization operations like `to_vec()` is standard practice and does not need to be replaced with `.expect()`.

Applied to files:

  • src/diff/types.rs
📚 Learning: 2025-10-28T18:27:42.504Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/types.rs:87-102
Timestamp: 2025-10-28T18:27:42.504Z
Learning: In Solana runtime, `account_info.data()` is guaranteed to have 8-byte alignment.

Applied to files:

  • src/diff/types.rs
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (1)
  • try_new (44-103)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: install
🔇 Additional comments (9)
Cargo.toml (1)

48-49: LGTM! Dependency additions are appropriate.

The rkyv dependency provides AlignedVec for 16-byte aligned diff buffers, and static_assertions enables compile-time verification of struct layout invariants. Both are used appropriately in the new diff module.

src/error.rs (1)

38-41: LGTM! Error variants are well-defined.

The new error variants provide clear, actionable messages for diff validation failures. They integrate cleanly with the existing error handling infrastructure.

src/diff/algorithm.rs (4)

10-119: Excellent documentation!

The comprehensive format specification, complete with ASCII diagrams and concrete examples, makes the diff format immediately understandable. This level of documentation will greatly assist future maintainers and users of this module.


120-183: Implementation correctly follows the documented format.

The diff computation algorithm efficiently identifies contiguous changed regions and handles expansion/shrinkage cases. The serialization matches the documented format exactly, and the use of AlignedVec ensures proper alignment for DiffSet construction.


197-240: Apply functions are well-structured.

The separation of concerns is clean: detect_size_change validates, apply_diff_in_place enforces exact size match, apply_diff_copy handles all resize scenarios, and apply_diff_impl does the actual work. The error handling with SizeChanged provides clear feedback on size mismatches.


317-405: Excellent stress test with large random data.

The randomized test with 2-10 MB accounts and random mutations provides robust validation of the diff algorithm under realistic conditions. The use of a seeded RNG ensures reproducibility when failures occur.

src/diff/types.rs (3)

18-26: Excellent use of compile-time assertions.

The static_assertions verify critical layout invariants at compile time: OffsetPair has 4-byte alignment (matching u32) and 8-byte size. This catches layout issues immediately if the definition changes.


34-41: Well-designed zero-copy structure.

The DiffSet design efficiently pre-parses header fields during construction while maintaining zero-copy access to the payload. The lifetime parameter correctly ties the structure to the input buffer.


131-157: diff_segment_at includes proper bounds validation.

The method correctly handles edge cases: validates the index, computes segment bounds using either the next offset or the concatenated diff length, and includes an explicit bounds check before returning the slice.

Comment thread src/diff/types.rs
@snawaz snawaz force-pushed the snawaz/diff-module branch from a8c61d0 to 5c58cfb Compare October 29, 2025 18:24
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ccdb999 and fbb877b.

📒 Files selected for processing (2)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-29T09:28:50.254Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/algorithm.rs:1-1
Timestamp: 2025-10-29T09:28:50.254Z
Learning: The magicblock-labs/delegation-program project uses std throughout the codebase and no_std compatibility is not a goal. Do not suggest using core instead of std.

Applied to files:

  • src/diff/algorithm.rs
📚 Learning: 2025-10-11T18:00:22.232Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#94
File: src/lib.rs:93-103
Timestamp: 2025-10-11T18:00:22.232Z
Learning: When reviewing Rust code, the `TryFromPrimitive` derive macro from the `num_enum` crate automatically generates `TryFrom<u8>` (and other primitive type conversions) for enums marked with `#[repr(u8)]`. Don't flag missing `TryFrom<u8>` implementations when this derive macro is present.

Applied to files:

  • src/diff/types.rs
📚 Learning: 2025-10-29T09:47:34.052Z
Learnt from: Dodecahedr0x
PR: magicblock-labs/delegation-program#90
File: src/instruction_builder/set_fees_receiver.rs:25-29
Timestamp: 2025-10-29T09:47:34.052Z
Learning: In instruction builder functions (files in src/instruction_builder/), using `.unwrap()` on serialization operations like `to_vec()` is standard practice and does not need to be replaced with `.expect()`.

Applied to files:

  • src/diff/types.rs
📚 Learning: 2025-10-28T18:27:42.504Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/types.rs:87-102
Timestamp: 2025-10-28T18:27:42.504Z
Learning: In Solana runtime, `account_info.data()` is guaranteed to have 8-byte alignment.

Applied to files:

  • src/diff/types.rs
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (1)
  • try_new (45-104)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: lint

Comment thread src/diff/types.rs
@snawaz snawaz force-pushed the snawaz/diff-module branch from fbb877b to b6c097a Compare October 30, 2025 17:59
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fbb877b and b6c097a.

📒 Files selected for processing (2)
  • src/diff/algorithm.rs (1 hunks)
  • src/diff/types.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📚 Learning: 2025-10-29T09:28:50.254Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/algorithm.rs:1-1
Timestamp: 2025-10-29T09:28:50.254Z
Learning: The magicblock-labs/delegation-program project uses std throughout the codebase and no_std compatibility is not a goal. Do not suggest using core instead of std.

Applied to files:

  • src/diff/algorithm.rs
📚 Learning: 2025-10-11T18:00:22.232Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#94
File: src/lib.rs:93-103
Timestamp: 2025-10-11T18:00:22.232Z
Learning: When reviewing Rust code, the `TryFromPrimitive` derive macro from the `num_enum` crate automatically generates `TryFrom<u8>` (and other primitive type conversions) for enums marked with `#[repr(u8)]`. Don't flag missing `TryFrom<u8>` implementations when this derive macro is present.

Applied to files:

  • src/diff/types.rs
📚 Learning: 2025-10-29T09:47:34.052Z
Learnt from: Dodecahedr0x
PR: magicblock-labs/delegation-program#90
File: src/instruction_builder/set_fees_receiver.rs:25-29
Timestamp: 2025-10-29T09:47:34.052Z
Learning: In instruction builder functions (files in src/instruction_builder/), using `.unwrap()` on serialization operations like `to_vec()` is standard practice and does not need to be replaced with `.expect()`.

Applied to files:

  • src/diff/types.rs
📚 Learning: 2025-10-28T18:27:42.504Z
Learnt from: snawaz
PR: magicblock-labs/delegation-program#111
File: src/diff/types.rs:87-102
Timestamp: 2025-10-28T18:27:42.504Z
Learning: In Solana runtime, `account_info.data()` is guaranteed to have 8-byte alignment.

Applied to files:

  • src/diff/types.rs
🧬 Code graph analysis (1)
src/diff/algorithm.rs (1)
src/diff/types.rs (1)
  • try_new (45-104)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: install

Comment thread src/diff/types.rs
@snawaz snawaz force-pushed the snawaz/diff-module branch 2 times, most recently from 5fa0ded to 5304cc4 Compare October 30, 2025 18:42
Copy link
Copy Markdown
Contributor

@GabrielePicco GabrielePicco left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants