Skip to content

fix(repository): verify content hash in FILE_INDEX fast path on mtime+size match#65

Draft
geekgonecrazy wants to merge 1 commit into
devfrom
fix/file-index-mtime-hash-verification-clean
Draft

fix(repository): verify content hash in FILE_INDEX fast path on mtime+size match#65
geekgonecrazy wants to merge 1 commit into
devfrom
fix/file-index-mtime-hash-verification-clean

Conversation

@geekgonecrazy
Copy link
Copy Markdown
Contributor

Problem

The FILE_INDEX fast path in repository/status.rs (introduced in 8ab0e7d "AI Cleanup") compared only mtime + size against the cache and unconditionally declared a file Clean when both matched, skipping the hash check entirely:

if current_secs == cached_secs
    && current_nanos == cached_nanos
    && current_size == cached_size
{
    // mtime + size match → Clean — skip entirely
    index_hit_count += 1;
    continue;
}

On Linux, mtime has 1-second resolution. Two writes within the same clock second produce identical mtime values. When the new content is also the same byte length as the original, the fast path silently hides the modification:

  • status() returns no Modified entries
  • record() sees nothing to do and returns NothingToRecord
  • The file change is silently dropped

The FILE_INDEX tuple is (secs, nanos, size, hash) — the hash was stored but never consulted in the fast path.

Fix

When hash_contents is enabled (set to true by StatusOptions::default(), which record() uses), verify the cached hash against the current file content even when mtime + size match:

if current_secs == cached_secs
    && current_nanos == cached_nanos
    && current_size == cached_size
{
    if !options.hash_contents {
        index_hit_count += 1;
        continue; // fast path: no hash requested, trust mtime+size
    }
    // verify hash even on mtime+size match — guards against coarse Linux clock
    hash_count += 1;
    match hash_file_contents(&abs_path) {
        Ok(h) if h == cached_hash => { index_hit_count += 1; continue; }
        Ok(h) => { /* report Modified */ }
        Err(_) => { /* report Modified */ }
    }
}

StatusOptions::fast() (hash_contents: false) retains the original short-circuit for callers that don't need content verification.

Patch theory

This fix does not violate patch theory. FILE_INDEX is a pure performance cache — record() always re-reads modified files unconditionally to compute the actual diff against pristine. The cached hash is never consulted during recording. The fix is strictly more conservative: it only adds cases where Modified is correctly detected. Verified by reading record/workflow/record/mod.rs (record_modified_file always calls working_copy.read_file).

Regression test

test_file_index_fast_path_same_mtime_and_size_detects_modification in repository/tests/status_tests.rs.

Reproduces the bug deterministically on any platform by:

  1. Recording a file to populate FILE_INDEX
  2. Writing different content of the same byte length ("original\n""modified\n", both 9 bytes)
  3. Using filetime to reset the mtime back to the indexed value — simulating Linux's 1-second clock
  4. Asserting status() reports Modified
  5. Asserting record() succeeds (not NothingToRecord)

Without the fix the test fails:

assertion failed: status() must detect a same-size content change when mtime matches FILE_INDEX.
All status entries: []
  left: 0
 right: 1

With the fix the test passes.

…+size match

The FILE_INDEX fast path (introduced in 8ab0e7d) compared only mtime+size and
unconditionally declared a file Clean when both matched, skipping the hash check.
On Linux, mtime has 1-second resolution. Two writes within the same clock second
produce identical mtimes. When the new content is also the same byte length as
the original, the fast path hides the modification: status() returns no Modified
entries and record() returns NothingToRecord even though the file changed.

Fix: when hash_contents is enabled (the default used by record()), verify the
stored hash against the current file content even when mtime+size match. The
StatusOptions::fast() path (hash_contents=false) retains the original short-circuit.

Adds a regression test that reproduces the bug deterministically on any platform
by using filetime to reset the mtime after writing same-size content, simulating
Linux's coarse 1-second clock. The test asserts that status() reports Modified
and that record() succeeds (does not return NothingToRecord).

Does not violate patch theory: FILE_INDEX is a pure performance cache. record()
always re-reads modified files to compute the actual diff against pristine;
the cached hash is never consulted during recording. The fix is strictly more
conservative — it only adds cases where Modified is correctly detected.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant