
fix(status): correctly detect tracked-file modifications and post-switch state#48

Closed
geekgonecrazy wants to merge 5 commits into dev from fix/status-detect-modifications-without-file-index


@geekgonecrazy (Contributor) commented May 7, 2026

Summary

Three related correctness bugs in atomic-repository (bug 1 below, plus the two root causes of bug 2) caused atomic status (and therefore diff and record) to lie about the working copy. Together they surface as silent data loss in the agent record path: edits the agent made were invisible to record(all=true) and lost on the next view switch.

Two commits, each paired with a failing test that pins the bug, then the fix:

| Commit | Bug | Test |
|---|---|---|
| dd1d59b | status() silently classifies tracked files with no FILE_INDEX entry as Clean | test_status_detects_modification_when_file_index_missing |
| b2064c2 | After view switch with a sibling view, status reports phantom Deleted and false Modified for clean working copy | test_status_clean_after_view_switch_with_sibling_changes |

Bug 1 — silent fallthrough when FILE_INDEX entry is missing

Symptom

After certain workflows that materialize files into the working copy without re-recording (notably atomic insert of inbound changes, atomic clone, server-side push, or any operation that populates TREE without going through record), atomic status, atomic diff, and atomic record -a would all report the working tree clean even when a tracked file had been edited on disk. The next record would silently drop the modification.

Repro (against atomic 0.5.3)

```bash
# Inside a real working repo where some files were materialized via insert/clone
# (in the original report, the user did atomic split → record → switch → insert)
echo "// edit $(date +%s)" >> src/client/components/GraphView/GraphView.tsx
atomic status      # → only shows untracked .claude/settings.local.json
atomic diff        # → "No changes detected"
atomic record -m t # → "Nothing to record - working tree is clean"

RUST_LOG=debug atomic status |& grep status:
# status: TREE scan took 0ms (38 tracked files, 0 dirs)
# status: FILE_INDEX loaded 0ms (11 entries)        ← only 11 of 38 indexed
# status: classify took 0ms (stat=38, index_hit=0, hashed=11)
```

The 27 tracked files with no FILE_INDEX entry are silently invisible to status.
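The 38-vs-11 gap in the debug counters is a plain set difference. A minimal sketch, assuming FILE_INDEX can be modeled as a path-to-hash map and the TREE scan as a slice of tracked paths; `unindexed_paths` is a hypothetical diagnostic helper, not a function in atomic-repository.

```rust
use std::collections::{BTreeSet, HashMap};

/// Tracked paths with no FILE_INDEX entry. Before this PR, status() treated
/// every one of these as clean without looking at the file on disk.
/// `file_index` is a hypothetical stand-in: path -> cached content hash.
pub fn unindexed_paths<'a>(
    tracked: &[&'a str],
    file_index: &HashMap<String, u64>,
) -> BTreeSet<&'a str> {
    tracked
        .iter()
        .copied()
        .filter(|path| !file_index.contains_key(*path))
        .collect()
}
```

With 38 tracked paths and 11 indexed ones, the helper returns the 27 paths that the old fallthrough silently reported as clean.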

Root cause

atomic-repository/src/repository/status.rs:308-311 (before this PR):

```rust
// No FILE_INDEX entry — file is tracked with graph content
// but was never indexed. Assume clean (can't compare without
// reconstructing graph content, which is expensive).
```

FILE_INDEX is written in only two places — record() (atomic-repository/src/repository/record.rs:646) and materialize_view() (atomic-repository/src/repository/materialize.rs:154). The change-application paths (insert, clone, view switch, server push) materialize files into the working copy but do not populate FILE_INDEX. Any file that arrived via one of those paths therefore has no entry, and status() falls through with "Assume clean" — silent data loss.

The same fallthrough is why agent turns with edits to tracked files were dropped: the agent record path calls repo.record(with_all(true)), which calls repo.status(StatusOptions::default()) internally, which hits this fallthrough and never reports the modified files to filter_files.

Fix

When a tracked file has no FILE_INDEX entry, emit a conservative FileStatus::Modified entry instead of falling through. With hash_contents=true we read and hash the file so the entry carries the current hash; in fast mode we skip the hash and just emit Modified. In both modes the entry sets details = "FILE_INDEX entry missing" for diagnostics.

This is intentionally conservative: a file that is actually unchanged but has no index entry will be reported Modified once. The recording workflow already handles that case — record_modified_file produces an empty hunk for content that matches pristine, recorded.is_empty() filters it out, and the post-apply step in record() writes a fresh FILE_INDEX entry, returning the file to the fast path.
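A simplified sketch of per-file classification after this fix. It assumes FILE_INDEX entries cache size, mtime, and a content hash; `IndexEntry`, `classify`, and `content_hash` are hypothetical names, `DefaultHasher` stands in for the repository's own hash type, and treating "stat changed but hash equal" as clean is an assumption about the existing branch. The `None` arm is the one this commit changes: it used to return clean.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical FILE_INDEX entry: cached stat data plus a content hash.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct IndexEntry {
    pub size: u64,
    pub mtime: u64,
    pub hash: u64,
}

/// Only the status variant this sketch needs.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum FileStatus {
    Modified { hash: Option<u64>, details: Option<&'static str> },
}

/// Stand-in content hash (assumption: atomic uses its own Hash type).
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Classify one tracked file; `None` means clean.
/// `read_disk` is only called when a content hash is actually needed.
pub fn classify(
    entry: Option<&IndexEntry>,
    disk_size: u64,
    disk_mtime: u64,
    read_disk: impl FnOnce() -> Vec<u8>,
    hash_contents: bool,
) -> Option<FileStatus> {
    match entry {
        // Previously "Assume clean". Now: conservative Modified.
        None => {
            let hash = if hash_contents { Some(content_hash(&read_disk())) } else { None };
            Some(FileStatus::Modified { hash, details: Some("FILE_INDEX entry missing") })
        }
        // Fast path: size and mtime match the cached entry, so trust the index.
        Some(e) if e.size == disk_size && e.mtime == disk_mtime => None,
        // Stat changed: hash if allowed, otherwise report Modified without a hash.
        Some(e) => {
            if !hash_contents {
                return Some(FileStatus::Modified { hash: None, details: None });
            }
            let hash = content_hash(&read_disk());
            if hash == e.hash {
                None // assumed: same bytes with a new mtime count as clean
            } else {
                Some(FileStatus::Modified { hash: Some(hash), details: None })
            }
        }
    }
}
```

A false positive from the `None` arm is cheap in this model: the file is reported once, and whether it stays dirty depends on record writing a fresh index entry afterwards.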

Pinning test

atomic-repository/src/repository/tests/status_tests.rs::test_status_detects_modification_when_file_index_missing records a file (populating FILE_INDEX), drops the entry to simulate the post-insert state, modifies the file on disk, and asserts status() reports it as Modified.

Failing on the parent commit:

```
panicked: status() must detect modifications to tracked files even when
FILE_INDEX has no entry for them. entries=[]
```

Bug 2 — phantom Deleted and false Modified after view switch

Symptom

After an atomic view switch round trip across views with diverged content, atomic status lies about the working copy. The disk is correct (materialize restored the destination view's content), but status reports phantom Deleted entries for files that only exist on sibling views and false Modified entries for files that materialize had just rewritten.

Repro (against atomic 0.5.3, in a fresh /tmp repo)

```bash
mkdir /tmp/atomic-repro && cd /tmp/atomic-repro
atomic init

# Record three files on dev
echo alpha   > alpha.txt
echo bravo   > bravo.txt
echo charlie > charlie.txt
atomic add alpha.txt bravo.txt charlie.txt
atomic record -m "Add 3 files on dev"

# Split feature off dev, switch to it, modify alpha + add delta, record
atomic split feature --switch
echo "alpha-modified" > alpha.txt
echo delta            > delta.txt
atomic add delta.txt
atomic record -m "Edit alpha + add delta on feature"

# Switch back to dev
atomic view switch dev

# Disk is correct…
ls          # alpha.txt bravo.txt charlie.txt   (no delta.txt)
cat alpha.txt    # "alpha"

# …but status lies
atomic status
# Changes to be recorded:
#   modified:   alpha.txt
#   deleted:    delta.txt
```

atomic diff confirms status is wrong — it produces the diff headers but no +/- hunks for either file, because their disk content matches the recorded view state.

Root cause #2a — unsound filter_is_universal fast path

atomic-repository/src/repository/status.rs:43-48 (before this PR) had a "fast path" that skipped the view-change-id filter for is_shared() && parent.is_none() views, on the assumption that ALL changes in GRAPH were visible from such a view. The comment:

```rust
// Fast path: for a Shared view with no parent (the common case
// after `atomic init` or `atomic git import`), ALL changes in
// GRAPH are visible.  Skip the expensive O(N) scan entirely.
```

The assumption breaks the moment atomic split creates a sibling view. dev is still shared with no parent, but feature has its own changes that dev does not. TREE is global, so when feature recorded delta.txt it added an INODES entry whose position.change points to feature's change. When dev runs status with the universal shortcut, the filter is skipped and the entry is accepted; since delta.txt isn't on disk, it is reported Deleted.

Root cause #2b — populate_file_index was a silent no-op

atomic-repository/src/repository/materialize.rs:154-180 (before this PR):

```rust
fn populate_file_index(&self, result: &MaterializeResult) {
    let mut entries = Vec::new();
    for path in result.file_results.keys() {  // ← always empty
        // ... read disk + hash, push entry
    }
    if !entries.is_empty() {                  // ← never enters
        let _ = self.update_file_index(&entries);
    }
}
```

MaterializeResult::merge_file_result(_, store_result: bool) only inserts into file_results when store_result=true. The single production call site of materialize_view passes false (atomic-core/src/output/repo/repository/mod.rs:269):

```rust
result.merge_file_result(file_result, false);
```

So result.file_results is always empty in production, and populate_file_index silently iterated nothing. FILE_INDEX was never refreshed by materialize, leaving stale per-view hashes after every view switch.

After dev → feature → dev, FILE_INDEX still held feature's alpha.txt hash. Status hashed the on-disk content (alpha\n, dev's content), found it differed from the cached hash (feature's alpha-modified\n hash), and reported Modified. Materialize had written the right bytes to disk; the index just wasn't told.
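The no-op can be reproduced with a trimmed-down model. `MaterializeResult`, `merge_file_result`, and `populate_file_index_old` below are hypothetical reconstructions from the description above (the real types carry more state, and the real function reads and hashes disk content rather than reusing a stored hash). The only point is that a map filled behind `store_result=true` stays empty when every caller passes `false`.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down MaterializeResult.
#[derive(Debug, Default)]
pub struct MaterializeResult {
    pub files_written: usize,
    pub file_results: HashMap<String, u64>,
}

impl MaterializeResult {
    /// Per-file results are kept only when the caller asks for them.
    pub fn merge_file_result(&mut self, path: &str, hash: u64, store_result: bool) {
        self.files_written += 1;
        if store_result {
            self.file_results.insert(path.to_string(), hash);
        }
    }
}

/// Pre-PR shape of populate_file_index: it only walks `file_results`.
/// Returns how many FILE_INDEX entries were refreshed.
pub fn populate_file_index_old(
    result: &MaterializeResult,
    file_index: &mut HashMap<String, u64>,
) -> usize {
    let mut updated = 0;
    for (path, hash) in &result.file_results {
        file_index.insert(path.clone(), *hash);
        updated += 1;
    }
    updated
}
```

When every call passes `false`, the file is still written but the loop body never runs, so whatever hash the previous view left in the index survives the switch.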

Fixes

2a — status.rs: drop the universal fast path. Always compute current_view_change_ids via collect_visible_change_ids_with_deps and apply the filter to every iter_tree entry. Cost is one O(C) B-tree scan per status call where C = changes on the view — bounded and fast even on large repos. The legacy "show everything" fallback for a missing current view is preserved via Option<HashSet<NodeId>> (treated as "no filter").
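A minimal sketch of the 2a filter semantics, assuming TREE can be flattened into (path, introducing change) rows and that collect_visible_change_ids_with_deps has already produced the view's change-id set. `TreeEntry`, `filter_tree`, and `deleted_paths` are illustrative names, and `NodeId` is aliased to `u64` here. Passing `None` models the legacy no-current-view fallback; it also behaves like the removed universal shortcut.

```rust
use std::collections::HashSet;

/// Stand-in for the repository's NodeId.
pub type NodeId = u64;

/// Hypothetical flattened TREE row: a tracked path and the change that introduced it.
pub struct TreeEntry {
    pub path: String,
    pub change: NodeId,
}

/// Keep only TREE entries introduced by a change visible from the current view.
/// `None` means "no current view": apply no filter (legacy show-everything).
pub fn filter_tree<'a>(
    tree: &'a [TreeEntry],
    visible: Option<&HashSet<NodeId>>,
) -> Vec<&'a TreeEntry> {
    tree.iter()
        .filter(|e| visible.map_or(true, |ids| ids.contains(&e.change)))
        .collect()
}

/// Tracked entries that are absent from the working copy are reported Deleted.
pub fn deleted_paths(entries: &[&TreeEntry], on_disk: &HashSet<&str>) -> Vec<String> {
    entries
        .iter()
        .filter(|e| !on_disk.contains(e.path.as_str()))
        .map(|e| e.path.clone())
        .collect()
}
```

With dev's visible set containing only the shared change, feature's delta.txt row is dropped before the on-disk check, so it can no longer surface as Deleted; with no filter it does.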

2b — materialize.rs: rewrite populate_file_index to iterate list_tracked_files() instead of result.file_results.keys(). Materialize has just synced the working copy to the destination view's recorded state, so hashing the on-disk content for each tracked file produces the authoritative baseline FILE_INDEX should cache. Cost is bounded by tracked-file count, same order as materialize itself. The result parameter is kept for callsite compatibility but unused.
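A sketch of the 2b fix under the same simplifications: `tracked` stands in for list_tracked_files(), the hypothetical `read_disk` closure for reading the working copy, and FILE_INDEX is modeled as a path-to-hash map. Because the loop walks every tracked path rather than result.file_results, a stale hash left by the previous view is overwritten with the hash of what materialize just wrote.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

/// Stand-in content hash (assumption: atomic uses its own Hash type).
pub fn content_hash(bytes: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    bytes.hash(&mut hasher);
    hasher.finish()
}

/// Refresh FILE_INDEX from every tracked file after materialize.
/// `read_disk` returns None when a tracked path is not in the working copy;
/// this sketch leaves any existing entry for such a path untouched.
pub fn populate_file_index(
    tracked: &[&str],
    read_disk: impl Fn(&str) -> Option<Vec<u8>>,
    file_index: &mut HashMap<String, u64>,
) -> usize {
    let mut updated = 0;
    for &path in tracked {
        if let Some(bytes) = read_disk(path) {
            file_index.insert(path.to_string(), content_hash(&bytes));
            updated += 1;
        }
    }
    updated
}
```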

Pinning test

atomic-repository/src/repository/tests/integration_tests.rs::test_status_clean_after_view_switch_with_sibling_changes reproduces the workflow above in a temp repo and asserts both that the disk is correct and that status().is_clean() immediately after the switch.

Failing on the parent commit:

```
panicked: status() after view switch must be clean — disk and view state agree,
but status reported phantom dirty entries:
[("delta.txt", Deleted), ("alpha.txt", Modified)]
```

Why these bugs surfaced together (real-world impact)

The original report was from an agent (OpenCode) workflow where a turn:

  1. Opened in a draft view split off dev
  2. Created 4 new files and edited 2 already-tracked files (server.ts + the existing GraphView.tsx placeholder)
  3. Ran build verifications (passed) and recorded the change
  4. The user pushed/switched/inserted the change into another view

The recorded change contained only the 4 creates. Both file edits were silently dropped because the agent's record_turn called repo.record(with_all(true)), whose internal status(default()) hit Bug 1's fallthrough and never reported the edited files. After view-switching, the now-incomplete change was inserted into the parent view; Bug 2 then showed phantom modified/deleted entries that obscured what was missing. The user's mental model — "I built it, the test passed, I merged it in, where did it go?" — was correct; the changes were created, never recorded, then erased by the next switch.

Test plan

  • cargo test -p atomic-repository --lib: 754 passed, 0 failed (including the 2 new tests)
  • cargo test -p atomic-agent --lib — 1133 passed, 0 failed
  • cargo fmt -p atomic-repository --check — clean
  • cargo clippy -p atomic-repository --lib --tests — no new warnings on changed files
  • Each pinning test verified failing on its parent commit and passing after the fix
  • Manual repro in a real working repo: agent turn that creates 4 new files + edits 2 pre-existing tracked files now records all 6 changes; round-tripping dev → feature → dev leaves status clean

Files changed

  • atomic-repository/src/repository/status.rs — drop universal filter shortcut; replace silent fallthrough on missing FILE_INDEX with conservative Modified entry
  • atomic-repository/src/repository/materialize.rs — fix populate_file_index to actually update the index after materialize
  • atomic-repository/src/repository/tests/status_tests.rs — failing test for bug 1
  • atomic-repository/src/repository/tests/integration_tests.rs — failing test for bug 2

Follow-up (not in this PR)

The complementary fix is to populate FILE_INDEX from the apply/insert/clone paths so the conservative "no entry → Modified" branch in bug 1's fix triggers even less often. Materialize is now correct after these changes; insert/clone are the remaining gaps. Happy to do that in a follow-up PR.

@geekgonecrazy changed the title fix(status): detect modifications when FILE_INDEX entry is missing fix(status): correctly detect tracked-file modifications and post-switch state May 7, 2026

A tracked file that lands in the working copy via `atomic insert`,
`atomic clone`, or `atomic view switch` has no FILE_INDEX entry —
those code paths materialize content but only `record()` and
`materialize_view()` populate the index. When that file was later
modified, `status()` looked up FILE_INDEX, found nothing, and fell
through with an "Assume clean" comment. The modification became
invisible to `atomic status`, `atomic diff`, and `atomic record -a`.

Concretely this caused agent turns to silently drop edits to
already-tracked files: the agent record path uses `repo.record(all=true)`,
which calls `status(default())` internally, which dropped the entry,
which made the file invisible to filter_files. Turns that mixed new
files with edits to tracked files recorded only the new files.

Replace the silent fallthrough with a conservative Modified entry.
When `hash_contents=true`, hash the file (consistent with the
existing fast-path branch when mtime+size differ). When false,
emit the entry without hashing — same shape as the existing
"mtime changed but no hash" branch. The recording workflow already
handles false positives: `record_modified_file` produces an empty
hunk for content that matches pristine, and the file is filtered
out by `recorded.is_empty()`. Subsequent records re-populate
FILE_INDEX, returning the file to the fast path.

Test added in status_tests.rs records a file (populating FILE_INDEX),
drops the entry to simulate the post-insert state, modifies the file,
and asserts status() reports it as Modified. Fails against the
previous code with entries=[].

Two related bugs caused `atomic status` to lie about the working copy
immediately after `atomic view switch`. The disk was correct (materialize
had restored the destination view's content), but status reported phantom
`Deleted` entries for files that only exist on sibling views and false
`Modified` entries for files that materialize had just rewritten.

Test added in integration_tests.rs records alpha.txt + bravo.txt on dev,
splits feature off dev, on feature edits alpha.txt and adds delta.txt
and records, then switches back to dev. Asserts status().is_clean()
after the switch. Failed on the parent commit with
[("delta.txt", Deleted), ("alpha.txt", Modified)].

1) status.rs — drop the unsound "universal filter" fast path

   For `is_shared() && parent.is_none()` views, status was skipping the
   view-change-id filter on the assumption that ALL changes in GRAPH
   belonged to the view. That assumption breaks the moment `atomic split`
   creates a sibling view: the dev view is still shared+root, but it no
   longer contains every change — feature's record introduces TREE
   entries (e.g. delta.txt's inode_position pointing to feature's change)
   that must NOT surface in dev's status.

   Always compute `current_view_change_ids` via
   `collect_visible_change_ids_with_deps` and apply the filter. Cost is
   one O(C) B-tree scan per status call where C is changes on the view —
   bounded and fast even on large repos. Preserve the legacy "show
   everything" fallback when no current view exists by using
   `Option<HashSet<NodeId>>`.

2) materialize.rs — actually populate FILE_INDEX after materialize

   `populate_file_index` iterated `result.file_results.keys()` to update
   FILE_INDEX with current on-disk hashes. But `materialize_view` calls
   `merge_file_result(_, store_result=false)` at its only production
   call site, so `file_results` is always empty in practice and this
   function silently no-op'd. FILE_INDEX retained whatever hashes the
   previous view had recorded, so post-switch `status()` would hash
   alpha.txt on disk, find it differs from the cached (sibling-view)
   hash, and report Modified — even though materialize had just written
   the destination view's correct content.

   Iterate `list_tracked_files()` instead. Materialize has just synced
   the working copy to the destination view's recorded state, so
   hashing the on-disk content for each tracked file produces the
   correct authoritative baseline for FILE_INDEX. Cost is bounded by
   tracked-file count, same order as materialize itself.

Together: the view filter excludes sibling-only TREE entries (no more
phantom Deleted), and the FILE_INDEX refresh keeps cached hashes
consistent with materialized disk content (no more false Modified).
@geekgonecrazy force-pushed the fix/status-detect-modifications-without-file-index branch from def256c to b2064c2 May 7, 2026 21:25
leefaus and others added 3 commits May 7, 2026 17:47
The dd1d59b status fix's "self-healing" claim was wrong for the
false-positive case. A tracked file with no FILE_INDEX entry whose
on-disk content matches pristine (e.g. landed via insert/clone or via
an old binary's view-switch that didn't populate FILE_INDEX) shows as
Modified by the conservative branch in status.rs:322-349. record then
proves the content matches pristine, produces empty hunks, and skips
the file. The existing FILE_INDEX update at record.rs:632 only iterates
`outcome.recorded_files()` — skipped paths are never written. Result:
the file shows Modified again on the next status, surviving arbitrarily
many records as a permanent dirty-status tail.

Observed in the wild on /Users/aaron/code/test-projects/atomic-ui after
an opencode agent turn: 22 phantom-Modified files persisted through a
record that legitimately touched 4 unrelated files (`atomic diff` for
the 22 had no hunks, confirming content matched pristine).

Track confirmed-clean paths inline at the two skip sites where we
already know the content matches pristine (byte-equal old/new content
at record.rs:449, and empty hunks from record_modified_file at
record.rs:501) and write FILE_INDEX entries for them after the
processing loop. Runs before the NothingToRecord early return so the
heal still happens when every "Modified" file turns out to be phantom.
We have new_content in scope at both sites, so Hash::of is a single
in-memory pass with no extra disk reads.

Two tests added in status_tests.rs:
- test_record_heals_file_index_for_phantom_modified_files: drops
  FILE_INDEX for a tracked file whose content matches pristine, runs
  record (returns NothingToRecord), asserts status is clean.
- test_record_heals_file_index_alongside_real_changes: mirrors the
  agent-record scenario — one real edit + one phantom-Modified file
  in the same record. Asserts the recorded change contains only the
  real edit AND post-record status is clean.

Both tests fail on the parent commit with phantom.txt still showing
Modified after the record, confirming they pin the regression.
@geekgonecrazy
Contributor Author

Closing in favor of #49

@geekgonecrazy deleted the fix/status-detect-modifications-without-file-index branch May 7, 2026 22:19