fix(status): correctly detect tracked-file modifications and post-switch state#48
Closed
geekgonecrazy wants to merge 5 commits into
Closed
fix(status): correctly detect tracked-file modifications and post-switch state#48geekgonecrazy wants to merge 5 commits into
geekgonecrazy wants to merge 5 commits into
Conversation
A tracked file that lands in the working copy via `atomic insert`, `atomic clone`, or `atomic view switch` has no FILE_INDEX entry — those code paths materialize content but only `record()` and `materialize_view()` populate the index. When that file was later modified, `status()` looked up FILE_INDEX, found nothing, and fell through with an "Assume clean" comment. The modification became invisible to `atomic status`, `atomic diff`, and `atomic record -a`. Concretely this caused agent turns to silently drop edits to already-tracked files: the agent record path uses `repo.record(all=true)`, which calls `status(default())` internally, which dropped the entry, which made the file invisible to filter_files. Turns that mixed new files with edits to tracked files recorded only the new files. Replace the silent fallthrough with a conservative Modified entry. When `hash_contents=true`, hash the file (consistent with the existing fast-path branch when mtime+size differ). When false, emit the entry without hashing — same shape as the existing "mtime changed but no hash" branch. The recording workflow already handles false positives: `record_modified_file` produces an empty hunk for content that matches pristine, and the file is filtered out by `recorded.is_empty()`. Subsequent records re-populate FILE_INDEX, returning the file to the fast path. Test added in status_tests.rs records a file (populating FILE_INDEX), drops the entry to simulate the post-insert state, modifies the file, and asserts status() reports it as Modified. Fails against the previous code with entries=[].
Two related bugs caused `atomic status` to lie about the working copy
immediately after `atomic view switch`. The disk was correct (materialize
had restored the destination view's content), but status reported phantom
`Deleted` entries for files that only exist on sibling views and false
`Modified` entries for files that materialize had just rewritten.
Test added in integration_tests.rs records alpha.txt + bravo.txt on dev,
splits feature off dev, on feature edits alpha.txt and adds delta.txt
and records, then switches back to dev. Asserts status().is_clean()
after the switch. Failed on the parent commit with
[("delta.txt", Deleted), ("alpha.txt", Modified)].
1) status.rs — drop the unsound "universal filter" fast path
For `is_shared() && parent.is_none()` views, status was skipping the
view-change-id filter on the assumption that ALL changes in GRAPH
belonged to the view. That assumption breaks the moment `atomic split`
creates a sibling view: the dev view is still shared+root, but it no
longer contains every change — feature's record introduces TREE
entries (e.g. delta.txt's inode_position pointing to feature's change)
that must NOT surface in dev's status.
Always compute `current_view_change_ids` via
`collect_visible_change_ids_with_deps` and apply the filter. Cost is
one O(C) B-tree scan per status call where C is changes on the view —
bounded and fast even on large repos. Preserve the legacy "show
everything" fallback when no current view exists by using
`Option<HashSet<NodeId>>`.
2) materialize.rs — actually populate FILE_INDEX after materialize
`populate_file_index` iterated `result.file_results.keys()` to update
FILE_INDEX with current on-disk hashes. But `materialize_view` calls
`merge_file_result(_, store_result=false)` at its only production
call site, so `file_results` is always empty in practice and this
function silently no-op'd. FILE_INDEX retained whatever hashes the
previous view had recorded, so post-switch `status()` would hash
alpha.txt on disk, find it differs from the cached (sibling-view)
hash, and report Modified — even though materialize had just written
the destination view's correct content.
Iterate `list_tracked_files()` instead. Materialize has just synced
the working copy to the destination view's recorded state, so
hashing the on-disk content for each tracked file produces the
correct authoritative baseline for FILE_INDEX. Cost is bounded by
tracked-file count, same order as materialize itself.
Together: the view filter excludes sibling-only TREE entries (no more
phantom Deleted), and the FILE_INDEX refresh keeps cached hashes
consistent with materialized disk content (no more false Modified).
def256c to
b2064c2
Compare
The dd1d59b status fix's "self-healing" claim was wrong for the false-positive case. A tracked file with no FILE_INDEX entry whose on-disk content matches pristine (e.g. landed via insert/clone or via an old binary's view-switch that didn't populate FILE_INDEX) shows as Modified by the conservative branch in status.rs:322-349. record then proves the content matches pristine, produces empty hunks, and skips the file. The existing FILE_INDEX update at record.rs:632 only iterates `outcome.recorded_files()` — skipped paths are never written. Result: the file shows Modified again on the next status, surviving arbitrarily many records as a permanent dirty-status tail. Observed in the wild on /Users/aaron/code/test-projects/atomic-ui after an opencode agent turn: 22 phantom-Modified files persisted through a record that legitimately touched 4 unrelated files (`atomic diff` for the 22 had no hunks, confirming content matched pristine). Track confirmed-clean paths inline at the two skip sites where we already know the content matches pristine (byte-equal old/new content at record.rs:449, and empty hunks from record_modified_file at record.rs:501) and write FILE_INDEX entries for them after the processing loop. Runs before the NothingToRecord early return so the heal still happens when every "Modified" file turns out to be phantom. We have new_content in scope at both sites, so Hash::of is a single in-memory pass with no extra disk reads. Two tests added in status_tests.rs: - test_record_heals_file_index_for_phantom_modified_files: drops FILE_INDEX for a tracked file whose content matches pristine, runs record (returns NothingToRecord), asserts status is clean. - test_record_heals_file_index_alongside_real_changes: mirrors the agent-record scenario — one real edit + one phantom-Modified file in the same record. Asserts the recorded change contains only the real edit AND post-record status is clean. Both tests fail on the parent commit with phantom.txt still showing Modified after the record, confirming they pin the regression.
Contributor
Author
|
Closing in favor of #49 |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Three related correctness bugs in
atomic-repositorycausedatomic status(and thereforediffandrecord) to lie about the working copy. Each surfaces as silent data loss in the agent record path: edits the agent made were invisible torecord(all=true)and lost on the next view switch.Two commits, each paired with a failing test that pins the bug, then the fix:
dd1d59bstatus()silently classifies tracked files with no FILE_INDEX entry as Cleantest_status_detects_modification_when_file_index_missingb2064c2view switchwith a sibling view, status reports phantomDeletedand falseModifiedfor clean working copytest_status_clean_after_view_switch_with_sibling_changesBug 1 — silent fallthrough when FILE_INDEX entry is missing
Symptom
After certain workflows that materialize files into the working copy without re-recording (notably
atomic insertof inbound changes,atomic clone, server-side push, or any operation that populates TREE without going throughrecord),atomic status,atomic diff, andatomic record -awould all report the working tree clean even when a tracked file had been edited on disk. The nextrecordwould silently drop the modification.Repro (against
atomic 0.5.3)The 27 tracked files with no FILE_INDEX entry are silently invisible to status.
Root cause
atomic-repository/src/repository/status.rs:308-311(before this PR):FILE_INDEX is written in only two places —
record()(atomic-repository/src/repository/record.rs:646) andmaterialize_view()(atomic-repository/src/repository/materialize.rs:154). The change-application paths (insert,clone,view switch, server push) materialize files into the working copy but do not populate FILE_INDEX. Any file that arrived via one of those paths therefore has no entry, andstatus()falls through with"Assume clean"— silent data loss.The same logic is the reason agent turns with edits to tracked files were dropped: the agent record path calls
repo.record(with_all(true)), which callsrepo.status(StatusOptions::default())internally, which hits this fallthrough and never reports the modified files tofilter_files.Fix
When a tracked file has no FILE_INDEX entry, emit a conservative
FileStatus::Modifiedentry instead of falling through. Withhash_contents=truewe read+hash the file so the entry carries the current hash; in fast mode we skip the hash and just emit Modified. The entry setsdetails = "FILE_INDEX entry missing"for diagnostics.This is intentionally conservative: a file that is actually unchanged but has no index entry will be reported Modified once. The recording workflow already handles that case —
record_modified_fileproduces an empty hunk for content that matches pristine,recorded.is_empty()filters it out, and the post-apply step inrecord()writes a fresh FILE_INDEX entry, returning the file to the fast path.Pinning test
atomic-repository/src/repository/tests/status_tests.rs::test_status_detects_modification_when_file_index_missingrecords a file (populating FILE_INDEX), drops the entry to simulate the post-insert state, modifies the file on disk, and assertsstatus()reports it asModified.Failing on the parent commit:
Bug 2 — phantom Deleted and false Modified after
view switchSymptom
After
atomic view switchround-trip across views with diverged content,atomic statuslies about the working copy. The disk is correct (materializerestored the destination view's content), but status reports phantomDeletedentries for files that only exist on sibling views and falseModifiedentries for files that materialize had just rewritten.Repro (against
atomic 0.5.3, in a fresh/tmprepo)atomic diffconfirms status is wrong — it produces the diff headers but no+/-hunks for either file, because their disk content matches the recorded view state.Root cause #2a — unsound
filter_is_universalfast pathatomic-repository/src/repository/status.rs:43-48(before this PR) had a "fast path" that skipped the view-change-id filter foris_shared() && parent.is_none()views, on the assumption that ALL changes in GRAPH were visible from such a view. The comment:The assumption breaks the moment
atomic splitcreates a sibling view.devis still shared with no parent, but feature has its own changes thatdevdoes not. TREE is global, so when feature recordeddelta.txtit added an INODES entry whoseposition.changepoints to feature's change. Whendevruns status with the universal shortcut, the filter is skipped, the entry is accepted, the file isn't on disk → reportedDeleted.Root cause #2b —
populate_file_indexwas a silent no-opatomic-repository/src/repository/materialize.rs:154-180(before this PR):MaterializeResult::merge_file_result(_, store_result: bool)only inserts intofile_resultswhenstore_result=true. The single production call site ofmaterialize_viewpassesfalse(atomic-core/src/output/repo/repository/mod.rs:269):So
result.file_resultsis always empty in production, andpopulate_file_indexsilently iterated nothing. FILE_INDEX was never refreshed by materialize, leaving stale per-view hashes after everyview switch.After
dev → feature → dev, FILE_INDEX still held feature'salpha.txthash. Status hashed the on-disk content (alpha\n, dev's content), found it differed from the cached hash (feature'salpha-modified\nhash), and reportedModified. Materialize had written the right bytes to disk; the index just wasn't told.Fixes
2a —
status.rs: drop the universal fast path. Always computecurrent_view_change_idsviacollect_visible_change_ids_with_depsand apply the filter to everyiter_treeentry. Cost is one O(C) B-tree scan per status call where C = changes on the view — bounded and fast even on large repos. The legacy "show everything" fallback for a missing current view is preserved viaOption<HashSet<NodeId>>(treated as "no filter").2b —
materialize.rs: rewritepopulate_file_indexto iteratelist_tracked_files()instead ofresult.file_results.keys(). Materialize has just synced the working copy to the destination view's recorded state, so hashing the on-disk content for each tracked file produces the authoritative baseline FILE_INDEX should cache. Cost is bounded by tracked-file count, same order as materialize itself. Theresultparameter is kept for callsite compatibility but unused.Pinning test
atomic-repository/src/repository/tests/integration_tests.rs::test_status_clean_after_view_switch_with_sibling_changesreproduces the workflow above in a temp repo and asserts both that the disk is correct and thatstatus().is_clean()immediately after the switch.Failing on the parent commit:
Why these bugs surfaced together (real-world impact)
The original report was from an agent (OpenCode) workflow where a turn:
devserver.ts+ the existingGraphView.tsxplaceholder)The recorded change contained only the 4 creates. Both file edits were silently dropped because the agent's
record_turncalledrepo.record(with_all(true)), whose internalstatus(default())hit Bug 1's fallthrough and never reported the edited files. After view-switching, the now-incomplete change was inserted into the parent view; Bug 2 then showed phantommodified/deletedentries that obscured what was missing. The user's mental model — "I built it, the test passed, I merged it in, where did it go?" — was correct; the changes were created, never recorded, then erased by the next switch.Test plan
cargo test -p atomic-repository --lib— 754 passed, 0 failed (753 prior + 2 new)cargo test -p atomic-agent --lib— 1133 passed, 0 failedcargo fmt -p atomic-repository --check— cleancargo clippy -p atomic-repository --lib --tests— no new warnings on changed filesdev → feature → devleaves status cleanFiles changed
atomic-repository/src/repository/status.rs— drop universal filter shortcut; replace silent fallthrough on missing FILE_INDEX with conservative Modified entryatomic-repository/src/repository/materialize.rs— fixpopulate_file_indexto actually update the index after materializeatomic-repository/src/repository/tests/status_tests.rs— failing test for bug 1atomic-repository/src/repository/tests/integration_tests.rs— failing test for bug 2Follow-up (not in this PR)
The complementary fix is to populate FILE_INDEX from the apply/insert/clone paths so the conservative "no entry → Modified" branch in bug 1's fix triggers even less often. Materialize is now correct after these changes; insert/clone are the remaining gaps. Happy to do that in a follow-up PR.