Skip to content

unpack-trees: skip lstats for deleted VFS entries in checkout#865

Merged
dscho merged 1 commit into
microsoft:vfs-2.53.0from
tyrielv:tyrielv/checkout-hydration-fix
Mar 26, 2026
Merged

unpack-trees: skip lstats for deleted VFS entries in checkout#865
dscho merged 1 commit into
microsoft:vfs-2.53.0from
tyrielv:tyrielv/checkout-hydration-fix

Conversation

@tyrielv
Copy link
Copy Markdown

@tyrielv tyrielv commented Mar 6, 2026

When core_virtualfilesystem is set and a branch switch deletes entries (present in old tree, absent in new tree), deleted_entry() calls verify_absent_if_directory() with 'ce' pointing to a tree entry from traverse_trees(). This tree entry lacks CE_NEW_SKIP_WORKTREE because that flag is only set on src_index entries by mark_new_skip_worktree().

The missing flag causes verify_absent_if_directory()'s fast-path to fail, falling through to verify_absent_1() which lstats every such path. In a VFS repo each lstat may trigger callbacks, creating placeholders. On a large repo switching between LTS releases this produces tens of thousands of placeholders that the VFS must then clean up when they are deleted as part of the checkout.

Fix this by propagating CE_NEW_SKIP_WORKTREE from the index entry (old) to the tree entry (ce) when core_virtualfilesystem is set. This allows the existing fast-path to work, eliminating the unnecessary lstats entirely.

Measured on a 2.8M file VFS repo (0% hydrated):
Before: ~135s checkout, ~23k folder placeholders created
After: ~25s checkout, 0 folder placeholders created

  • This change only applies to the virtualization hook and VFS for Git.

@tyrielv tyrielv marked this pull request as ready for review March 6, 2026 21:10
derrickstolee
derrickstolee previously approved these changes Mar 9, 2026
Copy link
Copy Markdown

@derrickstolee derrickstolee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good find! I think your code is correct. I have a suggested restructuring that I think will be equivalent, but I'm guessing you wrote it that way on purpose. I'm curious to hear what you think.

Comment thread unpack-trees.c Outdated
Copy link
Copy Markdown
Member

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very well explained, and I appreciate the well-documented test case! I have just one or two suggestions.

Comment thread t/t1093-virtualfilesystem.sh Outdated
Comment thread unpack-trees.c Outdated
@tyrielv tyrielv force-pushed the tyrielv/checkout-hydration-fix branch 2 times, most recently from f10b38c to 9562fbc Compare March 23, 2026 15:26
When core_virtualfilesystem is set and a branch switch deletes entries
(present in old tree, absent in new tree), deleted_entry() calls
verify_absent_if_directory() with 'ce' pointing to a tree entry from
traverse_trees(). This tree entry lacks CE_NEW_SKIP_WORKTREE because
that flag is only set on src_index entries by mark_new_skip_worktree().

The missing flag causes verify_absent_if_directory()'s fast-path to
fail, falling through to verify_absent_1() which lstats every such
path. In a VFS repo each lstat may trigger callbacks, creating
placeholders. On a large repo switching between LTS releases this
produces tens of thousands of placeholders that the VFS must then
clean up when they are deleted as part of the checkout.

Fix this by propagating CE_NEW_SKIP_WORKTREE from the index entry
(old) to the tree entry (ce) when core_virtualfilesystem is set.
This allows the existing fast-path to work, eliminating the
unnecessary lstats entirely.

This is safe in VFS mode because the virtual filesystem is responsible
for tracking which files are hydrated and cleaning up placeholders
when entries are removed from the index. Additionally, when
GVFS_NO_DELETE_OUTSIDE_SPARSECHECKOUT is set (always the case in VFS
repos), deleted_entry() preserves CE_SKIP_WORKTREE on the CE_REMOVE
entry and git does not unlink skip-worktree files from disk, so the
lstat result would not be acted upon anyway.

Measured on a 2.8M file VFS repo (0% hydrated):
  Before: ~135s checkout, ~23k folder placeholders created
  After:  ~25s checkout, 0 folder placeholders created

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@tyrielv tyrielv force-pushed the tyrielv/checkout-hydration-fix branch from 9562fbc to aa27eec Compare March 25, 2026 17:19
Copy link
Copy Markdown
Member

@dscho dscho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! @tyrielv I'll add your Signed-off-by: trailer and remove the Co-authored-by: trailer that has a bogus email address, during the next rebase. Out of curiosity: which AI model did you use (I find that more valuable information than which AI orchestrator was used), I guess Claude Opus 4.6?

@dscho dscho merged commit 8c6cda3 into microsoft:vfs-2.53.0 Mar 26, 2026
23 checks passed
dscho added a commit that referenced this pull request Apr 12, 2026
When core_virtualfilesystem is set and a branch switch deletes entries
(present in old tree, absent in new tree), deleted_entry() calls
verify_absent_if_directory() with 'ce' pointing to a tree entry from
traverse_trees(). This tree entry lacks CE_NEW_SKIP_WORKTREE because
that flag is only set on src_index entries by mark_new_skip_worktree().

The missing flag causes verify_absent_if_directory()'s fast-path to
fail, falling through to verify_absent_1() which lstats every such path.
In a VFS repo each lstat may trigger callbacks, creating placeholders.
On a large repo switching between LTS releases this produces tens of
thousands of placeholders that the VFS must then clean up when they are
deleted as part of the checkout.

Fix this by propagating CE_NEW_SKIP_WORKTREE from the index entry (old)
to the tree entry (ce) when core_virtualfilesystem is set. This allows
the existing fast-path to work, eliminating the unnecessary lstats
entirely.

Measured on a 2.8M file VFS repo (0% hydrated):
  Before: ~135s checkout, ~23k folder placeholders created
  After:  ~25s checkout, 0 folder placeholders created

* [x] This change only applies to the virtualization hook and VFS for
Git.
dscho added a commit that referenced this pull request Apr 17, 2026
When core_virtualfilesystem is set and a branch switch deletes entries
(present in old tree, absent in new tree), deleted_entry() calls
verify_absent_if_directory() with 'ce' pointing to a tree entry from
traverse_trees(). This tree entry lacks CE_NEW_SKIP_WORKTREE because
that flag is only set on src_index entries by mark_new_skip_worktree().

The missing flag causes verify_absent_if_directory()'s fast-path to
fail, falling through to verify_absent_1() which lstats every such path.
In a VFS repo each lstat may trigger callbacks, creating placeholders.
On a large repo switching between LTS releases this produces tens of
thousands of placeholders that the VFS must then clean up when they are
deleted as part of the checkout.

Fix this by propagating CE_NEW_SKIP_WORKTREE from the index entry (old)
to the tree entry (ce) when core_virtualfilesystem is set. This allows
the existing fast-path to work, eliminating the unnecessary lstats
entirely.

Measured on a 2.8M file VFS repo (0% hydrated):
  Before: ~135s checkout, ~23k folder placeholders created
  After:  ~25s checkout, 0 folder placeholders created

* [x] This change only applies to the virtualization hook and VFS for
Git.
dscho added a commit that referenced this pull request Apr 17, 2026
When core_virtualfilesystem is set and a branch switch deletes entries
(present in old tree, absent in new tree), deleted_entry() calls
verify_absent_if_directory() with 'ce' pointing to a tree entry from
traverse_trees(). This tree entry lacks CE_NEW_SKIP_WORKTREE because
that flag is only set on src_index entries by mark_new_skip_worktree().

The missing flag causes verify_absent_if_directory()'s fast-path to
fail, falling through to verify_absent_1() which lstats every such path.
In a VFS repo each lstat may trigger callbacks, creating placeholders.
On a large repo switching between LTS releases this produces tens of
thousands of placeholders that the VFS must then clean up when they are
deleted as part of the checkout.

Fix this by propagating CE_NEW_SKIP_WORKTREE from the index entry (old)
to the tree entry (ce) when core_virtualfilesystem is set. This allows
the existing fast-path to work, eliminating the unnecessary lstats
entirely.

Measured on a 2.8M file VFS repo (0% hydrated):
  Before: ~135s checkout, ~23k folder placeholders created
  After:  ~25s checkout, 0 folder placeholders created

* [x] This change only applies to the virtualization hook and VFS for
Git.
dscho added a commit that referenced this pull request Apr 20, 2026
When core_virtualfilesystem is set and a branch switch deletes entries
(present in old tree, absent in new tree), deleted_entry() calls
verify_absent_if_directory() with 'ce' pointing to a tree entry from
traverse_trees(). This tree entry lacks CE_NEW_SKIP_WORKTREE because
that flag is only set on src_index entries by mark_new_skip_worktree().

The missing flag causes verify_absent_if_directory()'s fast-path to
fail, falling through to verify_absent_1() which lstats every such path.
In a VFS repo each lstat may trigger callbacks, creating placeholders.
On a large repo switching between LTS releases this produces tens of
thousands of placeholders that the VFS must then clean up when they are
deleted as part of the checkout.

Fix this by propagating CE_NEW_SKIP_WORKTREE from the index entry (old)
to the tree entry (ce) when core_virtualfilesystem is set. This allows
the existing fast-path to work, eliminating the unnecessary lstats
entirely.

Measured on a 2.8M file VFS repo (0% hydrated):
  Before: ~135s checkout, ~23k folder placeholders created
  After:  ~25s checkout, 0 folder placeholders created

* [x] This change only applies to the virtualization hook and VFS for
Git.
dscho added a commit that referenced this pull request Apr 20, 2026
When core_virtualfilesystem is set and a branch switch deletes entries
(present in old tree, absent in new tree), deleted_entry() calls
verify_absent_if_directory() with 'ce' pointing to a tree entry from
traverse_trees(). This tree entry lacks CE_NEW_SKIP_WORKTREE because
that flag is only set on src_index entries by mark_new_skip_worktree().

The missing flag causes verify_absent_if_directory()'s fast-path to
fail, falling through to verify_absent_1() which lstats every such path.
In a VFS repo each lstat may trigger callbacks, creating placeholders.
On a large repo switching between LTS releases this produces tens of
thousands of placeholders that the VFS must then clean up when they are
deleted as part of the checkout.

Fix this by propagating CE_NEW_SKIP_WORKTREE from the index entry (old)
to the tree entry (ce) when core_virtualfilesystem is set. This allows
the existing fast-path to work, eliminating the unnecessary lstats
entirely.

Measured on a 2.8M file VFS repo (0% hydrated):
  Before: ~135s checkout, ~23k folder placeholders created
  After:  ~25s checkout, 0 folder placeholders created

* [x] This change only applies to the virtualization hook and VFS for
Git.
dscho added a commit that referenced this pull request Apr 20, 2026
When core_virtualfilesystem is set and a branch switch deletes entries
(present in old tree, absent in new tree), deleted_entry() calls
verify_absent_if_directory() with 'ce' pointing to a tree entry from
traverse_trees(). This tree entry lacks CE_NEW_SKIP_WORKTREE because
that flag is only set on src_index entries by mark_new_skip_worktree().

The missing flag causes verify_absent_if_directory()'s fast-path to
fail, falling through to verify_absent_1() which lstats every such path.
In a VFS repo each lstat may trigger callbacks, creating placeholders.
On a large repo switching between LTS releases this produces tens of
thousands of placeholders that the VFS must then clean up when they are
deleted as part of the checkout.

Fix this by propagating CE_NEW_SKIP_WORKTREE from the index entry (old)
to the tree entry (ce) when core_virtualfilesystem is set. This allows
the existing fast-path to work, eliminating the unnecessary lstats
entirely.

Measured on a 2.8M file VFS repo (0% hydrated):
  Before: ~135s checkout, ~23k folder placeholders created
  After:  ~25s checkout, 0 folder placeholders created

* [x] This change only applies to the virtualization hook and VFS for
Git.
tyrielv added a commit to tyrielv/microsoft-git that referenced this pull request May 14, 2026
When 'git checkout <tree> -- <pathspec>' updates the index with entries
from the source tree, update_some() creates a new cache entry that
unconditionally clears skip-worktree.  In a virtual filesystem repo
(core.virtualfilesystem is set), this causes checkout_entry() to
attempt unlink() on files that exist only as virtual projections with
no physical NTFS entry.  The unlink fails with ENOENT, producing
'error: unable to unlink old' messages and exit code 255.

Fix this in three places:

1. update_some(): Propagate CE_SKIP_WORKTREE from the existing index
   entry to the replacement entry when core_virtualfilesystem is set.

2. mark_ce_for_checkout_overlay(): Allow skip-worktree entries that
   have CE_UPDATE (set by update_some for tree entries) to still
   match the pathspec, so report_path_error() does not reject them.

3. checkout_worktree(): Skip checkout_entry() for entries that have
   both CE_MATCHED and CE_SKIP_WORKTREE, since the virtual filesystem
   provider will serve the correct content from the updated projection.

The index is updated to the new tree entry's OID while the working
tree write is skipped entirely.  The virtual filesystem provider
re-reads the updated index and serves the correct content on next
access.  Same approach as the CE_NEW_SKIP_WORKTREE propagation in
deleted_entry() (unpack-trees.c, PR microsoft#865) which avoids unnecessary
lstats on virtualized paths during branch switches.

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
tyrielv added a commit to tyrielv/microsoft-git that referenced this pull request May 15, 2026
When 'git checkout <tree> -- <pathspec>' updates the index with entries
from the source tree, update_some() creates a new cache entry that
unconditionally clears skip-worktree.  In a virtual filesystem repo
(core.virtualfilesystem is set), this causes checkout_entry() to
attempt unlink() on files that exist only as virtual projections with
no physical NTFS entry.  The unlink fails with ENOENT, producing
'error: unable to unlink old' messages and exit code 255.

Fix this in three places:

1. update_some(): Propagate CE_SKIP_WORKTREE from the existing index
   entry to the replacement entry when core_virtualfilesystem is set.

2. mark_ce_for_checkout_overlay(): Allow skip-worktree entries that
   have CE_UPDATE (set by update_some for tree entries) to still
   match the pathspec, so report_path_error() does not reject them.

3. checkout_worktree(): Skip checkout_entry() for entries that have
   both CE_MATCHED and CE_SKIP_WORKTREE, since the virtual filesystem
   provider will serve the correct content from the updated projection.

The index is updated to the new tree entry's OID while the working
tree write is skipped entirely.  The virtual filesystem provider
re-reads the updated index and serves the correct content on next
access.  Same approach as the CE_NEW_SKIP_WORKTREE propagation in
deleted_entry() (unpack-trees.c, PR microsoft#865) which avoids unnecessary
lstats on virtualized paths during branch switches.

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyrielv@gmail.com>
dscho added a commit that referenced this pull request May 15, 2026
## Problem

`git checkout <tree> -- <pathspec>` fails with `error: unable to unlink
old '<path>': No such file or directory` and exit code 255 when the
target file is a virtual ProjFS projection that has not been
materialized as a placeholder in a GVFS-mounted repo.

Files that have been accessed (and thus have a ProjFS placeholder with a
physical NTFS entry) are unaffected — `unlink()` works fine for those.
The failure is specific to files that exist only in the virtual
projection: `lstat()` succeeds (ProjFS intercepts stat calls for
projected files) but `unlink()` fails because there is no physical NTFS
entry to delete.

Even without ProjFS, the unfixed code path is wasteful: it clears
skip-worktree, writes the file to disk, and triggers ProjFS placeholder
creation — all unnecessary because the virtual filesystem provider will
serve the correct content once the index is updated.

## Root Cause

`update_some()` in `builtin/checkout.c` creates a replacement cache
entry with `create_ce_flags(0)` which unconditionally clears
skip-worktree. This causes `mark_ce_for_checkout_overlay()` to mark the
entry for checkout, and `checkout_entry_ca()` attempts `unlink()` +
`write_entry()` on a path that may only exist as a virtual projection
with no physical file.

## Fix

Three changes in `builtin/checkout.c`, all gated on
`core_virtualfilesystem` and/or `ce_skip_worktree()`:

1. **`update_some()`**: Propagate `CE_SKIP_WORKTREE` from the existing
index entry to the replacement entry when `core_virtualfilesystem` is
set. The index is updated with the new tree entry's OID while the
virtual filesystem flag is preserved.

2. **`mark_ce_for_checkout_overlay()`**: Allow skip-worktree entries
that have `CE_UPDATE` (set by `update_some` for changed tree entries) to
still match the pathspec, so `report_path_error()` does not reject them.

3. **`checkout_worktree()`**: Skip `checkout_entry()` for entries that
have both `CE_MATCHED` and `CE_SKIP_WORKTREE`. The virtual filesystem
provider will serve the correct content from the updated projection on
next access.

Non-VFS repos and hydrated files (skip-worktree cleared by the VFS hook)
are completely unaffected.

## Related

Same approach as the `CE_NEW_SKIP_WORKTREE` propagation in #865 which
avoids unnecessary lstats on virtualized paths during branch switches.
Both bugs stem from skip-worktree being dropped on replacement cache
entries, causing unnecessary physical filesystem operations on virtual
files.

## Tests

Two new tests in `t/t1093-virtualfilesystem.sh`:

1. **checkout path preserves skip-worktree**: Creates two commits with
different file content, enables VFS mode with 0% hydration, removes the
physical file, runs `checkout HEAD~1 -- file`, and verifies the index is
updated while no file is written to disk.

2. **restore --staged after checkout path**: Verifies that `restore
--staged --ignore-skip-worktree-bits` correctly restores the index entry
back to HEAD after a checkout-path operation with skip-worktree
preserved.

Verified that test 1 fails without the product code change (checkout
writes the file to disk) and passes with it.

* [x] This change only applies to the virtualization hook and VFS for
Git.
tyrielv pushed a commit to tyrielv/microsoft-git that referenced this pull request May 18, 2026
Detailed analysis of each finding from the original audit, tested
against a GVFS-mounted repo with ProjFS placeholder tracking.

Key findings:
- verify_uptodate_1 (2.1): confirmed but low practical impact
- merge-ort create_ce_flags(0) (1.1): only affects attr_index, not real
- stash apply (1.2): correctly scoped to stash-modified files
- verify_absent_1 (2.2): already protected by PR microsoft#865
- unlink_entry (2.4): protected by GVFS_NO_DELETE_OUTSIDE_SPARSECHECKOUT
- checkout_entry_ca (2.3): already fixed by PR microsoft#915

Assisted-by: Claude Opus 4.6
Signed-off-by: Tyrie Vella <tyvella@microsoft.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants