update-index: add --refresh-stat-only #2125

Open
giorgidze wants to merge 2 commits into gitgitgadget:master from giorgidze:master

@giorgidze

This two-patch series adds "git update-index --refresh-stat-only", a
one-shot way to update the index's cached stat data to match the
current filesystem without rehashing file contents.

When a working tree is produced or restored by means other than a
normal checkout -- a CI cache restore, container provisioning, a
tarball extraction, or a copy from another machine -- the files may
be byte-for-byte identical while filesystem-local stat fields like
inode and device numbers no longer match. Today the available
workarounds are to (a) pay for full content rehashing on the next
"git status", or (b) set core.checkStat=minimal, which is sticky
and weakens every subsequent operation. Neither composes well with
modern container-based CI, where every job step would otherwise
need to set and preserve the configuration.

A similar idea ("--assume-content-unchanged") was discussed in
January 2017; see the thread starting at
20170105112359.GN8116@chrystal.oracle.com. The concern raised
there was that exposing a way to update cached stat data without
content comparison opens the index to abuse. The flag in this
series is deliberately narrower than the 2017 proposal:

  • one-shot action, not a sticky config or per-entry bit;
  • the name describes what the invocation does, not a trust state
    attached to entries (contrast --assume-unchanged);
  • intended for closed-loop callers (CI cache restore, container
    provisioning, backup/restore tooling) that produced or verified
    the worktree atomically;
  • the failure mode -- stale object IDs becoming invisible until
    the next content check -- is named directly in the docs, and
    the flag must be typed explicitly.

The series is organised so the bug fix is reviewable on its own:

1/2 preload-index: respect --really-refresh override of
assume-unchanged

   A latent bug observable today via GIT_TEST_PRELOAD_INDEX=1:
   preload_thread() never sets CE_MATCH_IGNORE_VALID, so it
   honours the "assume unchanged" bit and marks modified
   assume-unchanged entries as uptodate before the
   --really-refresh loop sees them. Plumb refresh flags
   through to preload threads and add a regression test under
   t2106.

2/2 update-index: add --refresh-stat-only

   Add the new flag, extend the preload mask to also recognise
   REFRESH_STAT_ONLY, document the assume-unchanged override
   behaviour alongside the flag, and add coverage for object
   ID preservation, missing-file handling (with and without
   --ignore-missing), assume-unchanged override, and quiet
   output under t2109.

CC: Junio C Hamano <gitster@pobox.com>

giorgidze added 2 commits May 24, 2026 23:01

preload-index: respect --really-refresh override of assume-unchanged

When refresh_index() is invoked with REFRESH_REALLY (e.g. via
"git update-index --really-refresh"), the documented behaviour is that
the "assume unchanged" bit on cache entries is disregarded so that
stale stat data on those entries is still refreshed.

The preload pass runs before the single-threaded refresh loop and is
intended to mark up-to-date entries quickly so the slow path only has
to deal with the leftovers. However, preload_thread() unconditionally
called ie_match_stat() with CE_MATCH_RACY_IS_DIRTY|CE_MATCH_IGNORE_FSMONITOR
and never with CE_MATCH_IGNORE_VALID, so it honoured the "assume
unchanged" bit. When a modified file's entry was marked
assume-unchanged, preload would conclude the entry was clean and call
ce_mark_uptodate(); the subsequent --really-refresh loop would then
skip the entry (because ce_uptodate(ce) is true) and never report it
as needing an update.

This only manifests when preload is active, so it has been latent in
default configurations. It is observable today via GIT_TEST_PRELOAD_INDEX=1.

Plumb the refresh flags through to the preload threads via a new
refresh_flags field on struct thread_data, and have preload_thread()
add CE_MATCH_IGNORE_VALID to its match options when REFRESH_REALLY is
in effect. Update refresh_index() to pass "flags & REFRESH_REALLY" to
preload_index() instead of a bare 0.
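
The interaction between the preload pass and the refresh loop can be
modelled in a few lines. What follows is a toy Python sketch, not Git's
C code: the flag names mirror Git's identifiers, but their numeric
values, the `Entry` class, the `match_stat()` helper, and the `fixed`
switch are all assumptions made for illustration.

```python
from dataclasses import dataclass

# Names mirror Git's identifiers; the values are arbitrary in this model.
REFRESH_REALLY = 1 << 0
CE_MATCH_IGNORE_VALID = 1 << 0


@dataclass
class Entry:
    name: str
    stat_matches: bool            # does cached stat data match the file?
    assume_unchanged: bool = False
    uptodate: bool = False


def match_stat(entry, options):
    """Toy stand-in for ie_match_stat(): True means the entry looks clean."""
    if entry.assume_unchanged and not options & CE_MATCH_IGNORE_VALID:
        return True               # trust the bit, skip the stat comparison
    return entry.stat_matches


def preload(entries, refresh_flags, fixed):
    """Mark clean-looking entries uptodate before the main loop runs."""
    options = 0
    if fixed and refresh_flags & REFRESH_REALLY:
        options |= CE_MATCH_IGNORE_VALID
    for entry in entries:
        if match_stat(entry, options):
            entry.uptodate = True


def refresh(entries, refresh_flags, fixed):
    """Return the names the refresh loop reports as needing an update."""
    preload(entries, refresh_flags, fixed)
    options = CE_MATCH_IGNORE_VALID if refresh_flags & REFRESH_REALLY else 0
    needs_update = []
    for entry in entries:
        if entry.uptodate:
            continue              # preload already vouched for this entry
        if not match_stat(entry, options):
            needs_update.append(entry.name)
    return needs_update
```

With `fixed=False`, a modified assume-unchanged entry is marked
uptodate by `preload()` and never reaches the comparison in
`refresh()`; with `fixed=True`, the same entry is reported.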

Add a regression test under t2106 that forces preload on and confirms
that "update-index --really-refresh" reports a modified
assume-unchanged entry as needing update.

Signed-off-by: George Giorgidze <giorgidze@meta.com>

update-index: add --refresh-stat-only

When a working tree is copied from another machine, or restored from
a tarball, container image, or CI cache on the same machine, the
files may be byte-for-byte identical while cached stat data in the
index no longer matches. Backup and sync tools can preserve mtimes,
but fields like inode and device numbers are filesystem-local, so
large repositories can still end up paying for expensive refresh
checks on every "git status".
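
The effect is easy to reproduce outside Git. The sketch below is an
illustration only: it copies a file with Python's `shutil.copy2()`,
which preserves timestamps where the platform allows, and reports
which properties survive the copy. The function name and file names
are hypothetical.

```python
import os
import shutil
import tempfile


def copy_and_compare(content=b"hello\n"):
    """Copy a file with its metadata and report what stays the same."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "original.txt")
        dst = os.path.join(tmp, "restored.txt")
        with open(src, "wb") as f:
            f.write(content)
        shutil.copy2(src, dst)    # copies data and, where possible, mtime
        before, after = os.stat(src), os.stat(dst)
        with open(dst, "rb") as f:
            same_content = f.read() == content
        return {
            "same_content": same_content,
            "same_size": before.st_size == after.st_size,
            "same_mtime_sec": int(before.st_mtime) == int(after.st_mtime),
            "same_inode": before.st_ino == after.st_ino,
        }


if __name__ == "__main__":
    for key, value in copy_and_compare().items():
        print(f"{key}: {value}")
```

The copy has identical content and size, yet it is a distinct file
with its own inode number, so a stat comparison that includes the
inode no longer matches.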

Git already has runtime configuration for reducing which stat fields
are checked, such as core.checkStat=<minimal|default>. That affects
how future checks interpret cached stat data, but it does not provide
a one-shot way to update the index's cached stat data to match the
current filesystem without also rehashing file contents. Setting
core.checkStat=minimal is "sticky": it weakens every subsequent
operation in the repository for the duration of the configuration,
rather than performing a single, bounded correction at a well-defined
point.
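
The difference between the two modes can be sketched as a choice of
which stat fields take part in the comparison. This is a simplified
model, assuming the documented behaviour of core.checkStat: it leaves
out ctime (and therefore core.trustCtime) and the compile-time choice
of whether the device number is checked. `StatData` and
`stat_matches()` are hypothetical names.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class StatData:
    mtime_sec: int
    mtime_nsec: int
    dev: int
    ino: int
    uid: int
    gid: int
    size: int


# In this model, "minimal" keeps only whole-second mtime and file size.
MINIMAL_FIELDS = ("mtime_sec", "size")


def stat_matches(cached, current, check_stat="default"):
    """Compare cached and current stat data under a checkStat mode."""
    if check_stat == "minimal":
        names = MINIMAL_FIELDS
    else:
        names = [f.name for f in fields(StatData)]
    return all(getattr(cached, n) == getattr(current, n) for n in names)


cached = StatData(mtime_sec=1700000000, mtime_nsec=123, dev=2049,
                  ino=1001, uid=1000, gid=1000, size=42)
# Same file restored elsewhere: only filesystem-local fields differ.
restored = replace(cached, dev=66306, ino=77)

print(stat_matches(cached, restored, "default"))  # False: dev and ino differ
print(stat_matches(cached, restored, "minimal"))  # True: mtime_sec and size agree
```

In the model, "minimal" makes the restored file look clean by
ignoring fields on every comparison, whereas a one-shot correction
would rewrite the cached values once and leave the comparison itself
unchanged.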

A similar idea was discussed on the list in January 2017 under the
name "--assume-content-unchanged"; see the thread starting at
<20170105112359.GN8116@chrystal.oracle.com>. The concern raised there
was that exposing a way to update cached stat data without content
comparison opens the index to abuse: an interactive user could skip a
slow refresh, lie to Git about the worktree, then file a bug after a
later merge corrupts a file. That concern is taken seriously here,
and this proposal is deliberately narrower than the 2017 one:

  * It is a one-shot action, not a sticky configuration or per-entry
    bit. The name --refresh-stat-only reflects that: it describes
    what the command does in a single invocation, not a trust state
    attached to entries (contrast with --assume-unchanged).

  * The trust assertion is intended for closed-loop callers (CI cache
    restore, container provisioning, backup/restore tooling) where
    the worktree and the index were produced or verified together by
    the same process. It is not a knob for interactive users to reach
    for when "git status" feels slow.

  * The failure mode is named directly in the documentation: if the
    worktree does not in fact match the index, affected entries will
    appear clean while the recorded object ID remains stale. The user
    must type the flag, having read the warning. This is a narrower
    contract than core.checkStat=minimal, which silently affects
    every subsequent operation.

Container-based CI has become the dominant deployment model in the
years since that 2017 discussion. The current workaround -- setting
core.checkStat=minimal in every job step, or accepting the cost of
full content rehashing -- is operationally fragile: it requires every
step in every pipeline to set and preserve the configuration, and it
permanently weakens stat semantics for every command those steps
run. A single explicit invocation at restore time is a tighter, more
local fix.

Teach git update-index --refresh-stat-only to refresh only cached
stat information. It follows the existing refresh machinery, but
skips ie_modified() and treats racy entries as dirty by stat instead
of resolving them by content. Like --really-refresh, it ignores the
"assume unchanged" setting, so stale stat data on those entries is
still updated; that behaviour is documented alongside the flag.

The preload pass is extended to recognise REFRESH_STAT_ONLY (on top
of REFRESH_REALLY, which was wired up in the preceding commit) so
that assume-unchanged entries are not marked uptodate before the main
refresh path can update them.
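
The contract, including its failure mode, can be illustrated with a
small model. This is a sketch under stated assumptions rather than
the patch's implementation: `IndexEntry`, `WorktreeFile`,
`refresh_stat_only()`, `looks_clean()`, and `really_clean()` are
hypothetical names, and stat data is reduced to an opaque tuple. Only
`hash_blob()` follows real Git behaviour, computing a blob object ID
in the SHA-1 object format.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class WorktreeFile:
    content: bytes
    stat: tuple                   # stand-in for mtime, inode, device, size


@dataclass
class IndexEntry:
    oid: str
    stat: tuple


def hash_blob(content):
    """Blob object ID as Git computes it for the SHA-1 object format."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()


def refresh_stat_only(entry, wt_file):
    """Adopt the file's current stat data; never read or hash content."""
    entry.stat = wt_file.stat


def looks_clean(entry, wt_file):
    """Stat-only check, as a later fast-path comparison would see it."""
    return entry.stat == wt_file.stat


def really_clean(entry, wt_file):
    """Content check: does the recorded object ID match the file?"""
    return entry.oid == hash_blob(wt_file.content)


if __name__ == "__main__":
    entry = IndexEntry(hash_blob(b"hello\n"), stat=("t0", 1))

    restored = WorktreeFile(b"hello\n", stat=("t0", 2))   # same bytes
    refresh_stat_only(entry, restored)
    print(looks_clean(entry, restored), really_clean(entry, restored))

    tampered = WorktreeFile(b"changed\n", stat=("t1", 3)) # different bytes
    refresh_stat_only(entry, tampered)
    print(looks_clean(entry, tampered), really_clean(entry, tampered))
```

In the second case the entry looks clean by stat while the recorded
object ID is stale, which is the failure mode the documentation
names: the caller, not Git, is responsible for knowing that the
worktree matches the index.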

Add tests covering object ID preservation, missing-file handling with
and without --ignore-missing, assume-unchanged override, and quiet
output.

Signed-off-by: George Giorgidze <giorgidze@meta.com>
@gitgitgadget

gitgitgadget Bot commented May 25, 2026

Welcome to GitGitGadget

Hi @giorgidze, and welcome to GitGitGadget, the GitHub App to send patch series to the Git mailing list from GitHub Pull Requests.

Please make sure that either:

  • Your Pull Request has a good description, if it consists of multiple commits, as it will be used as cover letter.
  • Your Pull Request description is empty, if it consists of a single commit, as the commit message should be descriptive enough by itself.

You can CC potential reviewers by adding a footer to the PR description with the following syntax:

CC: Revi Ewer <revi.ewer@example.com>, Ill Takalook <ill.takalook@example.net>

NOTE: DO NOT copy/paste your CC list from a previous GGG PR's description,
because it will result in a malformed CC list on the mailing list. See
example.

Also, it is a good idea to review the commit messages one last time, as the Git project expects them in a quite specific form:

  • the lines should not exceed 76 columns,
  • the first line should be like a header and typically start with a prefix like "tests:" or "revisions:" to state which subsystem the change is about, and
  • the commit messages' body should be describing the "why?" of the change.
  • Finally, the commit messages should end in a Signed-off-by: line matching the commits' author.

It is in general a good idea to await the automated test ("Checks") in this Pull Request before contributing the patches, e.g. to avoid trivial issues such as unportable code.

Contributing the patches

Before you can contribute the patches, your GitHub username needs to be added to the list of permitted users. Any already-permitted user can do that, by adding a comment to your PR of the form /allow. A good way to find other contributors is to locate recent pull requests where someone has been /allowed:

Both the person who commented /allow and the PR author are able to /allow you.

An alternative is the channel #git-devel on the Libera Chat IRC network:

<newcontributor> I've just created my first PR, could someone please /allow me? https://github.com/gitgitgadget/git/pull/12345
<veteran> newcontributor: it is done
<newcontributor> thanks!

Once on the list of permitted usernames, you can contribute the patches to the Git mailing list by adding a PR comment /submit.

If you want to see what email(s) would be sent for a /submit request, add a PR comment /preview to have the email(s) sent to you. You must have a public GitHub email address for this. Note that any reviewers CC'd via the list in the PR description will not actually be sent emails.

After you submit, GitGitGadget will respond with another comment that contains the link to the cover letter mail in the Git mailing list archive. Please make sure to monitor the discussion in that thread and to address comments and suggestions (while the comments and suggestions will be mirrored into the PR by GitGitGadget, you will still want to reply via mail).

If you do not want to subscribe to the Git mailing list just to be able to respond to a mail, you can download the mbox from the Git mailing list archive (click the (raw) link), then import it into your mail program. If you use GMail, you can do this via:

curl -g --user "<EMailAddress>:<Password>" \
    --url "imaps://imap.gmail.com/INBOX" -T /path/to/raw.txt

To iterate on your change, i.e. send a revised patch or patch series, you will first want to (force-)push to the same branch. You probably also want to modify your Pull Request description (or title). It is a good idea to summarize the revision by adding something like this to the cover letter (read: by editing the first comment on the PR, i.e. the PR description):

Changes since v1:
- Fixed a typo in the commit message (found by ...)
- Added a code comment to ... as suggested by ...
...

To send a new iteration, just add another PR comment with the contents: /submit.

Need help?

New contributors who want advice are encouraged to join git-mentoring@googlegroups.com, where volunteers who regularly contribute to Git are willing to answer newbie questions, give advice, or otherwise provide mentoring to interested contributors. You must join in order to post or view messages, but anyone can join.

You may also be able to find help in real time in the developer IRC channel, #git-devel on Libera Chat. Remember that IRC does not support offline messaging, so if you send someone a private message and log out, they cannot respond to you. The scrollback of #git-devel is archived, though.

@spkrka

spkrka commented May 25, 2026

/allow

@giorgidze

/preview

@gitgitgadget

gitgitgadget Bot commented May 25, 2026

Error: User giorgidze is not yet permitted to use GitGitGadget
