[RB] Optimize git operations run by the CLI #11432

Merged
maggie-lou merged 2 commits into master from rb_opt on Mar 10, 2026

@maggie-lou maggie-lou commented Feb 26, 2026

git ls-remote --symref origin typically takes 3-4s on my local machine.

Refactor the code in getBaseBranchAndCommit to not rely on the output from that command.
The two commands that replace it take only a couple of milliseconds each:
    git show-ref --verify refs/remotes/origin/X
    git merge-base --is-ancestor HEAD refs/remotes/origin/X

In a follow-up PR, I add caching of the default branch to reduce how often we have to run the ls-remote command to fetch it.
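
For illustration, here is a minimal Go sketch of the two local checks named in the description. Only the git commands come from the PR; the helper names (trackingRef, showRefArgs, isAncestorArgs, gitSucceeds) are hypothetical and are not taken from remotebazel.go.

```go
package main

import (
	"fmt"
	"os/exec"
)

// trackingRef returns the full name of the local remote-tracking ref
// for a branch, e.g. refs/remotes/origin/main.
func trackingRef(remote, branch string) string {
	return fmt.Sprintf("refs/remotes/%s/%s", remote, branch)
}

// showRefArgs builds the arguments for `git show-ref --verify <ref>`.
func showRefArgs(remote, branch string) []string {
	return []string{"show-ref", "--verify", trackingRef(remote, branch)}
}

// isAncestorArgs builds the arguments for
// `git merge-base --is-ancestor HEAD <ref>`.
func isAncestorArgs(remote, branch string) []string {
	return []string{"merge-base", "--is-ancestor", "HEAD", trackingRef(remote, branch)}
}

// gitSucceeds runs git with the given arguments in dir and reports whether
// it exited with status 0. This simplification does not distinguish
// "check failed" from "git could not be run".
func gitSucceeds(dir string, args ...string) bool {
	cmd := exec.Command("git", args...)
	cmd.Dir = dir
	return cmd.Run() == nil
}
```

Both commands read only local refs and objects, which is why they avoid the network round trip that ls-remote needs. A caller could combine them as gitSucceeds(dir, showRefArgs("origin", "X")...) followed by gitSucceeds(dir, isAncestorArgs("origin", "X")...).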

@maggie-lou maggie-lou changed the title from "[RB] Optimize git operations" to "[RB] Optimize git operations run by the CLI" Feb 26, 2026
@maggie-lou maggie-lou requested a review from Copilot February 26, 2026 21:23

Copilot AI left a comment

Pull request overview

Refactors the remote bazel CLI’s git base-ref detection to avoid relying on git ls-remote --symref output when choosing the base branch/commit, aiming to reduce slow git operations during CLI runs.

Changes:

  • Updates determineDefaultBranch to execute git ls-remote --symref <remote> internally rather than parsing pre-fetched output.
  • Refactors getBaseBranchAndCommit to use git show-ref --verify refs/remotes/<remote>/<branch> and git merge-base --is-ancestor to validate branch/commit presence.
  • Removes the previous shared “fetch remote data once” flow and rewires Config() to pass remoteName + defaultBranch into getBaseBranchAndCommit.
Comments suppressed due to low confidence (3)

cli/remotebazel/remotebazel.go:258

  • The doc comment for determineDefaultBranch is inconsistent with the new signature: it still hard-codes origin even though the function takes remoteName, and the “`master).” text has a mismatched backtick/paren. Please update the comment to reflect the remoteName argument and fix the formatting typo.
// determineDefaultBranch parses the output from `git ls-remote --symref origin`
// and returns the HEAD branch for the repo (often `main` or `master).
//

cli/remotebazel/remotebazel.go:372

  • The comment above getBaseBranchAndCommit still refers to remoteData and git remote show origin, but the function now takes remoteName and defaultBranch. Updating this comment will prevent confusion about what inputs the function expects.
// getBaseBranchAndCommit returns the git branch and commit that the remote run
// should be based off
//
// remoteData is the output from `git remote show origin`
func getBaseBranchAndCommit(remoteName string, defaultBranch string) (branch string, commit string, err error) {

cli/remotebazel/remotebazel.go:409

  • This path now returns the raw error from getHeadCommitForLocalBranch(defaultBranch) without adding context (previously it included the branch name). Wrapping the error (or using status.WrapErrorf) would make failures easier to diagnose and keep error messaging consistent.
		defaultBranchCommitHash, err := getHeadCommitForLocalBranch(defaultBranch)
		if err != nil {
			return "", "", err
		}
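
As a rough illustration of the wrapping suggested above, the sketch below uses the standard library's fmt.Errorf with %w as a stand-in for status.WrapErrorf. The names lookupHeadCommit, headCommitWithContext, and errBranchNotFound are hypothetical placeholders, not functions from the repository.

```go
package main

import (
	"errors"
	"fmt"
)

// errBranchNotFound is a hypothetical sentinel error used only in this sketch.
var errBranchNotFound = errors.New("branch not found")

// lookupHeadCommit is a hypothetical stand-in for getHeadCommitForLocalBranch.
// It always fails here so that the wrapping path is exercised.
func lookupHeadCommit(branch string) (string, error) {
	return "", errBranchNotFound
}

// headCommitWithContext adds the branch name to any lookup failure while
// keeping the original error available to errors.Is and errors.As.
func headCommitWithContext(branch string) (string, error) {
	commit, err := lookupHeadCommit(branch)
	if err != nil {
		return "", fmt.Errorf("get HEAD commit for default branch %q: %w", branch, err)
	}
	return commit, nil
}
```

With the branch name in the message, a failure reads as "get HEAD commit for default branch "main": branch not found" rather than the bare underlying error.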

Comment on lines +340 to 348
+	defaultBranch, err := determineDefaultBranch(remote.name)
 	if err != nil {
-		return nil, status.WrapError(err, "get base branch and commit")
+		return nil, status.WrapError(err, "get default branch")
 	}

-	defaultBranch, err := determineDefaultBranch(remoteData)
+	branch, commit, err := getBaseBranchAndCommit(remote.name, defaultBranch)
 	if err != nil {
-		log.Warnf("Failed to fetch default branch: %s", err)
+		return nil, status.WrapError(err, "get base branch and commit")
 	}

Copilot AI Feb 26, 2026

Config() now always calls determineDefaultBranch(remote.name), which runs git ls-remote --symref (the slow command noted in the PR). This happens even when --run_from_branch / --run_from_commit are provided and getBaseBranchAndCommit would early-return, so it can negate the intended optimization and also makes default-branch lookup failures fatal. Consider resolving the default branch lazily only when it’s actually needed (e.g. when falling back to the default branch and/or when GIT_REPO_DEFAULT_BRANCH must be populated), and preserve the previous non-fatal behavior if default-branch detection fails.
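
One way the lazy resolution suggested here could look is sketched below. It is an assumption-laden illustration, not the code in this PR: lazyBranch, newLazyBranch, and baseBranch are hypothetical names, and runFromBranch stands in for the value of --run_from_branch.

```go
package main

import "sync"

// lazyBranch defers an expensive default-branch lookup until first use
// and caches the result for later calls.
type lazyBranch struct {
	once   sync.Once
	lookup func() (string, error)
	branch string
	err    error
}

// newLazyBranch wraps lookup (for example, a function that runs
// `git ls-remote --symref <remote>`) without calling it yet.
func newLazyBranch(lookup func() (string, error)) *lazyBranch {
	return &lazyBranch{lookup: lookup}
}

// Get runs the lookup at most once and returns the cached result.
func (l *lazyBranch) Get() (string, error) {
	l.once.Do(func() { l.branch, l.err = l.lookup() })
	return l.branch, l.err
}

// baseBranch returns the explicitly requested branch when one is set and
// only falls back to the (possibly slow) default-branch lookup otherwise.
func baseBranch(runFromBranch string, def *lazyBranch) (string, error) {
	if runFromBranch != "" {
		return runFromBranch, nil
	}
	return def.Get()
}
```

The trade-off is that a caller who always needs the default branch, for instance to populate GIT_REPO_DEFAULT_BRANCH, gains nothing from the deferral.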

maggie-lou (Collaborator, Author) replied:

I actually think it's worth it to always fetch the default branch, because we use it for fallback snapshot behavior. The performance wins of having a fallback snapshot probably outweigh the cost of the git operation

@maggie-lou maggie-lou marked this pull request as ready for review February 26, 2026 21:33

@bduffany bduffany left a comment

Need to digest this a bit more but one thing to point out is that if remotes/origin/<branch> exists locally, that doesn't guarantee that it exists remotely (and I think it might be somewhat common for this expectation to not hold). For example if you've got multiple checkouts of the repo locally (which is becoming more common with agents and such) and you force push the branch in one repo but not the other, the local tracking branch can diverge from the remote

Does this PR handle the case where the local tracking branch diverges from the remote branch? (Maybe worth adding some tests as well)
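
To make the divergence case concrete, here is a small sketch of how a test could compare a local tracking ref against what the remote advertises. It assumes the usual `git ls-remote` output layout of "<sha>\t<ref>" per line; the function names are hypothetical and nothing here is taken from the PR.

```go
package main

import "strings"

// remoteBranchHead scans `git ls-remote` style output ("<sha>\t<ref>" per
// line) for refs/heads/<branch> and returns the advertised commit.
// Symref lines such as "ref: refs/heads/main\tHEAD" have three fields
// and are skipped.
func remoteBranchHead(lsRemoteOutput, branch string) (string, bool) {
	want := "refs/heads/" + branch
	for _, line := range strings.Split(lsRemoteOutput, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 2 && fields[1] == want {
			return fields[0], true
		}
	}
	return "", false
}

// trackingRefMatchesRemote reports whether the commit recorded in the local
// tracking ref is the one the remote currently advertises for the branch.
// It returns false when the branch is missing remotely or points elsewhere,
// for example after a force push from another checkout.
func trackingRefMatchesRemote(localSHA, lsRemoteOutput, branch string) bool {
	sha, ok := remoteBranchHead(lsRemoteOutput, branch)
	return ok && sha == localSHA
}
```

A check like this needs the network call the PR is trying to avoid, so it fits better in a test fixture than on the CLI's hot path.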

-	currentBranchExistsRemotely := branchExistsRemotely(remoteData, currentBranch)
+	currentBranchExistsRemotely, err := branchExistsRemotely(remoteName, currentBranch)
+	if err != nil {
+		log.Warnf("Failed to check if branch %s exists remotely. Falling back to running on default branch: %s", currentBranch, err)

bduffany (Member) commented:

I guess if we fail to check whether the remote branch exists, there's probably a deeper error going on (e.g. no network connection or GitHub is down?)

What do you think about just returning the error here instead?

maggie-lou (Collaborator, Author) replied:

This can fail if there is no local tracking for the branch, for example if there's a shallow clone

 //
 // remoteData is the output from `git remote show origin`
-func getBaseBranchAndCommit(remoteData string) (branch string, commit string, err error) {
+func getBaseBranchAndCommit(remoteName string, defaultBranch string) (branch string, commit string, err error) {

bduffany (Member) commented:

The func comment is outdated now that remoteData is no longer a param

Also I'm a little confused what this means

// getBaseBranchAndCommit returns the git branch and commit that the remote run
// should be based off

I'm confused what it means for the remote run to be "based off" a commit - is it meant to be the merge-base commit between the default branch and the current branch tip?

maggie-lou (Collaborator, Author) replied:

Updated the comment - LMK if that's clearer

-	regex := fmt.Sprintf("\\brefs/heads/%s\\b", regexp.QuoteMeta(branch))
-	re := regexp.MustCompile(regex)
-	return re.MatchString(remoteData)
+func branchExistsRemotely(remoteName string, branch string) (bool, error) {

@bduffany bduffany Feb 27, 2026

A better name would be "branchIsTrackedRemotely" I think, since we're no longer checking whether the branch actually exists remotely (we're just looking at the local tracking branch)

For example if you've got multiple checkouts of the repo and you've force-pushed the branch or something, the local origin/ refs won't match the remote anymore.

maggie-lou (Collaborator, Author) replied:

Ack updated

return match[1], nil
}
return "", status.NotFoundErrorf("failed to get HEAD commit for remote branch %s from:\n%s", branch, remoteData)
func commitExistsRemotely(remoteName, branch, commit string) bool {

bduffany (Member) commented:

similar to the above func I think a better name is commitIsAncestorInRemoteTrackingBranch or something like that, since it doesn't actually check the remote

maggie-lou (Collaborator, Author) replied:

Ack
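
For reference, a sketch of what a check named along these lines might do, assuming it shells out to `git merge-base --is-ancestor`. Git documents that command as exiting 0 when the first commit is an ancestor of the second, 1 when it is not, and with another non-zero status on error. The function names below are hypothetical, not the ones merged in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
)

// ancestorFromExitCode maps the exit status of
// `git merge-base --is-ancestor` to a result: 0 means ancestor,
// 1 means not an ancestor, anything else is treated as an error.
func ancestorFromExitCode(code int) (bool, error) {
	switch code {
	case 0:
		return true, nil
	case 1:
		return false, nil
	default:
		return false, fmt.Errorf("git merge-base --is-ancestor failed with exit code %d", code)
	}
}

// commitIsAncestorOfTrackingBranch checks the commit against the local
// remote-tracking ref only. It never contacts the remote.
func commitIsAncestorOfTrackingBranch(dir, remote, branch, commit string) (bool, error) {
	ref := fmt.Sprintf("refs/remotes/%s/%s", remote, branch)
	cmd := exec.Command("git", "merge-base", "--is-ancestor", commit, ref)
	cmd.Dir = dir
	err := cmd.Run()
	if err == nil {
		return ancestorFromExitCode(0)
	}
	var exitErr *exec.ExitError
	if errors.As(err, &exitErr) {
		return ancestorFromExitCode(exitErr.ExitCode())
	}
	// git could not be started at all (for example, it is not installed).
	return false, err
}
```

Returning an error separately from the boolean keeps "not an ancestor" distinct from "the check itself failed", which a plain bool return cannot express.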

maggie-lou commented Mar 3, 2026

> if remotes/origin/<branch> exists locally, that doesn't guarantee that it exists remotely (and I think it might be somewhat common for this expectation to not hold). For example if you've got multiple checkouts of the repo locally (which is becoming more common with agents and such) and you force push the branch in one repo but not the other, the local tracking branch can diverge from the remote

Hmm that's a good point that local could diverge from remote. In this logic, we're checking to see if refs exist remotely so that the remote runner can fetch them. This divergence would only be an issue if something else had deleted the remote ref, which feels more rare

The git ls-remote command can take up to 10s in my experience (and we have a small repo). It's a bad user experience because if you don't have debug logging on it can seem like the CLI is hanging

@maggie-lou maggie-lou requested a review from bduffany March 3, 2026 23:23
@maggie-lou maggie-lou merged commit f04d40b into master Mar 10, 2026
12 checks passed
@maggie-lou maggie-lou deleted the rb_opt branch March 10, 2026 17:19

3 participants