[RB] Optimize git operations run by the CLI#11432
Conversation
There was a problem hiding this comment.
Pull request overview
Refactors the remote bazel CLI’s git base-ref detection to avoid relying on git ls-remote --symref output when choosing the base branch/commit, aiming to reduce slow git operations during CLI runs.
Changes:
- Updates
determineDefaultBranchto executegit ls-remote --symref <remote>internally rather than parsing pre-fetched output. - Refactors
getBaseBranchAndCommitto usegit show-ref --verify refs/remotes/<remote>/<branch>andgit merge-base --is-ancestorto validate branch/commit presence. - Removes the previous shared “fetch remote data once” flow and rewires
Config()to passremoteName+defaultBranchintogetBaseBranchAndCommit.
Comments suppressed due to low confidence (3)
cli/remotebazel/remotebazel.go:258
- The doc comment for
determineDefaultBranchis inconsistent with the new signature: it still hard-codesorigineven though the function takesremoteName, and the “master).” text has mismatched backtick/paren. Please update the comment to reflect theremoteName` argument and fix the formatting typo.
// determineDefaultBranch parses the output from `git ls-remote --symref origin`
// and returns the HEAD branch for the repo (often `main` or `master).
//
cli/remotebazel/remotebazel.go:372
- The comment above
getBaseBranchAndCommitstill refers toremoteDataandgit remote show origin, but the function now takesremoteNameanddefaultBranch. Updating this comment will prevent confusion about what inputs the function expects.
// getBaseBranchAndCommit returns the git branch and commit that the remote run
// should be based off
//
// remoteData is the output from `git remote show origin`
func getBaseBranchAndCommit(remoteName string, defaultBranch string) (branch string, commit string, err error) {
cli/remotebazel/remotebazel.go:409
- This path now returns the raw error from
getHeadCommitForLocalBranch(defaultBranch)without adding context (previously it included the branch name). Wrapping the error (or usingstatus.WrapErrorf) would make failures easier to diagnose and keep error messaging consistent.
defaultBranchCommitHash, err := getHeadCommitForLocalBranch(defaultBranch)
if err != nil {
return "", "", err
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| defaultBranch, err := determineDefaultBranch(remote.name) | ||
| if err != nil { | ||
| return nil, status.WrapError(err, "get base branch and commit") | ||
| return nil, status.WrapError(err, "get default branch") | ||
| } | ||
|
|
||
| defaultBranch, err := determineDefaultBranch(remoteData) | ||
| branch, commit, err := getBaseBranchAndCommit(remote.name, defaultBranch) | ||
| if err != nil { | ||
| log.Warnf("Failed to fetch default branch: %s", err) | ||
| return nil, status.WrapError(err, "get base branch and commit") | ||
| } |
There was a problem hiding this comment.
Config() now always calls determineDefaultBranch(remote.name), which runs git ls-remote --symref (the slow command noted in the PR). This happens even when --run_from_branch / --run_from_commit are provided and getBaseBranchAndCommit would early-return, so it can negate the intended optimization and also makes default-branch lookup failures fatal. Consider resolving the default branch lazily only when it’s actually needed (e.g. when falling back to the default branch and/or when GIT_REPO_DEFAULT_BRANCH must be populated), and preserve the previous non-fatal behavior if default-branch detection fails.
There was a problem hiding this comment.
I actually think it's worth it to always fetch the default branch, because we use it for fallback snapshot behavior. The performance wins of having a fallback snapshot probably outweigh the cost of the git operation
There was a problem hiding this comment.
Need to digest this a bit more but one thing to point out is that if remotes/origin/<branch> exists locally, that doesn't guarantee that it exists remotely (and I think it might be somewhat common for this expectation to not hold). For example if you've got multiple checkouts of the repo locally (which is becoming more common with agents and such) and you force push the branch in one repo but not the other, the local tracking branch can diverge from the remote
Does this PR handle the case where the local tracking branch diverges from the remote branch? (Maybe worth adding some tests as well)
| currentBranchExistsRemotely := branchExistsRemotely(remoteData, currentBranch) | ||
| currentBranchExistsRemotely, err := branchExistsRemotely(remoteName, currentBranch) | ||
| if err != nil { | ||
| log.Warnf("Failed to check if branch %s exists remotely. Falling back to running on default branch: %s", currentBranch, err) |
There was a problem hiding this comment.
I guess if we fail to check whether the remote branch exists, there's probably a deeper error going on (e.g. no network connection or GitHub is down?)
What do you think about just returning the error here instead?
There was a problem hiding this comment.
This can fail if there is no local tracking for the branch, for example if there's a shallow clone
| // | ||
| // remoteData is the output from `git remote show origin` | ||
| func getBaseBranchAndCommit(remoteData string) (branch string, commit string, err error) { | ||
| func getBaseBranchAndCommit(remoteName string, defaultBranch string) (branch string, commit string, err error) { |
There was a problem hiding this comment.
The func comment is outdated now that remoteData is no longer a param
Also I'm a little confused what this means
// getBaseBranchAndCommit returns the git branch and commit that the remote run
// should be based off
I'm confused what it means for the remote run to be "based off" a commit - is it meant to be the merge-base commit between the default branch and the current branch tip?
There was a problem hiding this comment.
Updated the comment - LMK if that's clearer
cli/remotebazel/remotebazel.go
Outdated
| regex := fmt.Sprintf("\\brefs/heads/%s\\b", regexp.QuoteMeta(branch)) | ||
| re := regexp.MustCompile(regex) | ||
| return re.MatchString(remoteData) | ||
| func branchExistsRemotely(remoteName string, branch string) (bool, error) { |
There was a problem hiding this comment.
A better name would be "branchIsTrackedRemotely" I think, since we're no longer checking whether the branch actually exists remotely (we're just looking at the local tracking branch)
For example if you've got multiple checkouts of the repo and you've force-pushed the branch or something, the local origin/ refs won't match the remote anymore.
cli/remotebazel/remotebazel.go
Outdated
| return match[1], nil | ||
| } | ||
| return "", status.NotFoundErrorf("failed to get HEAD commit for remote branch %s from:\n%s", branch, remoteData) | ||
| func commitExistsRemotely(remoteName, branch, commit string) bool { |
There was a problem hiding this comment.
similar to the above func I think a better name is commitIsAncestorInRemoteTrackingBranch or something like that, since it doesn't actually check the remote
Hmm that's a good point that local could diverge from remote. In this logic, we're checking to see if refs exist remotely so that the remote runner can fetch them. This divergence would only be an issue if something else had deleted the remote ref, which feels more rare The |
git ls-remote --symref origintypically takes 3-4s on my local machine.Refactor the code in
getBaseBranchAndCommitto not rely on the output from that command.The two commands it's replaced with only take a couple ms each:
git show-ref --verify refs/remotes/origin/Xgit merge-base --is-ancestor HEAD refs/remotes/origin/XIn this follow up PR I add caching of the default branch, to reduce how often we have to run the
ls-remotecommand to fetch the default branch