Initial implementation of incremental target graph module#74

Closed
yushan8 wants to merge 10 commits into main from init-itg

@yushan8 yushan8 commented Apr 15, 2026

Contributor

Why?

This change introduces the Incremental Target Graph (ITG) module, a new core/itg package that avoids full Bazel queries by finding the nearest cached graph and applying incremental updates only to packages affected by changed files.

What?

Most of these changes port internal logic that is already implemented in ctc. IncrementProvider getGraph computes the graph at the target revision (HEAD) incrementally from the cached zero_revision graph, as long as the change range is ITG applicable.
New: core/itg

  • itg.go — Provider.GetGraph finds the nearest cached graph via a floor-key lookup, validates the change range for supported complexity, then
    runs a single incremental update from the cache point to TargetRef using the workspace git/bazel. The base-only request (no user diffs)
    seeds baseShaGraph into the ITG cache for concurrent PR requests to reuse.
  • cache/ — Storage-backed cache keyed by (remote, commit_timestamp, sha). FloorKey binary-searches cached entries to find the closest
    ancestor graph.
  • changeanalyzer/ — Classifies the complexity of changes between two refs (NoChange, RegularFilesModificationOnly, ReparsePackagesNeeded,
    FullRecalculationNeeded) and determines whether ITG can handle the change incrementally.
  • graph/ — OptimizedGraph data structure with deduped string tables. UpdateGraph patches changed packages in-place; InvalidateTargets
    propagates source hash changes through the reverse-dep graph.
  • workspaceutils/ — Helpers to locate BUILD/BUILD.bazel files and find the containing Bazel package for a given file path.
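
To make the floor-key lookup in cache/ concrete, here is a minimal sketch of a binary search for the newest cached entry at or before a given commit timestamp. The `cacheKey` type, its field names, and `floorKey` are hypothetical stand-ins, not the PR's actual code, and the sketch assumes keys are already sorted by timestamp. It also only compares timestamps; confirming that the returned commit is really an ancestor would need a separate git check.

```go
package main

import (
	"fmt"
	"sort"
)

// cacheKey is a hypothetical stand-in for the (remote, commit_timestamp, sha) key.
type cacheKey struct {
	Remote          string
	CommitTimestamp int64 // Unix seconds
	SHA             string
}

// floorKey returns the entry with the largest CommitTimestamp that is <= ts.
// keys must be sorted by CommitTimestamp in ascending order.
func floorKey(keys []cacheKey, ts int64) (cacheKey, bool) {
	// sort.Search returns the first index whose timestamp is strictly greater than ts.
	i := sort.Search(len(keys), func(i int) bool {
		return keys[i].CommitTimestamp > ts
	})
	if i == 0 {
		// Every cached entry is newer than ts, so there is no floor.
		return cacheKey{}, false
	}
	return keys[i-1], true
}

func main() {
	keys := []cacheKey{
		{"origin", 100, "aaa"},
		{"origin", 200, "bbb"},
		{"origin", 300, "ccc"},
	}
	k, ok := floorKey(keys, 250)
	fmt.Println(k.SHA, ok) // prints "bbb true"
}
```

When no cached entry is old enough, the lookup reports a miss, which in this design would presumably fall back to a full query.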

Supporting changes

  • core/git — Added DiffWithStatus (parses git diff --name-status into []DiffEntry) and GetCommitTimeSecond (Unix timestamp for a ref) to
    Interface; fixed RevParse to trim whitespace; updated gitmock accordingly.
  • core/storage — Added List(ctx) ([]string, error) to the Storage interface; implemented in memstorage, disk, and storagemock; updated test
    stubs.
  • core/workspace — Minor adjustments to NewRequest signature and GitRequest.

Test Plan

Unit tests.
Will follow up with manual testing after implementing the native graph runner that invokes ITG.

Issue

@yushan8 yushan8 changed the title from "Init itg" to "Initial implementation fo incremental target graph module" Apr 16, 2026
@yushan8 yushan8 marked this pull request as ready for review April 16, 2026 20:14
@yushan8 yushan8 requested review from a team as code owners April 16, 2026 20:14
@yushan8 yushan8 changed the title from "Initial implementation fo incremental target graph module" to "Initial implementation of incremental target graph module" Apr 16, 2026
Comment thread core/itg/itg.go
// SeedCache stores a full bazel query result in the ITG cache so future incremental
// requests can use it as a base. It should only be called when the query was run at
// a pure main-branch commit (no user diffs applied), i.e. HEAD == BaseSha.
func (p *Provider) SeedCache(ctx context.Context, req SeedCacheRequest) error {

Contributor Author

This is only needed when we run a full query at a base commit because the changes are not ITG applicable. Seeding the cache lets future requests find the zero_revision graph and calculate incrementally.

Comment thread core/itg/graph/graph.go
Comment on lines +93 to +95
ReverseDeps IntSet `json:"reverseDeps"`
Tags []int `json:"tagIDs"`
Root bool `json:"root"`

@yushan8 yushan8 Apr 16, 2026

Contributor Author

The main differences between OptimizedTarget and https://github.com/uber/tango/blob/main/tangopb/tango.pb.go#L325 are:

  • ReverseDeps: stores the reverse dependencies of the target. Used heavily in invalidate.go to propagate hash invalidation up the dep graph.
  • HashWithoutDeps: used during re-hashing to separate a target's own content hash from the hash of its transitive deps, enabling incremental
    invalidation without re-querying bazel.

We can look into converging the data models later, but ideally we don't want to store ReverseDeps on every target graph, as this can increase the graph's memory footprint by almost 2x. We would also need to store everything as an IntSet for faster lookup when incrementally recomputing the target graph.
For now we can keep it as is for feature support and re-evaluate the data model later.
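
To illustrate why ReverseDeps is kept on each target, here is a rough sketch of propagating invalidation up the reverse-dep graph with a breadth-first walk. The `intSet` and `target` types and the `invalidate` function are simplified, hypothetical stand-ins; the real logic in invalidate.go is not shown in this PR text and may differ.

```go
package main

import "fmt"

// intSet is a hypothetical stand-in for the PR's IntSet type.
type intSet map[int]struct{}

// target keeps only the field needed for this sketch.
type target struct {
	ReverseDeps intSet // IDs of targets that depend directly on this one
}

// invalidate returns the IDs of every target whose hash must be recomputed:
// the changed targets plus everything that transitively depends on them.
func invalidate(targets map[int]*target, changed []int) intSet {
	dirty := intSet{}
	queue := append([]int(nil), changed...)
	for len(queue) > 0 {
		id := queue[0]
		queue = queue[1:]
		if _, seen := dirty[id]; seen {
			continue // already visited; also guards against cycles
		}
		dirty[id] = struct{}{}
		t, ok := targets[id]
		if !ok {
			continue
		}
		for rd := range t.ReverseDeps {
			queue = append(queue, rd)
		}
	}
	return dirty
}

func main() {
	// Target 2 depends on 1, which depends on 0. Target 3 is unrelated.
	targets := map[int]*target{
		0: {ReverseDeps: intSet{1: {}}},
		1: {ReverseDeps: intSet{2: {}}},
		2: {ReverseDeps: intSet{}},
		3: {ReverseDeps: intSet{}},
	}
	dirty := invalidate(targets, []int{0})
	_, hit := dirty[3]
	fmt.Println(len(dirty), hit) // prints "3 false"
}
```

Without stored reverse edges, finding the dependents of a changed target would mean scanning every target's deps or re-querying bazel, which is the cost the extra memory is buying back.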

Contributor

But later, in calculateGraphIncrementally in core/itg/itg.go, targetGraph is still of type *graph.OptimizedGraph, so are these two fields still serialized and stored in the cache?

Contributor Author

They are, but we only store them if the request contains only the baseCommit (meaning it's a commit on the main branch). We need some cached graph in the *graph.OptimizedGraph format to calculate the target graph incrementally.

Comment thread core/itg/itg.go
// GetGraph incrementally computes the full target graph for the given request.
// It finds the nearest cached graph and applies incremental updates from that
// cache point to TargetRef using the workspace git and bazel.
func (p *Provider) GetGraph(ctx context.Context, req GetGraphRequest) (GetGraphResult, error) {

Contributor

The previous comment made me realize that ITG graphs and Tango graphs are not interchangeable. This means an already computed Tango graph cannot be used for subsequent ITG calls. But for already computed ITG graphs, could we call optimizedGraphToProto immediately to convert to a Tango graph and return it? Is that possible?

Contributor Author

Yes, for already computed ITG graphs, we'll convert them to Tango graphs and return them.
We can actually convert Tango graphs to ITG graphs too. This applies when the changes from the cached ITG graph to the target ref are not ITG applicable (they require a full bazel query). In that case we'd do a full bazel query and upload the resulting ITG graph so future requests can use it to calculate incrementally.
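
The decision described here can be sketched as a small dispatch on the change complexity. The four level names come from the changeanalyzer description in this PR; the `complexity` type, the `nextStep` function, and the assumption that only FullRecalculationNeeded is non-applicable are illustrative guesses, not the PR's code.

```go
package main

import "fmt"

// complexity mirrors the four levels named in the PR description.
type complexity int

const (
	noChange complexity = iota
	regularFilesModificationOnly
	reparsePackagesNeeded
	fullRecalculationNeeded
)

// nextStep describes what a caller would do for each level.
// Assumption: every level below fullRecalculationNeeded is ITG applicable.
func nextStep(c complexity) string {
	switch c {
	case noChange:
		return "convert cached ITG graph to Tango graph and return"
	case regularFilesModificationOnly, reparsePackagesNeeded:
		return "apply incremental update from cached ITG graph"
	default:
		return "run full bazel query, then seed ITG cache"
	}
}

func main() {
	fmt.Println(nextStep(reparsePackagesNeeded))
	fmt.Println(nextStep(fullRecalculationNeeded))
}
```

The fallback branch is where SeedCache fits: the full query result is stored in ITG format so that later requests near that commit can take one of the incremental branches.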

Comment thread core/itg/itg.go
}

return GetGraphResult{
TargetRefGraph: optimizedGraphToProto(targetGraph),

Contributor

can we store this result in tango cache?

Contributor Author

yup we intend on calling ITG in orchestrator and caching the result

yushan8 commented Apr 16, 2026

Contributor Author

Closing to split the changes into separate PRs

@yushan8 yushan8 closed this Apr 16, 2026