Conversation
| // SeedCache stores a full bazel query result in the ITG cache so future incremental | ||
| // requests can use it as a base. It should only be called when the query was run at | ||
| // a pure main-branch commit (no user diffs applied), i.e. HEAD == BaseSha. | ||
| func (p *Provider) SeedCache(ctx context.Context, req SeedCacheRequest) error { |
There was a problem hiding this comment.
This is only needed in the case we do a full query from a base commit and the changes are not ITG applicable. This allows future requests to find the zero_revision graph and calculate incrementally.
| ReverseDeps IntSet `json:"reverseDeps"` | ||
| Tags []int `json:"tagIDs"` | ||
| Root bool `json:"root"` |
There was a problem hiding this comment.
The main differences between OptimizedTarget and https://github.com/uber/tango/blob/main/tangopb/tango.pb.go#L325 is:
- ReverseDeps — Stores the reverse dependencies of the target. Used heavily in invalidate.go to propagate hash invalidation up the dep graph.
- HashWithoutDeps — used during re-hashing to separate a target's own content hash from the hash of its transitive deps, enabling incremental
invalidation without re-querying bazel.
We can look into convering the data models later but ideally we don't want to store the ReverseDeps on every targetgraph as this can blow up the graph by almost 2x the memory. We would also need to store everything as an IntSet for faster lookup when incrementally recomputing the target graph.
For now we can keep it as is for feature support and re-evaluate the data model.
There was a problem hiding this comment.
but in later calculateGraphIncrementally in core/itg/itg.go, the targetGraph is still of type*graph.OptimizedGraph, so these 2 fields are still serialized and stored in cache?
There was a problem hiding this comment.
They are, but we only store them if the request only contains the baseCommit (meaning it's a commit that's on the main branch). We need some cached graph in the *graph.OptimizedGraph format to calculate the target graph incrementally
| ReverseDeps IntSet `json:"reverseDeps"` | ||
| Tags []int `json:"tagIDs"` | ||
| Root bool `json:"root"` |
There was a problem hiding this comment.
but in later calculateGraphIncrementally in core/itg/itg.go, the targetGraph is still of type*graph.OptimizedGraph, so these 2 fields are still serialized and stored in cache?
| // GetGraph incrementally computes the full target graph for the given request. | ||
| // It finds the nearest cached graph and applies incremental updates from that | ||
| // cache point to TargetRef using the workspace git and bazel. | ||
| func (p *Provider) GetGraph(ctx context.Context, req GetGraphRequest) (GetGraphResult, error) { |
There was a problem hiding this comment.
previous comment got me to realize that ITG graph and Tango graph are not interchangeable. This means for already computed tango graph, we cannot use it for next ITG calls. But for already computed ITG graphs, we can potentially call optimizedGraphToProto immeditately to convert to tango graph and return? Is it possible?
There was a problem hiding this comment.
Yes for already computed ITG graphs, we'll convert it to tango graph and return it.
We can actually convert tango graphs -> ITG graph too. This is the case if the changes from the cached ITG graph -> target ref is not ITG applicable (requires full bazel query). We'd want to do a full bazel query and upload the ITG graph so future requests can use the ITG graph to calculate incrementally.
| } | ||
|
|
||
| return GetGraphResult{ | ||
| TargetRefGraph: optimizedGraphToProto(targetGraph), |
There was a problem hiding this comment.
can we store this result in tango cache?
There was a problem hiding this comment.
yup we intend on calling ITG in orchestrator and caching the result
|
Closing to split the changes into separate PRs |
Why?
This change Introduces the Incremental Target Graph (ITG) module — a new core/itg package that avoids full Bazel queries by finding the nearest cached graph and applying incremental updates only to packages affected by changed files.
What?
Most of these changes are internal logic that's already been implemented in ctc. IncrementProvider getGraph incrementally calculates the cached zero_revision and the target revision (HEAD), as long as it's ITG applicable.
New: core/itg
runs a single incremental update from the cache point to TargetRef using the workspace git/bazel. The base-only request (no user diffs)
seeds baseShaGraph into the ITG cache for concurrent PR requests to reuse.
ancestor graph.
FullRecalculationNeeded) and determines whether ITG can handle the change incrementally.
propagates source hash changes through the reverse-dep graph.
Supporting changes
Interface; fixed RevParse to trim whitespace; updated gitmock accordingly.
stubs.
Test Plan
unit test
Will follow up with manual testing after implementing native graph runner to invoke ITG.
Issue