Add product-focused blog post on running RB with agents #11412
maggie-lou merged 2 commits into master
| For more details on the technical setup of our remote runners, see our [blog post on our Firecracker cloning architecture](https://www.buildbuddy.io/blog/fast-runners-at-scale).
| ## Why bazel is a bad fit for coding agents
I don't want people to skim this and say "Bazel bad". How about
Why Bazel on a laptop is a bad fit for coding agents
or
Why local Bazel is a bad fit for coding agents
Hahah you beat me to it - I think "bad fit" gives the wrong message. I think "bottlenecks" might be a better framing
Ack - good call. Renamed and added a line about the benefits of using bazel with agents
siggisim left a comment
Lgtm, a couple of nits though
| 1. **Resource contention**: When running on local machines, developers may run multiple agents in parallel, or they themselves may be developing in tandem with agents. The cloud runners provided by most AI providers are typically resource constrained and have limited network bandwidth. Resource contention can quickly become a bottleneck, slowing down builds.
| 1. **Analysis cache thrash**: When multiple agents are running builds in parallel, the analysis cache can be thrown out if build options change. AI cloud runners are often ephemeral and scoped to each task, and are reset to a clean state between tasks. This means they have to restart the analysis phase from scratch each time, which can take several minutes.
| 1. **Architecture limitations**: Agents typically run builds on the same architecture as the machine they are running on. This can make it challenging to run tests on different architectures. For example, developers on Macs may want to run Linux-only tests.
There's a security / isolation angle here too
There are a couple different ways we could take this. Could you give an example of what you're thinking of on this point?
bduffany left a comment
Nice! This will be a great resource.
| When a coding agent runs `bazel test //...` on a developer's laptop or a lightweight cloud dev box, several things can go wrong:
| 1. **Network latency**: Network latency is often the biggest bottleneck in many Bazel Remote Build Execution and Remote Caching setups. Network round trips can quickly add up and hamper build times.
| 1. **Resource contention**: When running on local machines, developers may run multiple agents in parallel, or they themselves may be developing in tandem with agents. The cloud runners provided by most AI providers are typically resource constrained and have limited network bandwidth. Resource contention can quickly become a bottleneck, slowing down builds.
Another thing probably worth mentioning: Bazel locks the output base (`output_base`) while a command is running, so you can't run multiple builds against the same output base at once. If you want parallel builds, you need a more complicated setup that wrangles multiple output bases, which can consume a lot of disk, on top of the CPU contention already mentioned.
(This point kinda spans Resource contention and Analysis cache thrash: you don't get analysis cache thrashing with the multi-output-base approach, but it comes at a high disk and memory cost.)
Good call - added
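For reference, the multi-output-base setup mentioned in this thread could look roughly like the sketch below. `--output_base` is a real Bazel startup option; the `/tmp/bazel-agents` directory layout, the `agent_id` naming, and the `bazel_command` helper are assumptions made up for illustration, not something from the PR.

```python
from pathlib import PurePosixPath


def bazel_command(agent_id: str, targets: list[str],
                  base_dir: str = "/tmp/bazel-agents") -> list[str]:
    """Build a `bazel test` invocation that uses a per-agent output base.

    Startup options such as --output_base go before the command name.
    The base_dir layout and agent_id naming are hypothetical.
    """
    output_base = PurePosixPath(base_dir) / agent_id
    return ["bazel", f"--output_base={output_base}", "test", *targets]


if __name__ == "__main__":
    # Two agents get two separate output bases, so neither blocks on the
    # other's lock.
    for agent in ("agent-1", "agent-2"):
        print(" ".join(bazel_command(agent, ["//..."])))
```

Each output base keeps its own outputs and its own Bazel server, which is where the extra disk and memory cost called out above comes from.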
| ### Tips
| When looking for a warm snapshot, we try to find a runner on the same git branch to optimize performance. If no runner for that branch exists, we fall back to a runner on the repository's default branch (i.e. `main` or `master`). We recommend regularly running Remote Bazel on the default branch to ensure that a fresh snapshot is regularly generated and available as a starting point for builds on other branches.
I only ever run bazel on PR branches - so for me personally, I'm interpreting this as more "extra cognitive load / work that I have to remember to do," rather than "a useful tip."
Would be nice if we could remove this limitation soon and maybe match to "any snapshot" (I remember we were discussing this in slack a bit)
Fair enough - removed
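For anyone following along, the branch-matching fallback described in the removed tip can be sketched as below. This is only a toy model of the behavior as worded in the diff: the `Snapshot` record, the `pick_snapshot` helper, and the `default_branch` parameter are hypothetical names, and the real runner-matching logic is not shown in this PR.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Snapshot:
    # Hypothetical record; real snapshot metadata is not described here.
    runner_id: str
    branch: str


def pick_snapshot(snapshots: Iterable[Snapshot], branch: str,
                  default_branch: str = "main") -> Optional[Snapshot]:
    """Prefer a snapshot on the same branch, then fall back to the default branch."""
    candidates = list(snapshots)
    for wanted in (branch, default_branch):
        for snap in candidates:
            if snap.branch == wanted:
                return snap
    # No warm snapshot on either branch: the caller would start cold.
    return None
```

Matching "any snapshot", as suggested above, would amount to adding one more fallback step before returning `None`.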
0d8f88c to beef9b7
In my head there are 3 related but independent topics:
There's obviously some overlap, so open to feedback on the structure of this series