Chores: add coding guidelines to the AGENTS.md of the prover. #2532
Chores: add coding guidelines to the AGENTS.md of the prover. #2532AlexandreBelling wants to merge 6 commits intomainfrom
Conversation
prover/AGENTS.md
Outdated
| * Document each functions, method and structure. State the behavior of the function and its invariants. Document the parameters and the returns arguments. | ||
| * Only create new structures they are worth the added maintainance cost or if they decrease the maintainance cost. | ||
| * Prefer small functions. Less than 10 lines of code when possible. | ||
| * Small, non-technical utility functions go in `./utils` |
There was a problem hiding this comment.
For me personally utils/ package is where functions go to die. Better not to implement it there when it is trivial (a la ToInt[T int | *big.Int](v T) int) and the same pattern appears several times over different packages. Otherwise if something is package-local then better just to implement it in the same package as a helper.
Imo always better to use Go std library functions if there are such (i.e. slices, maps etc.) and otherwise implement inline. Only for something very particular (a la DivCeil, DivFloor, NextPower2)
There was a problem hiding this comment.
Right, but with all the many packages that we have, I found that we ended up having way too many occurences of the same function everywhere and some of the implementation are not even correct. For me, having stuffs in "utils" is like having a way to advertise the existence of the function because everybody imports utils all the time.
There was a problem hiding this comment.
For sure, utils have a place. But imo there are also occurrences where people generify too soon something into "utils" package and later it is only used once. Now, there are over-generalized util functions and if there are too many "utils", then in the end it becomes too crowded that I never go to find anything from there (unless its already something I know). A la https://github.com/Consensys/linea-monorepo/blob/prover/dev-small-fields/prover/utils/utils.go#L224-L248 imo are pointless, it is something which cannot be incorrectly implemented (the linter will catch if I would do it incorrectly). It is one quick example but there are imo more.
My rule of thumb would be:
- if the pattern is easy to implement incorrectly
- if the pattern is used 3+ times in code
- there doesn't exist existing std-lib functionality (slices, maps, errors)
- it allows implementing no-overhead (i.e. no additional memory allocs compared if we would implement it inline)
- do not assume utils are used outside of the project
Additionally, we should occasionally go over the utils packages, the implementations change and a lot of utils are not used anymore, so we can remove stuff from there aggressively.
There was a problem hiding this comment.
I see what you mean, and the example you give is a good one. But it also break the rule, that we should not reinvent what is already available in std. Ok, we to be more precise in the guideline.
prover/AGENTS.md
Outdated
|
|
||
| ## Code style | ||
|
|
||
| * Less package is better to avoid dependency cycles. |
There was a problem hiding this comment.
We should instruct the agents to ask us about the package structure. Right now it is too generic and I'm afraid agents wouldn't take it too seriously (not creating any new packages at all).
* Less packages is better. When implementing new functionality then propose to the user package structure. Do not define new packages without user consent.
prover/AGENTS.md
Outdated
| ## Code style | ||
|
|
||
| * Less package is better to avoid dependency cycles. | ||
| * Use external library with parcimony |
There was a problem hiding this comment.
Better to be explicit:
* never add new external dependencies to go.mod without user consent. Explain why new external dependency is better than implementing ourself. External dependencies are not prohibited, only discouraged, so propose if it makes sense
* Go standard library is always allowed
* Go experimental packages are allowed (`golang.org/x`), but notify user.
* Packages already added to `go.mod` are allowed, but notify user.
prover/AGENTS.md
Outdated
| * Use external library with parcimony | ||
| * Document each functions, method and structure. State the behavior of the function and its invariants. Document the parameters and the returns arguments. | ||
| * Only create new structures they are worth the added maintainance cost or if they decrease the maintainance cost. | ||
| * Prefer small functions. Less than 10 lines of code when possible. |
There was a problem hiding this comment.
Imo doesn't make sense, the 10-line limit is arbitrary and too small for any practical purpose. I personally prefer having larger functions if it makes sense. Rather say:
* if several functions overlap in functionality, then refactor the overlapping part into a separate function. If you add new function which overlaps with existing functionality then consider refactoring. Do not refactor existing functions without explicit user conset.
There was a problem hiding this comment.
Yeah, right the 10 line limit is off. Maybe, we could say.
Prefer small functions. If you can factor out a common functionality from several overlapping functions, do it but the behavior of the function should be easy to explain.
prover/AGENTS.md
Outdated
| * Only create new structures they are worth the added maintainance cost or if they decrease the maintainance cost. | ||
| * Prefer small functions. Less than 10 lines of code when possible. | ||
| * Small, non-technical utility functions go in `./utils` | ||
| * Use hard assertions a lot. |
There was a problem hiding this comment.
Add concrete examples:
- when we assume inputs are of same length, assert
- when we assume something is in range, assert
- when we access slices, check the slices has capacity
- etc
- for assertions use Go expressions `if len(A) != len(B) { utils.Panicf("len(A) != len(B)") }.
There was a problem hiding this comment.
I'd prefer to keep it concise because this file also eats the LLM context space. Perhaps, we could still make it more precise by stating that: "whenever an implicit invariant, pre-condition, post-condition is testable then it should be asserted (non-nilness, slice size)."
prover/AGENTS.md
Outdated
| * Error out only if failures are acceptable or expectable. If not, panic. | ||
| * Comment explains implementations key considerations. Use with parcimony. Don't paraphrase the code. | ||
| * Use docs.go for general package-level user documentations. Keep it short. Less than 1000 characters. | ||
| * Use Readme.<name>.md files for package-level maintainer documentation. Keep it less than 5000 character per file. |
There was a problem hiding this comment.
The 5000 character per files restriction is too strict. Better to be explicit. When I'm asking for some method description (last I asked about degree reduction), then it goes easily over 5000. Rather - let the user review the documentation.
There was a problem hiding this comment.
And we should also say that we should keep the documentation up to date:
- check if the current package/file/function is described in any documentation (README..., docs.go). If so, then ensure you update it according with the method implementation.
There was a problem hiding this comment.
For the Readme. I think it is better to keep it short because the agent will read it whenever it goes across the package and it will end up eating their context space super fast. If we force him to work with small pieces of context then it helps it a lot. The coding file can be big, but the overall doc for the degree reduction does not need to be big. Just a good summary explaining what it does and the algo involved to get the LLM started on the package/file.
But maybe we could maintain "for-human" documentation separately.
There was a problem hiding this comment.
Oh - I don't think agents need READMEs. In my experience, they work a lot better looking at the code and then figuring out themselves how the things work. Unless it is something very obscure, but imo for 90% cases it is sufficient they look at the code. Otherwise if we have accompanying READMEs then they just get stale at some point and the agents have to do 2x the work (first read README, then look at the code, understand there is a mismatch and then just look at the implementation directly).
So, for me, docs.go and README.<>.md are for humans. Anything useful for agents can be in the code itself (comments etc.), imo it would reduce the context they need as it is usually more dense than README.
There was a problem hiding this comment.
Having done a few experiments. I found that giving them high-level context before letting them dive in the code resulted in them getting faster through the code. When using them for debugging, I found it made a very stark difference. But I guess we can give up the "Readme" is for agent and find another solution for agents.
Maybe a .gitignore-d file per package where they compact their reading of a package and we make them use it as a cache for their context.
prover/AGENTS.md
Outdated
| * Use docs.go for general package-level user documentations. Keep it short. Less than 1000 characters. | ||
| * Use Readme.<name>.md files for package-level maintainer documentation. Keep it less than 5000 character per file. | ||
| * If you want to document more stuffs, factor out the documentation in more Readme files but keep them shorts. | ||
| * Use new structures and interfaces with parcimony. Extend existing one when possible. |
There was a problem hiding this comment.
Imo, for performance:
- if allocating much from heap (1MB+) for returning, then allow providing the destination as a parameter. Only allocate on heap when the result doesn't fit or dst is nil.
Bad:
func double(x []int) []int {
res := make([]int, len(x))
for i := range x {
res[i] = 2*x[i]
}
return res
}Good:
func double(dst, x []int) []int {
if len(dst) < len(x) {
dst = append(dst, len(x)-len(dst))
}
for i := range x {
res[i] = 2*x[i]
}
return dst
}There was a problem hiding this comment.
I don't disagree with that pattern on hot path where performance is a concern but I am a little bit afraid of what would come out of this advice is this get applied systematically. Perhaps, we should craft an optimization playbook separately where we consign our general experience at optimizing go code and let the agent crawl the code, profile and optimize whatever it can optimize. That would be a fun experiment.
There was a problem hiding this comment.
Yup, I think an optimization playbook would be good. Perhaps in a different section with instruction:
* these are common optimization patterns. Do not optimize early, if you see an obvious performance regression then notify user to apply any of the techniques.
* otherwise, keep a note `// XXX(agents): ` for a future optimization round
prover/AGENTS.md
Outdated
| * Use docs.go for general package-level user documentations. Keep it short. Less than 1000 characters. | ||
| * Use Readme.<name>.md files for package-level maintainer documentation. Keep it less than 5000 character per file. | ||
| * If you want to document more stuffs, factor out the documentation in more Readme files but keep them shorts. | ||
| * Use new structures and interfaces with parcimony. Extend existing one when possible. |
There was a problem hiding this comment.
Another one for performance:
- for parallel computation use
parallel.Execute. Assume 100+ CPUs. - in compute-heavy parts avoid using channels and locks for single work scheduling due to overhead. Try to prepare uniform-shaped work chunks to run using
parallel.Execute. For scheduling/work shaping you can useruntime.GOMAXPROCS(0)
| - CI workflow: `.github/workflows/prover-testing.yml` | ||
| - Static checks run first (gofmt, golangci-lint) | ||
| - Compressor tests run separately from main test suite | ||
| - 30-minute timeout for test suite |
There was a problem hiding this comment.
We should add:
- ensure new functions are unit tested. If new function is part of larger computation pipeline, then ensure we have integration unit-tests.
- unit tests should be useful - if it takes more than 30 seconds to run on user device, then make it faster (using mocking, smaller instances etc.).
- in case tests need test data which cannot be synthezised then ask user. Otherwise, synchesize test data. If test-data is more than several kilobytes, then store it in
testdata/folder.
There was a problem hiding this comment.
There is section for testing (and also security) in AGEND.md. We could add these guidelines here. I would also add coverage metrics. Though, I would let it stick to unit-testing functions functions that are testable. For larger testdata, we should instruct it to implement a testdata generator.
|
And imo - after our rounds of inputs, it is better to ask any agent to optimize |
|
I have done a pass on all these aspect:
|
I really like the instructions! Good on my side! |
This PR implements issue(s) #
Checklist
PR.
Note
Low Risk
Documentation-only updates that add conventions and guardrails; no code or behavior changes, so risk is low aside from potential process friction if guidelines are misunderstood.
Overview
Updates
prover/AGENTS.mdto replace the brief package blurb with a detailed description of the prover pipeline and to clarify theprover/directory structure.Adds extensive contributor guidelines: prover-specific safety rules, testing conventions (including mandatory circuit soundness/negative tests and fuzzing guidance), code style/dependency policies, documentation standards, explicit circuit-change prohibitions without user directive, and performance/benchmarking guidance.
Written by Cursor Bugbot for commit 57d2ec4. This will update automatically on new commits. Configure here.