# Contributing To NeMo-Gym

## Quality control
A checklist for all verifier data to be submitted to NeMo-Gym. Please follow this pipeline before submitting a merge request.

1. Necessary information to include in the merge request:
    1. The corresponding dataset on the spreadsheet.
    2. A description of the prompts: what is their source, and which domain do they cover?
    3. A description of the environment, if any.
    4. A description of the verifier: how responses are verified, and whether we have checked the verifier's correctness.
    5. Legal approval status. If the data was synthetically generated by us with open models, please note that so we know legal approval is not needed.
2. Simple correctness check: after implementing your own `resources_servers` (and/or your own customized code for more complicated tasks), please follow the guideline here to run the server, query the OpenAI gpt-4o model (or any other model you like), and collect 5 example rollouts with their corresponding rewards (see the first sketch after this list). Please include in your merge request:
    1. The command you used to run the server for the uploaded data.
    2. The resulting rollouts and judgments (include 5 examples so that reviewers can better understand the data samples and confirm the rewards are correct).
    3. Any additional notes needed to run the server properly with the new PR.
3. Test: please follow the guideline here to implement and run tests for your environment. Tests are strongly encouraged, and you must have at least one test for every server you contribute (a minimal example follows this list). Full test coverage is not explicitly required, which means that YOU ARE RESPONSIBLE FOR YOUR OWN SERVER'S CORRECTNESS AND FUNCTION.
4. Reward Profiling: please run inference on your prompts and environments (a small subset of roughly 500 prompts is OK) with two models:
    1. Qwen 3 30B A3B
    2. Qwen 3 235B Instruct (for agent / agentic coding / instruction-following / game environments) or Qwen 3 235B Thinking (for math / competition coding)
    3. Generate 16 responses for each prompt and report the reward distribution: the percentage of prompts that are all correct, all incorrect, or a mixture of correct and incorrect (see the last sketch after this list).
    4. [If using tool calling] Please also report the average number of tool calls issued per prompt in the environment, and the correlation between the reward and the number of tool calls.
5. [After the NeMo-Gym + NeMo RL integration is done] Training-based correctness check: please train the following models with GRPO and include both the training accuracy curve and the test benchmark accuracy curve:
    1. Qwen 30B A3B Instruct
    2. [With more compute available] Qwen 235B Instruct
6. [PR Check and Review] Please assign another person on your team to reproduce and review the PR once it is ready. The reviewer needs to:
    1. Verify the content for all of steps 1-5 above.
    2. Check the correctness of the 5 example rollouts.
    3. Re-run the procedure provided in the README to ensure the same dataset can be generated.
    4. After the reviewer confirms reproduction and gives the green light on the PR, please ping @banghuaz-nvidia @bxyu-nvidia.
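The sketch below shows one way to collect the 5 example rollouts and rewards requested in step 2. It uses the OpenAI Python client; the `prompts.jsonl` file and the `score_rollout` hook are hypothetical placeholder names, not NeMo-Gym APIs, so substitute the actual entry points of your resources server.

```python
import json

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def score_rollout(prompt: str, response: str) -> float:
    """Placeholder: call your resources server's verifier here."""
    raise NotImplementedError


# Hypothetical file with one {"prompt": ...} object per line.
with open("prompts.jsonl") as f:
    prompts = [json.loads(line)["prompt"] for line in f][:5]

for prompt in prompts:
    completion = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": prompt}],
    )
    response = completion.choices[0].message.content
    reward = score_rollout(prompt, response)
    # Paste these JSON lines into the merge request as your 5 examples.
    print(json.dumps({"prompt": prompt, "response": response, "reward": reward}))
```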
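For step 3, a minimal test could look like the sketch below. The module path `my_resources_server.verifier` and the `compute_reward` signature are hypothetical; point the test at whatever reward function your server actually exposes.

```python
# Minimal pytest sketch; adapt the import and expected rewards to your server.
from my_resources_server.verifier import compute_reward  # hypothetical import path


def test_correct_answer_gets_full_reward():
    assert compute_reward(prompt="What is 2 + 2?", response="4") == 1.0


def test_incorrect_answer_gets_zero_reward():
    assert compute_reward(prompt="What is 2 + 2?", response="5") == 0.0
```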
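For the reward profiling in step 4, the sketch below shows how the requested statistics can be computed once you have 16 sampled rewards per prompt. The toy data is illustrative only; in practice the rewards and tool-call counts would come from your rollout logs.

```python
from statistics import correlation, mean  # statistics.correlation requires Python 3.10+

# 16 binary rewards per prompt; replace the toy data with your rollout logs.
rewards_per_prompt = [
    [1.0] * 16,             # a prompt the model always solves
    [0.0] * 16,             # a prompt the model never solves
    [1.0] * 9 + [0.0] * 7,  # a prompt with mixed outcomes
]

n = len(rewards_per_prompt)
all_correct = sum(all(r == 1.0 for r in rs) for rs in rewards_per_prompt)
all_incorrect = sum(all(r == 0.0 for r in rs) for rs in rewards_per_prompt)
mixed = n - all_correct - all_incorrect
print(f"all correct: {all_correct / n:.1%}, "
      f"all incorrect: {all_incorrect / n:.1%}, mixed: {mixed / n:.1%}")

# [If using tool calling] per-rollout rewards and tool-call counts, in parallel lists.
rollout_rewards = [1.0, 0.0, 1.0, 1.0, 0.0, 1.0]
rollout_tool_calls = [3, 1, 4, 2, 1, 3]
print("mean tool calls per rollout:", mean(rollout_tool_calls))
print("reward vs. tool-call correlation:", correlation(rollout_rewards, rollout_tool_calls))
```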
## Signing Your Work

* We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.