Skip to content

Commit c015b72

Browse files
bxyu-nvidiaabhibha-nvidia
authored andcommitted
Docs update (#47)
Address #43 and #45. Thank you to @xinyu-dev for the raises! --------- Signed-off-by: Brian Yu <bxyu@nvidia.com> Signed-off-by: Abhibha Gupta <abhibhag@nvidia.com>
1 parent ebfa3db commit c015b72

File tree

3 files changed

+37
-4
lines changed

3 files changed

+37
-4
lines changed

CONTRIBUTING.md

Lines changed: 29 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1,5 +1,34 @@
11
# Contributing To NeMo-Gym
22

3+
## Quality control
4+
A checklist for all verifier data to be submitted to Nemo Gym. Please follow this pipeline before submitting a merge request.
5+
6+
1. Necessary information to be included in the merge request:
7+
1. Corresponding dataset on the spreadsheet.
8+
2. Description of the prompt. What is the source of the prompt, which domain is it covering?
9+
3. Description of the environment, if there is any.
10+
4. Description of the verifier. How is it verified and whether we have checked the correctness of the verifier.
11+
5. Legal approval status? If synthetically generated by ourselves with open models, please note there so that we know we don’t need legal approval.
12+
2. Simple correctness check: After finishing implementing your own resources_servers (and/or your own customized code for more complicated tasks), please follow the guideline here to run the server, query OpenAI gpt-4o model (or any other model you like) and get 5 example rollouts and corresponding rewards there. Please include in your PR merge request:
13+
1. The command you used to run the server for the uploaded data
14+
2. The resulting rollout and judges (include 5 examples here for people to understand better the data samples, and to ensure reward here is correct.)
15+
3. Other additional notes for running the server properly with the new PR.
16+
3. Test: Please follow the guideline here to implement your own test and run test for your environment. Tests are strongly encouraged and you must have at least one test for every server you make. Test coverage is not explicitly required which means that YOU ARE RESPONSIBLE FOR YOUR OWN SERVER CORRECTNESS AND FUNCTION.
17+
4. Reward Profiling: Please run inference on your prompts and environments (a ~500 small subset is OK) on two models:
18+
1. Qwen 3 30B A3B
19+
2. Qwen 3 235B Instruct (if that’s for agent / agentic coding / instruction following / game environments) or Qwen 3 235B Thinking (if math / competition coding)
20+
3. Generate 16 responses for each prompt, and report the reward distribution (percent of all correct, all incorrect, and mixture of correct & incorrect prompts there).
21+
4. [If using tool calling] Please also provide metrics around the number of tool calls issued on average per prompt in the environment, and the correlation of the reward with the number of tool calls.
22+
5. [After Nemo Gym + Nemo RL integration is done] Training-based correctness check: Please train on the following models with GRPO and include both training accuracy curve and test benchmark accuracy curve:
23+
1. Qwen 30B A3B Instruct
24+
2. [With more compute available] Qwen 235B Instruct
25+
6. [PR Check and Review] Please assign another person in your team for reproducing and reviewing the PRs once it’s ready. The person for review needs to
26+
1. Verify the content for all the above 1-5 steps
27+
2. Check the correctness of the 5 examples
28+
3. Re-run the procedure provided in README to ensure one can generate the same dataset
29+
4. After the person confirms reproduction and gives greenlight on the PR, please ping @banghuaz-nvidia @bxyu-nvidia.
30+
31+
332
## Signing Your Work
433

534
* We require that all contributors "sign-off" on their commits. This certifies that the contribution is your original work, or you have rights to submit it under the same license, or a compatible license.

README.md

Lines changed: 6 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -578,6 +578,12 @@ ng_collect_rollouts +agent_name=multineedle_simple_agent \
578578
+num_samples_in_parallel=null
579579
```
580580

581+
The supported parameters include:
582+
- `limit`: Limits how many examples from the input JSONL file to process
583+
- `num_repeats`: Repeats each input example multiple times to collect multiple rollouts per example
584+
- `num_samples_in_parallel`: Controls how many rollout collection requests run concurrently
585+
586+
581587
View the rollouts just collected!
582588
```
583589
ng_viewer +jsonl_fpath=results/multineedle_rollout_collection.jsonl

resources_servers/simple_weather/create_examples.py

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -14,7 +14,7 @@
1414
import json
1515

1616
from nemo_gym.openai_utils import (
17-
NeMoGymEasyInputMessageParam,
17+
NeMoGymEasyInputMessage,
1818
NeMoGymResponseCreateParamsNonStreaming,
1919
)
2020

@@ -57,9 +57,7 @@
5757
example_strs = []
5858
for query in queries:
5959
example = base_response_create_params.model_copy(
60-
update={
61-
"input": base_response_create_params.input + [NeMoGymEasyInputMessageParam(role="user", content=query)]
62-
}
60+
update={"input": base_response_create_params.input + [NeMoGymEasyInputMessage(role="user", content=query)]}
6361
)
6462
example_strs.append(json.dumps({"responses_create_params": example.model_dump(exclude_unset=True)}) + "\n")
6563

0 commit comments

Comments
 (0)