**Runtime overrides** using Hydra syntax for maximum flexibility. These command line arguments have the highest priority and can override any settings from config.yaml or env.yaml files.
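The precedence order described above can be illustrated with a minimal sketch. This is not NeMo Gym's actual config loader; the `merge_configs` helper and the example keys are assumptions for illustration only:

```python
# Minimal sketch (not NeMo Gym's actual loader) of the precedence
# described above: runtime overrides beat env.yaml, which beats
# config.yaml.
def merge_configs(config_yaml: dict, env_yaml: dict, overrides: dict) -> dict:
    merged = dict(config_yaml)   # lowest priority
    merged.update(env_yaml)      # middle priority
    merged.update(overrides)     # runtime overrides win
    return merged

config = {"model": "base-model", "temperature": 0.2}
env = {"temperature": 0.5}
overrides = {"temperature": 0.9}

print(merge_configs(config, env, overrides))
# → {'model': 'base-model', 'temperature': 0.9}
```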
---

**File:** `docs/about/concepts/core-abstractions.md`

Before diving into code, let's understand the three core abstractions in NeMo Gym.
:::{tab-item} Model
Responses API Model servers are stateless model endpoints that perform single-call text generation without conversation memory or orchestration. During training, you will always have at least one active Responses API Model server, typically called the "policy" model.
**Available Implementations:**
- `openai_model`: Direct integration with OpenAI's Responses API
:::{tab-item} Resources
Resources servers provide tool implementations that can be invoked via tool calling and verification logic that measures task performance. NeMo Gym includes various NVIDIA and community-contributed resources servers for use during training, and provides tutorials for creating your own Resource server.
**Resources Provide:**
- **Tools**: Functions agents can call (e.g., `get_weather`, `search_web`)
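A tool like `get_weather` can be as simple as a plain function. The sketch below is hypothetical: real NeMo Gym resource servers register tools through their own server interfaces, and the canned data here is invented for illustration:

```python
# Hypothetical tool implementation for illustration only; real NeMo Gym
# resource servers expose tools through their own server interfaces.
def get_weather(city: str) -> dict:
    """Return a canned weather report for the given city."""
    canned = {"San Francisco": {"temp_f": 65, "conditions": "foggy"}}
    return canned.get(city, {"temp_f": None, "conditions": "unknown"})

print(get_weather("San Francisco"))  # → {'temp_f': 65, 'conditions': 'foggy'}
```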
Responses API Agent servers orchestrate the interaction between models and resources:
- Handle multi-turn conversations
- Format responses consistently
Agents are also called "training environments." NeMo Gym includes several training environment patterns covering multi-step, multi-turn, and user modeling scenarios.
**Examples:**
- `simple_agent`: Basic agent that coordinates model calls with resource tools
---

**File:** `docs/about/concepts/task-verification.md`
## What is Verification?
Every resource server in NeMo Gym implements a `verify()` function that returns a reward value for task performance.
**The Problem**: When you ran the weather example in the quickstart, the agent successfully called the tool and provided a response. But was that response *good*? Should the model be rewarded or penalized? Without verification, you cannot measure performance or guide improvement.
**The Solution**: Each resource server must define exactly what "good performance" means for its domain.
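A minimal sketch of what such a `verify()` function might look like. The signature and the exact-match scoring are assumptions for illustration; NeMo Gym's actual interface may differ:

```python
# Hypothetical verify() sketch; the actual NeMo Gym signature and
# reward semantics may differ.
def verify(response: str, expected_answer: str) -> float:
    """Return a reward in [0.0, 1.0] measuring task performance."""
    # Simplest possible verifier: exact-match scoring.
    return 1.0 if response.strip() == expected_answer.strip() else 0.0

print(verify("72F and sunny", "72F and sunny"))  # → 1.0
print(verify("no idea", "72F and sunny"))        # → 0.0
```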
## Why Verification Matters
**Tool Execution ≠ Good Performance**
- The right tool call was issued, e.g., `get_weather("San Francisco")`
- But was helpful advice given? Was the response accurate? Was it efficient?
- Verification answers these questions with numerical scores
**NeMo Gym's Role**: Within this ecosystem, Gym focuses on standardizing scalable rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and curated resource servers with verification logic. This makes it practical to generate large-scale, high-quality training data for NeMo RL and other training frameworks.
---

**File:** `docs/about/index.md`
Three components work together to generate and evaluate agent interactions:
- **Agents**: Orchestrate multi-turn interactions between models and resources, handling conversation flow, tool routing, and response formatting.
- **Models**: LLM inference endpoints (OpenAI-compatible or vLLM) that handle single-turn text generation and tool-calling decisions.
- **Resources**: Provide tools (functions agents call) and verifiers (logic to score performance). Examples include math environments, code sandboxes, and web search.
---

**File:** `docs/get-started/index.md`
**Estimated Time**: 25-30 minutes
This guided tutorial is designed for users new to training models with reinforcement learning (RL). These tutorials walk you through the complete journey from installation to generating training data at scale.
**By the end of this tutorial series, you will have:**
✅ A working NeMo Gym installation with servers running

✅ The ability to generate rollouts for RL training
## Before You Start
Make sure you have these prerequisites ready before beginning the tutorials:
## Tutorial Path
Follow these tutorials in sequence to build your first AI agent from scratch:
---

**File:** `docs/how-to-faq.md`
# How-Tos and FAQs
:::{warning}
This document is a collection of How-Tos and FAQs that have not made their way into an official tutorial yet. The following guides are **experimental** and may contain bugs. Proceed with caution.
:::
# How To: Run tests for simple agent
TODO @bxyu-nvidia: expand on this later.
# FAQ: What NeMo Gym CI/CD checks do I need to pass?
NeMo Gym has an E2E suite of CI/CD in the form of GitHub Actions workflows. Some of these are critical to PR merge and some of them are not.
855
855
856
856
For the majority of PRs, there are 5 checks that need to pass:
857
857
1. DCO
Examples of PR checks that most PRs do not need to wait for to pass:
2. CICD NeMo / Nemo_CICD_Test (push)
...
# FAQ: Why use aiohttp backend instead of httpx/httpcore for async http?
TL;DR: httpx has O(n^2) runtime, where n is the number of queued requests (for each request, it checks all other queued requests). This is terribly inefficient and results in major slowdowns.
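A toy sketch of the complexity difference, not httpx's or aiohttp's actual code: scanning every queued request against all others takes n(n-1) checks, while FIFO dispatch touches each request once.

```python
from collections import deque

def dispatch_quadratic(queue: list) -> int:
    # For each request, scan every other queued request, mirroring the
    # O(n^2) behavior described above. Returns the number of checks.
    checks = 0
    for i, _ in enumerate(queue):
        for j, _ in enumerate(queue):
            if i != j:
                checks += 1
    return checks

def dispatch_linear(queue: list) -> int:
    # FIFO dispatch: each request is examined exactly once (O(n)).
    q = deque(queue)
    checks = 0
    while q:
        q.popleft()
        checks += 1
    return checks

requests = list(range(1000))
print(dispatch_quadratic(requests))  # → 999000
print(dispatch_linear(requests))     # → 1000
```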
---

**File:** `docs/index.md`
# NeMo Gym Documentation
NeMo Gym is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides training environment development scaffolding and training environment patterns for multi-step, multi-turn, and user modeling scenarios.
NeMo Gym has three core server types: **Responses API Model servers** provide model endpoints, **Resources servers** contain tool implementations and verification logic, and **Responses API Agent servers** orchestrate interactions between models and resources.
---

**File:** `docs/tutorials/index.md`
Hands-on learning experiences that guide you through building, training, and deploying AI agents with NeMo Gym.
:::{tip}
**New to NeMo Gym?** Begin with the {doc}`Get Started <../get-started/index>` section for a guided tutorial from installation through your first verified agent. Return here afterward to learn about advanced topics like additional rollout collection methods and training data generation.
:::
---
Transform rollouts into training data for supervised fine-tuning (SFT) and direct preference optimization (DPO).
:::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` RL Training with NeMo RL
:link: rl-training-with-nemo-rl
:link-type: doc
Train a model with NeMo RL. Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepare data, and launch single-node and multi-node training runs.