Commit e7a43f0

docs: various improvements and fixes
Signed-off-by: Ahmad Kiswani <kiswani.ahmad@gmail.com>
1 parent 061e4c2 commit e7a43f0

12 files changed, with 33 additions and 33 deletions.

docs/about/concepts/configuration-system.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ This allows for:

 :::{tab-item} 1. Server YAML Config Files

-These are your base configurations that define server structures and default values. Later files override earlier files.
+These base configurations define server structures and default values, with later files overriding earlier ones.

 Example: Multi-Server Configuration
 ```bash
@@ -91,7 +91,7 @@ ng_run '+config_paths=${simple_weather_config_paths}'

 :::{tab-item} 3. Command Line Arguments

-**Runtime overrides** using Hydra syntax for maximum flexibility. These runtime command line have the highest priority, meaning they can override any previous setting set in the config.yaml or env.yaml files.
+**Runtime overrides** using Hydra syntax for maximum flexibility. These command line arguments have the highest priority and can override any settings from config.yaml or env.yaml files.

 Basic Overrides
```bash
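
Editor's note on the hunks above: the precedence they describe (later config files override earlier ones, and command line arguments override everything) can be illustrated with a small sketch. This is not how NeMo Gym resolves configuration; the docs say it uses Hydra syntax, and this example substitutes a plain dictionary merge. The names `deep_merge` and `resolve`, the sample keys, and the assumption that env.yaml sits between the YAML config files and the command line are all hypothetical.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which values from `override` win over `base`."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Otherwise the higher-priority value replaces the lower one.
            merged[key] = deepcopy(value)
    return merged


def resolve(config_files: list, env: dict, cli: dict) -> dict:
    """Apply layers from lowest to highest priority (assumed ordering)."""
    resolved: dict = {}
    for layer in [*config_files, env, cli]:
        resolved = deep_merge(resolved, layer)
    return resolved


# Hypothetical values, for illustration only.
first_file = {"policy_model": {"name": "model-a", "temperature": 0.0}}
second_file = {"policy_model": {"temperature": 0.7}}
env_yaml = {"policy_model": {"api_key": "from-env"}}
cli_args = {"policy_model": {"name": "model-b"}}

final = resolve([first_file, second_file], env_yaml, cli_args)
```

In this sketch the second file overrides `temperature`, env.yaml contributes `api_key`, and the command line layer overrides `name` last.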

docs/about/concepts/core-abstractions.md

Lines changed: 3 additions & 3 deletions
@@ -10,7 +10,7 @@ Before diving into code, let's understand the three core abstractions in NeMo Gy

 :::{tab-item} Model

-Responses API Model servers are model endpoints that performs text inference - stateless, single-call text generation without conversation memory or orchestration. You will always have at least one Response API Model server active during training, typically known as the "policy" model.
+Responses API Model servers are stateless model endpoints that perform single-call text generation without conversation memory or orchestration. During training, you will always have at least one active Responses API Model server, typically called the "policy" model.

 **Available Implementations:**
 - `openai_model`: Direct integration with OpenAI's Responses API
@@ -22,7 +22,7 @@ Responses API Model servers are model endpoints that performs text inference - s

 :::{tab-item} Resources

-Resources servers provide tools implementations that can be invoked via tool calling and verification logic that measure task performance. NeMo Gym contains a variety of NVIDIA and community contributed resources servers that you may wish to utilize during training. We also have tutorials on how to add your own Resource server.
+Resources servers provide tool implementations that can be invoked via tool calling and verification logic that measures task performance. NeMo Gym includes various NVIDIA and community-contributed resources servers for use during training, and provides tutorials for creating your own Resource server.

 **Resources Provide**
 - **Tools**: Functions agents can call (e.g., `get_weather`, `search_web`)
@@ -50,7 +50,7 @@ Responses API Agent servers orchestrate the interaction between models and resou
 - Handle multi-turn conversations
 - Format responses consistently

-An agent can also be referred to as a "training environment." NeMo Gym contains several training environment patterns that cover a variety of scenarios including multi-step, multi-turn, or user modeling scenarios.
+Agents are also called "training environments." NeMo Gym includes several training environment patterns covering multi-step, multi-turn, and user modeling scenarios.

 **Examples:**
 - `simple_agent`: Basic agent that coordinates model calls with resource tools

docs/about/concepts/task-verification.md

Lines changed: 4 additions & 4 deletions
@@ -6,17 +6,17 @@

 ## What is Verification?

-Every resource server in NeMo Gym has a `verify()` function that **measure task performance**. The purpose of this function is to define how to measure how well that task was accomplished.
+Every resource server in NeMo Gym implements a `verify()` function that returns a reward value for task performance.

-**The Problem**: When you ran the weather example in the quickstart, it successfully called the tool and gave a response. But was that response *good*? Should the model be rewarded or penalized for that behavior? Without verification, there's no way to measure improvement.
+**The Problem**: When you ran the weather example in the quickstart, the agent successfully called the tool and provided a response. But was that response *good*? Should the model be rewarded or penalized? Without verification, you cannot measure performance or guide improvement.

 **The Solution**: Each resource server must define exactly what "good performance" means for its domain.

 ## Why Verification Matters

 **Tool Execution ≠ Good Performance**

-- The right tool call was issued i.e. `get_weather("San Francisco")`
+- The right tool call was issued, e.g., `get_weather("San Francisco")`
 - But was helpful advice given? Was the response accurate? Was it efficient?
 - Verification answers these questions with numerical scores

@@ -178,6 +178,6 @@ reward = await expensive_api_call(predicted, expected)

 This verification system is what makes NeMo Gym powerful for model training:
 - **Resource servers** provide both tools AND scoring systems
-- **Verification patterns** vary by domain but follow common principles
+- **Verification patterns** vary by domain but follow common principles
 - **Reward signals** from verification drive model improvement through RL
- **Good verification** is reliable, meaningful, and scalable
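
Editor's note on the hunks above: the idea that a `verify()` function turns an agent's behavior into a numerical reward can be sketched as follows. This is a minimal illustration under stated assumptions, not NeMo Gym's actual interface. The `ToolCall` and `Rollout` types, the `city` argument name, the keyword check, and the half-and-half scoring are all hypothetical, and the real `verify()` signature may differ.

```python
from dataclasses import dataclass, field


@dataclass
class ToolCall:
    # Hypothetical record of one tool invocation; not a NeMo Gym type.
    name: str
    arguments: dict = field(default_factory=dict)


@dataclass
class Rollout:
    # Hypothetical container for what the agent did on one task.
    tool_calls: list
    final_response: str


def verify(rollout: Rollout, expected_city: str, expected_keyword: str) -> float:
    """Return a reward in [0.0, 1.0] for a weather task (illustrative only)."""
    # Was the right tool called with the right argument?
    called_right_tool = any(
        call.name == "get_weather" and call.arguments.get("city") == expected_city
        for call in rollout.tool_calls
    )
    # Crude stand-in for response quality: does the answer mention the keyword?
    answer_is_useful = expected_keyword.lower() in rollout.final_response.lower()
    # Half credit for the correct tool call, half for a useful answer.
    return 0.5 * called_right_tool + 0.5 * answer_is_useful
```

The split score mirrors the point made in the diff: issuing the right tool call earns only partial credit, because tool execution alone does not show that the response was good.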

docs/about/ecosystem.md

Lines changed: 1 addition & 1 deletion
@@ -23,4 +23,4 @@ The [NeMo Framework](https://github.com/NVIDIA-NeMo) is NVIDIA's GPU-accelerated
 * **NeMo Guardrails**: Programmable safety guardrails
 * And more...

-**NeMo Gym's Role**: Within this ecosystem, Gym focuses specifically on standardizing scalable rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and curated resource servers with verification logic, making it practical to generate large-scale, high-quality training data that feeds into NeMo RL and other training frameworks.
+**NeMo Gym's Role**: Within this ecosystem, Gym focuses on standardizing scalable rollout collection for RL training. It provides unified interfaces to heterogeneous RL environments and curated resource servers with verification logic. This makes it practical to generate large-scale, high-quality training data for NeMo RL and other training frameworks.

docs/about/index.md

Lines changed: 3 additions & 3 deletions
@@ -11,6 +11,6 @@ NeMo Gym generates training data for reinforcement learning by capturing how AI

 Three components work together to generate and evaluate agent interactions:

-- **Agents**: Orchestrate multi-turn interactions between models and resources. Handle conversation flow, tool routing, and response formatting.
-- **Models**: LLM inference endpoints (OpenAI-compatible or vLLM). Handle single-turn text generation and tool-calling decisions.
-- **Resources**: Provide tools (functions agents call) + verifiers (logic to score performance). Examples: math environments, code sandboxes, web search.
+- **Agents**: Orchestrate multi-turn interactions between models and resources, handling conversation flow, tool routing, and response formatting.
+- **Models**: LLM inference endpoints (OpenAI-compatible or vLLM) that handle single-turn text generation and tool-calling decisions.
+- **Resources**: Provide tools (functions agents call) and verifiers (logic to score performance). Examples include math environments, code sandboxes, and web search.

docs/get-started/index.md

Lines changed: 4 additions & 4 deletions
@@ -4,12 +4,12 @@

 **Estimated Time**: 25-30 minutes

-This guided tutorial experience is designed for those brand new to training models with reinforcement learning (RL). These tutorials walk you through the complete journey from installation to generating training data at scale.
+This guided tutorial is designed for users new to training models with reinforcement learning (RL). These tutorials walk you through the complete journey from installation to generating training data at scale.

 **By the end of this tutorial series, you will have:**

-✅ A working NeMo Gym installation with servers running
-Ability to generate rollouts for RL training
+✅ A working NeMo Gym installation with servers running
+The ability to generate rollouts for RL training

 ## Before You Start

@@ -23,7 +23,7 @@ Make sure you have these prerequisites ready before beginning the tutorials:

 ## Tutorial Path

-Follow these four tutorials in sequence to build your first AI agent from scratch:
+Follow these tutorials in sequence to build your first AI agent from scratch:

 ::::{grid} 1 1 1 1
 :gutter: 3

docs/get-started/rollout-collection.md

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 # Rollout Collection

-A rollout is complete record of a task instance execution that captures:
+A rollout is a complete record of a task instance execution that captures:
 - What the model was asked to do (input)
 - How the model reasoned (internal processing)
 - What tools were used (tool calls and tool responses)

docs/how-to-faq.md

Lines changed: 4 additions & 4 deletions
@@ -1,7 +1,7 @@
-# How-To's and FAQ's
+# How-Tos and FAQs

 :::{warning}
-This document is a smattering of How-To's and FAQs that have not made their way into an official tutorial yet. The following guides are **experimental** and may contain bugs. Proceed with caution.
+This document is a collection of How-Tos and FAQs that have not made their way into an official tutorial yet. The following guides are **experimental** and may contain bugs. Proceed with caution.
 :::

 # How To: Run tests for simple agent
@@ -851,7 +851,7 @@ TODO @bxyu-nvidia: expand on this later.

 # FAQ: NeMo Gym what CI/CD do I need to pass?

-NeMo Gym has an E2E suite of CI/CD in the form of Github actions workflows. Some of these are critical to PR merge and some of the mare not.
+NeMo Gym has an E2E suite of CI/CD in the form of Github actions workflows. Some of these are critical to PR merge and some of them are not.

 For the majority of PRs, there are 5 checks that need to pass:
 1. DCO
@@ -865,7 +865,7 @@ Examples of PR checks that most PRs do not need to wait for to pass:
 2. CICD NeMo / Nemo_CICD_Test (push)
 ...

-# FAQ: Why aiohttp backend and not httpx/httpcore for async http?
+# FAQ: Why use aiohttp backend instead of httpx/httpcore for async http?

 TL;DR: httpx is O(n^2) runtime where n is the number of queued requests (i.e. for each request, we check all other queued requests). This is terribly inefficient and results in major slowdowns.

docs/index.md

Lines changed: 2 additions & 2 deletions
@@ -2,9 +2,9 @@

 # NeMo Gym Documentation

-NeMo Gym is a framework for building reinforcement learning (RL) training environments large language models (LLMs). Gym provides training environment development scaffolding and training environment patterns such as multi-step, multi-turn, and user modeling scenarios.
+NeMo Gym is a framework for building reinforcement learning (RL) training environments for large language models (LLMs). It provides training environment development scaffolding and training environment patterns for multi-step, multi-turn, and user modeling scenarios.

-At the core of NeMo Gym are three server concepts: **Responses API Model servers** are model endpoints, **Resources servers** contain tool implementations and verification logic, and **Response API Agent servers** orchestrate the interaction between models and resources.
+NeMo Gym has three core server types: **Responses API Model servers** provide model endpoints, **Resources servers** contain tool implementations and verification logic, and **Responses API Agent servers** orchestrate interactions between models and resources.

 ## Quickstart

docs/tutorials/index.md

Lines changed: 3 additions & 3 deletions
@@ -5,7 +5,7 @@
 Hands-on learning experiences that guide you through building, training, and deploying AI agents with NeMo Gym.

 :::{tip}
-**New to NeMo Gym?** Begin with the {doc}`Get Started <../get-started/index>` section for a guided tutorial experience from installation through your first verified agent. Return here after completing those tutorials to learn about advanced topics like additional rollout collection methods and training data generation.
+**New to NeMo Gym?** Begin with the {doc}`Get Started <../get-started/index>` section for a guided tutorial from installation through your first verified agent. Return here afterward to learn about advanced topics like additional rollout collection methods and training data generation.
 :::
 ---

@@ -27,9 +27,9 @@ Transform rollouts into training data for supervised fine-tuning (SFT) and direc
 :::{grid-item-card} {octicon}`workflow;1.5em;sd-mr-1` RL Training with NeMo RL
 :link: rl-training-with-nemo-rl
 :link-type: doc
-Train a model with NeMo RL. Learn how to set up NeMo Gym + NeMo RL training environment, run tests, prepare data, and launch single and multi-node training runs.
+Train a model with NeMo RL. Learn how to set up NeMo Gym and NeMo RL training environments, run tests, prepare data, and launch single-node and multi-node training runs.
 +++
-{bdg-secondary}`sft` {bdg-secondary}`dpo`
+{bdg-secondary}`rl` {bdg-secondary}`training`
 :::

 ::::
