14 changes: 7 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
@@ -130,7 +130,7 @@ simple_agent:
name: ???
model_server:
type: responses_api_models
name: openai_model
name: policy_model
```

This is how this YAML config translates to the simple agent config as defined in Python in `responses_api_agents/simple_agent/app.py`.
@@ -146,21 +146,21 @@ You can define your server configs to require or accept any arbitrary structures

If your config contains a server reference that doesn't exist, NeMo Gym will let you know e.g.:
```bash
AssertionError: Could not find type='responses_api_models' name='simple_model_server' in the list of available servers: [AgentServerRef(type='responses_api_agents', name='simple_agent'), ModelServerRef(type='responses_api_models', name='openai_model'), ResourcesServerRef(type='resources_servers', name='simple_weather')]
AssertionError: Could not find type='responses_api_models' name='simple_model_server' in the list of available servers: [AgentServerRef(type='responses_api_agents', name='simple_agent'), ModelServerRef(type='responses_api_models', name='policy_model'), ResourcesServerRef(type='resources_servers', name='simple_weather')]
```
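The check behind this error can be pictured with a minimal Python sketch. `ServerRef` and `find_server` are hypothetical names used only for illustration, not NeMo Gym's actual implementation; the sketch assumes nothing more than that a reference is a `type`/`name` pair looked up in the list of configured servers.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ServerRef:
    """Hypothetical stand-in for a server reference: a (type, name) pair."""

    type: str
    name: str


def find_server(ref: ServerRef, available: list[ServerRef]) -> ServerRef:
    """Return the matching reference, or fail while listing what is available."""
    if ref not in available:
        raise AssertionError(
            f"Could not find type={ref.type!r} name={ref.name!r} "
            f"in the list of available servers: {available}"
        )
    return ref


available = [
    ServerRef(type="responses_api_agents", name="simple_agent"),
    ServerRef(type="responses_api_models", name="policy_model"),
    ServerRef(type="resources_servers", name="simple_weather"),
]

# A reference to a configured server resolves normally.
found = find_server(ServerRef("responses_api_models", "policy_model"), available)

# A reference to a server that was never configured fails, and the message
# names the missing type/name pair.
try:
    find_server(ServerRef("responses_api_models", "simple_model_server"), available)
    error_message = ""
except AssertionError as exc:
    error_message = str(exc)
```

After the rename in this change, any config that still points at `openai_model` as a server name would fail a lookup of this kind, which is why every reference is updated together.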

If your config is missing an argument or argument value, NeMo Gym will let you know e.g.:
```bash
omegaconf.errors.MissingMandatoryValue: Missing mandatory value: openai_model.responses_api_models.openai_model.openai_api_key
full_key: openai_model.responses_api_models.openai_model.openai_api_key
omegaconf.errors.MissingMandatoryValue: Missing mandatory value: policy_model.responses_api_models.openai_model.openai_api_key
full_key: policy_model.responses_api_models.openai_model.openai_api_key
object_type=dict
```
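OmegaConf treats the literal value `???` as a mandatory value that has not been set yet. The stdlib-only sketch below mimics that check on a plain dictionary so the dotted `full_key` in the error is easier to read; `find_missing_keys` is a hypothetical helper, not part of OmegaConf or NeMo Gym, and the config contents are illustrative.

```python
from __future__ import annotations

MISSING = "???"  # OmegaConf's marker for a mandatory value that is not set


def find_missing_keys(config: dict, prefix: str = "") -> list[str]:
    """Collect the dotted path of every value still set to the `???` marker."""
    missing = []
    for key, value in config.items():
        full_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested sections, extending the dotted path.
            missing.extend(find_missing_keys(value, full_key))
        elif value == MISSING:
            missing.append(full_key)
    return missing


# Illustrative config: the API key has not been filled in.
config = {
    "policy_model": {
        "responses_api_models": {
            "openai_model": {
                "entrypoint": "app.py",
                "openai_api_key": "???",
            }
        }
    }
}

missing_keys = find_missing_keys(config)
```

The dotted path is built from the top-level server name down, so renaming the top-level key from `openai_model` to `policy_model` changes only the first segment of the reported key.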


### Special policy model placeholders
There is one set of special NeMo Gym variables relating to the target agent model: `policy_base_url`, `policy_api_key`, and `policy_model_name`. When you train a model, this information is used to query the model server endpoint you are trying to train. By default, every agent refers to this shared `openai_model` model server.
There is one set of special NeMo Gym variables relating to the agent policy model: `policy_base_url`, `policy_api_key`, and `policy_model_name`. When you train a model, this information is used to query the model server endpoint you are trying to train. By default, every agent refers to this shared `policy_model` model server.
```yaml
openai_model:
policy_model:
responses_api_models:
openai_model:
entrypoint: app.py
@@ -599,7 +599,7 @@ multineedle_simple_agent:
name: multineedle_resources_server
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
2 changes: 1 addition & 1 deletion nemo_gym/cli.py
@@ -447,7 +447,7 @@ def init_resources_server(): # pragma: no cover
name: {server_type_name}_resources_server
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
2 changes: 1 addition & 1 deletion resources_servers/comp_coding/configs/comp_coding.yaml
@@ -12,7 +12,7 @@ comp_coding_simple_agent:
name: comp_coding
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
2 changes: 1 addition & 1 deletion resources_servers/google_search/configs/google_search.yaml
@@ -11,7 +11,7 @@ google_search_simple_agent:
name: google_search
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
@@ -11,7 +11,7 @@ instruction_following_simple_agent:
name: instruction_following
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
2 changes: 1 addition & 1 deletion resources_servers/library_judge_math/README.md
@@ -16,7 +16,7 @@ The following are example commands for running this resource server, along with
config_paths="responses_api_models/openai_model/configs/openai_model.yaml, \
resources_servers/library_judge_math/configs/library_judge_math.yaml"
ng_run "+config_paths=[$config_paths]" \
+library_judge_math.resources_servers.library_judge_math.judge_model_server.name=openai_model
+library_judge_math.resources_servers.library_judge_math.judge_model_server.name=policy_model
```

To download the OpenMathReasoning dataset, the following command can be run:
@@ -18,7 +18,7 @@ library_judge_math_simple_agent:
name: library_judge_math
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
2 changes: 1 addition & 1 deletion resources_servers/library_judge_math/configs/dapo17k.yaml
@@ -18,7 +18,7 @@ library_judge_math_simple_agent:
name: library_judge_math
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
@@ -18,7 +18,7 @@ library_judge_math_simple_agent:
name: library_judge_math
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
2 changes: 1 addition & 1 deletion resources_servers/mcqa/configs/mcqa.yaml
@@ -11,7 +11,7 @@ mcqa_simple_agent:
name: mcqa
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
2 changes: 1 addition & 1 deletion resources_servers/multineedle/configs/multineedle.yaml
@@ -11,7 +11,7 @@ multineedle_simple_agent:
name: multineedle_resources_server
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
@@ -11,7 +11,7 @@ multiverse_math_hard_simple_agent:
name: multiverse_math_hard
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
@@ -23,4 +23,4 @@ multiverse_math_hard_simple_agent:
license: Apache 2.0
- name: example
type: example
jsonl_fpath: resources_servers/multiverse_math_hard/data/example.jsonl
jsonl_fpath: resources_servers/multiverse_math_hard/data/example.jsonl
@@ -12,7 +12,7 @@ python_math_exec_simple_agent:
name: python_math_exec
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
@@ -11,7 +11,7 @@ simple_weather_simple_agent:
name: simple_weather
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: example
type: example
4 changes: 2 additions & 2 deletions resources_servers/workbench/configs/workbench.yaml
@@ -11,7 +11,7 @@ workbench_simple_agent:
name: workbench
model_server:
type: responses_api_models
name: openai_model
name: policy_model
datasets:
- name: train
type: train
@@ -24,4 +24,4 @@ workbench_simple_agent:
- name: example
type: example
jsonl_fpath: resources_servers/workbench/data/example.jsonl
max_steps: 6
max_steps: 6
@@ -7,4 +7,4 @@ simple_agent:
name: ???
model_server:
type: responses_api_models
name: openai_model
name: policy_model
@@ -7,4 +7,4 @@ simple_agent_stateful:
name: ???
model_server:
type: responses_api_models
name: openai_model
name: policy_model
4 changes: 2 additions & 2 deletions responses_api_models/openai_model/client.py
@@ -21,12 +21,12 @@

async def main():
task_1 = await server_client.post(
server_name="openai_model",
server_name="policy_model",
url_path="/v1/responses",
json={"input": "hello"},
)
task_2 = await server_client.post(
server_name="openai_model",
server_name="policy_model",
url_path="/v1/chat/completions",
json={
"messages": [{"role": "user", "content": "hello"}],
@@ -1,4 +1,4 @@
openai_model:
policy_model:
responses_api_models:
openai_model:
entrypoint: app.py
8 changes: 4 additions & 4 deletions responses_api_models/vllm_model/client.py
@@ -21,12 +21,12 @@

async def main():
task_1a = await server_client.post(
server_name="openai_model",
server_name="policy_model",
url_path="/v1/responses",
json={"input": [{"role": "user", "content": "hello"}]},
)
task_1b = await server_client.post(
server_name="openai_model",
server_name="policy_model",
url_path="/v1/responses",
json={
"input": [
@@ -54,14 +54,14 @@ async def main():
},
)
task_2a = await server_client.post(
server_name="openai_model",
server_name="policy_model",
url_path="/v1/chat/completions",
json={
"messages": [{"role": "user", "content": "hello"}],
},
)
task_2b = await server_client.post(
server_name="openai_model",
server_name="policy_model",
url_path="/v1/chat/completions",
json={
"messages": [
2 changes: 1 addition & 1 deletion responses_api_models/vllm_model/configs/vllm_model.yaml
@@ -1,4 +1,4 @@
openai_model:
policy_model:
responses_api_models:
vllm_model:
entrypoint: app.py
@@ -1,4 +1,4 @@
openai_model:
policy_model:
responses_api_models:
vllm_model:
entrypoint: app.py
18 changes: 9 additions & 9 deletions tests/unit_tests/test_train_data_utils.py
@@ -38,7 +38,7 @@ def load_multineedle_test_global_config_dict() -> DictConfig:
"resources_servers/multineedle/configs/multineedle.yaml",
"responses_api_models/openai_model/configs/openai_model.yaml",
],
# For openai_model
# For policy_model
"policy_base_url": "",
"policy_api_key": "",
"policy_model_name": "",
@@ -88,7 +88,7 @@ def test_load_and_validate_server_instance_configs_sanity(self, monkeypatch: Mon
},
"model_server": {
"type": "responses_api_models",
"name": "openai_model",
"name": "policy_model",
},
}
},
@@ -130,7 +130,7 @@ def test_load_datasets_sanity(self) -> None:
},
"model_server": {
"type": "responses_api_models",
"name": "openai_model",
"name": "policy_model",
},
}
}
@@ -175,7 +175,7 @@ def test_load_datasets_missing_example_dataset_raises_AssertionError(self) -> No
},
"model_server": {
"type": "responses_api_models",
"name": "openai_model",
"name": "policy_model",
},
}
}
@@ -230,7 +230,7 @@ def test_load_datasets_missing_train_dataset_shouldnt_download_raises_AssertionE
},
"model_server": {
"type": "responses_api_models",
"name": "openai_model",
"name": "policy_model",
},
}
}
@@ -290,7 +290,7 @@ def custom_open(filename, mode="r"):
},
"model_server": {
"type": "responses_api_models",
"name": "openai_model",
"name": "policy_model",
},
}
}
@@ -374,7 +374,7 @@ def custom_open(filename, mode="r"):
},
"model_server": {
"type": "responses_api_models",
"name": "openai_model",
"name": "policy_model",
},
}
}
@@ -498,7 +498,7 @@ def custom_open(filename, mode="r"):
},
"model_server": {
"type": "responses_api_models",
"name": "openai_model",
"name": "policy_model",
},
}
}
@@ -594,7 +594,7 @@ def custom_open(filename, mode="r"):
},
"model_server": {
"type": "responses_api_models",
"name": "openai_model",
"name": "policy_model",
},
}
}