Bxyu/misc infra 20251001 #116

bxyu-nvidia · 2025-10-03T19:32:37Z

No description provided.

Signed-off-by: Brian Yu <bxyu@nvidia.com>

copy-pr-bot · 2025-10-03T19:32:40Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

README.md

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* Instantiate one httpx async client per unique connection / base url (#75)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* handle model calling failures, e.g. max token limits
* Swap async http backend from httpx to aiohttp; various server infra improvements (#77)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* Remove unnecessary GHA CI and add uv config to enable dependency scanning (#66)
  Remove unnecessary Github Action CI and add uv config to enable dependency scanning
  * This project's current CI doesn't need to build and test through a docker image. So, deleting the unnecessary CI Dockerfile and Github Actions template
  * Adding `managed = true` under `[uv.tool]` to allow for repo dependency scanning
  ---------
  Signed-off-by: Charlie Truong <chtruong@nvidia.com>
* VLLMModel fix whitespace stripping and unwarranted spaces (#70)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* gracefully handle vllm failures
* check in metrics
* Fix aggregation rounding in ng_prepare_data (#76)
  This implements a simple rounding rule for `AvgMinMax` floats in order to keep example_metrics consistent. For background, the addition of median and std dev did not assign a ceiling for decimal places, so trivial value differences such as `1.2 != 1.200002` caused ValueErrors.
  ---------
  Signed-off-by: Frankie Siino <fsiino@nvidia.com>
* working on returning a list of transitions
* Add profiling; improve rollout collection usability and efficiency; add uvicorn logging filtering (#79)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* break if over token limit
* log total length when response call fails
* bbh
* update_agent_state
* EnvStateMessage support
* support collapsing env states during rollout
* fix import?
* bbh data prep
* set default factory
* Delete .github/ISSUE_TEMPLATE directory (#87)
  From now on, any Github repo under the Nvidia-NeMo org will use the default Issues system / behavior (unless overwritten, i.e. in this case the individual repo would have a .github/ISSUE_TEMPLATE folder in the repo itself, effectively overriding the default behavior coming from the NVIDIA-NeMo/.github repo [which establishes the default system]).
  Default behavior for Issues creation now is:
  - There is a Bug Report issue
  - There is a Feature Request issue
  - Blank issues are NOT allowed to be created
* wip
* drop secrets
* Add support for `num_repeats` (#99)

  # Add `num_repeats` hyperparameter for dataset repetition

  ## Summary
  Adds optional `num_repeats` parameter to `DatasetConfig` that allows repeating each dataset sample during training data processing and preparation.

  ## Changes
  - **Config**: Added `num_repeats: Optional[int] = Field(default=None, ge=1)` to `DatasetConfig`
  - **Processing**: Modified `_iter_dataset_lines()` to repeat each line `num_repeats` times (defaults to 1)
  - **Integration**: Updated data validation, metrics aggregation, and preparation workflows to handle repeated samples
  - **Documentation**: Updated README with usage examples

  ## Usage
  ```yaml
  datasets:
    - name: train
      type: train
      jsonl_fpath: data/train.jsonl
      num_repeats: 3  # Each sample appears 3 times during processing
  ```

  ## Testing
  Added comprehensive unit tests covering:
  - Configuration validation (accepts positive integers, rejects invalid values)
  - Data iteration with different repeat values
  - Metrics aggregation with repeated samples
  - Data preparation workflow integration

  All existing functionality preserved with backward compatibility (defaults to 1 repetition).
  ---------
  Signed-off-by: Mahan Fathi <mfathi@nvidia.com>
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
  Co-authored-by: Brian Yu <bxyu@nvidia.com>
* Comp coding fixes; lots of misc infra items (#90)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
  Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
  Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
* chore: Update cherry-pick workflow to use v0.63.0 (#108)
* Make Workbench stateful and sign commits (#110)
  Signed-off-by: Abhibha Gupta <abhibhag@nvidia.com>
* better max_seq_len check, more error handling
* Clean deprecated Comp coding (#106)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* logging etc
* await json()
* bunch of fixes for EnvStateMessage hiding and Qwen3
* log max seq len too
* set max_steps in config
* update no-tool-call message
* actually collapse
* only collapse env state messages after successful transitions
* allow postprocessing empty message
* Bxyu/misc infra 20251001 (#116)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* Resource Server Organization (#80)
  This PR adds a pre-commit hook that scans the resource_server directories and populates/updates the designated table in the readme with the pertinent information. Addresses item 1 from #81
  ---------
  Signed-off-by: Frankie Siino <fsiino@nvidia.com>
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
  Co-authored-by: Brian Yu <bxyu@nvidia.com>
* Add metrics conflict error FAQ to Readme (#93)
  Signed-off-by: Frankie Siino <fsiino@nvidia.com>
* Azure OpenAI model support (#112)
  Signed-off-by: slikhite-1 <slikhite@nvidia.com>
  Co-authored-by: slikhite-1 <slikhite@nvidia.com>
* Use python env for precommit hook; alter files trigger (#125)
  This change uses python instead of bash system language to get around the `python: command not found` error when using vs code source control gui.
  Signed-off-by: Frankie Siino <fsiino@nvidia.com>
* retry seed session if it fails
* Update issue templates (#152)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* Add back Nemo Framework templates (#153)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* Fix Workbench invalid function name (#167)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* VLLMModel enable reasoning parsing (#129)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* raise reset response for status
* Add Attributions for Third Party Softwares (#154)
  Add Attributions for Third Party Softwares.
  ---------
  Signed-off-by: banghuaz <banghuaz@nvidia.com>
* Fix infinite OpenAI endpoint query; misc improvements (#171)
  Signed-off-by: Brian Yu <bxyu@nvidia.com>
* set domain; pass cookies
* domain should be coding
* proper gsm8k train/val split
* clean up bbh stuff

---------

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Mahan Fathi <mfathi@nvidia.com>
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Signed-off-by: Abhibha Gupta <abhibhag@nvidia.com>
Signed-off-by: slikhite-1 <slikhite@nvidia.com>
Signed-off-by: banghuaz <banghuaz@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: fsiino-nvidia <fsiino@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Mahan <25934206+MahanFathi@users.noreply.github.com>
Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Co-authored-by: abhibha-nvidia <abhibhag@nvidia.com>
Co-authored-by: slikhite-1 <slikhite@nvidia.com>
Co-authored-by: banghuaz-nvidia <banghuaz@nvidia.com>
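For readers skimming the commit message above, the `num_repeats` behavior from #99 can be pictured roughly as follows. This is only a sketch under stated assumptions, not the repository's code: `DatasetConfigSketch` and `iter_repeated_lines` are hypothetical names, the real `DatasetConfig` is described as using a pydantic `Field(default=None, ge=1)` rather than a dataclass, and emitting the repeats of a sample back to back is an assumption about ordering.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class DatasetConfigSketch:
    """Hypothetical stand-in for DatasetConfig (the real one is a pydantic model)."""

    name: str
    num_repeats: Optional[int] = None

    def __post_init__(self) -> None:
        # Mirrors the `ge=1` constraint described in #99.
        if self.num_repeats is not None and self.num_repeats < 1:
            raise ValueError("num_repeats must be >= 1")


def iter_repeated_lines(
    lines: Iterable[str], config: DatasetConfigSketch
) -> Iterator[str]:
    """Yield each line `num_repeats` times; an unset value behaves like 1."""
    repeats = config.num_repeats if config.num_repeats is not None else 1
    for line in lines:
        # Assumption: repeats of one sample are emitted consecutively.
        for _ in range(repeats):
            yield line
```

With `num_repeats=3`, a two-line dataset would yield six lines; leaving the field unset keeps the previous one-pass behavior, which matches the backward-compatibility note in the commit message.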

bxyu-nvidia added 8 commits October 1, 2025 16:51

tweak

06f0487

Signed-off-by: Brian Yu <bxyu@nvidia.com>

add analysis script

8c4dabf

Signed-off-by: Brian Yu <bxyu@nvidia.com>

print cumulative pct

f6cf89d

Signed-off-by: Brian Yu <bxyu@nvidia.com>

add time taken

d9a63c8

Signed-off-by: Brian Yu <bxyu@nvidia.com>

actually print content

88ac48a

Signed-off-by: Brian Yu <bxyu@nvidia.com>

try fix base model

fa7eeff

Signed-off-by: Brian Yu <bxyu@nvidia.com>

address #109

4bde134

Signed-off-by: Brian Yu <bxyu@nvidia.com>

clarify separate terminal

b8179c0

Signed-off-by: Brian Yu <bxyu@nvidia.com>

cwing-nvidia reviewed Oct 3, 2025

README.md (resolved)

bxyu-nvidia added 3 commits October 3, 2025 12:39

add snippet

1170d6c

Signed-off-by: Brian Yu <bxyu@nvidia.com>

add source instruction

edca0ba

Signed-off-by: Brian Yu <bxyu@nvidia.com>

start

9a2cd09

Signed-off-by: Brian Yu <bxyu@nvidia.com>

bxyu-nvidia marked this pull request as ready for review October 7, 2025 05:24

slap in todo

5fb3776

Signed-off-by: Brian Yu <bxyu@nvidia.com>

bxyu-nvidia merged commit 511ac42 into main Oct 7, 2025
6 checks passed

bxyu-nvidia deleted the bxyu/misc-infra-20251001 branch October 7, 2025 05:28

bxyu-nvidia added a commit that referenced this pull request Oct 21, 2025

Bxyu/misc infra 20251001 (#116)

f1e963c

Signed-off-by: Brian Yu <bxyu@nvidia.com>
