@bxyu-nvidia

No description provided.

Signed-off-by: Brian Yu <bxyu@nvidia.com>
@copy-pr-bot

copy-pr-bot bot commented Oct 3, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

bxyu-nvidia marked this pull request as ready for review October 7, 2025 05:24
bxyu-nvidia merged commit 511ac42 into main Oct 7, 2025
6 checks passed
bxyu-nvidia deleted the bxyu/misc-infra-20251001 branch October 7, 2025 05:28
bxyu-nvidia added a commit that referenced this pull request Oct 21, 2025
Signed-off-by: Brian Yu <bxyu@nvidia.com>
sidnarayanan added a commit that referenced this pull request Oct 29, 2025
* Instantiate one httpx async client per unique connection / base url (#75)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
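
Commit #75 describes reusing one async HTTP client per unique base URL rather than creating a client per request. A minimal sketch of that caching pattern follows. The `ClientPool` class, its URL normalization rules, and the caller-supplied `factory` (for example a lambda wrapping `httpx.AsyncClient`) are all hypothetical illustrations, not the repository's implementation.

```python
from typing import Callable, Dict, Generic, TypeVar
from urllib.parse import urlsplit

T = TypeVar("T")


class ClientPool(Generic[T]):
    """Hypothetical cache that hands out one client per unique base URL."""

    def __init__(self, factory: Callable[[str], T]) -> None:
        # `factory` builds a new client for a normalized base URL, e.g.
        # lambda url: httpx.AsyncClient(base_url=url)  (assumed usage).
        self._factory = factory
        self._clients: Dict[str, T] = {}

    @staticmethod
    def _normalize(base_url: str) -> str:
        # urlsplit() lower-cases the scheme and .hostname is lower-cased too;
        # dropping a trailing slash makes "HTTP://Host:8000/" and
        # "http://host:8000" share one client. User info is ignored here.
        parts = urlsplit(base_url)
        host = parts.hostname or ""
        port = f":{parts.port}" if parts.port is not None else ""
        return f"{parts.scheme}://{host}{port}{parts.path.rstrip('/')}"

    def get(self, base_url: str) -> T:
        # Create the client lazily on first use, then reuse it.
        key = self._normalize(base_url)
        if key not in self._clients:
            self._clients[key] = self._factory(key)
        return self._clients[key]

    def __len__(self) -> int:
        return len(self._clients)


# Usage with a stand-in factory (a dict) so the sketch runs without httpx.
pool = ClientPool(factory=lambda url: {"base_url": url})
a = pool.get("HTTP://Host:8000/")
b = pool.get("http://host:8000")
c = pool.get("http://host:9000")
print(a is b, a is c, len(pool))  # True False 2
```

Because `get` has no `await` between the lookup and the insert, it can be called from many coroutines on one event loop without creating duplicate clients. On shutdown each cached client still needs closing (for httpx, `await client.aclose()`).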

* handle model calling failures, e.g. max token limits

* Swap async http backend from httpx to aiohttp; various server infra improvements (#77)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* Remove unnecessary GHA CI and add uv config to enable dependency scanning (#66)

Remove unnecessary GitHub Actions CI and add uv config to enable
dependency scanning

* The project's current CI doesn't need to build and test inside a
Docker image, so the unnecessary CI Dockerfile and GitHub Actions
template are deleted.
* Add `managed = true` under `[tool.uv]` to allow repo dependency
scanning.

---------

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

* VLLMModel fix whitespace stripping and unwarranted spaces (#70)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* gracefully handle vllm failures

* check in metrics

* Fix aggregation rounding in ng_prepare_data (#76)

This implements a simple rounding rule for `AvgMinMax` floats to keep
example_metrics consistent.
For background, the newly added median and standard deviation values had
no cap on decimal places, so trivial differences such as `1.2 != 1.200002`
caused ValueErrors.

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
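
The rounding rule in #76 can be illustrated with a small sketch: round every float in an aggregated metrics dict to a fixed number of decimal places before comparing it with the checked-in example metrics. The helper name `round_metrics`, the nested-dict layout, and the choice of 3 decimal places are assumptions for illustration; the PR does not state them.

```python
from typing import Any, Dict

DECIMALS = 3  # Assumed precision; the PR does not state the exact value.


def round_metrics(metrics: Dict[str, Any], decimals: int = DECIMALS) -> Dict[str, Any]:
    """Hypothetical helper: recursively round floats in a metrics dict."""
    rounded: Dict[str, Any] = {}
    for key, value in metrics.items():
        if isinstance(value, float):
            rounded[key] = round(value, decimals)
        elif isinstance(value, dict):
            rounded[key] = round_metrics(value, decimals)
        else:
            rounded[key] = value  # ints, strings, etc. pass through unchanged
    return rounded


checked_in = {"reward": {"mean": 1.2, "median": 1.0, "std": 0.4}}
recomputed = {"reward": {"mean": 1.200002, "median": 1.0, "std": 0.399999}}

# Without rounding the two dicts differ (1.2 != 1.200002) ...
assert checked_in != recomputed
# ... after rounding they compare equal, so no spurious conflict is raised.
assert round_metrics(checked_in) == round_metrics(recomputed)
```

Rounding only hides differences that land on the same side of a rounding boundary; comparing with a tolerance (for example `math.isclose`) is an alternative design if values can straddle one.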

* working on returning a list of transitions

* Add profiling; improve rollout collection usability and efficiency; add uvicorn logging filtering (#79)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* break if over token limit

* log total length when response call fails

* bbh

* update_agent_state

* EnvStateMessage support

* support collapsing env states during rollout

* fix import?

* bbh data prep

* set default factory

* Delete .github/ISSUE_TEMPLATE directory (#87)

From now on, every GitHub repo under the NVIDIA-NeMo org uses the
default Issues system / behavior established by the NVIDIA-NeMo/.github
repo, unless an individual repo overrides it with its own
.github/ISSUE_TEMPLATE folder.

The default behavior for issue creation is now:
- There is a Bug Report issue
- There is a Feature Request issue
- Blank issues are NOT allowed to be created

* wip

* drop secrets

* Add support for `num_repeats` (#99)

# Add `num_repeats` hyperparameter for dataset repetition

## Summary
Adds optional `num_repeats` parameter to `DatasetConfig` that allows
repeating each dataset sample during training data processing and
preparation.

## Changes
- **Config**: Added `num_repeats: Optional[int] = Field(default=None, ge=1)`
to `DatasetConfig`
- **Processing**: Modified `_iter_dataset_lines()` to repeat each line
`num_repeats` times (defaults to 1)
- **Integration**: Updated data validation, metrics aggregation, and
preparation workflows to handle repeated samples
- **Documentation**: Updated README with usage examples

## Usage
```yaml
datasets:
  - name: train
    type: train
    jsonl_fpath: data/train.jsonl
    num_repeats: 3  # Each sample appears 3 times during processing
```
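
The repetition described above can be sketched as a function that yields each JSONL line `num_repeats` times in a row, treating `None` as 1. This is a standalone approximation, not the repository's `_iter_dataset_lines()`; the names `iter_dataset_lines` and `_repeat_lines`, and the plain `ValueError` standing in for the Pydantic `Field(ge=1)` constraint, are assumptions.

```python
import io
from typing import Iterable, Iterator, Optional


def _repeat_lines(lines: Iterable[str], repeats: int) -> Iterator[str]:
    # Yield each line `repeats` times in a row before moving to the next one.
    for line in lines:
        stripped = line.rstrip("\n")
        for _ in range(repeats):
            yield stripped


def iter_dataset_lines(
    lines: Iterable[str], num_repeats: Optional[int] = None
) -> Iterator[str]:
    """Hypothetical stand-in for _iter_dataset_lines(); None means 1 repeat."""
    repeats = 1 if num_repeats is None else num_repeats
    # Validate eagerly (mirroring `ge=1`) instead of on the first next() call.
    if repeats < 1:
        raise ValueError(f"num_repeats must be >= 1, got {num_repeats}")
    return _repeat_lines(lines, repeats)


jsonl = io.StringIO('{"id": 1}\n{"id": 2}\n')
print(list(iter_dataset_lines(jsonl, num_repeats=3)))
# ['{"id": 1}', '{"id": 1}', '{"id": 1}', '{"id": 2}', '{"id": 2}', '{"id": 2}']
```

Splitting validation from the generator matters because a generator body does not run until it is first advanced, so an invalid value would otherwise surface far from the call site.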

## Testing
Added comprehensive unit tests covering:
- Configuration validation (accepts positive integers, rejects invalid
values)
- Data iteration with different repeat values
- Metrics aggregation with repeated samples
- Data preparation workflow integration

All existing functionality preserved with backward compatibility
(defaults to 1 repetition).

---------

Signed-off-by: Mahan Fathi <mfathi@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>

* Comp coding fixes; lots of misc infra items (#90)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>

* chore: Update cherry-pick workflow to use v0.63.0 (#108)

* Make Workbench stateful and sign commits (#110)

Signed-off-by: Abhibha Gupta <abhibhag@nvidia.com>

* better max_seq_len check, more error handling

* Clean deprecated Comp coding (#106)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* logging etc

* await json()

* bunch of fixes for EnvStateMessage hiding and Qwen3

* log max seq len too

* set max_steps in config

* update no-tool-call message

* actually collapse

* only collapse env state messages after successful transitions

* allow postprocessing empty message

* Bxyu/misc infra 20251001 (#116)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* Resource Server Organization (#80)

This PR adds a pre-commit hook that scans the resource_server
directories and populates/updates the designated table in the README
with each server's relevant information.

Addresses item 1 from #81

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
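
The hook in #80 can be approximated as: list the subdirectories of a resource-server directory, render them as a Markdown table, and splice that table between two marker comments in the README. The marker strings, the single "Resource server" column, and the function names are hypothetical; the actual hook collects more per-server information than a directory name.

```python
import tempfile
from pathlib import Path

# Hypothetical marker comments delimiting the auto-generated table.
START = "<!-- RESOURCE_SERVERS_TABLE_START -->"
END = "<!-- RESOURCE_SERVERS_TABLE_END -->"


def render_table(servers_dir: Path) -> str:
    """Build a Markdown table with one row per subdirectory, sorted by name."""
    rows = ["| Resource server |", "| --- |"]
    for child in sorted(p for p in servers_dir.iterdir() if p.is_dir()):
        rows.append(f"| {child.name} |")
    return "\n".join(rows)


def update_readme(readme_text: str, table: str) -> str:
    """Replace everything between START and END with `table`."""
    start = readme_text.index(START) + len(START)  # ValueError if missing
    end = readme_text.index(END, start)
    return readme_text[:start] + "\n" + table + "\n" + readme_text[end:]


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("math", "coding"):
        (root / name).mkdir()
    table = render_table(root)
    updated = update_readme(f"# Servers\n{START}\nstale\n{END}\n", table)

print(updated)
```

Running `update_readme` again on its own output returns the same text, which is what lets a hook like this run on every commit. A real hook would write the file back; pre-commit treats a hook that modifies files as failed, so the commit is retried with the refreshed README staged.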

* Add metrics conflict error FAQ to Readme (#93)

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

* Azure OpenAI model support (#112)

Signed-off-by: slikhite-1 <slikhite@nvidia.com>
Co-authored-by: slikhite-1 <slikhite@nvidia.com>

* Use python env for precommit hook; alter files trigger (#125)

This change runs the hook in a Python environment instead of the bash
`system` language, to work around the `python: command not found` error
when committing from the VS Code source control GUI.

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

* retry seed session if it fails

* Update issue templates (#152)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* Add back NeMo Framework templates (#153)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* Fix Workbench invalid function name (#167)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* VLLMModel enable reasoning parsing (#129)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* raise reset response for status

* Add Attributions for Third-Party Software (#154)

Add attributions for third-party software.

---------

Signed-off-by: banghuaz <banghuaz@nvidia.com>

* Fix infinite OpenAI endpoint query; misc improvements (#171)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* set domain; pass cookies

* domain should be coding

* proper gsm8k train/val split

* clean up bbh stuff

---------

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Mahan Fathi <mfathi@nvidia.com>
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Signed-off-by: Abhibha Gupta <abhibhag@nvidia.com>
Signed-off-by: slikhite-1 <slikhite@nvidia.com>
Signed-off-by: banghuaz <banghuaz@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: fsiino-nvidia <fsiino@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Mahan <25934206+MahanFathi@users.noreply.github.com>
Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Co-authored-by: abhibha-nvidia <abhibhag@nvidia.com>
Co-authored-by: slikhite-1 <slikhite@nvidia.com>
Co-authored-by: banghuaz-nvidia <banghuaz@nvidia.com>