Resource Server Organization #80

Merged

bxyu-nvidia merged 28 commits into main from fsiino/resource-server-organization

Oct 8, 2025

Contributor

fsiino-nvidia commented Sep 22, 2025


This PR adds a pre-commit hook that scans the resource_server directories and populates/updates the designated table in the readme with the pertinent information.

Addresses item 1 from #81
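
For readers unfamiliar with this kind of hook, here is a minimal sketch of how a generated README table can be refreshed idempotently. It is not the script from this PR: the marker comments, the column names, and the `render_table` / `update_readme` helpers are all assumptions made for illustration, and the rows are passed in directly rather than parsed from the resource server YAML configs.

```python
import re

# Hypothetical markers delimiting the generated table in the README.
START = "<!-- START_RESOURCE_TABLE -->"
END = "<!-- END_RESOURCE_TABLE -->"


def render_table(rows):
    """Render rows (dicts with 'domain', 'path', 'license') as a Markdown table."""
    header = ["| Domain | Path | License |", "| --- | --- | --- |"]
    # Sort on every column so repeated runs always yield the same order.
    ordered = sorted(rows, key=lambda r: (r["domain"], r["path"], r["license"]))
    body = [f"| {r['domain']} | {r['path']} | {r['license']} |" for r in ordered]
    return "\n".join(header + body)


def update_readme(text, rows):
    """Replace whatever sits between the markers with a freshly rendered table."""
    pattern = re.compile(re.escape(START) + r".*?" + re.escape(END), re.DOTALL)
    if not pattern.search(text):
        raise ValueError("README is missing the table markers")
    block = f"{START}\n{render_table(rows)}\n{END}"
    # A function replacement avoids backslash interpretation in the table text.
    return pattern.sub(lambda _m: block, text, count=1)
```

Because the table is rebuilt from sorted rows and only the marked region is replaced, running the update twice on the same inputs leaves the README unchanged, which is the property a pre-commit hook needs to avoid repeatedly dirtying the working tree.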

copy-pr-bot bot commented Sep 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

fsiino-nvidia added 2 commits

September 22, 2025 14:29


          Init

cb63a92

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Rename script, pretty print table, add error handling

452c8e1

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

fsiino-nvidia force-pushed the fsiino/resource-server-organization branch from 47a10a1 to 452c8e1

September 22, 2025 21:33

copy-pr-bot bot commented Sep 22, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

fsiino-nvidia added 15 commits

September 22, 2025 15:08


          Copyright

82f6930

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Enforce license and domain in resource server yaml

07ca159

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Extract training license, use enum for domains

23d59e3

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Update script

443cb2c

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Simplify domain check

c1c2e72

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Update readme

011b22b

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Recall hf_utils

1a9b2b5

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Test precommit order

28d998d

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Revert precommits, remove extra readme section

b847d19

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Test idempotent write

7ffa731

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Make sorting consistent

feb0e22

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          More sorting conflicts

c3f4104

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Merge remote-tracking branch 'github/main' into fsiino/resource-server-organization

a902713

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Simplify

a5acbcb

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Add docstring

4e292f8

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

fsiino-nvidia marked this pull request as ready for review

September 24, 2025 16:43

fsiino-nvidia added 2 commits

September 24, 2025 09:44


          Empty commit for tests

d5f2448

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Strip domain from non-resource server

eb19f4f

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

fsiino-nvidia requested review from banghuaz-nvidia and bxyu-nvidia

September 24, 2025 18:01

fsiino-nvidia linked an issue that may be closed by this pull request: HF Dataset Format Checking & Migration, Auto summary table generation #81 (closed)

banghuaz-nvidia reviewed

README.md (outdated)

banghuaz-nvidia reviewed

resources_servers/google_search/configs/google_search.yaml (outdated)


          Change google_search to agent

08ef6d9

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

banghuaz-nvidia previously approved these changes

Contributor

banghuaz-nvidia left a comment

LGTM. Thanks Frankie for covering this!


          Merge remote-tracking branch 'github/main' into fsiino/resource-server-organization

86dde55

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

# Conflicts:
#	resources_servers/comp_coding/configs/comp_coding.yaml

fsiino-nvidia dismissed banghuaz-nvidia’s stale review via

86dde55

September 28, 2025 20:10

fsiino-nvidia added 6 commits

September 28, 2025 13:10


          Remove name

6296eb3

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Make path a link, include dataset usages

926f517

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Use raw html links

e41b627

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Detect changes to limit script run

a4433cd

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Run readme update on staged config file changes

20909af

Signed-off-by: Frankie Siino <fsiino@nvidia.com>


          Remove git add

1739dd4

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

bxyu-nvidia approved these changes


          Merge branch 'main' into fsiino/resource-server-organization

dff9fb9

Signed-off-by: Brian Yu <bxyu@nvidia.com>

bxyu-nvidia merged commit 5bbbed6 into main

5 checks passed

bxyu-nvidia deleted the fsiino/resource-server-organization branch

October 8, 2025 17:40

bxyu-nvidia added a commit that referenced this pull request


          Resource Server Organization (#80)

5912a46

This PR adds a pre-commit hook that scans the resource_server
directories and populates/updates the designated table in the readme
with the pertinent information.

Addresses item 1 from #81

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>

sidnarayanan added a commit that referenced this pull request


          Training (#3)

c7c1e67

* Instantiate one httpx async client per unique connection / base url (#75)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* handle model calling failures, e.g. max token limits

* Swap async http backend from httpx to aiohttp; various server infra improvements (#77)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* Remove unnecessary GHA CI and add uv config to enable dependency scanning (#66)

Remove unnecessary GitHub Actions CI and add uv config to enable
dependency scanning

* This project's current CI doesn't need to build and test through a
Docker image, so this deletes the unnecessary CI Dockerfile and GitHub
Actions template
* Adds `managed = true` under `[tool.uv]` to allow for repo dependency
scanning

---------

Signed-off-by: Charlie Truong <chtruong@nvidia.com>

* VLLMModel fix whitespace stripping and unwarranted spaces (#70)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* gracefully handle vllm failures

* check in metrics

* Fix aggregation rounding in ng_prepare_data (#76)

This implements a simple rounding rule for `AvgMinMax` floats in order
to keep example_metrics consistent.
For background, the addition of median and std dev did not assign a
ceiling for decimal places, so trivial value differences such as `1.2 !=
1.200002` caused ValueErrors.

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
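
The rounding rule described in #76 can be sketched as follows. This is an illustration under stated assumptions, not the code from that change: the `AvgMinMax` fields shown here, the `DECIMALS` precision, and the `check_consistent` helper are hypothetical stand-ins.

```python
from dataclasses import dataclass

# Hypothetical precision; the source does not state how many places are kept.
DECIMALS = 4


@dataclass
class AvgMinMax:
    """Stand-in for the aggregate of the same name; fields are assumed."""

    average: float
    minimum: float
    maximum: float

    def rounded(self):
        """Return a copy with every float rounded to DECIMALS places."""
        return AvgMinMax(
            round(self.average, DECIMALS),
            round(self.minimum, DECIMALS),
            round(self.maximum, DECIMALS),
        )


def check_consistent(stored, computed):
    """Raise ValueError only when values still differ after rounding."""
    if stored.rounded() != computed.rounded():
        raise ValueError(f"metrics conflict: {stored} != {computed}")
```

With this rule, a stored average of `1.2` and a recomputed average of `1.200002` compare equal after rounding, while a genuine difference such as `1.2` versus `1.3` still raises.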

* working on returning a list of transitions

* Add profiling; improve rollout collection usability and efficiency; add uvicorn logging filtering (#79)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* break if over token limit

* log total length when response call fails

* bbh

* update_agent_state

* EnvStateMessage support

* support collapsing env states during rollout

* fix import?

* bbh data prep

* set default factory

* Delete .github/ISSUE_TEMPLATE directory (#87)

From now on, any GitHub repo under the NVIDIA-NeMo org will use the
default Issues system and behavior, unless the repo overrides it with
its own .github/ISSUE_TEMPLATE folder, which takes precedence over the
defaults established by the NVIDIA-NeMo/.github repo.

Default behavior for Issues creation is now:
- There is a Bug Report issue
- There is a Feature Request issue
- Blank issues are NOT allowed to be created

* wip

* drop secrets

* Add support for `num_repeats` (#99)

# Add `num_repeats` hyperparameter for dataset repetition

## Summary
Adds optional `num_repeats` parameter to `DatasetConfig` that allows
repeating each dataset sample during training data processing and
preparation.

## Changes
- **Config**: Added `num_repeats: Optional[int] = Field(default=None,
ge=1)` to `DatasetConfig`
- **Processing**: Modified `_iter_dataset_lines()` to repeat each line
`num_repeats` times (defaults to 1)
- **Integration**: Updated data validation, metrics aggregation, and
preparation workflows to handle repeated samples
- **Documentation**: Updated README with usage examples

## Usage
```yaml
datasets:
  - name: train
    type: train
    jsonl_fpath: data/train.jsonl
    num_repeats: 3  # Each sample appears 3 times during processing
```

## Testing
Added comprehensive unit tests covering:
- Configuration validation (accepts positive integers, rejects invalid
values)
- Data iteration with different repeat values
- Metrics aggregation with repeated samples
- Data preparation workflow integration

All existing functionality preserved with backward compatibility
(defaults to 1 repetition).

---------

Signed-off-by: Mahan Fathi <mfathi@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>
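
The repetition behavior described in #99 can be sketched roughly like this. It is a simplified illustration, not the project's code: the dataclass below stands in for the real `DatasetConfig` (which the description says uses `Field(default=None, ge=1)`), the `iter_dataset_lines` helper is hypothetical, and emitting the copies of each sample back to back is an assumption about ordering.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional


@dataclass
class DatasetConfig:
    """Simplified stand-in for the real config; validation is done by hand."""

    name: str
    num_repeats: Optional[int] = None

    def __post_init__(self):
        # Mirrors the ge=1 constraint from the description.
        if self.num_repeats is not None and self.num_repeats < 1:
            raise ValueError("num_repeats must be >= 1")


def iter_dataset_lines(config: DatasetConfig, lines: List[str]) -> Iterator[str]:
    """Yield each line num_repeats times, treating None as a single pass."""
    repeats = config.num_repeats or 1
    for line in lines:
        for _ in range(repeats):
            yield line
```

Leaving `num_repeats` unset keeps the previous behavior, since `None` falls back to one pass over each sample.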

* Comp coding fixes; lots of misc infra items (#90)

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>

* chore: Update cherry-pick workflow to use v0.63.0 (#108)

* Make Workbench stateful and sign commits (#110)

Signed-off-by: Abhibha Gupta <abhibhag@nvidia.com>

* better max_seq_len check, more error handling

* Clean deprecated Comp coding (#106)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* logging etc

* await json()

* bunch of fixes for EnvStateMessage hiding and Qwen3

* log max seq len too

* set max_steps in config

* update no-tool-call message

* actually collapse

* only collapse env state messages after successful transitions

* allow postprocessing empty message

* Bxyu/misc infra 20251001 (#116)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* Resource Server Organization (#80)

This PR adds a pre-commit hook that scans the resource_server
directories and populates/updates the designated table in the readme
with the pertinent information.

Addresses item 1 from #81

---------

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Brian Yu <bxyu@nvidia.com>
Co-authored-by: Brian Yu <bxyu@nvidia.com>

* Add metrics conflict error FAQ to Readme (#93)

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

* Azure OpenAI model support (#112)

Signed-off-by: slikhite-1 <slikhite@nvidia.com>
Co-authored-by: slikhite-1 <slikhite@nvidia.com>

* Use python env for precommit hook; alter files trigger (#125)

This change runs the hook with the Python language instead of the
bash/system language to get around the `python: command not found`
error when committing from the VS Code Source Control GUI.

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

* retry seed session if it fails

* Update issue templates (#152)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* Add back Nemo Framework templates (#153)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* Fix Workbench invalid function name (#167)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* VLLMModel enable reasoning parsing (#129)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* raise reset response for status

* Add Attributions for Third Party Softwares (#154)

Add attributions for third-party software.

---------

Signed-off-by: banghuaz <banghuaz@nvidia.com>

* Fix infinite OpenAI endpoint query; misc improvements (#171)

Signed-off-by: Brian Yu <bxyu@nvidia.com>

* set domain; pass cookies

* domain should be coding

* proper gsm8k train/val split

* clean up bbh stuff

---------

Signed-off-by: Brian Yu <bxyu@nvidia.com>
Signed-off-by: Charlie Truong <chtruong@nvidia.com>
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
Signed-off-by: Mahan Fathi <mfathi@nvidia.com>
Signed-off-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Signed-off-by: Abhibha Gupta <abhibhag@nvidia.com>
Signed-off-by: slikhite-1 <slikhite@nvidia.com>
Signed-off-by: banghuaz <banghuaz@nvidia.com>
Co-authored-by: bxyu-nvidia <bxyu@nvidia.com>
Co-authored-by: Charlie Truong <chtruong@nvidia.com>
Co-authored-by: fsiino-nvidia <fsiino@nvidia.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Mahan <25934206+MahanFathi@users.noreply.github.com>
Co-authored-by: Khushi Bhardwaj <kbhardwaj@nvidia.com>
Co-authored-by: abhibha-nvidia <abhibhag@nvidia.com>
Co-authored-by: slikhite-1 <slikhite@nvidia.com>
Co-authored-by: banghuaz-nvidia <banghuaz@nvidia.com>
