Conversation


fsiino-nvidia commented Oct 28, 2025

This PR emits a warning for every server that fails validation and prints the malformed or incorrect parts of its config. It also adds an optional env variable, error_on_almost_servers, that causes a ValueError to be raised when set.

Currently, during the gym spinup process, any server that fails validation is silently dropped, and only a generic AssertionError is raised.
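The behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the function names `find_problems` and `resolve_servers`, the `REQUIRED_KEYS` schema, and passing `error_on_almost_servers` as a plain boolean argument are all assumptions made for the example. Only the flag name, the warning per failing server, and the optional ValueError come from the description.

```python
import warnings

# Hypothetical schema for illustration only; not NeMo Gym's real server schema.
REQUIRED_KEYS = ("host", "port")


def find_problems(server_config: dict) -> list[str]:
    """Return human-readable descriptions of what is wrong with one config."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in server_config:
            problems.append(f"missing required key {key!r}")
    port = server_config.get("port")
    if port is not None and not isinstance(port, int):
        problems.append(f"'port' must be an int, got {type(port).__name__}")
    return problems


def resolve_servers(configs: dict[str, dict],
                    error_on_almost_servers: bool = False) -> dict[str, dict]:
    """Split configs into valid servers and 'almost' servers.

    Every almost-server triggers a warning that names the bad parts of its
    config. If error_on_almost_servers is set, a ValueError is raised after
    all warnings have been emitted, so nothing is dropped silently.
    """
    valid, almost = {}, {}
    for name, cfg in configs.items():
        problems = find_problems(cfg)
        if problems:
            almost[name] = problems
        else:
            valid[name] = cfg

    for name, problems in almost.items():
        warnings.warn(f"Server {name!r} failed validation: " + "; ".join(problems))

    if almost and error_on_almost_servers:
        raise ValueError(
            f"{len(almost)} server(s) failed validation: {sorted(almost)}"
        )
    return valid
```

With the flag left at its default, `resolve_servers` returns only the valid servers and warns about the rest; with the flag set, the same input raises a ValueError listing the failing server names.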

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

copy-pr-bot bot commented Oct 28, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

…reporting

Signed-off-by: Frankie Siino <fsiino@nvidia.com>
fsiino-nvidia linked an issue Oct 28, 2025 that may be closed by this pull request
2 tasks
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
…reporting

Signed-off-by: Frankie Siino <fsiino@nvidia.com>

# Conflicts:
#	nemo_gym/global_config.py
fsiino-nvidia marked this pull request as ready for review October 30, 2025 17:52
Signed-off-by: Frankie Siino <fsiino@nvidia.com>
bxyu-nvidia merged commit 2f66c55 into main Nov 4, 2025
6 checks passed
bxyu-nvidia deleted the fsiino/almost-server-reporting branch November 4, 2025 20:58
Development

Successfully merging this pull request may close these issues.

Global config dict resolution reports (and errors?) on almost servers

4 participants