fix: Issue all queued nmx-m operations even if the monitor encounters an error #249

Closed
tmcroberts97 wants to merge 1 commit into NVIDIA:main from tmcroberts97:fix/nvl-monitor-early-exit

Conversation

@tmcroberts97

Contributor

Description

Do not exit early from execute_nmx_m_operations() when an operation cannot be issued to NMX-M. Instead, add only the successfully enqueued operations to the pending list and ignore any that errored out. Previously, returning early with an error skipped the DB updates even for operations that NMX-M had already enqueued and completed.
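
The control-flow change described above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code in this PR: `FakeNmxMClient`, `NmxMError`, `issue()`, and the `(pending, errors)` return shape are all hypothetical names chosen for the example. Only the function name execute_nmx_m_operations() comes from the description. The sketch also collects the failures so a caller could log them, which is an assumption; the description only says errored operations are ignored.

```python
from dataclasses import dataclass, field


class NmxMError(Exception):
    """Hypothetical error raised when an operation cannot be issued."""


@dataclass
class FakeNmxMClient:
    """Stand-in for an NMX-M client (assumption, not a real API).

    `failing` holds the operation names that should fail to issue.
    """
    failing: set = field(default_factory=set)

    def issue(self, op: str) -> str:
        if op in self.failing:
            raise NmxMError(f"could not issue {op}")
        return f"handle-{op}"


def execute_nmx_m_operations(client, queued):
    """Issue every queued operation instead of returning on the first error.

    Returns (pending, errors): `pending` lists the operations that were
    enqueued successfully, so their results can still be written to the
    DB later; `errors` lists the operations that could not be issued.
    """
    pending = []
    errors = []
    for op in queued:
        try:
            handle = client.issue(op)
        except NmxMError as exc:
            # Record the failure and keep going; an early return here
            # would drop the operations that were already enqueued.
            errors.append((op, exc))
            continue
        pending.append((op, handle))
    return pending, errors


client = FakeNmxMClient(failing={"op-b"})
pending, errors = execute_nmx_m_operations(client, ["op-a", "op-b", "op-c"])
print(pending)
print([op for op, _ in errors])
```

In this sketch the failure of `op-b` does not prevent `op-a` and `op-c` from reaching the pending list, which is the behavior the description asks for.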

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes

@tmcroberts97 tmcroberts97 requested a review from a team as a code owner February 11, 2026 19:42
Do not exit early from execute_nmx_m_operations() if we cannot issue an
operation to NMX-M. Instead add only successfully enqueued operations to
pending list and ignore any that errored out. Exiting
execute_nmx_m_operations() early with an error was resulting in skipped
db updates even for those operations that were successfully enqueued and
completed by NMX-M.

Signed-off-by: Thomas McRoberts <tmcroberts@nvidia.com>
@tmcroberts97 tmcroberts97 force-pushed the fix/nvl-monitor-early-exit branch from edc3e6a to 34a943c Compare February 11, 2026 19:43
@tmcroberts97

Contributor Author

Closing because of stale fork, opening #273 instead

kfelternv added a commit to kfelternv/infra-controller that referenced this pull request May 18, 2026