
fix: nvlink partitioning metrics #275

Merged: tmcroberts97 merged 2 commits into NVIDIA:main from tmcroberts97:fix/nvl-metrics on Feb 17, 2026



@tmcroberts97
Contributor

Description

This PR fixes two issues with NVLink partitioning metrics:

  • Always record metrics, even if the partition monitor encounters an error.
  • Populate the operation name correctly in the metrics' applied-changes tracker.
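
The two fixes above can be illustrated with a minimal sketch. The PR page does not show the changed code, so everything here is an assumption made for illustration: `PartitionMetrics`, `track_applied_change`, `run_monitor_iteration`, and the operation names are hypothetical and are not the project's actual types or API. The sketch shows one common way to guarantee that metrics are recorded even when an iteration fails (a `try`/`finally` block), and to key the applied-changes tracker by the real operation name.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PartitionMetrics:
    """Hypothetical metrics holder; not the project's real type."""
    applied_changes: Counter = field(default_factory=Counter)
    errors: int = 0
    recorded: list = field(default_factory=list)

    def track_applied_change(self, operation: str) -> None:
        # Key the tracker by the actual operation name, never a blank placeholder.
        if not operation:
            raise ValueError("operation name must be non-empty")
        self.applied_changes[operation] += 1

    def record(self) -> None:
        # Stand-in for exporting a snapshot to a metrics backend.
        self.recorded.append((dict(self.applied_changes), self.errors))


def run_monitor_iteration(operations, apply, metrics: PartitionMetrics) -> None:
    """Apply each (name, payload) operation; record metrics no matter what."""
    try:
        for name, payload in operations:
            apply(name, payload)
            metrics.track_applied_change(name)
    except Exception:
        metrics.errors += 1
        raise
    finally:
        # Runs on success and on failure, so a failing iteration still reports.
        metrics.record()


def flaky_apply(name, payload):
    # Simulated backend call: the second (hypothetical) operation fails.
    if name == "remove_gpu":
        raise RuntimeError("simulated failure")


metrics = PartitionMetrics()
try:
    run_monitor_iteration(
        [("create_partition", {}), ("remove_gpu", {})], flaky_apply, metrics
    )
except RuntimeError:
    pass  # the error still propagates to the caller; metrics were recorded first
```

In this sketch the error is re-raised after being counted, so callers still see the failure; only the metrics recording is made unconditional.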

Type of Change

  • [ ] Add - New feature or capability
  • [ ] Change - Changes in existing functionality
  • [x] Fix - Bug fixes
  • [ ] Remove - Removed features or deprecated functionality
  • [ ] Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • [ ] This PR contains breaking changes

Testing

  • [ ] Unit tests added/updated
  • [ ] Integration tests added/updated
  • [ ] Manual testing performed
  • [ ] No testing required (docs, internal refactor, etc.)

Additional Notes

- Always record metrics, even if the partition monitor encounters an
  error.
- populate the operation name correctly in metrics' applied changes
  tracker.

Signed-off-by: Thomas McRoberts <tmcroberts@nvidia.com>
tmcroberts97 requested a review from a team as a code owner on February 12, 2026 21:22
tmcroberts97 mentioned this pull request on Feb 16, 2026 (10 tasks)
@tmcroberts97
Contributor Author

Signed-off-by: Thomas McRoberts <tmcroberts@nvidia.com>
tmcroberts97 merged commit 95af704 into NVIDIA:main on Feb 17, 2026
62 of 64 checks passed
jd-nv pushed a commit that referenced this pull request Feb 19, 2026
## Description
<!-- Describe what this PR does -->
This PR fixes two issues with nvlink partitioning metrics:
- Always record metrics, even if the partition monitor encounters an
error.
- populate the operation name correctly in metrics' applied changes
tracker.


## Type of Change
<!-- Check one that best describes this PR -->
- [ ] **Add** - New feature or capability
- [ ] **Change** - Changes in existing functionality  
- [x] **Fix** - Bug fixes
- [ ] **Remove** - Removed features or deprecated functionality
- [ ] **Internal** - Internal changes (refactoring, tests, docs, etc.)

## Related Issues (Optional)
<!-- If applicable, provide GitHub Issue. -->

## Breaking Changes
- [ ] This PR contains breaking changes

<!-- If checked above, describe the breaking changes and migration steps
-->

## Testing
<!-- How was this tested? Check all that apply -->
- [ ] Unit tests added/updated
- [ ] Integration tests added/updated  
- [ ] Manual testing performed
- [ ] No testing required (docs, internal refactor, etc.)

## Additional Notes
<!-- Any additional context, deployment notes, or reviewer guidance -->

---------

Signed-off-by: Thomas McRoberts <tmcroberts@nvidia.com>
tmcroberts97 added a commit to tmcroberts97/infra-controller-core that referenced this pull request Mar 12, 2026
tmcroberts97 added a commit to tmcroberts97/infra-controller-core that referenced this pull request Mar 12, 2026
jd-nv pushed a commit that referenced this pull request Mar 12, 2026
nvcoop pushed a commit to nvcoop/bare-metal-manager-core that referenced this pull request Mar 12, 2026
kfelternv pushed a commit to kfelternv/infra-controller that referenced this pull request May 18, 2026
…es one (NVIDIA#275)

Signed-off-by: Patrice Breton <pbreton@nvidia.com>

4 participants