
fix: nvlink partitioning metrics #247

Closed
tmcroberts97 wants to merge 1 commit into NVIDIA:main from tmcroberts97:fix/nvl-metrics

@tmcroberts97 (Contributor)

Description

This PR fixes two issues with the NVLink partitioning metrics:

  • Always record metrics, even if the partition monitor encounters an error.
  • Populate the operation name correctly in the metrics' applied-changes tracker.
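The PR text does not include the diff, so the following is only a minimal sketch of the "always record metrics" pattern under stated assumptions. `PartitionMetrics`, `PartitionMonitor`, `apply`, `run_single_iteration`, the `exported` vector (a stand-in for whatever exporter the real monitor uses) and the operation names `create_partition`/`update_partition` are all hypothetical; only `run_single_iteration_inner` is a name taken from the review thread. The inner function may return early through `?`, while the outer wrapper exports the metrics on every path and keys the applied-changes tracker by the name of the operation that actually ran.

```rust
use std::collections::HashMap;

/// Hypothetical per-iteration metrics (the real types are not shown in this PR).
#[derive(Debug, Default)]
struct PartitionMetrics {
    /// Applied-changes tracker: operation name -> number of changes applied.
    applied_changes: HashMap<String, u32>,
    /// Error from a failed iteration, so failures are visible in exported metrics.
    error: Option<String>,
}

impl PartitionMetrics {
    /// Count a change under the name of the operation that actually performed it.
    fn record_applied_change(&mut self, operation: &str) {
        *self.applied_changes.entry(operation.to_string()).or_insert(0) += 1;
    }
}

/// Hypothetical monitor; `fail_on` simulates one partition operation failing.
struct PartitionMonitor {
    fail_on: Option<&'static str>,
    /// Stand-in for a metrics exporter: every recorded snapshot is pushed here.
    exported: Vec<PartitionMetrics>,
}

impl PartitionMonitor {
    fn apply(&self, operation: &str, metrics: &mut PartitionMetrics) -> Result<(), String> {
        if self.fail_on == Some(operation) {
            return Err(format!("{operation} failed"));
        }
        metrics.record_applied_change(operation);
        Ok(())
    }

    /// Does the work; `?` returns early on the first failing operation.
    fn run_single_iteration_inner(&self, metrics: &mut PartitionMetrics) -> Result<(), String> {
        self.apply("create_partition", metrics)?;
        self.apply("update_partition", metrics)?;
        Ok(())
    }

    /// Exports the metrics on every path, then hands the inner result back to the caller.
    fn run_single_iteration(&mut self) -> Result<(), String> {
        let mut metrics = PartitionMetrics::default();
        let result = self.run_single_iteration_inner(&mut metrics);
        if let Err(e) = &result {
            metrics.error = Some(e.clone());
        }
        self.exported.push(metrics);
        result
    }
}

fn main() {
    let mut monitor = PartitionMonitor { fail_on: Some("update_partition"), exported: Vec::new() };
    let result = monitor.run_single_iteration();
    println!("result: {result:?}");
    println!("exported: {:?}", monitor.exported);
}
```

Because the export happens in the wrapper rather than at the end of the inner function, an early return through `?` can no longer skip it, and the partial work done before the failure still shows up in the exported snapshot.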

Type of Change

  • Add - New feature or capability
  • Change - Changes in existing functionality
  • Fix - Bug fixes
  • Remove - Removed features or deprecated functionality
  • Internal - Internal changes (refactoring, tests, docs, etc.)

Related Issues (Optional)

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required (docs, internal refactor, etc.)

Additional Notes


Signed-off-by: Thomas McRoberts <tmcroberts@nvidia.com>
tmcroberts97 requested a review from a team on February 11, 2026, 19:23
```rust
// …
    otel.status_message = tracing::field::Empty,
    metrics = tracing::field::Empty,
);
let _enter = check_nvl_partition_span.enter();
```

Review comment (Contributor):

I think that doesn't work with async/await. You need `self.run_single_iteration_inner(&mut metrics).instrument(check_nvl_partition_span).await` instead.
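For context on this comment: in the `tracing` crate, the guard returned by `Span::enter` keeps the span current on the thread until it is dropped, so holding it across an `.await` lets the span leak into whatever else the executor polls while the task is suspended. `Instrument::instrument` instead enters the span only while the wrapped future is being polled. The toy model below reproduces that difference with the standard library only; `enter`, `Entered`, `Instrumented`, `current_span`, `YieldOnce` and the round-robin executor are simplified stand-ins written for illustration, not the `tracing` or `tokio` APIs.

```rust
use std::cell::RefCell;
use std::future::Future;
use std::pin::Pin;
use std::rc::Rc;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

thread_local! {
    // Toy stand-in for the per-thread "current span" a tracing subscriber keeps.
    static CURRENT_SPAN: RefCell<Option<&'static str>> = RefCell::new(None);
}
fn current_span() -> Option<&'static str> {
    CURRENT_SPAN.with(|c| *c.borrow())
}

/// RAII guard mimicking `Span::enter`: the span stays current until the guard drops.
struct Entered(Option<&'static str>);
fn enter(span: &'static str) -> Entered {
    Entered(CURRENT_SPAN.with(|c| c.replace(Some(span))))
}
impl Drop for Entered {
    fn drop(&mut self) {
        CURRENT_SPAN.with(|c| *c.borrow_mut() = self.0); // restore the previous span
    }
}

type Task = Pin<Box<dyn Future<Output = ()>>>;
type Entry = (&'static str, Option<&'static str>);

/// Mimics `.instrument(span)`: the span is entered only for the duration of each poll.
struct Instrumented {
    span: &'static str,
    inner: Task,
}
impl Future for Instrumented {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        let _guard = enter(self.span); // dropped as soon as this poll returns
        self.inner.as_mut().poll(cx)
    }
}

/// Returns Pending once, then Ready: a single `.await` suspension point.
struct YieldOnce(bool);
impl Future for YieldOnce {
    type Output = ();
    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        if self.0 {
            return Poll::Ready(());
        }
        self.0 = true;
        cx.waker().wake_by_ref(); // ask to be polled again, then suspend
        Poll::Pending
    }
}

struct NoopWake;
impl Wake for NoopWake { fn wake(self: Arc<Self>) {} }

/// Single-threaded round-robin executor: polls every pending task once per round.
fn run_round_robin(mut tasks: Vec<Task>) {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    while !tasks.is_empty() {
        tasks.retain_mut(|t| t.as_mut().poll(&mut cx).is_pending());
    }
}

fn demo(use_instrument: bool) -> Vec<Entry> {
    let log: Rc<RefCell<Vec<Entry>>> = Rc::new(RefCell::new(Vec::new()));
    let (nvl_log, other_log) = (Rc::clone(&log), Rc::clone(&log));
    let nvl_task: Task = if use_instrument {
        // Pattern suggested in review: attach the span to the future itself.
        Box::pin(Instrumented {
            span: "check_nvl_partition",
            inner: Box::pin(async move {
                YieldOnce(false).await;
                nvl_log.borrow_mut().push(("nvl_task", current_span()));
            }),
        })
    } else {
        // Pattern under review: the `enter` guard stays alive across `.await`.
        Box::pin(async move {
            let _enter = enter("check_nvl_partition");
            YieldOnce(false).await;
            nvl_log.borrow_mut().push(("nvl_task", current_span()));
        })
    };
    // An unrelated task polled on the same thread while nvl_task is suspended.
    let other_task: Task = Box::pin(async move {
        other_log.borrow_mut().push(("other_task", current_span()));
    });
    run_round_robin(vec![nvl_task, other_task]);
    log.take()
}

fn main() {
    println!("guard held across .await: {:?}", demo(false));
    println!("instrumented future:      {:?}", demo(true));
}
```

In this model, `other_task` observes `check_nvl_partition` as its current span when the guard is held across `.await`, and observes no span when the span is attached to the future; in both cases the NVL task itself still runs inside its span.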

@tmcroberts97 (Contributor, Author)

Closing in favour of #275 because the fork is stale.


Labels: None yet

Projects: None yet


2 participants