
feat: sync list gpu cards with new sdk usage #593

Open
steven-chiu-bigstack wants to merge 3 commits into develop from feat/sync-list-gpu-cards-with-new-sdk

Conversation

@steven-chiu-bigstack

Contributor

What type of PR is this?

Feat

Which issue(s) this PR fixes?

Related to bigstack-oss/cubecos#859

What this PR does?

Sync list GPU cards API implementation with the new SDK usage.

Test results (optional)

1. Make sure the API docs have been updated.

2. Make sure the API works properly.

Signed-off-by: steven-chiu-bigstack <steven.chiu@bigstack.co>
Signed-off-by: steven-chiu-bigstack <steven.chiu@bigstack.co>
Comment thread docs/developing/README.md
Comment thread internal/apis/v1/handlers/nodes/gpu.go Outdated
continue
}

profileId := profileIdMap[*instance.ProfileAlias]

Contributor


Maybe we should use ok to check whether the profile exists:

profileId, ok := profileIdMap[*instance.ProfileAlias]
if ok {
  remainingMap[profileId] = max(remainingMap[profileId]-1, 0)
}

Contributor Author


Yeah, you're right. Otherwise profileId will default to 0, because Go returns the zero value when a map key is missing.
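To make the point concrete, here is a minimal sketch of the comma-ok lookup under discussion. The `instance` type, the `remainingAfterAttach` function, and the sample aliases are hypothetical stand-ins, not the code in this PR. The explicit `> 0` comparison plays the role of the `max(..., 0)` clamp in the suggestion.

```go
package main

import "fmt"

// instance is a hypothetical stand-in for an attached instance record.
type instance struct {
	ProfileAlias *string
}

// remainingAfterAttach decrements the remaining count of each profile that an
// attached instance maps to, never going below zero. Instances whose alias is
// nil or unknown are skipped instead of being charged to profile ID 0.
func remainingAfterAttach(remaining map[uint32]int, idByAlias map[string]uint32, attached []instance) {
	for _, inst := range attached {
		if inst.ProfileAlias == nil {
			continue
		}
		id, ok := idByAlias[*inst.ProfileAlias]
		if !ok {
			continue // unknown alias: without this check id would silently be 0
		}
		if remaining[id] > 0 {
			remaining[id]--
		}
	}
}

func main() {
	known, unknown := "profile-a", "profile-x"
	idByAlias := map[string]uint32{"profile-a": 7}
	remaining := map[uint32]int{0: 2, 7: 1}

	remainingAfterAttach(remaining, idByAlias, []instance{{&known}, {&unknown}, {&known}})
	fmt.Println(remaining[0], remaining[7]) // prints "2 0"
}
```

Without the `ok` check, the unknown alias would resolve to ID 0 and reduce the count of whichever profile happens to have that ID.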

Comment thread internal/cubecos/nodes.go Outdated
}

if !IsHexSuccessful(err) {
log.Errorf("nodes: output error when listing vgpu profiles for gpu %s via hex_sdk: %v", gpuId, err)

Contributor


err is already guaranteed to be nil after the previous if err != nil { check.

So the following code will never be executed:

log.Errorf("nodes: output error when listing vgpu profiles for gpu %s via hex_sdk: %v", gpuId, err)
		return collection

Contributor Author


You're definitely right!
I copied this pattern from other files; not sure why there are a bunch of redundant checks in the codebase. 🤔
I'll proceed to remove this if !IsHexSuccessful(err) { ... } block because it's unnecessary.


AliasName string `json:"aliasName"`
Count int `json:"count"`
Remaining int `json:"remaining"`
Id uint32 `json:"id"`

Contributor


These fields change from string to number.
Maybe we should also update the OpenAPI doc?

@steven-chiu-bigstack steven-chiu-bigstack Jun 25, 2026

Contributor Author


Yes, and if I'm not mistaken, it should've been updated already (for list GPU cards API, see docs):

{168C33DD-DFFB-4AC9-8D62-E8B6E6E79DD1}

As for the OpenAPI docs for the update GPU API, perhaps we can handle that during your API implementation?

Comment thread internal/apis/v1/handlers/nodes/gpu.go Outdated
for _, instance := range attachedInstances {
profile := profileMapByAlias[*instance.ProfileAlias]
profileInstanceCountMap[profile.Id]++
for _, profile := range *hexProfileCollection.Sriov {

Contributor


Maybe we should add a nil guard before for _, profile := range *hexProfileCollection.Sriov {?

Contributor Author


Sure, good catch!
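For reference, a minimal sketch of one way to write such a guard. The `profile` and `profileCollection` types are hypothetical shapes (the real SDK types are not shown in this thread), and `derefProfiles` is an illustrative helper rather than anything from the codebase.

```go
package main

import "fmt"

// profile and profileCollection are hypothetical shapes. The pointer-to-slice
// fields mirror the `*hexProfileCollection.Sriov` dereference in the diff.
type profile struct {
	Id    uint32
	Count int
}

type profileCollection struct {
	Sriov     *[]profile
	MigBacked *[]profile
}

// derefProfiles returns the slice behind p, or nil when p is nil, so callers
// can range over the result without risking a nil-pointer dereference.
func derefProfiles(p *[]profile) []profile {
	if p == nil {
		return nil
	}
	return *p
}

// totalCount sums Count over both groups and tolerates a nil collection
// as well as nil group pointers.
func totalCount(c *profileCollection) int {
	if c == nil {
		return 0
	}
	total := 0
	for _, p := range derefProfiles(c.Sriov) {
		total += p.Count
	}
	for _, p := range derefProfiles(c.MigBacked) {
		total += p.Count
	}
	return total
}

func main() {
	sriov := []profile{{Id: 1, Count: 4}}
	fmt.Println(totalCount(&profileCollection{Sriov: &sriov})) // MigBacked is nil
	fmt.Println(totalCount(nil))
}
```

Ranging over a nil slice is a no-op in Go, so the helper lets both loops stay free of inline nil checks.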

Comment thread internal/apis/v1/handlers/nodes/gpu.go Outdated
profile.Remaining = profile.Count - instanceCount
migProfileRemainingMap := createMigProfileRemainingMap(hexProfileCollection.MigBacked, attachedInstances)

for _, profile := range *hexProfileCollection.MigBacked {

Contributor


Maybe we should add a nil guard before for _, profile := range *hexProfileCollection.MigBacked {?

Contributor Author


Sure, good catch!

Comment thread internal/apis/v1/handlers/nodes/gpu.go Outdated
}

vgpuProfiles, hexProfilesMap := listVgpuProfiles(device, hexGpu)
hexProfilesMap, hexProfileCollection := cubecos.GetNodeVgpuProfilesMap(hexGpu.PciAddress)

Contributor


Discussion:
We can call the isVgpu function before cubecos.GetNodeVgpuProfilesMap(hexGpu.PciAddress).
If the type is unset or pgpu, we can skip the cubecos.GetNodeVgpuProfilesMap(hexGpu.PciAddress) call entirely.

Contributor Author


No problem. 👌
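A rough sketch of the early skip discussed above. The type constants, the signature of `isVgpu`, and the `lookup` callback are all assumptions made for illustration; the real `isVgpu` and `cubecos.GetNodeVgpuProfilesMap` signatures are not shown in this thread.

```go
package main

import "fmt"

// The GPU type values are assumptions based on the "unset or pgpu" wording.
const (
	typeUnset = ""
	typePgpu  = "pgpu"
	typeVgpu  = "vgpu"
)

// isVgpu is a hypothetical version of the helper named in the review.
func isVgpu(gpuType string) bool {
	return gpuType == typeVgpu
}

// profilesFor calls lookup only for vGPU devices; other types get an empty
// result without paying for the lookup.
func profilesFor(gpuType, pciAddress string, lookup func(string) []string) []string {
	if !isVgpu(gpuType) {
		return []string{}
	}
	return lookup(pciAddress)
}

func main() {
	calls := 0
	lookup := func(addr string) []string {
		calls++
		return []string{"profile-for-" + addr}
	}
	for _, t := range []string{typeUnset, typePgpu, typeVgpu} {
		fmt.Printf("%q -> %v\n", t, profilesFor(t, "0000:3b:00.0", lookup))
	}
	fmt.Println("lookups performed:", calls)
}
```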

if vgpuProfiles != nil && attachedInstances != nil {
updateVgpuProfilesRemaining(*vgpuProfiles, *attachedInstances)
}
profileCollection := toProfileCollection(hexProfileCollection, attachedInstances)

Contributor


Discussion:
We can call the isVgpu function before toProfileCollection.
If the type is unset or pgpu, we can skip the toProfileCollection call and assign an empty array to profileCollection.

Contributor Author


I prefer keeping the current pattern since it's pure data mapping (raw data to view model), which should happen regardless of the GPU type.

Signed-off-by: steven-chiu-bigstack <steven.chiu@bigstack.co>