DRA API: implement ResourceClaim strategy for DRADeviceTaints #132927

Merged
k8s-ci-robot merged 2 commits into kubernetes:master from pohly:dra-api-strategy-todo on Oct 21, 2025

Conversation

@pohly
Contributor

@pohly pohly commented Jul 14, 2025

What type of PR is this?

/kind bug

What this PR does / why we need it:

The ResourceClaim API strategy was missing the code that drops the "Tolerations" field while the DRADeviceTaints feature is disabled.

Implementing it wasn't possible at the time the Device Taints API was added, at least not completely, because it depended on prioritized list being merged first: the "FirstAvailable" field introduced together with that feature also has to be covered.

That the device taints PR got merged despite this gap was an oversight. The confusing TODO probably didn't help: the entire implementation was missing (or got lost due to a bad merge conflict resolution, not sure anymore) and it referenced the wrong other feature (the partitionable devices feature doesn't affect ResourceClaim).

Which issue(s) this PR is related to:

KEP: kubernetes/enhancements#5055

Special notes for your reviewer:

Without this fix, clients could set the field even though it should have been dropped; the field simply had no effect. Clients can keep updating objects that already have the field set because of the "feature in use" check, as long as the spec remains immutable.
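
As a rough illustration of the "drop unless in use" behaviour described above, here is a minimal Go sketch. It does not reproduce the code in this PR: the types are simplified stand-ins for the real API types, and `tolerationsInUse`, `dropTolerations` and `specWithToleration` are hypothetical names chosen for the example. The feature gate is passed in as a plain boolean instead of being read from the feature gate registry.

```go
package main

import "fmt"

// Toleration, ExactRequest, Request and ClaimSpec are simplified stand-ins
// for the real API types; only the fields needed for the sketch are kept.
type Toleration struct{ Key string }

type ExactRequest struct{ Tolerations []Toleration }

type Request struct {
	Name    string
	Exactly *ExactRequest
}

type ClaimSpec struct{ Requests []Request }

// tolerationsInUse reports whether any request in spec sets tolerations.
// A nil spec (the create case) never counts as "in use".
func tolerationsInUse(spec *ClaimSpec) bool {
	if spec == nil {
		return false
	}
	for _, r := range spec.Requests {
		if r.Exactly != nil && len(r.Exactly.Tolerations) > 0 {
			return true
		}
	}
	return false
}

// dropTolerations clears tolerations in newSpec unless the feature is enabled
// or the stored object (oldSpec) already uses the field.
func dropTolerations(gateEnabled bool, newSpec, oldSpec *ClaimSpec) {
	if gateEnabled || tolerationsInUse(oldSpec) {
		return
	}
	for i := range newSpec.Requests {
		if e := newSpec.Requests[i].Exactly; e != nil {
			e.Tolerations = nil
		}
	}
}

// specWithToleration builds a one-request spec with a single toleration.
func specWithToleration() *ClaimSpec {
	return &ClaimSpec{Requests: []Request{{
		Name:    "gpu",
		Exactly: &ExactRequest{Tolerations: []Toleration{{Key: "example.com/taint"}}},
	}}}
}

func main() {
	// Create with the gate off: the field is dropped.
	created := specWithToleration()
	dropTolerations(false, created, nil)
	fmt.Println(len(created.Requests[0].Exactly.Tolerations))

	// Update of an object that already uses the field: the field is kept.
	updated := specWithToleration()
	dropTolerations(false, updated, specWithToleration())
	fmt.Println(len(updated.Requests[0].Exactly.Tolerations))
}
```

The second call shows why existing objects stay updatable after the gate is turned off: the old object already carries the field, so the new one may keep it.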

Does this PR introduce a user-facing change?

DRA API: the "tolerations" field in exact and sub requests now gets dropped properly when the DRADeviceTaints API is disabled.

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/bug Categorizes issue or PR as related to a bug. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. labels Jul 14, 2025
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

If a SIG or subproject determines this is a relevant issue, they will accept it by applying the triage/accepted label and provide further guidance.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

@k8s-ci-robot k8s-ci-robot added needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. needs-priority Indicates a PR lacks a `priority/foo` label and requires one. labels Jul 14, 2025
@k8s-ci-robot k8s-ci-robot requested review from bart0sh and klueska July 14, 2025 13:12
@k8s-ci-robot k8s-ci-robot added the wg/device-management Categorizes an issue or PR as relevant to WG Device Management. label Jul 14, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/needs-sig Indicates an issue or PR lacks a `sig/foo` label and requires one. label Jul 14, 2025
// dropDisabledFields removes fields which are covered by a feature gate.
func dropDisabledFields(newClaim, oldClaim *resource.ResourceClaim) {
	dropDisabledDRAPrioritizedListFields(newClaim, oldClaim)
	dropDisabledDRADeviceTaintsFields(newClaim, oldClaim) // Intentionally after dropDisabledDRAPrioritizedListFields to avoid iterating over FirstAvailable slice which needs to be dropped.
Member

Don't we need the same functionality in ResourceClaimTemplate?

Contributor Author

Yes. Will add it there, too. Good catch!

Contributor Author

For some reason, ResourceClaimTemplate update testing was less complete than the update testing of ResourceClaim. Fixed by copying the entire TestStrategyUpdate over and switching it to testing ResourceClaimTemplates.

I kept the existing test, to ensure that I am not removing coverage.
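
The ordering comment on dropDisabledDRADeviceTaintsFields in the hunk under review can be illustrated with a small Go sketch. This is not the code from the PR: the types are flattened, hypothetical stand-ins, the gates are plain booleans, and the "in use" exceptions are left out to keep the focus on ordering. The point it tries to show is that once the prioritized list step has removed FirstAvailable, the device taints step has no subrequests left to walk.

```go
package main

import "fmt"

// Toleration, SubRequest and Request are simplified stand-ins for the API types.
type Toleration struct{ Key string }

type SubRequest struct {
	Name        string
	Tolerations []Toleration
}

type Request struct {
	Name           string
	Tolerations    []Toleration
	FirstAvailable []SubRequest
}

// dropPrioritizedList removes FirstAvailable entirely when its gate is off.
func dropPrioritizedList(enabled bool, reqs []Request) {
	if enabled {
		return
	}
	for i := range reqs {
		reqs[i].FirstAvailable = nil
	}
}

// dropDeviceTaints clears tolerations when its gate is off and returns how
// many subrequests it had to visit while doing so.
func dropDeviceTaints(enabled bool, reqs []Request) (visited int) {
	if enabled {
		return 0
	}
	for i := range reqs {
		reqs[i].Tolerations = nil
		for j := range reqs[i].FirstAvailable {
			reqs[i].FirstAvailable[j].Tolerations = nil
			visited++
		}
	}
	return visited
}

func main() {
	reqs := []Request{{Name: "gpu", FirstAvailable: []SubRequest{
		{Name: "big", Tolerations: []Toleration{{Key: "a"}}},
		{Name: "small"},
	}}}
	// Both gates off: FirstAvailable is gone before tolerations are handled.
	dropPrioritizedList(false, reqs)
	fmt.Println(dropDeviceTaints(false, reqs))
}
```

With the two calls swapped, the result would be the same object, but the taints step would first clean subrequests that are about to be discarded anyway.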

}
},
},
"drop-fields-device-taints": {
Member

Should we have tests here that cover the interaction with the PrioritizedList feature?

Contributor Author

Yes.

}
},
},
"drop-fields-device-taints-in-prioritized-list": {
Member

Should we have some tests that don't include PrioritizedList?

Contributor Author

Yes. Now I am questioning my own sanity. I could have sworn that I had added them.
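
The two review threads above ask for coverage both with and without PrioritizedList. A table-driven layout is one way to enumerate the gate combinations; the Go sketch below is an assumption about how such a table could look, not the test added in this PR. All types, the `drop` helper and the case names are hypothetical, and only the create path (no old object, so no "in use" exception) is modelled.

```go
package main

import "fmt"

// SubRequest, Request and Spec are simplified, hypothetical stand-ins.
type SubRequest struct{ Tolerations []string }

type Request struct {
	Tolerations    []string     // stands in for the exact request's tolerations
	FirstAvailable []SubRequest // only kept while prioritized list is enabled
}

type Spec struct{ Requests []Request }

// drop applies both gates on create; prioritized list is handled first.
func drop(taints, prioritized bool, s *Spec) {
	for i := range s.Requests {
		r := &s.Requests[i]
		if !prioritized {
			r.FirstAvailable = nil
		}
		if !taints {
			r.Tolerations = nil
			for j := range r.FirstAvailable {
				r.FirstAvailable[j].Tolerations = nil
			}
		}
	}
}

// countTolerations returns the number of tolerations left anywhere in s.
func countTolerations(s Spec) int {
	n := 0
	for _, r := range s.Requests {
		n += len(r.Tolerations)
		for _, sub := range r.FirstAvailable {
			n += len(sub.Tolerations)
		}
	}
	return n
}

type testCase struct {
	name                string
	taints, prioritized bool
	want                int // tolerations expected to survive
}

// cases enumerates all four gate combinations for the same input.
var cases = []testCase{
	{"both-enabled", true, true, 2},
	{"taints-only", true, false, 1},
	{"prioritized-list-only", false, true, 0},
	{"both-disabled", false, false, 0},
}

// run applies one case to a fresh input: one exact request and one
// request with a single subrequest, each carrying one toleration.
func run(tc testCase) int {
	s := Spec{Requests: []Request{
		{Tolerations: []string{"exact"}},
		{FirstAvailable: []SubRequest{{Tolerations: []string{"sub"}}}},
	}}
	drop(tc.taints, tc.prioritized, &s)
	return countTolerations(s)
}

func main() {
	for _, tc := range cases {
		fmt.Printf("%s: got %d, want %d\n", tc.name, run(tc), tc.want)
	}
}
```

Building a fresh input inside `run` keeps the cases independent, since `drop` mutates the spec in place.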

@pohly pohly force-pushed the dra-api-strategy-todo branch from edeec84 to fd7c1c2 Compare July 15, 2025 09:14
@k8s-ci-robot k8s-ci-robot added size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. and removed size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Jul 15, 2025
@pohly pohly moved this from 🆕 New to 👀 In review in Dynamic Resource Allocation Jul 15, 2025
@pohly
Contributor Author

pohly commented Jul 17, 2025

/assign @mortent

Please check again, it should be complete now.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2025
@pohly pohly force-pushed the dra-api-strategy-todo branch from fd7c1c2 to 9fb7bbf Compare August 18, 2025 13:39
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Aug 18, 2025
@pohly
Contributor Author

pohly commented Aug 18, 2025

This fell through the cracks because other PRs were more important and both @mortent and I were on vacation.

Shall we still try to get this included in v1.34? It's "only" for an alpha feature, but getting the API right is important.

/assign @liggitt

@liggitt
Member

liggitt commented Aug 19, 2025

> Shall we still try to get this included in v1.34? It's "only" for an alpha feature, but getting the API right is important.
>
> /assign @liggitt

I just saw this, paging in context ... is this fixing the clearing of an alpha field that was added in 1.33 in #130447?

I think it's too late for any non-release-blocking changes in 1.34, and if 1.33 already released without this, I don't think we'd consider it release-blocking for 1.34 either.

@liggitt
Member

liggitt commented Aug 19, 2025

/milestone v1.35

@k8s-ci-robot k8s-ci-robot added this to the v1.35 milestone Aug 19, 2025
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Sep 12, 2025
@pohly
Contributor Author

pohly commented Sep 15, 2025

/test pull-kubernetes-node-e2e-containerd-2-0-dra-alpha-beta-features

We had a test flake with some tests not cleaning up properly after themselves, should be fixed now (#134047).

@liggitt liggitt moved this from Changes requested to In progress in API Reviews Sep 24, 2025
@liggitt
Member

liggitt commented Sep 24, 2025

Thanks for the update, @tallclair will take a pass and then we can get this in

@tico88612 tico88612 moved this to Pending inclusion in [sig-release] Bug Triage Oct 12, 2025
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 13, 2025
@Prajyot-Parab
Member

Hello @pohly @tallclair @liggitt @mortent
This PR has not been updated for 3 weeks, so I'd like to check what's the status. If there's anything we can do, please let us know.
The code freeze is starting Friday 7th November 2025, 12:00 UTC (about 3 weeks from now). Please make sure the PR has both lgtm and approved labels before the code freeze. Thanks!

@Prajyot-Parab Prajyot-Parab moved this from Pending inclusion to Tracked in [sig-release] Bug Triage Oct 14, 2025
…or DRADeviceTaints

This wasn't possible at the time of implementing the Device Taints API, at
least not completely, because it depended on prioritized list being merged
first, to cover the "FirstAvailable" field introduced together with that
feature.

That the device taints PR got merged despite this gap was an oversight. The
confusing TODO probably didn't help: the entire implementation was missing (or
got lost due to a bad merge conflict resolution, not sure anymore) and it
referenced the wrong other feature (partitionable devices doesn't affect
ResourceClaim).

For some reason, ResourceClaimTemplate update testing was less complete than
the update testing of ResourceClaim. Fixed by copying the entire
TestStrategyUpdate over and switching it to testing ResourceClaimTemplates.
@pohly pohly force-pushed the dra-api-strategy-todo branch from c812bbd to b556bbe Compare October 15, 2025 13:45
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Oct 15, 2025
Member

@tallclair tallclair left a comment

lgtm, just a nit about the file path.

Comment thread pkg/api/resourceclaimspec/util.go
…mTemplate

The spec is the same in both, so those fields are now handled by common
code. For ResourceClaim spec updates, the "in use" check now only considers
the spec.

Theoretically some features could be in use in an old ResourceClaim status
and not in use in the spec. This can only occur in a spec update, which is
currently prevented because the entire spec is immutable. Even if it was
allowed, preventing adding disabled fields to the spec is the right thing to
do regardless of what may have ended up in the status earlier.
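
To make the commit message above more concrete, here is a minimal Go sketch of one spec-level helper shared by both object kinds. It is an illustration under stated assumptions, not the contents of pkg/api/resourceclaimspec/util.go: the types are heavily simplified, the gate is a plain boolean, and `dropDisabledSpecFields`, `prepareClaim` and `prepareTemplate` are hypothetical names. The sketch also shows the consequence described in the last paragraph: a feature that appears only in the old status does not count as "in use".

```go
package main

import "fmt"

// Simplified, hypothetical stand-ins: the template wraps the same spec type
// that the claim uses directly.
type ClaimSpec struct{ Tolerations []string }

type ClaimStatus struct{ Tolerations []string }

type Claim struct {
	Spec   ClaimSpec
	Status ClaimStatus
}

type TemplateSpec struct{ Spec ClaimSpec }

type ClaimTemplate struct{ Spec TemplateSpec }

// dropDisabledSpecFields is the shared helper. The "in use" check looks only
// at the old spec; it never sees the status.
func dropDisabledSpecFields(gateEnabled bool, newSpec, oldSpec *ClaimSpec) {
	if gateEnabled || (oldSpec != nil && len(oldSpec.Tolerations) > 0) {
		return
	}
	newSpec.Tolerations = nil
}

// prepareClaim applies the helper to a claim; oldObj is nil on create.
func prepareClaim(gateEnabled bool, newObj, oldObj *Claim) {
	var oldSpec *ClaimSpec
	if oldObj != nil {
		oldSpec = &oldObj.Spec
	}
	dropDisabledSpecFields(gateEnabled, &newObj.Spec, oldSpec)
}

// prepareTemplate applies the same helper to the spec nested in a template.
func prepareTemplate(gateEnabled bool, newObj, oldObj *ClaimTemplate) {
	var oldSpec *ClaimSpec
	if oldObj != nil {
		oldSpec = &oldObj.Spec.Spec
	}
	dropDisabledSpecFields(gateEnabled, &newObj.Spec.Spec, oldSpec)
}

func main() {
	// The old claim uses the feature only in its status, so the new spec
	// field is still dropped.
	stored := &Claim{Status: ClaimStatus{Tolerations: []string{"t"}}}
	updated := &Claim{Spec: ClaimSpec{Tolerations: []string{"t"}}}
	prepareClaim(false, updated, stored)
	fmt.Println(len(updated.Spec.Tolerations))

	// Template create with the gate off: dropped by the same code path.
	tmpl := &ClaimTemplate{Spec: TemplateSpec{Spec: ClaimSpec{Tolerations: []string{"t"}}}}
	prepareTemplate(false, tmpl, nil)
	fmt.Println(len(tmpl.Spec.Spec.Tolerations))
}
```
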
@pohly pohly force-pushed the dra-api-strategy-todo branch from b556bbe to da80b55 Compare October 21, 2025 10:23
@k8s-ci-robot
Contributor

k8s-ci-robot commented Oct 21, 2025

@pohly: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
pull-kubernetes-linter-hints da80b55 link false /test pull-kubernetes-linter-hints

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

@pohly
Contributor Author

pohly commented Oct 21, 2025

/skip

"fake.NewSimpleClientset is deprecated" - not sure whether I really should care. It's good enough here.

@liggitt
Member

liggitt commented Oct 21, 2025

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 21, 2025
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: 4515ae67cd2d34590c91b4f599dc8a1854a7c0bd

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: liggitt, pohly

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 21, 2025
@liggitt
Member

liggitt commented Oct 21, 2025

/cla

@liggitt liggitt added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. and removed cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 21, 2025
@k8s-ci-robot k8s-ci-robot merged commit 3eeb838 into kubernetes:master Oct 21, 2025
20 checks passed
@github-project-automation github-project-automation Bot moved this from Tracked to Done in [sig-release] Bug Triage Oct 21, 2025
@liggitt liggitt moved this from In progress to API review completed, 1.35 in API Reviews Oct 23, 2025
@pohly pohly moved this from 👀 In review to ✅ Done in Dynamic Resource Allocation Oct 24, 2025
Labels

api-review: Categorizes an issue or PR as actively needing an API review.
approved: Indicates a PR has been approved by an approver from all required OWNERS files.
cncf-cla: yes: Indicates the PR's author has signed the CNCF CLA.
kind/bug: Categorizes issue or PR as related to a bug.
lgtm: "Looks good to me", indicates that a PR is ready to be merged.
needs-priority: Indicates a PR lacks a `priority/foo` label and requires one.
needs-triage: Indicates an issue or PR lacks a `triage/foo` label and requires one.
release-note: Denotes a PR that will be considered when it comes time to generate release notes.
size/XXL: Denotes a PR that changes 1000+ lines, ignoring generated files.
wg/device-management: Categorizes an issue or PR as relevant to WG Device Management.

Projects

Status: API review completed, 1.35
Status: ✅ Done
Archived in project

9 participants