KubernetesHook: add AWS exec-auth botocore guardrails for EKS token flow (#60943) by Vamsi-klu · Pull Request #61936 · apache/airflow

Vamsi-klu · 2026-02-15T04:39:35Z

Why this change

Issue #60943 reports intermittent KubernetesPodOperator task failures on Celery workers when multiple tasks start together and kubeconfig uses aws eks get-token exec auth.

The failure mode is subtle:

- the auth subprocess (aws eks get-token) can fail due to older botocore race behavior around ~/.aws/cli/cache
- Kubernetes client then proceeds with invalid/empty auth and surfaces a generic 403 Forbidden
- this looks identical to real RBAC failures, so operators often lose time debugging the wrong problem

This PR adds explicit runtime guardrails for that path so operators get a clear signal before task execution fails in a misleading way.

Impact of the change

This adds a policy-driven runtime check only when kubeconfig exec auth actually uses aws eks get-token:

- warn (default): emits an actionable warning if botocore is vulnerable (< 1.40.2) or version cannot be detected
- fail: hard-fails early with a clear error to enforce platform policy
- ignore: bypasses the check when users intentionally manage this externally
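The three modes above can be sketched as a small policy function. This is a minimal illustration rather than the PR's actual code: `apply_check_mode`, `ExecAuthVersionError`, and the returned status strings are hypothetical names, and only the 1.40.2 threshold and the mode names come from the description.

```python
from __future__ import annotations

import logging

log = logging.getLogger(__name__)

# First botocore release without the cache race, as stated in the PR description.
FIXED_BOTOCORE_VERSION = (1, 40, 2)


class ExecAuthVersionError(RuntimeError):
    """Hypothetical error type raised in ``fail`` mode."""


def apply_check_mode(mode: str, detected: tuple[int, ...] | None) -> str:
    """Apply the policy; ``detected`` is None when the version is unknown."""
    if mode == "ignore":
        return "skipped"
    if detected is not None and detected >= FIXED_BOTOCORE_VERSION:
        return "ok"
    found = ".".join(map(str, detected)) if detected else "unknown"
    message = (
        f"aws eks get-token exec auth detected with botocore {found}; "
        "versions below 1.40.2 can race on ~/.aws/cli/cache and surface as 403 Forbidden."
    )
    if mode == "fail":
        raise ExecAuthVersionError(message)
    # Any other mode is treated as "warn" in this sketch.
    log.warning(message)
    return "warned"
```

An undetectable version is handled the same way as a vulnerable one, matching the warn description above.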

Operational impact:

- Improves diagnosability of a production issue that often appears as ambiguous 403
- Reduces MTTR by surfacing root-cause guidance at connection/auth setup time
- Adds governance controls for teams that need strict enforcement (fail) without forcing everyone into that mode
- Keeps backwards compatibility with default warn

Scope and non-goals

Scope is intentionally limited to the AWS EKS exec-auth path (aws eks get-token) because this is the concrete failing path reported in #60943 (Race Condition in AWS CLI Cache Creation During Parallel KubernetesPodOperator Authentication).
This PR does not change Kubernetes retry semantics for 403 responses, and does not change auth flow for non-AWS exec plugins.
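To make the scope concrete, here is a hedged sketch of how a loaded kubeconfig could be inspected for the aws eks get-token exec path. The function name `uses_aws_eks_get_token` is hypothetical, and the sketch checks every user entry rather than only the active context, which is a simplification. The binary names are the ones quoted in the PR diff.

```python
from __future__ import annotations

from pathlib import PurePath

# Binary names quoted in the PR diff.
AWS_BINARY_NAMES = {"aws", "aws.exe", "aws2", "aws2.exe"}


def uses_aws_eks_get_token(kubeconfig: dict) -> bool:
    """Return True if any user entry runs ``aws ... eks get-token`` as its exec plugin."""
    for entry in kubeconfig.get("users") or []:
        exec_cfg = (entry.get("user") or {}).get("exec") or {}
        command = str(exec_cfg.get("command") or "")
        args = [str(arg) for arg in exec_cfg.get("args") or []]
        # Compare only the final path component, so /usr/local/bin/aws matches "aws".
        binary = PurePath(command).name.lower()
        if binary in AWS_BINARY_NAMES and "eks" in args and "get-token" in args:
            return True
    return False
```

Other exec plugins, such as a GKE auth plugin, fall through and return False, which mirrors the stated non-goal of leaving non-AWS exec auth untouched.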

Configuration

New Kubernetes connection extra:

exec_auth_aws_cli_version_check_mode: warn (default) | fail | ignore
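As an illustration of how this extra might be read, the sketch below parses a connection's extra JSON and normalizes the mode. The helper name `resolve_check_mode` is hypothetical. Falling back to warn for an unrecognized value and ignoring case are assumptions, based only on the "invalid fallback" item in the validation notes and on warn being the default.

```python
import json

MODE_FIELD = "exec_auth_aws_cli_version_check_mode"
VALID_MODES = {"warn", "fail", "ignore"}


def resolve_check_mode(extra_json: str) -> str:
    """Read the mode from a connection's extra JSON, defaulting to ``warn``."""
    extras = json.loads(extra_json) if extra_json else {}
    raw = str(extras.get(MODE_FIELD, "warn")).strip().lower()
    # Assumption: an unrecognized value falls back to the default instead of raising.
    return raw if raw in VALID_MODES else "warn"
```

A connection extra of `{"exec_auth_aws_cli_version_check_mode": "fail"}` would then opt a deployment into strict enforcement.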

Validation

Added unit coverage for:
- kubeconfig exec-auth detection (aws eks get-token)
- botocore version parsing from aws --version
- mode behavior (warn, fail, ignore, invalid fallback)
- integration points in get_conn and default kubeconfig client path
Test command used:
- AIRFLOW_HOME=/tmp/airflow-60943 uv run --python 3.12 -m pytest providers/cncf/kubernetes/tests/unit/cncf/kubernetes/hooks/test_kubernetes.py -q
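For reference, the version parsing exercised by these tests could look roughly like the sketch below. The regular expression is the one quoted in the PR diff; the function name `parse_botocore_version` and the sample `aws --version` strings in the usage note are illustrative assumptions.

```python
from __future__ import annotations

import re

# Pattern quoted in the PR diff.
BOTOCORE_VERSION_PATTERN = re.compile(r"botocore/(?P<version>\d+(?:\.\d+){1,2})")


def parse_botocore_version(version_output: str) -> tuple[int, ...] | None:
    """Extract the botocore version from ``aws --version`` output, if present."""
    match = BOTOCORE_VERSION_PATTERN.search(version_output)
    if match is None:
        return None
    return tuple(int(part) for part in match.group("version").split("."))
```

With an illustrative input such as `aws-cli/1.42.1 Python/3.12.3 Linux/6.8.0 botocore/1.40.1`, the result can be compared directly against `(1, 40, 2)` because Python compares tuples element by element. Output without a `botocore/` token yields `None`, which corresponds to the "version cannot be detected" case.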

closes #60943

Srabasti · 2026-02-15T07:02:58Z

Static tests are failing @Vamsi-klu
Please run prek locally in your branch. Prek will fix any formatting errors, and then you can push the commit from your branch.

Vamsi-klu · 2026-02-15T07:28:31Z

@Srabasti I have updated the PR with the relevant checks. Can you please review this and let me know your feedback? Thanks!

jscheffl · 2026-02-15T22:57:55Z

+_AWS_EXEC_AUTH_VERSION_CHECK_MODE_FIELD = "exec_auth_aws_cli_version_check_mode"
+_AWS_EXEC_AUTH_VERSION_CHECK_MODES = {"warn", "fail", "ignore"}
+_AWS_EXEC_AUTH_AWS_BINARY_NAMES = {"aws", "aws.exe", "aws2", "aws2.exe"}
+_AWS_EXEC_AUTH_FIXED_BOTOCORE_VERSION = (1, 40, 2)
+_BOTOCORE_VERSION_PATTERN = re.compile(r"botocore/(?P<version>\d+(?:\.\d+){1,2})")


I do not understand why AWS-specific settings are introduced in the K8s provider package (as well as the RST docs above and tests below). Why is this not made in the AWS-specific package?

In my view the K8s standard package should not be tainted by AWS-, Google-, or Azure-specific handling if it is not part of the K8s standard.

Vamsi-klu · 2026-02-16

Thanks for the feedback @jscheffl, that's a fair point about keeping the K8s provider cloud-agnostic.

The reason this was placed in KubernetesHook is that the vulnerability affects any kubeconfig using aws eks get-token exec auth, not just users going through EksHook or EksPodOperator. Many users configure a generic Kubernetes connection with a kubeconfig file that happens to use AWS exec auth and they never interact with the AWS provider at all.

That said, I agree AWS-specific constants and detection logic shouldn't live here. How about this approach:

- In the K8s provider, add a minimal, generic exec auth validation hook point in KubernetesHook.get_conn() (for example a discoverable entry point or registry pattern like _validate_exec_auth(kubeconfig, context)). No AWS/GCP/Azure-specific code, just a generic extension mechanism.
- In the AWS provider, register a validator that handles the AWS-specific detection (aws eks get-token binary matching, botocore version parsing, subprocess call) and the connection extra field (exec_auth_aws_cli_version_check_mode). All AWS constants, helpers, tests, and docs move there.

This way the K8s hook stays cloud-agnostic while still protecting users who use generic KubernetesHook with an EKS kubeconfig. It also makes the pattern extensible so any other provider could register their own exec auth validators in the future.

Would this approach work for you, or would you prefer everything moved entirely into the AWS provider package?

jscheffl · 2026-02-16

Not an expert here... @o-nikolas / @eladkal, how were such things handled in the past if cloud-provider-specific stuff needed to be integrated into a base provider package?

o-nikolas · 2026-02-17

Off the top of my head I can't think of a previous case quite like this. Can you, @eladkal?

I see both sides, I don't like AWS stuff in the K8s hook, but I also see the point @Vamsi-klu mentions. And the workaround sounds quite a bit more complicated than the change already is, which I don't love 😬

eladkal · 2026-02-24

I really don't understand what we are trying to solve here. But if the K8s provider needs optional stuff from the Amazon provider, it should declare it as an optional dependency.

jedcunningham · 2026-02-25

I'm pretty hesitant to add this much code for a min version check like this, especially one that isn't really directly related to this provider.

The lower bound in the amazon provider is already higher than what we'd defend against here. And, our constraint files are already putting in a higher version. (2.11.1 has the new version, 2.11.0 did not, fwiw)

If we went with the proposed generic validator + validate botocore version in the aws provider, we'd not have a problem because of the min version over there (ignoring when that min was bumped over there).

I'm just wondering if this is now not really a likely issue with the latest versions of stuff already...

eladkal · 2026-02-25

I think we'd better just close this as won't fix. This is just too much to accommodate older Boto versions where it's very unlikely that anyone needs this by now.

Vamsi-klu · 2026-02-25

I'll close this PR and focus on other issues. Thanks to everyone who shared their thoughts on this PR.

Vamsi-klu · 2026-02-22T07:14:56Z

Collaborators for this PR: @codingrealitylabs and @girlcoder-gaming. They helped me raise this PR.

Add AWS exec auth botocore version guardrails

e4c89e4

Vamsi-klu requested review from hussein-awala, jedcunningham and jscheffl as code owners February 15, 2026 04:39

boring-cyborg bot added the area:providers, kind:documentation, and provider:cncf-kubernetes labels Feb 15, 2026

Format Kubernetes exec-auth guardrails with ruff

cfd722d

jscheffl reviewed Feb 15, 2026

jscheffl mentioned this pull request Feb 15, 2026

Fix race condition in AWS CLI cache creation during parallel KubernetesPodOperator auth (#60943) #61935

Closed

2 tasks

Vamsi-klu mentioned this pull request Feb 22, 2026

EksHook: pre-create ~/.aws/cli/cache to prevent botocore race condition (#60943) #62307

Closed

4 tasks

jedcunningham closed this Feb 26, 2026

jscheffl mentioned this pull request Mar 14, 2026

Fix exec-plugin cache dir race condition in KubernetesHook #63610

Closed

6 tasks

Vamsi-klu wants to merge 2 commits into apache:main from Vamsi-klu:codex/60943-exec-auth-version-guardrails


6 participants
