Describe the bug
When authenticating a service principal via OIDC/federated token (no secret), az login intermittently fails with:
Attempting Azure CLI login by using OIDC...
Error: No subscriptions found for ***.
Re-running the exact same command, unchanged, succeeds. We see this in ~10–20% of CI runs. Configuration (SP, tenant, subscription, RBAC) is stable and correct.
This is the CLI-side counterpart of the widely-reported GitHub Actions symptom in Azure/login#592 (where @isra-fel is investigating) and the retry request in Azure/login#591. Since azure/login just shells out to az login, I believe the behaviour originates here, in Profile.login().
Root cause analysis
In src/azure-cli-core/azure/cli/core/_profile.py, Profile.login() calls SubscriptionFinder.find_using_specific_tenant() → a single client.subscriptions.list(). I believe ARM transiently returns a successful HTTP 200 with an empty value[] (not an error), because newly-effective access can lag due to ARM/RBAC propagation ("up to 10 minutes," with ARM caching — see Troubleshoot Azure RBAC). The empty list then trips:
if not subscriptions and not allow_no_subscriptions:
...
raise CLIError("No subscriptions found for {}.".format(username))
Crucially, the azure-core RetryPolicy cannot mitigate this: is_retry() returns False for any status_code < 400 and never inspects the body, so a 200-with-empty-list is terminal to the HTTP retry layer. The retry has to live at the application layer, where the empty result is evaluated.
Caveat: I've verified each link in the chain — empty-200 semantics (Subscriptions - List), the RetryPolicy behaviour (azure-core _retry.py), and the documented RBAC propagation latency — from primary sources, but I have not found an MS doc attributing this exact intermittent-OIDC symptom to the race verbatim; it is inferred and corroborated by Azure/login#592.
Proposed fix
A bounded retry of subscription discovery, firing only on the exact path that would otherwise raise (not subscriptions and not is_bare_mode and not allow_no_subscriptions), with exponential backoff + jitter and an environment variable to tune/disable it. This adds zero latency for --allow-no-subscriptions and --skip-subscription-discovery logins (excluded by construction). I have a worked-out patch design and unit tests ready to implement, and will open a PR as soon as the approach/layer is confirmed.
Question for maintainers
Before I open the PR: do you agree the fix belongs in azure-cli (Profile.login) rather than in azure/login? And is a small, env-tunable retry-on-empty acceptable here, or would you prefer a different approach (e.g. surfacing a clearer transient error)?
Expected behavior
A transient empty subscription response from ARM immediately after token issuance should not fail az login; it should be retried briefly before giving up, so legitimate logins do not fail ~10–20% of the time.
Environment summary
Describe the bug
When authenticating a service principal via OIDC/federated token (no secret),
az loginintermittently fails with:Re-running the exact same command, unchanged, succeeds. We see this in ~10–20% of CI runs. Configuration (SP, tenant, subscription, RBAC) is stable and correct.
This is the CLI-side counterpart of the widely-reported GitHub Actions symptom in Azure/login#592 (where @isra-fel is investigating) and the retry request in Azure/login#591. Since
azure/loginjust shells out toaz login, I believe the behaviour originates here, inProfile.login().Root cause analysis
In
src/azure-cli-core/azure/cli/core/_profile.py,Profile.login()callsSubscriptionFinder.find_using_specific_tenant()→ a singleclient.subscriptions.list(). I believe ARM transiently returns a successful HTTP 200 with an emptyvalue[](not an error), because newly-effective access can lag due to ARM/RBAC propagation ("up to 10 minutes," with ARM caching — see Troubleshoot Azure RBAC). The empty list then trips:Crucially, the
azure-coreRetryPolicycannot mitigate this:is_retry()returnsFalsefor anystatus_code < 400and never inspects the body, so a 200-with-empty-list is terminal to the HTTP retry layer. The retry has to live at the application layer, where the empty result is evaluated.Proposed fix
A bounded retry of subscription discovery, firing only on the exact path that would otherwise raise (
not subscriptions and not is_bare_mode and not allow_no_subscriptions), with exponential backoff + jitter and an environment variable to tune/disable it. This adds zero latency for--allow-no-subscriptionsand--skip-subscription-discoverylogins (excluded by construction). I have a worked-out patch design and unit tests ready to implement, and will open a PR as soon as the approach/layer is confirmed.Question for maintainers
Before I open the PR: do you agree the fix belongs in
azure-cli(Profile.login) rather than inazure/login? And is a small, env-tunable retry-on-empty acceptable here, or would you prefer a different approach (e.g. surfacing a clearer transient error)?Expected behavior
A transient empty subscription response from ARM immediately after token issuance should not fail
az login; it should be retried briefly before giving up, so legitimate logins do not fail ~10–20% of the time.Environment summary
azure/login@v3(commit532459e) on GitHub-hostedubuntu-latest, which invokes the runner's bundledaz. v3: Intermittent "No subscriptions found" failure with OIDC login despite subscription-id being provided login#592 reports it across theazshipped with loginv2.2.0andv3.auth-type: SERVICE_PRINCIPAL, explicitsubscription-id/tenant-id,environment: azurecloud.