You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This is not a request to fix the error itself. The error is correct
given the state of the local subscription cache at the time. During an
upstream incident, ARM / Azure Resource Graph returned HTTP 200 with a
truncated subscription list (intermittently — the same call would
sometimes return the full list and sometimes a reduced one). az login
faithfully persisted the truncated list to ~/.azure/azureProfile.json,
and the subscription the user later asked for via az account set --subscription <ID> was simply not in that cached list, so the CLI
raised:
ERROR: The subscription of '<ID>' doesn't exist in cloud 'AzureCloud'.
The upstream root cause has since been mitigated. This issue is a request
for the azure-cli maintainers to consider, as a repair item, two
resilience changes that would (a) have made the next occurrence easier to
diagnose with upstream teams, and (b) would likely have self-healed the
incident without operator intervention. Both are scoped to the same code
path — please feel free to accept, decline, or scope down at your
discretion.
The two asks are listed under Expected behavior below.
Related command
az account set --subscription <SUBSCRIPTION_ID>
The same cache-only-then-throw pattern also exists in Profile.get_subscription,
which is hit by any command that takes --subscription (e.g. az account show -s <SUBSCRIPTION_ID>). The throw message there is
different — "Subscription '' not found. Check the spelling and casing
and try again." — but the underlying behavior (no refresh, no upstream
correlation context) is the same. Both functions would benefit from the
same two enhancements.
Errors
Below is the relevant excerpt from the original failing run. Subscription /
tenant GUIDs have been replaced with <SUBSCRIPTION_ID> and the unrelated
pipeline correlation ID with <CORRELATION_ID>. Timestamps and the
surrounding log markers are preserved verbatim:
2026-05-18T17:57:30.2075706Z [command]C:\Windows\system32\cmd.exe /D /S /C ""C:\Program Files\Microsoft SDKs\Azure\CLI2\wbin\az.cmd" account set --subscription <SUBSCRIPTION_ID>"
2026-05-18T17:57:31.3353709Z ERROR: The subscription of '<SUBSCRIPTION_ID>' doesn't exist in cloud 'AzureCloud'.
2026-05-18T17:57:31.3380491Z ##[error]Error Code: [1]
2026-05-18T17:57:31.3383338Z ##[debug]Processed: ##vso[task.issue type=error;source=TaskInternal;correlationId=<CORRELATION_ID>;]Error Code: [1]
2026-05-18T17:57:31.3423787Z ##[error]Error: Error in setting up subscription
2026-05-18T17:57:31.3425309Z ##[debug]Processed: ##vso[task.issue type=error;source=TaskInternal;correlationId=<CORRELATION_ID>;]Error: Error in setting up subscription
2026-05-18T17:57:31.3517954Z ##[debug]task result: Failed
2026-05-18T17:57:31.3717321Z ##[error]Script failed with error: ERROR: The subscription of '<SUBSCRIPTION_ID>' doesn't exist in cloud 'AzureCloud'.
2026-05-18T17:57:31.3719136Z ##[debug]Processed: ##vso[task.issue type=error;source=TaskInternal;correlationId=<CORRELATION_ID>;]Script failed with error: ERROR: The subscription of '<SUBSCRIPTION_ID>' doesn't exist in cloud 'AzureCloud'.
2026-05-18T17:57:31.3881519Z ##[debug]Processed: ##vso[task.complete result=Failed;]Script failed with error: ERROR: The subscription of '<SUBSCRIPTION_ID>' doesn't exist in cloud 'AzureCloud'.
2026-05-18T17:57:31.4751521Z ##[debug]task result: Failed
2026-05-18T17:57:31.4781067Z ##[error]Script failed with error: ERROR: The subscription of '<SUBSCRIPTION_ID>' doesn't exist in cloud 'AzureCloud'.
2026-05-18T17:57:31.4782792Z ##[debug]Processed: ##vso[task.issue type=error;source=TaskInternal;correlationId=<CORRELATION_ID>;]Script failed with error: ERROR: The subscription of '<SUBSCRIPTION_ID>' doesn't exist in cloud 'AzureCloud'.
2026-05-18T17:57:31.4796809Z ##[debug]Processed: ##vso[task.complete result=Failed;]Script failed with error: ERROR: The subscription of '<SUBSCRIPTION_ID>' doesn't exist in cloud 'AzureCloud'.
The surrounding pipeline framing (##[error], ##[debug], ##vso[...])
is from a generic CI agent and is not part of the CLI output — the CLI
output proper is the single line:
ERROR: The subscription of '<SUBSCRIPTION_ID>' doesn't exist in cloud 'AzureCloud'.
Note what is not in the output: any indication of which upstream API
call provided the subscription list that the CLI is now consulting, any x-ms-request-id / x-ms-correlation-request-id from that call, the
HTTP status, or the timestamp of the last refresh. This is the basis for
ask #1 below.
Issue script & Debug output
A minimal repro of the failing invocation is:
az account set --subscription <SUBSCRIPTION_ID> --debug
--debug output is not available for the original incident. The
failure occurred in a production CI pipeline where the command was invoked
without --debug, and the local CLI state from that agent at that time
has since been discarded; the upstream condition has also been mitigated,
so re-running today would no longer reproduce.
It is worth noting, however, that even with --debug the existing log
output would not include the ARM / ARG correlation IDs from the subscriptions.list (or equivalent) call that originally populated the
cache — by the time the cache-miss is detected, that earlier HTTP call is
long gone and its response headers have not been retained anywhere. This
is exactly the gap that ask #1 below is asking you to consider closing.
If maintainers can suggest a synthetic repro path (e.g., edit ~/.azure/azureProfile.json to remove the target subscription entry, then
run az account set --subscription <SUBSCRIPTION_ID> --debug), I am happy
to capture and attach a --debug trace from that controlled scenario.
Expected behavior
Two independent suggestions, both scoped to Profile.set_active_subscription and Profile.get_subscription in src/azure-cli-core/azure/cli/core/_profile.py. Either is useful on its
own; together they fully address the failure mode we hit.
Ask #1 — surface upstream correlation context in the cache-lookup error
When the CLI throws
The subscription of '<id>' doesn't exist in cloud '<cloud>'.
…augment the error (and/or --debug log) with whatever provenance is
available about the cached subscription list, so users have something
concrete to give to the ARM / ARG owners when the cause is suspected to be
upstream (truncated list, replica skew, etc.). Useful fields would include:
x-ms-request-id and x-ms-correlation-request-id from the subscriptions.list (or equivalent) call that populated the cache,
the HTTP status and endpoint of that call,
the timestamp of the last successful refresh,
(optionally) the count of subscriptions currently in the cache.
This requires retaining a small amount of metadata alongside the cached
subscription list — currently, ~/.azure/azureProfile.json has no
upstream-call provenance at all, so when the cache turns out to be wrong
there is no evidence trail to hand to the ARM/ARG team.
When set_active_subscription (and get_subscription when called with an
explicit subscription argument) finds zero matches in the cached
list:
Re-query subscriptions from ARM via the same path as az account list --refresh (SubscriptionFinder.find_using_*_tenant),
persist the result, and retry the match exactly once.
If still not found, raise the existing CLIError, but augmented with
the refresh-attempt outcome (and ideally the provenance from ask Code bootstrap #1
above) so the user knows the cache has already been refreshed.
This would have self-healed our incident: the upstream API was
intermittently returning the correct list and intermittently returning a
truncated one, so a single retry on cache-miss had a high probability of
landing on a good response and would have unblocked the pipeline without
operator intervention.
Constraints:
No regression on the happy path — if the cache already contains the
target subscription, no extra ARM round-trip should occur.
At most one refresh attempt per command, to avoid storms when a
subscription is truly missing.
Environment Summary
az --version from the original incident is not available — see Issue
script & Debug output above. The failure was observed on Microsoft-hosted
Windows CI agents using the Azure CLI installed at C:\Program Files\Microsoft SDKs\Azure\CLI2\wbin\az.cmd. The patch
version shipped on those agents at the time of the incident can be
provided on request if it would help narrow down the affected releases.
The relevant code path in _profile.py appears unchanged on the current dev branch (see Additional context).
azure-cli <not available from incident>
core <not available from incident>
telemetry <not available from incident>
The exact CLIError we observed is raised in Profile.set_active_subscription
(src/azure-cli-core/azure/cli/core/_profile.py, lines ~564–580). The
function calls load_cached_subscriptions(all_clouds=True), filters
in-memory, and raises CLIError("The subscription of '{}' {} in cloud '{}'.") if len(result) != 1. There is no call out to ARM at any point.
The symmetric case is Profile.get_subscription
(same file, lines ~600–616), which also reads from load_cached_subscriptions() and raises "Subscription '' not
found. Check the spelling and casing and try again." with no upstream
retry.
The cache is populated only when _set_subscriptions is called from Profile.refresh_accounts (triggered by az login / az account list --refresh), via SubscriptionFinder.find_using_*_tenant.
Unit test (regression): cache already contains the target → set_active_subscription succeeds with zero ARM calls.
Unit test (ask Code bootstrap #1): when the cache-lookup CLIError is raised, the
message (or the --debug log) includes whatever upstream-call
provenance was retained from the most recent subscriptions.list call
that populated the cache (request id, correlation id, status, timestamp).
Provenance
Reported from internal Microsoft incidents (links available to Microsoft
employees on request). Multiple CI pipelines failed when their hosted
agents' Azure CLI cache went stale after the upstream subscription-listing
service returned an intermittently-truncated HTTP 200; every subsequent az account set --subscription <ID> in the same job failed until the
agent was rebuilt or az account list --refresh was manually run. The
upstream issue has since been mitigated — this issue exists only to ask
whether you want to harden against the class.
Describe the bug
This is not a request to fix the error itself. The error is correct
given the state of the local subscription cache at the time. During an
upstream incident, ARM / Azure Resource Graph returned HTTP 200 with a
truncated subscription list (intermittently — the same call would
sometimes return the full list and sometimes a reduced one).
az loginfaithfully persisted the truncated list to
~/.azure/azureProfile.json,and the subscription the user later asked for via
az account set --subscription <ID>was simply not in that cached list, so the CLIraised:
The upstream root cause has since been mitigated. This issue is a request
for the azure-cli maintainers to consider, as a repair item, two
resilience changes that would (a) have made the next occurrence easier to
diagnose with upstream teams, and (b) would likely have self-healed the
incident without operator intervention. Both are scoped to the same code
path — please feel free to accept, decline, or scope down at your
discretion.
The two asks are listed under Expected behavior below.
Related command
The same cache-only-then-throw pattern also exists in
Profile.get_subscription,which is hit by any command that takes
--subscription(e.g.az account show -s <SUBSCRIPTION_ID>). The throw message there isdifferent — "Subscription '' not found. Check the spelling and casing
and try again." — but the underlying behavior (no refresh, no upstream
correlation context) is the same. Both functions would benefit from the
same two enhancements.
Errors
Below is the relevant excerpt from the original failing run. Subscription /
tenant GUIDs have been replaced with
<SUBSCRIPTION_ID>and the unrelatedpipeline correlation ID with
<CORRELATION_ID>. Timestamps and thesurrounding log markers are preserved verbatim:
The surrounding pipeline framing (
##[error],##[debug],##vso[...])is from a generic CI agent and is not part of the CLI output — the CLI
output proper is the single line:
Note what is not in the output: any indication of which upstream API
call provided the subscription list that the CLI is now consulting, any
x-ms-request-id/x-ms-correlation-request-idfrom that call, theHTTP status, or the timestamp of the last refresh. This is the basis for
ask #1 below.
Issue script & Debug output
A minimal repro of the failing invocation is:
--debugoutput is not available for the original incident. Thefailure occurred in a production CI pipeline where the command was invoked
without
--debug, and the local CLI state from that agent at that timehas since been discarded; the upstream condition has also been mitigated,
so re-running today would no longer reproduce.
It is worth noting, however, that even with
--debugthe existing logoutput would not include the ARM / ARG correlation IDs from the
subscriptions.list(or equivalent) call that originally populated thecache — by the time the cache-miss is detected, that earlier HTTP call is
long gone and its response headers have not been retained anywhere. This
is exactly the gap that ask #1 below is asking you to consider closing.
If maintainers can suggest a synthetic repro path (e.g., edit
~/.azure/azureProfile.jsonto remove the target subscription entry, thenrun
az account set --subscription <SUBSCRIPTION_ID> --debug), I am happyto capture and attach a
--debugtrace from that controlled scenario.Expected behavior
Two independent suggestions, both scoped to
Profile.set_active_subscriptionandProfile.get_subscriptioninsrc/azure-cli-core/azure/cli/core/_profile.py. Either is useful on itsown; together they fully address the failure mode we hit.
Ask #1 — surface upstream correlation context in the cache-lookup error
When the CLI throws
…augment the error (and/or
--debuglog) with whatever provenance isavailable about the cached subscription list, so users have something
concrete to give to the ARM / ARG owners when the cause is suspected to be
upstream (truncated list, replica skew, etc.). Useful fields would include:
x-ms-request-idandx-ms-correlation-request-idfrom thesubscriptions.list(or equivalent) call that populated the cache,This requires retaining a small amount of metadata alongside the cached
subscription list — currently,
~/.azure/azureProfile.jsonhas noupstream-call provenance at all, so when the cache turns out to be wrong
there is no evidence trail to hand to the ARM/ARG team.
Ask #2 — refresh-on-miss with a single retry
When
set_active_subscription(andget_subscriptionwhen called with anexplicit
subscriptionargument) finds zero matches in the cachedlist:
az account list --refresh(SubscriptionFinder.find_using_*_tenant),persist the result, and retry the match exactly once.
CLIError, but augmented withthe refresh-attempt outcome (and ideally the provenance from ask Code bootstrap #1
above) so the user knows the cache has already been refreshed.
This would have self-healed our incident: the upstream API was
intermittently returning the correct list and intermittently returning a
truncated one, so a single retry on cache-miss had a high probability of
landing on a good response and would have unblocked the pipeline without
operator intervention.
Constraints:
target subscription, no extra ARM round-trip should occur.
subscription is truly missing.
Environment Summary
az --versionfrom the original incident is not available — see Issuescript & Debug output above. The failure was observed on Microsoft-hosted
Windows CI agents using the Azure CLI installed at
C:\Program Files\Microsoft SDKs\Azure\CLI2\wbin\az.cmd. The patchversion shipped on those agents at the time of the incident can be
provided on request if it would help narrow down the affected releases.
The relevant code path in
_profile.pyappears unchanged on the currentdevbranch (see Additional context).Additional context
Where in the source
Pinned at commit
62e498cef22e6bb5cc4be5b5261a7c2acc8d0ab0:The exact
CLIErrorwe observed is raised inProfile.set_active_subscription(
src/azure-cli-core/azure/cli/core/_profile.py, lines ~564–580). Thefunction calls
load_cached_subscriptions(all_clouds=True), filtersin-memory, and raises
CLIError("The subscription of '{}' {} in cloud '{}'.")iflen(result) != 1. There is no call out to ARM at any point.The symmetric case is
Profile.get_subscription(same file, lines ~600–616), which also reads from
load_cached_subscriptions()and raises "Subscription '' notfound. Check the spelling and casing and try again." with no upstream
retry.
The cache is populated only when
_set_subscriptionsis called fromProfile.refresh_accounts(triggered byaz login/az account list --refresh), viaSubscriptionFinder.find_using_*_tenant.Suggested acceptance criteria
subscription; ARM returns it on refresh →
set_active_subscriptionsucceeds, cache is updated, exactly one refresh call is made.
ARM refresh also does not return it → the existing
CLIErroris raised,and the message indicates a refresh was attempted.
set_active_subscriptionsucceeds with zero ARM calls.CLIErroris raised, themessage (or the
--debuglog) includes whatever upstream-callprovenance was retained from the most recent
subscriptions.listcallthat populated the cache (request id, correlation id, status, timestamp).
Provenance
Reported from internal Microsoft incidents (links available to Microsoft
employees on request). Multiple CI pipelines failed when their hosted
agents' Azure CLI cache went stale after the upstream subscription-listing
service returned an intermittently-truncated
HTTP 200; every subsequentaz account set --subscription <ID>in the same job failed until theagent was rebuilt or
az account list --refreshwas manually run. Theupstream issue has since been mitigated — this issue exists only to ask
whether you want to harden against the class.