feat(anc): gate check-hotfix on enable_provisioning_hotfix contract field #8717
Devinwong wants to merge 1 commit into
The latest Buf updates on your PR. Results from workflow Buf CI / buf (pull_request).
Acknowledged - no action needed. This is the automated Buf CI status, and it reports Build, Format, Lint, and Breaking all passing for the additive optional field.
Read-channel pivot note (no change to this PR's gating contract). The hotfix-pointer read channel is moving from Option 2 (kube-system ConfigMap + bootstrap token) to Option 4: the live-patching-service (LPS) IMDS-attested endpoint, which e2e validated is reachable pre-kubelet. That fetch/auth rewrite lives in #8696 and is channel-specific. This PR (2.1d) is channel-agnostic. The
…ield Replaces the env-delivery approach (systemd drop-in + cse_cmd.sh) with a single contract field. check-hotfix self-gates on the new AKSNodeConfig field enable_provisioning_hotfix (proto tag 45, optional bool); when it is not true the command no-ops with telemetry outcome=disabled and makes no apiserver call. Default-off, fail-open. Relaxes the ENABLE_PROVISIONING_HOTFIX env gate introduced in 2.1c so the wrapper calls check-hotfix unconditionally; gating now lives in the Go binary via the contract field as the single source of truth. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Changes cached containers or packages on windows VHDs
Please get a Windows SIG member to approve. The following diff file shows any additions or deletions from what will be cached on windows VHDs, organised by VHD type.
diff --git a/vhd_files/2022-containerd-gen2.txt b/vhd_files/2022-containerd-gen2.txt
index db10c9e..c51a47f 100644
--- a/vhd_files/2022-containerd-gen2.txt
+++ b/vhd_files/2022-containerd-gen2.txt
@@ -122,0 +123 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -124 +124,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
@@ -129,0 +130 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/secrets-store/driver:v1.5.4
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.11-windows-hpc-1
@@ -131 +131,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.13-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.14-windows-hpc-1
@@ -133 +133,2 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.10-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.11-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.8-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.3-windows-hpc-1
@@ -135 +135,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.5-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.6-windows-hpc-1
@@ -137 +136,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.1-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.2-windows-hpc-1
diff --git a/vhd_files/2022-containerd.txt b/vhd_files/2022-containerd.txt
index 94de353..7312c49 100644
--- a/vhd_files/2022-containerd.txt
+++ b/vhd_files/2022-containerd.txt
@@ -122,0 +123 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -124 +124,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
@@ -129,0 +130 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/secrets-store/driver:v1.5.4
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.11-windows-hpc-1
@@ -131 +131,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.13-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.14-windows-hpc-1
@@ -133 +133,2 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.10-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.11-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.8-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.3-windows-hpc-1
@@ -135 +135,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.5-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.6-windows-hpc-1
@@ -137 +136,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.1-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.2-windows-hpc-1
diff --git a/vhd_files/2025-gen2.txt b/vhd_files/2025-gen2.txt
index d0ea692..36e3641 100644
--- a/vhd_files/2025-gen2.txt
+++ b/vhd_files/2025-gen2.txt
@@ -52,0 +53 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -54 +54,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
@@ -59,0 +60 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/secrets-store/driver:v1.5.4
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.11-windows-hpc-1
@@ -61 +61,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.13-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.14-windows-hpc-1
@@ -63 +63,2 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.10-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.11-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.8-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.3-windows-hpc-1
@@ -65 +65,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.5-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.6-windows-hpc-1
@@ -67 +66,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.1-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.2-windows-hpc-1
diff --git a/vhd_files/2025.txt b/vhd_files/2025.txt
index ab44d8b..b8873d5 100644
--- a/vhd_files/2025.txt
+++ b/vhd_files/2025.txt
@@ -52,0 +53 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.34.6-windows-hp
+mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.2-windows-hp
@@ -54 +54,0 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.3-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes-csi/azurefile-csi:v1.35.4-windows-hp
@@ -59,0 +60 @@ mcr.microsoft.com/oss/v2/kubernetes-csi/secrets-store/driver:v1.5.4
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.11-windows-hpc-1
@@ -61 +61,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.13-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.33.14-windows-hpc-1
@@ -63 +63,2 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.10-windows-hp
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.11-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.34.8-windows-hpc-1
+mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.3-windows-hpc-1
@@ -65 +65,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.5-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.35.6-windows-hpc-1
@@ -67 +66,0 @@ mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.1-windows-hpc
-mcr.microsoft.com/oss/v2/kubernetes/azure-cloud-node-manager:v1.36.2-windows-hpc-1
2.1d - gate check-hotfix on the enable_provisioning_hotfix contract field
POC / M1 draft. AgentBaker / Node SIG side only.
This is the final layer of the provisioning-hotfix stack. It makes the AKSNodeConfig
contract field the single source of truth for whether
aks-node-controller check-hotfix does any work, and relaxes the env gate added in 2.1c.
What changed
Add bool enable_provisioning_hotfix = 45; to aksnodeconfig/v1/config.proto (next free tag after cse_timeout = 44) and regenerate the Go bindings.
check-hotfix reads the field at the very top of checkHotfix() via App.provisioningHotfixEnabled(), which reads the node-config JSON that is already on disk and calls GetEnableProvisioningHotfix(). When the field is not true (false, unset, or the config cannot be read/parsed) it returns the new telemetry outcome disabled and exits 0 WITHOUT any remote hotfix call. Fail-open everywhere.
aks-node-controller-wrapper.sh now calls check-hotfix UNCONDITIONALLY (still wrapped defensively so it can never block provisioning). The Go binary self-gates on the contract field.
Read channel
The hotfix-pointer READ CHANNEL is moving from a kube-system ConfigMap (read with a bootstrap
token) to the live-patching-service (LPS) IMDS-attested endpoint, validated reachable pre-kubelet
in e2e. That fetch/auth rewrite lives in #8696; this PR is channel-agnostic. The
enable_provisioning_hotfix contract field and the Go self-gate decide WHETHER check-hotfix runs at all, independent of which channel it then uses, so the proto field is intentionally
channel-neutral and unchanged by the pivot.
Supersedes the env-delivery approach
An earlier revision of this PR delivered the toggle as an env var via a
cse_cmd.sh template var plus a systemd drop-in
(Environment="ENABLE_PROVISIONING_HOTFIX=...") on aks-node-controller.service, mirroring the IMDS-restriction pattern. That approach was dropped because:
check-hotfix already parses the AKSNodeConfig for its own connection details, so a real contract field is available to the binary with zero new plumbing:
no template var, no drop-in, no env var.
An env/drop-in written during provisioning would only take effect on the NEXT boot. Reading the
contract field directly avoids that activation-timing problem: it works on the same boot
because the config JSON is on disk before the service starts.
This also means absvc sets ONE field (the contract bool), not an env var plus a field.
Relaxes the 2.1c env gate
This PR relaxes the
ENABLE_PROVISIONING_HOTFIX env gate introduced in #8715 (2.1c); gating now lives in the Go binary via the
enable_provisioning_hotfix contract field as the single source of truth. The 2.1c env gate is
intentionally added-then-relaxed across the stack so each PR stays reviewable on its own.
Default-off and fail-open
When enable_provisioning_hotfix is false or unset, behavior is byte-identical to before this stack:
check-hotfix makes no remote call and provisioning proceeds unchanged. Any read or parse error is treated as off. This preserves the 6-month VHD support window in both directions
(older VHD + newer config, and newer VHD + older binary are both safe).
Before / after
Field false or unset: check-hotfix returns outcome=disabled, no remote call, exit 0.
Field true: check-hotfix reads the hotfix pointer from the LPS (live-patching-service) endpoint and stages it (the read channel itself is owned by feat(anc): add check-hotfix subcommand to read hotfix pointer from LPS #8696).
Stack
The aks-rp region toggle that sets the field is in a different repo and is the only remaining
out-of-repo piece. With the field settable on a node, the on-node PoC e2e tests (fail-open and
multi-base) become runnable.
Tests
go test ./... in aks-node-controller: all check-hotfix tests pass, including new gate tests (disabled -> outcome=disabled and the injected fetcher is never called; enabled ->
fetch path runs). Pre-existing Windows-only failures (CRLF goldens, file locks, os-release
message text) are unrelated and also fail on the base branch.