Summary
When a Helm chart is deployed with HelmChart.Options.UpgradeOptions.UpgradeCRDs: true, the addon-controller
re-applies the chart's entire crds/ folder and then does an unconditional time.Sleep(30 * time.Second)
on every upgrade — even when nothing has changed. Combined with SyncModeContinuousWithDriftDetection
(where shouldUpgrade always returns true), this means each UpgradeCRDs: true chart pays a flat 30s on
every reconcile, regardless of whether the chart version, values, or the CRDs themselves changed.
When several such charts are deployed by the same profile, their upgrades are serialized, so the penalty is
roughly 30s × (number of UpgradeCRDs charts) added to each reconcile/rollout.
Affected versions
Observed on v1.1.1; the same code is present on the latest release v1.11.1 and current main
(controllers/handlers_helm.go). Not fixed.
Details
1. The 30s sleep is unconditional and counts no-op patches
upgradeCRDs (in controllers/handlers_helm.go):
// upgradeCRDsInFile
_, err = dr.Patch(ctx, crds[i].GetName(), types.ApplyPatchType, crdData, options)
if err != nil { return upgradedCRDs, err }
upgradedCRDs += 1 // counts EVERY patch, not actual changes
// upgradeCRDs
if upgradedCRDs > 0 {
const waitForCRDs = 30
time.Sleep(waitForCRDs * time.Second) // blind fixed sleep
}
- The counter increments for every CRD server-side-applied, not for CRDs that actually changed. An SSA
patch of an identical CRD is a server-side no-op but still increments upgradedCRDs.
- So whenever the chart ships any CRD under
crds/, upgradedCRDs > 0 is always true and the code sleeps a
fixed 30 seconds — even when no CRD was modified.
- The sleep is a fixed duration rather than a wait on each CRD's
Established/NamesAccepted condition,
which normally settles in well under a second.
2. In drift-detection sync mode, this runs on every reconcile
shouldUpgrade short-circuits for SyncModeContinuousWithDriftDetection:
func shouldUpgrade(...) bool {
if requestedChart.HelmChartAction == HelmChartActionUninstall { return false }
if clusterSummary...SyncMode != SyncModeContinuousWithDriftDetection {
// ... compare values hash; compare installed vs requested chart version;
// return false when unchanged ...
}
// With drift detection mode, there is reconciliation due to configuration drift even
// when version is same. So skip this check in SyncModeContinuousWithDriftDetection
return true
}
The upgrade path is handleChart → handleUpgrade → doUpgradeRelease → upgradeRelease, and upgradeRelease
calls upgradeCRDs before the helm upgrade. Because shouldUpgrade returns true on every reconcile in
drift mode, upgradeCRDs (and its 30s sleep) fires on every reconcile for every UpgradeCRDs: true chart —
not only when the chart version/values changed.
(First install is unaffected: the install path handleInstall → doInstallRelease → installRelease uses
Helm's install action, which applies crds/ natively and never calls upgradeCRDs. The cost only appears
on the 2nd and subsequent reconciles, which take the upgrade path.)
Reproduction
- Deploy a chart that ships CRDs in
crds/ (e.g. node-feature-discovery, kube-prometheus-stack,
envoy-gateway) via a ClusterProfile/Profile with syncMode: ContinuousWithDriftDetection and
helmCharts[].options.upgradeOptions.upgradeCRDs: true.
- Let it install (rev1) — fast.
- Trigger any reconcile of the ClusterSummary (e.g. change an unrelated chart's values, or just let drift
reconciliation run). No change to this chart is required.
- Observe: the chart is upgraded again, its
crds/ are re-applied via SSA (no-op), and the rollout stalls
~30s on this chart. With N such charts under one profile the stalls serialize to ~30s × N.
Measured impact (downstream)
In a downstream product, three UpgradeCRDs: true charts sit in a serialized addon rollout chain. A
no-op second apply stretched the chain from ~35s to ~127–133s — i.e. roughly +30s × 3 ≈ +90s — entirely
attributable to these sleeps, with no CRD actually changing.
Suggested fixes (any one helps; ideally all three)
- Replace the fixed
time.Sleep(30s) with a readiness wait — poll each applied CRD's
Established (and NamesAccepted) condition with a timeout, instead of sleeping a flat 30s. This is
typically sub-second.
- Only count CRDs that actually changed — inspect the SSA patch response (e.g. compare
metadata.generation/resourceVersion before vs after, or use a server-side dry-run/diff) and skip the
wait entirely when nothing changed. A no-op re-apply should not cost 30s.
- Gate CRD re-apply on a real chart-version change — even in drift mode, only re-apply
crds/ when the
installed chart version differs from the requested one, rather than on every drift reconcile.
Environment
- projectsveltos addon-controller v1.1.1 (also reproduced by inspection on v1.11.1 /
main)
- Sync mode:
ContinuousWithDriftDetection
- Charts deployed via k0rdent/kcm forwarding
ServiceHelmOptions.UpgradeOptions = {UpgradeCRDs: true}
Summary
When a Helm chart is deployed with
HelmChart.Options.UpgradeOptions.UpgradeCRDs: true, the addon-controllerre-applies the chart's entire
crds/folder and then does an unconditionaltime.Sleep(30 * time.Second)on every upgrade — even when nothing has changed. Combined with
SyncModeContinuousWithDriftDetection(where
shouldUpgradealways returnstrue), this means eachUpgradeCRDs: truechart pays a flat 30s onevery reconcile, regardless of whether the chart version, values, or the CRDs themselves changed.
When several such charts are deployed by the same profile, their upgrades are serialized, so the penalty is
roughly 30s × (number of
UpgradeCRDscharts) added to each reconcile/rollout.Affected versions
Observed on v1.1.1; the same code is present on the latest release v1.11.1 and current
main(
controllers/handlers_helm.go). Not fixed.Details
1. The 30s sleep is unconditional and counts no-op patches
upgradeCRDs(incontrollers/handlers_helm.go):patch of an identical CRD is a server-side no-op but still increments
upgradedCRDs.crds/,upgradedCRDs > 0is always true and the code sleeps afixed 30 seconds — even when no CRD was modified.
Established/NamesAcceptedcondition,which normally settles in well under a second.
2. In drift-detection sync mode, this runs on every reconcile
shouldUpgradeshort-circuits forSyncModeContinuousWithDriftDetection:The upgrade path is
handleChart → handleUpgrade → doUpgradeRelease → upgradeRelease, andupgradeReleasecalls
upgradeCRDsbefore the helm upgrade. BecauseshouldUpgradereturnstrueon every reconcile indrift mode,
upgradeCRDs(and its 30s sleep) fires on every reconcile for everyUpgradeCRDs: truechart —not only when the chart version/values changed.
(First install is unaffected: the install path
handleInstall → doInstallRelease → installReleaseusesHelm's install action, which applies
crds/natively and never callsupgradeCRDs. The cost only appearson the 2nd and subsequent reconciles, which take the upgrade path.)
Reproduction
crds/(e.g. node-feature-discovery, kube-prometheus-stack,envoy-gateway) via a ClusterProfile/Profile with
syncMode: ContinuousWithDriftDetectionandhelmCharts[].options.upgradeOptions.upgradeCRDs: true.reconciliation run). No change to this chart is required.
crds/are re-applied via SSA (no-op), and the rollout stalls~30s on this chart. With N such charts under one profile the stalls serialize to ~30s × N.
Measured impact (downstream)
In a downstream product, three
UpgradeCRDs: truecharts sit in a serialized addon rollout chain. Ano-op second apply stretched the chain from ~35s to ~127–133s — i.e. roughly +30s × 3 ≈ +90s — entirely
attributable to these sleeps, with no CRD actually changing.
Suggested fixes (any one helps; ideally all three)
time.Sleep(30s)with a readiness wait — poll each applied CRD'sEstablished(andNamesAccepted) condition with a timeout, instead of sleeping a flat 30s. This istypically sub-second.
metadata.generation/resourceVersionbefore vs after, or use a server-side dry-run/diff) and skip thewait entirely when nothing changed. A no-op re-apply should not cost 30s.
crds/when theinstalled chart version differs from the requested one, rather than on every drift reconcile.
Environment
main)ContinuousWithDriftDetectionServiceHelmOptions.UpgradeOptions = {UpgradeCRDs: true}