[SPARK-56845][K8S] Truncate ConfigMap names that exceed DNS subdomain limit #55874

Open
TongWei1105 wants to merge 1 commit into apache:master from TongWei1105:truncate-configmap-names

TongWei1105 commented May 14, 2026

What changes were proposed in this pull request?

Add a new overload KubernetesClientUtils.configMapName(prefix, suffix) that falls back to spark-<uniqueID><suffix> when prefix+suffix exceeds KUBERNETES_DNS_SUBDOMAIN_NAME_MAX_LENGTH (253), mirroring the
fallback strategy already used by KubernetesConf.driverServiceName.
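The fallback described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code in this PR: `ConfigMapNameSketch`, `MaxLength`, and `uniqueId()` are hypothetical stand-ins (the real helper uses `KUBERNETES_DNS_SUBDOMAIN_NAME_MAX_LENGTH` and Spark's own `uniqueID()`), and the warning is printed to stderr instead of going through Spark's logging.

```scala
import java.util.UUID

object ConfigMapNameSketch {
  // Kubernetes DNS subdomain names may be at most 253 characters long.
  val MaxLength = 253

  // Hypothetical stand-in for Spark's uniqueID(): any short random token works here.
  def uniqueId(): String = UUID.randomUUID().toString.replace("-", "").take(16)

  // Prefer prefix + suffix; if that is too long, fall back to spark-<id><suffix>.
  def configMapName(prefix: String, suffix: String): String = {
    val preferred = s"$prefix$suffix"
    if (preferred.length <= MaxLength) {
      preferred
    } else {
      val fallback = s"spark-${uniqueId()}$suffix"
      // Assumption: the real helper logs a warning; stderr is used to stay dependency-free.
      System.err.println(
        s"ConfigMap name of length ${preferred.length} exceeds $MaxLength, using $fallback")
      fallback
    }
  }
}
```

Under these assumptions, `ConfigMapNameSketch.configMapName("my-app", "-hadoop-config")` returns the preferred name unchanged, while a 250-character prefix yields a short `spark-...-hadoop-config` name.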

Migrate the three driver-side ConfigMap call sites to the new helper:

  • HadoopConfDriverFeatureStep (suffix -hadoop-config)
  • KerberosConfDriverFeatureStep (suffix -krb5-file)
  • PodTemplateConfigMapStep (suffix -driver-podspec-conf-map)

The two def newConfigMapName fields are converted to lazy val so the fallback's random uniqueID() is captured exactly once — otherwise the ConfigMap would be created with one name while the pod's volume
references another. lazy val (rather than val) avoids spuriously computing — and emitting a fallback warning for — a name that is never used (e.g. the step is constructed but no Hadoop/Kerberos conf is set).
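The difference between `def`, `val`, and `lazy val` matters here because the fallback name is not deterministic. A small sketch of the effect, with a counter standing in for the random `uniqueID()` so the behavior is reproducible (`NameCaptureSketch`, `StepWithDef`, and `StepWithLazyVal` are hypothetical names, not classes from this PR):

```scala
import java.util.concurrent.atomic.AtomicInteger

object NameCaptureSketch {
  // Counter standing in for a random uniqueID(): every call yields a new value.
  private val counter = new AtomicInteger(0)
  def nextId(): String = counter.incrementAndGet().toString
  def idsGenerated: Int = counter.get()

  // `def`: the body is re-evaluated on every access, so each read yields a new name.
  class StepWithDef {
    def newConfigMapName: String = s"spark-${nextId()}-hadoop-config"
  }

  // `lazy val`: evaluated at most once, on first access, and cached afterwards.
  class StepWithLazyVal {
    lazy val newConfigMapName: String = s"spark-${nextId()}-hadoop-config"
  }
}
```

With `StepWithDef`, the name used to create the ConfigMap and the name read later for the pod volume would differ. With `StepWithLazyVal`, both reads agree, and constructing the step without ever reading the name generates no ID at all, which is the property a plain `val` would lose.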

Note: this also changes the truncation behavior of the existing single-arg configMapName(prefix) (@Since("3.3.0")), which now delegates to the new overload. Spark's own callers
(configMapNameDriver/configMapNameExecutor) use short fixed prefixes (~22 chars) and never hit the fallback, so behavior for built-in callers is unchanged. External @DeveloperApi consumers passing very long
prefixes will see different — but safer, collision-free — names than before.

Why are the changes needed?

When spark.app.name is very long (>229 chars), the derived resourceNamePrefix plus a fixed suffix can exceed the Kubernetes DNS subdomain limit of 253 characters. The K8s API then rejects the ConfigMap with the error "must be no more than 253 characters", and driver submission fails. This PR makes the three driver-side ConfigMap names robust to long app names.
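As a rough check of the arithmetic, the prefix budget left by each suffix is 253 minus the suffix length. The sketch below (with the hypothetical name `PrefixBudgetSketch`) computes it for the three suffixes listed earlier; the 229 figure matches the budget left by the longest suffix, `-driver-podspec-conf-map`.

```scala
object PrefixBudgetSketch {
  // Kubernetes DNS subdomain names may be at most 253 characters long.
  val MaxLength = 253

  // The three fixed suffixes named in this PR description.
  val suffixes: Seq[String] =
    Seq("-hadoop-config", "-krb5-file", "-driver-podspec-conf-map")

  // Longest prefix that still fits when the given suffix is appended.
  def maxPrefixLength(suffix: String): Int = MaxLength - suffix.length

  // -hadoop-config -> 239, -krb5-file -> 243, -driver-podspec-conf-map -> 229
  val budgets: Map[String, Int] = suffixes.map(s => s -> maxPrefixLength(s)).toMap

  // The tightest budget across all three suffixes: 229.
  val tightest: Int = budgets.values.min
}
```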

Does this PR introduce any user-facing change?

Yes — driver submission with very long spark.app.name no longer fails. Submissions that previously failed will now succeed; the affected ConfigMaps will be created with names like
spark-<uniqueID>-hadoop-config instead. A warning is logged when the fallback is used.

For users of the public KubernetesClientUtils.configMapName(prefix) API: the truncation strategy for over-long prefixes changed from "take first N chars of prefix" to "fall back to spark-<uniqueID>-conf-map".
This avoids silent name collisions across applications that happened to share the first 244 chars of their prefix. Spark's own callers always use short prefixes and are unaffected.
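To illustrate the collision risk, here is a sketch of the "take first N chars of prefix" strategy as described above. It is reconstructed from that description, not copied from Spark's source, and `TruncationCollisionSketch` and `truncatedName` are hypothetical names.

```scala
object TruncationCollisionSketch {
  val MaxLength = 253
  val Suffix = "-conf-map"

  // Keep only as much of the prefix as fits: 253 - 9 = 244 characters.
  def truncatedName(prefix: String): String =
    s"${prefix.take(MaxLength - Suffix.length)}$Suffix"

  // Two distinct prefixes that share their first 244 characters.
  val shared: String = "a" * 244
  val nameOne: String = truncatedName(shared + "-app-one")
  val nameTwo: String = truncatedName(shared + "-app-two")

  // Both applications end up with the same ConfigMap name.
  val collides: Boolean = nameOne == nameTwo
}
```

Because the distinguishing part of each prefix lies beyond character 244, truncation discards it and both names come out identical. The `spark-<uniqueID>-conf-map` fallback avoids this by drawing a fresh ID per application.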

How was this patch tested?

Added unit tests:

  • KubernetesClientUtilsSuite: verifies the new helper returns prefix+suffix within the limit, falls back to spark-<id><suffix> when over the limit, and that the legacy single-arg overload still produces the
    -conf-map suffix.
  • HadoopConfDriverFeatureStepSuite, KerberosConfDriverFeatureStepSuite, PodTemplateConfigMapStepSuite: each adds a "very long resourceNamePrefix" case asserting (a) the resulting ConfigMap name is within
    the limit, and (b) the pod's volume references the exact same name as the created ConfigMap (regression guard for the def → lazy val change).

Was this patch authored or co-authored using generative AI tooling?

Yes. Generated-by: Claude Code (Opus 4.7)

… limit

When `spark.app.name` is very long (>229 chars), the derived
`resourceNamePrefix` plus a fixed suffix (e.g. `-hadoop-config`,
`-krb5-file`, `-driver-podspec-conf-map`) can exceed the Kubernetes DNS
subdomain 253-char limit. The K8s API then rejects the ConfigMap with
`must be no more than 253 characters`, failing driver submission.

Unify ConfigMap name construction through a single helper
`KubernetesClientUtils.configMapName(prefix, suffix)` that mirrors the
fallback strategy already used by `KubernetesConf.driverServiceName`:
when the preferred name is too long, fall back to
`spark-<uniqueID><suffix>`, which preserves uniqueness across
concurrent applications and keeps the name within the limit.

The three call sites (HadoopConfDriverFeatureStep,
KerberosConfDriverFeatureStep, PodTemplateConfigMapStep) are migrated
to the helper, and the two `def newConfigMapName` fields are converted
to `lazy val` so the fallback's `uniqueID()` is captured exactly once -
otherwise the ConfigMap would be created with one name while the pod
mounted another. `lazy val` (rather than `val`) avoids spuriously
computing - and emitting a fallback warning for - a name that is never
used (e.g. the step is constructed but no Hadoop/Kerberos conf is set).