fix: configure metrics-server for EKS host networking#284

Merged: timtalbot merged 2 commits into main from metrics-server-eks-fix on Apr 30, 2026

@timtalbot commented Apr 30, 2026

Summary

  • `kubectl top` fails on EKS workload clusters with "Metrics API not available" because metrics-server can't scrape kubelet metrics (connection timeout on port 10250)
  • Workload clusters use Calico CNI (overlay network at 172.16.0.0/16) instead of the AWS VPC-CNI, so pod traffic originates from overlay IPs that the node security group rules do not cover, and kubelet:10250 is unreachable from pod networking
  • Configures metrics-server Helm values on workload clusters to work around this:
    • `hostNetwork.enabled: true`: runs in the node network namespace, bypassing the Calico overlay
    • `containerPort: 4443`: avoids a port conflict with the kubelet, which is already bound to 10250 on the host
    • `--kubelet-preferred-address-types=InternalIP`: resolves nodes by internal IP on EKS
  • Control room clusters are unaffected: they use VPC-CNI, where pods share the node security groups, so metrics-server already works without hostNetwork
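The workload-cluster settings can be pictured as a small override layer on top of the chart defaults. The Python sketch below is illustrative only: it assumes the upstream metrics-server chart's value names (`hostNetwork.enabled`, `containerPort`, `args`) and assumes the kubelet flag is passed through `args`. The PR does not show its actual values, and `CHART_DEFAULTS`, `WORKLOAD_OVERRIDES`, and `merge_values` are hypothetical names. The merge follows Helm's rule that nested maps combine key by key while scalars and lists are replaced outright.

```python
import copy

# Assumed subset of the metrics-server chart defaults relevant to this change
# (hypothetical; check the chart's values.yaml for the real set).
CHART_DEFAULTS = {
    "hostNetwork": {"enabled": False},
    "containerPort": 10250,
    "args": [],
}

# Hypothetical override layer for workload clusters (Calico overlay).
WORKLOAD_OVERRIDES = {
    "hostNetwork": {"enabled": True},
    "containerPort": 4443,
    "args": ["--kubelet-preferred-address-types=InternalIP"],
}


def merge_values(base: dict, overrides: dict) -> dict:
    """Deep-merge overrides into base the way Helm coalesces values:
    nested maps merge key by key; scalars and lists are replaced."""
    merged = copy.deepcopy(base)  # never mutate the defaults
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_values(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged
```

For example, `merge_values(CHART_DEFAULTS, WORKLOAD_OVERRIDES)` yields `containerPort` 4443 with `hostNetwork.enabled` set to true, while `CHART_DEFAULTS` itself is left untouched.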

Test plan

  • Deployed to an EKS workload cluster and verified that `kubectl top pods` and `kubectl top nodes` return metrics
  • Verify the metrics-server pod is healthy after rollout to additional workload clusters

metrics-server was deployed with default Helm values, which fails on EKS
because pod networking cannot reach the kubelet metrics endpoint on port
10250. This configures:

- hostNetwork.enabled: true — runs in the node network namespace so
  metrics-server can reach kubelets directly
- containerPort: 4443 — avoids port conflict with the kubelet already
  bound to 10250 on the host
- --kubelet-preferred-address-types=InternalIP — resolves nodes by
  internal IP, which is more reliable on EKS
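The port conflict comes down to which network namespace the container binds in. A minimal Python check, assuming the kubelet listens on 10250 as described above; `host_port_conflict` is a hypothetical helper, not part of the chart or of this PR.

```python
KUBELET_PORT = 10250  # kubelet HTTPS port on each node, per the description


def host_port_conflict(host_network: bool, container_port: int,
                       kubelet_port: int = KUBELET_PORT) -> bool:
    """Return True if metrics-server would bind a port the kubelet owns.

    With hostNetwork the pod shares the node's network namespace, so its
    container port is bound directly on the host. Without hostNetwork the
    pod has its own namespace and reusing the same number is harmless.
    """
    return host_network and container_port == kubelet_port
```

Under these assumptions, the default port of 10250 only becomes a problem once hostNetwork is enabled, which is why the port moves to 4443 together with that switch.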

Applied to both the Go (HelmChart CR for workload clusters) and Python
(Pulumi Helm Release for control room) deployment paths. Enables
`kubectl top` on EKS clusters.

Control room clusters use the AWS VPC-CNI, where pod traffic originates
from node ENIs with the same security groups — so metrics-server can
reach kubelet:10250 without hostNetwork. Only workload clusters (which
use Calico overlay networking) need this fix.
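The split between cluster types can be expressed as a selection on the CNI. This Python sketch is illustrative only: the `Cni` enum and `metrics_server_overrides` function are hypothetical, and the override keys assume the upstream chart's value names. An empty result means the chart defaults are used unchanged, matching the control room case.

```python
from enum import Enum


class Cni(Enum):
    """CNI plugins per cluster type, as described in this PR."""
    AWS_VPC_CNI = "aws-vpc-cni"  # control room clusters
    CALICO = "calico"            # workload clusters


def metrics_server_overrides(cni: Cni) -> dict:
    """Return hypothetical metrics-server Helm value overrides for a CNI.

    VPC-CNI: pod traffic leaves through node ENIs carrying the node
    security groups, so chart defaults already reach kubelet:10250.
    Calico overlay: move metrics-server onto the host network instead.
    """
    if cni is Cni.CALICO:
        return {
            "hostNetwork": {"enabled": True},
            "containerPort": 4443,  # kubelet already holds 10250 on the host
            "args": ["--kubelet-preferred-address-types=InternalIP"],
        }
    return {}
```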
@timtalbot timtalbot marked this pull request as ready for review April 30, 2026 19:03
@timtalbot timtalbot requested a review from a team as a code owner April 30, 2026 19:03
@timtalbot timtalbot added this pull request to the merge queue Apr 30, 2026
Merged via the queue into main with commit c6d628f Apr 30, 2026
5 checks passed
@timtalbot timtalbot deleted the metrics-server-eks-fix branch April 30, 2026 19:15