[Draft] Add kernel stack cost-per-packet metrics, nodeconfig collector, and A… by midu16 · Pull Request #3555 · prometheus/node_exporter

midu16 · 2026-02-24T13:14:58Z

…I-Helpers docs

Add ebpf-pmd-jitter collector (Linux): in-tree eBPF program (collector/bpf/latency.c) measures kernel stack packet latency (XDP→TC); exposes latency min/max/avg, jitter, histogram, and collector_up/load_error/object_path_configured. Disabled by default; requires built eBPF object and --collector.ebpf-pmd-jitter.object-path.
Add nodeconfig collector (Linux): runbook-oriented metrics from sysfs and DMI (PCIe NIC link width, slot ok, cores dedicated, memory banks full). Disabled by default.
Add cmd/kernel_stack_stress_server: TCP server for functional test (variable backlog, rcvbuf, read delay, hold connections).
Add kernel_stack_af_packet_functional_test.go: Linux-only root functional test (netns) for conntrack drops, listen overflow, TCPRcvQDrop, traffic+NUMA, traffic+pcap; preserves pcaps under /tmp/node_exporter_kernel_stack_pcaps_*.
Add docs/KERNEL_STACK_AF_PACKET_METRICS.md: full guide correlating metrics with cost per packet and AF_PACKET, optional collectors, examples, and functional test.


karampok

IMO this should be one PR with two commits (or two separate PRs):

one for nodeconfig_linux
one for the eBPF collector
unless there is a connection between them that I am missing.

The commit/PR description should also explain why we need these metrics, for example:



  ## Summary
  <1-3 sentences: what this PR adds/changes and why>

  Closes #NNNN (if applicable)

  ## Motivation
  <2-4 sentences: the operational problem this solves,
  what users cannot do today, or what issue this addresses>

  ## Metrics

  | Metric | Type | Description |
  |--------|------|-------------|
  | `node_<subsystem>_<name>` | Gauge/Counter | ... |

  ## Implementation notes
  - Data source (procfs, sysfs, netlink, eBPF, etc.)
  - Disabled by default, enable with `--collector.<name>`
  - Build tag to exclude: `no<name>`
  - Graceful degradation (ErrNoData when source missing)
  - Cardinality bound
  - Dependency changes (if any)

  ## Testing
  - Unit tests added (`collector/<name>_test.go`)
  - Fixture file (`collector/fixtures/...`)
  - e2e golden output updated (if applicable)
  - Manual validation on real hardware (if applicable)

  ## Example output
  ```text
  # HELP node_<metric> ...
  # TYPE node_<metric> gauge
  node_<metric>{label="value"} 42
  ```

karampok · 2026-02-25T08:45:15Z

docs/node-exporter-new-features.excalidraw

How can I view this file? What is excalidraw (a diagram, I suppose)?
You could include it as a PNG in the markdown docs instead, or drop it if it is not needed.

karampok · 2026-02-25T10:16:42Z

collector/nodeconfig_linux.go

+	return &nodeconfigCollector{
+		fs:     fs,
+		logger: logger,
+		pcieNICMinLinkWidthDesc: prometheus.NewDesc(


There is already https://github.com/prometheus/node_exporter/blob/master/collector/pcidevice_linux.go. Should these metrics be added there instead?

karampok · 2026-02-25T10:17:32Z

collector/nodeconfig_linux.go

+		),
+		pcieSlotOkDesc: prometheus.NewDesc(
+			prometheus.BuildFQName(namespace, nodeconfigSubsystem, "pcie_slot_ok"),
+			"Whether PCIe slot/width is considered correct (1) or not (0). Derived from PCIe: 1 when minimum NIC link width >= 16, 0 otherwise. Absent if no network PCIe devices.",


I think what counts as OK or not OK should not be hardcoded in the metric. Metrics should only report the value.

karampok · 2026-02-25T10:18:33Z

collector/nodeconfig_linux.go

+		),
+		coresDedicatedDesc: prometheus.NewDesc(
+			prometheus.BuildFQName(namespace, nodeconfigSubsystem, "cores_dedicated"),
+			"Whether CPU cores are dedicated/isolated for workload (e.g. DPDK). 1 if at least one CPU is in /sys/devices/system/cpu/isolated, 0 otherwise.",


There should already be a collector gathering CPU metrics elsewhere. Should this be added there?

karampok · 2026-02-25T10:19:39Z

collector/nodeconfig_linux.go

+		),
+		memoryBanksFullDesc: prometheus.NewDesc(
+			prometheus.BuildFQName(namespace, nodeconfigSubsystem, "memory_banks_full"),
+			"Whether memory channels/banks are fully populated (1) or not (0). Derived from DMI/SMBIOS: 1 when all memory device slots have a populated DIMM, 0 otherwise. Absent if DMI not available.",


This one is brand new. It probably fits in a new collector, but under a different name (nodeconfig is a bit generic).

karampok · 2026-02-25T10:23:15Z

cmd/kernel_stack_stress_server/main.go

Do we need to add this main command to the repository? Is it required for the metrics collection?

karampok · 2026-02-25T10:24:29Z

kernel_stack_af_packet_functional_test.go

Do we need that?

SuperQ · 2026-02-25T10:29:46Z

eBPF requires privileges to use, which is against our contributing guidelines

karampok · 2026-02-25T13:42:51Z

@midu16 would you say this statement is accurate?

For CPUs that are isolated and pinned to a workload (e.g. DPDK), I want to measure
what percentage of their time is spent on kernel work (e.g. kernel networking), and
break that down by function type (e.g. NET_RX) to identify which kernel function is
consuming cycles on CPUs that should be dedicated to userspace packet processing.

SuperQ · 2026-02-25T16:14:21Z

I recommend looking into the ebpf_exporter. This is a general-use eBPF metrics collector that is more suited to this functionality.

midu16 wants to merge 1 commit into prometheus:master from midu16:master
