[Draft] Add kernel stack cost-per-packet metrics, nodeconfig collector, and A… #3555

Draft
midu16 wants to merge 1 commit into prometheus:master from midu16:master

midu16 commented Feb 24, 2026

…I-Helpers docs

  • Add ebpf-pmd-jitter collector (Linux): in-tree eBPF program (collector/bpf/latency.c) measures kernel stack packet latency (XDP→TC); exposes latency min/max/avg, jitter, histogram, and collector_up/load_error/object_path_configured. Disabled by default; requires built eBPF object and --collector.ebpf-pmd-jitter.object-path.
  • Add nodeconfig collector (Linux): runbook-oriented metrics from sysfs and DMI (PCIe NIC link width, slot ok, cores dedicated, memory banks full). Disabled by default.
  • Add cmd/kernel_stack_stress_server: TCP server for functional test (variable backlog, rcvbuf, read delay, hold connections).
  • Add kernel_stack_af_packet_functional_test.go: Linux-only root functional test (netns) for conntrack drops, listen overflow, TCPRcvQDrop, traffic+NUMA, traffic+pcap; preserves pcaps under /tmp/node_exporter_kernel_stack_pcaps_*.
  • Add docs/KERNEL_STACK_AF_PACKET_METRICS.md: full guide correlating metrics with cost per packet and AF_PACKET, optional collectors, examples, and functional test.


karampok left a comment

IMO this should be a PR with two commits (or two PRs):

  • one for nodeconfig_linux
  • one for the eBPF collector

unless there is a connection that I am missing.

For the commit/PR descriptions, explain why we need these metrics, for example:



  ## Summary
  <1-3 sentences: what this PR adds/changes and why>

  Closes #NNNN (if applicable)

  ## Motivation
  <2-4 sentences: the operational problem this solves,
  what users cannot do today, or what issue this addresses>

  ## Metrics

  | Metric | Type | Description |
  |--------|------|-------------|
  | `node_<subsystem>_<name>` | Gauge/Counter | ... |

  ## Implementation notes
  - Data source (procfs, sysfs, netlink, eBPF, etc.)
  - Disabled by default, enable with `--collector.<name>`
  - Build tag to exclude: `no<name>`
  - Graceful degradation (ErrNoData when source missing)
  - Cardinality bound
  - Dependency changes (if any)

  ## Testing
  - Unit tests added (`collector/<name>_test.go`)
  - Fixture file (`collector/fixtures/...`)
  - e2e golden output updated (if applicable)
  - Manual validation on real hardware (if applicable)

  ## Example output
  ```text
  # HELP node_<metric> ...
  # TYPE node_<metric> gauge
  node_<metric>{label="value"} 42
  ```


How should I view this file? What is Excalidraw (a diagram, I suppose)?
You could either include it as a PNG in the Markdown, or maybe it is not needed at all.

return &nodeconfigCollector{
	fs:     fs,
	logger: logger,
	pcieNICMinLinkWidthDesc: prometheus.NewDesc(

	),
	pcieSlotOkDesc: prometheus.NewDesc(
		prometheus.BuildFQName(namespace, nodeconfigSubsystem, "pcie_slot_ok"),
		"Whether PCIe slot/width is considered correct (1) or not (0). Derived from PCIe: 1 when minimum NIC link width >= 16, 0 otherwise. Absent if no network PCIe devices.",

I think what counts as OK or not OK should not be hardcoded in the metric. Metrics should only report the value.
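
To illustrate the point, here is a minimal sketch of a collector helper that reports the observed minimum link width and leaves the ">= 16" decision to an alert rule. It assumes the widths come from the sysfs `current_link_width` file of each NIC; the helper names and the metric name `node_nodeconfig_pcie_nic_min_link_width` are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLinkWidth parses the contents of a sysfs current_link_width file,
// which holds a single decimal lane count such as "16\n".
func parseLinkWidth(raw string) (int, error) {
	return strconv.Atoi(strings.TrimSpace(raw))
}

// minLinkWidth returns the smallest width in widths, and false when the
// slice is empty (no network PCIe devices found).
func minLinkWidth(widths []int) (int, bool) {
	if len(widths) == 0 {
		return 0, false
	}
	m := widths[0]
	for _, w := range widths[1:] {
		if w < m {
			m = w
		}
	}
	return m, true
}

func main() {
	// Hypothetical raw values, as they might be read from
	// /sys/bus/pci/devices/<addr>/current_link_width for each NIC.
	raws := []string{"16\n", "8\n"}
	var widths []int
	for _, r := range raws {
		w, err := parseLinkWidth(r)
		if err != nil {
			continue // skip unreadable devices instead of failing the scrape
		}
		widths = append(widths, w)
	}
	if m, ok := minLinkWidth(widths); ok {
		// Report the value only; the threshold policy lives in an alert rule.
		// The metric name here is illustrative, not the one used in the PR.
		fmt.Printf("node_nodeconfig_pcie_nic_min_link_width %d\n", m)
	}
}
```

With the raw width exposed, each operator can pick a threshold that matches their hardware instead of inheriting the exporter's notion of "ok".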

	),
	coresDedicatedDesc: prometheus.NewDesc(
		prometheus.BuildFQName(namespace, nodeconfigSubsystem, "cores_dedicated"),
		"Whether CPU cores are dedicated/isolated for workload (e.g. DPDK). 1 if at least one CPU is in /sys/devices/system/cpu/isolated, 0 otherwise.",

CPU metrics should already be gathered elsewhere; should this be added there?
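
For reference, the source file named in the help text uses the kernel CPU list format (for example `2-5,8`). Below is a minimal sketch of parsing it, wherever the metric ends up living. The function name `countCPUList` is hypothetical, and returning a count rather than a 0/1 flag is a choice made for this example, not something the PR does.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// countCPUList counts the CPUs in a kernel CPU list string such as "2-5,8".
// An empty string (no isolated CPUs) yields 0.
func countCPUList(raw string) (int, error) {
	raw = strings.TrimSpace(raw)
	if raw == "" {
		return 0, nil
	}
	total := 0
	for _, part := range strings.Split(raw, ",") {
		lo, hi, isRange := strings.Cut(part, "-")
		first, err := strconv.Atoi(lo)
		if err != nil {
			return 0, fmt.Errorf("invalid CPU %q: %w", lo, err)
		}
		last := first
		if isRange {
			if last, err = strconv.Atoi(hi); err != nil {
				return 0, fmt.Errorf("invalid CPU %q: %w", hi, err)
			}
		}
		if last < first {
			return 0, fmt.Errorf("invalid range %q", part)
		}
		total += last - first + 1
	}
	return total, nil
}

func main() {
	// Sample contents in the format of /sys/devices/system/cpu/isolated.
	n, err := countCPUList("2-5,8\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(n) // prints 5: CPUs 2, 3, 4, 5 and 8
}
```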

	),
	memoryBanksFullDesc: prometheus.NewDesc(
		prometheus.BuildFQName(namespace, nodeconfigSubsystem, "memory_banks_full"),
		"Whether memory channels/banks are fully populated (1) or not (0). Derived from DMI/SMBIOS: 1 when all memory device slots have a populated DIMM, 0 otherwise. Absent if DMI not available.",

That is brand new. It probably fits into a new collector, but with a different name (nodeconfig is a bit generic).

Do we need to bring that main command into the repository? Is it required for metrics collection?

Do we need that?

SuperQ (Member) commented Feb 25, 2026

eBPF requires privileges to use, which is against our contributing guidelines.

karampok commented Feb 25, 2026

@midu16 would you say this statement is accurate?

For CPUs that are isolated and pinned to a workload (e.g. DPDK), I want to measure
what percentage of their time is spent on kernel work (e.g. kernel networking), and
break that down by function type (e.g. NET_RX) to identify which kernel function is
consuming cycles on CPUs that should be dedicated to userspace packet processing.
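
As a rough, unprivileged approximation of the per-type breakdown described above, `/proc/softirqs` lists per-CPU event counts for each softirq type, including NET_RX. The sketch below parses that layout; note that it yields event counts, not time spent, so it only partly answers the question. The function name and the sample data are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// softirqCounts extracts the per-CPU counters for one softirq type
// (for example "NET_RX") from text in /proc/softirqs format.
// Column i corresponds to the i-th CPU named in the header row.
func softirqCounts(procSoftirqs, name string) ([]uint64, error) {
	for _, line := range strings.Split(procSoftirqs, "\n") {
		fields := strings.Fields(line)
		if len(fields) == 0 || fields[0] != name+":" {
			continue
		}
		counts := make([]uint64, 0, len(fields)-1)
		for _, f := range fields[1:] {
			v, err := strconv.ParseUint(f, 10, 64)
			if err != nil {
				return nil, fmt.Errorf("parse %q: %w", f, err)
			}
			counts = append(counts, v)
		}
		return counts, nil
	}
	return nil, fmt.Errorf("softirq %q not found", name)
}

func main() {
	// Hypothetical sample in /proc/softirqs layout for a 4-CPU host.
	sample := `                    CPU0       CPU1       CPU2       CPU3
          HI:          1          0          0          0
      NET_TX:        120          3          0          0
      NET_RX:      90210       4410          0         12
`
	counts, err := softirqCounts(sample, "NET_RX")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for i, c := range counts {
		fmt.Printf("column %d NET_RX=%d\n", i, c)
	}
}
```

A non-zero and growing NET_RX count on a CPU that is supposed to be isolated would be a first hint that kernel networking work is landing there.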

SuperQ (Member) commented Feb 25, 2026

I recommend looking into the ebpf_exporter. This is a general-use eBPF metrics collector that is more suited to this functionality.
