
NE-2453: Implement exec_dns_in_pod diagnostic tool#193

Merged
openshift-merge-bot[bot] merged 3 commits into openshift:main from bentito:feat/NE-2453-exec-dns-in-pod
Apr 13, 2026

Conversation


@bentito bentito commented Mar 25, 2026

This PR implements in-cluster DNS probing via ephemeral pods: exec_dns_in_pod (NE-2453)

  • Description: Spin up a temporary pod in the cluster to execute a DNS lookup using dig, verifying internal cluster networking and DNS path.
  • It uses client-go to create a Pod with the registry.redhat.io/openshift4/network-tools-rhel9 image.
  • It uses a podExecutor interface for testability (mock injection in tests, real kubernetes.Interface in production).
  • Pod lifecycle: create → wait for Succeeded/Failed → retrieve logs → delete (best-effort cleanup).
  • Returns structured JSON: pod_name, output (raw dig output), phase (Succeeded/Failed).
  • 11 unit tests covering success, parameter validation, pod lifecycle failures, and cleanup.

Fixes NE-2453: https://issues.redhat.com/browse/NE-2453

Summary by CodeRabbit

  • New Features
    • Added a DNS query tool that runs dig inside an ephemeral in-cluster pod to query a specified server/name (default record type A), returning pod name, query output, and pod status.
    • Input validation for server/name/record type and hardened probe execution with enforced resource/security settings and guaranteed cleanup attempts.
  • Tests
    • Added comprehensive tests covering success, validation errors, pod failures, log retrieval issues, and cleanup behavior.

This PR implements in-cluster DNS probing via ephemeral pods:
`exec_dns_in_pod` (NE-2453)

- Creates a temporary pod using network-tools-rhel9 image to run dig
- Waits for pod completion, retrieves logs, and cleans up
- Uses podExecutor interface for testability (mock injection)
- Returns structured JSON: pod_name, output (dig output), phase
- 11 unit tests covering success, validation, lifecycle failures

Fixes NE-2453: https://issues.redhat.com/browse/NE-2453
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Mar 25, 2026

openshift-ci-robot commented Mar 25, 2026

@bentito: This pull request references NE-2453 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR implements in-cluster DNS probing via ephemeral pods: exec_dns_in_pod (NE-2453)

  • Description: Spin up a temporary pod in the cluster to execute a DNS lookup using dig, verifying internal cluster networking and DNS path.
  • It uses client-go to create a Pod with the registry.redhat.io/openshift4/network-tools-rhel9 image.
  • It uses a podExecutor interface for testability (mock injection in tests, real kubernetes.Interface in production).
  • Pod lifecycle: create → wait for Succeeded/Failed → retrieve logs → delete (best-effort cleanup).
  • Returns structured JSON: pod_name, output (raw dig output), phase (Succeeded/Failed).
  • 11 unit tests covering success, parameter validation, pod lifecycle failures, and cleanup.

Fixes NE-2453: https://issues.redhat.com/browse/NE-2453

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


coderabbitai bot commented Mar 25, 2026

📝 Walkthrough

Walkthrough

Adds a new server tool that runs dig inside an ephemeral Kubernetes Pod. It validates inputs, manages the pod lifecycle (create/wait/logs/delete) via a podExecutor interface, returns a structured ExecDNSResult{PodName, Output, Phase}, and is registered in the netedge toolset with comprehensive unit tests.

Changes

  • DNS-in-Pod implementation — pkg/toolsets/netedge/exec_dns_in_pod.go: New server tool with input schema and runtime validation (namespace, target_server IP, target_name, record_type defaulting to "A"); builds a hardened Pod spec to run /usr/bin/dig @<server> <name> <type>; uses the podExecutor interface for create/wait/logs/delete; deferred cleanup with a 30s background timeout; returns ExecDNSResult.
  • DNS-in-Pod tests — pkg/toolsets/netedge/exec_dns_in_pod_test.go: New tests with mockPodExecutor covering success, validation errors, defaulting of record_type, pod creation/wait/log failures, log retrieval failure, cleanup failure, and assertions on the Pod spec (restart policy, container command/image/security/resources, automount token disabled).
  • Tool registration — pkg/toolsets/netedge/toolset.go: Registers initExecDNSInPod() into the netedge toolset's returned tools slice.
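A hardened probe Pod of the kind summarized above might look roughly like the following manifest. Every field value here is an illustrative assumption, not the exact spec built by exec_dns_in_pod.go:

```yaml
apiVersion: v1
kind: Pod
metadata:
  generateName: dns-probe-        # assumed prefix; the PR uses dnsProbePodPrefix
  namespace: <target-namespace>
spec:
  restartPolicy: Never            # run dig once, never restart
  automountServiceAccountToken: false   # probe needs no API credentials
  containers:
    - name: dig
      image: registry.redhat.io/openshift4/network-tools-rhel9
      command: ["/usr/bin/dig", "@10.0.0.10", "example.com", "A"]
      resources:
        limits:
          cpu: 100m               # illustrative limits; actual values may differ
          memory: 64Mi
      securityContext:
        allowPrivilegeEscalation: false
        runAsNonRoot: true
        capabilities:
          drop: ["ALL"]
```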

Sequence Diagram(s)

sequenceDiagram
    participant Client as Client
    participant Tool as DNS-in-Pod Tool
    participant Exec as podExecutor
    participant K8s as Kubernetes API
    participant Pod as Ephemeral Pod

    Client->>Tool: invoke tool (namespace, target_server, target_name, record_type)
    Tool->>Tool: validate inputs
    Tool->>Exec: Create pod (dig command)
    Exec->>K8s: POST pod spec
    K8s->>Pod: instantiate pod
    Tool->>Exec: wait for completion (poll)
    Pod->>Pod: run /usr/bin/dig '@server' name type
    K8s-->>Exec: pod phase (Succeeded/Failed)
    Exec-->>Tool: return phase
    Tool->>Exec: fetch logs
    Exec->>K8s: GET pod/logs
    K8s-->>Exec: logs (output)
    Exec-->>Tool: return logs
    Tool->>Exec: delete pod (deferred cleanup with separate timeout)
    Exec->>K8s: DELETE pod
    Tool->>Client: return ExecDNSResult{PodName, Output, Phase}

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I tunneled in code to run a small quest,
Spawned a pod that answered my test,
It dug up answers, then curled and fled,
Left its phase and output neatly spread —
Hopping off with a satisfied chest.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage — ⚠️ Warning: docstring coverage is 20.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions that are missing them.

✅ Passed checks (2 passed)

  • Description Check — ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check — ✅ Passed: the title clearly and specifically describes the main change (implementing an exec_dns_in_pod diagnostic tool), with the issue reference providing full context.


@openshift-ci openshift-ci bot requested review from Cali0707 and manusa March 25, 2026 19:44

openshift-ci-robot commented Mar 25, 2026

@bentito: This pull request references NE-2453 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set.

Details

In response to this:

This PR implements in-cluster DNS probing via ephemeral pods: exec_dns_in_pod (NE-2453)

  • Description: Spin up a temporary pod in the cluster to execute a DNS lookup using dig, verifying internal cluster networking and DNS path.
  • It uses client-go to create a Pod with the registry.redhat.io/openshift4/network-tools-rhel9 image.
  • It uses a podExecutor interface for testability (mock injection in tests, real kubernetes.Interface in production).
  • Pod lifecycle: create → wait for Succeeded/Failed → retrieve logs → delete (best-effort cleanup).
  • Returns structured JSON: pod_name, output (raw dig output), phase (Succeeded/Failed).
  • 11 unit tests covering success, parameter validation, pod lifecycle failures, and cleanup.

Fixes NE-2453: https://issues.redhat.com/browse/NE-2453

Summary by CodeRabbit

  • New Features
  • Added a DNS query execution tool for Kubernetes pods. Users can perform DNS lookups against specific servers and record types (defaulting to A records) within targeted namespaces. Results include the pod name, query output, and final pod status, enabling network diagnostics within cluster environments.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

Author

bentito commented Mar 25, 2026

/jira refresh


openshift-ci-robot commented Mar 25, 2026

@bentito: This pull request references NE-2453 which is a valid jira issue.

Details

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/toolsets/netedge/exec_dns_in_pod.go`:
- Around line 129-135: The tool annotation for the "Exec DNS in Pod" tool
incorrectly marks the operation as idempotent; update the api.ToolAnnotations
block (the Annotations value for the tool whose Title is "Exec DNS in Pod") and
change IdempotentHint from ptr.To(true) to ptr.To(false) so the pod-creating
tool is not treated as idempotent.
- Around line 149-186: Validate and reject option-shaped operands before
building the pod command: check targetName and recordType (the variables in this
diff) and return an error if they start with '-' or '+' or '@' (use
strings.HasPrefix or equivalent), keeping the default "A" behavior for empty
recordType; also set the probe PodSpec's AutomountServiceAccountToken to false
to prevent mounting the service account token (update the
pod.Spec.AutomountServiceAccountToken field where the pod variable is
constructed), and add regression tests (e.g.,
TestExecDNSInPodRejectsOptionLikeOperands and
TestExecDNSInPodDisablesSAKubeTokenMount) to assert option-like inputs are
rejected and the SA token is not mounted; reference the dnsProbePodPrefix,
dnsProbeImage, exec/defaultPodExecutor variables to locate the relevant code.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: c98aa10f-b33d-46fa-8d00-eb718ba9824a

📥 Commits

Reviewing files that changed from the base of the PR and between 4ef4ee5 and c3e7db8.

📒 Files selected for processing (3)
  • pkg/toolsets/netedge/exec_dns_in_pod.go
  • pkg/toolsets/netedge/exec_dns_in_pod_test.go
  • pkg/toolsets/netedge/toolset.go

Author

bentito commented Mar 25, 2026

/retest

- Set IdempotentHint to false (pod creation is not idempotent)
- Validate target_server is a valid IP address
- Reject option-shaped target_name and record_type (prevent dig flag injection)
- Disable AutomountServiceAccountToken on probe pod for security
- Add regression tests for all validation and security hardening
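The validation and injection hardening described in these bullets can be sketched as follows; validateProbeInputs and its error messages are hypothetical names for this sketch, not the PR's actual helpers:

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// validateProbeInputs mirrors the hardening in the commit above: the server
// must be a literal IP, and operands must not look like dig options so a
// caller cannot smuggle flags such as "+trace" or "-f" into the command.
func validateProbeInputs(server, name, recordType string) (string, error) {
	if net.ParseIP(server) == nil {
		return "", fmt.Errorf("target_server %q is not a valid IP address", server)
	}
	if recordType == "" {
		recordType = "A" // preserve the documented default
	}
	for _, operand := range []string{name, recordType} {
		for _, prefix := range []string{"-", "+", "@"} {
			if strings.HasPrefix(operand, prefix) {
				return "", fmt.Errorf("operand %q must not start with %q", operand, prefix)
			}
		}
	}
	return recordType, nil
}

func main() {
	if _, err := validateProbeInputs("10.0.0.10", "+trace", "A"); err != nil {
		fmt.Println("rejected:", err)
	}
	rt, _ := validateProbeInputs("10.0.0.10", "example.com", "")
	fmt.Println("defaulted record type:", rt)
}
```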

openshift-ci-robot commented Mar 25, 2026

@bentito: This pull request references NE-2453 which is a valid jira issue.

Details

In response to this:

This PR implements in-cluster DNS probing via ephemeral pods: exec_dns_in_pod (NE-2453)

  • Description: Spin up a temporary pod in the cluster to execute a DNS lookup using dig, verifying internal cluster networking and DNS path.
  • It uses client-go to create a Pod with the registry.redhat.io/openshift4/network-tools-rhel9 image.
  • It uses a podExecutor interface for testability (mock injection in tests, real kubernetes.Interface in production).
  • Pod lifecycle: create → wait for Succeeded/Failed → retrieve logs → delete (best-effort cleanup).
  • Returns structured JSON: pod_name, output (raw dig output), phase (Succeeded/Failed).
  • 11 unit tests covering success, parameter validation, pod lifecycle failures, and cleanup.

Fixes NE-2453: https://issues.redhat.com/browse/NE-2453

Summary by CodeRabbit

  • New Features
  • Added a DNS query tool that runs dig inside an ephemeral pod to query a specified server/name (default record type A), returning pod name, query output, and pod status for in-cluster network diagnostics.
  • Tests
  • Added comprehensive tests validating input handling, probe behavior, output reporting, and cleanup scenarios.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (1)
pkg/toolsets/netedge/exec_dns_in_pod.go (1)

23-23: Consider pinning the container image to a specific tag.

Using an untagged image (registry.redhat.io/openshift4/network-tools-rhel9) defaults to :latest, which can lead to inconsistent behavior if the image is updated. Consider pinning to a specific version tag for reproducibility.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/netedge/exec_dns_in_pod.go` at line 23, The dnsProbeImage
constant currently points to an untagged image which defaults to :latest; update
the dnsProbeImage constant to reference a specific semantic version tag or an
immutable digest (e.g.,
registry.redhat.io/openshift4/network-tools-rhel9:<version> or `@sha256`:...) to
ensure reproducible builds and stable behavior—change the value of dnsProbeImage
accordingly wherever it is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@pkg/toolsets/netedge/exec_dns_in_pod.go`:
- Line 23: The dnsProbeImage constant currently points to an untagged image
which defaults to :latest; update the dnsProbeImage constant to reference a
specific semantic version tag or an immutable digest (e.g.,
registry.redhat.io/openshift4/network-tools-rhel9:<version> or `@sha256`:...) to
ensure reproducible builds and stable behavior—change the value of dnsProbeImage
accordingly wherever it is used.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 03cdd8ed-90a7-4d89-be3d-1a013fd5e115

📥 Commits

Reviewing files that changed from the base of the PR and between c3e7db8 and 46d6e4c.

📒 Files selected for processing (2)
  • pkg/toolsets/netedge/exec_dns_in_pod.go
  • pkg/toolsets/netedge/exec_dns_in_pod_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • pkg/toolsets/netedge/exec_dns_in_pod_test.go


openshift-ci-robot commented Apr 10, 2026

@bentito: This pull request references NE-2453 which is a valid jira issue.

Details

In response to this:

This PR implements in-cluster DNS probing via ephemeral pods: exec_dns_in_pod (NE-2453)

  • Description: Spin up a temporary pod in the cluster to execute a DNS lookup using dig, verifying internal cluster networking and DNS path.
  • It uses client-go to create a Pod with the registry.redhat.io/openshift4/network-tools-rhel9 image.
  • It uses a podExecutor interface for testability (mock injection in tests, real kubernetes.Interface in production).
  • Pod lifecycle: create → wait for Succeeded/Failed → retrieve logs → delete (best-effort cleanup).
  • Returns structured JSON: pod_name, output (raw dig output), phase (Succeeded/Failed).
  • 11 unit tests covering success, parameter validation, pod lifecycle failures, and cleanup.

Fixes NE-2453: https://issues.redhat.com/browse/NE-2453

Summary by CodeRabbit

  • New Features
  • Added a DNS query tool that runs dig inside an ephemeral in-cluster pod to query a specified server/name (default record type A), returning pod name, query output, and pod status.
  • Input validation for server/name/record type and hardened probe execution with enforced resource/security settings and guaranteed cleanup attempts.
  • Tests
  • Added comprehensive tests covering success, validation errors, pod failures, log retrieval issues, and cleanup behavior.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@pkg/toolsets/netedge/exec_dns_in_pod.go`:
- Around line 67-70: The select branch treating any ctx.Done() as a timeout is
misleading; change it to inspect ctx.Err() and return a contextual error: if
ctx.Err() == context.Canceled return ctx.Err() (or wrap it) so callers see
cancellation, if ctx.Err() == context.DeadlineExceeded return the existing
"timed out waiting for pod %s/%s to complete" message, otherwise propagate/wrap
the ctx.Err(); update the select handling around ctx.Done() in
exec_dns_in_pod.go (the branch that currently returns fmt.Errorf("timed out
waiting for pod %s/%s to complete", namespace, name)) to use ctx.Err() and
distinct responses for cancellation vs deadline.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d9541f4b-b9e3-42e3-b075-18ebf5eb2d69

📥 Commits

Reviewing files that changed from the base of the PR and between 46d6e4c and e978858.

📒 Files selected for processing (2)
  • pkg/toolsets/netedge/exec_dns_in_pod.go
  • pkg/toolsets/netedge/exec_dns_in_pod_test.go

Comment on lines +67 to +70
	for {
		select {
		case <-ctx.Done():
			return nil, fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace, name)

⚠️ Potential issue | 🟡 Minor

Preserve cancellation vs timeout here.

Line 69 currently turns any ctx.Done() into a timeout. If the parent request context is canceled, callers will get a misleading "timed out waiting" error instead of the real cancellation cause.

🩹 Proposed fix
 import (
 	"bytes"
 	"context"
 	"encoding/json"
+	"errors"
 	"fmt"
 	"io"
 	"net"
 	"strings"
 	"time"
@@
 		select {
 		case <-ctx.Done():
-			return nil, fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace, name)
+			if err := ctx.Err(); err != nil && !errors.Is(err, context.DeadlineExceeded) {
+				return nil, fmt.Errorf("waiting for pod %s/%s was cancelled: %w", namespace, name, err)
+			}
+			return nil, fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace, name)
 		case <-ticker.C:
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

	for {
		select {
		case <-ctx.Done():
			return nil, fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace, name)

After:

	for {
		select {
		case <-ctx.Done():
			if err := ctx.Err(); err != nil && !errors.Is(err, context.DeadlineExceeded) {
				return nil, fmt.Errorf("waiting for pod %s/%s was cancelled: %w", namespace, name, err)
			}
			return nil, fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace, name)
		case <-ticker.C:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pkg/toolsets/netedge/exec_dns_in_pod.go` around lines 67 - 70, The select
branch treating any ctx.Done() as a timeout is misleading; change it to inspect
ctx.Err() and return a contextual error: if ctx.Err() == context.Canceled return
ctx.Err() (or wrap it) so callers see cancellation, if ctx.Err() ==
context.DeadlineExceeded return the existing "timed out waiting for pod %s/%s to
complete" message, otherwise propagate/wrap the ctx.Err(); update the select
handling around ctx.Done() in exec_dns_in_pod.go (the branch that currently
returns fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace,
name)) to use ctx.Err() and distinct responses for cancellation vs deadline.


openshift-ci bot commented Apr 10, 2026

@bentito: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Member

@matzew matzew left a comment


/lgtm
/approve

@openshift-ci openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Apr 13, 2026

openshift-ci bot commented Apr 13, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: bentito, matzew

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 13, 2026
Member

matzew commented Apr 13, 2026

/override "Konflux kflux-prd-rh02 / openshift-mcp-server-on-pull-request"


openshift-ci bot commented Apr 13, 2026

@matzew: Overrode contexts on behalf of matzew: Konflux kflux-prd-rh02 / openshift-mcp-server-on-pull-request

Details

In response to this:

/override "Konflux kflux-prd-rh02 / openshift-mcp-server-on-pull-request"

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@openshift-merge-bot openshift-merge-bot bot merged commit e54ad8c into openshift:main Apr 13, 2026
10 of 11 checks passed

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged.
