NE-2453: Implement exec_dns_in_pod diagnostic tool#193
NE-2453: Implement exec_dns_in_pod diagnostic tool#193openshift-merge-bot[bot] merged 3 commits intoopenshift:mainfrom
Conversation
This PR implements in-cluster DNS probing via ephemeral pods: `exec_dns_in_pod` (NE-2453) - Creates a temporary pod using network-tools-rhel9 image to run dig - Waits for pod completion, retrieves logs, and cleans up - Uses podExecutor interface for testability (mock injection) - Returns structured JSON: pod_name, output (dig output), phase - 11 unit tests covering success, validation, lifecycle failures Fixes NE-2453: https://issues.redhat.com/browse/NE-2453
|
@bentito: This pull request references NE-2453 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
📝 WalkthroughWalkthroughAdds a new server tool that runs Changes
Sequence Diagram(s)sequenceDiagram
participant Client as Client
participant Tool as DNS-in-Pod Tool
participant Exec as podExecutor
participant K8s as Kubernetes API
participant Pod as Ephemeral Pod
Client->>Tool: invoke tool (namespace, target_server, target_name, record_type)
Tool->>Tool: validate inputs
Tool->>Exec: Create pod (dig command)
Exec->>K8s: POST pod spec
K8s->>Pod: instantiate pod
Tool->>Exec: wait for completion (poll)
Pod->>Pod: run /usr/bin/dig '@server' name type
K8s-->>Exec: pod phase (Succeeded/Failed)
Exec-->>Tool: return phase
Tool->>Exec: fetch logs
Exec->>K8s: GET pod/logs
K8s-->>Exec: logs (output)
Exec-->>Tool: return logs
Tool->>Exec: delete pod (deferred cleanup with separate timeout)
Exec->>K8s: DELETE pod
Tool->>Client: return ExecDNSResult{PodName, Output, Phase}
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
@bentito: This pull request references NE-2453 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.22.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
/jira refresh |
|
@bentito: This pull request references NE-2453 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/toolsets/netedge/exec_dns_in_pod.go`:
- Around line 129-135: The tool annotation for the "Exec DNS in Pod" tool
incorrectly marks the operation as idempotent; update the api.ToolAnnotations
block (the Annotations value for the tool whose Title is "Exec DNS in Pod") and
change IdempotentHint from ptr.To(true) to ptr.To(false) so the pod-creating
tool is not treated as idempotent.
- Around line 149-186: Validate and reject option-shaped operands before
building the pod command: check targetName and recordType (the variables in this
diff) and return an error if they start with '-' or '+' or '@' (use
strings.HasPrefix or equivalent), keeping the default "A" behavior for empty
recordType; also set the probe PodSpec's AutomountServiceAccountToken to false
to prevent mounting the service account token (update the
pod.Spec.AutomountServiceAccountToken field where the pod variable is
constructed), and add regression tests (e.g.,
TestExecDNSInPodRejectsOptionLikeOperands and
TestExecDNSInPodDisablesSAKubeTokenMount) to assert option-like inputs are
rejected and the SA token is not mounted; reference the dnsProbePodPrefix,
dnsProbeImage, exec/defaultPodExecutor variables to locate the relevant code.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: c98aa10f-b33d-46fa-8d00-eb718ba9824a
📒 Files selected for processing (3)
pkg/toolsets/netedge/exec_dns_in_pod.gopkg/toolsets/netedge/exec_dns_in_pod_test.gopkg/toolsets/netedge/toolset.go
|
/retest |
- Set IdempotentHint to false (pod creation is not idempotent) - Validate target_server is a valid IP address - Reject option-shaped target_name and record_type (prevent dig flag injection) - Disable AutomountServiceAccountToken on probe pod for security - Add regression tests for all validation and security hardening
|
@bentito: This pull request references NE-2453 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pkg/toolsets/netedge/exec_dns_in_pod.go (1)
23-23: Consider pinning the container image to a specific tag.Using an untagged image (
registry.redhat.io/openshift4/network-tools-rhel9) defaults to:latest, which can lead to inconsistent behavior if the image is updated. Consider pinning to a specific version tag for reproducibility.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pkg/toolsets/netedge/exec_dns_in_pod.go` at line 23, The dnsProbeImage constant currently points to an untagged image which defaults to :latest; update the dnsProbeImage constant to reference a specific semantic version tag or an immutable digest (e.g., registry.redhat.io/openshift4/network-tools-rhel9:<version> or `@sha256`:...) to ensure reproducible builds and stable behavior—change the value of dnsProbeImage accordingly wherever it is used.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pkg/toolsets/netedge/exec_dns_in_pod.go`:
- Line 23: The dnsProbeImage constant currently points to an untagged image
which defaults to :latest; update the dnsProbeImage constant to reference a
specific semantic version tag or an immutable digest (e.g.,
registry.redhat.io/openshift4/network-tools-rhel9:<version> or `@sha256`:...) to
ensure reproducible builds and stable behavior—change the value of dnsProbeImage
accordingly wherever it is used.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 03cdd8ed-90a7-4d89-be3d-1a013fd5e115
📒 Files selected for processing (2)
pkg/toolsets/netedge/exec_dns_in_pod.gopkg/toolsets/netedge/exec_dns_in_pod_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- pkg/toolsets/netedge/exec_dns_in_pod_test.go
|
@bentito: This pull request references NE-2453 which is a valid jira issue. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pkg/toolsets/netedge/exec_dns_in_pod.go`:
- Around line 67-70: The select branch treating any ctx.Done() as a timeout is
misleading; change it to inspect ctx.Err() and return a contextual error: if
ctx.Err() == context.Canceled return ctx.Err() (or wrap it) so callers see
cancellation, if ctx.Err() == context.DeadlineExceeded return the existing
"timed out waiting for pod %s/%s to complete" message, otherwise propagate/wrap
the ctx.Err(); update the select handling around ctx.Done() in
exec_dns_in_pod.go (the branch that currently returns fmt.Errorf("timed out
waiting for pod %s/%s to complete", namespace, name)) to use ctx.Err() and
distinct responses for cancellation vs deadline.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: d9541f4b-b9e3-42e3-b075-18ebf5eb2d69
📒 Files selected for processing (2)
pkg/toolsets/netedge/exec_dns_in_pod.gopkg/toolsets/netedge/exec_dns_in_pod_test.go
| for { | ||
| select { | ||
| case <-ctx.Done(): | ||
| return nil, fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace, name) |
There was a problem hiding this comment.
Preserve cancellation vs timeout here.
Line 69 currently turns any ctx.Done() into a timeout. If the parent request context is canceled, callers will get a misleading "timed out waiting" error instead of the real cancellation cause.
🩹 Proposed fix
import (
"bytes"
"context"
"encoding/json"
+ "errors"
"fmt"
"io"
"net"
"strings"
"time"
@@
select {
case <-ctx.Done():
- return nil, fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace, name)
+ if err := ctx.Err(); err != nil && !errors.Is(err, context.DeadlineExceeded) {
+ return nil, fmt.Errorf("waiting for pod %s/%s was cancelled: %w", namespace, name, err)
+ }
+ return nil, fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace, name)
case <-ticker.C:📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| for { | |
| select { | |
| case <-ctx.Done(): | |
| return nil, fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace, name) | |
| for { | |
| select { | |
| case <-ctx.Done(): | |
| if err := ctx.Err(); err != nil && !errors.Is(err, context.DeadlineExceeded) { | |
| return nil, fmt.Errorf("waiting for pod %s/%s was cancelled: %w", namespace, name, err) | |
| } | |
| return nil, fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace, name) | |
| case <-ticker.C: |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pkg/toolsets/netedge/exec_dns_in_pod.go` around lines 67 - 70, The select
branch treating any ctx.Done() as a timeout is misleading; change it to inspect
ctx.Err() and return a contextual error: if ctx.Err() == context.Canceled return
ctx.Err() (or wrap it) so callers see cancellation, if ctx.Err() ==
context.DeadlineExceeded return the existing "timed out waiting for pod %s/%s to
complete" message, otherwise propagate/wrap the ctx.Err(); update the select
handling around ctx.Done() in exec_dns_in_pod.go (the branch that currently
returns fmt.Errorf("timed out waiting for pod %s/%s to complete", namespace,
name)) to use ctx.Err() and distinct responses for cancellation vs deadline.
|
@bentito: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bentito, matzew The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
/override "Konflux kflux-prd-rh02 / openshift-mcp-server-on-pull-request" |
|
@matzew: Overrode contexts on behalf of matzew: Konflux kflux-prd-rh02 / openshift-mcp-server-on-pull-request DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
e54ad8c
into
openshift:main
This PR implements in-cluster DNS probing via ephemeral pods:
exec_dns_in_pod(NE-2453)dig, verifying internal cluster networking and DNS path.client-goto create a Pod with theregistry.redhat.io/openshift4/network-tools-rhel9image.podExecutorinterface for testability (mock injection in tests, realkubernetes.Interfacein production).pod_name,output(raw dig output),phase(Succeeded/Failed).Fixes NE-2453: https://issues.redhat.com/browse/NE-2453
Summary by CodeRabbit