Conversation
Pipeline controller notification: For optional jobs, comment ... This repository is configured in: automatic mode.

[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: gangwgr. The full list of commands accepted by this bot can be found here. Needs approval from an approver in each of these files. Approvers can indicate their approval by writing ...

/pipeline required

Scheduling required tests:

/retest

/test e2e-metal-ipi-ovn-ipv6

Scheduling required tests:

/retest-required

Job Failure Risk Analysis for sha: 21e1845

Scheduling required tests:

/retest-required
Walkthrough

Adds a new end-to-end TLS ObservedConfig test suite: registers the suite, imports the TLS extended tests, adds OWNERS, and implements tests validating operator observedConfig, deployment TLS env vars, and wire-level TLS behavior, including a TLS profile change/restore flow.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Test as Test Suite
    participant DynClient as Dynamic Client
    participant ApiServer as Cluster APIServer
    participant Deployment as Target Deployment
    participant ClusterOp as ClusterOperator
    Test->>DynClient: Fetch operator config (observedConfig)
    DynClient-->>Test: observedConfig (minTLSVersion, cipherSuites)
    Test->>Deployment: Inspect container env vars
    Deployment-->>Test: Env var values
    Test->>Test: Compare env vars vs observedConfig/profile
    Test->>Deployment: Port-forward to service and perform TLS handshake checks
    Deployment-->>Test: TLS handshake results
    Test->>ApiServer: Update TLS profile (Modern) [config-change test]
    ApiServer-->>Test: Profile updated
    Test->>ClusterOp: Wait for related ClusterOperators to stabilize
    ClusterOp-->>Test: Stable
    Test->>ApiServer: Restore original TLS profile
    ApiServer-->>Test: Profile restored
    Test->>ClusterOp: Wait for stabilization
    ClusterOp-->>Test: Stable
```

```mermaid
sequenceDiagram
    participant Test as Test Suite
    participant Validator as Validator
    participant RolloutMgr as Rollout Watcher
    participant ClusterOp as ClusterOperators
    Test->>Validator: Trigger per-target validations (observedConfig, env, wire)
    Validator-->>Test: Validation results
    Test->>RolloutMgr: Initiate rollout waits after profile change
    RolloutMgr->>ClusterOp: Poll cluster operator statuses
    ClusterOp-->>RolloutMgr: Report rollout progress/completion
    RolloutMgr-->>Test: All relevant operators stabilized
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~65 minutes

🚥 Pre-merge checks: ✅ 4 passed | ❌ 1 failed (1 warning)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it is a critical failure.

🔧 golangci-lint (2.5.0)
Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@test/extended/tls/tls_observed_config.go`:
- Around line 702-712: The TLS probes use tls.Dial (in the block handling
host/shouldWork and host/shouldNotWork) which can hang; replace calls to
tls.Dial with tls.DialWithDialer using a net.Dialer that sets a reasonable
Timeout (e.g., a few seconds) so both the successful probe and the
rejected-version probe use a dialer with explicit handshake timeout, preserve
the same error handling around conn, ConnectionState(), and conn.Close(), and
ensure the dialer timeout variable is reused for both DialWithDialer invocations
to avoid duplicated magic values.
- Around line 507-510: The namespace lookup currently treats any error as "not
found" and skips the test; change the error handling after
oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace,
metav1.GetOptions{}) to import k8s.io/apimachinery/pkg/api/errors as kapierrors
and only call g.Skip when kapierrors.IsNotFound(err) is true, otherwise fail the
test (e.g., call the test failure helper you use in this file) so
auth/transport/transient errors are not masked; apply the same change to the
second occurrence of the same oc.AdminKubeClient().CoreV1().Namespaces().Get
call in this file.
- Around line 654-676: Replace the unreliable stdout check after starting
port-forward (where e2e.StartCmdAndStreamOutput is used and
readPartialFrom(stdout, 1024) is called) with an active TCP readiness probe:
repeatedly attempt to open a TCP connection to 127.0.0.1:localPort until success
or a short deadline elapses, then proceed to call toExecute(localPort). Use the
existing ctx/cancel or a separate short timeout for the probe, keep defers for
stdout/stderr and TryKill(cmd), and ensure probe retries/backoff are short so
the test fails fast if the tunnel never becomes reachable.
ℹ️ Review info
Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge base: Disabled due to data retention organization setting
📒 Files selected for processing (4)

- pkg/testsuites/standard_suites.go
- test/extended/include.go
- test/extended/tls/OWNERS
- test/extended/tls/tls_observed_config.go
```go
_, err := oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
if err != nil {
	g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
}
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's check the file structure and locate the file
find . -name "tls_observed_config.go" -type f
```

Repository: openshift/origin
Length of output: 102

🏁 Script executed:

```shell
# Get the file size to determine how to read it
wc -l test/extended/tls/tls_observed_config.go
```

Repository: openshift/origin
Length of output: 104

🏁 Script executed:

```shell
# Read the specific lines around 507-510
sed -n '500,515p' test/extended/tls/tls_observed_config.go
```

Repository: openshift/origin
Length of output: 869

🏁 Script executed:

```shell
# Read the specific lines around 580-583
sed -n '575,590p' test/extended/tls/tls_observed_config.go
```

Repository: openshift/origin
Length of output: 902

🏁 Script executed:

```shell
# Check the imports section of the file
head -50 test/extended/tls/tls_observed_config.go
```

Repository: openshift/origin
Length of output: 1905

🏁 Script executed:

```shell
# Search for all instances of namespace Get() calls to find similar patterns
rg 'Namespaces\(\)\.Get' test/extended/tls/tls_observed_config.go -A 3
```

Repository: openshift/origin
Length of output: 459

🏁 Script executed:

```shell
# Search for kapierrors usage in the codebase
rg "kapierrors" test/extended/ -A 2 | head -30
```

Repository: openshift/origin
Length of output: 2263

🏁 Script executed:

```shell
# Search for IsNotFound usage to see the common pattern
rg "IsNotFound" test/extended/ -B 1 -A 1 | head -40
```

Repository: openshift/origin
Length of output: 3028

🏁 Script executed:

```shell
# Check what error packages are imported in similar test files
rg 'apimachinery/pkg/api/errors' test/extended/ | head -10
```

Repository: openshift/origin
Length of output: 999
Skip only on NotFound errors; fail on other namespace lookup failures.
Current logic treats any API error (including auth failures, transient errors, etc.) as "namespace doesn't exist", which masks real failures and can cause false-green test runs. Restrict the skip to NotFound errors only, and fail on other errors.
This issue appears at both lines 507-510 and 580-583. Add the kapierrors import and use kapierrors.IsNotFound(err) to distinguish between legitimate NotFound scenarios and actual failures:
Fix pattern:

```diff
 import (
 	"context"
 	"crypto/tls"
 	"encoding/json"
 	"fmt"
 	"io"
 	"math/rand"
 	"os/exec"
 	"strings"
 	"time"

 	g "github.com/onsi/ginkgo/v2"
 	o "github.com/onsi/gomega"

 	corev1 "k8s.io/api/core/v1"
+	kapierrors "k8s.io/apimachinery/pkg/api/errors"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/util/wait"
 	"k8s.io/client-go/util/retry"
```

```diff
 _, err := oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
-if err != nil {
+if kapierrors.IsNotFound(err) {
 	g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
 }
+o.Expect(err).NotTo(o.HaveOccurred(), fmt.Sprintf("failed to query namespace %s", t.namespace))
```

Apply the same change at lines 580-583.
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```go
_, err := oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
if kapierrors.IsNotFound(err) {
	g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
}
o.Expect(err).NotTo(o.HaveOccurred(), fmt.Sprintf("failed to query namespace %s", t.namespace))
```
```go
conn, err := tls.Dial("tcp", host, shouldWork)
if err != nil {
	return fmt.Errorf("svc/%s in %s: TLS connection that should work failed: %w", t.serviceName, t.namespace, err)
}
negotiated := conn.ConnectionState().Version
conn.Close()
e2e.Logf("svc/%s in %s: TLS connection succeeded, negotiated version: 0x%04x", t.serviceName, t.namespace, negotiated)

// Test that the version that should not work is rejected.
conn, err = tls.Dial("tcp", host, shouldNotWork)
if err == nil {
```
🧩 Analysis chain
🏁 Script executed:

```shell
# First, let's verify the file exists and check the exact content at the specified lines
fd tls_observed_config.go
```

Repository: openshift/origin
Length of output: 100

🏁 Script executed:

```shell
# Once we locate the file, examine the code around lines 702-712
cat -n test/extended/tls/tls_observed_config.go | sed -n '695,720p'
```

Repository: openshift/origin
Length of output: 1505

🏁 Script executed:

```shell
# Also check for the function signature to understand the context better
cat -n test/extended/tls/tls_observed_config.go | sed -n '680,715p'
```

Repository: openshift/origin
Length of output: 1707

🏁 Script executed:

```shell
# Check for other tls.Dial usages in this file to see the pattern
rg "tls\.Dial" test/extended/tls/tls_observed_config.go -n -B 2 -A 2
```

Repository: openshift/origin
Length of output: 470

🏁 Script executed:

```shell
# Check if there are any timeout or context handling patterns in this file
rg "context\.|timeout|Timeout|Deadline" test/extended/tls/tls_observed_config.go -n
```

Repository: openshift/origin
Length of output: 860

🏁 Script executed:

```shell
# Check for imports in the file to see what's already imported
head -50 test/extended/tls/tls_observed_config.go
```

Repository: openshift/origin
Length of output: 1905
Add explicit handshake timeouts for TLS probes.
tls.Dial without deadlines can hang under network edge cases and block test progression in the e2e suite. Use tls.DialWithDialer with a timeout to ensure reliable test execution.
🔧 Proposed fix

```diff
 import (
 	"context"
 	"crypto/tls"
 	"encoding/json"
 	"fmt"
 	"io"
 	"math/rand"
+	"net"
 	"os/exec"
 	"strings"
 	"time"
@@
 func checkTLSConnection(localPort int, shouldWork, shouldNotWork *tls.Config, t tlsTarget) error {
 	host := fmt.Sprintf("localhost:%d", localPort)
+	dialer := &net.Dialer{Timeout: 5 * time.Second}
 	e2e.Logf("Testing TLS behavior against %s (forwarded from svc/%s in %s)", host, t.serviceName, t.namespace)
 	// Test that the expected TLS version works.
-	conn, err := tls.Dial("tcp", host, shouldWork)
+	conn, err := tls.DialWithDialer(dialer, "tcp", host, shouldWork)
@@
 	// Test that the version that should not work is rejected.
-	conn, err = tls.Dial("tcp", host, shouldNotWork)
+	conn, err = tls.DialWithDialer(dialer, "tcp", host, shouldNotWork)
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
conn, err := tls.DialWithDialer(dialer, "tcp", host, shouldWork)
if err != nil {
	return fmt.Errorf("svc/%s in %s: TLS connection that should work failed: %w", t.serviceName, t.namespace, err)
}
negotiated := conn.ConnectionState().Version
conn.Close()
e2e.Logf("svc/%s in %s: TLS connection succeeded, negotiated version: 0x%04x", t.serviceName, t.namespace, negotiated)

// Test that the version that should not work is rejected.
conn, err = tls.DialWithDialer(dialer, "tcp", host, shouldNotWork)
if err == nil {
```
Scheduling required tests:

/testwith openshift/origin/main/e2e-metal-ipi-serial-ovn-ipv6 openshift/cluster-openshift-apiserver-operator#657 openshift/cluster-image-registry-operator#1297 openshift/cluster-version-operator#1322

@gangwgr,
♻️ Duplicate comments (4)
test/extended/tls/tls_observed_config.go (4)
698-755: ⚠️ Potential issue | 🟠 Major

Add explicit handshake timeouts for TLS dial operations.

tls.Dial without deadlines can hang indefinitely under network edge cases, blocking test progression. This was flagged in a previous review. Use tls.DialWithDialer with a timeout dialer.

🔧 Proposed fix

```diff
 func checkTLSConnection(localPort int, shouldWork, shouldNotWork *tls.Config, t tlsTarget) error {
 	// Try both IPv4 and IPv6 localhost addresses for dual-stack support.
 	hosts := []string{
 		fmt.Sprintf("127.0.0.1:%d", localPort), // IPv4
 		fmt.Sprintf("[::1]:%d", localPort),     // IPv6
 	}
 	var host string
 	var conn *tls.Conn
 	var err error
+	dialer := &net.Dialer{Timeout: 5 * time.Second}
 	// Find a working host (try IPv4 first, then IPv6).
 	for _, h := range hosts {
-		conn, err = tls.Dial("tcp", h, shouldWork)
+		conn, err = tls.DialWithDialer(dialer, "tcp", h, shouldWork)
 		if err == nil {
 			host = h
 			break
 		}
 		// ... error handling ...
 	}
 	// ...
 	// Test that the version that should not work is rejected.
-	conn, err = tls.Dial("tcp", host, shouldNotWork)
+	conn, err = tls.DialWithDialer(dialer, "tcp", host, shouldNotWork)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/tls/tls_observed_config.go` around lines 698 - 755, The TLS dials in checkTLSConnection can hang; change both tls.Dial calls to use tls.DialWithDialer with a net.Dialer that sets a timeout (e.g. Dialer{Timeout: 10*time.Second}) so the initial probe (inside the hosts loop using shouldWork) and the subsequent negative probe (using shouldNotWork) will fail fast instead of blocking indefinitely; locate tls.Dial invocations in checkTLSConnection and replace them with tls.DialWithDialer(&net.Dialer{Timeout: <reasonable duration>}, "tcp", hostOrH, tlsConfig).
506-510: ⚠️ Potential issue | 🟠 Major

Skip only on NotFound errors; fail on other namespace lookup failures.

The current code treats any API error as "namespace doesn't exist" and skips, which masks real failures like auth errors or transient issues. This was flagged in a previous review.

🐛 Proposed fix

```diff
 g.By(fmt.Sprintf("verifying namespace %s exists", t.namespace))
 _, err := oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
-if err != nil {
+if kapierrors.IsNotFound(err) {
 	g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
 }
+o.Expect(err).NotTo(o.HaveOccurred(), fmt.Sprintf("failed to query namespace %s", t.namespace))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/tls/tls_observed_config.go` around lines 506 - 510, The code currently skips on any error from oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{}); change it to only skip when the error is a NotFound (use k8s api errors.IsNotFound(err) from "k8s.io/apimachinery/pkg/api/errors") and for any other non-nil error fail the test (use g.Fail or a test fatal with the error) so real API/auth/transient failures aren't masked; keep the existing g.Skip(fmt.Sprintf("Namespace %s does not exist...", t.namespace)) branch for IsNotFound and report the actual err in the failing branch.
648-683: ⚠️ Potential issue | 🟠 Major

Add active readiness probe for port-forward tunnel establishment.

The code starts oc port-forward and reads a chunk from stdout, but this doesn't guarantee the tunnel accepts connections. The 10-second context timeout also covers both tunnel setup and TLS test execution, which is too tight. This was flagged in a previous review.

🔧 Proposed fix with TCP readiness probe and extended timeout

```diff
 func forwardPortAndExecute(serviceName, namespace, remotePort string, toExecute func(localPort int) error) error {
 	var err error
 	for i := 0; i < 3; i++ {
 		if err = func() error {
-			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+			ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
 			defer cancel()
-			localPort := rand.Intn(65534-1025) + 1025
+			localPort := rand.Intn(64510) + 1025 // [1025, 65534]
 			args := []string{
 				"port-forward",
 				fmt.Sprintf("svc/%s", serviceName),
 				fmt.Sprintf("%d:%s", localPort, remotePort),
 				"-n", namespace,
 			}
 			cmd := exec.CommandContext(ctx, "oc", args...)
 			stdout, stderr, err := e2e.StartCmdAndStreamOutput(cmd)
 			if err != nil {
 				return fmt.Errorf("failed to start port-forward: %v", err)
 			}
 			defer stdout.Close()
 			defer stderr.Close()
 			defer e2e.TryKill(cmd)
-			// Give port-forward a moment to establish.
-			e2e.Logf("oc port-forward output: %s", readPartialFrom(stdout, 1024))
+			// Wait until the forwarded local port is reachable via an active TCP probe.
+			deadline := time.Now().Add(5 * time.Second)
+			for {
+				conn, dialErr := net.DialTimeout("tcp", fmt.Sprintf("127.0.0.1:%d", localPort), 500*time.Millisecond)
+				if dialErr == nil {
+					_ = conn.Close()
+					break
+				}
+				if time.Now().After(deadline) {
+					return fmt.Errorf("port-forward did not become ready for svc/%s in %s: %w", serviceName, namespace, dialErr)
+				}
+				time.Sleep(200 * time.Millisecond)
+			}
+			e2e.Logf("oc port-forward established to svc/%s in %s on local port %d", serviceName, namespace, localPort)
 			return toExecute(localPort)
 		}(); err == nil {
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/tls/tls_observed_config.go` around lines 648 - 683, forwardPortAndExecute currently starts oc port-forward with a single 10s context and only reads stdout (via readPartialFrom/StartCmdAndStreamOutput), which doesn't guarantee the tunnel accepts connections; change it to create a dedicated, longer context for the port-forward command (e.g., 30s) and perform an active TCP readiness probe against localhost:localPort (using net.Dial or net.DialTimeout) with several short retries and backoff to confirm the tunnel is accepting connections before calling toExecute; ensure you still defer stdout.Close(), stderr.Close(), and e2e.TryKill(cmd) on failure, and separate the port-forward lifecycle from the test execution (i.e., do not have the port-forward context encompass the entire toExecute run).
579-583: ⚠️ Potential issue | 🟠 Major

Same issue: skip only on NotFound errors.

This has the same error handling issue as testDeploymentTLSEnvVars: any error causes a skip rather than only NotFound errors.

🐛 Proposed fix

```diff
 g.By("verifying namespace exists")
 _, err = oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
-if err != nil {
+if kapierrors.IsNotFound(err) {
 	g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
 }
+o.Expect(err).NotTo(o.HaveOccurred(), fmt.Sprintf("failed to query namespace %s", t.namespace))
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@test/extended/tls/tls_observed_config.go` around lines 579 - 583, The namespace existence check currently skips the test for any Get error; change it to only skip when the error is a NotFound error by using k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) against the result of oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{}), i.e. if apierrors.IsNotFound(err) then g.Skip(...) else if err != nil then fail the test (return or call the test failure helper used in these tests) so genuine API errors surface instead of being treated as missing namespace.
🚧 Files skipped from review as they are similar to previous changes (2)
- test/extended/include.go
- test/extended/tls/OWNERS
Scheduling required tests:

/testwith openshift/origin/main/e2e-metal-ipi-serial-ovn-ipv6 openshift/cluster-openshift-apiserver-operator#657 openshift/cluster-image-registry-operator#1297 openshift/cluster-version-operator#1322

/testwith openshift/origin/master/e2e-metal-ipi-serial-ovn-ipv6 openshift/cluster-openshift-apiserver-operator#657 openshift/cluster-image-registry-operator#1297 openshift/cluster-version-operator#1322

@gangwgr,

@gangwgr,

Job Failure Risk Analysis for sha: b0a3132

/testwith openshift/origin/main/e2e-metal-ipi-serial-ovn-ipv6 openshift/cluster-openshift-apiserver-operator#657 openshift/cluster-image-registry-operator#1297 openshift/cluster-version-operator#1322

@gangwgr,
🧹 Nitpick comments (3)
test/extended/tls/tls_observed_config.go (3)
202-206: Misplaced GinkgoRecover() call.

g.GinkgoRecover() is intended to be called in goroutines spawned within test code to catch panics and report them to Ginkgo. Placing it at the Describe block level with defer has no effect, since the Describe function itself doesn't run in a goroutine that needs panic recovery; Ginkgo already handles panics in test nodes. Remove this line as it serves no purpose here.

🧹 Proposed fix

```diff
 var _ = g.Describe("[sig-api-machinery][Feature:TLSObservedConfig][Suite:openshift/tls-observed-config][Serial]", func() {
-	defer g.GinkgoRecover()
-
 	oc := exutil.NewCLI("tls-observed-config")
```
Verify each finding against the current code and only fix it if needed. In `@test/extended/tls/tls_observed_config.go` around lines 202 - 206, Remove the misplaced defer g.GinkgoRecover() call from the Describe block in tls_observed_config.go; specifically, delete the line calling g.GinkgoRecover() immediately inside the Describe anonymous function (around the top of the Describe that defines the "[sig-api-machinery]...TLSObservedConfig" suite) because Ginkgo handles panic recovery for Describe-level nodes and GinkgoRecover is only necessary inside spawned goroutines.
655-657: Consider seeding random or using ephemeral port range.

The random port selection uses math/rand without explicit seeding (Go 1.20+ auto-seeds, but older versions don't). Also, ports 1025-49151 include registered ports that may be in use. Consider using the ephemeral range (49152-65535) to reduce collision likelihood.

🧹 Proposed improvement

```diff
-localPort := rand.Intn(65534-1025) + 1025
+// Use ephemeral port range (49152-65535) to reduce collision with registered services
+localPort := rand.Intn(65535-49152) + 49152
```
Verify each finding against the current code and only fix it if needed. In `@test/extended/tls/tls_observed_config.go` around lines 655 - 657, The test picks localPort via rand.Intn without seeding and from a broad range; update the selection for stability by either seeding math/rand (e.g., seed once with time.Now().UnixNano()) or—preferably—choose from the IANA ephemeral port range (49152-65535) to avoid registered ports, and use the localPort variable (where it's assigned) accordingly; alternatively consider binding to port 0 (letting the OS pick) where applicable to eliminate collisions.
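The prompt's alternative of binding to port 0 and letting the OS pick can be sketched as below. Note the caveat for the port-forward case: oc port-forward needs a concrete port number, so a bind-and-release approach still leaves a small window in which another process could grab the port:

```go
package main

import (
	"fmt"
	"net"
)

// freePort asks the OS for an unused port by binding to port 0,
// the collision-free alternative to picking one with rand.Intn.
// The port is released again before returning, so a brief race
// window remains if another process binds it first.
func freePort() (int, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return 0, err
	}
	defer ln.Close()
	return ln.Addr().(*net.TCPAddr).Port, nil
}

func main() {
	p, err := freePort()
	if err != nil {
		panic(err)
	}
	// The OS typically assigns from its ephemeral range.
	fmt.Println("OS-assigned port:", p)
}
```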
527-529: Consider checking all containers for TLS env vars.

The code only inspects Containers[0], assuming TLS env vars are always in the first container. If container ordering changes or sidecars are added, this could miss the relevant env vars. Consider iterating over all containers or adding a comment documenting this assumption.
Verify each finding against the current code and only fix it if needed. In `@test/extended/tls/tls_observed_config.go` around lines 527 - 529, The test currently only inspects deployment.Spec.Template.Spec.Containers[0] for TLS env vars which can miss values in other containers or sidecars; update the check that uses deployment.Spec.Template.Spec.Containers, envToMap and logEnvVars to iterate over all containers (or merge each container's env into a combined map) and verify t.tlsMinVersionEnvVar in every container's env set, ensuring you call logEnvVars for each container (or once for the merged map) so no container is skipped.
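The all-containers check can be sketched with simplified stand-in types; the real code would walk deployment.Spec.Template.Spec.Containers from the corev1 API, and the TLS_MIN_VERSION name below is hypothetical (the source refers to it as t.tlsMinVersionEnvVar):

```go
package main

import "fmt"

// Simplified stand-ins for the corev1 Container/EnvVar types.
type EnvVar struct{ Name, Value string }
type Container struct {
	Name string
	Env  []EnvVar
}

// envByContainer builds a name->value map per container so a check
// like "every container sets the TLS min-version variable" can
// iterate all containers instead of only Containers[0].
func envByContainer(containers []Container) map[string]map[string]string {
	out := make(map[string]map[string]string, len(containers))
	for _, c := range containers {
		m := make(map[string]string, len(c.Env))
		for _, e := range c.Env {
			m[e.Name] = e.Value
		}
		out[c.Name] = m
	}
	return out
}

// missingEnv lists containers that do not define the given variable.
func missingEnv(containers []Container, name string) []string {
	var missing []string
	for c, env := range envByContainer(containers) {
		if _, ok := env[name]; !ok {
			missing = append(missing, c)
		}
	}
	return missing
}

func main() {
	containers := []Container{
		{Name: "apiserver", Env: []EnvVar{{Name: "TLS_MIN_VERSION", Value: "VersionTLS12"}}},
		{Name: "sidecar"},
	}
	fmt.Println(missingEnv(containers, "TLS_MIN_VERSION")) // [sidecar]
}
```

Whether every container must carry the variable, or only the main one, is a policy choice; if only the first container matters, a comment documenting that assumption would do.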
🚧 Files skipped from review as they are similar to previous changes (2)
- test/extended/include.go
- pkg/testsuites/standard_suites.go
Scheduling required tests:
@gangwgr: The following tests failed.

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
Test run: https://privatebin.corp.redhat.com/?b0f760b57d709f07#F2u5YDsSTXrxCVWyKze9Jj4bP424HE2ZfhR6HhEFRJ2Y
Summary by CodeRabbit
Tests
Chores