
TLS propagation test #30801

Open
gangwgr wants to merge 1 commit into openshift:main from gangwgr:tls-ci

Conversation

@gangwgr
Contributor

@gangwgr gangwgr commented Feb 19, 2026

Test run - https://privatebin.corp.redhat.com/?b0f760b57d709f07#F2u5YDsSTXrxCVWyKze9Jj4bP424HE2ZfhR6HhEFRJ2Y

Summary by CodeRabbit

  • Tests

    • Added a new TLS verification test suite validating TLS versions and cipher suites across services and namespaces.
    • Verifies operator-observed config and TLS-related environment variables propagate to deployments.
    • Adds wire-level TLS handshake tests and coverage for TLS profile transitions, including validating system stability after profile changes.
  • Chores

    • Registered the new TLS extended tests and added ownership metadata for review/approval.

@openshift-ci-robot

Pipeline controller notification
This repo is configured to use the pipeline controller. Second-stage tests will be triggered either automatically or after the lgtm label is added, depending on the repository configuration. The pipeline controller will automatically detect which contexts are required and will use /test Prow commands to trigger the second stage.

For optional jobs, comment /test ? to see a list of all defined jobs. To manually trigger all second-stage jobs, use the /pipeline required command.

This repository is configured in: automatic mode

@openshift-ci
Contributor

openshift-ci bot commented Feb 19, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: gangwgr
Once this PR has been reviewed and has the lgtm label, please assign dgoodwin for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details: Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gangwgr
Contributor Author

gangwgr commented Feb 19, 2026

/pipeline required

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@gangwgr
Contributor Author

gangwgr commented Feb 23, 2026

/retest

@gangwgr
Contributor Author

gangwgr commented Feb 23, 2026

/test e2e-metal-ipi-ovn-ipv6

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@gangwgr gangwgr changed the title from "[WIP] TLS propagation test" to "TLS propagation test" Feb 23, 2026
@gangwgr
Contributor Author

gangwgr commented Feb 23, 2026

/retest-required

@openshift-trt

openshift-trt bot commented Feb 24, 2026

Job Failure Risk Analysis for sha: 21e1845

Job Name | Failure Risk
pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6 | High
Failed test: [sig-cluster-lifecycle][OCPFeatureGate:ImageStreamImportMode] ClusterVersion API desired architecture should be valid when architecture is set in release payload metadata [apigroup:config.openshift.io] [Suite:openshift/conformance/parallel]
This test has passed 98.31% of 3130 runs on release 4.22 [Overall] in the last week.

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@gangwgr
Contributor Author

gangwgr commented Feb 25, 2026

/retest-required

@coderabbitai

coderabbitai bot commented Feb 26, 2026

Walkthrough

Adds a new end-to-end TLS ObservedConfig test suite: registers the suite, imports the TLS extended tests, adds OWNERS, and implements tests validating operator observedConfig, deployment TLS env-vars, and wire-level TLS behavior including a TLS profile change/restore flow.

Changes

Cohort / File(s) Summary
Test Suite Registration
pkg/testsuites/standard_suites.go, test/extended/include.go
Registers new openshift/tls-observed-config suite in staticSuites and adds a blank import for the TLS extended tests.
TLS Verification Tests
test/extended/tls/tls_observed_config.go
New Ginkgo test implementing per-target definitions and tests: operator observedConfig checks, deployment TLS env-var propagation checks, wire-level TLS handshakes via port-forward, and a config-change test that switches APIServer TLS profile to Modern then restores it with rollout/wait logic across ClusterOperators.
Package Configuration
test/extended/tls/OWNERS
Adds OWNERS file specifying reviewers and approvers for the TLS test package.
Module Manifest
go.mod
Minor change (single-line update referenced in raw summary).

Sequence Diagram(s)

sequenceDiagram
    participant Test as Test Suite
    participant DynClient as Dynamic Client
    participant ApiServer as Cluster APIServer
    participant Deployment as Target Deployment
    participant ClusterOp as ClusterOperator

    Test->>DynClient: Fetch operator config (observedConfig)
    DynClient-->>Test: observedConfig (minTLSVersion, cipherSuites)
    Test->>Deployment: Inspect container env vars
    Deployment-->>Test: Env var values
    Test->>Test: Compare env vars vs observedConfig/profile
    Test->>Deployment: Port-forward to service and perform TLS handshake checks
    Deployment-->>Test: TLS handshake results
    Test->>ApiServer: Update TLS profile (Modern) [config-change test]
    ApiServer-->>Test: Profile updated
    Test->>ClusterOp: Wait for related ClusterOperators to stabilize
    ClusterOp-->>Test: Stable
    Test->>ApiServer: Restore original TLS profile
    ApiServer-->>Test: Profile restored
    Test->>ClusterOp: Wait for stabilization
    ClusterOp-->>Test: Stable
sequenceDiagram
    participant Test as Test Suite
    participant Validator as Validator
    participant RolloutMgr as Rollout Watcher
    participant ClusterOp as ClusterOperators

    Test->>Validator: Trigger per-target validations (observedConfig, env, wire)
    Validator-->>Test: Validation results
    Test->>RolloutMgr: Initiate rollout waits after profile change
    RolloutMgr->>ClusterOp: Poll cluster operator statuses
    ClusterOp-->>RolloutMgr: Report rollout progress/completion
    RolloutMgr-->>Test: All relevant operators stabilized

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~65 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name | Status | Explanation | Resolution
Test Structure And Quality | ⚠️ Warning | Test functions receive untimed context.Background() and make cluster API calls without timeout protection. | Add timeout contexts to testObservedConfig, testDeploymentTLSEnvVars, and testWireLevelTLS before cluster API operations.
✅ Passed checks (4 passed)
Check name | Status | Explanation
Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled.
Title check | ✅ Passed | The title 'TLS propagation test' directly aligns with the main changes: adding a comprehensive TLS propagation test suite that validates TLS environment variables, ObservedConfig settings, and wire-level TLS behavior across OpenShift services.
Docstring Coverage | ✅ Passed | Docstring coverage is 91.67%, which is sufficient; the required threshold is 80.00%.
Stable And Deterministic Test Names | ✅ Passed | Test names are stable and deterministic, generated from hardcoded static values in the targets list that do not change between runs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.5.0)

Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/docs/product/migration-guide for migration instructions. The command was terminated due to this error.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/extended/tls/tls_observed_config.go`:
- Around line 702-712: The TLS probes use tls.Dial (in the block handling
host/shouldWork and host/shouldNotWork) which can hang; replace calls to
tls.Dial with tls.DialWithDialer using a net.Dialer that sets a reasonable
Timeout (e.g., a few seconds) so both the successful probe and the
rejected-version probe use a dialer with explicit handshake timeout, preserve
the same error handling around conn, ConnectionState(), and conn.Close(), and
ensure the dialer timeout variable is reused for both DialWithDialer invocations
to avoid duplicated magic values.
- Around line 507-510: The namespace lookup currently treats any error as "not
found" and skips the test; change the error handling after
oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace,
metav1.GetOptions{}) to import k8s.io/apimachinery/pkg/api/errors as kapierrors
and only call g.Skip when kapierrors.IsNotFound(err) is true, otherwise fail the
test (e.g., call the test failure helper you use in this file) so
auth/transport/transient errors are not masked; apply the same change to the
second occurrence of the same oc.AdminKubeClient().CoreV1().Namespaces().Get
call in this file.
- Around line 654-676: Replace the unreliable stdout check after starting
port-forward (where e2e.StartCmdAndStreamOutput is used and
readPartialFrom(stdout, 1024) is called) with an active TCP readiness probe:
repeatedly attempt to open a TCP connection to 127.0.0.1:localPort until success
or a short deadline elapses, then proceed to call toExecute(localPort). Use the
existing ctx/cancel or a separate short timeout for the probe, keep defers for
stdout/stderr and TryKill(cmd), and ensure probe retries/backoff are short so
the test fails fast if the tunnel never becomes reachable.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between a631c60 and b8f1f94.

📒 Files selected for processing (4)
  • pkg/testsuites/standard_suites.go
  • test/extended/include.go
  • test/extended/tls/OWNERS
  • test/extended/tls/tls_observed_config.go

Comment on lines +507 to +510
_, err := oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
if err != nil {
	g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's check the file structure and locate the file
find . -name "tls_observed_config.go" -type f

Repository: openshift/origin

Length of output: 102


🏁 Script executed:

# Get the file size to determine how to read it
wc -l test/extended/tls/tls_observed_config.go

Repository: openshift/origin

Length of output: 104


🏁 Script executed:

# Read the specific lines around 507-510
sed -n '500,515p' test/extended/tls/tls_observed_config.go

Repository: openshift/origin

Length of output: 869


🏁 Script executed:

# Read the specific lines around 580-583
sed -n '575,590p' test/extended/tls/tls_observed_config.go

Repository: openshift/origin

Length of output: 902


🏁 Script executed:

# Check the imports section of the file
head -50 test/extended/tls/tls_observed_config.go

Repository: openshift/origin

Length of output: 1905


🏁 Script executed:

# Search for all instances of namespace Get() calls to find similar patterns
rg 'Namespaces\(\)\.Get' test/extended/tls/tls_observed_config.go -A 3

Repository: openshift/origin

Length of output: 459


🏁 Script executed:

# Search for kapierrors usage in the codebase
rg "kapierrors" test/extended/ -A 2 | head -30

Repository: openshift/origin

Length of output: 2263


🏁 Script executed:

# Search for IsNotFound usage to see the common pattern
rg "IsNotFound" test/extended/ -B 1 -A 1 | head -40

Repository: openshift/origin

Length of output: 3028


🏁 Script executed:

# Check what error packages are imported in similar test files
rg 'apimachinery/pkg/api/errors' test/extended/ | head -10

Repository: openshift/origin

Length of output: 999


Skip only on NotFound errors; fail on other namespace lookup failures.

Current logic treats any API error (including auth failures, transient errors, etc.) as "namespace doesn't exist", which masks real failures and can cause false-green test runs. Restrict the skip to NotFound errors only, and fail on other errors.

This issue appears at both lines 507-510 and 580-583. Add the kapierrors import and use kapierrors.IsNotFound(err) to distinguish between legitimate NotFound scenarios and actual failures:

Fix pattern
 import (
 	"context"
 	"crypto/tls"
 	"encoding/json"
 	"fmt"
 	"io"
 	"math/rand"
 	"os/exec"
 	"strings"
 	"time"

 	g "github.com/onsi/ginkgo/v2"
 	o "github.com/onsi/gomega"

 	corev1 "k8s.io/api/core/v1"
+	kapierrors "k8s.io/apimachinery/pkg/api/errors"
 	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
 	"k8s.io/apimachinery/pkg/util/wait"
 	"k8s.io/client-go/util/retry"

 	_, err := oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
-	if err != nil {
+	if kapierrors.IsNotFound(err) {
 		g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
 	}
+	o.Expect(err).NotTo(o.HaveOccurred(), fmt.Sprintf("failed to query namespace %s", t.namespace))

Apply the same change at line 580-583.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
	_, err := oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
	if err != nil {
		g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
	}

After:
	_, err := oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
	if kapierrors.IsNotFound(err) {
		g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
	}
	o.Expect(err).NotTo(o.HaveOccurred(), fmt.Sprintf("failed to query namespace %s", t.namespace))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/tls/tls_observed_config.go` around lines 507 - 510, The
namespace lookup currently treats any error as "not found" and skips the test;
change the error handling after
oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace,
metav1.GetOptions{}) to import k8s.io/apimachinery/pkg/api/errors as kapierrors
and only call g.Skip when kapierrors.IsNotFound(err) is true, otherwise fail the
test (e.g., call the test failure helper you use in this file) so
auth/transport/transient errors are not masked; apply the same change to the
second occurrence of the same oc.AdminKubeClient().CoreV1().Namespaces().Get
call in this file.

Comment on lines 702 to 712
conn, err := tls.Dial("tcp", host, shouldWork)
if err != nil {
	return fmt.Errorf("svc/%s in %s: TLS connection that should work failed: %w", t.serviceName, t.namespace, err)
}
negotiated := conn.ConnectionState().Version
conn.Close()
e2e.Logf("svc/%s in %s: TLS connection succeeded, negotiated version: 0x%04x", t.serviceName, t.namespace, negotiated)

// Test that the version that should not work is rejected.
conn, err = tls.Dial("tcp", host, shouldNotWork)
if err == nil {

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's verify the file exists and check the exact content at the specified lines
fd tls_observed_config.go

Repository: openshift/origin

Length of output: 100


🏁 Script executed:

# Once we locate the file, examine the code around lines 702-712
cat -n test/extended/tls/tls_observed_config.go | sed -n '695,720p'

Repository: openshift/origin

Length of output: 1505


🏁 Script executed:

# Also check for the function signature to understand the context better
cat -n test/extended/tls/tls_observed_config.go | sed -n '680,715p'

Repository: openshift/origin

Length of output: 1707


🏁 Script executed:

# Check for other tls.Dial usages in this file to see the pattern
rg "tls\.Dial" test/extended/tls/tls_observed_config.go -n -B 2 -A 2

Repository: openshift/origin

Length of output: 470


🏁 Script executed:

# Check if there are any timeout or context handling patterns in this file
rg "context\.|timeout|Timeout|Deadline" test/extended/tls/tls_observed_config.go -n

Repository: openshift/origin

Length of output: 860


🏁 Script executed:

# Check for imports in the file to see what's already imported
head -50 test/extended/tls/tls_observed_config.go

Repository: openshift/origin

Length of output: 1905


Add explicit handshake timeouts for TLS probes.

tls.Dial without deadlines can hang under network edge cases and block test progression in the e2e suite. Use tls.DialWithDialer with a timeout to ensure reliable test execution.

🔧 Proposed fix
 import (
 	"context"
 	"crypto/tls"
 	"encoding/json"
 	"fmt"
 	"io"
 	"math/rand"
+	"net"
 	"os/exec"
 	"strings"
 	"time"
@@
 func checkTLSConnection(localPort int, shouldWork, shouldNotWork *tls.Config, t tlsTarget) error {
 	host := fmt.Sprintf("localhost:%d", localPort)
+	dialer := &net.Dialer{Timeout: 5 * time.Second}
 	e2e.Logf("Testing TLS behavior against %s (forwarded from svc/%s in %s)", host, t.serviceName, t.namespace)

 	// Test that the expected TLS version works.
-	conn, err := tls.Dial("tcp", host, shouldWork)
+	conn, err := tls.DialWithDialer(dialer, "tcp", host, shouldWork)
@@
 	// Test that the version that should not work is rejected.
-	conn, err = tls.Dial("tcp", host, shouldNotWork)
+	conn, err = tls.DialWithDialer(dialer, "tcp", host, shouldNotWork)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:
	conn, err := tls.Dial("tcp", host, shouldWork)
	if err != nil {
		return fmt.Errorf("svc/%s in %s: TLS connection that should work failed: %w", t.serviceName, t.namespace, err)
	}
	negotiated := conn.ConnectionState().Version
	conn.Close()
	e2e.Logf("svc/%s in %s: TLS connection succeeded, negotiated version: 0x%04x", t.serviceName, t.namespace, negotiated)
	// Test that the version that should not work is rejected.
	conn, err = tls.Dial("tcp", host, shouldNotWork)
	if err == nil {

After:
	conn, err := tls.DialWithDialer(dialer, "tcp", host, shouldWork)
	if err != nil {
		return fmt.Errorf("svc/%s in %s: TLS connection that should work failed: %w", t.serviceName, t.namespace, err)
	}
	negotiated := conn.ConnectionState().Version
	conn.Close()
	e2e.Logf("svc/%s in %s: TLS connection succeeded, negotiated version: 0x%04x", t.serviceName, t.namespace, negotiated)
	// Test that the version that should not work is rejected.
	conn, err = tls.DialWithDialer(dialer, "tcp", host, shouldNotWork)
	if err == nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/tls/tls_observed_config.go` around lines 702 - 712, The TLS
probes use tls.Dial (in the block handling host/shouldWork and
host/shouldNotWork) which can hang; replace calls to tls.Dial with
tls.DialWithDialer using a net.Dialer that sets a reasonable Timeout (e.g., a
few seconds) so both the successful probe and the rejected-version probe use a
dialer with explicit handshake timeout, preserve the same error handling around
conn, ConnectionState(), and conn.Close(), and ensure the dialer timeout
variable is reused for both DialWithDialer invocations to avoid duplicated magic
values.

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@gangwgr
Contributor Author

gangwgr commented Feb 26, 2026

@openshift-ci
Contributor

openshift-ci bot commented Feb 26, 2026

@gangwgr, testwith: could not generate prow job. ERROR:

could not determine cluster for job pull-ci-openshift-origin-main-e2e-metal-ipi-serial-ovn-ipv6: error performing request: got unexpected http 404 status code from dispatcher: Cluster not found

@coderabbitai coderabbitai bot left a comment

♻️ Duplicate comments (4)
test/extended/tls/tls_observed_config.go (4)

698-755: ⚠️ Potential issue | 🟠 Major

Add explicit handshake timeouts for TLS dial operations.

tls.Dial without deadlines can hang indefinitely under network edge cases, blocking test progression. This was flagged in a previous review. Use tls.DialWithDialer with a timeout dialer.

🔧 Proposed fix
 func checkTLSConnection(localPort int, shouldWork, shouldNotWork *tls.Config, t tlsTarget) error {
 	// Try both IPv4 and IPv6 localhost addresses for dual-stack support.
 	hosts := []string{
 		fmt.Sprintf("127.0.0.1:%d", localPort), // IPv4
 		fmt.Sprintf("[::1]:%d", localPort),     // IPv6
 	}

 	var host string
 	var conn *tls.Conn
 	var err error
+	dialer := &net.Dialer{Timeout: 5 * time.Second}

 	// Find a working host (try IPv4 first, then IPv6).
 	for _, h := range hosts {
-		conn, err = tls.Dial("tcp", h, shouldWork)
+		conn, err = tls.DialWithDialer(dialer, "tcp", h, shouldWork)
 		if err == nil {
 			host = h
 			break
 		}
 		// ... error handling ...
 	}
 	// ...
 	// Test that the version that should not work is rejected.
-	conn, err = tls.Dial("tcp", host, shouldNotWork)
+	conn, err = tls.DialWithDialer(dialer, "tcp", host, shouldNotWork)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/tls/tls_observed_config.go` around lines 698 - 755, The TLS
dials in checkTLSConnection can hang; change both tls.Dial calls to use
tls.DialWithDialer with a net.Dialer that sets a timeout (e.g. Dialer{Timeout:
10*time.Second}) so the initial probe (inside the hosts loop using shouldWork)
and the subsequent negative probe (using shouldNotWork) will fail fast instead
of blocking indefinitely; locate tls.Dial invocations in checkTLSConnection and
replace them with tls.DialWithDialer(&net.Dialer{Timeout: <reasonable
duration>}, "tcp", hostOrH, tlsConfig).

506-510: ⚠️ Potential issue | 🟠 Major

Skip only on NotFound errors; fail on other namespace lookup failures.

The current code treats any API error as "namespace doesn't exist" and skips, which masks real failures like auth errors or transient issues. This was flagged in a previous review.

🐛 Proposed fix
 	g.By(fmt.Sprintf("verifying namespace %s exists", t.namespace))
 	_, err := oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
-	if err != nil {
+	if kapierrors.IsNotFound(err) {
 		g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
 	}
+	o.Expect(err).NotTo(o.HaveOccurred(), fmt.Sprintf("failed to query namespace %s", t.namespace))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/tls/tls_observed_config.go` around lines 506 - 510, The code
currently skips on any error from
oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace,
metav1.GetOptions{}); change it to only skip when the error is a NotFound (use
k8s api errors.IsNotFound(err) from "k8s.io/apimachinery/pkg/api/errors") and
for any other non-nil error fail the test (use g.Fail or a test fatal with the
error) so real API/auth/transient failures aren't masked; keep the existing
g.Skip(fmt.Sprintf("Namespace %s does not exist...", t.namespace)) branch for
IsNotFound and report the actual err in the failing branch.

648-683: ⚠️ Potential issue | 🟠 Major

Add active readiness probe for port-forward tunnel establishment.

The code starts oc port-forward and reads a chunk from stdout, but this doesn't guarantee the tunnel accepts connections. The 10-second context timeout also covers both tunnel setup and TLS test execution, which is too tight. This was flagged in a previous review.

🔧 Proposed fix with TCP readiness probe and extended timeout
 func forwardPortAndExecute(serviceName, namespace, remotePort string, toExecute func(localPort int) error) error {
 	var err error
 	for i := 0; i < 3; i++ {
 		if err = func() error {
-			ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
+			ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
 			defer cancel()
-			localPort := rand.Intn(65534-1025) + 1025
+			localPort := rand.Intn(64510) + 1025 // [1025, 65534]
 			args := []string{
 				"port-forward",
 				fmt.Sprintf("svc/%s", serviceName),
 				fmt.Sprintf("%d:%s", localPort, remotePort),
 				"-n", namespace,
 			}

 			cmd := exec.CommandContext(ctx, "oc", args...)
 			stdout, stderr, err := e2e.StartCmdAndStreamOutput(cmd)
 			if err != nil {
 				return fmt.Errorf("failed to start port-forward: %v", err)
 			}
 			defer stdout.Close()
 			defer stderr.Close()
 			defer e2e.TryKill(cmd)

-			// Give port-forward a moment to establish.
-			e2e.Logf("oc port-forward output: %s", readPartialFrom(stdout, 1024))
+			// Wait until forwarded local port is reachable via active TCP probe.
+			deadline := time.Now().Add(5 * time.Second)
+			for {
+				conn, dialErr := net.DialTimeout("tcp", fmt.Sprintf("127.0.0.1:%d", localPort), 500*time.Millisecond)
+				if dialErr == nil {
+					_ = conn.Close()
+					break
+				}
+				if time.Now().After(deadline) {
+					return fmt.Errorf("port-forward did not become ready for svc/%s in %s: %w", serviceName, namespace, dialErr)
+				}
+				time.Sleep(200 * time.Millisecond)
+			}
+			e2e.Logf("oc port-forward established to svc/%s in %s on local port %d", serviceName, namespace, localPort)
 			return toExecute(localPort)
 		}(); err == nil {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/tls/tls_observed_config.go` around lines 648 - 683,
forwardPortAndExecute currently starts oc port-forward with a single 10s context
and only reads stdout (via readPartialFrom/StartCmdAndStreamOutput), which
doesn't guarantee the tunnel accepts connections; change it to create a
dedicated, longer context for the port-forward command (e.g., 30s) and perform
an active TCP readiness probe against localhost:localPort (using net.Dial or
net.DialTimeout) with several short retries and backoff to confirm the tunnel is
accepting connections before calling toExecute; ensure you still defer
stdout.Close(), stderr.Close(), and e2e.TryKill(cmd) on failure, and separate
the port-forward lifecycle from the test execution (i.e., do not have the
port-forward context encompass the entire toExecute run).

579-583: ⚠️ Potential issue | 🟠 Major

Same issue: skip only on NotFound errors.

This has the same error handling issue as testDeploymentTLSEnvVars — any error causes a skip rather than only NotFound errors.

🐛 Proposed fix
 	g.By("verifying namespace exists")
 	_, err = oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace, metav1.GetOptions{})
-	if err != nil {
+	if kapierrors.IsNotFound(err) {
 		g.Skip(fmt.Sprintf("Namespace %s does not exist in this cluster", t.namespace))
 	}
+	o.Expect(err).NotTo(o.HaveOccurred(), fmt.Sprintf("failed to query namespace %s", t.namespace))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/tls/tls_observed_config.go` around lines 579 - 583, The
namespace existence check currently skips the test for any Get error; change it
to only skip when the error is a NotFound error by using
k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) against the result of
oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace,
metav1.GetOptions{}), i.e. if apierrors.IsNotFound(err) then g.Skip(...) else if
err != nil then fail the test (return or call the test failure helper used in
these tests) so genuine API errors surface instead of being treated as missing
namespace.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@test/extended/tls/tls_observed_config.go`:
- Around line 698-755: The TLS dials in checkTLSConnection can hang; change both
tls.Dial calls to use tls.DialWithDialer with a net.Dialer that sets a timeout
(e.g. Dialer{Timeout: 10*time.Second}) so the initial probe (inside the hosts
loop using shouldWork) and the subsequent negative probe (using shouldNotWork)
will fail fast instead of blocking indefinitely; locate tls.Dial invocations in
checkTLSConnection and replace them with tls.DialWithDialer(&net.Dialer{Timeout:
<reasonable duration>}, "tcp", hostOrH, tlsConfig).
- Around line 506-510: The code currently skips on any error from
oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace,
metav1.GetOptions{}); change it to only skip when the error is a NotFound (use
k8s api errors.IsNotFound(err) from "k8s.io/apimachinery/pkg/api/errors") and
for any other non-nil error fail the test (use g.Fail or a test fatal with the
error) so real API/auth/transient failures aren't masked; keep the existing
g.Skip(fmt.Sprintf("Namespace %s does not exist...", t.namespace)) branch for
IsNotFound and report the actual err in the failing branch.
- Around line 648-683: forwardPortAndExecute currently starts oc port-forward
with a single 10s context and only reads stdout (via
readPartialFrom/StartCmdAndStreamOutput), which doesn't guarantee the tunnel
accepts connections; change it to create a dedicated, longer context for the
port-forward command (e.g., 30s) and perform an active TCP readiness probe
against localhost:localPort (using net.Dial or net.DialTimeout) with several
short retries and backoff to confirm the tunnel is accepting connections before
calling toExecute; ensure you still defer stdout.Close(), stderr.Close(), and
e2e.TryKill(cmd) on failure, and separate the port-forward lifecycle from the
test execution (i.e., do not have the port-forward context encompass the entire
toExecute run).
- Around line 579-583: The namespace existence check currently skips the test
for any Get error; change it to only skip when the error is a NotFound error by
using k8s.io/apimachinery/pkg/api/errors.IsNotFound(err) against the result of
oc.AdminKubeClient().CoreV1().Namespaces().Get(ctx, t.namespace,
metav1.GetOptions{}), i.e. if apierrors.IsNotFound(err) then g.Skip(...) else if
err != nil then fail the test (return or call the test failure helper used in
these tests) so genuine API errors surface instead of being treated as a missing
namespace.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between b8f1f94 and b0a3132.

📒 Files selected for processing (4)
  • pkg/testsuites/standard_suites.go
  • test/extended/include.go
  • test/extended/tls/OWNERS
  • test/extended/tls/tls_observed_config.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/extended/include.go
  • test/extended/tls/OWNERS

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@gangwgr
Contributor Author

gangwgr commented Feb 26, 2026

@gangwgr
Contributor Author

gangwgr commented Feb 26, 2026

@openshift-ci
Contributor

openshift-ci bot commented Feb 26, 2026

@gangwgr, testwith: could not generate prow job. ERROR:

could not determine cluster for job pull-ci-openshift-origin-main-e2e-metal-ipi-serial-ovn-ipv6: error performing request: got unexpected http 404 status code from dispatcher: Cluster not found

@openshift-ci
Contributor

openshift-ci bot commented Feb 26, 2026

@gangwgr, testwith: could not generate prow job. ERROR:

could not determine ci op config from metadata: got unexpected http 404 status code from configresolver: failed to get config: could not find any config for branch master on repo openshift/origin

@openshift-trt

openshift-trt bot commented Feb 26, 2026

Job Failure Risk Analysis for sha: b0a3132

Job Name: pull-ci-openshift-origin-main-e2e-metal-ipi-ovn-ipv6
Failure Risk: IncompleteTests (tests for this run (25) are below the historical average (2372); not enough tests ran to make a reasonable risk analysis, which could be due to infra, installation, or upgrade problems)

@gangwgr
Contributor Author

gangwgr commented Feb 27, 2026

@openshift-ci
Contributor

openshift-ci bot commented Feb 27, 2026

@gangwgr, testwith: could not generate prow job. ERROR:

could not determine cluster for job pull-ci-openshift-origin-main-e2e-metal-ipi-serial-ovn-ipv6: error performing request: got unexpected http 404 status code from dispatcher: Cluster not found


@coderabbitai coderabbitai bot left a comment


🧹 Nitpick comments (3)
test/extended/tls/tls_observed_config.go (3)

202-206: Misplaced GinkgoRecover() call.

g.GinkgoRecover() is intended to be called in goroutines spawned within test code to catch panics and report them to Ginkgo. Placing it at the Describe block level with defer has no effect since the Describe function itself doesn't run in a goroutine that needs panic recovery—Ginkgo already handles panics in test nodes.

Remove this line as it serves no purpose here.

🧹 Proposed fix
 var _ = g.Describe("[sig-api-machinery][Feature:TLSObservedConfig][Suite:openshift/tls-observed-config][Serial]", func() {
-	defer g.GinkgoRecover()
-
 	oc := exutil.NewCLI("tls-observed-config")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/tls/tls_observed_config.go` around lines 202 - 206, Remove the
misplaced defer g.GinkgoRecover() call from the Describe block in
tls_observed_config.go; specifically, delete the line calling g.GinkgoRecover()
immediately inside the Describe anonymous function (around the top of the
Describe that defines the "[sig-api-machinery]...TLSObservedConfig" suite)
because Ginkgo handles panic recovery for Describe-level nodes and GinkgoRecover
is only necessary inside spawned goroutines.

655-657: Consider seeding random or using ephemeral port range.

The random port selection uses math/rand without explicit seeding (Go 1.20+ auto-seeds, but older versions don't). Also, ports 1025-49151 include registered ports that may be in use. Consider using the ephemeral range (49152-65535) to reduce collision likelihood.

🧹 Proposed improvement
-			localPort := rand.Intn(65534-1025) + 1025
+			// Use ephemeral port range (49152-65535) to reduce collision with registered services
+			localPort := rand.Intn(65535-49152) + 49152
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/tls/tls_observed_config.go` around lines 655 - 657, The test
picks localPort via rand.Intn without seeding and from a broad range; update the
selection for stability by either seeding math/rand (e.g., seed once with
time.Now().UnixNano()) or—preferably—choose from the IANA ephemeral port range
(49152-65535) to avoid registered ports, and use the localPort variable (where
it's assigned) accordingly; alternatively consider binding to port 0 (letting
the OS pick) where applicable to eliminate collisions.

527-529: Consider checking all containers for TLS env vars.

The code only inspects Containers[0], assuming TLS env vars are always in the first container. If container ordering changes or sidecars are added, this could miss the relevant env vars. Consider iterating over all containers or adding a comment documenting this assumption.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/extended/tls/tls_observed_config.go` around lines 527 - 529, The test
currently only inspects deployment.Spec.Template.Spec.Containers[0] for TLS env
vars which can miss values in other containers or sidecars; update the check
that uses deployment.Spec.Template.Spec.Containers, envToMap and logEnvVars to
iterate over all containers (or merge each container's env into a combined map)
and verify t.tlsMinVersionEnvVar in every container's env set, ensuring you call
logEnvVars for each container (or once for the merged map) so no container is
skipped.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@test/extended/tls/tls_observed_config.go`:
- Around line 202-206: Remove the misplaced defer g.GinkgoRecover() call from
the Describe block in tls_observed_config.go; specifically, delete the line
calling g.GinkgoRecover() immediately inside the Describe anonymous function
(around the top of the Describe that defines the
"[sig-api-machinery]...TLSObservedConfig" suite) because Ginkgo handles panic
recovery for Describe-level nodes and GinkgoRecover is only necessary inside
spawned goroutines.
- Around line 655-657: The test picks localPort via rand.Intn without seeding
and from a broad range; update the selection for stability by either seeding
math/rand (e.g., seed once with time.Now().UnixNano()) or—preferably—choose from
the IANA ephemeral port range (49152-65535) to avoid registered ports, and use
the localPort variable (where it's assigned) accordingly; alternatively consider
binding to port 0 (letting the OS pick) where applicable to eliminate
collisions.
- Around line 527-529: The test currently only inspects
deployment.Spec.Template.Spec.Containers[0] for TLS env vars which can miss
values in other containers or sidecars; update the check that uses
deployment.Spec.Template.Spec.Containers, envToMap and logEnvVars to iterate
over all containers (or merge each container's env into a combined map) and
verify t.tlsMinVersionEnvVar in every container's env set, ensuring you call
logEnvVars for each container (or once for the merged map) so no container is
skipped.

ℹ️ Review info

Configuration used: Repository: openshift/coderabbit/.coderabbit.yaml

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between b0a3132 and ee0403d.

📒 Files selected for processing (4)
  • pkg/testsuites/standard_suites.go
  • test/extended/include.go
  • test/extended/tls/OWNERS
  • test/extended/tls/tls_observed_config.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • test/extended/include.go
  • pkg/testsuites/standard_suites.go

@openshift-ci-robot

Scheduling required tests:
/test e2e-aws-csi
/test e2e-aws-ovn-fips
/test e2e-aws-ovn-microshift
/test e2e-aws-ovn-microshift-serial
/test e2e-aws-ovn-serial-1of2
/test e2e-aws-ovn-serial-2of2
/test e2e-gcp-csi
/test e2e-gcp-ovn
/test e2e-gcp-ovn-upgrade
/test e2e-metal-ipi-ovn-ipv6
/test e2e-vsphere-ovn
/test e2e-vsphere-ovn-upi

@openshift-ci
Contributor

openshift-ci bot commented Feb 27, 2026

@gangwgr: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

ci/prow/e2e-metal-ipi-ovn-ipv6 (commit b0a3132, required): /test e2e-metal-ipi-ovn-ipv6
ci/prow/e2e-gcp-ovn (commit b0a3132, required): /test e2e-gcp-ovn

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

