RHDH: Add disconnected OCP 4.21 nightly CI jobs (helm + operator)#79701
RHDH: Add disconnected OCP 4.21 nightly CI jobs (helm + operator)#79701zdrapela wants to merge 8 commits into
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds disconnected (air-gapped) nightly CI for RHDH: bumps release to OpenShift 4.21, registers ChangesDisconnected Nightly CI Workflow for RHDH
Sequence Diagram(s)sequenceDiagram
participant JobDef as Nightly Job
participant StepRef as Step Registry Ref
participant CmdScript as Commands Script
participant Quay as Quay Registry
participant Cluster as OpenShift Cluster
JobDef->>StepRef: schedule/run job (cron/presubmit)
StepRef->>CmdScript: execute disconnected bootstrap (env overrides + secrets)
CmdScript->>Quay: poll image/tag (when quay.io)
CmdScript->>Cluster: oc login -> verify cluster -> run tests
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRsSuggested labels
🚥 Pre-merge checks | ✅ 12✅ Passed checks (12 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@ci-operator/config/redhat-developer/rhdh/redhat-developer-rhdh-main.yaml`:
- Around line 166-167: The cron entries in the YAML use "0 3 31 12 *" which
schedules jobs only on December 31; update the cron key(s) in
redhat-developer-rhdh-main.yaml (both occurrences of the cron field) to a
nightly schedule such as "0 3 * * *" (or another preferred hourly/minute
pattern) so the job runs every night at the intended time instead of once per
year.
In
`@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh`:
- Around line 186-187: The curl calls that populate GIT_PR_RESPONSE and LONG_SHA
(and other curl usages around the file) lack timeouts and failure handling; wrap
or replace them with a resilient helper (e.g., a function like
call_curl_with_retries) that uses curl options such as --fail --max-time
<seconds> --retry <n> --retry-delay <s> and returns non-zero on failure, then
check the exit status and validate the response before using jq; specifically
update the GIT_PR_RESPONSE assignment and any other direct curl invocations
(references: GIT_PR_RESPONSE, LONG_SHA) to use this helper and add explicit
error handling/logging and a safe exit or retry policy when the request fails or
yields empty/invalid JSON.
- Line 2: Replace the current "set +x" tracing disable with strict shell mode by
changing the script header to enable fail-fast, undefined-var and pipefail
checks (use "set -euo pipefail") and only enable "set -x" temporarily where
tracing is required; update the script's top-level invocation that contains the
"set +x" token so functions and top-level commands (the step script header) run
with strict mode enabled to avoid silently continuing on errors.
In
`@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.yaml`:
- Around line 10-11: The OC_CLIENT_VERSION default is stale—update the default
value for the OC_CLIENT_VERSION variable from "stable-4.17" to "stable-4.21" in
the YAML where OC_CLIENT_VERSION is defined so the step uses the correct OCP
4.21 client by default; ensure the change is applied to the OC_CLIENT_VERSION
entry and that any jobs/overrides that expect a different client still function
(adjust overrides if needed).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 2f16c541-d715-4b8f-a148-2d0b130704a5
⛔ Files ignored due to path filters (2)
ci-operator/jobs/redhat-developer/rhdh/redhat-developer-rhdh-main-periodics.yamlis excluded by!ci-operator/jobs/**ci-operator/jobs/redhat-developer/rhdh/redhat-developer-rhdh-main-presubmits.yamlis excluded by!ci-operator/jobs/**
📒 Files selected for processing (10)
ci-operator/config/redhat-developer/rhdh/redhat-developer-rhdh-main.yamlci-operator/step-registry/cluster-profiles/cluster-profiles-config.yamlci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/OWNERSci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.shci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.metadata.jsonci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.yamlci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/OWNERSci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-commands.shci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-ref.metadata.jsonci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-ref.yaml
966f831 to
8a3618a
Compare
Add periodic CI jobs that provision a disconnected/air-gapped OCP 4.21 cluster on AWS and run RHDH E2E tests in both Helm and Operator deployment modes. Changes: - Add 'rhdh' to redhat-developer repos in cluster-profiles-config.yaml for 5 AWS profiles (aws, aws-2, aws-3, aws-4, aws-5) - Create step-registry refs for disconnected helm and operator nightly tests with proxy-conf.sh sourcing and DISCONNECTED=true - Operator commands.sh and OWNERS are symlinks to the helm variant (scripts are identical) - Add 2 periodic test entries using the openshift-e2e-aws-disconnected workflow with openshift-org-aws cluster profile set and ENABLE_IDMS=yes - Update releases.latest from 4.20/fast to 4.21/stable (required by ipi-install-install-aws dependency on release:latest) - Post phase includes full ipi-aws-post-disconnected chain for proper cluster teardown and AWS resource cleanup Jobs are configured with cron 'once a year' (placeholder) and can be triggered on-demand via /test on PRs. Assisted-by: OpenCode
Add a verification section after the cluster health check that validates the disconnected environment before running tests: 1. Bastion host reachability — mirror registry on port 5000 is responding 2. Cluster API reachability — API server is accessible via proxy tunnel 3. Node internet isolation — worker node cannot reach external URLs 4. Image mirror configuration — IDMS or ICSP resources are present 5. Catalog sources disabled — default OperatorHub sources are disabled Checks are non-blocking (warnings only) to avoid masking real test failures during initial rollout. Assisted-by: OpenCode
8a3618a to
2a3013c
Compare
|
/pj-rehearse pull-ci-redhat-developer-rhdh-main-e2e-ocp-disconnected-helm-nightly |
|
@zdrapela: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (2)
ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.yaml (1)
10-11:⚠️ Potential issue | 🟠 Major | ⚡ Quick winUpdate
OC_CLIENT_VERSIONdefault to match the 4.21 flow.At Line 10-11, defaulting to
stable-4.17is stale for this new 4.21 disconnected path and can cause client/server feature drift when no override is provided.For OpenShift CI step-registry jobs targeting OCP 4.21, what OC_CLIENT_VERSION default is recommended, and is using stable-4.17 considered unsupported or risky?🔧 Suggested fix
- name: OC_CLIENT_VERSION - default: stable-4.17 + default: stable-4.21🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.yaml` around lines 10 - 11, The OC_CLIENT_VERSION default in the job config is stale (set to stable-4.17) and should be updated to stable-4.21 to match the 4.21 disconnected flow; update the OC_CLIENT_VERSION default value (referencing the OC_CLIENT_VERSION variable in the YAML) to stable-4.21 so client/server feature drift is avoided — using stable-4.17 for 4.21 jobs is unsupported/risky and may cause compatibility issues.ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh (1)
275-276:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHarden external
curlcalls with fail/retry/timeout.At Line 275, Line 338, and Line 369, raw
curl -sis used on critical control flow. Transient network/API errors can yield empty/invalid JSON and drive incorrect tag/SHA decisions.🔧 Suggested fix
+curl_json() { + curl -sS --fail --retry 5 --retry-all-errors --retry-delay 3 --connect-timeout 10 --max-time 30 "$1" +} ... - GIT_PR_RESPONSE=$(curl -s "https://api.github.com/repos/${GITHUB_ORG_NAME}/${GITHUB_REPOSITORY_NAME}/pulls/${GIT_PR_NUMBER}") + GIT_PR_RESPONSE=$(curl_json "https://api.github.com/repos/${GITHUB_ORG_NAME}/${GITHUB_REPOSITORY_NAME}/pulls/${GIT_PR_NUMBER}") ... - response=$(curl -s "https://quay.io/api/v1/repository/${IMAGE_REPO}/tag/?specificTag=$TAG_NAME") + response=$(curl_json "https://quay.io/api/v1/repository/${IMAGE_REPO}/tag/?specificTag=${TAG_NAME}") ... - IMAGE_SHA=$(curl -s "https://quay.io/api/v1/repository/${IMAGE_REPO}/tag/?specificTag=${TAG_NAME}" | jq -r '.tags[0].manifest_digest') + IMAGE_SHA=$(curl_json "https://quay.io/api/v1/repository/${IMAGE_REPO}/tag/?specificTag=${TAG_NAME}" | jq -r '.tags[0].manifest_digest')Also applies to: 338-341, 369-369
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh` around lines 275 - 276, Harden the raw curl call that sets GIT_PR_RESPONSE/LONG_SHA by replacing the single-shot `curl -s` with a retry-and-timeout approach: call curl with --fail, a bounded --max-time, and retry flags (e.g., --retry, --retry-connrefused, --retry-delay) inside a small loop or wrapper that retries a few times and exits non-zero if it still returns empty/invalid JSON; after the curl returns, validate that GIT_PR_RESPONSE is non-empty and jq parsing of '.head.sha' succeeded before assigning LONG_SHA and proceeding, and log or fail fast on error so downstream logic cannot use an invalid SHA.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh`:
- Around line 210-213: The current VERIFICATION_FAILED check prints warnings and
continues, allowing false-green runs; update the conditional that checks the
VERIFICATION_FAILED variable (the if [[ "${VERIFICATION_FAILED}" == "true" ]]
block) to exit non-zero when verification fails by replacing the continuing
behavior with an explicit failure (e.g., call exit 1 after logging the warning
messages) so the CI job fails instead of proceeding on a failed disconnected
verification.
In
`@ci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-ref.yaml`:
- Around line 10-11: Update the OC_CLIENT_VERSION default from stable-4.17 to
stable-4.21 so the oc client matches the OCP 4.21 release under test; locate the
OC_CLIENT_VERSION entry (currently "default: stable-4.17") in the YAML and
change its default value to "stable-4.21".
---
Duplicate comments:
In
`@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh`:
- Around line 275-276: Harden the raw curl call that sets
GIT_PR_RESPONSE/LONG_SHA by replacing the single-shot `curl -s` with a
retry-and-timeout approach: call curl with --fail, a bounded --max-time, and
retry flags (e.g., --retry, --retry-connrefused, --retry-delay) inside a small
loop or wrapper that retries a few times and exits non-zero if it still returns
empty/invalid JSON; after the curl returns, validate that GIT_PR_RESPONSE is
non-empty and jq parsing of '.head.sha' succeeded before assigning LONG_SHA and
proceeding, and log or fail fast on error so downstream logic cannot use an
invalid SHA.
In
`@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.yaml`:
- Around line 10-11: The OC_CLIENT_VERSION default in the job config is stale
(set to stable-4.17) and should be updated to stable-4.21 to match the 4.21
disconnected flow; update the OC_CLIENT_VERSION default value (referencing the
OC_CLIENT_VERSION variable in the YAML) to stable-4.21 so client/server feature
drift is avoided — using stable-4.17 for 4.21 jobs is unsupported/risky and may
cause compatibility issues.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 6ac84aea-939f-4060-8fbe-670550a9cfb1
⛔ Files ignored due to path filters (2)
ci-operator/jobs/redhat-developer/rhdh/redhat-developer-rhdh-main-periodics.yamlis excluded by!ci-operator/jobs/**ci-operator/jobs/redhat-developer/rhdh/redhat-developer-rhdh-main-presubmits.yamlis excluded by!ci-operator/jobs/**
📒 Files selected for processing (12)
ci-operator/config/redhat-developer/rhdh/redhat-developer-rhdh-main.yamlci-operator/step-registry/cluster-profiles/cluster-profiles-config.yamlci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/OWNERSci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/OWNERSci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.shci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.metadata.jsonci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.yamlci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/OWNERSci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/OWNERSci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-commands.shci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-ref.metadata.jsonci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-ref.yaml
✅ Files skipped from review due to trivial changes (5)
- ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/OWNERS
- ci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-commands.sh
- ci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/OWNERS
- ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/OWNERS
- ci-operator/step-registry/cluster-profiles/cluster-profiles-config.yaml
- Replace 'set +x' with 'set -euo pipefail' in disconnected commands.sh - Update OC_CLIENT_VERSION default from stable-4.17 to stable-4.21 - Convert ocp/OWNERS and ocp/helm/OWNERS from real files to symlinks chaining up to redhat-developer/OWNERS (single source of truth) Assisted-by: OpenCode
|
/pj-rehearse pull-ci-redhat-developer-rhdh-main-e2e-ocp-disconnected-helm-nightly |
|
@zdrapela: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh (1)
100-103:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard
KUBEADMIN_PASSWORD_FILEunderset -u(and gate disconnected verification + harden curl).
- Lines 100-103:
set -umakes the unguarded"$KUBEADMIN_PASSWORD_FILE"expansion fatal when the var is unset, preventing the${SHARED_DIR}/kubeadmin-passwordfallback from running.- Lines 210-213: disconnected verification failures only print a warning and continue, which can false-green despite
VERIFICATION_FAILED=true.- Lines 275/338/369:
curl -scalls lack--fail/timeout/retry, allowing silent bad responses or long hangs in CI.🔧 Proposed fix
-if [[ -s "$KUBEADMIN_PASSWORD_FILE" ]]; then - OPENSHIFT_PASSWORD="$(cat "$KUBEADMIN_PASSWORD_FILE")" +if [[ -n "${KUBEADMIN_PASSWORD_FILE:-}" && -s "${KUBEADMIN_PASSWORD_FILE}" ]]; then + OPENSHIFT_PASSWORD="$(cat "${KUBEADMIN_PASSWORD_FILE}")" elif [[ -s "${SHARED_DIR}/kubeadmin-password" ]]; then # Recommendation from hypershift qe team in slack channel.. OPENSHIFT_PASSWORD="$(cat "${SHARED_DIR}/kubeadmin-password")"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh` around lines 100 - 103, The script uses unguarded expansions and weak failure handling: change the KUBEADMIN_PASSWORD_FILE test to reference a safe default (e.g. use the parameter expansion pattern like "${KUBEADMIN_PASSWORD_FILE:-}" or explicitly test -n before using it) so OPENSHIFT_PASSWORD fallback to "${SHARED_DIR}/kubeadmin-password" works under set -u; make disconnected verification treat VERIFICATION_FAILED as a hard failure by checking the VERIFICATION_FAILED flag after verification steps and exiting non-zero (instead of only printing a warning); and harden all curl invocations that currently use "curl -s" (the calls around the occurrences you noted) to include --fail, a connect and total timeout (e.g. --connect-timeout and --max-time) and retry options (e.g. --retry) and preserve -sS or -s only for quiet output so CI fails fast on HTTP errors or hangs.
♻️ Duplicate comments (2)
ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh (2)
275-276:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHarden external
curlcalls with failure/timeout/retry policy.These calls are on the primary path and currently use
curl -sonly; transient API/network issues can hang or yield invalid state.🔧 Proposed fix
+curl_json() { + curl -sS --fail --retry 5 --retry-all-errors --retry-delay 5 --connect-timeout 10 --max-time 30 "$1" +} + ... - GIT_PR_RESPONSE=$(curl -s "https://api.github.com/repos/${GITHUB_ORG_NAME}/${GITHUB_REPOSITORY_NAME}/pulls/${GIT_PR_NUMBER}") + GIT_PR_RESPONSE=$(curl_json "https://api.github.com/repos/${GITHUB_ORG_NAME}/${GITHUB_REPOSITORY_NAME}/pulls/${GIT_PR_NUMBER}") ... - response=$(curl -s "https://quay.io/api/v1/repository/${IMAGE_REPO}/tag/?specificTag=$TAG_NAME") + response=$(curl_json "https://quay.io/api/v1/repository/${IMAGE_REPO}/tag/?specificTag=${TAG_NAME}") ... - IMAGE_SHA=$(curl -s "https://quay.io/api/v1/repository/${IMAGE_REPO}/tag/?specificTag=${TAG_NAME}" | jq -r '.tags[0].manifest_digest') + IMAGE_SHA=$(curl_json "https://quay.io/api/v1/repository/${IMAGE_REPO}/tag/?specificTag=${TAG_NAME}" | jq -r '.tags[0].manifest_digest')Also applies to: 338-338, 369-369
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh` around lines 275 - 276, The curl call that sets GIT_PR_RESPONSE (and similar calls) is currently brittle; harden it by adding a timeout, retry/backoff and failure handling: invoke curl with connection and max-time options and retry loops (or use --retry/--retry-delay) and check the HTTP status code and non-empty response before piping to jq, fail-fast with a clear error message if the request ultimately fails; update the code paths that assign GIT_PR_RESPONSE and subsequently LONG_SHA to validate the response (e.g., ensure jq returns a non-null .head.sha) and exit with a descriptive error if validation fails.
210-213:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFail fast when disconnected verification fails.
Line 210 allows execution to continue after failed disconnected checks, which can produce false-green results.
🔧 Proposed fix
if [[ "${VERIFICATION_FAILED}" == "true" ]]; then - echo "WARNING: One or more disconnected environment checks failed. Review the output above." - echo "Continuing with test execution — failures may indicate environment misconfiguration." + echo "ERROR: One or more disconnected environment checks failed. Aborting." + exit 1 else echo "All disconnected environment checks passed." fi🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh` around lines 210 - 213, The current if-block that checks the VERIFICATION_FAILED variable prints warnings but continues; change it to fail fast by replacing the branch so that when VERIFICATION_FAILED == "true" the script exits with a non-zero status (e.g., exit 1) after logging the warning, ensuring the disconnected verification failures stop further test execution; reference the VERIFICATION_FAILED check and the existing if/else block to locate and update the behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Outside diff comments:
In
`@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh`:
- Around line 100-103: The script uses unguarded expansions and weak failure
handling: change the KUBEADMIN_PASSWORD_FILE test to reference a safe default
(e.g. use the parameter expansion pattern like "${KUBEADMIN_PASSWORD_FILE:-}" or
explicitly test -n before using it) so OPENSHIFT_PASSWORD fallback to
"${SHARED_DIR}/kubeadmin-password" works under set -u; make disconnected
verification treat VERIFICATION_FAILED as a hard failure by checking the
VERIFICATION_FAILED flag after verification steps and exiting non-zero (instead
of only printing a warning); and harden all curl invocations that currently use
"curl -s" (the calls around the occurrences you noted) to include --fail, a
connect and total timeout (e.g. --connect-timeout and --max-time) and retry
options (e.g. --retry) and preserve -sS or -s only for quiet output so CI fails
fast on HTTP errors or hangs.
---
Duplicate comments:
In
`@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh`:
- Around line 275-276: The curl call that sets GIT_PR_RESPONSE (and similar
calls) is currently brittle; harden it by adding a timeout, retry/backoff and
failure handling: invoke curl with connection and max-time options and retry
loops (or use --retry/--retry-delay) and check the HTTP status code and
non-empty response before piping to jq, fail-fast with a clear error message if
the request ultimately fails; update the code paths that assign GIT_PR_RESPONSE
and subsequently LONG_SHA to validate the response (e.g., ensure jq returns a
non-null .head.sha) and exit with a descriptive error if validation fails.
- Around line 210-213: The current if-block that checks the VERIFICATION_FAILED
variable prints warnings but continues; change it to fail fast by replacing the
branch so that when VERIFICATION_FAILED == "true" the script exits with a
non-zero status (e.g., exit 1) after logging the warning, ensuring the
disconnected verification failures stop further test execution; reference the
VERIFICATION_FAILED check and the existing if/else block to locate and update
the behavior.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: ee354e90-a770-4db9-bd2a-ed5dc44cc221
📒 Files selected for processing (13)
ci-operator/step-registry/redhat-developer/rhdh/ocp/OWNERSci-operator/step-registry/redhat-developer/rhdh/ocp/OWNERSci-operator/step-registry/redhat-developer/rhdh/ocp/helm/OWNERSci-operator/step-registry/redhat-developer/rhdh/ocp/helm/OWNERSci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.shci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.metadata.jsonci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.yamlci-operator/step-registry/redhat-developer/rhdh/ocp/helm/nightly/redhat-developer-rhdh-ocp-helm-nightly-ref.metadata.jsonci-operator/step-registry/redhat-developer/rhdh/ocp/helm/redhat-developer-rhdh-ocp-helm-ref.metadata.jsonci-operator/step-registry/redhat-developer/rhdh/ocp/helm/upgrade/nightly/redhat-developer-rhdh-ocp-helm-upgrade-nightly-ref.metadata.jsonci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-ref.metadata.jsonci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-ref.yamlci-operator/step-registry/redhat-developer/rhdh/ocp/operator/nightly/redhat-developer-rhdh-ocp-operator-nightly-ref.metadata.json
✅ Files skipped from review due to trivial changes (4)
- ci-operator/step-registry/redhat-developer/rhdh/ocp/OWNERS
- ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/OWNERS
- ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/redhat-developer-rhdh-ocp-helm-ref.metadata.json
- ci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-ref.metadata.json
The openshift-e2e-aws-disconnected workflow pre/post steps (mirror-images-by-oc-adm, gather-installer-boostrap-logs-in-bastion) require the upi-installer image (from: upi-installer). This image is not part of the GA release payload (stable-4.21) but is built from source in the ocp/4.21 integration stream. Import it as a base_image so it is available as stable:upi-installer at runtime. Without this, both mirroring (pre) and artifact gathering (post) fail with ImagePullBackOff, blocking cluster installation and teardown. Assisted-by: OpenCode
|
/pj-rehearse pull-ci-redhat-developer-rhdh-main-e2e-ocp-disconnected-helm-nightly |
|
@zdrapela: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
Add the openshift-custom-mirror-registry vault secret mount to both helm and operator disconnected nightly ref YAMLs. This makes the mirror registry credentials (registry_creds, client_ca.crt, etc.) available at /var/run/vault/mirror-registry/ in the test pod. The test step needs these credentials to authenticate to the bastion mirror registry when running prepare-restricted-environment.sh to mirror RHDH operator and operand images for air-gapped deployment. This follows the same pattern used by other disconnected test steps (enable-qe-catalogsource-disconnected, openshift-lightspeed-service- install-disconnected, distributed-tracing-install-disconnected, etc.). Assisted-by: OpenCode
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In
`@ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.sh`:
- Around line 54-69: The curl probe currently uses -k which ignores TLS
validation and defeats the MIRROR_REGISTRY_CA export; update the probe to
validate TLS with the mounted CA by replacing -k with --cacert
"${MIRROR_REGISTRY_CA}" (use the MIRROR_REGISTRY_CA variable set from
MIRROR_REGISTRY_CA_FILE) in the curl command that checks
"https://${MIRROR_REGISTRY_URL}/v2/"; ensure the MIRROR_REGISTRY_CA
variable/file (MIRROR_REGISTRY_CA_FILE/MIRROR_REGISTRY_CA) is present before
invoking curl so the probe fails if the CA is missing or invalid.
- Around line 24-58: The script currently converts missing mirror-registry
inputs into empty vars and warnings; instead, detect and fail fast when any
required input is missing: check MIRROR_REGISTRY_URL (from
SHARED_DIR/mirror_registry_url), MIRROR_REGISTRY_CREDS_FILE (registry_creds) and
MIRROR_REGISTRY_CA_FILE (client_ca.crt) and if any is missing or empty, emit a
clear error message referencing MIRROR_REGISTRY_URL, MIRROR_REGISTRY_CREDS_FILE
and MIRROR_REGISTRY_CA_FILE and exit 1; otherwise proceed to create
MIRROR_REGISTRY_PULL_SECRET and export MIRROR_REGISTRY_CA as the existing logic
(keep use of registry_cred, MIRROR_REGISTRY_PULL_SECRET and
CLUSTER_PROFILE_DIR/pull-secret).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 3a16d6c3-c62f-4f1c-9c23-87ccb45ee784
📒 Files selected for processing (3)
ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-commands.shci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.yamlci-operator/step-registry/redhat-developer/rhdh/ocp/operator/disconnected/nightly/redhat-developer-rhdh-ocp-operator-disconnected-nightly-ref.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- ci-operator/step-registry/redhat-developer/rhdh/ocp/helm/disconnected/nightly/redhat-developer-rhdh-ocp-helm-disconnected-nightly-ref.yaml
Export MIRROR_REGISTRY_URL, BASTION_PUBLIC_ADDRESS, BASTION_SSH_USER, MIRROR_REGISTRY_PULL_SECRET, and MIRROR_REGISTRY_CA as environment variables early in the script so downstream tools like prepare-restricted-environment.sh can use them. The pull secret is constructed by merging the mirror registry credentials from the vault secret into the cluster pull secret, following the standard pattern used by other disconnected steps. Assisted-by: OpenCode
347381f to
8bc71e5
Compare
Add missing gather-core-dump chain to both disconnected test post phases to match the openshift-e2e-aws-disconnected workflow's default behavior (which gets fully replaced when the config overrides post). Remove rhdh from cluster-profiles-config.yaml ACLs for aws, aws-2, aws-3, aws-4, and aws-5 profiles — the disconnected jobs use openshift-org-aws (unrestricted) and no RHDH job references these profiles. Assisted-by: Claude Code Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
/pj-rehearse pull-ci-redhat-developer-rhdh-main-e2e-ocp-disconnected-helm-nightly |
|
@zdrapela: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel. |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: zdrapela The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Switch disconnected CI jobs from openshift-org-aws to the dedicated aws-rhdh-disconnected cluster profile. RHDH is a layered product and must not use the shared core OpenShift AWS budget (openshift-org-aws). The aws-rhdh-disconnected profile is being added in: - openshift/ci-tools#5205 (profile constant) - openshift#79749 (Boskos lease + secret bootstrap) Both PRs must merge before this PR can pass CI validation. Assisted-by: OpenCode
|
@zdrapela, Interacting with pj-rehearseComment: Once you are satisfied with the results of the rehearsals, comment: |
|
@zdrapela: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
Add CI jobs that provision a disconnected OCP 4.21 cluster on AWS for RHDH E2E testing in both Helm and Operator deployment modes.
The jobs are not intended to run periodically yet. The cron schedule
0 3 31 12 */0 4 31 12 *(Dec 31, which only fires once a year) is a placeholder. The actual periodic schedule will be configured after the disconnected E2E test is developed in the RHDH repo. This PR enables triggering the jobs on-demand on any PR via/test e2e-ocp-disconnected-helm-nightlyor/test e2e-ocp-disconnected-operator-nightly.Jira: https://redhat.atlassian.net/browse/RHIDP-13968
Dependencies
This PR depends on (must merge first):
aws-rhdh-disconnectedcluster profile constantRHDH is a layered product and must not use the shared
openshift-org-awsprofile (core OpenShift AWS budget). These jobs use a dedicatedaws-rhdh-disconnectedprofile with RHDH's own AWS account inus-east-2.Disconnected vs Air-gapped: What This PR Configures
This PR configures a disconnected environment — not a fully air-gapped one.
http_access allow authenticated(any destination)proxy-whitelist-enablestep) or no proxyENABLE_IDMS=yes)Key distinction: The
openshift-e2e-aws-disconnectedworkflow provisions a VPC where private subnets have no internet route (no NAT gateway). Cluster nodes genuinely cannot reach the internet directly. However, the bastion host sits on the public subnet and runs a Squid proxy (ports 3128/3129) configured with unrestrictedhttp_access allow authenticated— any authenticated request to any destination is forwarded. The CI test pod sourcesproxy-conf.shand can reach GitHub, Quay, and any external service through this proxy.To create a true air-gapped environment, you would add the
proxy-whitelist-enablestep to restrict the Squid proxy to only essential Red Hat domains (registries, telemetry, cluster apps), blocking all other outbound traffic. This PR does not use that step.Which mode is more useful for RHDH testing?
Disconnected (this PR) is the right starting point for several reasons:
Validates the core customer pain point. The most common "disconnected" RHDH deployment is one where cluster nodes cannot pull from public registries (quay.io, registry.redhat.io) directly — all images must be mirrored and redirected via IDMS/ICSP. This is exactly what this environment provides. The majority of enterprise customers who say "disconnected" mean this scenario, not a fully air-gapped network.
Tests the RHDH image mirroring pipeline. The critical thing to validate is that
prepare-restricted-environment.shcorrectly mirrors all RHDH operator and operand images to the bastion registry, creates the right IDMS rules, and that RHDH starts up pulling exclusively from the mirror. This works identically in both disconnected and air-gapped modes — the cluster nodes behave the same way.Avoids conflating two failure domains. In a true air-gapped setup, the CI test pod itself loses internet access. Any test failure could be caused by either (a) RHDH not working in disconnected mode, or (b) the test harness failing because it can't reach GitHub/Quay/external APIs. Starting with disconnected mode isolates the RHDH-specific failures from test infrastructure issues, making failures actionable.
Incremental tightening is straightforward. Once disconnected E2E tests are stable and green, adding the
proxy-whitelist-enablestep (orproxy-blocklist-enablewith specific domains) is a single-line chain addition to create an air-gapped variant. The test code itself may need adjustments (pre-download test fixtures, avoid external API calls), but the CI infrastructure change is minimal.Matches the RHDH airgap documentation. The RHDH airgap docs distinguish "partially disconnected" (host has internet + mirror access) from "fully disconnected" (bastion-only, no internet). This PR's environment maps to the "partially disconnected" scenario, which is the recommended and more common deployment model.
A future air-gapped variant would be valuable for validating that RHDH's offline bundle is truly self-contained (no surprise external calls from plugins, no hardcoded registry references), but it requires the test harness to also work fully offline — a separate effort.
Disconnected Environment Architecture
Data flow
ipi-aws-pre-disconnected): Provisions isolated VPC, bastion host with mirror registry + Squid proxy, mirrors OCP release payload to bastion:5000, installs OCP cluster withPUBLISH=Internaland IDMS enabledprepare-restricted-environment.shipi-aws-post-disconnected): Destroys cluster, CloudFormation stacks, IAM users, and verifies no AWS resources leakedChanges
Step Registry (new refs)
redhat-developer-rhdh-ocp-helm-disconnected-nightly— disconnected test ref for Helm deploymentredhat-developer-rhdh-ocp-operator-disconnected-nightly— disconnected test ref for Operator deployment (commands.sh is a symlink to Helm variant)rhdhat/tmp/secrets— RHDH-specific credentialsopenshift-custom-mirror-registryat/var/run/vault/mirror-registry— mirror registry credentials (user/pass, CA cert, upstream registry creds)MIRROR_REGISTRY_URL,BASTION_PUBLIC_ADDRESS,BASTION_SSH_USER,MIRROR_REGISTRY_PULL_SECRET,MIRROR_REGISTRY_CA) are exported and authenticated access is verified before test executionCI Config (
redhat-developer-rhdh-main.yaml)base_images.upi-installerfromocp/4.21— required by workflow steps (mirror-images-by-oc-adm,ipi-install-install-aws,gather-installer-boostrap-logs-in-bastion) that need theupi-installerimage not included in GA release payloadse2e-ocp-disconnected-helm-nightly(cron:0 3 31 12 *)e2e-ocp-disconnected-operator-nightly(cron:0 4 31 12 *)workflow: openshift-e2e-aws-disconnected,cluster_profile: aws-rhdh-disconnected,ENABLE_IDMS: "yes"gather-core-dump+ipi-aws-post-disconnected(full AWS teardown) +send-data-router+send-alertGenerated Jobs
always_run: false)Links
Docs
aws-rhdh-disconnectedprofile