Skip to content

Implement service name caching for UniquePodNames#1796

Merged
openshift-merge-bot[bot] merged 1 commit intoopenstack-k8s-operators:mainfrom
stuggi:backup_restore_uid
Feb 5, 2026
Merged

Implement service name caching for UniquePodNames#1796
openshift-merge-bot[bot] merged 1 commit intoopenstack-k8s-operators:mainfrom
stuggi:backup_restore_uid

Conversation

@stuggi
Copy link
Contributor

@stuggi stuggi commented Feb 4, 2026

Problem:
When UniquePodNames is enabled, GetServiceName() uses instance.UID[:5] to generate
unique service names. During CREATE webhook, UID is empty (assigned after mutating
webhooks), causing panic: "slice bounds out of range [:5] with length 0".

Solution - Hybrid Approach:

  1. Webhook handles CREATE and UPDATE:

    • CREATE: Generate random 5-char hex ID (UID not available yet)
    • UPDATE: Look up existing service CR by owner reference, cache its name
    • Uses lib-common's object.CheckOwnerRefExist() for robust ownership checking
  2. Controller triggers webhook for operator upgrades:

    • When UniquePodNames=true but ServiceName="", controller adds annotation
    • Uses lib-common's webhook.EnsureWebhookTrigger() function directly
    • Sets annotation with timestamp, defer block saves it (triggers UPDATE webhook)
    • Webhook caches service name and removes annotation
    • Controller NEVER directly updates spec (follows Kubernetes best practices)

Key Implementation Details:

GetServiceName() fix (api/core/v1beta1/openstackcontrolplane_types.go):

  • Check if len(instance.UID) >= 5 before slicing
  • Return base name if UID unavailable (no panic)

Webhook caching (api/core/v1beta1/openstackcontrolplane_webhook.go):

  • CacheServiceNameForCreate(): Uses random ID since UID unavailable
  • CacheServiceNameForUpdate(): Uses object.CheckOwnerRefExist(ownerUID, cr.GetOwnerReferences())
    to find existing CR owned by this controlplane
  • Preserves existing CR name regardless of format (handles flips correctly)
  • Cleans up reconcile-trigger annotation after caching

Controller annotation trigger (internal/openstack/cinder.go, glance.go):

  • Directly calls webhook.EnsureWebhookTrigger() from lib-common
  • Passes ReconcileTriggerAnnotation constant: "openstack.org/reconcile-trigger"
  • Passes logger and default timeout (0 = 5 minutes)
  • Returns Requeue=true, defer block saves annotation
  • No direct spec mutation (avoids GitOps conflicts, SSA issues)

Benefits:

  • ✅ Consistent naming across reconciliations
  • ✅ Handles operator upgrades (existing CRs get names cached)
  • ✅ GitOps friendly (spec only mutated by webhook, not controller)
  • ✅ No SSA field ownership conflicts, if we'd call update in the
    controller if setting the spec.
  • ✅ Robust flip detection via owner references
  • ✅ Works with existing environments (no manual CR updates needed)
  • ✅ Uses generic webhook trigger mechanism from lib-common

Testing:

  • Added comprehensive tests for all scenarios:
    • CREATE with UniquePodNames (uses random ID)
    • UPDATE with UniquePodNames (caches from existing CR or generates)
    • Operator upgrade scenario (controller triggers webhook)
    • Flip scenarios (UniquePodNames false→true, true→false)
    • Existing CR discovery by owner reference

Depends-On: openstack-k8s-operators/lib-common#671

Jira: OSPRH-26002

@stuggi
Copy link
Contributor Author

stuggi commented Feb 4, 2026

when updating the operators on an existing/deployed env, the ctplane CR gets updated to reflect the existing serviceName via the webhook:

2026-02-04T13:02:20Z    INFO    openstackcontrolplane-resource  Defaulting for OpenStackControlPlane    {"name": "controlplane"}
2026-02-04T13:02:20Z    INFO    openstackcontrolplane-resource  default {"name": "controlplane"}
2026-02-04T13:02:20Z    INFO    openstackcontrolplane-resource  Cached Cinder service name      {"serviceName": "cinder-3c8d9", "isCreate": false}
2026-02-04T13:02:20Z    INFO    openstackcontrolplane-resource  Cached Glance service name      {"serviceName": "glance-3c8d9", "isCreate": false}
2026-02-04T13:02:20Z    INFO    openstackcontrolplane-resource  Removed reconcile-trigger annotation after caching service name
$ oc get osctlplane -o yaml | grep serviceName:
      serviceName: cinder-3c8d9
      serviceName: glance-3c8d9

@stuggi stuggi force-pushed the backup_restore_uid branch from 875f1ca to 9173588 Compare February 4, 2026 13:17
@openshift-ci
Copy link
Contributor

openshift-ci bot commented Feb 4, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: stuggi

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved label Feb 4, 2026
@fmount fmount self-requested a review February 4, 2026 13:53
@stuggi stuggi requested review from abays and dprince February 4, 2026 16:06
@stuggi stuggi force-pushed the backup_restore_uid branch from 9173588 to e044551 Compare February 4, 2026 20:43
@stuggi
Copy link
Contributor Author

stuggi commented Feb 4, 2026

rebased

@fmount
Copy link
Contributor

fmount commented Feb 5, 2026

The patch looks good to me and having a mechanism that migrates fields at webhook layer seems a good approach. In addition this mechanism can be either extended to support other services or reused for other purposes (see the rabbitmq patch series).
We'll approve/land this patch once lib-common is bumped.

@stuggi stuggi force-pushed the backup_restore_uid branch 2 times, most recently from c8d0b81 to fa2c1ec Compare February 5, 2026 09:17
**Problem:**
When UniquePodNames is enabled, GetServiceName() uses instance.UID[:5] to generate
unique service names. During CREATE webhook, UID is empty (assigned after mutating
webhooks), causing panic: "slice bounds out of range [:5] with length 0".

**Solution - Hybrid Approach:**

1. **Webhook handles CREATE and UPDATE**:
   - CREATE: Generate random 5-char hex ID (UID not available yet)
   - UPDATE: Look up existing service CR by owner reference, cache its name
   - Uses lib-common's object.CheckOwnerRefExist() for robust ownership checking

2. **Controller triggers webhook for operator upgrades**:
   - When UniquePodNames=true but ServiceName="", controller adds annotation
   - Uses lib-common's webhook.EnsureWebhookTrigger() function directly
   - Sets annotation with timestamp, defer block saves it (triggers UPDATE webhook)
   - Webhook caches service name and removes annotation
   - Controller NEVER directly updates spec (follows Kubernetes best practices)

**Key Implementation Details:**

**GetServiceName() fix** (api/core/v1beta1/openstackcontrolplane_types.go):
- Check if len(instance.UID) >= 5 before slicing
- Return base name if UID unavailable (no panic)

**Webhook caching** (api/core/v1beta1/openstackcontrolplane_webhook.go):
- CacheServiceNameForCreate(): Uses random ID since UID unavailable
- CacheServiceNameForUpdate(): Uses object.CheckOwnerRefExist(ownerUID, cr.GetOwnerReferences())
  to find existing CR owned by this controlplane
- Preserves existing CR name regardless of format (handles flips correctly)
- Cleans up reconcile-trigger annotation after caching

**Controller annotation trigger** (internal/openstack/cinder.go, glance.go):
- Directly calls webhook.EnsureWebhookTrigger() from lib-common
- Passes ReconcileTriggerAnnotation constant: "openstack.org/reconcile-trigger"
- Passes logger and default timeout (0 = 5 minutes)
- Returns Requeue=true, defer block saves annotation
- No direct spec mutation (avoids GitOps conflicts, SSA issues)

**Benefits:**
- ✅ Consistent naming across reconciliations
- ✅ Handles operator upgrades (existing CRs get names cached)
- ✅ GitOps friendly (spec only mutated by webhook, not controller)
- ✅ No SSA field ownership conflicts, if we'd call update in the
  controller if setting the spec.
- ✅ Robust flip detection via owner references
- ✅ Works with existing environments (no manual CR updates needed)
- ✅ Uses generic webhook trigger mechanism from lib-common

**Testing:**
- Added comprehensive tests for all scenarios:
  - CREATE with UniquePodNames (uses random ID)
  - UPDATE with UniquePodNames (caches from existing CR or generates)
  - Operator upgrade scenario (controller triggers webhook)
  - Flip scenarios (UniquePodNames false→true, true→false)
  - Existing CR discovery by owner reference

Depends-On: openstack-k8s-operators/lib-common#671

Jira: OSPRH-26002

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Signed-off-by: Martin Schuppert <mschuppert@redhat.com>
@stuggi stuggi force-pushed the backup_restore_uid branch from fa2c1ec to 10ac79c Compare February 5, 2026 09:29
Copy link
Contributor

@abays abays left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@abays abays added the lgtm label Feb 5, 2026
@openshift-merge-bot openshift-merge-bot bot merged commit 3602e24 into openstack-k8s-operators:main Feb 5, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants