feat: composable lifecycle extensions across compute drivers #1916

@cheese-head

Problem Statement

OpenShell now has VM-specific lifecycle extension hooks via #1583, but that extension model is local to the VM driver. Other core compute drivers such as Kubernetes, Docker, and Podman have their own lifecycle flows and driver-specific configuration, but they do not expose a common way for operator-installed integrations to participate in validation, plan construction, create/start failure handling, cleanup, or reconciliation.
A common extension model would also let external drivers wrap an existing driver instead of reimplementing its entire lifecycle.

Example: a managed Kubernetes driver.

OpenShell gateway
  -> external driver: acme-kubernetes
       -> delegates base lifecycle to core Kubernetes driver
       -> injects custom lifecycle extensions
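
The wrapping relationship in the diagram above could look roughly like the following. This is a minimal sketch, not OpenShell's actual protocol: the `ComputeDriver` trait, `CoreKubernetes`, and `AcmeKubernetes` are hypothetical names invented for illustration, and the real compute-driver protocol in proto/compute_driver.proto has more RPCs than the single `create` shown here.

```rust
/// Hypothetical, heavily reduced view of a compute driver (assumption:
/// the real protocol exposes validate, create, stop, delete, get, list, watch).
pub trait ComputeDriver {
    /// Returns a log of the steps taken so the delegation order is visible.
    fn create(&mut self, sandbox: &str) -> Result<Vec<String>, String>;
}

/// Stand-in for the core Kubernetes driver.
pub struct CoreKubernetes;

impl ComputeDriver for CoreKubernetes {
    fn create(&mut self, sandbox: &str) -> Result<Vec<String>, String> {
        Ok(vec![format!("core-kubernetes: create {sandbox}")])
    }
}

/// Hypothetical external driver that wraps a base driver and adds its own
/// steps around the delegated lifecycle call.
pub struct AcmeKubernetes<D: ComputeDriver> {
    pub base: D,
}

impl<D: ComputeDriver> ComputeDriver for AcmeKubernetes<D> {
    fn create(&mut self, sandbox: &str) -> Result<Vec<String>, String> {
        let mut steps = vec![format!("acme: pre-create {sandbox}")];
        // Delegate the base lifecycle; a base failure propagates unchanged.
        steps.extend(self.base.create(sandbox)?);
        steps.push(format!("acme: post-create {sandbox}"));
        Ok(steps)
    }
}
```

Because the wrapper is itself a `ComputeDriver` in this sketch, the gateway would not need to know whether it is talking to a core driver or a wrapped one.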

Proposed Design

OpenShell should define a common lifecycle extension interface for core compute drivers. Each driver keeps ownership of its native implementation, but exposes a small set of lifecycle phases where operator-installed extensions can validate requests, inspect or mutate a driver-owned plan, allocate side resources, handle create failure, clean up on delete, and reconcile after restart.
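
To make the phases concrete, here is a rough sketch of what a shared interface might look like. Everything in it is an assumption for discussion: `DriverLifecycleExtension`, `ExtensionChain`, `Plan`, `SandboxRequest`, and `TenantLabel` are hypothetical names, and this is not the existing `LifecycleExtension` trait from openshell-driver-vm.

```rust
use std::collections::BTreeMap;

/// Hypothetical driver-owned plan; each real driver would define its own.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct Plan {
    pub labels: BTreeMap<String, String>,
}

/// Hypothetical request, reduced to what the sketch needs.
#[derive(Debug, Clone)]
pub struct SandboxRequest {
    pub name: String,
}

/// One method per phase named in the proposal. Defaults are no-ops so an
/// extension only implements the phases it participates in.
pub trait DriverLifecycleExtension {
    fn name(&self) -> &str;
    fn validate(&self, _req: &SandboxRequest) -> Result<(), String> { Ok(()) }
    fn mutate_plan(&self, _req: &SandboxRequest, _plan: &mut Plan) -> Result<(), String> { Ok(()) }
    fn on_create_failure(&self, _req: &SandboxRequest) {}
    fn on_delete(&self, _req: &SandboxRequest) {}
    fn reconcile(&self, _req: &SandboxRequest) {}
}

/// Runs extensions in registration order.
#[derive(Default)]
pub struct ExtensionChain {
    extensions: Vec<Box<dyn DriverLifecycleExtension>>,
}

impl ExtensionChain {
    pub fn register(&mut self, ext: Box<dyn DriverLifecycleExtension>) {
        self.extensions.push(ext);
    }

    /// Validates with every extension first, then lets each mutate the plan.
    pub fn build_plan(&self, req: &SandboxRequest) -> Result<Plan, String> {
        for ext in &self.extensions {
            ext.validate(req)?;
        }
        let mut plan = Plan::default();
        for ext in &self.extensions {
            ext.mutate_plan(req, &mut plan)?;
        }
        Ok(plan)
    }
}

/// Example extension: rejects unnamed sandboxes and stamps a tenant label.
pub struct TenantLabel {
    pub tenant: String,
}

impl DriverLifecycleExtension for TenantLabel {
    fn name(&self) -> &str { "tenant-label" }

    fn validate(&self, req: &SandboxRequest) -> Result<(), String> {
        if req.name.is_empty() {
            return Err(format!("{}: sandbox name must not be empty", self.name()));
        }
        Ok(())
    }

    fn mutate_plan(&self, _req: &SandboxRequest, plan: &mut Plan) -> Result<(), String> {
        plan.labels.insert("tenant".to_string(), self.tenant.clone());
        Ok(())
    }
}
```

Running all validation before any mutation is one possible ordering; whether side-resource allocation belongs in `mutate_plan` or a separate phase is exactly the kind of API-shape question the RFC would need to settle.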

The VM driver's existing lifecycle hooks should become the reference implementation. Kubernetes could be the first non-VM target, with a mutable plan covering the Sandbox CR, pod template, namespace, labels/annotations, Services, NetworkPolicies, and resource claims.
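
As an illustration of the Kubernetes case, a mutable plan might be shaped something like this. The sketch is an assumption, not the driver's real types: `KubernetesPlan` and `apply_tenant_isolation` are hypothetical, resources are represented as plain name strings rather than Kubernetes API objects, and the Sandbox CR and pod template are omitted for brevity.

```rust
use std::collections::BTreeMap;

/// Hypothetical mutable plan covering part of the surface listed above.
#[derive(Debug, Default, Clone)]
pub struct KubernetesPlan {
    pub namespace: String,
    pub labels: BTreeMap<String, String>,
    pub annotations: BTreeMap<String, String>,
    pub services: Vec<String>,
    pub network_policies: Vec<String>,
    pub resource_claims: Vec<String>,
}

/// Example site-specific mutation: place the sandbox in a tenant namespace
/// and attach a default-deny NetworkPolicy by name.
pub fn apply_tenant_isolation(plan: &mut KubernetesPlan, tenant: &str) {
    plan.namespace = format!("tenant-{tenant}");
    plan.labels.insert("tenant".to_string(), tenant.to_string());

    // Idempotent: applying the mutation twice must not duplicate the policy.
    let policy = format!("default-deny-{tenant}");
    if !plan.network_policies.contains(&policy) {
        plan.network_policies.push(policy);
    }
}
```

Keeping the mutation idempotent matters if the same extension can run again during reconciliation after a restart.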

With such an interface in place, an external compute driver could either implement the full compute-driver protocol or supply only its own extensions to an existing compute driver.

Alternatives Considered

Fully external compute drivers were also considered. That works for completely new backends, but it is too heavy when the desired behavior is mostly an existing driver plus site-specific customization, such as Kubernetes with tenant namespaces, DRA claims, IPAM, or NetworkPolicy.

Agent Investigation

The investigation found that this is feasible, but should be treated as an
RFC/design feature rather than a small implementation change.

Key findings:

  • OpenShell already has a common internal compute-driver protocol in
    proto/compute_driver.proto, with lifecycle RPCs for validate, create, stop,
    delete, get, list, and watch.
  • The gateway wraps drivers through ComputeRuntime, but driver selection is
    still tied to the fixed ComputeDriverKind enum: kubernetes, vm,
    docker, and podman.
  • VM has the strongest precedent: openshell-driver-vm defines
    LifecycleExtension, LifecycleExtensionRegistry, LaunchPlan, activation
    labels, failure cleanup, delete cleanup, and restore hooks.
  • Kubernetes, Docker, and Podman each have clear lifecycle and plan-building
    points, but no shared hook abstraction. Kubernetes is the best first non-VM
    target because it already builds a mutable Sandbox CR / pod template before
    create.
  • driver_config is already selected and forwarded per active driver, but it is
    request data. It does not provide lifecycle ordering, rollback, cleanup,
    reconciliation, or audit semantics.
  • External/wrapper drivers need a config identity model so wrapper config,
    base-driver config, and extension config do not get conflated.
  • Main risks are API shape, security boundaries, rollback/idempotency,
    capability discovery, and how extension activation is authorized.
  • Expected tests include hook ordering, unsupported capability behavior,
    rollback on create failure, idempotent cleanup, Kubernetes plan mutation, and
    regression coverage for the existing VM hooks.
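
The hook-ordering and rollback-on-create-failure tests mentioned above could be built around something like the following. This is a sketch under assumptions: `Hook` and `run_create` are hypothetical, and the rollback policy shown (notify every hook that ran, including the failing one, newest first) is one possible choice rather than established OpenShell behavior.

```rust
/// Hypothetical hook that records what it does into a shared log.
pub struct Hook {
    pub name: &'static str,
    pub fail_on_create: bool,
}

impl Hook {
    fn on_create(&self, log: &mut Vec<String>) -> Result<(), String> {
        log.push(format!("create:{}", self.name));
        if self.fail_on_create {
            return Err(format!("{} failed", self.name));
        }
        Ok(())
    }

    fn on_create_failure(&self, log: &mut Vec<String>) {
        log.push(format!("rollback:{}", self.name));
    }
}

/// Runs hooks in order. On the first failure, rolls back every hook that
/// already ran (including the failing one) in reverse order, then stops.
pub fn run_create(hooks: &[Hook], log: &mut Vec<String>) -> Result<(), String> {
    for (i, hook) in hooks.iter().enumerate() {
        if let Err(err) = hook.on_create(log) {
            for done in hooks[..=i].iter().rev() {
                done.on_create_failure(log);
            }
            return Err(err);
        }
    }
    Ok(())
}
```

With hooks `a`, `b` (failing), and `c`, the log would read `create:a`, `create:b`, `rollback:b`, `rollback:a`, and `c` would never run, which gives a test a single ordered sequence to assert against.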

Checklist

  • I've reviewed existing issues and the architecture docs
  • This is a design proposal, not a "please build this" request

Metadata

Labels

state:triage-needed (Opened without agent diagnostics and needs triage)
