Skip to content

Orchestrate eject: preflight, config strip, Mimir cleanup#265

Draft
timtalbot wants to merge 3 commits into
mainfrom
orchestrate-eject
Draft

Orchestrate eject: preflight, config strip, Mimir cleanup#265
timtalbot wants to merge 3 commits into
mainfrom
orchestrate-eject

Conversation

@timtalbot
Copy link
Copy Markdown
Contributor

Summary

Wires up the full ptd eject --dry-run=false flow, turning the eject command from a documentation-only tool into one that actually disconnects the workload from the control room.

  • Preflight checks — validates workload credentials, control room credentials, and that control_room_* fields exist in config before proceeding
  • Config snapshot + strip — snapshots control room fields for the audit record, then zeroes them in ptd.yaml
  • Mimir password deletion — removes the workload's entry from the control room's mimir-auth secret (fault-tolerant: logs and continues on failure)
  • Eject record — writes eject-record.json capturing what happened (snapshot, timestamps, success/failure of each step)
  • Terminology cleanupsevereject/removal throughout; sever.gomimir.go
  • Connection inventory trimmed — removed Loki and Team Operator connections (no longer applicable)
  • Secret name fix — mimir-auth secret resource in connection inventory now correctly uses the control room target name (matching actual deletion logic in mimir.go)

Test plan

  • All existing eject tests pass
  • New test: full happy path strips config and deletes Mimir entry
  • New test: preflight fails on empty config (already-ejected workload)
  • New test: graceful degradation when no control room target available
  • New test: Mimir deletion failure continues with partial success record
  • New test: dry-run skips all destructive steps (config, Mimir, eject record)
  • Manual test against a real workload

Closes #248

…r cleanup

Wire up the full eject flow for --dry-run=false: preflight validation,
control room config snapshot, ptd.yaml field stripping, Mimir password
deletion from the control room secret store, and eject record generation.

Closes #248
@timtalbot
Copy link
Copy Markdown
Contributor Author

@claude review

@claude
Copy link
Copy Markdown

claude Bot commented Apr 22, 2026

Claude finished @timtalbot's task —— View job


Code Review: orchestrate-eject

  • Read all changed files
  • Analyze preflight checks
  • Analyze eject orchestration logic
  • Analyze config strip and Mimir cleanup
  • Review tests
  • Post review feedback

Overall this is a well-structured PR. The step ordering (snapshot → strip → mimir delete → record) is sound, the fault-tolerance on Mimir deletion is the right call, and the test coverage is thorough. A few observations:

Potential Issues

  1. bundle is unused (eject.go:70): bundle is allocated and its ControlRoom field is set, but it's never serialized or returned. This is dead code carried from the previous PR. Either serialize it into the output directory or remove it to avoid confusion.

  2. eject-record.json file permissions vs metadata.json (eject.go:209): writeEjectRecord uses 0600 while other artifact files (e.g., metadata.json) use 0644. The eject record does contain the snapshot of control room fields (account IDs, domains) — so restricted permissions make sense — but the inconsistency is worth being intentional about. If the whole bundle is meant to be shared with the customer, 0644 might be more appropriate.

  3. Preflight RunPreflightChecks never returns an error (preflight.go:52-77): The function signature returns (*PreflightResult, error) but the error path is never hit — all failures are recorded as CheckFail results. This is fine as an extensibility hook, but the caller in eject.go:79-88 checks both err and !pfResult.Passed, which means the err branch is currently dead code. Not a bug, just something to be aware of.

  4. computePassed() on empty checks returns true (preflight.go:36-43): If no checks run (e.g., all check functions are somehow skipped), preflight passes vacuously. The test TestPreflightResult_ComputePassed_Empty explicitly validates this, so it's intentional, but worth a comment on whether that's the desired behavior for safety.

Minor Nits

  1. Error return from os.WriteFile in tests is ignored (eject_test.go:177,246,268,308,383): Several os.WriteFile calls in tests don't check the error return. While unlikely to fail with t.TempDir(), it's cleaner to use require.NoError.

  2. controlRoomName could be derived inside Run (eject.go:92-95): The pattern of if opts.ControlRoomTarget != nil { controlRoomName = opts.ControlRoomTarget.Name() } could be a method on Options for consistency with configLoader() and ptdYamlPath().

What Looks Good

  • The snapshot-before-strip ordering guarantees the audit record captures the original values even if something goes wrong mid-eject.
  • Mimir deletion failure being non-fatal (log-and-continue) is exactly right — the config is already stripped at that point, so failing hard would leave the workload in a confusing half-state with no record.
  • The terminology cleanup (severeject/removal) is consistent throughout all 16 files.
  • The connection inventory trim (removing Loki and Team Operator) simplifies the model and the mimir-auth secret name fix (using control room name instead of workload name) is a correctness fix.
  • Test coverage is comprehensive: happy path, preflight failure, graceful degradation, partial success, and dry-run isolation.

Address review feedback on #265: remove unused Bundle type and
variable from eject.Run, and wrap os.WriteFile calls in tests
with require.NoError for fail-fast on setup errors.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Orchestrate the severance

1 participant