chore(agent): add Operations & Debug Runbook to ephemerd-engineer.md#76

Merged: luthermonson merged 1 commit into main from chore/agent-debug-runbook on May 25, 2026

@luthermonson
Contributor

Summary

The Ephemerd Engineer agent file was architecture-only. Adds an Operations & Debug Runbook section covering what an agent that just walked into the project needs to know on day 1, captured from one full debugging session that re-derived all of this from scratch.

Sections added:

  • Day-1 inventory — `ephemerd status`/`jobs`/`logs`/`doctor`.
  • Service control — `ephemerd start|stop|restart|drain` wrappers; `sc.exe` / `systemctl` / `launchctl` documented only as escape hatches.
  • Filesystem layout (operational) — install paths, data dir, log paths, control socket, per-job runner/docker socket paths.
  • Build & deploy on this host (Windows) — the `mage build:windows` → stop → copy → start loop.
  • Auth: App vs PAT precedence — code path in `pkg/github/client.go`, the rule that App auth wins whenever `app_id` is set (so rotating `GITHUB_TOKEN` is a no-op for ephemerd when an App is configured), plus 401 triage.
  • Local CI compromise on Windows — the AGENTS.md-documented `miekg/pkcs11` cgo failure and the `GOOS=linux` lint + compile-only test workaround.
  • Job lifecycle in the log — what a clean Linux-dispatched vs Windows-native job looks like in `ephemerd.log`, plus how to grep by job id.
  • Worktree + commit conventions — short summary of the user's hard rules (per-feature worktrees, backdate commits, no `_ =`, GITHUB_TOKEN for git/gh only).
  • CI matrix gotchas — the arm64/macOS gating from PR #75 (chore(ci): gate arm64/macOS matrix jobs on repo vars + continue-on-error), so the next agent doesn't chase Pending checks.
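
The App-vs-PAT precedence rule from the list above can be illustrated with a minimal Go sketch. This is not the code in `pkg/github/client.go`; the `Config` struct, its field names, `AuthMode`, and `chooseAuth` are hypothetical names invented for illustration. It only models the stated rule: App auth wins whenever `app_id` is set, so rotating the token changes nothing while an App is configured.

```go
package main

import "fmt"

// AuthMode names which credential a client would use.
// Hypothetical type, not taken from the ephemerd source.
type AuthMode string

const (
	AuthApp  AuthMode = "app"
	AuthPAT  AuthMode = "pat"
	AuthNone AuthMode = "none"
)

// Config holds only the two settings relevant to the rule.
// Field names are assumptions; AppID stands in for `app_id`
// and Token for a PAT such as GITHUB_TOKEN.
type Config struct {
	AppID int64  // zero means "not configured"
	Token string // empty means "not configured"
}

// chooseAuth applies the precedence described in the runbook:
// App auth wins whenever an app ID is set; the PAT is only a fallback.
func chooseAuth(c Config) AuthMode {
	switch {
	case c.AppID != 0:
		return AuthApp
	case c.Token != "":
		return AuthPAT
	default:
		return AuthNone
	}
}

func main() {
	// With an App configured, rotating the token does not change the result.
	fmt.Println(chooseAuth(Config{AppID: 12345, Token: "old-token"}))
	fmt.Println(chooseAuth(Config{AppID: 12345, Token: "rotated-token"}))
	// The PAT is only consulted when no App is configured.
	fmt.Println(chooseAuth(Config{Token: "rotated-token"}))
}
```

Under this model, a 401 seen while an App is configured would point at the App credentials rather than at the token, which is the triage order the runbook bullet implies.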

Stale `Current Branch: feat/windows-support` trailer at the bottom of the file is left as-is — separate cleanup.

Test plan

  • File renders as valid Markdown.
  • No code changes — runbook content only.
  • Spot-check on the next agent invocation that the new section actually lands in their context.

The agent definition was architecture-only; first-time agents kept
re-deriving the same operational checks. This adds a runbook section
covering what's worth knowing on day 1: how to inspect a live install
(`ephemerd status|jobs|logs|doctor`), service control (the wrapper
commands beat poking sc.exe/systemctl/launchctl directly), the
relevant filesystem paths, the build+deploy loop on Windows, App-vs-PAT
auth precedence and triage, the documented `mage ci` cgo workaround
on Windows hosts (GOOS=linux lint + compile-only test), what a clean
job lifecycle looks like in the log for grep-by-job-id, and the
worktree/backdate/no-_= conventions from operator memory.

Also covers the CI matrix gating (PR #75) so the next agent doesn't
chase "Pending" arm64/macOS checks expecting them to resolve.

The stale "Current Branch: feat/windows-support" trailer is left in
place — separate cleanup.
luthermonson merged commit cef5733 into main on May 25, 2026
3 of 4 checks passed