Skip to content

feat(auth): add first-run setup wizard#103

Open
ndycode wants to merge 67 commits intodevfrom
fresh/19-first-run-wizard
Open

feat(auth): add first-run setup wizard#103
ndycode wants to merge 67 commits intodevfrom
fresh/19-first-run-wizard

Conversation

@ndycode
Copy link
Owner

@ndycode ndycode commented Mar 16, 2026

Summary

  • Stacked PR on fresh/18-import-adapter-opencode
  • Reopens the preserved first-run wizard slice from archived PR #83
  • Current head: 2e22874

What Changed

  • Adds the first-run setup wizard before OAuth for brand-new installs, with restore, OpenCode import, settings, and doctor entry points.
  • Redacts first-run OpenCode import prompts and action-panel output to avoid leaking full filesystem paths in interactive flows.
  • Replaces the brittle import-stage string probe with explicit stage tracking so assessment failures stay labeled as assessment failures.
  • Adds direct Vitest coverage for the sync-history trim-skip threshold and the 0 vs null reset semantics used by the new append optimization.

Validation

  • npm run lint
  • npm run typecheck
  • npm run build
  • npm test
  • npm test -- test/codex-manager-cli.test.ts -t "imports OpenCode accounts from the first-run wizard|returns to the dashboard when OpenCode import fails|returns to the dashboard when OpenCode import assessment fails|redacts first-run backup listing warnings|falls back to a safe health summary when restore or rollback state reads fail"
  • npm test -- test/sync-history.test.ts -t "skips trim reads while append count stays within the default cap|reloads and trims once when append count crosses the default cap|re-reads seeded history after configureSyncHistoryForTests resets the estimate to null|preserves the latest entry per kind when appends exceed the default cap"

Docs and Governance Checklist

  • README updated (if user-visible behavior changed)
  • docs/getting-started.md updated (if onboarding flow changed)
  • docs/features.md updated (if capability surface changed)
  • relevant docs/reference/* pages updated (if commands/settings/paths changed)
  • docs/upgrade.md updated (if migration behavior changed)
  • SECURITY.md and CONTRIBUTING.md reviewed for alignment

Risk and Rollback

  • Risk level: medium
  • Rollback plan: revert ac2be34, 16ba251, 64a8ead, f5b74dc, and 2e22874

Additional Notes

  • This PR is intended to stay reviewable on top of fresh/18-import-adapter-opencode, not main.
  • The latest Greptile rerun completed successfully on 2e22874.
  • Remaining latest-head discussion is stale or non-blocking noise, not a concrete merge blocker.

note: greptile review for oc-chatgpt-multi-auth. cite files like lib/foo.ts:123. confirm regression tests + windows concurrency/token redaction coverage.

Greptile Summary

this pr wires up the first-run setup wizard (runFirstRunWizard / showFirstRunWizard) that intercepts codex auth login when no storage file exists, offering restore, opencode import, settings, doctor, and skip entry points before oauth. it also replaces the brittle import-stage string probe with explicit stage tracking, applies formatRedactedFilesystemError / getRedactedFilesystemErrorLabel throughout the new flow to prevent windows path leakage in logs, and adds a historyEntryCountEstimate optimization to appendSyncHistoryEntry that skips redundant full-file reads until MAX_HISTORY_ENTRIES is crossed.

  • all concerns from previous rounds (unguarded loadAccounts, firstRunWizardShownInLoop guard, path redaction in backup-listing catches, assessment vs import error label split, wouldExceedLimit branch in wizard) are addressed in this diff.
  • withHistoryLock uses a finally-based lock release, so an EPERM/EBUSY throw during the new null-estimate loadHistoryEntriesFromDisk probe does not break the mutex chain — subsequent appends will retry the probe.
  • the "action-time opencode assessment fails" test uses only 2 mockResolvedValueOnce entries for a flow that makes 3 calls; the third call silently returns vi.fn() default undefined, which passes the !== null guard and makes hasOpencodeSource: true by accident rather than by design.
  • the "resets cached append estimate" sync-history test only asserts read count; an explicit assertion on post-clear history length would make the reset invariant self-documenting.

Confidence Score: 4/5

  • pr is safe to merge with two minor test-robustness gaps and no unaddressed runtime risks.
  • all blocking issues from previous review rounds are resolved; the two remaining notes are style/robustness in tests, not production logic. the mutex design is sound, windows path redaction is applied consistently in new log paths, and the firstRunWizardShownInLoop guard prevents the pre-flagged unbounded re-show. one test relies on a silent undefined !== null pass for a third mock call, which is fragile but does not affect production behaviour.
  • test/codex-manager-cli.test.ts — third assessOpencodeAccountPool call in "action-time assessment fails" test is unspecified.

Important Files Changed

Filename Overview
lib/codex-manager.ts adds first-run wizard wiring (shouldShowFirstRunWizard, buildFirstRunWizardOptions, runFirstRunWizard) and a firstRunWizardShownInLoop guard; redacts runActionPanel error capture and import error formatting; logic looks sound, previous threads on path-leakage, unguarded loadAccounts, and sentinel-split are all addressed in this diff.
lib/sync-history.ts adds historyEntryCountEstimate to skip redundant disk reads; withHistoryLock uses a finally-based release so a loadHistoryEntriesFromDisk throw on null-estimate does not break the mutex chain; shouldTrim boolean correctly separates intent from result, resolving the previous sentinel-conflation thread.
lib/ui/auth-menu.ts adds FirstRunWizardOptions, FirstRunWizardAction types and showFirstRunWizard; null-safe result ?? { type: "cancel" } fallback is correct; no issues found.
test/codex-manager-cli.test.ts large batch of first-run wizard integration tests; good path coverage for cancel, login, settings, doctor, import success/failure, limit, and storage-reload-failure cases; one test has an underspecified assessOpencodeAccountPool mock (third call returns vi.fn() default undefined) which silently passes via undefined !== null = true.
test/sync-history.test.ts four new tests cover the trim-skip threshold, one-read trim trigger, external-clear reset, and configureSyncHistoryForTests null-reset; all assertions appear correct given __resetSyncHistoryForTests sets estimate to 0 before each test.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[runAuthLogin] --> B{shouldShowFirstRunWizard\nstorage === null?}
    B -- no --> L[loginFlow loop]
    B -- yes --> C[runFirstRunWizard]
    C --> D[buildFirstRunWizardOptions\nprobe backups + opencode source]
    D --> E[showFirstRunWizard]
    E --> F{action.type}
    F -- cancel --> G[return cancelled\nexit 0]
    F -- login / skip --> H[return continue\nlatestStorage: null]
    F -- restore --> I[runBackupBrowserManager]
    F -- import-opencode --> J{assessOpencodeAccountPool}
    J -- throws --> K[warn redacted error\nbreak]
    J -- null --> K2[log no pool\nbreak]
    J -- invalid/limit --> K3[log failure msg\nbreak]
    J -- eligible --> M[confirm → runActionPanel → importAccounts]
    M -- throws --> K4[warn redacted error\nbreak]
    M -- ok --> N[break]
    F -- settings --> O[configureUnifiedSettings]
    F -- doctor --> P[runActionPanel → runDoctor\ntry/catch]
    I --> Q[loadAccounts try/catch]
    K --> Q
    K2 --> Q
    K3 --> Q
    K4 --> Q
    N --> Q
    O --> Q
    P --> Q
    Q --> R{accounts > 0?}
    R -- yes --> S[return continue\nlatestStorage]
    R -- no --> D
    H --> T[loadAccounts post-wizard\ntry/catch]
    S --> T
    T --> U{accounts > 0?}
    U -- yes --> V[return 0]
    U -- no --> L
    L --> W[OAuth flow]
Loading

Fix All in Codex

Prompt To Fix All With AI
This is a comment left during a code review.
Path: test/codex-manager-cli.test.ts
Line: 1293-1323

Comment:
**third `assessOpencodeAccountPool` call is unspecified — silent `undefined !== null` pass**

`buildFirstRunWizardOptions` calls `assessOpencodeAccountPool()` on every wizard iteration, so this test makes **three** calls total:

1. first `buildFirstRunWizardOptions``mockResolvedValueOnce` (valid assessment)
2. `runFirstRunWizard` import arm → `mockRejectedValueOnce` (EBUSY)
3. second `buildFirstRunWizardOptions` → no mock remaining → `vi.fn()` returns `undefined`

at call 3, `undefined !== null` evaluates to `true`, so `hasOpencodeSource` is silently set to `true` and the import option appears in the second wizard even though no pool was actually found. the test passes today only because the user picks "cancel," but the intent is opaque and the behaviour changes if mock defaults are adjusted.

add an explicit third value to make the expectation clear:

```suggestion
	assessOpencodeAccountPoolMock
		.mockResolvedValueOnce({
			backup: {
				name: "openai-codex-accounts.json",
				path: "/mock/.opencode/openai-codex-accounts.json",
				createdAt: null,
				updatedAt: Date.now(),
				sizeBytes: 256,
				version: 3,
				accountCount: 1,
				schemaErrors: [],
				valid: true,
				loadError: "",
			},
			currentAccountCount: 0,
			mergedAccountCount: 1,
			imported: 1,
			skipped: 0,
			wouldExceedLimit: false,
			eligibleForRestore: true,
			nextActiveIndex: 0,
			nextActiveEmail: "imported@example.com",
			nextActiveAccountId: undefined,
			activeAccountChanged: true,
			error: "",
		})
		.mockRejectedValueOnce(
			new Error(
				"EBUSY: C:\\Users\\alice\\AppData\\Local\\opencode\\openai-codex-accounts.json",
			),
		)
		.mockResolvedValueOnce(null); // no source found on re-probe after failure
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: test/sync-history.test.ts
Line: 308-349

Comment:
**test only asserts read count — doesn't verify estimate reset prevents a second trim**

the test title says "resets the cached append estimate when trim reload finds an externally cleared file" but the sole assertion is `historyReads.toHaveLength(1)`. the count implicitly proves that entry 202 did not trigger an additional read (which would make it 2), but this is easy to miss when the test is read in isolation.

consider adding an explicit assertion after entry 202 that the history file contains only the two entries appended after the clear:

```typescript
// after the two appends following the external clear
const liveHistory = await readSyncHistory();
expect(liveHistory.length).toBe(2); // only entries 201 & 202 survived the clear
```

this makes the post-reset correctness property visible without relying on the read-count implication.

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "fix(wizard): guard d..."

Copilot AI review requested due to automatic review settings March 16, 2026 08:47
@chatgpt-codex-connector
Copy link

Codex usage limits have been reached for code reviews. Please check with the admins of this repo to increase the limits by adding credits.
Credits must be used to enable repository wide code reviews.

@ndycode ndycode force-pushed the fresh/18-import-adapter-opencode branch from 5bd459c to 3b1ba04 Compare March 18, 2026 01:19
ndycode added 2 commits March 18, 2026 10:10
…' into work/pr103-remediate-20260318

# Conflicts:
#	docs/getting-started.md
#	lib/codex-manager.ts
#	lib/ui/copy.ts
#	test/codex-manager-cli.test.ts
#	test/storage.test.ts
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
lib/codex-manager.ts (2)

4645-4668: ⚠️ Potential issue | 🟠 Major

confirm() can still abort this import flow.

lib/codex-manager.ts:4645-4650 sits outside both import error handlers. if confirm() throws on a tty/readline failure, this path skips the new redacted handling and exits codex auth login instead of returning to the menu like the assessment and import stages.

suggested change
-				const confirmed = await confirm(
-					`Import OpenCode accounts from ${backupLabel}?`,
-				);
-				if (!confirmed) {
-					continue;
-				}
-				try {
+				let importStage: "confirm" | "import" = "confirm";
+				try {
+					const confirmed = await confirm(
+						`Import OpenCode accounts from ${backupLabel}?`,
+					);
+					if (!confirmed) {
+						continue;
+					}
+					importStage = "import";
 					await runActionPanel(
 						"Import OpenCode Accounts",
 						`Importing from ${backupLabel}`,
@@
-				} catch (error) {
+				} catch (error) {
 					const errorLabel = collapseWhitespace(
 						formatRedactedFilesystemError(error),
 					);
-					console.error(`Import failed: ${errorLabel}`);
+					console.error(
+						`${importStage === "confirm" ? "Import confirmation failed" : "Import failed"}: ${errorLabel}`,
+					);
 				}

please add the matching regression in test/codex-manager-cli.test.ts:1 by forcing confirm() to throw and asserting the menu resumes. As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/codex-manager.ts` around lines 4645 - 4668, The confirm() call before
importing accounts can throw (e.g., tty/readline failure) and currently sits
outside the import error handling so a thrown error will escape and exit the
CLI; wrap the confirm() invocation in the same error-handling flow (or add a
surrounding try/catch) so thrown errors are processed via
formatRedactedFilesystemError()/collapseWhitespace and the command returns to
the menu instead of exiting; update the block around confirm(),
runActionPanel(), and importAccounts() (symbols: confirm, runActionPanel,
importAccounts, formatRedactedFilesystemError, collapseWhitespace) to ensure
confirm errors are caught and handled the same way as import errors, and add a
vitest regression in test/codex-manager-cli.test.ts that forces confirm() to
throw and asserts the menu resumes.

4371-4409: ⚠️ Potential issue | 🟠 Major

add the missing vitest for wizard-side opencode probe throws.

lib/storage.ts:2470-2495 shows assessOpencodeAccountPool() can throw, and lib/codex-manager.ts:4406-4409 is now the guard that keeps first-run codex auth login alive when that happens. the pr summary still calls this gap out, and i do not see the matching regression in test/codex-manager-cli.test.ts:1. please add a case that forces the assess step to throw, asserts the warning stays redacted, and verifies the wizard keeps looping instead of exiting. As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/codex-manager.ts` around lines 4371 - 4409, Add a vitest that simulates
assessOpencodeAccountPool throwing and verifies codex-manager handles it without
exiting: stub/mock assessOpencodeAccountPool to throw a filesystem error, stub
getRedactedFilesystemErrorLabel to return a predictable redacted string, invoke
the CLI path exercised by "import-opencode" (same flow exercised by
test/codex-manager-cli.test.ts), assert console.warn was called with the
redacted label returned by getRedactedFilesystemErrorLabel, and assert the
wizard/CLI loop did not exit (e.g. the command returned to prompt or continued
to the next loop iteration); reference functions assessOpencodeAccountPool,
getRedactedFilesystemErrorLabel, runActionPanel, and importAccounts when
locating code to stub and update the test to cover this regression.
test/codex-manager-cli.test.ts (1)

1201-1209: ⚠️ Potential issue | 🟠 Major

these redaction assertions still only check warn arg 0.

test/codex-manager-cli.test.ts:1201-1209 and test/codex-manager-cli.test.ts:1239-1240 only inspect call[0]. a raw windows path can still leak via call[1+] and the tests stay green.

proposed fix
-		expect(
-			warnSpy.mock.calls.some(([message]) =>
-				String(message).includes(
-					"Failed to refresh saved accounts after first-run action",
-				),
-			),
-		).toBe(true);
-		expect(
-			warnSpy.mock.calls.every((call) => !String(call[0]).includes("alice")),
-		).toBe(true);
+		const warningLines = warnSpy.mock.calls.map(flattenMockCallArgs);
+		expect(
+			warningLines.some((line) =>
+				line.includes("Failed to refresh saved accounts after first-run action"),
+			),
+		).toBe(true);
+		expect(warningLines.every((line) => !line.includes("alice"))).toBe(true);
@@
-		expect(
-			warnSpy.mock.calls.every((call) => !String(call[0]).includes("alice")),
-		).toBe(true);
+		const warningLines = warnSpy.mock.calls.map(flattenMockCallArgs);
+		expect(warningLines.every((line) => !line.includes("alice"))).toBe(true);

as per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Also applies to: 1235-1240

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-manager-cli.test.ts` around lines 1201 - 1209, The failing tests
only inspect warnSpy.mock.calls[i][0], so sensitive data in later args can still
leak; update the two assertions that currently check only call[0] (the
expect(...includes("Failed to refresh saved accounts...")) and the
expect(...every(call => !String(call[0]).includes("alice")))) to scan all
arguments of each call (e.g., join or iterate over call.slice(0) / call) and
assert that none of the arguments contain the sensitive substrings ("Failed to
refresh saved accounts after first-run action", "alice") — apply the same change
to the duplicate assertion block around the second instance (the other expect at
1235-1240) and ensure warnSpy.mock.calls is fully inspected for every call
rather than only index 0.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/sync-history.ts`:
- Around line 290-308: The cached historyEntryCountEstimate is unsafe under
multi-writer conditions; after the initial load (loadHistoryEntriesFromDisk) and
after appending (fs.appendFile to paths.historyPath), revalidate the on-disk
state instead of trusting the in-memory estimate: check the file mtime/size or
re-read the file (parse lines via serializeEntry/deserialize) and compute the
current entry count, then call trimHistoryFileIfNeeded(paths) based on that
authoritative count; update historyEntryCountEstimate from the re-read value.
Also add a vitest that seeds the file, initializes historyEntryCountEstimate,
then simulates an out-of-band append and verifies a subsequent append triggers
revalidation and trimming. Ensure any new retry logic for trimming/appending
handles transient EBUSY/429 errors with exponential backoff and cite the
affected tests in the commit message.

In `@test/codex-manager-cli.test.ts`:
- Around line 1215-1220: The test currently stubs showFirstRunWizard which
prevents exercising the real wizard re-render path when
assessOpencodeAccountPool throws; add an additional deterministic Vitest case
that does NOT spy on auth-menu.showFirstRunWizard and instead drives the real UI
flow by using selectMock to simulate user choices for "import-opencode" followed
by "cancel", while keeping loadAccountsMock as before and making
assessOpencodeAccountPool throw to trigger the re-render; ensure the new test
unregisters any spies, uses the real showFirstRunWizard function, asserts the
re-render occurred (e.g., that showFirstRunWizard was invoked twice or that
expected UI prompts were emitted), and keeps the test deterministic with Vitest
utilities.
- Around line 68-74: flattenMockCallArgs currently uses String(arg) which turns
objects into "[object Object]" and can miss leaked file paths; update
flattenMockCallArgs to robustly serialize arguments by handling Error instances
as `${err.name}: ${err.message}`, serializing objects with a JSON.stringify
replacer that redacts Windows-style absolute paths (e.g. matching
/[A-Za-z]:\\[^"\s]*/ → "[REDACTED_PATH]") and guards against circular references
using a WeakSet, then join the serialized pieces with spaces so object fields
containing paths are detected and redacted during tests.

---

Duplicate comments:
In `@lib/codex-manager.ts`:
- Around line 4645-4668: The confirm() call before importing accounts can throw
(e.g., tty/readline failure) and currently sits outside the import error
handling so a thrown error will escape and exit the CLI; wrap the confirm()
invocation in the same error-handling flow (or add a surrounding try/catch) so
thrown errors are processed via
formatRedactedFilesystemError()/collapseWhitespace and the command returns to
the menu instead of exiting; update the block around confirm(),
runActionPanel(), and importAccounts() (symbols: confirm, runActionPanel,
importAccounts, formatRedactedFilesystemError, collapseWhitespace) to ensure
confirm errors are caught and handled the same way as import errors, and add a
vitest regression in test/codex-manager-cli.test.ts that forces confirm() to
throw and asserts the menu resumes.
- Around line 4371-4409: Add a vitest that simulates assessOpencodeAccountPool
throwing and verifies codex-manager handles it without exiting: stub/mock
assessOpencodeAccountPool to throw a filesystem error, stub
getRedactedFilesystemErrorLabel to return a predictable redacted string, invoke
the CLI path exercised by "import-opencode" (same flow exercised by
test/codex-manager-cli.test.ts), assert console.warn was called with the
redacted label returned by getRedactedFilesystemErrorLabel, and assert the
wizard/CLI loop did not exit (e.g. the command returned to prompt or continued
to the next loop iteration); reference functions assessOpencodeAccountPool,
getRedactedFilesystemErrorLabel, runActionPanel, and importAccounts when
locating code to stub and update the test to cover this regression.

In `@test/codex-manager-cli.test.ts`:
- Around line 1201-1209: The failing tests only inspect
warnSpy.mock.calls[i][0], so sensitive data in later args can still leak; update
the two assertions that currently check only call[0] (the
expect(...includes("Failed to refresh saved accounts...")) and the
expect(...every(call => !String(call[0]).includes("alice")))) to scan all
arguments of each call (e.g., join or iterate over call.slice(0) / call) and
assert that none of the arguments contain the sensitive substrings ("Failed to
refresh saved accounts after first-run action", "alice") — apply the same change
to the duplicate assertion block around the second instance (the other expect at
1235-1240) and ensure warnSpy.mock.calls is fully inspected for every call
rather than only index 0.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a04cfa8b-f5bc-4b9a-b891-12f06d6d6b28

📥 Commits

Reviewing files that changed from the base of the PR and between e332b9c and ebb527f.

📒 Files selected for processing (6)
  • docs/getting-started.md
  • lib/codex-manager.ts
  • lib/sync-history.ts
  • test/codex-manager-cli.test.ts
  • test/storage.test.ts
  • test/sync-history.test.ts
💤 Files with no reviewable changes (1)
  • test/storage.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (3)
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/sync-history.ts
  • lib/codex-manager.ts
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/sync-history.test.ts
  • test/codex-manager-cli.test.ts
docs/**

⚙️ CodeRabbit configuration file

keep README, SECURITY, and docs consistent with actual CLI flags and workflows. whenever behavior changes, require updated upgrade notes and mention new npm scripts.

Files:

  • docs/getting-started.md

Comment on lines +68 to +74
function flattenMockCallArgs(call: unknown[]): string {
return call
.map((arg) =>
arg instanceof Error ? `${arg.name}: ${arg.message}` : String(arg),
)
.join(" ");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

redaction helper can hide leaked paths in object args.

in test/codex-manager-cli.test.ts:68-73, String(arg) turns objects into "[object Object]". if a logger call includes { error: "C:\\Users\\alice\\..." }, this helper will miss the leak.

proposed fix
+function serializeMockArg(arg: unknown): string {
+	if (arg instanceof Error) {
+		return `${arg.name}: ${arg.message}`;
+	}
+	if (typeof arg === "string") {
+		return arg;
+	}
+	try {
+		return JSON.stringify(arg);
+	} catch {
+		return String(arg);
+	}
+}
+
 function flattenMockCallArgs(call: unknown[]): string {
 	return call
-		.map((arg) =>
-			arg instanceof Error ? `${arg.name}: ${arg.message}` : String(arg),
-		)
+		.map((arg) => serializeMockArg(arg))
 		.join(" ");
 }

as per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
function flattenMockCallArgs(call: unknown[]): string {
return call
.map((arg) =>
arg instanceof Error ? `${arg.name}: ${arg.message}` : String(arg),
)
.join(" ");
}
function serializeMockArg(arg: unknown): string {
if (arg instanceof Error) {
return `${arg.name}: ${arg.message}`;
}
if (typeof arg === "string") {
return arg;
}
try {
return JSON.stringify(arg);
} catch {
return String(arg);
}
}
function flattenMockCallArgs(call: unknown[]): string {
return call
.map((arg) => serializeMockArg(arg))
.join(" ");
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-manager-cli.test.ts` around lines 68 - 74, flattenMockCallArgs
currently uses String(arg) which turns objects into "[object Object]" and can
miss leaked file paths; update flattenMockCallArgs to robustly serialize
arguments by handling Error instances as `${err.name}: ${err.message}`,
serializing objects with a JSON.stringify replacer that redacts Windows-style
absolute paths (e.g. matching /[A-Za-z]:\\[^"\s]*/ → "[REDACTED_PATH]") and
guards against circular references using a WeakSet, then join the serialized
pieces with spaces so object fields containing paths are detected and redacted
during tests.

Comment on lines +1215 to +1220
const authMenu = await import("../lib/ui/auth-menu.js");
const wizardSpy = vi
.spyOn(authMenu, "showFirstRunWizard")
.mockResolvedValueOnce({ type: "import-opencode" })
.mockResolvedValueOnce({ type: "cancel" });
loadAccountsMock.mockResolvedValue(null);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

coverage gap: this test mocks away the real wizard-loop failure path.

in test/codex-manager-cli.test.ts:1215-1220, stubbing showFirstRunWizard means we do not exercise the actual first-run menu re-render when assessOpencodeAccountPool throws. add one deterministic case that uses the real wizard UI path (no spy on showFirstRunWizard) and drives import-opencode -> cancel via selectMock.

as per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-manager-cli.test.ts` around lines 1215 - 1220, The test currently
stubs showFirstRunWizard which prevents exercising the real wizard re-render
path when assessOpencodeAccountPool throws; add an additional deterministic
Vitest case that does NOT spy on auth-menu.showFirstRunWizard and instead drives
the real UI flow by using selectMock to simulate user choices for
"import-opencode" followed by "cancel", while keeping loadAccountsMock as before
and making assessOpencodeAccountPool throw to trigger the re-render; ensure the
new test unregisters any spies, uses the real showFirstRunWizard function,
asserts the re-render occurred (e.g., that showFirstRunWizard was invoked twice
or that expected UI prompts were emitted), and keeps the test deterministic with
Vitest utilities.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (6)
lib/sync-history.ts (1)

290-308: ⚠️ Potential issue | 🟠 Major

cached entry count is still unsafe with out-of-band writers.

lib/sync-history.ts:290-308 trusts historyEntryCountEstimate after first init and does not revalidate disk state before deciding shouldTrim. if another process appends/prunes sync-history.ndjson, this process can skip trimming for too long and let history grow past policy. please revalidate on-disk state (mtime/size or bounded reread) before trim gating, and add a deterministic vitest that simulates out-of-band append after estimate warm-up.

As per coding guidelines, lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/sync-history.ts` around lines 290 - 308, The code uses
historyEntryCountEstimate (warmed via loadHistoryEntriesFromDisk) and then
decides shouldTrim without revalidating on-disk state; change the logic around
the append+trim decision (the block that writes serializeEntry(entry) to
paths.historyPath and computes shouldTrim) to revalidate the file before gating
trim by either checking the file mtime/size or performing a bounded reread of
the file tail to update historyEntryCountEstimate (or compute a safe upper
bound) and only then call trimHistoryFileIfNeeded(paths); also ensure
trimHistoryFileIfNeeded still works correctly when called concurrently and keep
loadHistoryEntriesFromDisk and serializeEntry usage intact. Add a vitest that
warms the estimate, performs an out-of-band append to paths.historyPath
(simulating another process), then runs the code path that triggers trimming to
confirm trimming executes deterministically; include the new test alongside
existing sync-history tests and handle EBUSY-like file contention in the test
setup without leaking tokens/emails in logs.
lib/codex-manager.ts (2)

4850-4878: 🧹 Nitpick | 🔵 Trivial

remove the unreachable first-run replay branch.

lib/codex-manager.ts:4477 (Line 4477) sets firstRunWizardShownInLoop = true on the only startedFromMissingStorage path, so lib/codex-manager.ts:4850 (Line 4850) guard cannot pass. this is dead control flow and makes startup logic harder to reason about.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@lib/codex-manager.ts` around lines 4850 - 4878, The if-block guarded by
startedFromMissingStorage && !firstRunWizardShownInLoop && existingCount === 0
&& isInteractiveLoginMenuAvailable() (the block that calls
loadDashboardDisplaySettings, runFirstRunWizard, and attempts
refreshedAfterWizard) is unreachable because firstRunWizardShownInLoop is
already set true on the only startedFromMissingStorage path; remove this entire
unreachable branch and its inner logic, and also remove or simplify any
now-unused references to firstRunWizardShownInLoop (or related cleanup code) so
startup flow no longer contains dead control flow around runFirstRunWizard and
refreshedAfterWizard handling.

4316-4341: ⚠️ Potential issue | 🟠 Major

avoid double opencode probes and add transient fs retry in wizard path.

lib/codex-manager.ts:4336 (Line 4336) probes with assessOpencodeAccountPool() during menu option build, then lib/codex-manager.ts:4374 (Line 4374) probes again when import-opencode is selected. this doubles windows fs contention and still has no bounded retry/backoff for transient EBUSY/sharing-lock failures. it can also hide the import action when the pre-probe throws and hasOpencodeSource stays false.

proposed direction
+async function withTransientFsRetry<T>(op: () => Promise<T>, attempts = 3): Promise<T> {
+  let delayMs = 75;
+  for (let i = 0; i < attempts; i += 1) {
+    try {
+      return await op();
+    } catch (error) {
+      const code = (error as { code?: string })?.code;
+      const transient = code === "EBUSY" || code === "EPERM" || code === "EMFILE";
+      if (!transient || i === attempts - 1) throw error;
+      await new Promise((resolve) => setTimeout(resolve, delayMs));
+      delayMs *= 2;
+    }
+  }
+  throw new Error("unreachable");
+}
- hasOpencodeSource = (await assessOpencodeAccountPool()) !== null;
+ hasOpencodeSource = (await withTransientFsRetry(() => assessOpencodeAccountPool())) !== null;
- assessment = await assessOpencodeAccountPool();
+ assessment = await withTransientFsRetry(() => assessOpencodeAccountPool());

As per coding guidelines, "lib/**: focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails."

Also applies to: 4371-4418

test/codex-manager-cli.test.ts (3)

1241-1249: ⚠️ Potential issue | 🟠 Major

assert redaction against the full logger payload, not only call[0].

at test/codex-manager-cli.test.ts:1241, test/codex-manager-cli.test.ts:1279, test/codex-manager-cli.test.ts:1338, and test/codex-manager-cli.test.ts:4246, checks still miss secondary args and object args. a console.warn("...", err) call can leak full windows paths while these tests pass.

proposed fix
- expect(
-   warnSpy.mock.calls.every((call) => !String(call[0]).includes("alice")),
- ).toBe(true);
+ expect(
+   warnSpy.mock.calls.every(
+     (call) => !flattenMockCallArgs(call).includes("alice"),
+   ),
+ ).toBe(true);

- const warningOutput = warnSpy.mock.calls.flat().join("\n");
+ const warningOutput = warnSpy.mock.calls
+   .map((call) => flattenMockCallArgs(call))
+   .join("\n");

as per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Also applies to: 1279-1280, 1338-1343, 4246-4255

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-manager-cli.test.ts` around lines 1241 - 1249, The assertions
currently inspect only the first argument of warnSpy.mock.calls (call[0]) which
misses secondary args or object args that may leak sensitive data; update the
checks (those using warnSpy.mock.calls.some(...) and .every(...)) to inspect the
entire call payload (e.g., stringify or join all call elements such as using
JSON.stringify(call) or call.map(String).join(" ")) and then assert the
presence/absence of the target text (e.g., "Failed to refresh saved accounts
after first-run action" and that "alice" is not present) against that full
payload; apply the same change to the other occurrences referenced in this test
file so each assertion covers all arguments in warnSpy.mock.calls.

68-74: ⚠️ Potential issue | 🟠 Major

serialize mock args deeply for redaction checks.

String(arg) hides structured payloads as "[object Object]", so path leaks in object fields can bypass checks in test/codex-manager-cli.test.ts:68. this weakens windows redaction coverage.

proposed fix
+function serializeMockArg(arg: unknown): string {
+	if (arg instanceof Error) return `${arg.name}: ${arg.message}`;
+	if (typeof arg === "string") return arg;
+	const seen = new WeakSet<object>();
+	const redactWindowsPath = (value: string) =>
+		value.replace(/[A-Za-z]:\\[^"'\s)]+/g, "[REDACTED_PATH]");
+	try {
+		return JSON.stringify(arg, (_key, value) => {
+			if (typeof value === "string") return redactWindowsPath(value);
+			if (value && typeof value === "object") {
+				if (seen.has(value as object)) return "[Circular]";
+				seen.add(value as object);
+			}
+			return value;
+		});
+	} catch {
+		return String(arg);
+	}
+}
+
 function flattenMockCallArgs(call: unknown[]): string {
 	return call
-		.map((arg) =>
-			arg instanceof Error ? `${arg.name}: ${arg.message}` : String(arg),
-		)
+		.map((arg) => serializeMockArg(arg))
 		.join(" ");
 }

as per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-manager-cli.test.ts` around lines 68 - 74, The helper
flattenMockCallArgs currently uses String(arg) which collapses objects to
"[object Object]" and hides fields; update flattenMockCallArgs to serialize
arguments deeply so object fields are visible for redaction checks: in the
flattenMockCallArgs function, detect Error instances (keep `${err.name}:
${err.message}`) and for other values use a deep serializer (e.g.,
JSON.stringify with a circular-safe replacer or Node's util.inspect) to produce
full object payloads (falling back to String(arg) if serialization fails) so
tests in test/codex-manager-cli.test.ts can assert on nested paths instead of
being bypassed by "[object Object]".

1253-1260: ⚠️ Potential issue | 🟡 Minor

do not mock away the wizard re-render path in this probe-failure regression.

test/codex-manager-cli.test.ts:1253 stubs showFirstRunWizard, so it does not exercise the real first-run render/re-render flow when OpenCode probing fails. that leaves the windows EBUSY wizard-loop behavior under-tested.

as per coding guidelines, test/**: tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/codex-manager-cli.test.ts` around lines 1253 - 1260, The test should not
stub out the real first-run wizard flow: remove the mockResolvedValueOnce stubs
on authMenu.showFirstRunWizard so the actual showFirstRunWizard implementation
runs (or replace the stub with a call-through). Concretely, in the test remove
the .mockResolvedValueOnce(...) calls on the spy for showFirstRunWizard (or
replace them with a call-through wrapper that invokes the original function:
capture const orig = authMenu.showFirstRunWizard and use vi.spyOn(authMenu,
"showFirstRunWizard").mockImplementation((...a) => orig.apply(authMenu, a))).
This ensures the real render/re-render path and EBUSY windows behavior are
exercised while still letting you observe the call.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/sync-history.ts`:
- Around line 293-309: The production write paths can fail on transient Windows
file-lock errors; wrap the append/rewrite/trim write operations in bounded
retry+exponential backoff (same behavior as __resetSyncHistoryForTests): up to 5
attempts with delay 25 * 2^attempt ms, and only retry for transient error codes
(EBUSY/EPERM). Apply this retry logic around the fs.appendFile call in
appendSyncHistoryEntry, around the file-writes inside trimHistoryFileIfNeeded,
and around rewriteLatestEntry so each physical write is retried before failing.
Add a Vitest that stubs/mock fs to throw an EBUSY/EPERM on the first N-1
attempts and succeed on the final attempt to verify recovery.

---

Duplicate comments:
In `@lib/codex-manager.ts`:
- Around line 4850-4878: The if-block guarded by startedFromMissingStorage &&
!firstRunWizardShownInLoop && existingCount === 0 &&
isInteractiveLoginMenuAvailable() (the block that calls
loadDashboardDisplaySettings, runFirstRunWizard, and attempts
refreshedAfterWizard) is unreachable because firstRunWizardShownInLoop is
already set true on the only startedFromMissingStorage path; remove this entire
unreachable branch and its inner logic, and also remove or simplify any
now-unused references to firstRunWizardShownInLoop (or related cleanup code) so
startup flow no longer contains dead control flow around runFirstRunWizard and
refreshedAfterWizard handling.

In `@lib/sync-history.ts`:
- Around line 290-308: The code uses historyEntryCountEstimate (warmed via
loadHistoryEntriesFromDisk) and then decides shouldTrim without revalidating
on-disk state; change the logic around the append+trim decision (the block that
writes serializeEntry(entry) to paths.historyPath and computes shouldTrim) to
revalidate the file before gating trim by either checking the file mtime/size or
performing a bounded reread of the file tail to update historyEntryCountEstimate
(or compute a safe upper bound) and only then call
trimHistoryFileIfNeeded(paths); also ensure trimHistoryFileIfNeeded still works
correctly when called concurrently and keep loadHistoryEntriesFromDisk and
serializeEntry usage intact. Add a vitest that warms the estimate, performs an
out-of-band append to paths.historyPath (simulating another process), then runs
the code path that triggers trimming to confirm trimming executes
deterministically; include the new test alongside existing sync-history tests
and handle EBUSY-like file contention in the test setup without leaking
tokens/emails in logs.

In `@test/codex-manager-cli.test.ts`:
- Around line 1241-1249: The assertions currently inspect only the first
argument of warnSpy.mock.calls (call[0]) which misses secondary args or object
args that may leak sensitive data; update the checks (those using
warnSpy.mock.calls.some(...) and .every(...)) to inspect the entire call payload
(e.g., stringify or join all call elements such as using JSON.stringify(call) or
call.map(String).join(" ")) and then assert the presence/absence of the target
text (e.g., "Failed to refresh saved accounts after first-run action" and that
"alice" is not present) against that full payload; apply the same change to the
other occurrences referenced in this test file so each assertion covers all
arguments in warnSpy.mock.calls.
- Around line 68-74: The helper flattenMockCallArgs currently uses String(arg)
which collapses objects to "[object Object]" and hides fields; update
flattenMockCallArgs to serialize arguments deeply so object fields are visible
for redaction checks: in the flattenMockCallArgs function, detect Error
instances (keep `${err.name}: ${err.message}`) and for other values use a deep
serializer (e.g., JSON.stringify with a circular-safe replacer or Node's
util.inspect) to produce full object payloads (falling back to String(arg) if
serialization fails) so tests in test/codex-manager-cli.test.ts can assert on
nested paths instead of being bypassed by "[object Object]".
- Around line 1253-1260: The test should not stub out the real first-run wizard
flow: remove the mockResolvedValueOnce stubs on authMenu.showFirstRunWizard so
the actual showFirstRunWizard implementation runs (or replace the stub with a
call-through). Concretely, in the test remove the .mockResolvedValueOnce(...)
calls on the spy for showFirstRunWizard (or replace them with a call-through
wrapper that invokes the original function: capture const orig =
authMenu.showFirstRunWizard and use vi.spyOn(authMenu,
"showFirstRunWizard").mockImplementation((...a) => orig.apply(authMenu, a))).
This ensures the real render/re-render path and EBUSY windows behavior are
exercised while still letting you observe the call.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: f5845167-2d5c-4391-8861-15a986edc805

📥 Commits

Reviewing files that changed from the base of the PR and between ebb527f and 42a9197.

📒 Files selected for processing (4)
  • lib/codex-manager.ts
  • lib/sync-history.ts
  • test/codex-manager-cli.test.ts
  • test/sync-history.test.ts
📜 Review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Greptile Review
🧰 Additional context used
📓 Path-based instructions (2)
test/**

⚙️ CodeRabbit configuration file

tests must stay deterministic and use vitest. demand regression cases that reproduce concurrency bugs, token refresh races, and windows filesystem behavior. reject changes that mock real secrets or skip assertions.

Files:

  • test/sync-history.test.ts
  • test/codex-manager-cli.test.ts
lib/**

⚙️ CodeRabbit configuration file

focus on auth rotation, windows filesystem IO, and concurrency. verify every change cites affected tests (vitest) and that new queues handle EBUSY/429 scenarios. check for logging that leaks tokens or emails.

Files:

  • lib/codex-manager.ts
  • lib/sync-history.ts
🔇 Additional comments (3)
test/sync-history.test.ts (1)

271-371: good deterministic coverage for estimate lifecycle paths.

test/sync-history.test.ts:271-371 is solid vitest coverage for the new append-estimate behavior and reset semantics.

lib/codex-manager.ts (2)

1286-1286: good redaction hardening on failure output.

lib/codex-manager.ts:1286 (Line 1286) and lib/codex-manager.ts:4679 (Line 4679) now normalize and redact filesystem error details before surfacing them. this is the right direction for safe cli diagnostics.

Also applies to: 4679-4681


4466-4497: tests exist for first-run wizard gating and EBUSY transient fallback, but verify via behavior rather than direct variable assertions.

tests at test/codex-manager-cli.test.ts:930 and test/codex-manager-cli.test.ts:1253 verify that the first-run wizard is shown once when starting from missing storage and remains open when OpenCode import probing hits EBUSY (wizard called twice, second showing after failure). test at test/codex-manager-cli.test.ts:1448 verifies the recovery menu appears after first-run wizard creates accounts.

the implementation at lib/codex-manager.ts:4466-4497 uses skipEmptyStorageRecoveryMenu and firstRunWizardShownInLoop to control these flows, and existing tests validate the behavior end-to-end through mock sequencing (e.g., wizardSpy.toHaveBeenCalledTimes(2) after EBUSY) rather than asserting on variable state directly. this approach covers the transient EBUSY scenarios referenced in the coding guidelines and verifies recovery paths remain unbroken.

@ndycode ndycode changed the base branch from fresh/18-import-adapter-opencode to dev March 18, 2026 06:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants