fix(capture): exclude claude -p from only-cli gate; stop stranding placeholders#254
Conversation
…placeholders
Gate: HIVEMIND_CAPTURE_ONLY_CLI used `entrypoint.includes("cli")`, which let
headless `claude -p` (CLAUDE_CODE_ENTRYPOINT="sdk-cli") slip through and create
capture rows even with the gate on. Tighten to exact `=== "cli"` so only the
interactive terminal passes; sdk-py / sdk-ts / sdk-cli are excluded.
Worker: the wiki worker exited immediately when the sessions table returned
zero events, leaving the SessionStart placeholder stranded at "in progress"
forever. Under concurrency the async capture writes lag behind SessionEnd, so
this stranded a large fraction of short sessions. Retry the event fetch with
backoff (HIVEMIND_WIKI_EVENT_RETRIES / _BACKOFF_MS), and if events still never
arrive, delete the orphan placeholder — guarded on description='in progress'
so a concurrent real summary is never clobbered.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR enforces strict ChangesSession Capture Improvements
Sequence DiagramsequenceDiagram
participant Worker as wiki-worker
participant Sessions as sessions_table
participant Memory as memory_table
Worker->>Sessions: SELECT events for session
alt Events found
Worker->>Worker: Process events
Worker->>Memory: Upload summary and finalize
else No events on initial query
loop Retries remaining
Worker->>Worker: Sleep(backoff)
Worker->>Sessions: Retry SELECT events
end
alt Events found during retries
Worker->>Worker: Process events
Worker->>Memory: Upload summary and finalize
else Still no events after max retries
Worker->>Memory: DELETE placeholder where description='in progress' and path=summaryPath
Worker->>Worker: Release lock and exit
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Coverage ReportScope: files changed in this PR. Enforced threshold: 90% per metric (per file via
File Coverage — 2 files changed
Generated for commit 5ad7e9d. |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/hooks/wiki-worker.ts`:
- Around line 67-68: The constants EVENT_FETCH_RETRIES and
EVENT_FETCH_BACKOFF_MS are parsed with Number(...) and can become NaN or
negative if env vars are non-numeric, silently disabling retries; change their
initialization to robust parsing: parseInt/Number the env var, validate with
Number.isFinite and ensure retries is a non-negative integer (Math.max(0,
Math.floor(value))) and backoff is a positive integer (Math.max(1,
Math.floor(value))), falling back to the current defaults when invalid, and emit
a warning via the existing logger when a fallback occurs so misconfigured env
values are visible; update the declarations for EVENT_FETCH_RETRIES and
EVENT_FETCH_BACKOFF_MS accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 5ca02bbc-3738-49de-9428-06cdb84dc01a
📒 Files selected for processing (4)
src/hooks/shared/capture-gate.tssrc/hooks/wiki-worker.tstests/claude-code/wiki-worker.test.tstests/shared/capture-gate.test.ts
Addresses CodeRabbit review on PR #254. A non-numeric HIVEMIND_WIKI_EVENT_RETRIES made Number(...) return NaN, so `attempt <= NaN` was always false and retries were silently disabled — reintroducing placeholder orphaning under load. Parse via a parseNonNegativeInt helper that falls back to the defaults (5 / 1500ms) for missing/non-numeric/negative values. Adds a regression test asserting a garbage env value still retries. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
|
Addressed in 6df6ed8: added a |
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/claude-code/wiki-worker.test.ts (1)
168-180: ⚡ Quick winMake the log assertion more specific to verify the default retry count.
The test correctly prevents the NaN regression, but the assertion on line 179 checks only for the generic substring
"no events yet — retry". Per the coding guideline for tests, prefer asserting on specific values. Since the test's purpose is to verify the fallback to the default of 5 retries (not NaN), assert on a more specific pattern that includes the retry count.✅ Proposed fix: assert on the specific retry count pattern
const log = readFileSync(join(hooksDir, "wiki.log"), "utf-8"); // It must have retried (fallback to the default of 5), not bailed immediately. - expect(log).toContain("no events yet — retry"); + expect(log).toMatch(/no events yet — retry \d+\/5 in \d+ms/);This verifies both that retries occurred and that the default of 5 was used.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/claude-code/wiki-worker.test.ts` around lines 168 - 180, Update the test assertion so it verifies the default retry count (5) rather than a generic message: after calling runWorker() and reading the log (readFileSync(...)), assert that the log contains a specific retry-with-count pattern such as "no events yet — retry 1/5" (or use a regex that matches "no events yet — retry \\d+/5") instead of just checking "no events yet — retry"; this change should be applied in the test that sets process.env.HIVEMIND_WIKI_EVENT_RETRIES = 'not-a-number' and uses runWorker(), fetchMock, and expect(log).Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@tests/claude-code/wiki-worker.test.ts`:
- Around line 168-180: Update the test assertion so it verifies the default
retry count (5) rather than a generic message: after calling runWorker() and
reading the log (readFileSync(...)), assert that the log contains a specific
retry-with-count pattern such as "no events yet — retry 1/5" (or use a regex
that matches "no events yet — retry \\d+/5") instead of just checking "no events
yet — retry"; this change should be applied in the test that sets
process.env.HIVEMIND_WIKI_EVENT_RETRIES = 'not-a-number' and uses runWorker(),
fetchMock, and expect(log).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro Plus
Run ID: 0d601202-8213-4ea4-8eee-6399b6d84898
📒 Files selected for processing (2)
src/hooks/wiki-worker.tstests/claude-code/wiki-worker.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
- src/hooks/wiki-worker.ts
Problem
Reported by Ivo/Sasun: in the team memory table the majority of
claude_codesessions are stuck in thein progressstate with no summary. Two independent defects, both reproduced locally:A. The
HIVEMIND_CAPTURE_ONLY_CLIgate letclaude -pthroughThe gate used
entrypoint.includes("cli"). Headlessclaude -preportsCLAUDE_CODE_ENTRYPOINT="sdk-cli", which contains"cli", so print-mode sessions slipped past the gate and created capture rows even with the flag on. (Confirmed with Sasun's repro script —BUG: claude -p wrote a placeholder despite HIVEMIND_CAPTURE_ONLY_CLI=true.)B. The wiki worker stranded placeholders under concurrency
SessionStartwrites anin progressplaceholder; the detached wiki worker is supposed to replace it with a real summary. But the worker exited immediately when the sessions table returned zero events. The capture hooks write events asynchronously, and Deeplake reads are eventually-consistent — under concurrency (many short SDK /claude -psessions ending at once) the events lag behindSessionEnd, so the worker read 0 events and bailed, leaving the placeholder stuck atin progressforever (no retry, no cleanup). Reproduced with 10 concurrent Agents-SDK sessions; the "stuck" ones loggedno session events foundwhile their events were demonstrably present moments later.(This is also why codex was immune: codex finalizes inline in its always-firing
Stophook, with no detached worker racing the async writes.)Fix
capture-gate.ts):entrypoint.includes("cli")→entrypoint === "cli". Only the interactive terminal passes;sdk-py/sdk-ts/sdk-cli(claude -p) are excluded.wiki-worker.ts): on zero events, retry with backoff (HIVEMIND_WIKI_EVENT_RETRIES/HIVEMIND_WIKI_EVENT_BACKOFF_MS, defaults 5 × 1.5s); if events still never arrive, delete the orphan placeholder — guarded ondescription = 'in progress'so a concurrent real summary is never clobbered.No change to the happy path; normal interactive sessions are still captured and summarized.
Use cases covered
Gate — interactive
cli+ flag → captured ·claude -p/sdk-cli+ flag → excluded ·sdk-py/sdk-ts+ flag → excluded · flag off → all captured · substring-but-not-cli→ excluded.Worker — events present → finalizes (no regression) · events lag (race) → retry recovers → summary · genuinely 0 events → orphan removed · real summary + 0 events → preserved (guard, no data-loss) · resume of an orphan-deleted session → work → close → correct summary recreated, pointing at the original session jsonl.
Testing
capture-gate.test.tsupdated to exact-match + explicitsdk-cliregression;wiki-worker.test.tsupdated for retry/cleanup + new retry-recovers test. 27/27 green (remaining repo failures are pre-existing, env-only: missingtree-sitter-*native deps, CLI TTY mocks — verified identical on cleanorigin/main).*_testtables): gate matrix 4/4 (driving the realsession-start.js), worker orphan-removal + guard-preserves-real-summary, and a faithful resume E2E via the Agents SDK — deleted the orphan, resumed with a real change, and confirmed a correct summary was recreated under the original session path with the rightSource:pointer.Notes / out of scope
claude -preports entrypoint"cli"; on claude 2.1.170 I measured"sdk-cli"(both contain"cli", so the substring bug fires either way). If a runtime ever reports a bare"cli"for-p, exact-match won't distinguish it and a print-mode signal would be needed — worth a double-check on Sasun's version.uploadSummarycanSELECT path→ miss the not-yet-visible placeholder →INSERTa second row, leaving a duplicatein progress+ summary pair. Follow-up.Session Context
Session transcript
🤖 Generated with Claude Code
Summary by CodeRabbit