Skip to content

fix(capture): exclude claude -p from only-cli gate; stop stranding placeholders#254

Merged
efenocchi merged 2 commits into
mainfrom
fix/only-cli-gate-and-worker-orphan
Jun 10, 2026
Merged

fix(capture): exclude claude -p from only-cli gate; stop stranding placeholders#254
efenocchi merged 2 commits into
mainfrom
fix/only-cli-gate-and-worker-orphan

Conversation

@efenocchi

@efenocchi efenocchi commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

Problem

Reported by Ivo/Sasun: in the team memory table the majority of claude_code sessions are stuck in the in progress state with no summary. Two independent defects, both reproduced locally:

A. The HIVEMIND_CAPTURE_ONLY_CLI gate let claude -p through

The gate used entrypoint.includes("cli"). Headless claude -p reports CLAUDE_CODE_ENTRYPOINT="sdk-cli", which contains "cli", so print-mode sessions slipped past the gate and created capture rows even with the flag on. (Confirmed with Sasun's repro script — BUG: claude -p wrote a placeholder despite HIVEMIND_CAPTURE_ONLY_CLI=true.)

B. The wiki worker stranded placeholders under concurrency

SessionStart writes an in progress placeholder; the detached wiki worker is supposed to replace it with a real summary. But the worker exited immediately when the sessions table returned zero events. The capture hooks write events asynchronously, and Deeplake reads are eventually-consistent — under concurrency (many short SDK / claude -p sessions ending at once) the events lag behind SessionEnd, so the worker read 0 events and bailed, leaving the placeholder stuck at in progress forever (no retry, no cleanup). Reproduced with 10 concurrent Agents-SDK sessions; the "stuck" ones logged no session events found while their events were demonstrably present moments later.

(This is also why codex was immune: codex finalizes inline in its always-firing Stop hook, with no detached worker racing the async writes.)

Fix

  • Gate (capture-gate.ts): entrypoint.includes("cli")entrypoint === "cli". Only the interactive terminal passes; sdk-py / sdk-ts / sdk-cli (claude -p) are excluded.
  • Worker (wiki-worker.ts): on zero events, retry with backoff (HIVEMIND_WIKI_EVENT_RETRIES / HIVEMIND_WIKI_EVENT_BACKOFF_MS, defaults 5 × 1.5s); if events still never arrive, delete the orphan placeholder — guarded on description = 'in progress' so a concurrent real summary is never clobbered.

No change to the happy path; normal interactive sessions are still captured and summarized.

Use cases covered

Gate — interactive cli + flag → captured · claude -p/sdk-cli + flag → excluded · sdk-py/sdk-ts + flag → excluded · flag off → all captured · substring-but-not-cli → excluded.

Worker — events present → finalizes (no regression) · events lag (race) → retry recovers → summary · genuinely 0 events → orphan removed · real summary + 0 events → preserved (guard, no data-loss) · resume of an orphan-deleted session → work → close → correct summary recreated, pointing at the original session jsonl.

Testing

  • Unit: capture-gate.test.ts updated to exact-match + explicit sdk-cli regression; wiki-worker.test.ts updated for retry/cleanup + new retry-recovers test. 27/27 green (remaining repo failures are pre-existing, env-only: missing tree-sitter-* native deps, CLI TTY mocks — verified identical on clean origin/main).
  • E2E vs Deeplake (*_test tables): gate matrix 4/4 (driving the real session-start.js), worker orphan-removal + guard-preserves-real-summary, and a faithful resume E2E via the Agents SDK — deleted the orphan, resumed with a real change, and confirmed a correct summary was recreated under the original session path with the right Source: pointer.

Notes / out of scope

  • Sasun's comment says claude -p reports entrypoint "cli"; on claude 2.1.170 I measured "sdk-cli" (both contain "cli", so the substring bug fires either way). If a runtime ever reports a bare "cli" for -p, exact-match won't distinguish it and a print-mode signal would be needed — worth a double-check on Sasun's version.
  • A separate, pre-existing issue (not addressed here): under read-lag, uploadSummary can SELECT path → miss the not-yet-visible placeholder → INSERT a second row, leaving a duplicate in progress + summary pair. Follow-up.

Session Context

Session transcript

🤖 Generated with Claude Code

Summary by CodeRabbit

  • Bug Fixes
    • Enforced exact CLI-mode detection to prevent misclassifying and capturing non-CLI sessions.
    • Added configurable retry/backoff for session event retrieval to increase reliability.
    • Ensured orphan session placeholders are cleaned up when events never appear to avoid stuck sessions.
  • Tests
    • Expanded tests to cover strict CLI detection, retry behavior, and orphan-placeholder cleanup.

…placeholders

Gate: HIVEMIND_CAPTURE_ONLY_CLI used `entrypoint.includes("cli")`, which let
headless `claude -p` (CLAUDE_CODE_ENTRYPOINT="sdk-cli") slip through and create
capture rows even with the gate on. Tighten to exact `=== "cli"` so only the
interactive terminal passes; sdk-py / sdk-ts / sdk-cli are excluded.

Worker: the wiki worker exited immediately when the sessions table returned
zero events, leaving the SessionStart placeholder stranded at "in progress"
forever. Under concurrency the async capture writes lag behind SessionEnd, so
this stranded a large fraction of short sessions. Retry the event fetch with
backoff (HIVEMIND_WIKI_EVENT_RETRIES / _BACKOFF_MS), and if events still never
arrive, delete the orphan placeholder — guarded on description='in progress'
so a concurrent real summary is never clobbered.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@coderabbitai

coderabbitai Bot commented Jun 10, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

This PR enforces strict CLAUDE_CODE_ENTRYPOINT === "cli" in the capture gate and adds configurable retry/backoff plus orphan-placeholder cleanup to the wiki-worker when session events are initially unavailable.

Changes

Session Capture Improvements

Layer / File(s) Summary
Capture gate strict entrypoint check
src/hooks/shared/capture-gate.ts, tests/shared/capture-gate.test.ts
Gate now requires CLAUDE_CODE_ENTRYPOINT === "cli" instead of substring containment; documentation and tests updated to block sdk-cli, claude -p, and other non-exact variants.
Wiki worker event fetch resilience with retry and cleanup
src/hooks/wiki-worker.ts, tests/claude-code/wiki-worker.test.ts
Added env-driven retry/backoff and parseNonNegativeInt/sleep helpers for session-event fetching; worker retries empty queries with linear backoff and, after retries are exhausted, attempts a guarded DELETE of orphan description = 'in progress' memory rows. Tests expanded to clean env/mocks and cover placeholder cleanup, null responses, non-numeric-retries fallback, and race/retry recovery.

Sequence Diagram

sequenceDiagram
  participant Worker as wiki-worker
  participant Sessions as sessions_table
  participant Memory as memory_table

  Worker->>Sessions: SELECT events for session
  alt Events found
    Worker->>Worker: Process events
    Worker->>Memory: Upload summary and finalize
  else No events on initial query
    loop Retries remaining
      Worker->>Worker: Sleep(backoff)
      Worker->>Sessions: Retry SELECT events
    end
    alt Events found during retries
      Worker->>Worker: Process events
      Worker->>Memory: Upload summary and finalize
    else Still no events after max retries
      Worker->>Memory: DELETE placeholder where description='in progress' and path=summaryPath
      Worker->>Worker: Release lock and exit
    end
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • activeloopai/hivemind#210: Both PRs touch HIVEMIND_CAPTURE_ONLY_CLI gate behavior; that PR added substring-based short-circuiting while this PR tightens the gate to exact "cli" equality.

Suggested reviewers

  • kaghni

Poem

🐇 I thumped at the gate: “Only cli may pass!”
No sneaky substrings slip through the grass.
When events play hide-and-seek in the night,
I retry with patience, backoff just right.
If none arrive, I tidy the trail with care—
no orphan crumbs left anywhere.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed Title clearly summarizes the two main fixes: excluding claude -p from the only-cli gate and stopping orphaned placeholders in the worker.
Description check ✅ Passed Description comprehensively covers the problem statement, fixes, use cases, testing approach, and relevant notes; all required template sections are present and well-populated.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch fix/only-cli-gate-and-worker-orphan

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions

github-actions Bot commented Jun 10, 2026

Copy link
Copy Markdown
Contributor

Coverage Report

Scope: files changed in this PR. Enforced threshold: 90% per metric (per file via vitest.config.ts).

Status Category Percentage Covered / Total
🟢 Lines 97.00% (🎯 90%) 97 / 100
🟢 Statements 97.30% (🎯 90%) 108 / 111
🟢 Functions 100.00% (🎯 90%) 15 / 15
🟢 Branches 96.15% (🎯 90%) 50 / 52
File Coverage — 2 files changed
File Stmts Branches Functions Lines
src/hooks/shared/capture-gate.ts 🟢 100.0% 🟢 100.0% 🟢 100.0% 🟢 100.0%
src/hooks/wiki-worker.ts 🟢 97.2% 🟢 95.7% 🟢 100.0% 🟢 96.9%

Generated for commit 5ad7e9d.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/hooks/wiki-worker.ts`:
- Around line 67-68: The constants EVENT_FETCH_RETRIES and
EVENT_FETCH_BACKOFF_MS are parsed with Number(...) and can become NaN or
negative if env vars are non-numeric, silently disabling retries; change their
initialization to robust parsing: parseInt/Number the env var, validate with
Number.isFinite and ensure retries is a non-negative integer (Math.max(0,
Math.floor(value))) and backoff is a positive integer (Math.max(1,
Math.floor(value))), falling back to the current defaults when invalid, and emit
a warning via the existing logger when a fallback occurs so misconfigured env
values are visible; update the declarations for EVENT_FETCH_RETRIES and
EVENT_FETCH_BACKOFF_MS accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 5ca02bbc-3738-49de-9428-06cdb84dc01a

📥 Commits

Reviewing files that changed from the base of the PR and between 456befd and ea25986.

📒 Files selected for processing (4)
  • src/hooks/shared/capture-gate.ts
  • src/hooks/wiki-worker.ts
  • tests/claude-code/wiki-worker.test.ts
  • tests/shared/capture-gate.test.ts

Comment thread src/hooks/wiki-worker.ts Outdated
Addresses CodeRabbit review on PR #254. A non-numeric HIVEMIND_WIKI_EVENT_RETRIES
made Number(...) return NaN, so `attempt <= NaN` was always false and retries
were silently disabled — reintroducing placeholder orphaning under load. Parse
via a parseNonNegativeInt helper that falls back to the defaults (5 / 1500ms)
for missing/non-numeric/negative values. Adds a regression test asserting a
garbage env value still retries.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@efenocchi

Copy link
Copy Markdown
Collaborator Author

Addressed in 6df6ed8: added a parseNonNegativeInt helper so a non-numeric / negative HIVEMIND_WIKI_EVENT_RETRIES or _BACKOFF_MS falls back to the defaults (5 / 1500ms) instead of becoming NaN and silently disabling retries. Added a regression test (falls back to default retries when the env var is non-numeric). Good catch — that NaN path would have re-stranded placeholders.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/claude-code/wiki-worker.test.ts (1)

168-180: ⚡ Quick win

Make the log assertion more specific to verify the default retry count.

The test correctly prevents the NaN regression, but the assertion on line 179 checks only for the generic substring "no events yet — retry". Per the coding guideline for tests, prefer asserting on specific values. Since the test's purpose is to verify the fallback to the default of 5 retries (not NaN), assert on a more specific pattern that includes the retry count.

✅ Proposed fix: assert on the specific retry count pattern
     const log = readFileSync(join(hooksDir, "wiki.log"), "utf-8");
     // It must have retried (fallback to the default of 5), not bailed immediately.
-    expect(log).toContain("no events yet — retry");
+    expect(log).toMatch(/no events yet — retry \d+\/5 in \d+ms/);

This verifies both that retries occurred and that the default of 5 was used.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/claude-code/wiki-worker.test.ts` around lines 168 - 180, Update the
test assertion so it verifies the default retry count (5) rather than a generic
message: after calling runWorker() and reading the log (readFileSync(...)),
assert that the log contains a specific retry-with-count pattern such as "no
events yet — retry 1/5" (or use a regex that matches "no events yet — retry
\\d+/5") instead of just checking "no events yet — retry"; this change should be
applied in the test that sets process.env.HIVEMIND_WIKI_EVENT_RETRIES =
'not-a-number' and uses runWorker(), fetchMock, and expect(log).

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@tests/claude-code/wiki-worker.test.ts`:
- Around line 168-180: Update the test assertion so it verifies the default
retry count (5) rather than a generic message: after calling runWorker() and
reading the log (readFileSync(...)), assert that the log contains a specific
retry-with-count pattern such as "no events yet — retry 1/5" (or use a regex
that matches "no events yet — retry \\d+/5") instead of just checking "no events
yet — retry"; this change should be applied in the test that sets
process.env.HIVEMIND_WIKI_EVENT_RETRIES = 'not-a-number' and uses runWorker(),
fetchMock, and expect(log).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 0d601202-8213-4ea4-8eee-6399b6d84898

📥 Commits

Reviewing files that changed from the base of the PR and between ea25986 and 6df6ed8.

📒 Files selected for processing (2)
  • src/hooks/wiki-worker.ts
  • tests/claude-code/wiki-worker.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/hooks/wiki-worker.ts

@efenocchi efenocchi merged commit 4253da0 into main Jun 10, 2026
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants