fix: queue-based working memory capture (#179)#180
Conversation
…extraction (#179) Transcript extraction was unreliable (most entries are tool_use with no text). Switch to hook-based capture using inputs available at hook execution time: - preamble (UserPromptSubmit) captures user prompt to .pending-turns.jsonl queue - stop-update-memory captures assistant_message (end_turn only) to same queue, then spawns throttled background updater (capture decoupled from throttle) - background-memory-update reads queue via atomic mv handoff, pairs user/assistant turns, caps at 10 most recent turns, with crash recovery via .processing file Key improvements: - No sleep 3 transcript flush wait - No transcript path computation - Queue overflow safety (>200 lines → keep last 100) - stop_reason filter (tool_use stops skipped) - Inline jq/node JSONL construction (avoids bash 4+ array syntax in json_construct) - 5 new behavior tests covering capture, skip, and JSONL format validation Co-Authored-By: Claude <noreply@anthropic.com>
- Cap .processing file at 200 lines after crash recovery merge to prevent unbounded growth when claude -p Write failures persist - Clean up .processing on claude exit 0 even if Write was not invoked (retrying same turns is futile; new turns arrive via queue) - Use json_field helper in turn parsing loop (Simplifier) - Deduplicate HOOKS_DIR in tests (Simplifier)
🔍 Code Review: Consolidated High-Confidence Findings (≥80%)Overview9 blocking/should-fix issues identified across 4 files. Multiple reviewers flagged the same concerns, indicating consistent patterns. Below, I've deduped and consolidated findings by location. 🔴 BLOCKING Issues (Fix Before Merge)1. scripts/hooks/preamble — Stale Header Comment (95% confidence)Line 5 The file header still claims: # Zero file I/O beyond stdin — static injection only.But this PR adds file I/O at lines 23–39 (appends to Fix: # Injects a detection-only preamble. Classification rules only — skill mappings live in devflow:router.
# Also captures user prompts to .memory/.pending-turns.jsonl for working memory.2. scripts/hooks/preamble & stop-update-memory — Raw JSON Construction Instead of Helper (85% confidence)Preamble lines 31–36 and stop-update-memory lines 74–79 Both scripts construct JSON objects using raw Problem: Duplicated abstraction layer. If JSON construction changes are needed, multiple locations must be updated in sync. Fix: # In preamble (lines 31-36):
json_construct --arg role "user" --arg content "$_TRUNCATED_PROMPT" --argjson ts "$_TS" >> "$CWD/.memory/.pending-turns.jsonl"
# In stop-update-memory (lines 74-79):
json_construct --arg role "assistant" --arg content "$ASSISTANT_MSG" --argjson ts "$TS" >> "$QUEUE_FILE"3. scripts/hooks/stop-update-memory — Mtime Logic Duplicated (92% confidence)Lines 96–100 The stop hook inlines the Linux/macOS Problem: Two competing approaches for the identical operation. If stat detection needs a fix (new platform), both locations must be updated in sync. Fix: Extract 4. scripts/hooks/stop-update-memory — Queue Overflow Logic Duplicated (85% confidence)Lines 82–89 (also in background-memory-update lines 107–111) The overflow safety pattern (check >200 lines, truncate to 100 via tail/mv) appears in both scripts with identical thresholds and idioms ( Problem: DRY violation. If thresholds need tuning, two locations must be updated. Fix: Extract a shared 5. scripts/hooks/background-memory-update — Per-Line Subprocess Spawning (95% confidence)Lines 148–181 The turn-parsing while-read loop calls Problem: This is the exact pattern documented in PF-006 (per-line jq spawning adds 1–3s latency). Reintroducing it adds measurable latency and extends lock hold time. Fix: Use single-pass if [ "$_HAS_JQ" = "true" ]; then
PARSED=$(echo "$ENTRIES" | jq -r '[.role, .content] | @tsv')
else
PARSED=$(echo "$ENTRIES" | node -e "d='';process.stdin.on('data',c=>d+=c);process.stdin.on('end',()=>{d.trim().split('\n').filter(Boolean).forEach(l=>{try{const o=JSON.parse(l);console.log(o.role+'\t'+o.content)}catch(e){}})})")
fi
CURRENT_USER=""
while IFS=$'\t' read -r ROLE CONTENT; do
# ... same pairing logic, but ROLE/CONTENT already extracted
done <<< "$PARSED"6. tests/shell-hooks.test.ts — Shell Injection Risk in execSync (82% confidence)Multiple test blocks: lines 772, 791, 816, 840 The test pattern Problem: While tests have controlled inputs today, the pattern is fragile. Test data evolution could cause unexpected failures. Fix: Use safer execSync(`bash "${STOP_HOOK}"`, {
input: JSON.stringify({ cwd: tmpDir, stop_reason: 'end_turn', assistant_message: 'test' }),
stdio: ['pipe', 'pipe', 'pipe'],
});🟡 SHOULD-FIX Issues (Gaps in Coverage/Documentation)7. tests/shell-hooks.test.ts — Missing Content Array Format Test (85% confidence)The stop hook handles two Fix: Add test for content array: it('stop_reason end_turn with content array — extracts text blocks', () => {
const input = JSON.stringify({
cwd: tmpDir,
session_id: 'test-session',
stop_reason: 'end_turn',
assistant_message: [
{ type: 'text', text: 'First' },
{ type: 'tool_result', content: 'ignored' },
{ type: 'text', text: 'Second' },
],
});
// ... run hook and verify both text blocks in queue
});8. tests/shell-hooks.test.ts — Missing Queue Overflow Truncation Test (82% confidence)The stop hook implements queue overflow safety (>200 lines → truncate to 100). This has zero test coverage. Bugs in overflow handling would go undetected. Fix: Pre-populate queue with 201 lines, trigger stop hook, verify truncation to ≤100 lines. 9. CLAUDE.md — Hook Count and Description Outdated (82–85% confidence)Line 41: Says "Three shell-script hooks" but now involves four:
Line 108: Says WORKING-MEMORY.md is "overwritten each session" — now queue-based, so it accumulates turns across multiple stops. Fix:
📊 Summary
Key Pattern: DRY violations (json_construct, mtime, queue overflow logic) + PF-006 reintroduction (subprocess spawning) + incomplete test coverage. Multiple reviewers flagged the same issues, indicating consistent concerns. Recommendation: Address the 6 blocking issues before merge. They involve straightforward refactoring (extract helpers, replace duplicates) and test additions. The PF-006 pattern is the most complex fix but unlocks the biggest latency win. Review compiled from 9 reviewer reports. Inline comments can be created on specific files if needed. |
- Fix stale header comment in preamble claiming zero file I/O (now accurately describes .pending-turns.jsonl capture behavior) - Guard $CWD with [ ! -d ] check in both hooks so a non-existent path from a deleted/unmounted project exits cleanly (preamble:17, stop:24) - Add -- separator before positional args in node fallback invocations to prevent content starting with -- being misinterpreted as node flags (preamble:36, stop:88) - Extract get_mtime() function in stop-update-memory to eliminate inlined stat -c/%m duplication; mirrors the identical function in background-memory-update Co-Authored-By: Claude <noreply@anthropic.com>
… spawning Two fixes in background-memory-update: 1. CWD existence guard (security/robustness): validate $CWD from args before constructing any paths, exiting early with a clear error message if the directory does not exist. 2. Single-pass JSONL extraction (performance, PF-006): replace the per-line json_field calls inside the while-read loop — which spawned 2 jq/node subprocesses per JSONL entry (up to 40 total) — with a single jq/node invocation that extracts role+content as TSV in one pass. The loop now reads pre-extracted TSV rows with IFS=$'\t', spawning zero subprocesses per iteration. Co-Authored-By: Claude <noreply@anthropic.com>
…e injection
- Add test: stop hook handles assistant_message as content array, joining
text blocks with newline and excluding tool_use blocks
- Add test: queue overflow truncates to last 100 lines when >200 entries
- Fix: replace echo-pipe shell injection pattern in all 4 working memory
queue behavior tests with execSync({ input, stdio: ['pipe','pipe','pipe'] })
- docs: update CLAUDE.md hook count from Three to Four (stop, session-start,
pre-compact, preamble) and fix WORKING-MEMORY.md description to reflect
queue-based batch update model
Co-Authored-By: Claude <noreply@anthropic.com>
…mble to pure ambient Creates scripts/hooks/prompt-capture-memory (UserPromptSubmit) that captures user prompts to .memory/.pending-turns.jsonl. Reverts preamble to zero-file-I/O ambient-only injection — prompt capture no longer mixed into classification path. Co-Authored-By: Claude <noreply@anthropic.com>
…nt, queue cleanup on disable MEMORY_HOOK_CONFIG gains UserPromptSubmit → prompt-capture-memory as first entry. hasMemoryHooks uses Object.keys(MEMORY_HOOK_CONFIG).length (4) instead of hardcoded 3. devflow memory --disable now removes queue files (.pending-turns.jsonl + .pending-turns.processing). devflow init with memory=false also cleans up queue files. Co-Authored-By: Claude <noreply@anthropic.com>
…ture-memory, preamble separation memory.test.ts: update all hook counts 3→4, add UserPromptSubmit assertions, add upgrade test (3→4 hooks), add removeMemoryHooks preamble preservation test, add toggle cycle test, add queue file cleanup tests (delete files, safe when missing, safe without .memory/). shell-hooks.test.ts: add prompt-capture-memory to HOOK_SCRIPTS syntax check, replace preamble capture tests with prompt-capture-memory equivalents, add preamble-no-longer-writes test and slash-command-no-queue-write test. Co-Authored-By: Claude <noreply@anthropic.com>
…, add threshold comments file-organization.md: four hooks (not three), add prompt-capture-memory to hook table and source tree, update flow description to show queue-based capture path, document disable cleanup. CLAUDE.md: update Working Memory paragraph — prompt-capture-memory (not preamble) captures prompts; preamble is ambient-only; note disable cleanup. stop-update-memory + background-memory-update: add cross-reference comments for 200/100 overflow threshold to keep them in sync. Co-Authored-By: Claude <noreply@anthropic.com>
dean0x
left a comment
There was a problem hiding this comment.
Review Findings: High-Confidence Issues (≥80%)
This batch of inline comments flags blocking and should-fix issues from the code review. Lower-confidence suggestions (60-79%) are included in the summary comment below.
Issues Found
Total inline comments: 6
- Idempotency regression in
addMemoryHooks(Consistency, 85%) - Path resolution assumes project root (TypeScript, 85%)
- Double JSON parsing (TypeScript, 80%)
- Per-prompt subprocess overhead in hook (Performance, 85%)
get_mtimeduplication across hooks (Architecture, 85%)- Preamble hook subprocess optimization (Performance, 80%)
Lower-Confidence Suggestions (60-79%)
These findings are consolidated here as they're architectural rather than blocking:
Queue Cleanup Logic Duplication (Complexity/Architecture, 70%)
Locations: src/cli/commands/init.ts:1413-1415, src/cli/commands/memory.ts:213-215
The .pending-turns.jsonl and .pending-turns.processing cleanup is implemented identically in two places. Extract to a shared utility in src/cli/utils/post-install.ts:
export async function cleanupQueueFiles(memoryDir: string): Promise<{ queue: boolean; processing: boolean }> {
const queue = await fs.unlink(path.join(memoryDir, '.pending-turns.jsonl')).then(() => true).catch(() => false);
const processing = await fs.unlink(path.join(memoryDir, '.pending-turns.processing')).then(() => true).catch(() => false);
return { queue, processing };
}Overflow Threshold Magic Numbers (Complexity, 85%)
Locations: scripts/hooks/stop-update-memory:92-98, scripts/hooks/background-memory-update:115-119
Both scripts use hardcoded 200/100 thresholds (200 lines trigger cleanup, keep 100). Define once:
QUEUE_MAX_LINES=200
QUEUE_TRIM_LINES=100The cross-referencing # NOTE: comments are good but constants would be more robust.
Queue File Concurrency Assumption (Architecture, 70%)
Locations: scripts/hooks/prompt-capture-memory:37, scripts/hooks/stop-update-memory:67
Two hooks (prompt-capture-memory and stop-update-memory) both append to .pending-turns.jsonl. The >> append is atomic on POSIX for lines under PIPE_BUF, but a design comment would clarify this concurrency assumption:
# Atomic append (POSIX: lines under PIPE_BUF are atomic)
echo "$TURN_JSON" >> "$CWD/.memory/.pending-turns.jsonl"Hook Lifecycle Abstraction Opportunity (Architecture, 65%)
Pattern: addMemoryHooks/removeMemoryHooks in memory.ts vs learning hooks in learn.ts
Both follow an identical pattern: iterate config map, match by marker substring, add/remove/count. A generic HookManager class could eliminate structural duplication. Lower priority as the current duplication is minor.
Summary by Category
| Reviewer | Finding | Confidence | Action |
|---|---|---|---|
| Consistency | changed variable removal in addMemoryHooks |
85% | Inline comment |
| TypeScript | process.cwd() path resolution |
85% | Inline comment |
| TypeScript | Double JSON.parse() | 80% | Inline comment |
| Performance | Per-prompt subprocess overhead | 85% | Inline comment |
| Architecture | get_mtime duplication |
85% | Inline comment |
| Performance | Preamble hook subprocess calls | 80% | Inline comment |
| Complexity | Queue cleanup duplication | 70% | Summary |
| Complexity | Magic number thresholds | 85% | Summary |
| Architecture | Queue concurrency assumption | 70% | Summary |
| Architecture | Hook lifecycle abstraction | 65% | Summary |
Deduplication Note
The following issues were reported by multiple reviewers and consolidated:
get_mtimeduplication: Architecture (85%), Complexity (90%), Consistency (70%)- Queue overflow thresholds: Complexity (85%), Consistency (65%)
- Queue cleanup duplication: Architecture (85%), Complexity (70%)
process.cwd()path resolution: TypeScript (85%), Architecture (82%)- Per-prompt subprocess overhead: Performance (85%)
- Idempotency regression: Consistency (85%), TypeScript (82%)
Each deduplicated finding appears once in this comment at highest confidence.
Next Steps
- Review inline comments and address blocking issues
- Address should-fix items (double-parse, preamble optimization)
- Consolidate lower-confidence findings as appropriate
Recommendation: Approve with conditions (fix blocking items, address should-fix items before merge).
Created by Claude Code git agent for devflow PR review. Findings ≥80% confidence elevated to inline comments.
- Semantic idempotency checks for --enable/--disable (Issue 1) - Remove queue cleanup from --disable, add --clear command (Issue 2) - Single-pass field extraction in prompt-capture-memory and preamble (Issue 3) - Truncation and BG_UPDATER guard tests (Issues 5, 6) - Fix file-organization.md hook table entries (Issue 7) - Extract get-mtime shared helper (Issue 8) - countMemoryHooks/hasMemoryHooks accept parsed Settings (Issue 9) - Add background-memory-update to docs and CLAUDE.md (Issue 10)
…behavior tests - Deduplicate filterProjectsWithMemory to delegate to hasMemoryDir - Remove 11 tests that only exercised JS builtins, not devflow logic
- Extract cleanQueueFiles() pure function (SRP, testable) - Parallelize discoverProjectGitRoots() + getGitRoot() with Promise.all - Add process.stdin.isTTY guard before p.select() (non-interactive default: clean all) - Replace currentProject! non-null assertion with null-safe conditional - Check for .working-memory.lock before deleting queue files to prevent data loss during active background update Co-Authored-By: Claude <noreply@anthropic.com>
…form, add behavioral test Restores the original detection order from stop-update-memory/background-memory-update: stat --version is probed first (succeeds on GNU/Linux, fails on BSD/macOS) so the correct stat flag is chosen. Previous get-mtime tried BSD stat -f %m first, which can return wrong values on Linux GNU stat (filesystem info instead of file mtime). Caches the detected platform type in _GET_MTIME_STAT_TYPE to avoid repeated capability probes on subsequent calls. Adds header comment documenting the Unix epoch return value. Adds behavioral test in shell-hooks.test.ts that creates a temp file, sources get-mtime, calls get_mtime, and validates a positive integer epoch within a sane range. Co-Authored-By: Claude <noreply@anthropic.com>
Addresses batch-3 review issues: - Add json_extract_cwd_prompt() helper to json-parse: extracts cwd + prompt in one subprocess using ASCII SOH (U+0001) as delimiter. SOH is safe for prompts with tabs/backslashes and works in bash 3.2 parameter expansion (unlike NUL byte). Fixes the @TSV pattern which corrupted literal tabs in prompts to the string "\t". - Deduplicate identical 8-line extraction blocks from preamble and prompt-capture-memory — both now call json_extract_cwd_prompt with ${FIELDS%%$'\001'*} / ${FIELDS#*$'\001'} split (zero extra subprocesses). - Batch cwd + stop_reason extraction in stop-update-memory into one subprocess, consistent with the pattern introduced for UserPromptSubmit hooks in this PR. - Replace echo "$INPUT" | json_field with printf '%s' "$INPUT" | json_field across all 6 hooks (stop-update-memory, session-start-memory, session-start-classification, pre-compact-memory, session-end-learning). echo with arbitrary JSON content is unsafe when the shell interprets escape sequences. Co-Authored-By: Claude <noreply@anthropic.com>
…ve error handling - Export hasMemoryDir and filterProjectsWithMemory for testability - Parallelize cleanQueueFiles: project iteration and per-project deletions both use Promise.all, reducing N*2 sequential ops to O(1) async batches - Widen removeMemoryHooks to accept string | Settings (consistent with hasMemoryHooks/countMemoryHooks API) - Differentiate ENOENT from unexpected errors (EACCES etc.) in hasMemoryDir with console.warn for non-ENOENT cases - Add --disable hint pointing to --clear for stale queue file cleanup - Add 22 tests for cleanQueueFiles, hasMemoryDir, filterProjectsWithMemory, and removeMemoryHooks(Settings) covering all edge cases Co-Authored-By: Claude <noreply@anthropic.com>
…tings, fix memory description - hasAmbientHook and hasLearningHook now accept string | Settings, matching the existing hasMemoryHooks/countMemoryHooks API for consistency - Update memory command .description() to mention --clear option Co-Authored-By: Claude <noreply@anthropic.com>
Replace loose < 3000 / < 5000 length guards with exact toBe(2015), matching the 2000-char limit + '... [truncated]' suffix (15 chars). Weak assertions would pass even with broken truncation logic. Add JSDoc to hasMemoryDir and filterProjectsWithMemory to match the documentation style of adjacent exported functions. Co-Authored-By: Claude <noreply@anthropic.com>
…earn --list - session-start-memory: replace inline stat platform detection with shared get-mtime helper - learn.ts: consolidate --list to use shared readObservations helper
Summary
preamble) captures user prompts to.memory/.pending-turns.jsonlassistant_message(onend_turnonly) to same queue, decouples throttle from capturemv-based atomic handoff with crash recovery via.pending-turns.processingWhat was wrong
The previous implementation extracted user/assistant messages from session transcript JSONL, but most entries are
tool_result/tool_usewith no text content. In a typical session (89 "user" messages, only 8 with actual text), thetail -3windowing almost always missed real content.Design
Hook inputs already provide both data points directly — no transcript needed:
prompt(user's text)assistant_message(full response) +stop_reasonQueue accumulates turns; throttle only gates background processing (not capture). 10-turn batch cap, overflow safety, crash recovery.
Test plan
shell-hooks.test.tsbash -nsyntax validation on all 3 modified hooksCloses #179