Skip to content

fix: queue-based working memory capture (#179)#180

Merged
dean0x merged 19 commits intomainfrom
fix/179-memory-extraction
Apr 9, 2026
Merged

fix: queue-based working memory capture (#179)#180
dean0x merged 19 commits intomainfrom
fix/179-memory-extraction

Conversation

@dean0x
Copy link
Copy Markdown
Owner

@dean0x dean0x commented Apr 8, 2026

Summary

  • Replaces broken transcript extraction with queue-based turn capture from hook inputs
  • UserPromptSubmit (preamble) captures user prompts to .memory/.pending-turns.jsonl
  • Stop hook captures assistant_message (on end_turn only) to same queue, decouples throttle from capture
  • Background updater uses mv-based atomic handoff with crash recovery via .pending-turns.processing

What was wrong

The previous implementation extracted user/assistant messages from session transcript JSONL, but most entries are tool_result/tool_use with no text content. In a typical session (89 "user" messages, only 8 with actual text), the tail -3 windowing almost always missed real content.

Design

Hook inputs already provide both data points directly — no transcript needed:

  • UserPromptSubmitprompt (user's text)
  • Stopassistant_message (full response) + stop_reason

Queue accumulates turns; throttle only gates background processing (not capture). 10-turn batch cap, overflow safety, crash recovery.

Test plan

  • 5 new queue behavior tests in shell-hooks.test.ts
  • 618/618 tests passing
  • bash -n syntax validation on all 3 modified hooks
  • 16 acceptance scenarios (stop_reason filtering, dual format, capture before throttle, DEVFLOW_BG_UPDATER guard, crash recovery, missing .memory/)
  • Evaluator: 5/5 plan requirements aligned

Closes #179

Dean Sharon and others added 2 commits April 9, 2026 01:04
…extraction (#179)

Transcript extraction was unreliable (most entries are tool_use with no text).
Switch to hook-based capture using inputs available at hook execution time:
- preamble (UserPromptSubmit) captures user prompt to .pending-turns.jsonl queue
- stop-update-memory captures assistant_message (end_turn only) to same queue,
  then spawns throttled background updater (capture decoupled from throttle)
- background-memory-update reads queue via atomic mv handoff, pairs user/assistant
  turns, caps at 10 most recent turns, with crash recovery via .processing file

Key improvements:
- No sleep 3 transcript flush wait
- No transcript path computation
- Queue overflow safety (>200 lines → keep last 100)
- stop_reason filter (tool_use stops skipped)
- Inline jq/node JSONL construction (avoids bash 4+ array syntax in json_construct)
- 5 new behavior tests covering capture, skip, and JSONL format validation

Co-Authored-By: Claude <noreply@anthropic.com>
- Cap .processing file at 200 lines after crash recovery merge to
  prevent unbounded growth when claude -p Write failures persist
- Clean up .processing on claude exit 0 even if Write was not invoked
  (retrying same turns is futile; new turns arrive via queue)
- Use json_field helper in turn parsing loop (Simplifier)
- Deduplicate HOOKS_DIR in tests (Simplifier)
@dean0x
Copy link
Copy Markdown
Owner Author

dean0x commented Apr 8, 2026

🔍 Code Review: Consolidated High-Confidence Findings (≥80%)

Overview

9 blocking/should-fix issues identified across 4 files. Multiple reviewers flagged the same concerns, indicating consistent patterns. Below, I've deduped and consolidated findings by location.


🔴 BLOCKING Issues (Fix Before Merge)

1. scripts/hooks/preamble — Stale Header Comment (95% confidence)

Line 5

The file header still claims:

# Zero file I/O beyond stdin — static injection only.

But this PR adds file I/O at lines 23–39 (appends to .memory/.pending-turns.jsonl). Misleading documentation that will confuse future maintainers.

Fix:

# Injects a detection-only preamble. Classification rules only — skill mappings live in devflow:router.
# Also captures user prompts to .memory/.pending-turns.jsonl for working memory.

2. scripts/hooks/preamble & stop-update-memory — Raw JSON Construction Instead of Helper (85% confidence)

Preamble lines 31–36 and stop-update-memory lines 74–79

Both scripts construct JSON objects using raw jq -n -c --arg patterns. The codebase already has a json_construct helper in the json-parse library that wraps exactly this pattern (with jq/node dual fallback).

Problem: Duplicated abstraction layer. If JSON construction changes are needed, multiple locations must be updated in sync.

Fix:

# In preamble (lines 31-36):
json_construct --arg role "user" --arg content "$_TRUNCATED_PROMPT" --argjson ts "$_TS" >> "$CWD/.memory/.pending-turns.jsonl"

# In stop-update-memory (lines 74-79):
json_construct --arg role "assistant" --arg content "$ASSISTANT_MSG" --argjson ts "$TS" >> "$QUEUE_FILE"

3. scripts/hooks/stop-update-memory — Mtime Logic Duplicated (92% confidence)

Lines 96–100

The stop hook inlines the Linux/macOS stat mtime detection, while background-memory-update defines a reusable get_mtime() function for the same operation. This pattern also appears in session-start-memory.

Problem: Two competing approaches for the identical operation. If stat detection needs a fix (new platform), both locations must be updated in sync.

Fix: Extract get_mtime() into a shared sourced file, or reuse the existing function from background-memory-update.


4. scripts/hooks/stop-update-memory — Queue Overflow Logic Duplicated (85% confidence)

Lines 82–89 (also in background-memory-update lines 107–111)

The overflow safety pattern (check >200 lines, truncate to 100 via tail/mv) appears in both scripts with identical thresholds and idioms (wc -l | tr -d ' ', tail > .tmp && mv).

Problem: DRY violation. If thresholds need tuning, two locations must be updated.

Fix: Extract a shared cap_file_lines() function into a sourced helper.


5. scripts/hooks/background-memory-update — Per-Line Subprocess Spawning (95% confidence)

Lines 148–181

The turn-parsing while-read loop calls json_field twice per JSONL line (role and content). Each invocation spawns a jq or node subprocess. With the 20-line cap (10 turns × 2 lines), this creates 40 subprocess spawns in sequence.

Problem: This is the exact pattern documented in PF-006 (per-line jq spawning adds 1–3s latency). Reintroducing it adds measurable latency and extends lock hold time.

Fix: Use single-pass jq -s (slurp) to extract all roles/contents at once:

if [ "$_HAS_JQ" = "true" ]; then
  PARSED=$(echo "$ENTRIES" | jq -r '[.role, .content] | @tsv')
else
  PARSED=$(echo "$ENTRIES" | node -e "d='';process.stdin.on('data',c=>d+=c);process.stdin.on('end',()=>{d.trim().split('\n').filter(Boolean).forEach(l=>{try{const o=JSON.parse(l);console.log(o.role+'\t'+o.content)}catch(e){}})})")
fi

CURRENT_USER=""
while IFS=$'\t' read -r ROLE CONTENT; do
  # ... same pairing logic, but ROLE/CONTENT already extracted
done <<< "$PARSED"

6. tests/shell-hooks.test.ts — Shell Injection Risk in execSync (82% confidence)

Multiple test blocks: lines 772, 791, 816, 840

The test pattern echo '${input.replace(/'/g, "'\\''")}' | bash escapes single quotes but doesn't guard against shell metacharacters in dynamically constructed JSON (backticks, $(), newlines).

Problem: While tests have controlled inputs today, the pattern is fragile. Test data evolution could cause unexpected failures.

Fix: Use safer input option:

execSync(`bash "${STOP_HOOK}"`, {
  input: JSON.stringify({ cwd: tmpDir, stop_reason: 'end_turn', assistant_message: 'test' }),
  stdio: ['pipe', 'pipe', 'pipe'],
});

🟡 SHOULD-FIX Issues (Gaps in Coverage/Documentation)

7. tests/shell-hooks.test.ts — Missing Content Array Format Test (85% confidence)

The stop hook handles two assistant_message formats: plain string and content array ([{type: "text", text: "..."}]). Tests only cover the string format.

Fix: Add test for content array:

it('stop_reason end_turn with content array — extracts text blocks', () => {
  const input = JSON.stringify({
    cwd: tmpDir,
    session_id: 'test-session',
    stop_reason: 'end_turn',
    assistant_message: [
      { type: 'text', text: 'First' },
      { type: 'tool_result', content: 'ignored' },
      { type: 'text', text: 'Second' },
    ],
  });
  // ... run hook and verify both text blocks in queue
});

8. tests/shell-hooks.test.ts — Missing Queue Overflow Truncation Test (82% confidence)

The stop hook implements queue overflow safety (>200 lines → truncate to 100). This has zero test coverage. Bugs in overflow handling would go undetected.

Fix: Pre-populate queue with 201 lines, trigger stop hook, verify truncation to ≤100 lines.


9. CLAUDE.md — Hook Count and Description Outdated (82–85% confidence)

Line 41: Says "Three shell-script hooks" but now involves four:

  1. preamble (user prompt capture)
  2. stop-update-memory (assistant response capture)
  3. background-memory-update (queue processing)
  4. session-start-memory (memory injection)

Line 108: Says WORKING-MEMORY.md is "overwritten each session" — now queue-based, so it accumulates turns across multiple stops.

Fix:

  • Line 41: Update to "Four shell-script hooks" or rephrase to avoid counting
  • Line 108: Update to "Auto-maintained by background updater (queue-fed by preamble + stop hooks)"

📊 Summary

Category Count Severity
Blocking 6 HIGH
Should-Fix 3 MEDIUM
Dedup Rate 4/13

Key Pattern: DRY violations (json_construct, mtime, queue overflow logic) + PF-006 reintroduction (subprocess spawning) + incomplete test coverage. Multiple reviewers flagged the same issues, indicating consistent concerns.

Recommendation: Address the 6 blocking issues before merge. They involve straightforward refactoring (extract helpers, replace duplicates) and test additions. The PF-006 pattern is the most complex fix but unlocks the biggest latency win.


Review compiled from 9 reviewer reports. Inline comments can be created on specific files if needed.

Dean Sharon and others added 8 commits April 9, 2026 01:39
- Fix stale header comment in preamble claiming zero file I/O (now
  accurately describes .pending-turns.jsonl capture behavior)
- Guard $CWD with [ ! -d ] check in both hooks so a non-existent path
  from a deleted/unmounted project exits cleanly (preamble:17, stop:24)
- Add -- separator before positional args in node fallback invocations
  to prevent content starting with -- being misinterpreted as node flags
  (preamble:36, stop:88)
- Extract get_mtime() function in stop-update-memory to eliminate inlined
  stat -c/%m duplication; mirrors the identical function in
  background-memory-update

Co-Authored-By: Claude <noreply@anthropic.com>
… spawning

Two fixes in background-memory-update:

1. CWD existence guard (security/robustness): validate $CWD from args
   before constructing any paths, exiting early with a clear error message
   if the directory does not exist.

2. Single-pass JSONL extraction (performance, PF-006): replace the
   per-line json_field calls inside the while-read loop — which spawned
   2 jq/node subprocesses per JSONL entry (up to 40 total) — with a
   single jq/node invocation that extracts role+content as TSV in one
   pass. The loop now reads pre-extracted TSV rows with IFS=$'\t',
   spawning zero subprocesses per iteration.

Co-Authored-By: Claude <noreply@anthropic.com>
…e injection

- Add test: stop hook handles assistant_message as content array, joining
  text blocks with newline and excluding tool_use blocks
- Add test: queue overflow truncates to last 100 lines when >200 entries
- Fix: replace echo-pipe shell injection pattern in all 4 working memory
  queue behavior tests with execSync({ input, stdio: ['pipe','pipe','pipe'] })
- docs: update CLAUDE.md hook count from Three to Four (stop, session-start,
  pre-compact, preamble) and fix WORKING-MEMORY.md description to reflect
  queue-based batch update model

Co-Authored-By: Claude <noreply@anthropic.com>
…mble to pure ambient

Creates scripts/hooks/prompt-capture-memory (UserPromptSubmit) that captures
user prompts to .memory/.pending-turns.jsonl. Reverts preamble to zero-file-I/O
ambient-only injection — prompt capture no longer mixed into classification path.

Co-Authored-By: Claude <noreply@anthropic.com>
…nt, queue cleanup on disable

MEMORY_HOOK_CONFIG gains UserPromptSubmit → prompt-capture-memory as first entry.
hasMemoryHooks uses Object.keys(MEMORY_HOOK_CONFIG).length (4) instead of hardcoded 3.
devflow memory --disable now removes queue files (.pending-turns.jsonl + .pending-turns.processing).
devflow init with memory=false also cleans up queue files.

Co-Authored-By: Claude <noreply@anthropic.com>
…ture-memory, preamble separation

memory.test.ts: update all hook counts 3→4, add UserPromptSubmit assertions, add upgrade
test (3→4 hooks), add removeMemoryHooks preamble preservation test, add toggle cycle test,
add queue file cleanup tests (delete files, safe when missing, safe without .memory/).

shell-hooks.test.ts: add prompt-capture-memory to HOOK_SCRIPTS syntax check, replace
preamble capture tests with prompt-capture-memory equivalents, add preamble-no-longer-writes
test and slash-command-no-queue-write test.

Co-Authored-By: Claude <noreply@anthropic.com>
…, add threshold comments

file-organization.md: four hooks (not three), add prompt-capture-memory to hook table and
source tree, update flow description to show queue-based capture path, document disable cleanup.

CLAUDE.md: update Working Memory paragraph — prompt-capture-memory (not preamble) captures
prompts; preamble is ambient-only; note disable cleanup.

stop-update-memory + background-memory-update: add cross-reference comments for 200/100
overflow threshold to keep them in sync.

Co-Authored-By: Claude <noreply@anthropic.com>
Copy link
Copy Markdown
Owner Author

@dean0x dean0x left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review Findings: High-Confidence Issues (≥80%)

This batch of inline comments flags blocking and should-fix issues from the code review. Lower-confidence suggestions (60-79%) are included in the summary comment below.

Issues Found

Total inline comments: 6

  • Idempotency regression in addMemoryHooks (Consistency, 85%)
  • Path resolution assumes project root (TypeScript, 85%)
  • Double JSON parsing (TypeScript, 80%)
  • Per-prompt subprocess overhead in hook (Performance, 85%)
  • get_mtime duplication across hooks (Architecture, 85%)
  • Preamble hook subprocess optimization (Performance, 80%)

Lower-Confidence Suggestions (60-79%)

These findings are consolidated here as they're architectural rather than blocking:

Queue Cleanup Logic Duplication (Complexity/Architecture, 70%)

Locations: src/cli/commands/init.ts:1413-1415, src/cli/commands/memory.ts:213-215

The .pending-turns.jsonl and .pending-turns.processing cleanup is implemented identically in two places. Extract to a shared utility in src/cli/utils/post-install.ts:

export async function cleanupQueueFiles(memoryDir: string): Promise<{ queue: boolean; processing: boolean }> {
  const queue = await fs.unlink(path.join(memoryDir, '.pending-turns.jsonl')).then(() => true).catch(() => false);
  const processing = await fs.unlink(path.join(memoryDir, '.pending-turns.processing')).then(() => true).catch(() => false);
  return { queue, processing };
}

Overflow Threshold Magic Numbers (Complexity, 85%)

Locations: scripts/hooks/stop-update-memory:92-98, scripts/hooks/background-memory-update:115-119

Both scripts use hardcoded 200/100 thresholds (200 lines trigger cleanup, keep 100). Define once:

QUEUE_MAX_LINES=200
QUEUE_TRIM_LINES=100

The cross-referencing # NOTE: comments are good but constants would be more robust.

Queue File Concurrency Assumption (Architecture, 70%)

Locations: scripts/hooks/prompt-capture-memory:37, scripts/hooks/stop-update-memory:67

Two hooks (prompt-capture-memory and stop-update-memory) both append to .pending-turns.jsonl. The >> append is atomic on POSIX for lines under PIPE_BUF, but a design comment would clarify this concurrency assumption:

# Atomic append (POSIX: lines under PIPE_BUF are atomic)
echo "$TURN_JSON" >> "$CWD/.memory/.pending-turns.jsonl"

Hook Lifecycle Abstraction Opportunity (Architecture, 65%)

Pattern: addMemoryHooks/removeMemoryHooks in memory.ts vs learning hooks in learn.ts

Both follow an identical pattern: iterate config map, match by marker substring, add/remove/count. A generic HookManager class could eliminate structural duplication. Lower priority as the current duplication is minor.


Summary by Category

Reviewer Finding Confidence Action
Consistency changed variable removal in addMemoryHooks 85% Inline comment
TypeScript process.cwd() path resolution 85% Inline comment
TypeScript Double JSON.parse() 80% Inline comment
Performance Per-prompt subprocess overhead 85% Inline comment
Architecture get_mtime duplication 85% Inline comment
Performance Preamble hook subprocess calls 80% Inline comment
Complexity Queue cleanup duplication 70% Summary
Complexity Magic number thresholds 85% Summary
Architecture Queue concurrency assumption 70% Summary
Architecture Hook lifecycle abstraction 65% Summary

Deduplication Note

The following issues were reported by multiple reviewers and consolidated:

  • get_mtime duplication: Architecture (85%), Complexity (90%), Consistency (70%)
  • Queue overflow thresholds: Complexity (85%), Consistency (65%)
  • Queue cleanup duplication: Architecture (85%), Complexity (70%)
  • process.cwd() path resolution: TypeScript (85%), Architecture (82%)
  • Per-prompt subprocess overhead: Performance (85%)
  • Idempotency regression: Consistency (85%), TypeScript (82%)

Each deduplicated finding appears once in this comment at highest confidence.


Next Steps

  1. Review inline comments and address blocking issues
  2. Address should-fix items (double-parse, preamble optimization)
  3. Consolidate lower-confidence findings as appropriate

Recommendation: Approve with conditions (fix blocking items, address should-fix items before merge).


Created by Claude Code git agent for devflow PR review. Findings ≥80% confidence elevated to inline comments.

Dean Sharon and others added 9 commits April 9, 2026 21:19
- Semantic idempotency checks for --enable/--disable (Issue 1)
- Remove queue cleanup from --disable, add --clear command (Issue 2)
- Single-pass field extraction in prompt-capture-memory and preamble (Issue 3)
- Truncation and BG_UPDATER guard tests (Issues 5, 6)
- Fix file-organization.md hook table entries (Issue 7)
- Extract get-mtime shared helper (Issue 8)
- countMemoryHooks/hasMemoryHooks accept parsed Settings (Issue 9)
- Add background-memory-update to docs and CLAUDE.md (Issue 10)
…behavior tests

- Deduplicate filterProjectsWithMemory to delegate to hasMemoryDir
- Remove 11 tests that only exercised JS builtins, not devflow logic
- Extract cleanQueueFiles() pure function (SRP, testable)
- Parallelize discoverProjectGitRoots() + getGitRoot() with Promise.all
- Add process.stdin.isTTY guard before p.select() (non-interactive default: clean all)
- Replace currentProject! non-null assertion with null-safe conditional
- Check for .working-memory.lock before deleting queue files to prevent data loss during active background update

Co-Authored-By: Claude <noreply@anthropic.com>
…form, add behavioral test

Restores the original detection order from stop-update-memory/background-memory-update:
stat --version is probed first (succeeds on GNU/Linux, fails on BSD/macOS) so the correct
stat flag is chosen. Previous get-mtime tried BSD stat -f %m first, which can return
wrong values on Linux GNU stat (filesystem info instead of file mtime).

Caches the detected platform type in _GET_MTIME_STAT_TYPE to avoid repeated capability
probes on subsequent calls.

Adds header comment documenting the Unix epoch return value.

Adds behavioral test in shell-hooks.test.ts that creates a temp file, sources get-mtime,
calls get_mtime, and validates a positive integer epoch within a sane range.

Co-Authored-By: Claude <noreply@anthropic.com>
Addresses batch-3 review issues:

- Add json_extract_cwd_prompt() helper to json-parse: extracts cwd +
  prompt in one subprocess using ASCII SOH (U+0001) as delimiter. SOH
  is safe for prompts with tabs/backslashes and works in bash 3.2
  parameter expansion (unlike NUL byte). Fixes the @TSV pattern which
  corrupted literal tabs in prompts to the string "\t".

- Deduplicate identical 8-line extraction blocks from preamble and
  prompt-capture-memory — both now call json_extract_cwd_prompt with
  ${FIELDS%%$'\001'*} / ${FIELDS#*$'\001'} split (zero extra subprocesses).

- Batch cwd + stop_reason extraction in stop-update-memory into one
  subprocess, consistent with the pattern introduced for UserPromptSubmit
  hooks in this PR.

- Replace echo "$INPUT" | json_field with printf '%s' "$INPUT" | json_field
  across all 6 hooks (stop-update-memory, session-start-memory,
  session-start-classification, pre-compact-memory, session-end-learning).
  echo with arbitrary JSON content is unsafe when the shell interprets
  escape sequences.

Co-Authored-By: Claude <noreply@anthropic.com>
…ve error handling

- Export hasMemoryDir and filterProjectsWithMemory for testability
- Parallelize cleanQueueFiles: project iteration and per-project deletions
  both use Promise.all, reducing N*2 sequential ops to O(1) async batches
- Widen removeMemoryHooks to accept string | Settings (consistent with
  hasMemoryHooks/countMemoryHooks API)
- Differentiate ENOENT from unexpected errors (EACCES etc.) in hasMemoryDir
  with console.warn for non-ENOENT cases
- Add --disable hint pointing to --clear for stale queue file cleanup
- Add 22 tests for cleanQueueFiles, hasMemoryDir, filterProjectsWithMemory,
  and removeMemoryHooks(Settings) covering all edge cases

Co-Authored-By: Claude <noreply@anthropic.com>
…tings, fix memory description

- hasAmbientHook and hasLearningHook now accept string | Settings, matching
  the existing hasMemoryHooks/countMemoryHooks API for consistency
- Update memory command .description() to mention --clear option

Co-Authored-By: Claude <noreply@anthropic.com>
Replace loose < 3000 / < 5000 length guards with exact toBe(2015),
matching the 2000-char limit + '... [truncated]' suffix (15 chars).
Weak assertions would pass even with broken truncation logic.

Add JSDoc to hasMemoryDir and filterProjectsWithMemory to match the
documentation style of adjacent exported functions.

Co-Authored-By: Claude <noreply@anthropic.com>
…earn --list

- session-start-memory: replace inline stat platform detection with shared get-mtime helper
- learn.ts: consolidate --list to use shared readObservations helper
@dean0x dean0x merged commit 9685f27 into main Apr 9, 2026
4 checks passed
@dean0x dean0x deleted the fix/179-memory-extraction branch April 9, 2026 21:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

fix: working memory extraction — first user message instead of last

1 participant