Junior's current visible-thread compaction accumulates structured XML summaries alongside a trimmed live transcript. Codex uses a full history-replacement model that is simpler to reason about, avoids unbounded summary accumulation, and better preserves recent user context across long threads.
Important correction: in Junior, conversation-memory.ts is not the only context-growth surface. After a completed agent turn, the next normal turn primarily seeds the model from durable Pi checkpoint history, not from conversation.compactions.
Current Behavior
packages/junior/src/chat/services/conversation-memory.ts compacts persisted Slack conversation state pre-turn:
- strategy: summarize the oldest message batch, append it as a <compaction> inside <thread-compactions>, then trim conversation.messages from the front
- summary format: XML sections for active asks, completed/superseded asks, and facts
- cap: 16 compaction records, then oldest summaries are merged
buildConversationContext(...) renders the compacted Slack transcript for routing/thinking and for initial prompt background when no Pi history exists.
reply-executor.ts usually loads durable Pi messages from the active or last completed turn checkpoint with loadPiMessagesForTurn(...).
respond.ts suppresses <thread-background> when context.piMessages exists, then seeds agent.state.messages directly from those Pi messages.
That means current conversation-memory compaction can shrink routing/thread-background context while Pi history can still grow without a Codex-style replacement step.
Gap
- The primary agent history is durable Pi checkpoint history, which is not compacted by conversation-memory.ts.
- Compaction records in visible conversation state still consume bounded-but-real routing/background context.
- Older user wording only survives through summaries unless it remains in the live message tail or reusable Pi history.
- Estimated tokens can diverge from actual model usage, especially with tool-heavy turns, richer message parts, and attachments.
- Codex's initial-context reinjection distinction matters for Junior only if we later add mid-turn compaction. Pre-turn compaction can rely on normal per-turn runtime context injection.
Proposed Direction
Implement Codex-style history replacement for Junior's durable Pi history, and keep conversation-memory compaction as the secondary visible-thread/routing compaction surface.
1. Compact Durable Pi History
Add a local context compactor that operates on PiMessage[] from a completed turn checkpoint. It should summarize current history, then install a bounded replacement history instead of appending more summary records.
Replacement shape:
- recent real user messages up to a configured token budget
- one synthetic handoff summary message
Do not include stale <runtime-turn-context> parts, raw image/base64 payloads, or unbounded tool output in the retained messages.
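The exclusion rule above could be sketched as a filter over retained messages. This is a minimal illustration under stated assumptions: `RetainedPart`, `RetainedMessage`, `sanitizeForRetention`, and the 2,000-character tool-output cap are hypothetical, the real PiMessage shape may differ, and the sketch assumes stale runtime context appears as tagged blocks inside text parts.

```typescript
// Hypothetical, simplified stand-ins for Pi message parts (not the real PiMessage type).
type RetainedPart =
  | { type: "text"; text: string }
  | { type: "image"; data: string } // raw base64 payload
  | { type: "tool_result"; output: string };

interface RetainedMessage {
  role: "user" | "assistant" | "tool";
  parts: RetainedPart[];
}

// Assumed cap for retained tool output; tune from observed usage.
const MAX_TOOL_OUTPUT_CHARS = 2_000;

// Assumption: stale runtime context is embedded as a tagged block in text parts.
const RUNTIME_CONTEXT_BLOCK =
  /<runtime-turn-context>[\s\S]*?<\/runtime-turn-context>/g;

function sanitizeForRetention(messages: RetainedMessage[]): RetainedMessage[] {
  const result: RetainedMessage[] = [];
  for (const message of messages) {
    const parts: RetainedPart[] = [];
    for (const part of message.parts) {
      // Drop raw image/base64 payloads entirely.
      if (part.type === "image") continue;
      if (part.type === "tool_result") {
        // Bound tool output instead of carrying it forward in full.
        const output =
          part.output.length > MAX_TOOL_OUTPUT_CHARS
            ? part.output.slice(0, MAX_TOOL_OUTPUT_CHARS) + "\n[truncated]"
            : part.output;
        parts.push({ type: "tool_result", output });
        continue;
      }
      // Strip stale runtime context blocks from text parts.
      const text = part.text.replace(RUNTIME_CONTEXT_BLOCK, "").trim();
      if (text.length > 0) parts.push({ type: "text", text });
    }
    // Skip messages that have nothing left after filtering.
    if (parts.length > 0) result.push({ role: message.role, parts });
  }
  return result;
}
```

Messages that become empty after filtering are dropped rather than kept as empty shells, so the replacement history stays compact.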
2. Handoff-Oriented Summary Prompt
Switch from XML-section summaries to a concise handoff prompt modeled on Codex:
- current outstanding asks
- key decisions and completed outcomes
- durable context, constraints, user preferences, IDs, URLs, artifact/canvas/sandbox references
- clear next steps or unresolved blockers
The summary should be easy for the resumed agent to consume as narrative context, not a nested compaction log.
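As a rough illustration of the prompt shape described above, the four sections could be assembled as follows. The wording, `HANDOFF_SECTIONS`, and `buildHandoffPrompt` are all hypothetical and are not taken from Codex or from Junior's code.

```typescript
// Hypothetical section labels; they mirror the four bullets in the proposal.
const HANDOFF_SECTIONS = [
  "Current outstanding asks",
  "Key decisions and completed outcomes",
  "Durable context: constraints, user preferences, IDs, URLs, artifact/canvas/sandbox references",
  "Clear next steps or unresolved blockers",
] as const;

// Builds a plain-text summarization prompt for the transcript being compacted.
function buildHandoffPrompt(transcript: string): string {
  const sections = HANDOFF_SECTIONS.map(
    (label, index) => `${index + 1}. ${label}`,
  ).join("\n");
  return [
    "You are handing this conversation off to an agent that will resume it.",
    "Write a concise narrative summary that covers:",
    sections,
    "Write plain prose. Do not emit XML or nested compaction records.",
    "",
    "Conversation so far:",
    transcript,
  ].join("\n");
}
```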
3. Preserve Recent User Messages
Retain recent user-authored messages verbatim up to a budget. Codex's local compaction uses 20K tokens. Junior should start with a local constant and tune from observed usage.
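One way to picture the retention rule is a newest-first walk over the history that stops when the budget is spent. This is a sketch, not the Codex algorithm: `HistoryMessage`, `estimateTokens` (a crude 4-characters-per-token guess), `selectRecentUserMessages`, and `buildReplacementHistory` are hypothetical, and giving the synthetic summary the user role is an assumption.

```typescript
// Hypothetical flattened message shape for illustration only.
interface HistoryMessage {
  role: "user" | "assistant" | "tool";
  text: string;
  synthetic?: boolean; // true for handoff summaries produced by compaction
}

// Assumed starting budget, matching the Codex figure cited above.
const RECENT_USER_TOKEN_BUDGET = 20_000;

// Crude fallback estimate (about 4 characters per token); not a real tokenizer.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walks the history newest-first and keeps real user messages until the budget runs out.
function selectRecentUserMessages(
  history: HistoryMessage[],
  budget: number = RECENT_USER_TOKEN_BUDGET,
): HistoryMessage[] {
  const kept: HistoryMessage[] = [];
  let remaining = budget;
  for (const message of [...history].reverse()) {
    // Only real user messages are retained; earlier summaries are skipped.
    if (message.role !== "user" || message.synthetic) continue;
    const cost = estimateTokens(message.text);
    if (cost > remaining) break; // stop at the first message that does not fit
    kept.push(message);
    remaining -= cost;
  }
  return kept.reverse(); // restore chronological order
}

// Replacement history: retained user messages followed by one synthetic summary.
function buildReplacementHistory(
  history: HistoryMessage[],
  summary: string,
  budget: number = RECENT_USER_TOKEN_BUDGET,
): HistoryMessage[] {
  return [
    ...selectRecentUserMessages(history, budget),
    { role: "user", text: summary, synthetic: true },
  ];
}
```

Skipping messages already marked synthetic keeps an earlier handoff summary from being retained as if it were user wording, so repeated compactions do not stack summaries.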
4. Trigger Before Fresh Turns
Run automatic compaction after loadPiMessagesForTurn(...) has loaded reusable history and before generateAssistantReply(...) receives piMessages.
Do not compact an awaiting_resume checkpoint. Timeout/auth resume depends on safe Pi continue() boundaries and should keep exact checkpoint history until the turn completes.
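The gating rule above can be expressed as a small predicate evaluated between history load and agent execution. `LoadedHistory` and `shouldCompactBeforeTurn` are hypothetical names, and the sketch assumes the checkpoint state and a token estimate are both available at that point.

```typescript
// Hypothetical view of what the executor knows after loading reusable Pi history.
interface LoadedHistory {
  checkpointState: "completed" | "awaiting_resume";
  estimatedInputTokens: number;
}

// Returns true only when it is safe and worthwhile to compact before the next turn.
function shouldCompactBeforeTurn(
  loaded: LoadedHistory,
  thresholdTokens: number,
): boolean {
  // A pending timeout/auth resume needs the exact checkpoint history.
  if (loaded.checkpointState === "awaiting_resume") return false;
  return loaded.estimatedInputTokens >= thresholdTokens;
}
```

Keeping the check pure makes it easy to unit test separately from the summarization call it guards.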
5. Derive Thresholds From Model Context Windows
Do not keep a hardcoded 9K-token threshold. Automatic Pi-history compaction should derive its threshold from the active agent model's advertised context window and reserve output headroom. Visible conversation-state compaction should use the same budget rule against the fast model.
AI_MODEL_CONTEXT_WINDOW_TOKENS and AI_FAST_MODEL_CONTEXT_WINDOW_TOKENS should override advertised context windows when provider metadata is missing, stale, or intentionally constrained.
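A possible shape for the budget rule, assuming the override arrives as a raw environment string and that the trigger fires at a fraction of the usable window. `BudgetInput`, `parsePositiveInt`, `compactionThreshold`, `reservedOutputTokens`, and `triggerRatio` are hypothetical; only the two environment variable names come from this proposal.

```typescript
// Hypothetical inputs to the threshold calculation.
interface BudgetInput {
  advertisedContextWindow?: number; // from provider metadata, if known
  overrideContextWindow?: string; // raw value of e.g. AI_MODEL_CONTEXT_WINDOW_TOKENS
  reservedOutputTokens: number; // headroom kept free for the model's reply
  triggerRatio: number; // fraction of the usable window that triggers compaction
}

// Parses a positive integer from an environment-style string, or returns undefined.
function parsePositiveInt(raw: string | undefined): number | undefined {
  if (raw === undefined) return undefined;
  const parsed = Number.parseInt(raw, 10);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : undefined;
}

// Returns the token count at which compaction should trigger, or undefined
// when no context window is known from either source.
function compactionThreshold(input: BudgetInput): number | undefined {
  // A valid override wins over advertised provider metadata.
  const contextWindow =
    parsePositiveInt(input.overrideContextWindow) ??
    input.advertisedContextWindow;
  if (contextWindow === undefined) return undefined;
  const usable = Math.max(0, contextWindow - input.reservedOutputTokens);
  return Math.floor(usable * input.triggerRatio);
}
```

Returning `undefined` when no window is known leaves the caller to choose a conservative default rather than hiding that decision in the helper.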
6. Internal Context Forking
No Slack message, slash command, or model tool should invoke compaction directly. Manual/user-facing compaction is out of scope.
Junior should expose an internal compaction command for future orchestration strategies, such as handing off to a coding agent or upgrading the model. That internal command should take the current reusable completed Pi history plus visible thread state, summarize/compact it, create a fresh compacted model history, and make that compacted history the next reusable context for the thread.
Implementation should create a new synthetic completed compaction checkpoint/session and update conversation.processing.lastSessionId to it, rather than destructively rewriting the prior completed turn checkpoint in place. Keeping the pre-compaction checkpoint available makes the fork auditable and avoids corrupting timeout/auth recovery state.
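The non-destructive fork could look roughly like this, using an in-memory map in place of real persistence. `Checkpoint`, `ConversationState`, `forkCompactedCheckpoint`, and the `kind` and `parentSessionId` fields are hypothetical; only `conversation.processing.lastSessionId` is named in this proposal.

```typescript
// Hypothetical checkpoint record; the real persistence schema may differ.
interface Checkpoint<M> {
  sessionId: string;
  kind: "turn" | "compaction";
  state: "completed" | "awaiting_resume";
  parentSessionId?: string;
  messages: M[];
}

interface ConversationState {
  processing: { lastSessionId?: string };
}

// Creates a new completed compaction checkpoint and repoints the conversation at it.
function forkCompactedCheckpoint<M>(
  store: Map<string, Checkpoint<M>>,
  conversation: ConversationState,
  replacement: M[],
  newSessionId: string,
): Checkpoint<M> {
  const sourceId = conversation.processing.lastSessionId;
  const source = sourceId === undefined ? undefined : store.get(sourceId);
  // Only completed checkpoints are eligible; resume state must stay exact.
  if (!source || source.state !== "completed") {
    throw new Error("compaction requires a completed checkpoint");
  }
  const forked: Checkpoint<M> = {
    sessionId: newSessionId,
    kind: "compaction",
    state: "completed",
    parentSessionId: source.sessionId, // keeps the fork auditable
    messages: replacement,
  };
  store.set(newSessionId, forked); // the source checkpoint is left untouched
  conversation.processing.lastSessionId = newSessionId;
  return forked;
}
```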
7. Keep Conversation State Bounded Too
Keep visible conversation-state compaction bounded for subscribed routing, thinking selection, and no-Pi-history prompt background. It can continue to retain bounded chunk summaries plus recent visible messages as long as it does not grow without bound.
8. Improve Token Accounting
Track server-reported input tokens from the most recent model call, or from the largest single call in the turn, where available, and fall back to character estimates only when no server figure exists. Do not use cumulative turn usage alone as the trigger: a multi-step or tool-heavy turn sums input tokens across several model calls and therefore overstates the size of the history.
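To make the accounting rule concrete, here is a small sketch that prefers the largest server-reported single-call figure and only then falls back to a character estimate. `ModelCallUsage` and `triggerTokenCount` are hypothetical names, and the 4-characters-per-token fallback is an assumption.

```typescript
// Hypothetical per-call usage record; inputTokens is absent when the server reports nothing.
interface ModelCallUsage {
  inputTokens?: number;
}

// Returns the token figure to compare against the compaction threshold.
function triggerTokenCount(
  calls: ModelCallUsage[],
  fallbackText: string,
): number {
  const reported = calls
    .map((call) => call.inputTokens)
    .filter((value): value is number => typeof value === "number");
  // Use the largest single call, never the sum across calls.
  if (reported.length > 0) return Math.max(...reported);
  // Fallback only: rough estimate of about 4 characters per token.
  return Math.ceil(fallbackText.length / 4);
}
```

For a turn with calls reporting 1,000 and 3,000 input tokens, this returns 3,000 where a cumulative count would give 4,000.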
9. Defer Mid-Turn Compaction
Do not implement Codex's mid-turn compaction first. It requires special initial-context injection before the last real user message and must preserve Pi continue() semantics. Pre-turn compaction covers the main Junior risk with much lower recovery complexity.
Prior Art
- Codex local compaction: codex-rs/core/src/compact.rs
  - full history replacement
  - handoff summary prompt
  - recent user-message retention up to 20K tokens
  - automatic and internally callable triggers
  - optional initial-context injection for mid-turn compaction
Codex should be treated as algorithmic prior art only. Junior does not have, and should not depend on, a remote/server-side compaction endpoint. The summarization, retained-message selection, replacement-history construction, persistence, triggers, and verification all need to be implemented inside Junior.
Implementation Plan
- Add/update the relevant spec to state that compaction owns durable Pi history, while Slack conversation state remains the visible thread/routing source.
- Add a chat/services/context-compaction.ts compactor for message selection, token estimation, summary prompting, and replacement-history construction.
- Add persistence support to create a new completed compaction checkpoint/session from replacement Pi history and point conversation.processing.lastSessionId at it.
- Wire automatic pre-turn compaction in reply-executor.ts after Pi history load and before agent execution.
- Update conversation-memory.ts thresholds to derive from model context windows instead of a fixed 9K/7K window.
- Add an internal compaction command for future orchestration flows; do not add user-facing Slack commands or model tools.
- Add focused unit tests for replacement selection/truncation and checkpoint forking.
- Add integration coverage proving a long Slack thread uses compacted Pi history on the next turn.
- Add an eval for long-thread continuity after compaction if the expected behavior depends on model interpretation.
Scope Notes
- Remote/server-side compaction is not available for Junior and should not be part of this design.
- User-facing/manual compaction is out of scope.
- Pre/post compact hooks are optional and can be deferred.
- Mid-turn compaction should be deferred until pre-turn replacement is working and verified.
- This should not change timeout/auth checkpoint resume semantics.
Action taken on behalf of David Cramer.