A memorial chatbot framework that emulates a loved one's texting voice from curated iMessage history. Uses RAG (ChromaDB) over curated episodes and conversation chunks, with an LLM generating responses in the persona's style.
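The retrieve-then-generate flow described above can be sketched roughly as follows. This is a toy illustration, not the repo's retriever: a bag-of-words cosine similarity stands in for ChromaDB's embedding search, the sample chunks are made up, and the names `retrieve` and `build_prompt` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words; a stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (hypothetical helper)."""
    q = tokenize(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(soul: str, retrieved: list[str], user_message: str) -> str:
    """Assemble persona text, retrieved context, and the incoming message."""
    context = "\n".join(f"- {chunk}" for chunk in retrieved)
    return f"{soul}\n\nRelevant memories:\n{context}\n\nUser: {user_message}"


# Made-up sample chunks, for illustration only.
chunks = [
    "we went fishing at the lake every summer",
    "the dog ate my homework again lol",
    "happy birthday!! see you at dinner",
]
question = "remember fishing at the lake?"
top = retrieve(question, chunks, k=1)
prompt = build_prompt("You text in short, warm messages.", top, question)
```

In the real bot the ranking would come from the ChromaDB collections built by the indexer, and the assembled prompt would be sent to the LLM.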
The repo ships with example persona ids (babybearbot, davidbot) and template soul / persona files. Replace them with your own content (see PERSONA_SETUP.md). If you are the original operator of this fork, restore prompts and YAML from LOCAL_PERSONA_NOTES.md (gitignored; create from your private backup if missing).
iMessage corpus  -->  pipeline/   -->   workbench/<persona>/  -->  bot/
(raw .txt)            (parse,           (analytics DB,             (chunker, indexer,
                      candidates,       candidates,                retriever, prompt
                      stats)            style_notes,               builder, Telegram)
                                        biography,
                                        eval/)
Runtime content lives in bot/content/<persona>/:
- soul.md — persona identity and boundaries
- memory.md — curated facts (tagged by source), when used
- style_examples.md — style examples for the LLM, when used
The PERSONA environment variable (default in this tree: davidbot) selects which persona's content, data, and episodes are used at every layer. Set PERSONA=babybearbot for the other example persona.
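As a rough sketch of how persona selection and content loading could fit together, the snippet below reads `PERSONA`, falls back to `davidbot`, and loads whichever content files are present. The function names and the "skip missing files" behavior are assumptions for illustration; the bot's actual helpers may differ.

```python
import os
import tempfile
from pathlib import Path

# File names taken from the runtime content list; two of them are optional.
CONTENT_FILES = ("soul.md", "memory.md", "style_examples.md")


def persona_id(default: str = "davidbot") -> str:
    """Read the active persona from the PERSONA environment variable."""
    return os.environ.get("PERSONA", default)


def load_content(root: Path, persona: str) -> dict[str, str]:
    """Load whichever content files exist under <root>/<persona>/ (hypothetical helper)."""
    content_dir = root / persona
    return {
        name: (content_dir / name).read_text(encoding="utf-8")
        for name in CONTENT_FILES
        if (content_dir / name).is_file()
    }


# Demo against a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "bot" / "content"
    (root / "babybearbot").mkdir(parents=True)
    (root / "babybearbot" / "soul.md").write_text(
        "identity and boundaries", encoding="utf-8"
    )

    os.environ["PERSONA"] = "babybearbot"
    loaded = load_content(root, persona_id())
```

Here only `soul.md` exists, so `loaded` contains a single entry; `memory.md` and `style_examples.md` are simply absent rather than treated as errors.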
# 1. Parse corpus (optional — requires your own export under iMessageCorpusBeforeCuration/)
cd pipeline && make parse
# 2. Generate analytics and candidates
make analyst-all
# 3. Build bot indexes
cd ../bot
python3 chunker.py
python3 indexer.py --reindex --collection both
# 4. Configure and run
cp .env.example .env # fill in TELEGRAM_TOKEN, ANTHROPIC_API_KEY, CONV_FILTER as needed
python3 bot.py
# 5. Evaluate (example persona id)
python3 ../pipeline/src/evaluate.py --persona babybearbot
python3 eval_runner.py --limit 5 --delay 2
python3 eval_scorer.py

- bot/ — Runtime: Telegram bot, retriever, prompt builder, content helpers
- bot/content/<persona>/ — Content consumed by the bot at runtime
- pipeline/ — Corpus parsing, candidate generation, analytics, evaluation
- workbench/<persona>/ — Analytics DB, candidates, stats, eval (generated CSV/JSON often gitignored)
- staging/<persona>/ — Curated memory episodes
- tools/ — Utilities (gap finder, export scripts)
- archive/openclaw/ — Legacy deployment scripts (preserved, not active)
The eval harness generates test cases from the corpus, runs the bot headless, and scores outputs (style, grounding, fabrication, AI leak). See pipeline/ and bot/eval_*.py for details.
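To give a feel for one of the scoring dimensions, here is a minimal sketch of what an "AI leak" check might look like, assuming it means the bot breaking character by referring to itself as an AI. The phrase list and function names are hypothetical and are not taken from bot/eval_scorer.py.

```python
import re

# Hypothetical phrase list; the repo's real scorer may use different signals.
AI_LEAK_PATTERNS = [
    r"\bas an ai\b",
    r"\blanguage model\b",
    r"\bi(?:'m| am) (?:just )?an? (?:ai|assistant|chatbot)\b",
]


def has_ai_leak(reply: str) -> bool:
    """True if the reply contains any of the self-identifying phrases."""
    return any(re.search(p, reply, flags=re.IGNORECASE) for p in AI_LEAK_PATTERNS)


def leak_rate(replies: list[str]) -> float:
    """Fraction of replies flagged as leaking; 0.0 for an empty list."""
    if not replies:
        return 0.0
    return sum(has_ai_leak(r) for r in replies) / len(replies)


# Made-up replies, for illustration only.
replies = ["As an AI, I can't recall that.", "lol yeah see you at 6"]
rate = leak_rate(replies)
```

A keyword check like this is cheap but shallow; style, grounding, and fabrication scoring would need comparison against the corpus or a judge model.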
To set up your own persona, see PERSONA_SETUP.md.
Some files under workbench/*/eval/experiments/ may still contain verbatim chat excerpts copied from a private corpus (used as few-shot or length-rule examples). Before publishing a repo broadly, review those YAMLs or exclude them from the remote. Markdown score reports under workbench/*/eval/results/ are gitignored by default because they can embed message text.
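One way to make that review step concrete is to list every experiment YAML before pushing. The sketch below is an assumption about how you might do it, not a tool shipped in tools/; the function name is hypothetical, and it only finds files, so you still need to read them yourself.

```python
import tempfile
from pathlib import Path


def experiment_yamls(repo_root: Path) -> list[Path]:
    """List YAML files under workbench/*/eval/experiments/, sorted for stable output."""
    pattern = "workbench/*/eval/experiments/**/*"
    return sorted(
        p.relative_to(repo_root)
        for p in repo_root.glob(pattern)
        if p.is_file() and p.suffix in {".yaml", ".yml"}
    )


# Demo against a throwaway tree; point repo at your checkout in practice.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp)
    exp_dir = repo / "workbench" / "babybearbot" / "eval" / "experiments"
    exp_dir.mkdir(parents=True)
    (exp_dir / "few_shot.yaml").write_text("examples: []\n", encoding="utf-8")
    (exp_dir / "notes.txt").write_text("not yaml\n", encoding="utf-8")
    to_review = experiment_yamls(repo)
```

Run against a real checkout, the returned paths are the files to inspect or exclude from the remote.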