diff --git a/README.md b/README.md
index 89fd4a8..ffa8a59 100644
--- a/README.md
+++ b/README.md
@@ -6,6 +6,8 @@
 
 **Your self-hosted [StackChan](https://github.com/m5stack/StackChan) robot assistant — kid-safe by default, hackable by design, private by architecture.**
 
+> 🤖 **AI-assisted project.** Most of the code and nearly all of the docs in this repo were written by AI agents (primarily Claude Code) under my direction. I've been coding professionally for 15+ years; Dotty is one of a few side projects I'm using to learn the current generation of AI/LLM tooling first-hand — what the tools do well, where they break, and how to drive them. Feedback on the output very welcome.
+
 > ⚠️ **Heads up: this is not a stable project yet.** Dotty is buggy, frequently broken, and actively changing day-to-day. End-to-end behaviour works on the maintainer's hardware but regressions land all the time, the API and config surface shifts without notice, and a fresh deploy on someone else's gear has not been verified. Treat this as a hobby-grade work-in-progress, not a polished product. Bugs, PRs, and "this didn't work for me" issues all very welcome. 🍺☕ If you do try a fresh end-to-end deploy, please get in touch — I'll buy you a beer or a coffee. The best place to ask questions, get help, or show off a build is the [Dotty community Discord](https://discord.gg/7sKE5c6A).
 >
 > **Known rough edges:** face emoji rendering is missing visual differentiation for 4 of 9 emotions (sad / surprise / love / laughing); sound-direction localizer has a hardware-AEC-related left-bias on M5Stack CoreS3 (energy detection works, direction is unreliable); kid-voice ASR accuracy on SenseVoice has a kid-speech gap that whisper.cpp will close in a follow-up.
diff --git a/dotty-behaviour/README.md b/dotty-behaviour/README.md
index 399d735..8fc6931 100644
--- a/dotty-behaviour/README.md
+++ b/dotty-behaviour/README.md
@@ -35,6 +35,23 @@ ssh root@ '
 '
 ```
 
+### Vision-key env var (issue #15)
+
+Photo intents need an OpenAI-compatible API key for the VLM call. The
+compose file picks up any of these from the shell that runs
+`docker compose up`:
+
+- `VLM_API_KEY` (preferred)
+- `VISION_API_KEY` (fallback)
+- `OPENROUTER_API_KEY` (second fallback)
+
+If none are set, the container still starts, but `dispatch/vlm.py`
+returns the `VLM_OFFLINE_SENTINEL` string for every photo intent, so
+the downstream LLM is told the camera is unavailable rather than
+confabulating a description. Set the key in the host shell before
+`docker compose up`, or pass `--env-file ` pointing at a `.env` file that
+contains it.
+
 ## Why a separate container
 
 The bridge was a separate process on the RPi for the whole life of
diff --git a/dotty-behaviour/docker-compose.yml b/dotty-behaviour/docker-compose.yml
index 7b0d419..38cf3af 100644
--- a/dotty-behaviour/docker-compose.yml
+++ b/dotty-behaviour/docker-compose.yml
@@ -39,6 +39,17 @@ services:
       - DOTTY_STATE_DIR=/var/lib/dotty-behaviour/state
       - HOUSEHOLD_YAML_PATH=/var/lib/dotty-behaviour/state/household.yaml
       - GREETER_STATE_PATH=/var/lib/dotty-behaviour/state/greeter_state.json
+      # Vision-language model credentials (issue #15). Required for
+      # photo intents — without these the container falls through to
+      # the VLM_OFFLINE_SENTINEL contract in dispatch/vlm.py and the
+      # downstream LLM is told the camera is unavailable rather than
+      # confabulating a description. Resolved in fallback order:
+      # VLM_API_KEY → VISION_API_KEY → OPENROUTER_API_KEY. Set any one
+      # in your shell or in .env next to this compose file.
+      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY:-}
+      - VISION_API_KEY=${VISION_API_KEY:-}
+      - VLM_API_KEY=${VLM_API_KEY:-}
+      - AUDIO_CAPTION_API_KEY=${AUDIO_CAPTION_API_KEY:-}
     volumes:
       - /mnt/user/appdata/dotty-behaviour/state:/var/lib/dotty-behaviour/state
       - /mnt/user/appdata/dotty-behaviour/logs:/var/lib/dotty-behaviour/logs
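For reference, the key-resolution order documented in the README and the compose comments can be sketched in Python. This is a minimal illustration under stated assumptions, not the code in `dispatch/vlm.py`: the names `resolve_vlm_key` and `vlm_status` are hypothetical, and the sentinel text is a placeholder because the real `VLM_OFFLINE_SENTINEL` value isn't part of this diff. It does capture one detail worth getting right. Because the compose file uses `${VAR:-}`, every listed variable reaches the container, set to an empty string when the host leaves it unset, so an empty value has to count as missing.

```python
# Hypothetical sketch of the VLM key fallback; not the real dispatch/vlm.py.
import os
from typing import Mapping, Optional

# Fallback order documented in the README and the compose comments.
VLM_KEY_VARS = ("VLM_API_KEY", "VISION_API_KEY", "OPENROUTER_API_KEY")

# Placeholder text: the actual VLM_OFFLINE_SENTINEL value is an assumption.
VLM_OFFLINE_SENTINEL = "[camera unavailable]"


def resolve_vlm_key(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the first non-empty key in fallback order, or None.

    Compose's ${VAR:-} injects "" for unset host vars, so empty (or
    whitespace-only) values are skipped rather than treated as a key.
    """
    for name in VLM_KEY_VARS:
        value = env.get(name, "").strip()
        if value:
            return value
    return None


def vlm_status(env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: the sentinel when no key is set, else "ready"."""
    return VLM_OFFLINE_SENTINEL if resolve_vlm_key(env) is None else "ready"


if __name__ == "__main__":
    # Mimics the container env when only OPENROUTER_API_KEY is set on the host.
    demo_env = {"VLM_API_KEY": "", "VISION_API_KEY": "", "OPENROUTER_API_KEY": "sk-or-demo"}
    print(resolve_vlm_key(demo_env))  # sk-or-demo
    print(vlm_status({}))  # [camera unavailable]
```

A check such as `"VLM_API_KEY" in os.environ` would be the wrong test here: with this compose file it is always true, so an empty preferred key would shadow a real fallback key.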