UX v2 · Temperature + Operations/Tactics + Operate (bottom-cam→SPIM) suite#72

Open
pskeshu wants to merge 197 commits into development from feature/temperature-operations-all

pskeshu commented Jun 29, 2026

Collaborator

UX v2 — Temperature + Operations/Tactics + the Operate surface

This is the consolidated PR for the temperature-experiment + Operations suite and the new device-tab Operate surface. Everything lives on feature/temperature-operations-all; base retargeted to development.

It supersedes and replaces the earlier stacked series (now closed): #58 (ux2 umbrella), #71 (A), #64 (B1), #75 (B2), #65 (C), #73 (D), #68 (D2), #69 (F), #74 (G). Those increments' head branches were deleted; their work is included here.

What's in it

  • A — Temperature interface: setpoint persistence + live temperature graph.
  • B1/B2 — Manual mode: lightsheet brightfield live view + acquire-safety; dual-camera + laser-preset browser + timelapse form.
  • C — Temperature-change burst protocol (a tactic).
  • D/D2 — Operations: the agent-authored Operation Plan (tactics language) + embryo roles/strain + multi-embryo observability (the spine tactic cards: rationale/scope/structure/relations).
  • G — Tactics library: save/reuse typed tactics.
  • F — Session ↔ plan link/delink.
  • Operate view ("The Operator Spine") — a redesigned single device-tab surface for the bottom-cam → SPIM workflow: focus → mark-all (positions-only) → per-embryo center → lower SPIM (fenced F-drive) → focus → acquire, with a stepper, dish mini-map, worklist, and progressive-disclosure rail.
  • Operate → tactics/timelapse integration ("Phase C: Run"): after marking, a Run chooser hands the marked set to imaging — Manual, Adaptive timelapse, From-library, Continue-a-plan, or Hand-to-agent — every mode emitting one tactic scoped to the marked embryos. Keystone: a deterministic Tactic Executor (first caller of resolve_scope_embryos), plus roles/operation-plan/run-tactic/timelapse routes and an in-Operate run-spine.
  • Data flywheels: marking persists localization labels; manual focus logs focus-validation traces; an offline focus-validation module.

Verification

Unit tests green (tactic executor, focus validation); ruff clean; UI flows driven in-browser (hardware-free shim). An adversarial code-review pass found 8 issues — all fixed.

Rig-only (unverified here): real stage motion + acquisition, SAM on a live frame, SPIM focus, live-session timelapse start. RIG-NOTE: the bottom-cam focus axis is assumed ZStage:Z:32 (50–250 µm) — confirm on the rig.

🤖 Consolidated with Claude Opus 4.8 (1M context).

pskeshu and others added 30 commits June 16, 2026 04:18
The header pill, the home landing line, and the agent dock each computed
connection state from their own signal at their own time. home.js read
state.connected exactly once at tab init — before the /ws handshake — and
never corrected, so the landing showed "Offline — start the agent to
connect" while the header pill showed "Online" and state.connected was true.

Add a single sticky ConnectionStatus store (status-store.js) holding three
distinct signals — gentlyConnected (/ws), microscopeConnected
(/api/device-status poll), agentConnected (/ws/agent) — which replays its
current snapshot to every new subscriber, so a late subscriber can never
miss the initial state (the root of the bug). All three surfaces now read
from / write to this store:
- websocket.js onopen/onclose -> setGently (via updateGentlyStatus)
- app.js fetchDeviceStatus -> setMicroscope; header renders via subscriber
- home.js updateStatus reads the store and re-renders on every change
- agent-chat.js setConn -> setAgent
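
The replay-on-subscribe idea behind the store can be sketched as follows. The real store is JavaScript (status-store.js); this Python version is only an illustration, and the class shape, the `set` method, and the short signal keys are assumptions rather than the actual API.

```python
from typing import Callable

Snapshot = dict[str, bool]


class ConnectionStatus:
    """Sticky store: a new subscriber immediately receives the current snapshot."""

    def __init__(self) -> None:
        # Three independent signals, mirroring /ws, the device-status poll, /ws/agent.
        self._state: Snapshot = {"gently": False, "microscope": False, "agent": False}
        self._subscribers: list[Callable[[Snapshot], None]] = []

    def subscribe(self, callback: Callable[[Snapshot], None]) -> Callable[[], None]:
        self._subscribers.append(callback)
        callback(dict(self._state))  # replay, so a late subscriber never misses state

        def unsubscribe() -> None:
            if callback in self._subscribers:
                self._subscribers.remove(callback)

        return unsubscribe

    def set(self, signal: str, connected: bool) -> None:
        if signal not in self._state:
            raise KeyError(signal)
        if self._state[signal] == connected:
            return  # no change, no notification
        self._state[signal] = connected
        for callback in list(self._subscribers):
            callback(dict(self._state))
```

A surface that subscribes after the handshake still sees `gently: True` on its first callback, which is the property the original home.js read lacked.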

Verified live: after reload the home line and header pill agree (no more
"Offline while Online"); no console errors.

Bug #3 (idle event-count inflation) needs no code change: the high-frequency
telemetry (DEVICE_STATE_UPDATE/BOTTOM_CAMERA_FRAME) is already excluded from
the events table + count at websocket.js, and idle measurement showed the
count is calm and dominated by LOG_RECORD.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds UISettings.ux_v2 (env GENTLY_UX_V2, default off) and threads it into
the index.html template context from pages.py. This is the coexistence
switch for the agent-first UX: the v2 markup/JS will mount only under this
flag, so the v1 dashboard stays the default and prod is unaffected while
the migration soaks behind the flag.

No behaviour change yet (flag off by default; nothing reads it client-side
until the Phase 1 dual-render lands).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The agent's structured asks (choice_request / choice_response over /ws/agent)
already ARE the one-payload protocol. This makes the SAME ask render both in
the chat transcript and prominently on a new main-stage surface (#ask-stage),
behind GENTLY_UX_V2 — the foundation for the agent-first paradigm.

Frontend only; no wire-protocol or backend change. The double-answer and
turn-wedge concerns are handled without touching the server:
- agent-chat.js renderChoice is factored into a pure buildAskCard(data, {reqId,
  isWake, hasControl, onPick}) reused by both surfaces; exported alongside
  answerChoice + a hasControl getter.
- A module-level answeredAsks Set keyed by request_id makes answering idempotent
  across both surfaces (only ONE choice_response is ever sent), so the existing
  holder-gate + _choice_futures.pop on the server stay correct and never see a
  duplicate.
- The CLEAR signal fires off the CHOICE lifecycle: answerChoice emits
  ASK_CLEARED{request_id} the instant a response is sent (NOT stream_end, which
  lands after the answer for in-turn asks and never for a cancelled turn). Both
  surfaces clear on it; '*' clears all on cancel/error/socket-drop.
- Read-only when !hasControl on both surfaces (observers can't answer), matching
  the server's holder gate — no dismiss-without-answer path, so asend can't wedge.
- Adds the free-text "Something else…" escape the web cards lacked (the bridge
  routes unknown selections to LLM resolution).
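
The idempotent-answer guard described above can be sketched in a few lines. This is a Python illustration of logic that lives in agent-chat.js; the `AskRegistry` class, the `send` callable, and the `selection` field are hypothetical names, while `choice_response` and `request_id` come from the description.

```python
from typing import Callable


class AskRegistry:
    """Ensures at most one choice_response is sent per request_id."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send
        self._answered: set[str] = set()

    def answer(self, request_id: str, selection: str) -> bool:
        # Both surfaces (chat transcript and ask stage) call this; only the first wins.
        if request_id in self._answered:
            return False
        self._answered.add(request_id)
        self._send(
            {"type": "choice_response", "request_id": request_id, "selection": selection}
        )
        return True
```

Because the guard sits on the sending side, the server never has to handle a duplicate response.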

ask-stage.js (new) renders the current ask into #ask-stage via AgentChat.buildAskCard
and clears on ASK_CLEARED; no-ops unless #ask-stage is present (flag off → v1
untouched). Verified: node --check on all touched JS, Jinja parse on index.html.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Replaces the flat 8-tab bar with a calm grouped left rail (Now / Library /
System) and adds a session-context strip at the top of the main area — the
structural transformation toward the prototype's shell. All scoped under
body.ux-v2; v1 markup/CSS untouched (no consolidation of the known duplicate
.tab rulesets — deferred to the final cleanup phase).

- shell.js (new): wires each rail item to switchTab(tabId) — it ROUTES through
  the single init chokepoint, never reimplements tab activation, so every tab's
  lazy-init side-effect still fires. Keeps the rail's active state in sync via a
  new TAB_CHANGED event; populates the strip's status/embryo count from the
  Phase 0 ConnectionStatus store. Wires the rail's "Talk to Gently" to the
  existing AgentChat dock. No-ops unless body.ux-v2 is present.
- app.js: switchTab now emits TAB_CHANGED(tabName) — additive; v1 has no
  listener, so no behaviour change.
- index.html: the rail (first child of the flex-row .app-shell) + the strip
  (top of .app-main) + shell.css/shell.js includes, all under {% if ux_v2 %}.
- shell.css (new): rail + strip styling and a subtle unfold animation, every
  rule scoped under body.ux-v2.

Deferred to keep this phase low-risk: History-API routing and the
session_changed in-place re-hydration (current hash routing + reload still
work). Verified: node --check on all touched JS, Jinja parse on index.html.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…nce)

Flips plan mode from ASK-FIRST to INFERENCE-FIRST: the agent arrives with a
draft instead of interrogating the researcher. Per Keshu's call, the genotype
-> imaging-channel inference is done by the MODEL (it reads the reporters and
knows fluorophore spectra) — NOT a hardcoded heuristic table, which would only
cover a fraction of real fluorophores/dyes and force needless "asks".

- prompt.py: the stance now says infer what you can (channels from the strain
  genotype via your own fluorophore knowledge, organism defaults, lab/campaign
  context), record each inferred value's source + confidence in the spec's
  provenance, state a wavelength only when confident (else mark low-confidence
  and confirm via ask_user_choice — never guess a number), and ask ONLY for
  genuine gaps / low-confidence / consequential choices.
- model.py: ImagingSpec gains a `provenance` map (field -> {source, confidence}).
  It's a valid dataclass field, so it flows end-to-end with no extra plumbing:
  the model passes it in create_plan_item(spec=...), the store rebuilds it via
  ImagingSpec(**valid-fields), and it round-trips through serialization.
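
A minimal sketch of why a plain dataclass field round-trips without extra plumbing. Only `provenance` and `laser_wavelength_nm` are named in this PR; the other field and the `spec_from_dict` helper are assumptions for illustration, not the real model.py.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Optional


@dataclass
class ImagingSpec:
    laser_wavelength_nm: Optional[int] = None
    reporter: Optional[str] = None  # hypothetical field for the example
    # field name -> {"source": ..., "confidence": ...}
    provenance: dict[str, dict[str, str]] = field(default_factory=dict)


def spec_from_dict(raw: dict) -> ImagingSpec:
    """Rebuild a spec from stored data, keeping only keys that are real fields."""
    valid = {f.name for f in fields(ImagingSpec)}
    return ImagingSpec(**{k: v for k, v in raw.items() if k in valid})


spec = ImagingSpec(
    laser_wavelength_nm=561,
    provenance={"laser_wavelength_nm": {"source": "inferred", "confidence": "medium"}},
)
restored = spec_from_dict(asdict(spec))
```

Here `restored` equals `spec`, and unknown keys in stored data are dropped instead of raising.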

Backend only; needs a server restart to load the new prompt. The actual
inference behaviour is validated live in plan mode (see handoff).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…in the UI

Makes Phase 3a's inference visible. renderSpec now shows the channel
(laser_wavelength_nm) and reporter/genotype rows — which it omitted before —
and tags each value with its provenance ("561 nm · inferred · medium") read
from spec.provenance, so the researcher can see what was inferred vs. cited and
what to confirm.

- agent-chat.js renderSpec: keyed rows (label, value, fieldKey) with a small
  source/confidence tag per row when spec.provenance carries that field; adds
  Genotype/Reporter/Channel rows.
- bridge.py: the spec payload builder now includes genotype/reporter/
  laser_wavelength_nm and the provenance map (was a curated subset that dropped
  the channel), so the UI has the data to render.
- ask-stage.css: styling for the .ac-spec-src provenance tag.

Frontend + a contained backend payload enrichment; node --check + bridge import
verified. Note: this surfaces provenance wherever the spec panel is shown and in
the plan document (which already serializes provenance); threading it through
every spec-emission path (e.g. the apply_plan_acquisition_spec stash) and a full
plan_confirm ask with inline edit/confirm is the remaining 3b polish.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…d, live)

Renders the agent's expectations (beliefs), watchpoints (attention), and open
questions (uncertainty) as a calm panel on the landing, updating live and
resolvable by the control holder.

The "store has no event bus" blocker is solved via the EXISTING global bus
rather than dependency injection (matches agent.py's `emit` usage, so no
__init__/launch_gently changes):
- core/event_bus.py: new EventType.CONTEXT_UPDATED.
- file_store.py: FileContextStore._notify_context_change() emits it (lazy import,
  best-effort) from add/resolve of expectations, watchpoints, and questions. The
  server already broadcasts ALL bus events to /ws (subscribe_async("*")), and
  websocket.js re-emits them on ClientEventBus — so the surface refreshes live
  with zero new transport. Verified: the emit fires on the bus (unit check).
- routes/context.py (new, registered): GET /api/context (read the 3 lenses,
  defensive on cold start) + POST .../{id}/resolve for questions/watchpoints/
  expectations, each gated by Depends(require_control) (data.py pattern, NOT the
  mesh-scoped campaigns auth) so viewers can't mutate the agent's mind. Read side
  reuses campaigns._serialize.
- context-surface.js (new, ux_v2 only): fetches /api/context, renders the three
  lenses, re-fetches on CONTEXT_UPDATED + AGENT_CONTROL, and lets the holder
  answer a question (inline input — no native prompt), resolve a watchpoint, or
  confirm an expectation. shell.css: scoped styling. index.html: panel mounted at
  the top of the home landing under {% if ux_v2 %}.

Backend needs a server restart to load; live push + render is validated in-app.
Proactive #ask-stage cards from watchpoint creation are the remaining Phase-4
polish (noted).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The context surface hid itself entirely when the agent had no expectations/
watchpoints/questions — so on a fresh session it was invisible and read as
"missing". Render a calm "nothing yet" empty-state instead, so the surface is
discoverable before the agent has formed any beliefs. Static JS/CSS only —
hard-reload to pick it up, no restart.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
When the viz server can't bind its port, the error now tells you how to free
it (fuser -k <port>/tcp, or lsof -ti | xargs kill) instead of just "close it
first" — uses self.port so it's always the right port. (Standalone DX fix.)

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… data

The Experiment view's job is to show the live experimental TACTIC patterns
(cadence: base/fast/burst/cooldown + reactive-monitoring rules). It was
falling back to a ~130-line STUB_STRATEGY (and a "mockup · stubbed data" badge)
whenever there was no active experiment or the fetch wasn't ready — i.e.
production could render fake tactics.

- Removed the STUB_STRATEGY const entirely.
- loadStrategy() now returns null on non-OK / error (no stub fallback).
- render(null) shows a calm empty state ("No active experiment — the imaging
  tactics will appear here once a run is live"), never fabricated data.
- Removed both "mockup · stubbed data" badges; the header just shows "live".

Affects v1 and v2 (the Experiment tab isn't flag-gated) — removing fake data
is correct for both; it only changes the no-active-experiment case (stub →
empty state), real live runs render as before. node --check clean.

NOT done (deliberately): the large carve of a new per-embryo renderer out of
the 4,556-line embryos.js — the existing ExperimentOverview already renders the
tactic patterns from /api/experiments/current/strategy, and the detailed
contents are yours to define. The reconcileWithServerState/clearAllState
contract in embryos.js is left untouched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
v2 is now the default UI — no env var needed. The v1 dashboard stays reachable
as a fallback via GENTLY_UX_V2=0 (and its markup is NOT deleted yet). This is
the reversible "flip → soak" step; the irreversible v1 markup/CSS deletion is
deferred until v2 has run as the default and is confirmed good.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add the entry landing the prototype sketched — agent orb, time-aware greeting, and choice cards (Plan / quick look / free-text escape), behind GENTLY_UX_V2 and receding into the workspace.

Crucially, the plan dialogue renders IN the landing, not the chat REPL: 'Plan an experiment' switches to an in-place plan-wizard screen, enters /plan, and renders the agent's ask_user_choice questions as button cards there (reusing AgentChat.buildAskCard), with the plan assembling from each pick. agent-chat.js: runCommand is now connection-aware (connect + queue/flush on open) so the page drives the agent without opening the chat panel.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Self-contained landing prototype (ux-prototype/landing.html) and the 8-phase migration plan it was built from. A sketch space, kept separate from the live frontend.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Convert hatching.py and the verifier's four challenger strategies from JSON-in-prose / startswith-scraping with silent defaults to forced tool_choice — the verdict arrives as a validated dict on the tool_use block. Deletes the regex/parse layer; downstream vote-tally/consensus is untouched.

Also drop self-rated confidence from these schemas (a heuristics-era artifact — the boolean/categorical judgment is the signal); the ensemble's derived agreement ratio stays. docs/HEURISTICS-AUDIT.md ranks the remaining candidates and the keep-deterministic boundary.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…ight bind

uvicorn binds with SO_REUSEADDR; the preflight check did not, making it stricter than the server it guards — a just-exited instance leaves client sockets in TIME_WAIT that fail a bare bind() even though uvicorn would bind fine, so quick restarts hit 'port in use' repeatedly. Set SO_REUSEADDR on the preflight so it fails only on a genuine live listener.
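
A minimal sketch of a preflight probe that matches the server's bind semantics. The function name and signature are hypothetical; the socket calls are standard library. The behaviour described is for Linux; `SO_REUSEADDR` has different semantics on Windows.

```python
import socket


def port_is_free(host: str, port: int) -> bool:
    """Return False only when the address cannot be bound with SO_REUSEADDR set."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        # Same option the server uses, so leftover TIME_WAIT sockets do not
        # make the probe stricter than the real bind.
        probe.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            probe.bind((host, port))
        except OSError:
            return False
    return True
```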

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
On Linux the Windows default GENTLY_STORAGE_PATH (D:\Gently3) gets created literally as ./D:/ under the repo, full of logs/sessions. Ignore it so it stops cluttering status and can't be committed by accident.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Polish + flow fixes for the agent-first landing and in-page plan wizard.

- Kill the welcome→plan "lift-up" lurch: top-anchor both screens to one
  shared offset so the orb no longer teleports ~140px on swap (now ~1px),
  with a single coordinated cross-fade keyframe.
- Fix broken dark mode: define the tokens landing.css relied on but the
  theme never set (--bg, --text-secondary, --accent-soft, --accent-green-soft),
  scoped to body.ux-v2 with a light override; route the page background, drift
  glow, and accent-keyed shadows through real per-theme tokens.
- A11y + polish: visible :focus-visible rings, animated tool-card reveal
  (grid-rows), feed scrolls internally with anchored header/footer, single-
  column mobile without a nested-scroll trap, consistent type scale + 4px
  spacing, ~40px touch targets, aria-expanded on disclosures, expanded
  prefers-reduced-motion coverage.
- Entry flow: under ux_v2 the landing owns session entry, so suppress the
  legacy connect-time resolution picker server-side (it duplicated and
  contradicted the landing's Plan/Standalone choice). Guard the design
  kickoff so it fires once per session (no Back/forward pile-up).
- "Plan an experiment" now offers continue-vs-fresh when an active campaign
  exists: continue it (default) or start a brand-new campaign from scratch.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
agent-chat.js mirrors the agent stream onto a new AGENT_ACTIVITY event (turn/thinking/text/tool_start/tool_result/turn_end/error) so the plan wizard can render the agent's work as collapsible tool cards instead of leaving it in the chat. Replaces the inline-only mdToHtml with a block-aware, escape-first GFM renderer (headings, pipe tables, lists, fenced code, links — XSS-safe and streaming-safe) and exports it; agent-chat.css styles the ac-md-* output for both the chat and the wizard feed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…allback

settings.py tiers: main→claude-fable-5, perception+medium→claude-opus-4-8, fast→claude-sonnet-4-6, plus a refusal_fallback (claude-opus-4-8).

Strip the params the new models reject: thinking budget_tokens → output_config.effort (conversation.py, sam_detection.py); drop the obsolete interleaved-thinking beta header (agent.py). conversation.py: a main-tier 400 (e.g. Fable 5 under <30-day org data retention) OR a stop_reason='refusal' transparently retries the turn on Opus 4.8 in both the streaming and non-streaming paths; get_tool_call guards empty refusal content; dopaminergic detector guarded. chat.py model centralized to settings.

Quiet log noise: the per-response diagnostic WARNING→DEBUG (it fired on every tool-use turn), and the benign send-after-close websocket WARNING→DEBUG.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Rebasing feature/ux-v2 onto development (now carrying #47's ruff/format
gate) left the model-migration and detector-rewrite files in their
pre-#47 formatting. Run ruff --fix + ruff format and fix the violations
that aren't auto-fixable:

- model.py: provenance annotation used unimported `Dict` (F821, would
  NameError at import) -> `dict[str, dict[str, str]]` (PEP 585).
- sam_detection._detect_with_sam: returned undefined `image_8bit`
  (F821) -> `image_rgb`, the 8-bit RGB image computed at the top.
- agent.py / sam_detection.py: moved the `logger = ...` assignment below
  the import block to clear E402 (import execution order unchanged).
- verifier.py / conversation.py: wrapped long log/summary f-strings and
  tool-schema descriptions; reflowed prompt prose (content preserved) to
  satisfy E501.

ruff check . and ruff format --check . both pass (lint.yml CI gate).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Fable 5 was declining benign planning turns (stop_reason="refusal"),
which tripped the refusal fallback on essentially every turn — adding a
full extra round-trip of latency per message. Point MODEL_MAIN at Opus
4.8 so the common path is a single call; refusal_fallback is now inert
(the guard skips it when fallback == main). Set MODEL_MAIN=claude-fable-5
to switch back once Fable 5 stops refusing. Prune the now-stale Fable-5
notes from the tier docstring.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Stream a bounded result_full alongside the 140-char result_summary so the
web UI's expandable tool card can show what a tool actually returned, not
just the one-line preview. Frontend (landing.js/css) renders the expandable
card; the thinking indicator's label is wrapped in a span so it can be
updated live.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Live digital-twin of the addressable imaging volume: the acquisition cuboid
+ light-sheet plane, driven by a new SCAN_GEOMETRY_UPDATE backend signal
(emitted from acquire_volume, bootstrapped via /api/devices/scan_geometry)
and live DEVICE_STATE_UPDATE positions. Sits as the Map / Details / 3D view
switcher inside the Devices tab (not a top-level nav tab). Includes a demo
driver for offline development.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Flow/IA audit (not visual) from a live click-audit with the agent on, plus
code cross-check. Prioritized findings + fixes: P0 loading-state legibility
(stream thinking summary — set thinking display:summarized + handle
thinking_delta), first-character truncation in the plan feed, control/auth
wall hidden in chat, double ask-mount, ASK_CLEARED never emitted, no path
back to welcome, non-stateful routing, resume hard-reload, and the
workspace-IA placement of the 3D optical-space view.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The streaming path collected the entire turn before yielding anything, so the
plan wizard sat on a static "working…" spinner for the whole turn (~90s). And
the streamed call requested no thinking, so there was no reasoning to show even
if it had streamed.

- conversation.py: rewrite call_claude_stream to stream live — a worker thread
  drains the SDK stream and pushes events onto an asyncio queue as they arrive;
  the coroutine yields text/thinking deltas in real time. Enable adaptive
  thinking with display="summarized" (+ effort=medium) and emit thinking_delta
  as {"type":"thinking"} chunks. Full assistant content (incl. thinking blocks)
  is still replayed from final_message, so the tool loop stays valid. Retry /
  400-fallback / refusal-fallback preserved (clean while nothing's been yielded).
- agent-chat.js: forward the thinking text on the 'thinking' activity.
- landing.js: render streamed reasoning as a dim block in the plan feed and add
  an elapsed-time counter to the thinking indicator so a long think reads as
  progress, not a hang.
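
The worker-thread-to-queue pattern can be sketched like this. It is an illustration under stated assumptions, not the real call_claude_stream: `stream_live` and `make_events` are hypothetical names, and the blocking SDK stream is stood in for by any plain iterable of event dicts.

```python
import asyncio
import threading
from typing import AsyncIterator, Callable, Iterable

_DONE = object()  # sentinel marking the end of the stream


async def stream_live(make_events: Callable[[], Iterable[dict]]) -> AsyncIterator[dict]:
    """Yield events as a blocking producer emits them, instead of after it finishes."""
    loop = asyncio.get_running_loop()
    queue: asyncio.Queue = asyncio.Queue()

    def worker() -> None:
        try:
            for event in make_events():  # blocking iteration, off the event loop
                loop.call_soon_threadsafe(queue.put_nowait, event)
        except Exception as exc:  # hand the failure to the consumer
            loop.call_soon_threadsafe(queue.put_nowait, exc)
        finally:
            loop.call_soon_threadsafe(queue.put_nowait, _DONE)

    threading.Thread(target=worker, daemon=True).start()
    while True:
        item = await queue.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        yield item
```

`asyncio.Queue` is not thread-safe, which is why the worker schedules each put with `call_soon_threadsafe` rather than calling `put_nowait` directly.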

Verified live (Opus 4.8): reasoning + prose stream into the plan feed during the
turn with a ticking timer; no console errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… strings

The model often serializes nested tool args (spec, references) as JSON strings
instead of objects. create_plan_item stored the raw string, and read-back via
_dict_to_plan_item did spec_data.items() on a str → AttributeError, leaving a
malformed, unreadable plan item persisted.

- planning.py: _coerce_plan_args() parses string spec/references (and int-casts
  estimated_days) in both create_plan_item and update_plan_item before storing.
- file_store.py: _dict_to_plan_item tolerates spec/references persisted as JSON
  strings (parse on read; fall back cleanly on garbage), so existing bad items
  load instead of crashing.

Verified: string spec/refs hydrate to ImagingSpec/list; malformed spec → None
(no crash).
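
A minimal sketch of the coercion step, assuming the argument names from this message (`spec`, `references`, `estimated_days`). The function body is illustrative rather than the real `_coerce_plan_args`; in particular, dropping an unparseable `estimated_days` to `None` is an assumption.

```python
import json
from typing import Any

# Expected container type for each nested argument.
_EXPECTED = {"spec": dict, "references": list}


def coerce_plan_args(args: dict[str, Any]) -> dict[str, Any]:
    """Parse nested args the model sent as JSON strings; malformed values become None."""
    out = dict(args)
    for key, expected in _EXPECTED.items():
        if key not in out:
            continue
        value = out[key]
        if isinstance(value, str):
            try:
                value = json.loads(value)
            except json.JSONDecodeError:
                value = None
        if value is not None and not isinstance(value, expected):
            value = None  # valid JSON but the wrong shape
        out[key] = value
    if out.get("estimated_days") is not None:
        try:
            out["estimated_days"] = int(out["estimated_days"])
        except (TypeError, ValueError):
            out["estimated_days"] = None  # assumption: drop rather than raise
    return out
```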

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rompt

The plan-mode prompt specified what to design in detail but nothing about how to
write to the user, so (with Opus 4.8's stronger narration) the agent produced
dense questions, paragraph-long ask_user_choice options, and over-explained
prose — cognitively heavy for a working biologist. Add an explicit
communication-style section: lead with the ask/finding, short questions + short
options, plain words over process jargon, one-clause rationale (full reasoning
goes to provenance/references), one idea per message.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
A turn's tool calls were always awaited one-by-one, so several independent
lookups (e.g. search_strains for dlg-1 then ajm-1) ran serially even when the
model could issue them together. Add a concurrency fast-path: when EVERY tool in
a turn is non-hardware (requires_microscope=False) and non-interactive (not
ask_user_choice), fire their tool_start events, asyncio.gather the executions,
then emit results. Any microscope action or ask_user_choice in the batch falls
back to the existing serial path, so hardware is never raced and interactive
prompts/stateful ordering are preserved.
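
The gating rule can be sketched as below. `ToolCall` and `execute_turn` are hypothetical stand-ins for the real tool loop, and the sketch omits the tool_start/result events; only `requires_microscope` and `ask_user_choice` come from this message.

```python
import asyncio
from dataclasses import dataclass
from typing import Any, Awaitable, Callable


@dataclass
class ToolCall:
    name: str
    run: Callable[[], Awaitable[Any]]
    requires_microscope: bool = False


def can_run_concurrently(calls: list[ToolCall]) -> bool:
    """True only when every call is non-hardware and non-interactive."""
    return all(
        not call.requires_microscope and call.name != "ask_user_choice"
        for call in calls
    )


async def execute_turn(calls: list[ToolCall]) -> list[Any]:
    if len(calls) > 1 and can_run_concurrently(calls):
        # Fast path: independent lookups overlap; results keep call order.
        return list(await asyncio.gather(*(call.run() for call in calls)))
    results = []
    for call in calls:  # serial path: hardware is never raced
        results.append(await call.run())
    return results
```

A single hardware or interactive call anywhere in the batch sends the whole batch down the serial path, which keeps the rule easy to reason about.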

Also nudge the plan-mode prompt to batch independent lookups into one turn so the
model actually produces parallelizable tool calls.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
phase_number arrived as "1" (stringified, like spec/references), so
get_nth_subcampaign did `1 <= "1"` → TypeError. Coerce phase_number/phase_order
to int in the create tool, and make get_nth_subcampaign tolerant of a numeric
string.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
pskeshu and others added 8 commits June 30, 2026 19:21
- Device layer: _log_focus_trace appends (t, source, focus_score, bottom_z,
  fdrive, piezo) to {session}/focus_traces.jsonl on every broadcast frame —
  the operator's manual focusing is captured passively (no autonomous Z moves)
  as autofocus-validation data. Best-effort, off the broadcast hot path.
- gently/analysis/focus_validation.py: offline replay — load traces, segment
  sweeps, compare each focus metric's argmax to the human's resting Z, report
  error stats + interior-peak/contrast quality (a flat curve = no safe autofocus).
- 8 unit tests (synthetic Gaussian sweeps) — all green.
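
The core comparison in the offline replay can be sketched as follows. This is not the real focus_validation.py: the function names, the `(z, focus_score)` tuple shape, and the contrast formula are assumptions, and samples are assumed to be sorted by Z.

```python
import math


def gaussian_sweep(peak_z: float, sigma: float, z_values: list[float]) -> list[tuple[float, float]]:
    """Synthetic sweep: a focus score that peaks at peak_z."""
    return [(z, math.exp(-(((z - peak_z) / sigma) ** 2) / 2)) for z in z_values]


def sweep_error(samples: list[tuple[float, float]], resting_z: float) -> dict:
    """Compare the metric's argmax Z with the Z where the operator stopped."""
    best_i = max(range(len(samples)), key=lambda i: samples[i][1])
    scores = [score for _, score in samples]
    top, bottom = max(scores), min(scores)
    return {
        "argmax_z": samples[best_i][0],
        "error": samples[best_i][0] - resting_z,
        # A peak at either end of the sweep means the true optimum may lie outside it.
        "interior_peak": 0 < best_i < len(samples) - 1,
        # Near 0 for a flat curve, which would make autofocus unsafe.
        "contrast": (top - bottom) / top if top > 0 else 0.0,
    }


z_values = [float(z) for z in range(50, 251, 10)]
report = sweep_error(gaussian_sweep(150.0, 20.0, z_values), resting_z=152.0)
```

For this sweep the metric peaks at 150 while the operator rested at 152, so `report["error"]` is `-2.0` with an interior peak.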

Earns autofocus safety offline before any Z move near the objective.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e workflow

Replaces the three always-on ~340px cards (cluttered, no visible logic) with a
single height-filling four-region surface, from an expert-driven redesign
(diSPIM-ops + microscopy-UX + C. elegans-bio + critique → 3 candidates → synthesis):

- Thin HEADER: a real phase stepper (Survey: Focus→Mark │ Acquire: Center→Lower→
  Focus→Acquire) giving 'you are here', + an always-on safety status strip
  (HEAD/floor/LED/LASER/live-cam) that persists even when controls are hidden.
- LEFT SPINE: a read-only dish mini-map (state-colored pins + stage crosshair) +
  the embryo worklist board (per-row 4-node progress track), the loop's spine.
- CENTER: ONE live viewport whose camera source swaps per step, with decision
  instruments overlaid (markers / centre reticle+FOV box / floor gauge).
- RIGHT RAIL: renders ONLY the active step's controls (progressive disclosure);
  a single renderStep() drives header+spine+viewport+rail from one state so they
  can't disagree. Exactly one camera + one step live at a time.

Bug fixes from the critique: (1) 'focused' is earned at the SPIM-focus step, not
on a stray F-drive nudge (new 'lowering' state); (2) LED force-closed on
step-leave and view-leave (no more leak); (3) down-nudges auto-grey near the
floor and XY centering is blocked while the head is lowered.

Pure IA/reveal/gating over existing endpoints/streams/SSOT — no backend change.
Verified end-to-end in-browser (focus→mark→confirm→per-embryo center/lower/focus/
acquire→retract); stepper, status, board, and mini-map all track in lockstep.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ves only in Operate

Finishes the migration: the Map view keeps its read-only embryo dots; the
Detect / mark / per-embryo Center panel (an interim home) is removed now that the
Operate view owns that workflow. Drops the detect panel markup, its OperateManager-
duplicating JS in devices.js (renderEmbryoListPanel/runDetection/centerOnEmbryo/
removeEmbryo/setupDetectWiring), and the orphaned CSS; keeps the top-right rail +
XY readout.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Expert Opus workflow study + synthesis. Decisions: build all 3 phases; adaptive
default monitoring = idle; live run monitored in-Operate (rail flips to run-spine).
Tactics are the unifying object; new keystone = a deterministic Tactic Executor
(first caller of resolve_scope_embryos).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
After Confirm, Operate no longer dead-ends into the manual loop; the stepper
advances to a new ③ Run node and the rail shows a Run chooser. Tactics are the
unifying object — every run mode targets the marked set.

- Stepper: ① Focus → ② Mark → ③ Run; renderStep() drives a new c0 (chooser) and
  running (run-spine) state alongside a1/a2/b1-b5.
- Role chip strip (marked default → subject; flip to reference) → POST
  /api/embryos/roles (new thin route: sets EmbryoState.role + fires EMBRYOS_UPDATE).
  Load-bearing: expression_monitoring scopes to role=='test'.
- Mode A Manual → the existing per-embryo loop. Mode B Adaptive timelapse
  (interval/stop/monitor, default idle) → POST /api/devices/timelapse/start with
  embryo_ids=[subjects]. Mode C3 Hand-to-agent → AgentChat with the roster.
  Library/plan modes stubbed (wired in Phase 3).
- Live run-spine in the rail (reads GET /api/operation_plan, falls back to a
  summary card) + Pause/Stop/Resume (new /api/devices/timelapse/stop|pause|resume
  routes) + Open-in-Operations.

Verified in-browser: mark→confirm→chooser→roles→adaptive start→run-spine, and
Manual→b-loop, survey→a1. Backend timelapse start is rig/session-gated (503 w/o
orchestrator); UI flow + routes verified.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- /api/devices/timelapse/start now seeds the session Operation Plan with a
  standing_timelapse tactic (+ a reactive_monitor layered_on when a monitoring
  mode is active) scoped to the marked set, transitioned active — closing the
  'UI timelapses skip plan linking' TODO. Best-effort (needs a live session +
  context store); never blocks the start. So the Operate run-spine + Operations
  tab now show a real tactic for a UI-started timelapse.
- start_adaptive_timelapse gains tactic_id (transitions the tactic active on
  success) — lifecycle symmetry with stop/pause/enable_monitoring_mode/queue_burst.

Note: no tactic-schema change needed — _validate_tactics copies structure
verbatim, so cadence_s/stop_condition/monitoring_mode/interval are already
allowed. Seeding is rig/session-verified (no session in the hardware-free shim).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… modes

Keystone: gently/app/orchestration/tactic_executor.py — execute_tactic(agent,
tactic) resolves scope (the FIRST caller of resolve_scope_embryos) and dispatches
by kind to the orchestrator (standing_timelapse[+monitoring] / reactive_monitor /
exclusive_burst / oneshot), then marks the tactic active. One kind→action map the
agent and the UI both go through. 8 unit tests (mocked orchestrator/roster), all green.
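
A minimal sketch of the single kind→action map described above, not the code in tactic_executor.py. Assumptions: the tactic is a plain dict, the orchestrator method names (start_timelapse, enable_monitoring, queue_burst, acquire_once) are hypothetical, resolve_scope_embryos is replaced by a simplified stand-in, and the function takes an orchestrator and roster directly instead of the real execute_tactic(agent, tactic) signature.

```python
from typing import Any, Dict, List


def resolve_scope_sketch(scope: Dict[str, Any], roster: List[Dict[str, Any]]) -> List[str]:
    """Hypothetical stand-in for resolve_scope_embryos: scope -> embryo ids."""
    if scope.get("mode") == "global":
        return [e["id"] for e in roster]
    wanted = set(scope.get("embryos", []))
    return [e["id"] for e in roster if e["id"] in wanted]


class FakeOrchestrator:
    """Records calls instead of driving hardware (illustrative only)."""

    def __init__(self) -> None:
        self.calls: List[tuple] = []

    def start_timelapse(self, ids, structure):
        self.calls.append(("timelapse", tuple(ids)))

    def enable_monitoring(self, ids, structure):
        self.calls.append(("monitor", tuple(ids)))

    def queue_burst(self, ids, structure):
        self.calls.append(("burst", tuple(ids)))

    def acquire_once(self, ids, structure):
        self.calls.append(("oneshot", tuple(ids)))


def execute_tactic_sketch(orchestrator, roster, tactic: Dict[str, Any]) -> List[str]:
    """Resolve scope, dispatch by kind, then mark the tactic active."""
    # One map shared by every caller, so agent and UI cannot drift apart.
    actions = {
        "standing_timelapse": orchestrator.start_timelapse,
        "reactive_monitor": orchestrator.enable_monitoring,
        "exclusive_burst": orchestrator.queue_burst,
        "oneshot": orchestrator.acquire_once,
    }
    kind = tactic.get("kind")
    if kind not in actions:
        raise ValueError(f"unsupported tactic kind: {kind!r}")
    ids = resolve_scope_sketch(tactic.get("scope", {}), roster)
    actions[kind](ids, tactic.get("structure", {}))
    # Only reached when the dispatch did not raise.
    tactic["status"] = "active"
    return ids
```

Keeping the map in one function means an unknown kind fails loudly in one place, and the tactic is only marked active after the orchestrator call returns.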

- POST /api/operate/run-tactic — append a tactic (or instantiate a saved one via
  library_id) re-scoped to the marked set, then execute it via the executor.
- Operate Run modes wired: From-library → run-tactic(library_id); Manual → fires a
  cosmetic oneshot so the sweep shows on the spine; Continue-a-plan & Hand-to-agent
  → AgentChat hand-off (the agent owns plan resolution + composed tactics).
- Run-spine now renders the real Operation-Plan tactic cards (GET /api/operation_plan).

Verified in-browser: chooser→library select→run-tactic→running run-spine shows the
tactic; run-tactic route 200 end-to-end in shim. resolve_scope_embryos is no longer
orphaned. (No tactic-schema change needed — structure is free-form.)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adversarial review (Opus workflow, 8/8 confirmed) → all fixed:
- HIGH: Run chooser was a one-time gate (unreachable after a manual run or
  Survey/mark-more). The 'Run' stepper node is now a clickable re-entry to the
  chooser; finishing a manual sweep returns to the chooser, not Focus.
- MED: timelapse/start seeded scope embryos:[] when embryo_ids omitted (run images
  ALL) → now records {mode:'global'} so the plan matches the run.
- MED: B1 (Center) showed a frozen frame — the chooser stops all cameras and B1
  had no Start button. Bottom cam now auto-starts at B1 (live centering feedback).
- MED: adaptive 'after N timepoints' degraded to manual (never stops) — now sends
  the combined 'timepoints:N' / 'duration:Nh' form the parser understands.
- MED: stop/pause/resume never reconciled the seeded tactic + every Start appended
  a new active one → accumulating stale/duplicate active timelapses. Now: seeded
  tactics are tagged + linked to the run; stop→done, pause→paused, resume→active;
  prior active operate tactics are retired on a new Start; skip seeding when start
  was a no-op ('already running').
- LOW: run-tactic 500 on a malformed tactic → 400 + append_tactic_to_plan defaults
  kind='custom'.
- LOW: run-spine fallback hard-coded the adaptive shape ('undefineds' for a library
  run) → branches on mode, guards the interval interpolation.
- LOW: manual mode discarded chooser role toggles → applyRoles() now runs for every
  mode (moved to the top of startRun).

Verified in-browser (chooser re-entry, 400 path) + 16 unit tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@pskeshu changed the title from "★ Unified · Temperature-experiment + Operations suite (A→B2 combined)" to "UX v2 · Temperature + Operations/Tactics + Operate (bottom-cam→SPIM) suite" on Jul 1, 2026
@pskeshu changed the base branch from integration/ux2-all to development on July 1, 2026 01:53
…-all' into feature/temperature-operations-all
pskeshu and others added 9 commits July 1, 2026 08:09
… visibility

Opus audit: settings panel is 100% localStorage display prefs; the thermalizer
connection (serial/MQTT/mock) is YAML-only + device-layer-restart. Design: a
server-backed Hardware/Thermalizer section (Test + live hot-swap w/ restart
fallback, sidecar persistence, secrets redacted), Mock dev-only, + a read-only
effective-config viewer. Decisions: full build; live-swap w/ restart fallback;
Mock dev-only.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ewer

The Settings panel was 100% localStorage display prefs. Adds a server-backed
'Hardware / Thermalizer' section to view + edit the controller connection, and a
read-only effective-config viewer.

Device layer (aiohttp): GET /api/temperature/config (redacted), POST
/api/temperature/config/test (transient probe, non-committing), POST
/api/temperature/config (live hot-swap: 409 if RunEngine running / ramp lock held,
build-new-before-swap, close old, sidecar-persist; restart-required fallback on
connect failure). config.local.yml sidecar merged over config.yml at boot (keeps
comments; 0600 perms for the MQTT password). Validator + password redaction.

Client: get/set/test_temperature_config. Viz proxies (require_control on
write/test) + GET /api/config/effective (settings.py + config.yml, secrets shown
as present/absent booleans; restart-required note).

UI: Hardware nav group; Thermalizer section (Serial/MQTT; Mock dev-only via
?dev=1) with Test + Apply (explicit, no auto-save), applied-live vs
restart-required banner; a separate ThermalizerSettings JS module isolated from
the localStorage SettingsManager; read-only Effective-config viewer. Relabeled the
Vitals 'Temperature model' → 'Developmental-timing reference' to kill the naming
trap.

Verified in-browser (render, serial↔MQTT toggle, Mock-hidden, Test graceful 502,
effective-config populated). Rig-only: real serial/MQTT connect + live-swap need
the vendor SDK + device layer (shim shows 'not available'/502 gracefully).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
- MED: Test connection now preserves the stored MQTT password (shared
  _preserve_temp_password helper), so Test probes the same credentials Apply
  commits — a valid live config is no longer falsely reported broken.
- MED: thermalizer/effective edits no longer trigger SettingsManager's
  localStorage auto-save or its false 'Settings saved' toast (change listener
  now skips #section-thermalizer/#section-effective).
- LOW: the device-layer 409 (run/ramp active) is flattened to 200 by the proxy,
  so the UI detects it via a body 'blocked' flag instead of the dead
  res.status===409 branch — shows 'Blocked:' not a generic 'Failed:'.
- LOW: sidecar config.local.yml is created 0600 atomically (os.open O_CREAT
  0o600) so the plaintext MQTT password is never briefly world-readable.
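
A minimal sketch of the create-with-0600 pattern named in the last bullet, assuming a POSIX filesystem. The helper name write_private_file is hypothetical; only the os.open O_CREAT 0o600 approach comes from the commit.

```python
import os


def write_private_file(path: str, text: str) -> None:
    """Create (or truncate) a file whose mode is 0600 from the first moment it exists."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # The mode is applied by the kernel at creation time, so there is no
    # window in which the file exists with wider permissions.
    fd = os.open(path, flags, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(text)
```

The mode argument only takes effect when the file is created. A file that already exists with looser permissions keeps them, so a real implementation may also want to tighten an existing file.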

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…red editors

Completes the settings-panel Phase 2 alongside the effective-config viewer.

Dashboard-pref defaults (server-backed): GET/PUT /api/config/dashboard-defaults
(JSON in config/). settings.js now layers effective config as hardcoded <
rig-wide server defaults < per-browser localStorage, plus a defaults bar —
Save-as-rig-defaults, Reset-to-defaults, Export, Import — so prefs aren't trapped
per-browser.

Restart-required editors: an allowlisted set of settings.py knobs (timeouts,
mesh timing, ML, ux_v2, NCBI) editable via GET/PUT /api/config/settings-overrides
(require_control), persisted to config/settings.local.yml. settings.py now merges
that file into the environment at import (os.environ.setdefault, so a real env var
still wins) — every entry point picks it up on the next restart; the frozen
settings singleton is never live-mutated. New 'Advanced' panel section renders the
knobs with a 'Save (restart required)' banner.
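
A minimal sketch of the setdefault merge described above, assuming the override file has already been parsed into a flat mapping. The function name and the environment variable names in the usage are hypothetical; the real settings.py may key its overrides differently.

```python
import os
from typing import List, Mapping, MutableMapping


def apply_overrides(
    overrides: Mapping[str, object],
    environ: MutableMapping[str, str] = os.environ,
) -> List[str]:
    """Merge file overrides into the environment without clobbering real env vars."""
    applied = []
    for key, value in overrides.items():
        if key not in environ:
            applied.append(key)
        # setdefault leaves an existing variable untouched, so a value set in
        # the shell always wins over the persisted file.
        environ.setdefault(key, str(value))
    return applied
```

For example, with a shell variable already set:

```python
env = {"GENTLY_API_CALL_TIMEOUT": "5"}  # hypothetical name, set by the shell
apply_overrides({"GENTLY_API_CALL_TIMEOUT": 30, "GENTLY_UX_V2": True}, env)
# env keeps "5" for the timeout and gains GENTLY_UX_V2="True"
```

Because the merge happens once at import, the values only change on the next process start, which matches the restart-required banner.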

Also: made the sticky settings footer opaque (var(--bg-dark) + top border) with
the defaults bar laid out inline, fixing the transparent-overlap on long
sections. Runtime override files gitignored.

Verified in-browser: server-defaults layering, Advanced render (13 knobs) +
save round-trip to settings.local.yml, fresh-import applies the file while a shell
env var still wins, footer no longer overlaps content. No console errors.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…dvanced UI

Auditing the exposed knobs showed several with ZERO runtime readers — editing them
would be a silent no-op, and one was a vestige of the removed RPyC layer:
- Removed settings.timeouts.rpc_call entirely (RPyC-era; 0 readers repo-wide; the
  HTTP client uses its own aiohttp ClientTimeout). Dropped from the effective viewer.
- Pruned the Advanced editor to knobs with verified live readers: timeouts
  {volume_acquisition, api_call}, mesh {broadcast/stale/dead}, ui.ux_v2, api
  {ncbi_tool, ncbi_email}. Omitted timeouts.plan_execution + ml.{batch,epochs,lr}
  (0 readers today — no-ops).
- Grouped the Advanced section (Timeouts / Mesh network / Interface / NCBI) with
  subheadings instead of a flat list.

Verified in-browser: 8 grouped knobs, rpc_call absent, save round-trip intact.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
store_prediction computed the next prediction_id by _read_jsonl()-ing and
JSON-parsing the ENTIRE predictions.jsonl on every append — O(n) per prediction,
quadratic over a long timelapse, and a synchronous blocking read on the app event
loop (the perception-persist path). Replace with _last_jsonl_record(): read a
bounded 64 KB tail window and take the last record's prediction_id + 1. Ids stay
sequential (we only ever append in order); robust to a trailing partial line from
an interrupted write. Verified: identical ids, 20k-record file read in 0.26 ms.
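
A minimal sketch of the bounded tail read, not the actual _last_jsonl_record() body. Assumptions: records are one JSON object per line, the last complete record fits inside the 64 KB window, and an empty or missing file starts at id 1 (the real starting id is not stated in the commit).

```python
import json
import os
from typing import Optional

TAIL_BYTES = 64 * 1024  # bounded window, per the commit message


def last_jsonl_record(path: str, window: int = TAIL_BYTES) -> Optional[dict]:
    """Return the last complete JSON object in a JSONL file, reading only its tail."""
    try:
        with open(path, "rb") as fh:
            fh.seek(0, os.SEEK_END)
            size = fh.tell()
            fh.seek(max(0, size - window))
            tail = fh.read()
    except FileNotFoundError:
        return None
    # Walk backwards so a trailing partial line (interrupted write) is skipped.
    for raw in reversed(tail.splitlines()):
        raw = raw.strip()
        if not raw:
            continue
        try:
            record = json.loads(raw)
        except ValueError:  # covers JSONDecodeError and bad UTF-8
            continue
        if isinstance(record, dict):
            return record
    return None


def next_prediction_id(path: str) -> int:
    """Next sequential id; starting at 1 for an empty file is an assumption."""
    last = last_jsonl_record(path)
    return 1 if last is None else int(last["prediction_id"]) + 1
```

The cost per append depends on the window size, not the file length, which is what removes the quadratic growth over a long timelapse.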

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Code-grounded map of how gently manages concurrent work: the two-process split
(app/viz :8080 <-> device-layer :60610 over HTTP + shared filesystem), one
asyncio loop per process, hardware serialized behind a single plan queue +
executor (and pymmcore g_core_lock), how polling coexists with long experiments
(to_thread offload, split pollers, reference-counted pause_state_updates,
telemetry bypassing the queue), filesystem image transfer, fire-and-forget
perception/VLM + EventBus, backpressure/failure isolation, and known limits
(incl. the now-fixed O(n) prediction write).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…fline

These read-only proxy GETs are polled by the manual/settings pages. When the
device layer isn't running, _resolve_client() returns a client that is not
connected, so the call raised ConnectionError and logged a full ERROR traceback
on every poll — noisy for an expected state (UI open without the device process).
Pre-check client.is_connected and return a quiet 503 instead; genuine failures
(connected but call errors) still log at ERROR.
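
A minimal synchronous sketch of the pre-check, assuming a client object with an is_connected attribute as described above. The real routes are async proxy handlers; the function name, the 500 status on genuine failures, and the response bodies are all assumptions for illustration.

```python
import logging
from typing import Any, Callable, Tuple

log = logging.getLogger("proxy_sketch")


def proxy_get(client: Any, call: Callable[[Any], Any]) -> Tuple[int, Any]:
    """Return (status, body) for a read-only proxy call."""
    # Expected state: UI open without the device process. Answer quietly.
    if not getattr(client, "is_connected", False):
        log.debug("device layer not connected; returning 503")
        return 503, {"error": "device layer not connected"}
    try:
        return 200, call(client)
    except Exception:
        # Connected but the call failed: this is a genuine error, keep the traceback.
        log.exception("proxy call failed")
        return 500, {"error": "device call failed"}
```

Checking the connection flag before calling keeps the offline case out of the exception path, so the ERROR log is reserved for failures worth investigating.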

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Control routes 403 in account mode when no operator is logged in — previously the
only signal was a bare console 403 and a button that silently did nothing. Add a
single global fetch wrapper (control-auth.js) that detects a 403 on a mutating
/api/ request and shows a throttled 'Control required — Log in' toast (reusing
showGentlyToast) whose action navigates to /login. Reads only res.status, never
consumes the body, so callers are unaffected.

Verified end-to-end in account mode: real 403 -> toast with the message + Log in
button -> navigates to /login.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>