Tooling+docs: UI crawler/simulator + living user-story & UX-flow docs by pskeshu · Pull Request #76 · gently-project/gently

pskeshu · 2026-07-01T10:33:24Z

A new paradigm for gently development — merges back into #72 (documentation/visibility PR).

Why

Static code tracing describes what the UI wires up; it can't see emergent runtime behaviour. The static user-story audit, for example, flagged that the landing has "no persistence" but never reported the consequence: you can return to the landing from the workspace by clicking the header "Gently Microscopy" logo. We need to walk the app, not just trace it.

What's here

tools/ui_crawler/crawler.py — a Playwright headless crawler that walks the UI: structural state fingerprint → enumerate every interactive element (+ synthetic __reload__/__goto_root__) → probe each in isolated parallel contexts → state-transition graph (graph.mmd) + findings report (returns-to-landing, console errors, HTTP 4xx/5xx, spinners, dead controls, unreachable tabs). --browser chromium|firefox|webkit.
tools/ui_crawler/scenarios.py — scripted reproductions of the static-audit findings that blind crawling can't reach (no export, no ground-truth annotation, view-only plan wizard that spins, …), each captured as a trace.
docs/user-stories/ — one file per story with a Mermaid user-flow + status + deficiency + fix + evidence.
playwright added to [dependency-groups].dev (dev-only, never in published runtime).
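
The fingerprint and graph steps listed above can be sketched without a browser. This is a minimal illustration, not the code in crawler.py: the `(tag, role, test_id)` element shape, the `fingerprint` and `to_mermaid` names, and the `s_` node prefix are all assumptions made for the example.

```python
import hashlib
from collections import Counter


def fingerprint(elements):
    """Hash a page's structure while ignoring element order.

    `elements` is an iterable of (tag, role, test_id) string tuples.
    This shape is hypothetical; the real crawler may use other fields.
    """
    counts = Counter(elements)
    canonical = "|".join(
        f"{tag}:{role}:{test_id}x{n}"
        for (tag, role, test_id), n in sorted(counts.items())
    )
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]


def to_mermaid(transitions):
    """Render (source, action, target) triples as a Mermaid graph.

    Duplicate edges are collapsed so repeated probes do not clutter the output.
    """
    lines = ["graph TD"]
    for src, action, dst in sorted(set(transitions)):
        lines.append(f'    {src} -->|"{action}"| {dst}')
    return "\n".join(lines)


if __name__ == "__main__":
    landing = "s_" + fingerprint([("a", "link", "logo"), ("button", "tab", "plans")])
    workspace = "s_" + fingerprint([("a", "link", "logo")])
    print(to_mermaid([(landing, "click tab", workspace),
                      (workspace, "click logo", landing)]))
```

Hashing a sorted multiset of structural features means two visits to the same screen collapse to one node even if the DOM order or the visible text differs, which is what keeps the graph finite.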

👀 See it in the browser — three ways

Live window: --headed --slow-mo 500 — a real browser doing the clicks.
Trace viewer ⭐: --trace / --trace-findings, then uv run playwright show-trace <trace.zip>. This opens an interactive time-travel viewer: timeline + filmstrip, per-action before/after screenshot, DOM snapshot, console, network, source. Each deficiency gets its own named trace under out/traces/ (crawler-walked) and out/scenarios/ (scripted).
Video: --video → .webm per page.
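
For readers unfamiliar with Playwright, the three viewing modes map onto a handful of standard Playwright options. The sketch below shows one plausible wiring; the helper names (`launch_kwargs`, `context_kwargs`, `crawl_once`) and the `out/video` directory are assumptions for illustration and may not match how crawler.py does it.

```python
def launch_kwargs(headed=False, slow_mo=0):
    """Translate --headed / --slow-mo into browser launch options."""
    kwargs = {"headless": not headed}
    if slow_mo:
        kwargs["slow_mo"] = slow_mo  # milliseconds between actions
    return kwargs


def context_kwargs(video_dir=None):
    """Translate --video into browser-context options."""
    return {"record_video_dir": video_dir} if video_dir else {}


def crawl_once(url, headed=False, slow_mo=0, trace_path=None, video_dir=None):
    """Open one page with the requested recording modes (hypothetical helper)."""
    # Imported lazily so the option helpers work without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(**launch_kwargs(headed, slow_mo))
        context = browser.new_context(**context_kwargs(video_dir))
        if trace_path:
            context.tracing.start(screenshots=True, snapshots=True, sources=True)
        page = context.new_page()
        page.goto(url)
        if trace_path:
            context.tracing.stop(path=trace_path)
        context.close()  # closing the context finalizes the .webm file
        browser.close()
```

A call such as `crawl_once(url, headed=True, slow_mo=500, trace_path="out/trace.zip")` would combine the live window with a trace that can then be opened in the trace viewer.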

It already earned its keep

The crawler found the return-to-landing affordance the static audit and a manual check missed — the header logo (<a> → /) re-shows the landing (no persistence). US-06 corrected from "dead-end" to "partial" accordingly. Running the scripted scenarios also honestly surfaced that 3 need selector refinement (flagged, not passed off as clean).

Coming on this branch

State diagrams (Operate spine, run lifecycle), an end-to-end service blueprint, a master user-flow map, a biologist journey map, the full per-story file set, the prioritized deficiency report, and airtight versions of the 3 weak scenarios + a data-screen fingerprint fix.

🤖 Built with Claude Opus 4.8 (1M context).

A new paradigm for gently dev: dynamic UX verification + executable design docs, complementing the static user-story audit.

- tools/ui_crawler/: a Playwright headless crawler that WALKS the app. It fingerprints each state, enumerates every interactive element (+ synthetic reload/goto-root so browser-level transitions are explored), probes each in isolated parallel contexts, and emits a state-transition graph + a findings report (returns-to-landing, console errors, HTTP 4xx/5xx, spinners, dead controls, unreachable tabs). --browser chromium|firefox|webkit. playwright added to [dependency-groups].dev (dev-only).
- docs/user-stories/: one file per story with a Mermaid user-flow + status + deficiency + fix + evidence, plus an index with the overview flow.

The crawler already earned its keep: it found the return-to-landing affordance the static audit missed. Clicking the header 'Gently Microscopy' logo (→ /) re-shows the landing (no persistence). US-06 corrected from 'dead-end' to 'partial' accordingly.

Branched off #72; merges back (documentation/visibility PR).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…e, video

Answers 'can I watch it': --headed --slow-mo N opens a real browser doing the clicks; --trace records screenshots+DOM+network+console into out/trace.zip (scrub with `playwright show-trace`); --video records .webm per page. README documents all three.

…y repro harness

- crawler.py --trace-findings: after a crawl, replay each walked deficiency (return-to-landing, console/HTTP errors, dead controls) into its OWN out/traces/<name>.zip so each can be scrubbed in playwright show-trace.
- scenarios.py: scripted reproductions of the STATIC-audit findings (not reachable by blind crawling). It navigates to each surface and traces what's missing. Clean: no export, no ground-truth, no create-campaign, mesh-invisible, view-only plan wizard spins. Flagged as needing selector refinement: snap-503, notebook-questions, temp-alerts. Rig/agent-only findings are listed as not headless-reproducible.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The interactive time-travel viewer is the recommended way to review findings: timeline + filmstrip, per-action before/after screenshot, DOM snapshot, console, network, source. Also notes the display and full-Chromium requirements, and the per-finding trace catalogue in out/traces + out/scenarios.

The audit + regression tool for the user stories (not pytest: a 4-state audit, not pass/fail). Each documented story = tools/ui_crawler/stories/US-XX-*.py (async flow → works/partial/gap/blocked verdict + its own Playwright trace).

- run_stories.py: discovers the flows, runs each in a trace, writes status.json + STATUS.md (the AUDIT) and diffs vs baseline/status.json (the REGRESSION signal): prints status flips (⬇ regression / ⬆ improved), exits non-zero on regressions. --update-baseline re-baselines; --docs-status refreshes docs/user-stories/STATUS.md.
- _harness.py: shared async helpers (goto/tab/view/count_text/exists/present/dom_count) + Rec verdict.
- Triage discipline documented: baseline + story doc = contract; a flip → fix UX, or (deliberate paradigm shift) edit the story doc then re-baseline.
- Reverted pytest-playwright: wrong shape (2-state) for a 4-state audit.
- Exemplars: US-01 (works), US-06 (gap), US-25 (works). Baseline + docs STATUS seeded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
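
The baseline diff described for run_stories.py can be sketched as a comparison of two story-to-verdict mappings. This is an illustrative sketch only: the `RANK` ordering (in particular, where "blocked" sits relative to "gap") and the function names are assumptions, and the real status.json schema may carry more than a bare verdict per story.

```python
# Assumed ordering, worst to best. The source names the four states but
# does not say how they rank, so this ordering is a guess for the example.
RANK = {"blocked": 0, "gap": 1, "partial": 2, "works": 3}


def diff_status(baseline, current):
    """Return (story, old, new, direction) for every verdict that flipped."""
    flips = []
    for story in sorted(baseline.keys() & current.keys()):
        old, new = baseline[story], current[story]
        if old != new:
            direction = "regression" if RANK[new] < RANK[old] else "improved"
            flips.append((story, old, new, direction))
    return flips


def exit_code(flips):
    """Non-zero when any story regressed, so a CI job can fail on it."""
    return 1 if any(direction == "regression" for *_, direction in flips) else 0


if __name__ == "__main__":
    baseline = {"US-01": "works", "US-06": "gap", "US-25": "works"}
    current = {"US-01": "partial", "US-06": "partial", "US-25": "works"}
    for story, old, new, direction in diff_status(baseline, current):
        arrow = "⬇" if direction == "regression" else "⬆"
        print(f"{arrow} {story}: {old} -> {new}")
    raise SystemExit(exit_code(diff_status(baseline, current)))
```

Ranking the states rather than comparing them for equality is what lets a flip be reported as either a regression or an improvement, which a two-state pass/fail runner cannot express.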

…er story

The trace.zip is human-only (show-trace GUI). Every story now ALSO emits artifacts an agent can consume directly: a full-page PNG (rec.shot(); runner always captures '<id>-final.png'), the final visible screen_text, and captured console errors, all in out/stories/{shots,status.json} + surfaced in STATUS.md.

Verified: the PNGs are readable via the Read tool (renders images) and the text via plain read, so flows can be visually self-audited (already caught US-06's flow drifting off the Plans tab).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

…aseline

The payoff run. One flow per documented user story (tools/ui_crawler/stories/US-*.py), each driving the intended path → a 4-state verdict + its own trace + an agent-readable screenshot. Result: 8 works · 13 partial · 9 blocked (rig/agent) · 6 gap.

- Fan-out (14 Opus agents) wrote the flows against the real app code; recovered from transcripts after a bad bulk-edit, then E702-split only ruff-flagged lines.
- Hardened _harness: skip_landing now polls until the landing is actually dismissed (CSS transition race); exists() treats opacity:0 as hidden; tab/view use :visible locators (legacy hidden navbar had duplicate data-tab).
- Self-audit via screenshots caught + fixed real flow drift (US-03 return-to-landing was a false 'gap': the header logo DOES re-show the landing → now 'partial'). Verified US-13 'works' against its screenshot (run chooser really renders).
- Seeded baseline/status.json (the regression contract) + docs/user-stories/STATUS.md.

Gaps found (browser-confirmed): US-03 return-path incidental, US-05 review/commit, US-06 new-plan discoverability, US-31 ground-truth annotation, US-32 export, US-35 create-campaign, US-43 mesh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
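
The skip_landing hardening amounts to polling a condition instead of asserting it once, since a CSS transition may still be running right after the click. A generic version of that pattern might look like the following; `poll_until` and its parameters are hypothetical names, not the actual _harness.py API.

```python
import asyncio


async def poll_until(predicate, timeout=5.0, interval=0.1):
    """Await `predicate()` repeatedly until it is truthy or `timeout` elapses.

    Returns True on success and False on timeout. `predicate` is an async
    callable, for example one that checks whether the landing is gone.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if await predicate():
            return True
        if loop.time() >= deadline:
            return False
        await asyncio.sleep(interval)


if __name__ == "__main__":
    state = {"checks": 0}

    async def landing_dismissed():
        # Stand-in for a real DOM check: becomes true on the third look.
        state["checks"] += 1
        return state["checks"] >= 3

    print(asyncio.run(poll_until(landing_dismissed, timeout=1.0, interval=0.01)))
```

Returning a boolean instead of raising on timeout leaves the verdict to the calling flow, which fits an audit that distinguishes more than pass and fail.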

…es + 22-lens backlog

Turns the UX audit into a product-ideation substrate (the thing beyond the audit). Built by a 13-agent fan-out: entity inventory → entity×operation matrix (missing affordances) + entity×entity linkage (cross-feature links) + lens sweep → dedup/rank → backlog.

A dedicated set of agents IMPROVED the method itself, growing 7 lenses → 22, incl. two that DERIVE ideas mechanically: capability-orphan (store mutating-method vs route+UI diff) and dangling-edge (stored FK rendered as dead text). Not LLM-in-the-loop-centric: missing-affordance + cross-feature-link are first-class; agentic is one lens.

- docs/product-ideation/FRAMEWORK.md: the 22 lenses (core + added) + method notes.
- ENTITIES.md: 30-entity inventory + the two matrices (Ground truth = 0 UI cells; high-value unlinked pairs like Embryo↔Ground-truth/Note/Tactic).
- BACKLOG.md + backlog.json: 45 ranked ideas, 6 top bets, 10 clusters (queryable/appendable).

The notebook 'add note' fell out as a plain missing-affordance (IDEA-02); clickable notebook chips + reverse links (IDEA-04) from dangling-edge, exactly the kinds asked for.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
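
The capability-orphan lens is described as a diff between the store's mutating methods and what routes and the UI actually expose. Reduced to its core, that is a set comparison. The sketch below assumes the three inventories have already been collected as sets of names; the function name and the example method names are invented for illustration.

```python
def capability_orphans(store_methods, routed, ui_wired):
    """Classify store mutating methods that users cannot reach.

    store_methods: names of mutating methods found on the store
    routed:        subset that some HTTP route calls
    ui_wired:      subset that some UI control ultimately triggers
    """
    orphans = {}
    for method in sorted(store_methods):
        if method not in routed:
            orphans[method] = "no route"
        elif method not in ui_wired:
            orphans[method] = "route but no UI"
    return orphans


if __name__ == "__main__":
    # Hypothetical inventories, not taken from the gently codebase.
    store = {"add_note", "create_campaign", "delete_plan"}
    routes = {"create_campaign", "delete_plan"}
    ui = {"delete_plan"}
    for name, reason in capability_orphans(store, routes, ui).items():
        print(f"{name}: {reason}")
```

Each orphan is a candidate backlog idea that needs no ideation step: the capability already exists in the store and only the affordance is missing.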

pskeshu and others added 9 commits July 1, 2026 16:02

style(ui_crawler): split E701 one-line if/return in kind_of

8e2e9cb

pskeshu wants to merge 9 commits into feature/temperature-operations-all from tooling/ui-crawler-user-stories
