Tooling+docs: UI crawler/simulator + living user-story & UX-flow docs #76

Open
pskeshu wants to merge 9 commits into feature/temperature-operations-all from tooling/ui-crawler-user-stories

pskeshu (Collaborator) commented Jul 1, 2026

A new paradigm for gently development — merges back into #72 (documentation/visibility PR).

Why

Static code tracing describes what the UI wires up; it can't see emergent runtime behaviour. The static user-story audit, for example, flagged that the landing has "no persistence" but never reported the consequence: you can return to the landing from the workspace by clicking the header "Gently Microscopy" logo. We need to walk the app, not just trace it.

What's here

  • tools/ui_crawler/crawler.py — a Playwright headless crawler that walks the UI: structural state fingerprint → enumerate every interactive element (+ synthetic __reload__/__goto_root__) → probe each in isolated parallel contexts → state-transition graph (graph.mmd) + findings report (returns-to-landing, console errors, HTTP 4xx/5xx, spinners, dead controls, unreachable tabs). --browser chromium|firefox|webkit.
  • tools/ui_crawler/scenarios.py — scripted reproductions of the static-audit findings that blind crawling can't reach (no export, no ground-truth annotation, view-only plan wizard that spins, …), each captured as a trace.
  • docs/user-stories/ — one file per story with a Mermaid user-flow + status + deficiency + fix + evidence.
  • playwright added to [dependency-groups].dev (dev-only, never in published runtime).
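
The crawl loop described above (fingerprint a state, probe its controls, record transitions, emit a Mermaid graph) can be sketched in plain Python with the browser abstracted away. This is a minimal sketch, not the code in crawler.py: `fingerprint`, `to_mermaid`, the element-tuple shape, and the choice to ignore label text when hashing are all assumptions made for illustration.

```python
import hashlib


def fingerprint(elements):
    """Hash the page structure: which interactive elements exist, ignoring their text.

    `elements` is an iterable of (tag, role, label) tuples (hypothetical shape).
    """
    # The label is deliberately dropped so that a text change alone does not
    # register as a new state (an assumption about what "structural" means).
    shape = sorted((tag, role) for tag, role, _label in elements)
    return hashlib.sha256(repr(shape).encode()).hexdigest()[:8]


def to_mermaid(transitions):
    """Render (src_state, action, dst_state) triples as a Mermaid flowchart."""
    lines = ["graph TD"]
    for src, action, dst in sorted(set(transitions)):
        lines.append(f'    {src} -->|"{action}"| {dst}')
    return "\n".join(lines)


# Illustrative states only; the real element inventories come from the browser.
landing = [("button", "button", "Start"), ("a", "link", "Docs")]
landing_renamed = [("button", "button", "Begin"), ("a", "link", "Docs")]
workspace = [("a", "link", "Gently Microscopy"), ("button", "tab", "Plans")]

s_landing = "s" + fingerprint(landing)
s_work = "s" + fingerprint(workspace)
graph = to_mermaid([
    (s_landing, "click Start", s_work),
    (s_work, "click logo", s_landing),   # the return-to-landing edge
    (s_work, "__reload__", s_landing),   # synthetic browser-level transition
])
print(graph)
```

Any edge that leads from a workspace state back to the landing state is a candidate for the returns-to-landing finding.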

👀 See it in the browser — three ways

  • Live window: --headed --slow-mo 500 — a real browser doing the clicks.
  • Trace viewer: --trace / --trace-findings, then uv run playwright show-trace <trace.zip> opens an interactive time-travel viewer (timeline + filmstrip, per-action before/after screenshot, DOM snapshot, console, network, source). Each deficiency gets its own named trace under out/traces/ (crawler-walked) and out/scenarios/ (scripted).
  • Video: --video records a .webm per page.
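
For readers unfamiliar with Playwright tracing, the sketch below shows roughly how one named trace per finding could be recorded with the sync API. It is an assumption-laden illustration rather than the crawler's actual code: `trace_path` and `record_trace` are hypothetical helpers, and only the Playwright calls themselves (`launch`, `new_context`, `tracing.start`, `tracing.stop`) are real library API.

```python
import re
from pathlib import Path


def trace_path(finding, root="out/traces"):
    """Map a finding name to its own trace archive under `root` (hypothetical helper)."""
    slug = re.sub(r"[^a-z0-9]+", "-", finding.lower()).strip("-")
    return Path(root) / f"{slug}.zip"


def record_trace(url, finding, headed=False, slow_mo=0):
    """Open `url`, record a Playwright trace, and save it under the finding's name."""
    # Imported lazily so trace_path() stays usable without Playwright installed.
    from playwright.sync_api import sync_playwright

    out = trace_path(finding)
    out.parent.mkdir(parents=True, exist_ok=True)
    with sync_playwright() as p:
        # headed + slow_mo correspond to the "live window" mode described above.
        browser = p.chromium.launch(headless=not headed, slow_mo=slow_mo)
        context = browser.new_context()
        context.tracing.start(screenshots=True, snapshots=True, sources=True)
        page = context.new_page()
        page.goto(url)
        # ... the clicks that reproduce the finding would go here ...
        context.tracing.stop(path=str(out))
        browser.close()
    return out


print(trace_path("Return to landing"))
```

The resulting archive can then be opened with `uv run playwright show-trace` followed by the path that `trace_path` returned.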

It already earned its keep

The crawler found the return-to-landing affordance that both the static audit and a manual check missed: the header logo (an <a> linking to /) re-shows the landing (no persistence). US-06 was corrected from "dead-end" to "partial" accordingly. Running the scripted scenarios also surfaced that 3 need selector refinement (flagged, not passed off as clean).

Coming on this branch

State diagrams (Operate spine, run lifecycle), an end-to-end service blueprint, a master user-flow map, a biologist journey map, the full per-story file set, the prioritized deficiency report, and airtight versions of the 3 weak scenarios + a data-screen fingerprint fix.

🤖 Built with Claude Opus 4.8 (1M context).

pskeshu and others added 9 commits July 1, 2026 16:02
A new paradigm for gently dev: dynamic UX verification + executable design docs,
complementing the static user-story audit.

- tools/ui_crawler/ — a Playwright headless crawler that WALKS the app: fingerprints
  each state, enumerates every interactive element (+ synthetic reload/goto-root so
  browser-level transitions are explored), probes each in isolated parallel contexts,
  and emits a state-transition graph + a findings report (returns-to-landing, console
  errors, HTTP 4xx/5xx, spinners, dead controls, unreachable tabs). --browser
  chromium|firefox|webkit. playwright added to the [dependency-groups].dev (dev-only).
- docs/user-stories/ — one file per story with a Mermaid user-flow + status +
  deficiency + fix + evidence, plus an index with the overview flow.

The crawler already earned its keep: it found the return-to-landing affordance the
static audit missed — clicking the header 'Gently Microscopy' logo (→ /) re-shows the
landing (no persistence). US-06 corrected from 'dead-end' to 'partial' accordingly.

Branched off #72; merges back (documentation/visibility PR).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e, video

Answers 'can I watch it': --headed --slow-mo N opens a real browser doing the
clicks; --trace records screenshots+DOM+network+console into out/trace.zip
(scrub with `playwright show-trace`); --video records .webm per page. README
documents all three.
…y repro harness

- crawler.py --trace-findings: after a crawl, replay each walked deficiency
  (return-to-landing, console/HTTP errors, dead controls) into its OWN
  out/traces/<name>.zip so each can be scrubbed in playwright show-trace.
- scenarios.py: scripted reproductions of the STATIC-audit findings (not
  reachable by blind crawling) — navigates to each surface + traces what's
  missing (no export / no ground-truth / no create-campaign / mesh-invisible /
  view-only plan wizard spins are clean; snap-503, notebook-questions,
  temp-alerts flagged as needing selector refinement). Rig/agent-only findings
  listed as not headless-reproducible.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The interactive time-travel viewer is the recommended way to review findings:
timeline + filmstrip, per-action before/after screenshot, DOM snapshot, console,
network, source. Notes display + full-chromium requirements and the per-finding
trace catalogue in out/traces + out/scenarios.
The audit + regression tool for the user stories (not pytest — a 4-state audit,
not pass/fail). Each documented story = tools/ui_crawler/stories/US-XX-*.py
(async flow → works/partial/gap/blocked verdict + its own Playwright trace).

- run_stories.py: discovers the flows, runs each in a trace, writes status.json +
  STATUS.md (the AUDIT) and diffs vs baseline/status.json (the REGRESSION signal):
  prints status flips (⬇ regression / ⬆ improved), exits non-zero on regressions.
  --update-baseline re-baselines; --docs-status refreshes docs/user-stories/STATUS.md.
- _harness.py: shared async helpers (goto/tab/view/count_text/exists/present/dom_count) + Rec verdict.
- Triage discipline documented: baseline + story doc = contract; a flip → fix UX,
  or (deliberate paradigm shift) edit the story doc then re-baseline.
- Reverted pytest-playwright — wrong shape (2-state) for a 4-state audit.
- Exemplars: US-01 (works), US-06 (gap), US-25 (works). Baseline + docs STATUS seeded.
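
The baseline diff behind the regression signal can be sketched as follows. This is a minimal illustration, not run_stories.py itself: `RANK`, `diff_status`, and `exit_code` are hypothetical names, the worst-to-best ordering of the four states is an assumption (the text above only names them), and the `current` statuses are invented for the example.

```python
# Assumed ordering, worst to best; the source names the four states but
# does not say how "blocked" compares with "gap".
RANK = {"blocked": 0, "gap": 1, "partial": 2, "works": 3}


def diff_status(baseline, current):
    """Return (regressions, improvements) as lists of (story, old, new)."""
    regressions, improvements = [], []
    for story in sorted(baseline.keys() & current.keys()):
        old, new = baseline[story], current[story]
        if RANK[new] < RANK[old]:
            regressions.append((story, old, new))
        elif RANK[new] > RANK[old]:
            improvements.append((story, old, new))
    return regressions, improvements


def exit_code(regressions):
    """Non-zero when any story regressed, so CI can fail the run."""
    return 1 if regressions else 0


# Baseline mirrors the exemplars listed above; `current` is made up.
baseline = {"US-01": "works", "US-06": "gap", "US-25": "works"}
current = {"US-01": "partial", "US-06": "partial", "US-25": "works"}

regressions, improvements = diff_status(baseline, current)
for story, old, new in regressions:
    print(f"⬇ regression {story}: {old} -> {new}")
for story, old, new in improvements:
    print(f"⬆ improved   {story}: {old} -> {new}")
```

A four-state verdict is why a plain pass/fail runner fits poorly: a move from gap to partial is progress that a two-state test would still report as a failure.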

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…er story

The trace.zip is human-only (show-trace GUI). Every story now ALSO emits artifacts
an agent can consume directly: a full-page PNG (rec.shot(); runner always captures
'<id>-final.png'), the final visible screen_text, and captured console errors — all
in out/stories/{shots,status.json} + surfaced in STATUS.md. Verified: the PNGs are
readable via the Read tool (renders images) and the text via plain read, so flows
can be visually self-audited (already caught US-06's flow drifting off the Plans tab).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…aseline

The payoff run. One flow per documented user story (tools/ui_crawler/stories/US-*.py),
each driving the intended path → a 4-state verdict + its own trace + an agent-readable
screenshot. Result: 8 works · 13 partial · 9 blocked (rig/agent) · 6 gap.

- Fan-out (14 Opus agents) wrote the flows against the real app code; recovered from
  transcripts after a bad bulk-edit, then E702-split only ruff-flagged lines.
- Hardened _harness: skip_landing now polls until the landing is actually dismissed
  (CSS transition race); exists() treats opacity:0 as hidden; tab/view use :visible
  locators (legacy hidden navbar had duplicate data-tab).
- Self-audit via screenshots caught + fixed real flow drift (US-03 return-to-landing
  was a false 'gap' — the header logo DOES re-show the landing → now 'partial').
  Verified US-13 'works' against its screenshot (run chooser really renders).
- Seeded baseline/status.json (the regression contract) + docs/user-stories/STATUS.md.
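
The two harness fixes above (polling until the landing is really gone, and counting opacity:0 as hidden) follow a common pattern that can be sketched without a browser. The names `poll_until`, `is_visible`, and `FakeLanding` are hypothetical and stand in for whatever _harness.py actually does; the style dictionary is an assumed stand-in for a computed style.

```python
import asyncio
import time


async def poll_until(predicate, timeout=5.0, interval=0.05):
    """Await `predicate()` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if await predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        await asyncio.sleep(interval)


def is_visible(style):
    """Treat display:none, visibility:hidden, and opacity:0 as hidden."""
    return (
        style.get("display") != "none"
        and style.get("visibility") != "hidden"
        and float(style.get("opacity", 1)) > 0
    )


class FakeLanding:
    """Stand-in for a landing overlay whose fade-out finishes after a few polls."""

    def __init__(self, polls_until_gone):
        self.remaining = polls_until_gone

    async def dismissed(self):
        self.remaining -= 1
        return self.remaining <= 0


dismissed = asyncio.run(
    poll_until(FakeLanding(3).dismissed, timeout=1.0, interval=0.01)
)
print(dismissed)
```

Checking once right after the click is what loses the race with a CSS transition; polling with a deadline trades a small delay for a deterministic answer.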

Gaps found (browser-confirmed): US-03 return-path incidental, US-05 review/commit,
US-06 new-plan discoverability, US-31 ground-truth annotation, US-32 export, US-35
create-campaign, US-43 mesh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…es + 22-lens backlog

Turns the UX audit into a product-ideation substrate (the thing beyond the audit).
Built by a 13-agent fan-out: entity inventory → entity×operation matrix (missing
affordances) + entity×entity linkage (cross-feature links) + lens sweep →
dedup/rank → backlog. A dedicated set of agents IMPROVED the method itself,
growing 7 lenses → 22 — incl. two that DERIVE ideas mechanically: capability-orphan
(store mutating-method vs route+UI diff) and dangling-edge (stored FK rendered as
dead text). Not LLM-in-the-loop-centric: missing-affordance + cross-feature-link
are first-class; agentic is one lens.
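
The capability-orphan lens is essentially a set difference, which the sketch below illustrates. Everything in it is hypothetical: the function name, the three inventories, and the method names are invented for the example and are not taken from the actual store, routes, or UI.

```python
def capability_orphans(store_mutators, routes, ui_actions):
    """Find store mutating methods that no route exposes, or that no UI control calls."""
    no_route = sorted(store_mutators - routes)
    route_but_no_ui = sorted((store_mutators & routes) - ui_actions)
    return {"no_route": no_route, "route_but_no_ui": route_but_no_ui}


# Illustrative inventories only (not the real method names).
store_mutators = {"add_note", "create_campaign", "delete_plan", "set_ground_truth"}
routes = {"add_note", "create_campaign", "delete_plan"}
ui_actions = {"delete_plan"}

orphans = capability_orphans(store_mutators, routes, ui_actions)
print(orphans)
```

Each entry in either list is a mechanically derived backlog candidate: the capability already exists in the store, so the missing piece is only the route or the control.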

- docs/product-ideation/FRAMEWORK.md — the 22 lenses (core + added) + method notes.
- ENTITIES.md — 30-entity inventory + the two matrices (Ground truth = 0 UI cells;
  high-value unlinked pairs like Embryo↔Ground-truth/Note/Tactic).
- BACKLOG.md + backlog.json — 45 ranked ideas, 6 top bets, 10 clusters (queryable/appendable).

The notebook 'add note' fell out as a plain missing-affordance (IDEA-02); clickable
notebook chips + reverse links (IDEA-04) from dangling-edge — exactly the kinds asked for.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>