Tooling+docs: UI crawler/simulator + living user-story & UX-flow docs #76
Open
pskeshu wants to merge 9 commits into
A new paradigm for gently dev: dynamic UX verification + executable design docs, complementing the static user-story audit.

- tools/ui_crawler/: a Playwright headless crawler that WALKS the app. It fingerprints each state, enumerates every interactive element (plus synthetic reload/goto-root, so browser-level transitions are explored too), probes each element in isolated parallel contexts, and emits a state-transition graph plus a findings report (returns-to-landing, console errors, HTTP 4xx/5xx, spinners, dead controls, unreachable tabs). --browser chromium|firefox|webkit. playwright is added to [dependency-groups].dev (dev-only).
- docs/user-stories/: one file per story with a Mermaid user flow, status, deficiency, fix, and evidence, plus an index with the overview flow.

The crawler already earned its keep: it found the return-to-landing affordance the static audit missed. Clicking the header 'Gently Microscopy' logo (→ /) re-shows the landing (no persistence). US-06 corrected from 'dead-end' to 'partial' accordingly.

Branched off #72; merges back (documentation/visibility PR).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
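The fingerprint-then-graph idea can be illustrated with a small sketch. This is not the crawler's code: it assumes each state has already been reduced to a list of element dicts (the keys `tag`, `role`, `data_tab`, `visible` are hypothetical), hashes only their structure so a screen hashes the same way regardless of text or DOM order, and renders the transitions as a Mermaid graph like the `graph.mmd` output.

```python
import hashlib
import json
from collections import defaultdict


def fingerprint(elements):
    """Structural fingerprint of one UI state (hypothetical element dicts).

    Text content is ignored and order is normalised, so the same screen
    hashes the same way even if labels or counts change.
    """
    keys = sorted(
        (e.get("tag", ""), e.get("role", ""), e.get("data_tab", ""))
        for e in elements
        if e.get("visible", True)  # hidden elements do not define a state
    )
    digest = hashlib.sha1(json.dumps(keys).encode()).hexdigest()
    return "s" + digest[:10]  # prefix so node ids never start with a digit


class TransitionGraph:
    """Edges of the form state --action--> state."""

    def __init__(self):
        self.edges = defaultdict(set)

    def add(self, src, action, dst):
        self.edges[src].add((action, dst))

    def to_mermaid(self):
        lines = ["graph TD"]
        for src in sorted(self.edges):
            for action, dst in sorted(self.edges[src]):
                # Strip characters that would break the |label| syntax.
                label = action.replace("|", "/").replace('"', "'")
                lines.append(f"    {src} -->|{label}| {dst}")
        return "\n".join(lines)
```

Hashing structure rather than text is what lets two visits to the same tab with different data collapse into one graph node.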
…e, video

Answers 'can I watch it':
- --headed --slow-mo N opens a real browser doing the clicks.
- --trace records screenshots + DOM + network + console into out/trace.zip (scrub with `playwright show-trace`).
- --video records a .webm per page.

The README documents all three.
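A minimal sketch of how these flags could map onto Playwright's Python API. The flag names come from this PR; the defaults, help text, and the `out/videos` directory are assumptions. `headless` and `slow_mo` (milliseconds) are real `BrowserType.launch()` parameters, and `record_video_dir` is a real `Browser.new_context()` parameter.

```python
import argparse


def build_parser():
    # Flag names follow the PR; defaults and help strings are assumptions.
    p = argparse.ArgumentParser(prog="crawler.py")
    p.add_argument("--browser", choices=["chromium", "firefox", "webkit"], default="chromium")
    p.add_argument("--headed", action="store_true", help="open a visible browser window")
    p.add_argument("--slow-mo", type=int, default=0, metavar="N",
                   help="slow each Playwright operation by N ms")
    p.add_argument("--trace", action="store_true", help="record out/trace.zip")
    p.add_argument("--video", action="store_true", help="record a .webm per page")
    return p


def launch_options(args):
    """Keyword arguments for BrowserType.launch()."""
    return {"headless": not args.headed, "slow_mo": args.slow_mo}


def context_options(args, video_dir="out/videos"):
    """Keyword arguments for Browser.new_context(); video_dir is hypothetical."""
    return {"record_video_dir": video_dir} if args.video else {}
```

With `--trace`, a context would additionally call `context.tracing.start(screenshots=True, snapshots=True)` and later `context.tracing.stop(path="out/trace.zip")`.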
…y repro harness

- crawler.py --trace-findings: after a crawl, replays each walked deficiency (return-to-landing, console/HTTP errors, dead controls) into its OWN out/traces/<name>.zip, so each can be scrubbed in playwright show-trace.
- scenarios.py: scripted reproductions of the STATIC-audit findings (not reachable by blind crawling). Each navigates to the relevant surface and traces what's missing. The no-export, no-ground-truth, no-create-campaign, mesh-invisible, and view-only-plan-wizard-spins scenarios reproduce cleanly; snap-503, notebook-questions, and temp-alerts are flagged as needing selector refinement. Rig/agent-only findings are listed as not headless-reproducible.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
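How a probed click might be turned into named findings, each with its own trace file, is sketched below. The `Probe` record, the rules, and the slug format are illustrative assumptions, not the crawler's actual logic; only the `out/traces/<name>.zip` layout comes from this commit. Spinner and unreachable-tab detection are omitted.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Probe:
    """One element clicked in a fresh, isolated context (hypothetical record)."""
    action: str
    before: str                       # fingerprint before the click
    after: str                        # fingerprint after the click
    console_errors: list = field(default_factory=list)
    http_statuses: list = field(default_factory=list)


def classify(probe, landing_fp):
    """Map one probe to finding kinds; the rules are illustrative."""
    findings = []
    if probe.after == landing_fp and probe.before != landing_fp:
        findings.append("return-to-landing")
    if probe.console_errors:
        findings.append("console-error")
    if any(400 <= s <= 599 for s in probe.http_statuses):
        findings.append("http-error")
    if probe.after == probe.before:
        findings.append("dead-control")  # click changed nothing structural
    return findings


def trace_path(kind, action):
    """out/traces/<name>.zip, with <name> derived from the finding."""
    slug = re.sub(r"[^a-z0-9]+", "-", action.lower()).strip("-")
    return f"out/traces/{kind}-{slug}.zip"
```

Because each probe runs in its own context, replaying one finding into its own trace only needs the action that produced it.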
The interactive time-travel viewer is the recommended way to review findings: timeline + filmstrip, per-action before/after screenshot, DOM snapshot, console, network, and source. Also notes the display and full-chromium requirements, and the per-finding trace catalogue in out/traces and out/scenarios.
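A small helper along these lines could list the per-finding trace catalogue and the command to open each entry in the viewer. It assumes only the `out/traces` (crawler) and `out/scenarios` (scripted) folders named here; the function itself is hypothetical.

```python
from pathlib import Path


def trace_catalogue(root="out"):
    """(origin, name, command) for each per-finding trace under root."""
    rows = []
    for sub, origin in (("traces", "crawler"), ("scenarios", "scripted")):
        folder = Path(root) / sub
        if not folder.is_dir():
            continue  # nothing recorded for this origin yet
        for zip_path in sorted(folder.glob("*.zip")):
            rows.append((origin, zip_path.stem, f"uv run playwright show-trace {zip_path}"))
    return rows


if __name__ == "__main__":
    for origin, name, cmd in trace_catalogue():
        print(f"{origin:9} {name:30} {cmd}")
```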
The audit + regression tool for the user stories (not pytest: a 4-state audit, not pass/fail). Each documented story = tools/ui_crawler/stories/US-XX-*.py (an async flow → a works/partial/gap/blocked verdict + its own Playwright trace).

- run_stories.py: discovers the flows, runs each in a trace, writes status.json + STATUS.md (the AUDIT), and diffs against baseline/status.json (the REGRESSION signal): prints status flips (⬇ regression / ⬆ improved) and exits non-zero on regressions. --update-baseline re-baselines; --docs-status refreshes docs/user-stories/STATUS.md.
- _harness.py: shared async helpers (goto/tab/view/count_text/exists/present/dom_count) + the Rec verdict.
- Triage discipline documented: baseline + story doc = contract. On a flip, fix the UX, or, for a deliberate paradigm shift, edit the story doc and then re-baseline.
- Reverted pytest-playwright: wrong shape (2-state) for a 4-state audit.
- Exemplars: US-01 (works), US-06 (gap), US-25 (works). Baseline + docs STATUS seeded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
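The baseline diff can be sketched as a comparison of two `{story_id: verdict}` maps. The ordering `gap < partial < works` is an assumption, and `blocked` (rig/agent-only) is left unranked here, so moves into or out of it are reported but not counted as regressions. The real `run_stories.py` prints ⬇/⬆ markers; this sketch prints ASCII labels instead.

```python
# Assumed ordering for the UX verdicts (higher is better). 'blocked'
# (rig/agent-only) is deliberately unranked in this sketch.
RANK = {"gap": 0, "partial": 1, "works": 2}


def diff_status(baseline, current):
    """Compare {story_id: verdict} maps; return (regressions, improvements, other)."""
    regressions, improvements, other = [], [], []
    for story in sorted(set(baseline) | set(current)):
        old, new = baseline.get(story), current.get(story)
        if old == new:
            continue
        flip = (story, old, new)
        if old in RANK and new in RANK:
            (regressions if RANK[new] < RANK[old] else improvements).append(flip)
        else:
            other.append(flip)  # new/removed story, or a move into/out of 'blocked'
    return regressions, improvements, other


def report(baseline, current):
    """Print flips and return a process exit code (non-zero on regressions)."""
    regressions, improvements, other = diff_status(baseline, current)
    for story, old, new in regressions:
        print(f"REGRESSION {story}: {old} -> {new}")
    for story, old, new in improvements:
        print(f"improved   {story}: {old} -> {new}")
    for story, old, new in other:
        print(f"changed    {story}: {old} -> {new}")
    return 1 if regressions else 0
```

A runner would end with `sys.exit(report(baseline, current))`, which is what makes a regression fail CI while improvements merely prompt a re-baseline.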
…er story
The trace.zip is human-only (show-trace GUI). Every story now ALSO emits artifacts
an agent can consume directly: a full-page PNG (rec.shot(); runner always captures
'<id>-final.png'), the final visible screen_text, and captured console errors — all
in out/stories/{shots,status.json} + surfaced in STATUS.md. Verified: the PNGs are
readable via the Read tool (renders images) and the text via plain read, so flows
can be visually self-audited (already caught US-06's flow drifting off the Plans tab).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
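The agent-readable artifacts could be written roughly as follows. The record keys (`id`, `verdict`, `screen_text`, `console_errors`) and the table columns are assumptions; the `out/stories/{shots,status.json}` layout and the `<id>-final.png` name come from this commit. Placing STATUS.md alongside status.json is also an assumption.

```python
import json
from pathlib import Path


def write_status(records, out_dir="out/stories"):
    """Write status.json and a STATUS.md table (record keys are assumed)."""
    out = Path(out_dir)
    (out / "shots").mkdir(parents=True, exist_ok=True)
    status = {
        r["id"]: {
            "verdict": r["verdict"],
            "shot": f"shots/{r['id']}-final.png",  # runner's final capture
            "screen_text": r["screen_text"],
            "console_errors": r["console_errors"],
        }
        for r in records
    }
    (out / "status.json").write_text(json.dumps(status, indent=2), encoding="utf-8")
    rows = ["| Story | Verdict | Console errors | Final screenshot |",
            "|---|---|---|---|"]
    for sid, s in sorted(status.items()):
        rows.append(f"| {sid} | {s['verdict']} | {len(s['console_errors'])} | {s['shot']} |")
    (out / "STATUS.md").write_text("\n".join(rows) + "\n", encoding="utf-8")
    return status
```

Keeping the screenshot path relative to `out/stories` lets an agent resolve it from status.json without knowing where the run happened.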
…aseline

The payoff run. One flow per documented user story (tools/ui_crawler/stories/US-*.py), each driving the intended path → a 4-state verdict + its own trace + an agent-readable screenshot. Result: 8 works · 13 partial · 9 blocked (rig/agent) · 6 gap.

- Fan-out (14 Opus agents) wrote the flows against the real app code; the flows were recovered from transcripts after a bad bulk-edit, and then only the ruff-flagged lines were E702-split.
- Hardened _harness: skip_landing now polls until the landing is actually dismissed (CSS transition race); exists() treats opacity:0 as hidden; tab/view use :visible locators (the legacy hidden navbar had duplicate data-tab).
- Self-audit via screenshots caught and fixed real flow drift (US-03 return-to-landing was a false 'gap': the header logo DOES re-show the landing, so it is now 'partial'). Verified US-13 'works' against its screenshot (the run chooser really renders).
- Seeded baseline/status.json (the regression contract) + docs/user-stories/STATUS.md.

Gaps found (browser-confirmed): US-03 return-path incidental, US-05 review/commit, US-06 new-plan discoverability, US-31 ground-truth annotation, US-32 export, US-35 create-campaign, US-43 mesh.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
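The two _harness hardenings can be illustrated generically. Below, `poll_until` awaits an async predicate until it holds or a timeout expires (the shape of "poll until the landing is actually dismissed"), and `is_visible` treats `opacity: 0` as hidden over a dict of computed CSS. Both helpers and their parameters are hypothetical, not the harness's actual code.

```python
import asyncio


async def poll_until(predicate, timeout=5.0, interval=0.1):
    """Await `predicate()` until it is truthy or `timeout` seconds pass."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        if await predicate():
            return True
        if loop.time() >= deadline:
            return False  # caller decides whether this is a failure
        await asyncio.sleep(interval)


def is_visible(style):
    """`style` is an assumed dict of computed CSS; opacity 0 counts as hidden."""
    return (
        style.get("display") != "none"
        and style.get("visibility") != "hidden"
        and float(style.get("opacity", "1")) > 0
    )
```

Polling on the real end state, rather than sleeping for a fixed time, is what removes the race with the CSS fade-out: the check simply keeps failing until the transition has finished.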
…es + 22-lens backlog

Turns the UX audit into a product-ideation substrate (the thing beyond the audit). Built by a 13-agent fan-out: entity inventory → entity×operation matrix (missing affordances) + entity×entity linkage (cross-feature links) + lens sweep → dedup/rank → backlog. A dedicated set of agents IMPROVED the method itself, growing it from 7 lenses to 22, including two that DERIVE ideas mechanically: capability-orphan (store mutating-method vs route+UI diff) and dangling-edge (stored FK rendered as dead text). Not LLM-in-the-loop-centric: missing-affordance and cross-feature-link are first-class; agentic is one lens.

- docs/product-ideation/FRAMEWORK.md: the 22 lenses (core + added) + method notes.
- ENTITIES.md: 30-entity inventory + the two matrices (Ground truth = 0 UI cells; high-value unlinked pairs like Embryo↔Ground-truth/Note/Tactic).
- BACKLOG.md + backlog.json: 45 ranked ideas, 6 top bets, 10 clusters (queryable/appendable).

The notebook 'add note' fell out as a plain missing-affordance (IDEA-02); clickable notebook chips + reverse links (IDEA-04) came from dangling-edge, exactly the kinds asked for.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
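The two mechanical lenses reduce to set differences. In this sketch the inputs (mutating store methods, a route-to-method map, the routes the frontend calls, and rendered fields tagged with whether they are links) are hypothetical stand-ins for whatever the agents extracted, and "field name ends in `_id`" is an assumed proxy for a stored foreign key.

```python
from collections import defaultdict


def capability_orphans(store_mutators, routes, ui_calls):
    """Mutating store methods with no route, or with routes the UI never calls.

    routes: hypothetical mapping of route -> store method it invokes.
    ui_calls: hypothetical set of routes the frontend actually calls.
    """
    paths_by_method = defaultdict(set)
    for path, method in routes.items():
        paths_by_method[method].add(path)
    orphans = {}
    for m in sorted(store_mutators):
        paths = paths_by_method.get(m, set())
        if not paths:
            orphans[m] = "no route"
        elif not paths & set(ui_calls):
            orphans[m] = "route, no UI"
    return orphans


def dangling_edges(rendered_fields):
    """(entity, field) pairs where a stored FK is shown as plain text.

    rendered_fields: hypothetical iterable of (entity, field, is_link).
    """
    return sorted(
        (entity, name)
        for entity, name, is_link in rendered_fields
        if name.endswith("_id") and not is_link
    )
```

Each orphan or dangling edge is a candidate idea that needs no LLM judgment to find, only a diff between what is stored and what is reachable.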
A new paradigm for gently development — merges back into #72 (documentation/visibility PR).
Why
Static code tracing describes what the UI wires up; it can't see emergent runtime behaviour. The static user-story audit, for example, flagged that the landing has "no persistence" but never reported the consequence: you can return to the landing from the workspace by clicking the header "Gently Microscopy" logo. We need to walk the app, not just trace it.
What's here
- `tools/ui_crawler/crawler.py`: a Playwright headless crawler that walks the UI: structural state fingerprint → enumerate every interactive element (+ synthetic `__reload__`/`__goto_root__`) → probe each in isolated parallel contexts → state-transition graph (`graph.mmd`) + findings report (returns-to-landing, console errors, HTTP 4xx/5xx, spinners, dead controls, unreachable tabs). `--browser chromium|firefox|webkit`.
- `tools/ui_crawler/scenarios.py`: scripted reproductions of the static-audit findings that blind crawling can't reach (no export, no ground-truth annotation, view-only plan wizard that spins, …), each captured as a trace.
- `docs/user-stories/`: one file per story with a Mermaid user flow + status + deficiency + fix + evidence.
- `playwright` added to `[dependency-groups].dev` (dev-only, never in the published runtime).

👀 See it in the browser: three ways
- `--headed --slow-mo 500`: a real browser doing the clicks.
- `--trace` / `--trace-findings`, then `uv run playwright show-trace <trace.zip>`: an interactive time-travel viewer with timeline + filmstrip, per-action before/after screenshot, DOM snapshot, console, network, and source. Each deficiency gets its own named trace under `out/traces/` (crawler-walked) and `out/scenarios/` (scripted).
- `--video` → a `.webm` per page.

It already earned its keep
The crawler found the return-to-landing affordance that the static audit and a manual check both missed: the header logo (`<a>` → `/`) re-shows the landing (no persistence). `US-06` was corrected from "dead-end" to "partial" accordingly. Running the scripted scenarios also honestly surfaced that 3 of them need selector refinement (flagged, not passed off as clean).

Coming on this branch
State diagrams (Operate spine, run lifecycle), an end-to-end service blueprint, a master user-flow map, a biologist journey map, the full per-story file set, the prioritized deficiency report, and airtight versions of the 3 weak scenarios, plus a `data-screen` fingerprint fix.

🤖 Built with Claude Opus 4.8 (1M context).