|
| 1 | +# @wdio/elements Roadmap |
| 2 | + |
| 3 | +## Current state (May 2026) |
| 4 | + |
| 5 | +The package delivers LLM-readable element snapshots for both web and mobile: |
| 6 | + |
| 7 | +| Capability | Web | Mobile | |
| 8 | +|---|---|---| |
| 9 | +| Interactable element list | `getInteractableBrowserElements()` | `getMobileVisibleElements()` | |
| 10 | +| Semantic tree | `getBrowserAccessibilityTree()` | *(raw `JSONElement` only)* | |
| 11 | +| Snapshot serialization | `serializeWebSnapshot()` | `serializeMobileSnapshot()` | |
| 12 | +| Unified API | `getElements()` returns `{ elements, tree }` | `getElements()` returns `{ elements, tree }` | |
| 13 | +| Viewport filtering | `inViewportOnly` (default true) | `inViewportOnly` (default true) | |
| 14 | +| Role classification | Computed in-browser from tag/ARIA | `ANDROID_ROLE_MAP` / `IOS_ROLE_MAP` in snapshot.ts | |
| 15 | +| Locator generation | CSS selectors in browser script | `getSuggestedLocators()` from locator-generation.ts | |
| 16 | +| Context disambiguation | `∈` via `inferPurpose()` | `∈` via `mobileInferPurpose()` | |
| 17 | +| Duplicate selector indexing | N/A (selectors are unique) | `.instance(N)` suffix | |
| 18 | + |
| 19 | +## Architectural concerns |
| 20 | + |
| 21 | +### 1. Two independent mobile pipelines |
| 22 | + |
| 23 | +`serializeMobileSnapshot` in `snapshot.ts` has its own copies of: |
| 24 | + |
| 25 | +- **Role classification** — `ANDROID_ROLE_MAP` / `IOS_ROLE_MAP` duplicate logic from `locators/constants.ts` and `locators/element-filter.ts`. |
| 26 | +- **Interactivity detection** — `isMobileInteractive()` shadows `isInteractableElement()` from `element-filter.ts`. They use different criteria (tag-based vs attribute-based) and can disagree. |
| 27 | +- **Locator generation** — `getBestAndroidLocator()` / `getBestIOSLocator()` are simplified fallbacks. The full pipeline (`getSuggestedLocators()`) is now wired in when source XML is available, but the fallback still exists and the two paths can produce different selectors for the same element. |
| 28 | + |
| 29 | +These should be collapsed: `serializeMobileSnapshot` should consume pre-computed roles, interactivity flags, and selectors from the locator pipeline, not recompute them. |
| 30 | + |
| 31 | +### 2. No mobile equivalent of `getBrowserAccessibilityTree()` |
| 32 | + |
| 33 | +The web path returns a flat `AccessibilityNode[]` with roles, names, selectors, depths, and state. The mobile path returns a raw `JSONElement` tree — the snapshot does all enrichment internally via `collectMobileNodes()` → `MobileFlatNode[]` (a private interface). There is no public function to get an enriched flat node list for mobile. |
| 34 | + |
| 35 | +**Proposal:** Extract `collectMobileNodes()` into a public `getMobileAccessibilityTree()` that returns `MobileFlatNode[]` (or a shared type). `serializeMobileSnapshot()` becomes a pure formatting pass — like `serializeWebSnapshot()` already is. |
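
A minimal sketch of the split proposed above, assuming a simplified `JSONElement` shape (`tagName`, `attributes`, `children`) and a hypothetical `FlatNode` type. The real `MobileFlatNode` interface and role maps are not shown here and may differ. `flattenTree` stands in for the proposed `getMobileAccessibilityTree()`, and `formatSnapshot` for a formatting-only serializer.

```typescript
// Hypothetical shapes: the real JSONElement and MobileFlatNode may differ.
interface JSONElement {
  tagName: string;
  attributes: Record<string, string>;
  children: JSONElement[];
}

interface FlatNode {
  role: string;
  name: string;
  depth: number;
}

// Assumed lookup table; stands in for ANDROID_ROLE_MAP / IOS_ROLE_MAP.
const ROLE_MAP: Record<string, string> = {
  "android.widget.Button": "button",
  "android.widget.TextView": "statictext",
  "android.widget.ImageView": "img",
};

// Enrichment pass: depth-first walk that emits one flat node per element.
function flattenTree(root: JSONElement): FlatNode[] {
  const out: FlatNode[] = [];
  const visit = (node: JSONElement, depth: number): void => {
    out.push({
      role: ROLE_MAP[node.tagName] ?? "generic",
      name: node.attributes["text"] ?? node.attributes["content-desc"] ?? "",
      depth,
    });
    for (const child of node.children) visit(child, depth + 1);
  };
  visit(root, 0);
  return out;
}

// Formatting pass: no classification happens here, only indentation and quoting.
function formatSnapshot(nodes: FlatNode[]): string {
  return nodes
    .map((n) => `${"  ".repeat(n.depth)}${n.role}${n.name ? ` "${n.name}"` : ""}`)
    .join("\n");
}
```

With this separation, a consumer could call the flattening step directly and skip serialization, which is what the web path already allows.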
| 36 | + |
| 37 | +### 3. Layout noise in mobile snapshots |
| 38 | + |
| 39 | +The Android view hierarchy includes every layout container (`FrameLayout`, `LinearLayout`, `ViewGroup`, etc.). The current noise filter (`NOISY_ROLES`) collapses anonymous containers at depth ≥ 2, but named containers and depth 0-1 scaffolding still appear. The web a11y tree doesn't have this problem because the browser's accessibility computation already skips layout-only `<div>`s. |
| 40 | + |
| 41 | +**Proposal:** A `collapseContainers` option on the snapshot (default `true`) that skips any container without an interactive descendant. Alternatively, the tree collection pass could flag "informative" vs "structural" containers and let the renderer decide. |
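
One possible reading of `collapseContainers`, sketched under assumptions: the `TreeNode` shape and its `isContainer` / `interactive` flags are hypothetical, and a container with no interactive descendant is dropped together with its subtree. Hoisting the children instead would be an equally valid interpretation.

```typescript
// Hypothetical node shape; the real snapshot nodes may carry different fields.
interface TreeNode {
  role: string;
  isContainer: boolean;
  interactive: boolean;
  children: TreeNode[];
}

// True when the node itself or anything below it is interactive.
function hasInteractive(node: TreeNode): boolean {
  return node.interactive || node.children.some(hasInteractive);
}

// Returns a pruned copy of the tree, or null when the node is a container
// with no interactive descendant. Non-container nodes are always kept.
function collapseContainers(node: TreeNode): TreeNode | null {
  if (node.isContainer && !hasInteractive(node)) return null;
  const children = node.children
    .map((child) => collapseContainers(child))
    .filter((c): c is TreeNode => c !== null);
  return { ...node, children };
}
```

The repeated `hasInteractive` calls make this quadratic in the worst case. A production version would likely compute the flag once per node in a bottom-up pass.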
| 42 | + |
| 43 | +### 4. Selector format for mobile |
| 44 | + |
| 45 | +Mobile selectors are Appium/WDIO-specific strings (`~Accessibility`, `android=new UiSelector()...`, `id:com.example:id/foo`). The web path outputs CSS selectors and WebdriverIO text selectors (`#cart-icon-bubble`, `a*=Highlights`). An LLM or agent therefore needs different selector-parsing logic per platform, and there is no common selector abstraction. |
| 46 | + |
| 47 | +**Proposal:** A `SelectorString` type with platform-aware parsing, or at minimum consistent prefix conventions documented for LLM consumption. |
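
As an illustration of prefix-based parsing, the sketch below classifies only the prefixes named in this section (`~`, `id:`, `android=`) and treats everything else as a web selector. `ParsedSelector`, `SelectorKind`, and `parseSelector` are hypothetical names, not existing exports, and iOS-specific strategies are deliberately left out.

```typescript
// Hypothetical classification; covers only the prefixes mentioned above.
type SelectorKind = "accessibility-id" | "resource-id" | "uiautomator" | "web";

interface ParsedSelector {
  kind: SelectorKind;
  value: string; // selector body with the platform prefix removed
  raw: string;   // original string, unchanged
}

function parseSelector(raw: string): ParsedSelector {
  if (raw.startsWith("~")) {
    return { kind: "accessibility-id", value: raw.slice(1), raw };
  }
  if (raw.startsWith("id:")) {
    return { kind: "resource-id", value: raw.slice("id:".length), raw };
  }
  if (raw.startsWith("android=")) {
    return { kind: "uiautomator", value: raw.slice("android=".length), raw };
  }
  // Fallback: CSS or WebdriverIO text selectors from the web path.
  return { kind: "web", value: raw, raw };
}
```

Keeping `raw` alongside the parsed parts lets a caller pass the original string back to the driver while still giving an agent a uniform `kind` to branch on.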
| 48 | + |
| 49 | +### 5. The raw tree doesn't carry locators unless processed |
| 50 | + |
| 51 | +`getMobileVisibleElementsWithTree()` returns `{ elements, tree }` where `tree` is the raw `xmlToJSON()` output. Locators are only on `elements` (from `generateAllElementLocators()`). The snapshot obtains locators by running `getSuggestedLocators()` again (or by falling back to the simplified `getBestAndroidLocator()` / `getBestIOSLocator()` path). Consumers who want to annotate the tree themselves must re-run the locator pipeline. |
| 52 | + |
| 53 | +**Proposal:** Enrich the tree in-place during `generateAllElementLocators()` — attach `_selector`, `_role`, and `_interactive` attributes to each `JSONElement` node that passes the filter. The raw tree becomes self-describing. |
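
The in-place enrichment could look roughly like the sketch below. It assumes a simplified `JSONElement` shape and takes the filter and locator logic as callbacks (`passesFilter`, `locate`), both hypothetical stand-ins for whatever `generateAllElementLocators()` does internally.

```typescript
// Hypothetical shape; the underscore fields are the ones proposed above.
interface JSONElement {
  tagName: string;
  attributes: Record<string, string>;
  children: JSONElement[];
  _selector?: string;
  _role?: string;
  _interactive?: boolean;
}

interface Annotation {
  selector: string;
  role: string;
  interactive: boolean;
}

// Walks the tree iteratively and mutates every node that passes the filter.
function enrichTree(
  root: JSONElement,
  passesFilter: (node: JSONElement) => boolean,
  locate: (node: JSONElement) => Annotation,
): void {
  const stack: JSONElement[] = [root];
  while (stack.length > 0) {
    const node = stack.pop();
    if (node === undefined) break;
    if (passesFilter(node)) {
      const { selector, role, interactive } = locate(node);
      node._selector = selector;
      node._role = role;
      node._interactive = interactive;
    }
    stack.push(...node.children);
  }
}
```

Nodes that fail the filter are left untouched, so a consumer can tell annotated nodes apart by checking whether `_selector` is defined.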
| 54 | + |
| 55 | +## Improvement backlog |
| 56 | + |
| 57 | +| Priority | What | Effort | |
| 58 | +|---|---|---| |
| 59 | +| P0 | Merge `isMobileInteractive` + role classification into `generateAllElementLocators` — one source of truth | Medium | |
| 60 | +| P1 | Extract `getMobileAccessibilityTree()` as a public API returning enriched flat nodes | Medium | |
| 61 | +| P1 | Enrich `JSONElement` tree nodes with locators during `generateAllElementLocators()` | Small | |
| 62 | +| P2 | `collapseContainers` option on `serializeMobileSnapshot` | Small | |
| 63 | +| P2 | Unify web + mobile serialization into a single `serializeSnapshot()` function | Large | |
| 64 | +| P3 | Document selector format conventions for LLM consumption | Small | |
| 65 | +| P3 | Add `checked`/`selected`/`expanded` state rendering to mobile snapshot (parity with web) | Small | |
| 66 | + |
| 67 | +## Verified capabilities |
| 68 | + |
| 69 | +- [x] Web: viewport-only snapshot with semantic roles and unique CSS selectors |
| 70 | +- [x] Web: `∈` disambiguation for elements with duplicate names (6 "Add to Wishlist" buttons → each with book title context) |
| 71 | +- [x] Web: `statictext` role capturing visible text (book titles, promo copy, cookie text) |
| 72 | +- [x] Web: deduplication of echoed text (child text already in parent name → skipped) |
| 73 | +- [x] Mobile: semantic role mapping (TextView→statictext, ImageView→img, Button→button, etc.) |
| 74 | +- [x] Mobile: full-pipeline selectors via `getSuggestedLocators()` wired into snapshot |
| 75 | +- [x] Mobile: `~` prefix for accessibility-id, `id:` for resource-id, `android=new UiSelector()...` for compound |
| 76 | +- [x] Mobile: `.instance(N)` indexing for duplicate selectors |
| 77 | +- [x] Mobile: explicit tap-target promotion (clickable parent carries `→`, label children provide `∈` context) |
| 78 | +- [x] Mobile: layout noise collapse for anonymous containers |
| 79 | +- [x] Mobile: `∈` context from actual parent, not previous list-item sibling |
| 80 | +- [x] Unified `getElements()` API returning `{ elements, tree }` for both platforms |
| 81 | +- [x] `inViewportOnly` default `true` across all entry points with per-function toggles |