Skip to content

[STG-1250] use same cdp tree function across api and sdk#1769

Open
sameelarif wants to merge 4 commits intomainfrom
sameelarif/stg-1250-use-same-cdp-tree-function-across-api-and-sdk
Open

[STG-1250] use same cdp tree function across api and sdk#1769
sameelarif wants to merge 4 commits intomainfrom
sameelarif/stg-1250-use-same-cdp-tree-function-across-api-and-sdk

Conversation

@sameelarif
Copy link
Member

@sameelarif sameelarif commented Feb 27, 2026

why

what changed

test plan


Summary by cubic

Unified the CDP accessibility snapshot across the API and SDK and added a string-only, hash-friendly output for stable caching. Supports Linear STG-1250 by making snapshot trees deterministic and ready to use as cache keys.

  • New Features

    • Exported captureHybridSnapshot and SnapshotOptions/HybridSnapshot/PerFrameSnapshot from v3 (also available on Stagehand default).
    • Added treeOnly mode that returns the combined tree string; hash formatting normalizes structural roles to "none", quotes names, keeps [encodedId] for iframe injection, then strips them after merge.
  • Tests

    • Added integration tests confirming identical combinedTree across separate sessions, with pierceShadow on/off and with iframes disabled.

Written for commit cd0ba68. Summary will update on new commits. Review in cubic

@changeset-bot
Copy link

changeset-bot bot commented Feb 27, 2026

⚠️ No Changeset found

Latest commit: cd0ba68

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 28, 2026

Greptile Summary

This PR unifies CDP tree function usage and adds a treeOnly option that returns hash-friendly tree strings for caching. The implementation adds deterministic tree formatting with normalized structural roles and consistent name quoting.

Key changes:

  • Added treeOnly option to captureHybridSnapshot() that returns just the tree string instead of full snapshot object
  • Created formatTreeLineForHash() that normalizes structural roles (div, span, html, body) to "none" and uses double-quote formatting
  • Added function overloads to support type-safe treeOnly return values
  • Exported captureHybridSnapshot and snapshot types to public API
  • Added integration tests for cross-session determinism

Critical issue:

  • The stripIds regex has a bug that strips bracket patterns from node names, not just ID prefixes (see inline comment)

Confidence Score: 2/5

  • Contains a critical regex bug that corrupts node names containing bracket patterns
  • The stripIds regex bug could cause data corruption in production when node names contain [text] patterns, affecting cache keys and downstream consumers
  • packages/core/lib/v3/understudy/a11y/snapshot/capture.ts requires immediate fix to the stripIds regex

Important Files Changed

Filename Overview
packages/core/lib/v3/understudy/a11y/snapshot/capture.ts Added function overloads for treeOnly option and stripIds helper. Critical bug in stripIds regex that strips IDs from node names
packages/core/lib/v3/understudy/a11y/snapshot/treeFormatUtils.ts Added formatTreeLineForHash function for hash-friendly tree formatting with normalized structural roles
packages/core/tests/integration/snapshot-determinism.spec.ts Added determinism tests, but missing direct test for the new treeOnly option

Last reviewed commit: 12254ed

Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

6 files reviewed, 2 comments

Edit Code Review Agent Settings | Greptile

// Strip [encodedId] prefixes that were kept during formatting solely so
// injectSubtrees could locate iframe injection points. One regex pass over
// the final combined string is far cheaper than post-processing in the caller.
const stripIds = (tree: string) => tree.replace(/\[[^\]]+\] /g, "");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the regex /\[[^\]]+\] /g will incorrectly strip [...] patterns from node names, not just ID prefixes. if a node has name "[TODO] Fix this", the formatted line [1-123] button "[TODO] Fix this" becomes button "Fix this" instead of button "[TODO] Fix this". use line-anchored pattern instead:

Suggested change
const stripIds = (tree: string) => tree.replace(/\[[^\]]+\] /g, "");
const stripIds = (tree: string) => tree.replace(/^(\s*)\[[^\]]+\] /gm, "$1");

Comment on lines +1 to +96
import { test, expect } from "@playwright/test";
import { V3 } from "../../lib/v3/v3.js";
import { captureHybridSnapshot } from "../../lib/v3/understudy/a11y/snapshot/index.js";
import { v3TestConfig } from "./v3.config.js";

/**
* Checks whether captureHybridSnapshot produces identical combinedTree strings
* across two entirely separate browser sessions navigated to the same static page.
*
* This mirrors real-world cache usage: two independent requests hit the same URL,
* each with their own browser session. A determinism failure here means the cache
* hash would differ between those sessions, causing a cache miss even though the
* page content is identical.
*/
test.describe("captureHybridSnapshot determinism", () => {
let v3a: V3;
let v3b: V3;

test.beforeEach(async () => {
v3a = new V3(v3TestConfig);
v3b = new V3(v3TestConfig);
await Promise.all([v3a.init(), v3b.init()]);
});

test.afterEach(async () => {
await Promise.all([
v3a?.close?.().catch(() => {}),
v3b?.close?.().catch(() => {}),
]);
});

test("two separate sessions on the same static page produce identical trees", async () => {
const pageA = v3a.context.pages()[0];
const pageB = v3b.context.pages()[0];
await Promise.all([
pageA.goto("https://example.com"),
pageB.goto("https://example.com"),
]);

const [snap1, snap2] = await Promise.all([
captureHybridSnapshot(pageA),
captureHybridSnapshot(pageB),
]);

if (snap1.combinedTree !== snap2.combinedTree) {
const lines1 = snap1.combinedTree.split("\n");
const lines2 = snap2.combinedTree.split("\n");
const maxLines = Math.max(lines1.length, lines2.length);
const diffLines: string[] = [];
for (let i = 0; i < maxLines; i++) {
const l1 = lines1[i] ?? "<missing>";
const l2 = lines2[i] ?? "<missing>";
if (l1 !== l2) {
diffLines.push(`line ${i + 1}:`);
diffLines.push(` sessionA: ${l1}`);
diffLines.push(` sessionB: ${l2}`);
}
}
console.log("=== DIFF ===\n" + diffLines.join("\n"));
}

expect(snap1.combinedTree).toBe(snap2.combinedTree);
});

test("two separate sessions with pierceShadow produce identical trees", async () => {
const pageA = v3a.context.pages()[0];
const pageB = v3b.context.pages()[0];
await Promise.all([
pageA.goto("https://example.com"),
pageB.goto("https://example.com"),
]);

const [snap1, snap2] = await Promise.all([
captureHybridSnapshot(pageA, { pierceShadow: true }),
captureHybridSnapshot(pageB, { pierceShadow: true }),
]);

expect(snap1.combinedTree).toBe(snap2.combinedTree);
});

test("two separate sessions without iframes produce identical trees", async () => {
const pageA = v3a.context.pages()[0];
const pageB = v3b.context.pages()[0];
await Promise.all([
pageA.goto("https://example.com"),
pageB.goto("https://example.com"),
]);

const [snap1, snap2] = await Promise.all([
captureHybridSnapshot(pageA, { includeIframes: false }),
captureHybridSnapshot(pageB, { includeIframes: false }),
]);

expect(snap1.combinedTree).toBe(snap2.combinedTree);
});
});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tests verify determinism of the full HybridSnapshot.combinedTree, but the new treeOnly: true option (which is the main addition) is never tested. add a test that uses captureHybridSnapshot(page, { treeOnly: true }) and verifies the returned string is deterministic and has IDs stripped

Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

4 issues found across 6 files

Confidence score: 3/5

  • There is a concrete behavior bug in packages/core/lib/v3/understudy/a11y/snapshot/capture.ts where hashMode isn’t passed in the scoped snapshot path, so treeOnly + focusSelector can yield inconsistent formatting.
  • Public API surface is being expanded via exports from packages/core/lib/v3/index.ts, which pulls in internal types from types/private/snapshot.ts and may lock in implementation details.
  • Given the above, there’s some regression risk if downstream users rely on the scoped snapshot or if future refactors need to change internal types.
  • Pay close attention to packages/core/lib/v3/understudy/a11y/snapshot/capture.ts and packages/core/lib/v3/index.ts - scoped snapshot formatting and public API exposure of internal types.
Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/index.ts">

<violation number="1" location="packages/core/lib/v3/index.ts:68">
P2: According to linked Linear issue STG-1250, `captureHybridSnapshot` is being shared between the API and SDK internally. The codebase convention for internal cross-package exports is to prefix with `__internal` (e.g., `__internalCreateInMemoryAgentCacheHandle`). Consider renaming:
```ts
export { captureHybridSnapshot as __internalCaptureHybridSnapshot } from "./understudy/a11y/snapshot/index.js";

This signals to consumers that this is not a stable public API.

P2: These types are sourced from `types/private/snapshot.ts`, yet they're being exported from the package's public entry point. This turns internal implementation types into a public API contract, reducing refactoring flexibility. If these types need to be shared across the API and SDK, consider either moving them to a public types module or creating a shared internal package — rather than promoting private types to the public surface.

(Based on your team's feedback about avoiding exposing internal types as public APIs.) [FEEDBACK_USED]

P2: `SnapshotOptions` is re-exported publicly from `packages/core/lib/v3/index.ts`. Adding `treeOnly` here locks it into the public API contract. Since this option is described as serving internal callers ("DOM hashing, cache keys"), consider whether it belongs on the public type or should live on a separate internal-only options type to avoid committing to this shape before there's real user signal.

(Based on your team's feedback about stabilizing APIs before exposing parameters publicly and avoiding exposing internal types as public APIs.) [FEEDBACK_USED]

P1: Bug: `hashMode` is not passed to `a11yForFrame` in the scoped snapshot path (`tryScopedSnapshot`), so when `treeOnly: true` and a `focusSelector` successfully scopes the snapshot, the tree will use standard formatting (`formatTreeLine`) instead of hash-friendly formatting (`formatTreeLineForHash`). This makes the output inconsistent depending on which code path is taken.

tryScopedSnapshot needs to propagate hashMode (or treeOnly) into its a11yForFrame call to match the behavior added in collectPerFrameMaps.


</details>

<details>
<summary>Architecture diagram</summary>

```mermaid
sequenceDiagram
    participant Caller as SDK / API Route
    participant Snap as Snapshot Manager (captureHybridSnapshot)
    participant Frame as Frame Processor (a11yForFrame)
    participant Fmt as Tree Formatter (treeFormatUtils)
    participant CDP as Browser (CDP)

    Note over Caller,CDP: NEW: Unified Deterministic Snapshot Flow (STG-1250)

    Caller->>Snap: captureHybridSnapshot(page, { treeOnly: true })
    
    Snap->>Snap: Build Frame Context (iframes/shadow)

    loop For each Frame in Scope
        Snap->>Frame: a11yForFrame(frame, { hashMode: true })
        Frame->>CDP: Fetch Accessibility Tree nodes
        CDP-->>Frame: A11y Node Tree
        
        Frame->>Frame: decorateRoles() & buildHierarchicalTree()
        
        loop For each Node in Tree
            alt CHANGED: hashMode is active
                Frame->>Fmt: NEW: formatTreeLineForHash(node)
                Fmt->>Fmt: Normalize structural roles (div/span -> "none")
                Fmt->>Fmt: Quote names and keep [encodedId]
            else Standard mode
                Frame->>Fmt: formatTreeLine(node)
            end
            Fmt-->>Frame: Formatted string line
        end
        Frame-->>Snap: Frame outline (string)
    end

    Snap->>Snap: mergeFramesIntoSnapshot()
    Note right of Snap: Injects iframe subtrees into parent outlines<br/>using [encodedId] markers

    alt NEW: treeOnly is requested
        Snap->>Snap: NEW: stripIds(combinedTree)
        Note right of Snap: Final Regex pass removes all [encodedId]<br/>prefixes for deterministic cache key
        Snap-->>Caller: Return deterministic string (Outline only)
    else Standard mode
        Snap-->>Caller: Return HybridSnapshot object
    end

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

);
if (scopedSnapshot) return scopedSnapshot;
if (scopedSnapshot) {
if (options?.treeOnly) return stripIds(scopedSnapshot.combinedTree);
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1: Bug: hashMode is not passed to a11yForFrame in the scoped snapshot path (tryScopedSnapshot), so when treeOnly: true and a focusSelector successfully scopes the snapshot, the tree will use standard formatting (formatTreeLine) instead of hash-friendly formatting (formatTreeLineForHash). This makes the output inconsistent depending on which code path is taken.

tryScopedSnapshot needs to propagate hashMode (or treeOnly) into its a11yForFrame call to match the behavior added in collectPerFrameMaps.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/understudy/a11y/snapshot/capture.ts, line 74:

<comment>Bug: `hashMode` is not passed to `a11yForFrame` in the scoped snapshot path (`tryScopedSnapshot`), so when `treeOnly: true` and a `focusSelector` successfully scopes the snapshot, the tree will use standard formatting (`formatTreeLine`) instead of hash-friendly formatting (`formatTreeLineForHash`). This makes the output inconsistent depending on which code path is taken.

`tryScopedSnapshot` needs to propagate `hashMode` (or `treeOnly`) into its `a11yForFrame` call to match the behavior added in `collectPerFrameMaps`.</comment>

<file context>
@@ -57,7 +70,10 @@ export async function captureHybridSnapshot(
   );
-  if (scopedSnapshot) return scopedSnapshot;
+  if (scopedSnapshot) {
+    if (options?.treeOnly) return stripIds(scopedSnapshot.combinedTree);
+    return scopedSnapshot;
+  }
</file context>
Fix with Cubic

export { maybeRunShutdownSupervisorFromArgv as __internalMaybeRunShutdownSupervisorFromArgv } from "./shutdown/supervisor.js";
export type { ServerAgentCacheHandle } from "./cache/serverAgentCache.js";

export { captureHybridSnapshot } from "./understudy/a11y/snapshot/index.js";
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: According to linked Linear issue STG-1250, captureHybridSnapshot is being shared between the API and SDK internally. The codebase convention for internal cross-package exports is to prefix with __internal (e.g., __internalCreateInMemoryAgentCacheHandle). Consider renaming:

export { captureHybridSnapshot as __internalCaptureHybridSnapshot } from "./understudy/a11y/snapshot/index.js";

This signals to consumers that this is not a stable public API.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/index.ts, line 68:

<comment>According to linked Linear issue STG-1250, `captureHybridSnapshot` is being shared between the API and SDK internally. The codebase convention for internal cross-package exports is to prefix with `__internal` (e.g., `__internalCreateInMemoryAgentCacheHandle`). Consider renaming:
```ts
export { captureHybridSnapshot as __internalCaptureHybridSnapshot } from "./understudy/a11y/snapshot/index.js";

This signals to consumers that this is not a stable public API.

@@ -65,6 +65,9 @@ export { __internalCreateInMemoryAgentCacheHandle } from "./cache/serverAgentCac export { maybeRunShutdownSupervisorFromArgv as __internalMaybeRunShutdownSupervisorFromArgv } from "./shutdown/supervisor.js"; export type { ServerAgentCacheHandle } from "./cache/serverAgentCache.js";

+export { captureHybridSnapshot } from "./understudy/a11y/snapshot/index.js";
+export type { SnapshotOptions, HybridSnapshot, PerFrameSnapshot } from "./types/private/snapshot.js";
+
</file context>


</details>

```suggestion
export { captureHybridSnapshot as __internalCaptureHybridSnapshot } from "./understudy/a11y/snapshot/index.js";
Fix with Cubic

* HybridSnapshot object. Intended for callers that only need the text
* outline (e.g. DOM hashing, cache keys).
*/
treeOnly?: boolean;
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: SnapshotOptions is re-exported publicly from packages/core/lib/v3/index.ts. Adding treeOnly here locks it into the public API contract. Since this option is described as serving internal callers ("DOM hashing, cache keys"), consider whether it belongs on the public type or should live on a separate internal-only options type to avoid committing to this shape before there's real user signal.

(Based on your team's feedback about stabilizing APIs before exposing parameters publicly and avoiding exposing internal types as public APIs.)

View Feedback: 1 2

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/types/private/snapshot.ts, line 28:

<comment>`SnapshotOptions` is re-exported publicly from `packages/core/lib/v3/index.ts`. Adding `treeOnly` here locks it into the public API contract. Since this option is described as serving internal callers ("DOM hashing, cache keys"), consider whether it belongs on the public type or should live on a separate internal-only options type to avoid committing to this shape before there's real user signal.

(Based on your team's feedback about stabilizing APIs before exposing parameters publicly and avoiding exposing internal types as public APIs.) </comment>

<file context>
@@ -20,6 +20,12 @@ export type SnapshotOptions = {
+   * HybridSnapshot object. Intended for callers that only need the text
+   * outline (e.g. DOM hashing, cache keys).
+   */
+  treeOnly?: boolean;
 };
 
</file context>
Fix with Cubic

export type { ServerAgentCacheHandle } from "./cache/serverAgentCache.js";

export { captureHybridSnapshot } from "./understudy/a11y/snapshot/index.js";
export type { SnapshotOptions, HybridSnapshot, PerFrameSnapshot } from "./types/private/snapshot.js";
Copy link
Contributor

@cubic-dev-ai cubic-dev-ai bot Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: These types are sourced from types/private/snapshot.ts, yet they're being exported from the package's public entry point. This turns internal implementation types into a public API contract, reducing refactoring flexibility. If these types need to be shared across the API and SDK, consider either moving them to a public types module or creating a shared internal package — rather than promoting private types to the public surface.

(Based on your team's feedback about avoiding exposing internal types as public APIs.)

View Feedback: 1 2

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At packages/core/lib/v3/index.ts, line 69:

<comment>These types are sourced from `types/private/snapshot.ts`, yet they're being exported from the package's public entry point. This turns internal implementation types into a public API contract, reducing refactoring flexibility. If these types need to be shared across the API and SDK, consider either moving them to a public types module or creating a shared internal package — rather than promoting private types to the public surface.

(Based on your team's feedback about avoiding exposing internal types as public APIs.) </comment>

<file context>
@@ -65,6 +65,9 @@ export { __internalCreateInMemoryAgentCacheHandle } from "./cache/serverAgentCac
 export type { ServerAgentCacheHandle } from "./cache/serverAgentCache.js";
 
+export { captureHybridSnapshot } from "./understudy/a11y/snapshot/index.js";
+export type { SnapshotOptions, HybridSnapshot, PerFrameSnapshot } from "./types/private/snapshot.js";
+
 export type {
</file context>
Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant