[feat]: auto-pause agent during Browserbase captcha solving #1752

Merged
shrey150 merged 29 commits into main from derek/autowait_captchas_on_browserbase
Mar 18, 2026

@derekmeegan derekmeegan commented Feb 25, 2026

why

Browserbase can solve captchas asynchronously, but agents were still trying to interact with the page while the solver was active. That led to CUA and DOM/hybrid flows clicking solved captcha widgets again, pausing on confirmation questions, or resuming with stale assumptions instead of continuing the original task cleanly.

This PR pauses agent execution while Browserbase's captcha solver is active and hardens the post-solve resume path so the agent keeps working on the original task after Browserbase finishes.

what changed

  • added a shared CaptchaSolver utility that listens for Browserbase browserbase-solving-started/finished/errored console events, supports concurrent waiters, and disposes listeners cleanly
  • paused DOM/hybrid prepareStep execution and CUA prepareStep / action execution while Browserbase is solving a captcha
  • enabled Browserbase captcha solving by default unless browserSettings.solveCaptchas: false
  • updated agent prompts and follow-up messages to tell the model that captchas are handled automatically and should not be clicked again after they are solved
  • added OpenAI CUA recovery behavior so it can:
    • carry one-shot context notes into the next model turn
    • auto-continue when the model asks for confirmation instead of acting
    • guard post-solve clicks that target the solved captcha widget and restate the original instruction so the model re-anchors on the task
  • added focused unit coverage for solver state, Browserbase session accessors, CUA/regular agent hooks, and OpenAI CUA confirmation handling
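
The shared-waiter behavior in the first bullet can be illustrated with a minimal sketch. It is not the PR's `CaptchaSolver`: the class and member names are hypothetical, and it assumes the console message text equals the event name exactly. Only the three event strings and the 90 s timeout are taken from this PR.

```ts
// Sketch only. Names are assumptions; the event strings and the 90 s
// timeout are the only details taken from the PR discussion.
const SOLVE_TIMEOUT_MS = 90_000;

class CaptchaSolverSketch {
  private solving = false;
  private waitPromise: Promise<void> | null = null;
  private resolveWait: (() => void) | null = null;
  private waitTimer: ReturnType<typeof setTimeout> | null = null;

  // Call this with the text of every console message seen on the page.
  onConsoleText(text: string): void {
    if (text === "browserbase-solving-started") {
      this.solving = true;
    } else if (
      text === "browserbase-solving-finished" ||
      text === "browserbase-solving-errored"
    ) {
      this.solving = false;
      this.settle();
    }
  }

  isSolving(): boolean {
    return this.solving;
  }

  // Idle: resolve at once. Solving: every caller receives the same promise.
  waitIfSolving(): Promise<void> {
    if (!this.solving) return Promise.resolve();
    if (!this.waitPromise) {
      this.waitPromise = new Promise<void>((resolve) => {
        this.resolveWait = () => resolve();
        this.waitTimer = setTimeout(() => {
          this.solving = false;
          this.settle();
        }, SOLVE_TIMEOUT_MS);
      });
    }
    return this.waitPromise;
  }

  // Clear the timer, drop the shared promise, then release all waiters.
  private settle(): void {
    if (this.waitTimer) {
      clearTimeout(this.waitTimer);
      this.waitTimer = null;
    }
    const resolve = this.resolveWait;
    this.resolveWait = null;
    this.waitPromise = null;
    if (resolve) resolve();
  }
}
```

Because all callers hold the same promise, a single finished or errored event releases every waiter, and the timeout bounds the wait if neither event arrives.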

test plan

  • pnpm --filter @browserbasehq/stagehand lint
  • pnpm --filter @browserbasehq/stagehand build:esm
  • cd packages/core && pnpm exec vitest run --config vitest.esm.config.mjs dist/esm/tests/unit/openai-cua-client.test.js dist/esm/tests/unit/captcha-solver.test.js dist/esm/tests/unit/agent-captcha-hooks.test.js dist/esm/tests/unit/browserbase-session-accessors.test.js

Browserbase smoke:

  • OpenAI CUA + reCAPTCHA demo: strict pass on Verification Success
  • Anthropic CUA + reCAPTCHA demo: strict pass on Verification Success
  • Hybrid Gemini + reCAPTCHA demo: strict pass on Verification Success
  • OpenAI CUA + solveCaptchas: false: solver stays disabled, no wait/resume path is triggered, and the agent stops at the captcha instead of bypassing it
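
The default-on behavior exercised by the last smoke case can be sketched as a small predicate. This is an illustration under assumptions: the PR describes a getter on the Stagehand instance, whereas the free function and the `BrowserSettingsSketch` type here are hypothetical stand-ins.

```ts
// Hypothetical stand-in for the relevant slice of the session settings.
interface BrowserSettingsSketch {
  solveCaptchas?: boolean;
}

function isCaptchaSolverEnabled(
  isBrowserbase: boolean,
  browserSettings?: BrowserSettingsSketch,
): boolean {
  // On by default on Browserbase; only an explicit `false` turns it off.
  return isBrowserbase && browserSettings?.solveCaptchas !== false;
}
```

Comparing against `false` rather than checking truthiness is what makes an omitted setting count as enabled.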

changeset-bot Bot commented Feb 25, 2026

🦋 Changeset detected

Latest commit: 54d9bf4

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 5 packages:

| Name | Type |
| --- | --- |
| @browserbasehq/stagehand | Patch |
| @browserbasehq/browse-cli | Patch |
| @browserbasehq/stagehand-evals | Patch |
| @browserbasehq/stagehand-server-v3 | Patch |
| @browserbasehq/stagehand-server-v4 | Patch |

greptile-apps Bot commented Feb 25, 2026

Greptile Summary

This PR adds Browserbase captcha solver awareness to all Stagehand agent execution paths (DOM/hybrid, Anthropic CUA, Google CUA, Microsoft CUA, OpenAI CUA). A new shared CaptchaSolver utility listens for browserbase-solving-started/finished/errored console events and exposes a waitIfSolving() barrier that all agent step and action handlers call before proceeding.

Key changes:

  • CaptchaSolver utility: event-driven state machine with shared waitPromise for concurrent waiters, page-provider callback for cross-navigation re-attachment, and 90 s timeout fallback
  • DOM/hybrid (V3AgentHandler): prepareStep blocks on waitIfSolving() and injects a solved/errored notification into the message stream
  • CUA (V3CuaAgentHandler): both the prepareStepHandler and actionHandler block; a 3-attempt click guard (captchaClickGuardRemaining) additionally intercepts post-solve clicks on captcha widgets by evaluating their bounding boxes against the click coordinates
  • OpenAI CUA specific: addContextNote() carries solve notifications into the next model turn; a new isFollowUpQuestion heuristic auto-continues the loop when the model asks for confirmation instead of acting
  • isCaptchaSolverEnabled getter defaults to true on Browserbase unless browserSettings.solveCaptchas is explicitly false; system prompts updated to inform the model captchas are handled automatically
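
The click guard described for the CUA handler comes down to a point-in-rectangle test plus a small attempt budget. The sketch below is a guess at that shape, not the handler's code: all names are hypothetical, and it assumes every click checked while the guard is armed uses up one attempt.

```ts
// Hypothetical bounding box in page coordinates.
interface BoxSketch {
  x: number;
  y: number;
  width: number;
  height: number;
}

function isInsideAnyBox(x: number, y: number, boxes: BoxSketch[]): boolean {
  return boxes.some(
    (b) => x >= b.x && x <= b.x + b.width && y >= b.y && y <= b.y + b.height,
  );
}

class CaptchaClickGuardSketch {
  private remaining = 0;

  // Arm right after a solve; 3 mirrors the attempt count in the summary.
  arm(attempts = 3): void {
    this.remaining = attempts;
  }

  // Assumption: each click checked while armed consumes one attempt.
  shouldSkipClick(x: number, y: number, captchaBoxes: BoxSketch[]): boolean {
    if (this.remaining <= 0) return false;
    this.remaining -= 1;
    return isInsideAnyBox(x, y, captchaBoxes);
  }
}
```

In the handler the boxes would come from evaluating the page for captcha widgets; here they are passed in directly so the logic can be checked in isolation.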

Issues found:

  • shouldContinueWithoutConfirmation in OpenAICUAClient is evaluated on every loop iteration, not just steps following a captcha solve — legitimate model confirmation requests (e.g., for risky actions) will be auto-continued without caller knowledge
  • isFollowUpQuestion includes "go ahead" as a bare substring, which is broader than the other question-word patterns and could produce false positives on model responses that contain the string incidentally
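
One possible way to address the first issue is to make auto-continue a one-shot permission that is granted only when a captcha note was queued for the current turn. The sketch below uses hypothetical names and is not taken from `OpenAICUAClient`.

```ts
// Hypothetical gate: auto-continue is allowed at most once per captcha solve.
class AutoContinueGateSketch {
  private captchaNotePending = false;

  // Call when a captcha solved/errored note is queued for the next turn.
  noteCaptchaHandled(): void {
    this.captchaNotePending = true;
  }

  // Consumes the pending flag, so later confirmation requests reach the caller.
  shouldAutoContinue(modelAskedForConfirmation: boolean): boolean {
    const armed = this.captchaNotePending;
    this.captchaNotePending = false;
    return armed && modelAskedForConfirmation;
  }
}
```

With this shape, a confirmation request on a step unrelated to captchas is never auto-continued, because nothing armed the gate.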

Confidence Score: 3/5

  • Safe to merge with awareness of the unconstrained auto-continue behaviour in OpenAI CUA, which could silently bypass model-generated confirmation requests on non-captcha steps.
  • The core CaptchaSolver implementation is solid, tests are focused and comprehensive, and the DOM/hybrid path is clean. The main concern is in OpenAICUAClient: the shouldContinueWithoutConfirmation guard is not scoped to post-captcha recovery steps, and the "go ahead" substring in isFollowUpQuestion is overly broad — both of which could affect runtime behaviour in tasks unrelated to captchas.
  • packages/core/lib/v3/agent/OpenAICUAClient.ts — the auto-continue and follow-up-question detection logic need tighter scoping.

Important Files Changed

| Filename | Overview |
| --- | --- |
| packages/core/lib/v3/agent/utils/captchaSolver.ts | New utility class that tracks Browserbase captcha solver state via console events. Concurrent waiter sharing pattern is sound; page-change re-attachment logic is well-handled. Minor readability concern around waitPromise being cleared before resolve() in settle(). |
| packages/core/lib/v3/agent/OpenAICUAClient.ts | Two logic issues: (1) shouldContinueWithoutConfirmation applies to ALL steps, not just post-captcha recovery, so it can silently bypass legitimate model confirmations; (2) isFollowUpQuestion includes "go ahead" as a bare substring, which is broader than the other question-word patterns and can produce false positives. |
| packages/core/lib/v3/handlers/v3CuaAgentHandler.ts | Captcha solver lifecycle (init/dispose) is handled correctly via try/finally. captchaClickGuardRemaining and shouldSkipSolvedCaptchaInteraction provide a solid post-solve click guard. handleCaptchaSolveResult is called from both action and step hooks without double-notification due to destructive consumeSolveResult(). |
| packages/core/lib/v3/handlers/v3AgentHandler.ts | Captcha solver integrated cleanly into both the blocking execute() and streaming executeStream() paths; dispose is covered in finally blocks and all callbacks (onFinish/onError/onAbort). Injected messages for solved/errored state are well-worded. |
| packages/core/lib/v3/v3.ts | New isCaptchaSolverEnabled getter correctly defaults to true on Browserbase unless solveCaptchas: false is explicitly set. CUA system prompt augmentation is wired correctly in the handler construction path. |
| packages/core/lib/v3/agent/AgentClient.ts | Clean base-class additions: prepareStepHandler hook and no-op addContextNote(). The void note idiom correctly suppresses the unused-parameter lint warning. |
| packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts | Renaming isBrowserbase to solveCaptchas is more semantically accurate; captcha prompt text updated to clearly instruct the model not to re-click solved widgets. |
| packages/core/tests/unit/captcha-solver.test.ts | New unit tests cover solver state transitions, timeout behaviour, concurrent waiter sharing, and dispose clean-up: good coverage of the core logic paths. |
| packages/core/tests/unit/agent-captcha-hooks.test.ts | Integration-style unit tests verify blocking in both DOM/hybrid prepareStep and CUA action/step hooks; click guard tests are thorough. |
| packages/core/tests/unit/openai-cua-client.test.ts | Tests for confirmation auto-continue and context note draining; however, there are no tests that verify the guard is NOT triggered for non-captcha confirmation requests, which would catch the scoping issue. |

Sequence Diagram

sequenceDiagram
    participant Agent as Agent Loop
    participant PSH as prepareStepHandler
    participant CS as CaptchaSolver
    participant BB as Browserbase (console)
    participant AH as actionHandler
    participant Guard as CaptchaClickGuard

    Agent->>PSH: call prepareStepHandler()
    PSH->>CS: waitIfSolving()
    BB-->>CS: browserbase-solving-started
    Note over CS: solving = true
    CS-->>PSH: (blocks — waitPromise pending)
    BB-->>CS: browserbase-solving-finished
    Note over CS: solving = false, settled
    CS-->>PSH: (resolves)
    PSH->>CS: consumeSolveResult()
    CS-->>PSH: {solved: true}
    PSH->>Agent: addContextNote("captcha solved…")
    Agent->>Agent: executeStep (model call)
    Agent->>AH: action {type:"click", x, y}
    AH->>CS: waitIfSolving() — returns immediately
    AH->>Guard: shouldSkipSolvedCaptchaInteraction?
    Guard->>Guard: page.evaluate(captcha bounding boxes)
    alt click inside captcha widget
        Guard-->>AH: true — skip click, inject note
    else click outside captcha
        Guard-->>AH: false — execute action
    end
    Note over Agent: next loop iteration includes captcha note in inputItems

Last reviewed commit: 8b70cd3

@greptile-apps greptile-apps Bot left a comment

5 files reviewed, 3 comments

Comment thread packages/core/lib/v3/agent/utils/captchaSolver.ts Outdated
Comment thread packages/core/lib/v3/agent/utils/captchaSolver.ts Outdated
Comment thread packages/core/lib/v3/agent/utils/captchaSolver.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

2 issues found across 5 files

Confidence score: 3/5

  • Concurrent waitIfSolving() calls in packages/core/lib/v3/agent/utils/captchaSolver.ts can orphan earlier waiters due to a single resolveWait slot, causing hangs until timeout for some callers.
  • packages/core/lib/v3/handlers/v3AgentHandler.ts may leak captchaSolver if streamText() throws before callbacks run, since disposal only happens inside stream handlers in stream().
  • These are concrete runtime risks (waiting hangs and resource leaks), so there is some user-impacting risk despite the rest of the change looking contained.
  • Pay close attention to packages/core/lib/v3/agent/utils/captchaSolver.ts and packages/core/lib/v3/handlers/v3AgentHandler.ts - concurrency waiting and disposal paths.
Prompt for AI agents (all issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/agent/utils/captchaSolver.ts">

<violation number="1" location="packages/core/lib/v3/agent/utils/captchaSolver.ts:72">
P1: Bug: `resolveWait` is a single slot, so concurrent `waitIfSolving()` calls orphan earlier waiters. The second call overwrites the resolver, leaving the first caller's promise unresolved until its 90s timeout. Consider sharing a single deferred promise across all waiters, e.g.:

```ts
private waitPromise: Promise<void> | null = null;

waitIfSolving(): Promise<void> {
  if (!this.solving) return Promise.resolve();
  if (!this.waitPromise) {
    this.waitPromise = new Promise<void>((resolve) => {
      const timer = setTimeout(() => { /* ... */ }, SOLVE_TIMEOUT_MS);
      this.resolveWait = () => { clearTimeout(timer); resolve(); this.waitPromise = null; };
    });
  }
  return this.waitPromise;
}
```</violation>
</file>

<file name="packages/core/lib/v3/handlers/v3AgentHandler.ts">

<violation number="1" location="packages/core/lib/v3/handlers/v3AgentHandler.ts:293">
P2: Resource leak: if `streamText()` throws synchronously, `captchaSolver` is never disposed. The `execute()` method properly wraps this in a `try/finally`, but `stream()` only disposes inside stream callbacks (`onError`/`onFinish`/`onAbort`), which won't fire if `streamText` itself throws. Wrap the `streamText` call and subsequent setup in a try/catch that calls `captchaSolver?.dispose()` on failure.</violation>
</file>
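
The disposal gap in the second violation can be closed along these lines. This is a sketch with hypothetical names: `startStream` stands in for the real `streamText` call, and the callback bag is reduced to the three hooks named in the review.

```ts
// Hypothetical minimal shape of anything that needs cleanup.
interface DisposableSketch {
  dispose(): void;
}

// Hypothetical subset of the stream callbacks named in the review.
interface StreamCallbacksSketch {
  onFinish: () => void;
  onError: () => void;
  onAbort: () => void;
}

function startStreamWithCleanup<T>(
  solver: DisposableSketch | null,
  startStream: (callbacks: StreamCallbacksSketch) => T,
): T {
  let disposed = false;
  const disposeOnce = (): void => {
    if (disposed) return;
    disposed = true;
    solver?.dispose();
  };
  try {
    // Normal path: one of the callbacks disposes the solver later.
    return startStream({
      onFinish: disposeOnce,
      onError: disposeOnce,
      onAbort: disposeOnce,
    });
  } catch (err) {
    // Synchronous failure: no callback will ever fire, so dispose here.
    disposeOnce();
    throw err;
  }
}
```

The `disposeOnce` wrapper keeps cleanup idempotent when more than one callback fires for the same stream.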
Architecture diagram
sequenceDiagram
    participant User as Client/V3
    participant Handler as Agent Handler (DOM/CUA)
    participant Solver as NEW: CaptchaSolver
    participant Page as Browser Page (Browserbase)
    participant LLM as LLM Provider

    User->>Handler: execute(instruction)
    
    opt NEW: isBrowserbase && captchaSolverEnabled
        Handler->>Solver: attach(page)
        Solver->>Page: on("console", listener)
    end

    Note over Handler, LLM: CHANGED: System prompt tells LLM to ignore captchas

    loop Agent Step Loop
        Page-->>Solver: NEW: console("browserbase-solving-started")
        Note right of Solver: solving = true

        Handler->>Handler: prepareStep() / actionHandler()
        
        Handler->>Solver: NEW: waitIfSolving()
        
        alt Solver is active
            Note over Solver: NEW: Block execution (max 90s timeout)
            
            alt Solve Success
                Page-->>Solver: NEW: console("browserbase-solving-finished")
                Solver-->>Handler: Resolve (Resume)
            else Solve Error
                Page-->>Solver: NEW: console("browserbase-solving-errored")
                Solver-->>Handler: Resolve (Resume with error flag)
                Handler->>Handler: Log solver error & resetError()
            else Timeout (90s)
                Solver->>Solver: Internal timeout
                Solver-->>Handler: Resolve (Resume)
            end
        else Solver is idle
            Solver-->>Handler: Resolve immediately
        end

        Handler->>LLM: Request next action
        LLM-->>Handler: Action response
        Handler->>Page: Execute action
    end

    Handler->>Solver: NEW: dispose()
    Solver->>Page: off("console", listener)
    Handler-->>User: Result

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

Comment thread packages/core/lib/v3/agent/utils/captchaSolver.ts Outdated
Comment thread packages/core/lib/v3/handlers/v3AgentHandler.ts
@derekmeegan

@codex review

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: dc13b5a25a

Comment thread packages/core/lib/v3/handlers/v3AgentHandler.ts Outdated
Comment thread packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts Outdated

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 4 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/agent/utils/captchaSolver.ts">

<violation number="1" location="packages/core/lib/v3/agent/utils/captchaSolver.ts:54">
P1: Stale `solving` state after page change: when `ensureAttached()` detects a page switch and detaches the old listener, `this.solving` is not reset. If a solve was in progress on the old page, the agent will block for up to 90 seconds waiting for a finish/error event that can never arrive (since the old page's listener was removed). Reset `solving` and settle pending waiters when detaching from a changed page.</violation>
</file>
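
The fix requested here can be pictured as follows: when the active page differs from the attached one, clear the solving flag and release pending waiters before re-attaching. The sketch is generic over the page type and uses hypothetical names; listener wiring is omitted and `markSolvingStarted()` stands in for receiving the started event.

```ts
// Hypothetical sketch; only the reset-on-page-switch behavior comes from the review.
class PageSwitchResetSketch<P> {
  private attachedPage: P | null = null;
  private solving = false;
  private pendingResolvers: Array<() => void> = [];
  private readonly getActivePage: () => P;

  constructor(getActivePage: () => P) {
    this.getActivePage = getActivePage;
  }

  // Stand-in for receiving "browserbase-solving-started" on the attached page.
  markSolvingStarted(): void {
    this.ensureAttached();
    this.solving = true;
  }

  isSolving(): boolean {
    return this.solving;
  }

  waitIfSolving(): Promise<void> {
    this.ensureAttached();
    if (!this.solving) return Promise.resolve();
    return new Promise<void>((resolve) => {
      this.pendingResolvers.push(() => resolve());
    });
  }

  private ensureAttached(): void {
    const page = this.getActivePage();
    if (page === this.attachedPage) return;
    // The old page can no longer deliver a finish event, so stop waiting on it.
    this.solving = false;
    const resolvers = this.pendingResolvers;
    this.pendingResolvers = [];
    for (const resolve of resolvers) resolve();
    this.attachedPage = page;
  }
}
```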

Comment thread packages/core/lib/v3/agent/utils/captchaSolver.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 1 file (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/agent/utils/captchaSolver.ts">

<violation number="1" location="packages/core/lib/v3/agent/utils/captchaSolver.ts:148">
P2: When `detachListener()` resolves waiting callers via `settle()`, it should also set `_erroredSinceLastConsume = true` to signal the solve was interrupted. Without this, consumers of `consumeSolveResult()` get `{solved: false, errored: false}` and silently skip the error-handling/notification path. The timeout codepath correctly sets `_erroredSinceLastConsume = true` before `settle()` — this should be consistent.</violation>
</file>
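
The consistency point in this violation is easier to see with the result flags isolated. In this sketch the names `consumeSolveResult` and the `{solved, errored}` shape follow the review text; the rest is hypothetical, and timeout and listener detach are both folded into one `onInterrupted()` method.

```ts
interface SolveResultSketch {
  solved: boolean;
  errored: boolean;
}

class SolveResultTrackerSketch {
  private solving = false;
  private solvedSinceLastConsume = false;
  private erroredSinceLastConsume = false;

  onStarted(): void {
    this.solving = true;
  }

  onFinished(): void {
    this.solving = false;
    this.solvedSinceLastConsume = true;
  }

  onErrored(): void {
    this.solving = false;
    this.erroredSinceLastConsume = true;
  }

  // Timeout or listener detach while a solve is in flight counts as errored.
  onInterrupted(): void {
    if (!this.solving) return;
    this.solving = false;
    this.erroredSinceLastConsume = true;
  }

  // Destructive read: flags reset so each outcome is reported only once.
  consumeSolveResult(): SolveResultSketch {
    const result = {
      solved: this.solvedSinceLastConsume,
      errored: this.erroredSinceLastConsume,
    };
    this.solvedSinceLastConsume = false;
    this.erroredSinceLastConsume = false;
    return result;
  }
}
```

Without the flag set in `onInterrupted()`, a consumer would see `{solved: false, errored: false}` and skip its notification path, which is the gap the review points out.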

Comment thread packages/core/lib/v3/agent/utils/captchaSolver.ts

@cubic-dev-ai cubic-dev-ai Bot left a comment

1 issue found across 2 files (changes from recent commits).

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts">

<violation number="1" location="packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts:200">
P1: Existing tests in `agent-hybrid-mode.spec.ts` pass `isBrowserbase: true` (without `solveCaptchas`) and expect captcha content in the prompt. Since the condition now depends on `solveCaptchas`, these tests will break. The tests need to be updated to pass `solveCaptchas: true` and the expected substring should match the new prompt text (e.g., `"automatically detected and solved"` instead of `"automatically be solved"`).</violation>
</file>

Comment thread packages/core/lib/v3/agent/prompts/agentSystemPrompt.ts Outdated
derekmeegan and others added 21 commits March 10, 2026 16:54
Automatically pause agent execution when Browserbase's captcha solver is
active. Listens for browserbase-solving-started/finished/errored console
messages and blocks the agent's prepareStep (DOM/hybrid) or action handler
(CUA) until solving completes, errors, or hits a 90s timeout.

Also updates the system prompt to tell the agent not to interact with
captchas since they are handled transparently.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix race condition: concurrent waitIfSolving() callers now share a
  single deferred promise so no waiter is orphaned.
- Extract settle() helper for consistent timeout/resolve/cleanup paths.
- Fix stream() disposal: wrap streamText() in try/catch so captchaSolver
  is disposed if streamText throws synchronously.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…aptchas

- CaptchaSolver now accepts a page-provider callback and re-attaches
  the console listener whenever the active page changes (popups, new
  tabs). This ensures captcha events are observed on whichever page is
  currently active.
- System prompt captcha messaging is now gated on captchaSolverEnabled
  (solveCaptchas !== false) rather than just isBrowserbase, so sessions
  with solveCaptchas: false don't incorrectly tell the agent that
  captchas are auto-solved.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test was passing isBrowserbase: true but not captchaSolverEnabled,
so the auto-solve captcha messaging wasn't included in the prompt.
Updated test to pass captchaSolverEnabled: true and match new wording.
Added a new test case for isBrowserbase: true with captchaSolverEnabled:
false to verify the solver-off path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… changes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Prevents the agent from blocking up to 90s waiting for solve
events that can never arrive from a detached page.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… enabled

Only includes captcha roadblocks section when isCaptchaSolverEnabled
is true (Browserbase + solveCaptchas !== false).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace isBrowserbase with solveCaptchas in system prompt interface
and update existing tests to match.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add prepareStepHandler hook to AgentClient base class, called at
the top of every CUA step loop iteration. The CUA handler uses
this to block before each LLM call when a captcha solve is in
progress, rather than only blocking at action execution time.

Also appends captcha instructions to the CUA system prompt when
solveCaptchas is enabled.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@shrey150 shrey150 force-pushed the derek/autowait_captchas_on_browserbase branch from 329f8ff to 90e0ed9 Compare March 10, 2026 23:55
@shrey150 shrey150 changed the title from "feat: auto-pause agent during Browserbase captcha solving (WIP)" to "[feat]: auto-pause agent during Browserbase captcha solving" Mar 11, 2026
@shrey150

@greptileai

Comment thread packages/core/lib/v3/agent/OpenAICUAClient.ts Outdated
Comment thread packages/core/lib/v3/agent/OpenAICUAClient.ts Outdated
Comment on lines +157 to +168
  private settle(): void {
    if (this.waitTimer) {
      clearTimeout(this.waitTimer);
      this.waitTimer = null;
    }
    if (this.resolveWait) {
      const resolve = this.resolveWait;
      this.resolveWait = null;
      this.waitPromise = null;
      resolve();
    }
  }

waitPromise cleared before resolve() — add a comment explaining the ordering

this.waitPromise = null is set before resolve() is called. This is intentional (allows new callers to create a fresh waitPromise once resolution is signalled) but can confuse readers into thinking it might orphan existing awaiters. A brief comment would clarify the intent:

Suggested change
-  private settle(): void {
-    if (this.waitTimer) {
-      clearTimeout(this.waitTimer);
-      this.waitTimer = null;
-    }
-    if (this.resolveWait) {
-      const resolve = this.resolveWait;
-      this.resolveWait = null;
-      this.waitPromise = null;
-      resolve();
-    }
-  }
+  /** Resolve the shared wait promise and clear the timeout. */
+  private settle(): void {
+    if (this.waitTimer) {
+      clearTimeout(this.waitTimer);
+      this.waitTimer = null;
+    }
+    if (this.resolveWait) {
+      const resolve = this.resolveWait;
+      this.resolveWait = null;
+      // Clear waitPromise first so new callers can create a fresh promise
+      // once this one is resolved; existing awaiters still hold a reference
+      // to the old promise and will be unblocked by resolve().
+      this.waitPromise = null;
+      resolve();
+    }
+  }

…olves as errored

- CaptchaSolver.detachListener() now sets _erroredSinceLastConsume when
  a solve was in progress, consistent with the timeout path
- OpenAICUAClient shouldContinueWithoutConfirmation is now gated behind
  a captcha context note check, so legitimate model confirmations are
  no longer silently bypassed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread packages/core/lib/v3/agent/OpenAICUAClient.ts Outdated
Comment thread packages/core/lib/v3/agent/OpenAICUAClient.ts Outdated
Comment thread packages/core/lib/v3/handlers/v3AgentHandler.ts
Comment thread packages/core/lib/v3/handlers/v3CuaAgentHandler.ts Outdated

@pirate pirate left a comment

Good feature, agents definitely waste a lot of tokens and race with auto-handling logic implemented by extensions / browser providers currently.

Just a few comments:

  • naming of the fields and helpers like solveCaptchas and setPrepareStepHandler: what about captchasWillAutoSolve and attachPreStagehandStepHook or something?

  • duplication of logic / system prompts / LLM steering code. would it be possible to centralize the system prompt text / hook logic for captchas in one place and then import it in agentSystemPrompt.ts, OpenAICUAClient.ts, v3AgentHandler.ts, v3CuaAgentHandler.ts, and v3.ts?

  • can you add a live integration test that uses a real url with a captcha like https://2captcha.com/demo/recaptcha-v2 (or ask the stealth team for some good test urls) and assert it gets auto-solved by bb prod browser and that the agent doesn't race/conflict with it?

shrey150 and others added 6 commits March 18, 2026 13:25
…n test

Address PR feedback from pirate:
- Rename solveCaptchas → captchasAutoSolve, isCaptchaSolverEnabled → isCaptchaAutoSolveEnabled,
  setPrepareStepHandler → setPreStepHook to better reflect intent
- Centralize captcha notification strings (CAPTCHA_SOLVED_MSG, CAPTCHA_ERRORED_MSG,
  CAPTCHA_SYSTEM_PROMPT_NOTE, CAPTCHA_CUA_SYSTEM_PROMPT_NOTE) in captchaSolver.ts
- Add live integration test against 2captcha.com reCAPTCHA v2 demo (Browserbase-only)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Verify the auto-pause mechanism fires (BB emits solving-started events)
and the agent doesn't click the reCAPTCHA checkbox itself. Don't assert
on BB successfully solving the captcha since that depends on BB infra.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Cloudflare Turnstile: asserts "Captcha is passed successfully!" on 2captcha.com demo
- reCAPTCHA v2: asserts "Verification Success... Hooray!" on Google's official demo
  (same URL the stealth team uses in their test suite)

Both tests verify the full e2e flow: BB detects captcha → auto-solves it →
agent waits via CaptchaSolver mechanism → proceeds without racing the solver.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…tion

Instead of parsing English phrases to detect when the OpenAI CUA model
asks for confirmation after a captcha solve, expose a captchaSolvedProceed
tool that the model calls to confirm it should continue. This is more
reliable (structured tool call vs free-text matching), works in any
language, and has no false-positive risk.

Deletes isFollowUpQuestion() and shouldContinueWithoutConfirmation logic.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
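
The difference between phrase matching and a structured tool call, as described in the commit above, can be shown with a small check. The tool name `captchaSolvedProceed` comes from the commit message; the `ModelOutputItemSketch` shape is a simplified, hypothetical stand-in for the provider's response items.

```ts
// Hypothetical, simplified model output items; real provider shapes differ.
type ModelOutputItemSketch =
  | { type: "message"; text: string }
  | { type: "function_call"; name: string };

const CAPTCHA_PROCEED_TOOL = "captchaSolvedProceed";

// True only when the model explicitly called the proceed tool.
function modelRequestedProceed(items: ModelOutputItemSketch[]): boolean {
  return items.some(
    (item) => item.type === "function_call" && item.name === CAPTCHA_PROCEED_TOOL,
  );
}
```

Free text such as "go ahead" inside a message no longer influences the decision, which is how this approach avoids the false positives flagged earlier in the review.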
Keep captcha solver integration (try/catch, dispose in callbacks,
captchaSolver param in createPrepareStep) while adopting main's
updated providerOptions format and other changes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Comment thread packages/core/lib/v3/v3.ts
@shrey150 shrey150 merged commit c27054b into main Mar 18, 2026
198 checks passed
@github-actions github-actions Bot mentioned this pull request Mar 18, 2026
miguelg719 pushed a commit that referenced this pull request Mar 18, 2026
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @browserbasehq/browse-cli@0.2.0

### Minor Changes

- [#1816](#1816)
[`687d54a`](687d54a)
Thanks [@shrey150](https://github.com/shrey150)! - Add `--context-id`
and `--persist` flags to `browse open` for loading and persisting
Browserbase Contexts across sessions

- [#1793](#1793)
[`e38c13b`](e38c13b)
Thanks [@shrey150](https://github.com/shrey150)! - Initial release of
browse CLI - browser automation for AI agents

### Patch Changes

- [#1806](#1806)
[`f8c7738`](f8c7738)
Thanks [@shrey150](https://github.com/shrey150)! - Fix `browse env`
showing stale mode after `browse env remote`

- Updated dependencies
\[[`505e8c6`](505e8c6),
[`2f43ffa`](2f43ffa),
[`63ee247`](63ee247),
[`7dc35f5`](7dc35f5),
[`335cf47`](335cf47),
[`6ba0a1d`](6ba0a1d),
[`4ff3bb8`](4ff3bb8),
[`c27054b`](c27054b),
[`2abf5b9`](2abf5b9),
[`7817fcc`](7817fcc),
[`7390508`](7390508),
[`611f43a`](611f43a),
[`521a10e`](521a10e),
[`2402a3c`](2402a3c)]:
    -   @browserbasehq/stagehand@3.2.0

## @browserbasehq/stagehand@3.2.0

### Minor Changes

- [#1779](#1779)
[`2f43ffa`](2f43ffa)
Thanks [@shrey150](https://github.com/shrey150)! - feat: add
`cdpHeaders` option to `localBrowserLaunchOptions` for passing custom
HTTP headers when connecting to an existing browser via CDP URL

- [#1834](#1834)
[`63ee247`](63ee247)
Thanks [@tkattkat](https://github.com/tkattkat)! - Update stagehand
agents search tool

- [#1774](#1774)
[`521a10e`](521a10e)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - add new
page.setExtraHTTPHeaders() method

### Patch Changes

- [#1759](#1759)
[`505e8c6`](505e8c6)
Thanks [@shrey150](https://github.com/shrey150)! - Add bedrock to the
provider enum in model configuration schemas and regenerate OpenAPI
spec.

- [#1814](#1814)
[`7dc35f5`](7dc35f5)
Thanks [@tkattkat](https://github.com/tkattkat)! - Change usage of
openai provider in agent to default to store:false

- [#1846](#1846)
[`335cf47`](335cf47)
Thanks [@aq17](https://github.com/aq17)! - Fix streaming finished event
being silently dropped. The final SSE event containing the result
payload (success status, message, actions, usage, and messages) was
previously discarded instead of being yielded to the caller.

- [#1764](#1764)
[`6ba0a1d`](6ba0a1d)
Thanks [@shrey150](https://github.com/shrey150)! - Expose `headers` in
`GoogleVertexProviderSettings` so model configs can pass custom provider
headers (for example `X-Goog-Priority`) without TypeScript errors.

- [#1847](#1847)
[`4ff3bb8`](4ff3bb8)
Thanks [@miguelg719](https://github.com/miguelg719)! - Enable FlowLogger
on BROWSERBASE_FLOW_LOGS=1

- [#1752](#1752)
[`c27054b`](c27054b)
Thanks [@derekmeegan](https://github.com/derekmeegan)! - fix: pause
Browserbase agents while captcha solving is active and improve CUA
recovery after the solve completes

- [#1800](#1800)
[`2abf5b9`](2abf5b9)
Thanks [@shrey150](https://github.com/shrey150)! - Make projectId
optional for Browserbase sessions — only BROWSERBASE_API_KEY is required

- [#1766](#1766)
[`7817fcc`](7817fcc)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add configurable
timeout to tools in agent

- [#1749](#1749)
[`7390508`](7390508)
Thanks [@pirate](https://github.com/pirate)! - When connecting to a
browser session that has zero open tabs, Stagehand now automatically
creates an initial `about:blank` tab so the connection can continue.

- [#1761](#1761)
[`611f43a`](611f43a)
Thanks [@seanmcguire12](https://github.com/seanmcguire12)! - fix issue
where handlePossibleNavigation was producing unnecessary error logs on
clicks that trigger page close

- [#1817](#1817)
[`2402a3c`](2402a3c)
Thanks [@tkattkat](https://github.com/tkattkat)! - Add support for
passing custom headers in clientOptions

## @browserbasehq/stagehand-evals@1.1.9

### Patch Changes

- Updated dependencies
\[[`505e8c6`](505e8c6),
[`2f43ffa`](2f43ffa),
[`63ee247`](63ee247),
[`7dc35f5`](7dc35f5),
[`335cf47`](335cf47),
[`6ba0a1d`](6ba0a1d),
[`4ff3bb8`](4ff3bb8),
[`c27054b`](c27054b),
[`2abf5b9`](2abf5b9),
[`7817fcc`](7817fcc),
[`7390508`](7390508),
[`611f43a`](611f43a),
[`521a10e`](521a10e),
[`2402a3c`](2402a3c)]:
    -   @browserbasehq/stagehand@3.2.0

## @browserbasehq/stagehand-server-v3@3.6.1

### Patch Changes

- [#1759](#1759)
[`505e8c6`](505e8c6)
Thanks [@shrey150](https://github.com/shrey150)! - Add bedrock to the
provider enum in model configuration schemas and regenerate OpenAPI
spec.

- Updated dependencies
\[[`505e8c6`](505e8c6),
[`2f43ffa`](2f43ffa),
[`63ee247`](63ee247),
[`7dc35f5`](7dc35f5),
[`335cf47`](335cf47),
[`6ba0a1d`](6ba0a1d),
[`4ff3bb8`](4ff3bb8),
[`c27054b`](c27054b),
[`2abf5b9`](2abf5b9),
[`7817fcc`](7817fcc),
[`7390508`](7390508),
[`611f43a`](611f43a),
[`521a10e`](521a10e),
[`2402a3c`](2402a3c)]:
    -   @browserbasehq/stagehand@3.2.0

## @browserbasehq/stagehand-server-v4@3.6.1

### Patch Changes

- Updated dependencies
\[[`505e8c6`](505e8c6),
[`2f43ffa`](2f43ffa),
[`63ee247`](63ee247),
[`7dc35f5`](7dc35f5),
[`335cf47`](335cf47),
[`6ba0a1d`](6ba0a1d),
[`4ff3bb8`](4ff3bb8),
[`c27054b`](c27054b),
[`2abf5b9`](2abf5b9),
[`7817fcc`](7817fcc),
[`7390508`](7390508),
[`611f43a`](611f43a),
[`521a10e`](521a10e),
[`2402a3c`](2402a3c)]:
    -   @browserbasehq/stagehand@3.2.0

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>