
feat(scraper-create): auto-backoff on AI-Flow concurrent-job cap (429)#7

Open
anil-bd wants to merge 1 commit into brightdata:main from anil-bd:feat/scraper-create-429-backoff


anil-bd commented May 18, 2026

Bright Data's AI Flow caps concurrent scraper-template generations per account (currently 3, undocumented). When the cap is exceeded, the automate_template POST returns:

429 Cannot run more than 3 jobs in parallel

Today the CLI handles every 429 with one generic schedule: exponential backoff from a 500ms base, capped at ~3.5s of total waiting. That is far too short for this case, since freeing a slot takes 2-11 minutes (one full AI-Flow generation). Users who launch ten parallel `scraper create` invocations see seven of them fail within seconds, leaving seven half-built stub collectors in the dashboard.
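For scale, the worst-case cumulative wait of an exponential schedule is the sum of its per-retry ceilings. The sketch below assumes the existing default makes three retries from its 500ms base (the text above only gives the ~3.5s total) and plugs in the AI-Flow numbers from this PR; `total_max_wait_ms` is a hypothetical helper, not part of the CLI.

```typescript
// Hypothetical helper: upper bound on total sleep time for an exponential
// schedule whose k-th retry (k = 0, 1, ...) waits at most min(max_ms, base_ms * 2^k).
function total_max_wait_ms(base_ms: number, max_ms: number, retries: number): number {
  let total = 0;
  for (let k = 0; k < retries; k++) {
    total += Math.min(max_ms, base_ms * 2 ** k);
  }
  return total;
}

// Existing default (assumed: 3 retries from a 500 ms base): 500 + 1000 + 2000 ms.
const old_total_ms = total_max_wait_ms(500, Infinity, 3);

// AI-Flow schedule in this PR: 30 s base, 240 s ceiling, 4 retries:
// 30 + 60 + 120 + 240 s.
const ai_flow_total_ms = total_max_wait_ms(30_000, 240_000, 4);

console.log(`${old_total_ms / 1000} s`);         // "3.5 s"
console.log(`${ai_flow_total_ms / 60_000} min`); // "7.5 min"
```

The old budget is more than 30x shorter than even the fastest 2-minute slot release. The 7.5-minute budget covers most of the 2-11 minute window, although a slot held by an unusually long generation can still outlast it.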

This change is the first half of PR-11 (from the audit's must-fix list). The second half, programmatic stub cleanup via DELETE /dca/collector/{id}, needs an API endpoint that does not exist yet; that request has been forwarded to the product team.

Mechanism:

  • src/utils/client.ts gains an optional `retry: Retry_config` field on Request_opts. Callers can override max_attempts, base_ms, and max_ms, and can supply an on_retry callback that fires before each sleep. When the field is omitted, the client uses the existing short schedule, so every other command (scrape, search, discover, pipelines, browser) is unaffected.

  • The new compute_backoff() implements jittered exponential backoff with delay ∈ [exp/2, exp] (the variant usually called "equal jitter"; classic full jitter draws from [0, exp]). This spreads out herds of concurrent processes that all receive a 429 on the same tick; without it, ten processes would all back off the same 30s and collide again.

  • src/commands/scraper.ts owns the AI-Flow-specific schedule (base 30s, ceiling 240s, up to 4 retries: 30 + 60 + 120 + 240s = at most 7.5 min of total waiting) and passes it to the automate_template POST via the new `retry` opt. Two new flags expose it:
      --max-retries <n>  override the retry count
      --no-retry         disable retries (fail fast on 429)

  • on_retry prints a status line to stderr before each wait so the user knows the CLI isn't hung: "Hit AI-Flow concurrent-job cap (429). Waiting 32s before retry 1/4..." Non-429 transient errors get a generic line that names the status code.

  • On terminal failure paths that leave a half-built collector (the AI trigger still fails after all retries, polling ends with status != done, or polling throws), print_stub_recovery_note() writes a stderr block that points at the stub's dashboard URL and explains that Bright Data does not yet expose programmatic deletion. This composes with the PR-2 envelope, which also surfaces the view_url in -o.

  • The AI Scraper Studio vocabulary stays in the scraper command; client.ts knows only the generic retry mechanism. Same architectural boundary as PR-12 (per-command hints).
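A minimal sketch of the generic half, `compute_backoff()` plus a retry loop driven by an optional `Retry_config`, assuming the field names listed above. `with_retry`, `status_of`, `DEFAULT_RETRY`, the retryable status set, and the default numbers are illustrative assumptions, not the actual contents of src/utils/client.ts. The sleep function is injectable so the loop can be exercised without real delays.

```typescript
// Shapes mirror the field names in the PR description; everything else is assumed.
interface Retry_info {
  attempt: number;      // 1-based number of the retry about to happen
  max_attempts: number; // total retries allowed
  delay_ms: number;     // how long we sleep before that retry
  status: number;       // HTTP status that triggered the retry, 0 = network error
}

interface Retry_config {
  max_attempts?: number;
  base_ms?: number;
  max_ms?: number;
  on_retry?: (info: Retry_info) => void;
}

// Assumed stand-in for the existing short schedule (500 ms base, ~3.5 s total).
const DEFAULT_RETRY = { max_attempts: 3, base_ms: 500, max_ms: 2_000 };

// Assumed set of transient statuses worth retrying (0 = no response at all).
const RETRYABLE = new Set([0, 429, 500, 502, 503, 504]);

// Jittered exponential backoff: the ceiling doubles per retry up to max_ms,
// and the actual delay is drawn uniformly from [exp / 2, exp].
function compute_backoff(
  retry_index: number,
  base_ms: number,
  max_ms: number,
  rand: () => number = Math.random,
): number {
  const exp = Math.min(max_ms, base_ms * 2 ** retry_index);
  return Math.round(exp / 2 + rand() * (exp / 2));
}

// Pull an HTTP status off a thrown value; anything without one counts as a network error.
function status_of(err: unknown): number {
  const s = (err as { status?: unknown } | null | undefined)?.status;
  return typeof s === "number" ? s : 0;
}

async function with_retry<T>(
  send: () => Promise<T>,
  retry: Retry_config = {},
  sleep: (ms: number) => Promise<void> = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  // Per-call overrides fall back to the short default when omitted.
  const max_attempts = retry.max_attempts ?? DEFAULT_RETRY.max_attempts;
  const base_ms = retry.base_ms ?? DEFAULT_RETRY.base_ms;
  const max_ms = retry.max_ms ?? DEFAULT_RETRY.max_ms;
  for (let i = 0; ; i++) {
    try {
      return await send();
    } catch (err) {
      const status = status_of(err);
      // Give up once retries are exhausted or the error is not transient.
      if (i >= max_attempts || !RETRYABLE.has(status)) throw err;
      const delay_ms = compute_backoff(i, base_ms, max_ms);
      retry.on_retry?.({ attempt: i + 1, max_attempts, delay_ms, status });
      await sleep(delay_ms);
    }
  }
}
```

Callers that omit `retry` get the short default, while the scraper command can pass its own schedule without the client knowing anything about AI Flow. Because each process draws its own delay within [exp/2, exp], processes that hit the cap on the same tick wake at different times.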
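On the command side, the AI-Flow schedule, the two flags, and the status line could be wired together roughly as follows. This is a sketch: the option-object shape (`retry: false` for `--no-retry`, as a Commander-style negatable option would produce), the constant names, and the wording of the non-429 lines are assumptions; only the schedule numbers and the 429 message format come from the text above. Output goes through an injected `write` function instead of stderr directly so the messages are easy to test.

```typescript
// Retry shapes using the field names from the PR description (assumed types).
interface Retry_info { attempt: number; max_attempts: number; delay_ms: number; status: number }
interface Retry_config {
  max_attempts?: number;
  base_ms?: number;
  max_ms?: number;
  on_retry?: (info: Retry_info) => void;
}

// AI-Flow schedule from the PR: base 30 s, ceiling 240 s, 4 retries (7.5 min worst case).
const AI_FLOW_BASE_MS = 30_000;
const AI_FLOW_MAX_MS = 240_000;
const AI_FLOW_MAX_ATTEMPTS = 4;

// Hypothetical parsed-flags shape: --max-retries <n> arrives as a string,
// --no-retry sets retry to false.
interface Scraper_create_opts { max_retries?: string; retry?: boolean }

function parse_max_retries(raw: string | undefined): number {
  if (raw === undefined) return AI_FLOW_MAX_ATTEMPTS;
  const trimmed = raw.trim();
  // Accept only non-negative integers: rejects "-1", "1.5", "abc", and "".
  if (!/^\d+$/.test(trimmed)) {
    throw new Error(`--max-retries expects a non-negative integer, got "${raw}"`);
  }
  return Number(trimmed);
}

function build_ai_trigger_retry(
  opts: Scraper_create_opts,
  write: (line: string) => void,
): Retry_config {
  // --no-retry wins over --max-retries and means "fail on the first 429".
  const max_attempts = opts.retry === false ? 0 : parse_max_retries(opts.max_retries);
  return {
    max_attempts,
    base_ms: AI_FLOW_BASE_MS,
    max_ms: AI_FLOW_MAX_MS,
    on_retry: ({ attempt, max_attempts: total, delay_ms, status }) => {
      const wait = `Waiting ${Math.round(delay_ms / 1000)}s before retry ${attempt}/${total}...`;
      if (status === 429) write(`Hit AI-Flow concurrent-job cap (429). ${wait}`);
      else if (status === 0) write(`Network error (no response). ${wait}`);
      else write(`Transient error (HTTP ${status}). ${wait}`);
    },
  };
}
```

With `--no-retry` the config still goes through the same client path, just with max_attempts = 0, so the first 429 surfaces immediately and the AI Scraper Studio wording never leaks into the generic client.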
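The stub-recovery path can be sketched the same way. `print_stub_recovery_note()` is named in the PR, but the signature and wording here are assumptions; the dashboard URL is passed in (for example the view_url that the PR-2 envelope already surfaces) instead of being constructed, since its format is not given here. `run_with_stub_note` is a hypothetical wrapper that shows the three failure paths: the AI trigger fails after retries, polling ends with status != done, or polling throws.

```typescript
type Poll_result = { status: string };

// Writes a short recovery block; does nothing when no collector was created.
function print_stub_recovery_note(
  collector_id: string,
  view_url: string,
  write: (line: string) => void,
): void {
  if (!collector_id) return;
  write("");
  write(`A half-built collector (${collector_id}) was left in your account.`);
  write(`Review or delete it in the dashboard: ${view_url}`);
  write("Bright Data does not yet expose programmatic deletion of collectors.");
}

// Hypothetical wrapper: any outcome other than a poll result of "done"
// (trigger failure, non-done status, or a polling exception) prints the note.
// Exceptions are still re-thrown after the note is written.
async function run_with_stub_note(
  collector_id: string,
  view_url: string,
  trigger: () => Promise<void>,
  poll: () => Promise<Poll_result>,
  write: (line: string) => void,
): Promise<boolean> {
  let done = false;
  try {
    await trigger();
    done = (await poll()).status === "done";
    return done;
  } finally {
    if (!done) print_stub_recovery_note(collector_id, view_url, write);
  }
}
```

Using `finally` keeps the note on every non-success exit without swallowing the original error, so the command can still report it and exit non-zero.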

Tests:

  • 4 unit tests for client.compute_backoff (exponential growth, max_ms ceiling, jitter range distribution, default constants).
  • 6 unit tests for build_ai_trigger_retry (default schedule, --max-retries override, --no-retry → max_attempts=0, on_retry emits 429-specific line, on_retry emits generic transient line, on_retry handles status=0 network error).
  • 3 unit tests for parse_max_retries (default, non-negative integers, rejects negatives/floats/non-numeric).
  • 2 unit tests for print_stub_recovery_note (content + empty-id guard).
  • 6 command-level integration tests covering: retry config flows to post, --max-retries respected, --no-retry → 0, stub-recovery note emitted on AI-trigger failure, on poll status != done, and on polling exception.

66 / 66 tests in affected files pass. The 9 pre-existing failures in unrelated suites (daemon, add-mcp, browser, discover, scrape) on main are unchanged.

What is NOT in this PR (split out as a follow-up server-side ask):

  • Programmatic stub deletion (needs DELETE /dca/collector/{id}).
  • Pre-emptive rejection at the template POST step when the user is already at the cap (avoids stub creation entirely; cleaner than client-side cleanup).

Both items are filed in the skills repo proposal
skills/scraper-studio/proposals/PR-11-backoff.md.
