fix(worker-ops): preserve queue tracking and prevent scheduler recovery stalls #65

Draft
liobrasil wants to merge 3 commits into navid/new-worker-arch from fix/worker-ops-reliability

liobrasil (Collaborator) commented Feb 26, 2026

Summary

This PR fixes three worker-ops reliability issues in FlowYieldVaultsEVMWorkerOps.

Commit Breakdown

  1. 5350c4c recover failed workers without scheduler stalls
  • Prevents scheduler stalls: when capacity is saturated, failed-worker recovery now runs before the capacity gate.
  • Recomputes the in-flight count after recovery so stale tracked entries do not block progress.
  2. a5db0b3 retain tracked requests when fail-marking fails
  • stopAll(): removes scheduledRequests entries only if markRequestAsFailed(...) succeeds.
  • _checkForFailedWorkerRequests(): keeps entries tracked when fail-marking fails, so recovery can retry instead of losing queue ownership.
  3. 412dac2 keep tracking on request lookup failures
  • WorkerHandler.executeTransaction() now:
    • removes tracking after normal, successful processing;
    • if the lookup fails but a tracked payload exists, attempts markRequestAsFailed(...) with the tracked request data;
    • removes tracking only if fail-marking succeeds; otherwise retains tracking for scheduler recovery.
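The tracking rules from the second and third commits can be modeled outside Cadence. The following is a minimal Python sketch, not the contract code: the dictionary `scheduled` stands in for scheduledRequests, and the callables `lookup`, `process`, and `mark_failed` are hypothetical stand-ins for the request lookup, normal processing, and markRequestAsFailed(...), which is assumed here to report success as a boolean.

```python
from typing import Callable, Dict, Optional


def stop_all(scheduled: Dict[int, str],
             mark_failed: Callable[[int, str], bool]) -> None:
    """Drop a tracked entry only when fail-marking succeeds."""
    for request_id in list(scheduled):  # copy the keys: we mutate while iterating
        if mark_failed(request_id, scheduled[request_id]):
            del scheduled[request_id]
        # Otherwise the entry stays tracked so recovery can retry it.


def execute_transaction(scheduled: Dict[int, str],
                        request_id: int,
                        lookup: Callable[[int], Optional[str]],
                        process: Callable[[str], None],
                        mark_failed: Callable[[int, str], bool]) -> str:
    """Return what happened to the tracking entry for request_id."""
    request = lookup(request_id)
    if request is not None:
        process(request)
        scheduled.pop(request_id, None)  # normal path: stop tracking
        return "processed"
    tracked = scheduled.get(request_id)
    if tracked is None:
        return "untracked"  # lookup failed and there is no payload to fall back on
    if mark_failed(request_id, tracked):
        del scheduled[request_id]  # fail-marking succeeded, safe to drop
        return "marked-failed"
    return "retained"  # left in place for scheduler recovery
```

In this model an entry leaves `scheduled` only once its request has either been processed or been marked as failed, which is the queue-ownership property the two commits aim for.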

Why

Before these fixes, a request could remain PROCESSING on EVM after its Cadence-side tracking had been dropped, and the ordering of the scheduler's capacity check and failed-worker recovery could stall the handling of invalid or failing requests.
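The ordering problem can be illustrated with a small model of one scheduler pass. This is a hedged Python sketch under stated assumptions rather than the Cadence implementation: `tracked` is a set of in-flight request ids, and `max_in_flight`, `is_failed`, and `recover` are hypothetical names, with `recover` assumed to return whether fail-marking succeeded.

```python
from typing import Callable, Set


def free_slots_after_pass(tracked: Set[int],
                          max_in_flight: int,
                          is_failed: Callable[[int], bool],
                          recover: Callable[[int], bool]) -> int:
    """Run recovery first when saturated, then apply the capacity gate."""
    in_flight = len(tracked)
    if in_flight >= max_in_flight:
        # Saturated: try to reclaim slots held by failed workers.
        for request_id in sorted(tracked):  # sorted() copies, so discarding is safe
            if is_failed(request_id) and recover(request_id):
                tracked.discard(request_id)
        # Recompute so stale entries no longer count against capacity.
        in_flight = len(tracked)
    return max(0, max_in_flight - in_flight)
```

If the capacity gate ran first in this model, a saturated set made up of failed workers would report zero free slots on every pass and recovery would never get a chance to run.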

Scope

  • Cadence only: cadence/contracts/FlowYieldVaultsEVMWorkerOps.cdc

liobrasil marked this pull request as draft February 26, 2026 00:27
liobrasil force-pushed the navid/new-worker-arch branch from dbb5672 to db29997 February 26, 2026 02:29
liobrasil changed the title from "fix(worker-ops): preserve queue tracking on fail-mark and lookup failures" to "fix(worker-ops): preserve queue tracking and prevent scheduler recovery stalls" Feb 26, 2026
liobrasil force-pushed the fix/worker-ops-reliability branch from 7ba7ee6 to 412dac2 February 26, 2026 02:49
