Webhook dispatch uses unstable sort when multiple pools match the same labels #5128

@vegardx

Description

When exactMatch: true is configured, matching is a subset check: every job label must exist in the runner pool's label set. A workflow job that requests only a subset of labels can therefore match multiple runner pools whose label sets all contain those labels.

The current dispatch logic iterates matcherConfig without a stable sort, so which pool receives the SQS message is non-deterministic. In practice, jobs can land on the most expensive pool when a cheaper one would have sufficed.

Concrete example

Pool configuration

Each pool has its own unique labels, but they all share a common subset:

| Pool | Labels | Instance |
| --- | --- | --- |
| small-spot | [self-hosted, linux, x64, small, spot] | 2 vCPU c5.large spot |
| medium-spot | [self-hosted, linux, x64, medium, spot] | 4 vCPU c5.xlarge spot |
| large-ondemand | [self-hosted, linux, x64, large, ondemand] | 16 vCPU c5.4xlarge on-demand |

Workflow A — requests specific pool (works today)

runs-on: [self-hosted, linux, x64, small, spot]

Only matches small-spot because exactMatch: true requires all job labels to be present in the pool's label set. ✅ Deterministic.
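The subset check described above can be sketched as follows. This is a minimal illustration, not the module's actual implementation: `jobLabelsFitPool` and `poolLabelSets` are hypothetical names, and the label sets are copied from the pool table.

```typescript
// Hypothetical helper for illustration; not the module's actual implementation.
// Returns true when every label the job requests is present in the pool's label set.
function jobLabelsFitPool(jobLabels: string[], poolLabels: string[]): boolean {
  const poolSet = new Set(poolLabels);
  return jobLabels.every((label) => poolSet.has(label));
}

// Pool label sets copied from the pool configuration table.
const poolLabelSets: { name: string; labels: string[] }[] = [
  { name: "small-spot", labels: ["self-hosted", "linux", "x64", "small", "spot"] },
  { name: "medium-spot", labels: ["self-hosted", "linux", "x64", "medium", "spot"] },
  { name: "large-ondemand", labels: ["self-hosted", "linux", "x64", "large", "ondemand"] },
];

const workflowALabels = ["self-hosted", "linux", "x64", "small", "spot"];

// Names of all pools whose label set contains every Workflow A label.
const workflowAMatches = poolLabelSets
  .filter((pool) => jobLabelsFitPool(workflowALabels, pool.labels))
  .map((pool) => pool.name);
// workflowAMatches is ["small-spot"]: medium-spot lacks "small",
// and large-ondemand lacks both "small" and "spot".
```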

Workflow B — requests just the common subset (broken today)

runs-on: [self-hosted, linux, x64]

All three pools contain [self-hosted, linux, x64] in their label set, so all three match. Which pool gets the message depends on the iteration order of matcherConfig, which is non-deterministic. The job might land on large-ondemand (16 vCPU, on-demand pricing) when small-spot would have been sufficient.
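To make the order dependence concrete, here is a simplified first-match dispatch. It is a sketch under assumptions: the real dispatch code is not reproduced here, and `LabelPool`, `poolAccepts`, `firstMatchingPool`, and `makePool` are hypothetical names. Only the `labelMatchers` and `exactMatch` fields come from this issue.

```typescript
interface MatcherConfig {
  labelMatchers: string[][];
  exactMatch: boolean;
}

// Hypothetical wrapper: `id` is an illustrative field, not taken from the module.
interface LabelPool {
  id: string;
  matcherConfig: MatcherConfig;
}

// Subset check: every job label must appear in at least one of the pool's label sets.
function poolAccepts(job: string[], pool: LabelPool): boolean {
  return pool.matcherConfig.labelMatchers.some((labels) =>
    job.every((label) => labels.includes(label)),
  );
}

// Simplified dispatch: the first matching pool in iteration order wins.
function firstMatchingPool(job: string[], pools: LabelPool[]): string | undefined {
  return pools.find((pool) => poolAccepts(job, pool))?.id;
}

const makePool = (id: string, size: string, pricing: string): LabelPool => ({
  id,
  matcherConfig: {
    labelMatchers: [["self-hosted", "linux", "x64", size, pricing]],
    exactMatch: true,
  },
});

const smallPool = makePool("small-spot", "small", "spot");
const mediumPool = makePool("medium-spot", "medium", "spot");
const largePool = makePool("large-ondemand", "large", "ondemand");

const baselineJob = ["self-hosted", "linux", "x64"];

// Same pools, same job, different iteration order, different destination.
const cheapestFirst = firstMatchingPool(baselineJob, [smallPool, mediumPool, largePool]); // "small-spot"
const expensiveFirst = firstMatchingPool(baselineJob, [largePool, mediumPool, smallPool]); // "large-ondemand"
```

A fully specified request such as Workflow A is unaffected by the order, because only one pool passes the subset check.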

This pattern is common: teams define a "baseline" runs-on for simple CI jobs and only add specific labels (e.g. large) when they actually need more resources.

Key assumption

The omission of labels is intentional — the requester doesn't care which specific pool handles the job, they just want any runner that matches their request. A workflow using runs-on: [self-hosted, linux, x64] is saying "I need a Linux x64 runner" without expressing a preference for size or pricing. The priority field lets the operator decide which pool should be preferred for these ambiguous requests (e.g. cheapest first).

Why this matters

  • Cost: Jobs silently route to expensive pools
  • Capacity: Cheap spot pools sit idle while expensive on-demand pools scale up unnecessarily
  • Predictability: Same workflow can use different instance types across runs depending on config iteration order

Suggested fix

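Add an optional priority field to MatcherConfig (lower = preferred; unset values fall back to a high default, 999 in the snippet below). Sort pools by (exactMatch DESC, priority ASC) before dispatching. This is additive: configs without priority all tie on the default, and because Array.prototype.sort is stable (ES2019 and later), they keep their current relative order.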

// sqs/index.ts
export interface MatcherConfig {
  labelMatchers: string[][];
  exactMatch: boolean;
  priority?: number; // lower = preferred
}

// dispatch.ts — sort before matching
matcherConfig.sort((a, b) => {
  if (a.matcherConfig.exactMatch !== b.matcherConfig.exactMatch) {
    return a.matcherConfig.exactMatch ? -1 : 1;
  }
  return (a.matcherConfig.priority ?? 999) - (b.matcherConfig.priority ?? 999);
});

Result with priority

| Pool | Labels | Priority | Instance |
| --- | --- | --- | --- |
| small-spot | [self-hosted, linux, x64, small, spot] | 1 | 2 vCPU spot |
| medium-spot | [self-hosted, linux, x64, medium, spot] | 5 | 4 vCPU spot |
| large-ondemand | [self-hosted, linux, x64, large, ondemand] | 100 | 16 vCPU on-demand |

  • runs-on: [self-hosted, linux, x64] → routes to small-spot (priority 1, cheapest match)
  • runs-on: [self-hosted, linux, x64, large, ondemand] → routes to large-ondemand (only match)
  • runs-on: [self-hosted, linux, x64, medium, spot] → routes to medium-spot (only match)
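The routing results listed above can be checked with a small end-to-end sketch. It reuses the comparator from the suggested fix; everything else (`PrioritizedPool`, `sortPools`, `routeJob`, `makePrioritizedPool`, and the first-match dispatch itself) is a hypothetical simplification rather than the module's code.

```typescript
interface MatcherConfig {
  labelMatchers: string[][];
  exactMatch: boolean;
  priority?: number; // lower = preferred
}

// Hypothetical wrapper: `id` is an illustrative field, not taken from the module.
interface PrioritizedPool {
  id: string;
  matcherConfig: MatcherConfig;
}

const DEFAULT_PRIORITY = 999; // same fallback as in the suggested fix

// Sorts a copy so the caller's array is left untouched. Array.prototype.sort is
// stable (ES2019+), so pools that tie keep their configured order.
function sortPools(pools: PrioritizedPool[]): PrioritizedPool[] {
  return [...pools].sort((a, b) => {
    if (a.matcherConfig.exactMatch !== b.matcherConfig.exactMatch) {
      return a.matcherConfig.exactMatch ? -1 : 1;
    }
    return (
      (a.matcherConfig.priority ?? DEFAULT_PRIORITY) -
      (b.matcherConfig.priority ?? DEFAULT_PRIORITY)
    );
  });
}

// Simplified dispatch: sort, then take the first pool that passes the subset check.
function routeJob(job: string[], pools: PrioritizedPool[]): string | undefined {
  return sortPools(pools).find((pool) =>
    pool.matcherConfig.labelMatchers.some((labels) =>
      job.every((label) => labels.includes(label)),
    ),
  )?.id;
}

const makePrioritizedPool = (
  id: string,
  size: string,
  pricing: string,
  priority?: number,
): PrioritizedPool => ({
  id,
  matcherConfig: {
    labelMatchers: [["self-hosted", "linux", "x64", size, pricing]],
    exactMatch: true,
    ...(priority === undefined ? {} : { priority }),
  },
});

// Deliberately configured most expensive first.
const prioritized = [
  makePrioritizedPool("large-ondemand", "large", "ondemand", 100),
  makePrioritizedPool("medium-spot", "medium", "spot", 5),
  makePrioritizedPool("small-spot", "small", "spot", 1),
];

const baselineRoute = routeJob(["self-hosted", "linux", "x64"], prioritized); // "small-spot"
const largeRoute = routeJob(["self-hosted", "linux", "x64", "large", "ondemand"], prioritized); // "large-ondemand"
const mediumRoute = routeJob(["self-hosted", "linux", "x64", "medium", "spot"], prioritized); // "medium-spot"

// Without priority, every pool ties on the default and the configured order is kept.
const unprioritized = [
  makePrioritizedPool("large-ondemand", "large", "ondemand"),
  makePrioritizedPool("medium-spot", "medium", "spot"),
  makePrioritizedPool("small-spot", "small", "spot"),
];
const legacyRoute = routeJob(["self-hosted", "linux", "x64"], unprioritized); // "large-ondemand"
```

The last case shows the additive behavior: with no priority set, the sort changes nothing, so existing configs route exactly as their configured order dictates.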

Environment

  • Module version: 7.x (multi-runner)
  • Webhook-based dispatch (not runner scale sets)
