Description
When exactMatch: true is configured (a subset check: every job label must exist in the runner pool's label set), a workflow job that requests only a few labels can match several runner pools at once, because each pool's label set contains all of the requested labels.
The current dispatch logic iterates matcherConfig without a stable sort, so which pool receives the SQS message is non-deterministic. In practice, jobs can land on the most expensive pool when a cheaper one would have sufficed.
Concrete example
Pool configuration
Each pool has its own unique labels, but they all share a common subset:
| Pool | Labels | Instance |
| --- | --- | --- |
| small-spot | [self-hosted, linux, x64, small, spot] | 2 vCPU c5.large spot |
| medium-spot | [self-hosted, linux, x64, medium, spot] | 4 vCPU c5.xlarge spot |
| large-ondemand | [self-hosted, linux, x64, large, ondemand] | 16 vCPU c5.4xlarge on-demand |
Workflow A — requests specific pool (works today)
runs-on: [self-hosted, linux, x64, small, spot]
Only matches small-spot because exactMatch: true requires all job labels to be present in the pool's label set. ✅ Deterministic.
Workflow B — requests just the common subset (broken today)
runs-on: [self-hosted, linux, x64]
All three pools contain [self-hosted, linux, x64] in their label set, so all three match. Which pool gets the message depends on iteration order of matcherConfig — non-deterministic. The job might land on large-ondemand (16 vCPU, on-demand pricing) when small-spot would have been sufficient.
This pattern is common: teams define a "baseline" runs-on for simple CI jobs and only add specific labels (e.g. large) when they actually need more resources.
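The ambiguity can be reproduced with a small sketch of the subset check described above. The `Pool` type and the `matchesPool` and `matchingPools` helpers are hypothetical names used for illustration, not the module's actual code; only the pool names and labels come from the table.

```typescript
// Hypothetical types and helpers for illustration; not the module's real API.
interface Pool {
  name: string;
  labels: string[];
}

const pools: Pool[] = [
  { name: "small-spot", labels: ["self-hosted", "linux", "x64", "small", "spot"] },
  { name: "medium-spot", labels: ["self-hosted", "linux", "x64", "medium", "spot"] },
  { name: "large-ondemand", labels: ["self-hosted", "linux", "x64", "large", "ondemand"] },
];

// Subset check: every label requested by the job must be in the pool's label set.
function matchesPool(jobLabels: string[], pool: Pool): boolean {
  const poolLabels = new Set(pool.labels);
  return jobLabels.every((label) => poolLabels.has(label));
}

// Names of all pools that pass the subset check for a given runs-on list.
function matchingPools(jobLabels: string[]): string[] {
  return pools.filter((pool) => matchesPool(jobLabels, pool)).map((pool) => pool.name);
}

const workflowA = matchingPools(["self-hosted", "linux", "x64", "small", "spot"]);
const workflowB = matchingPools(["self-hosted", "linux", "x64"]);

console.log(workflowA); // a single candidate: small-spot
console.log(workflowB); // all three pools are candidates
```

With more than one candidate for Workflow B, the pool that receives the job is decided only by the order in which the candidates are visited.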
Key assumption
The omission of labels is intentional — the requester doesn't care which specific pool handles the job, they just want any runner that matches their request. A workflow using runs-on: [self-hosted, linux, x64] is saying "I need a Linux x64 runner" without expressing a preference for size or pricing. The priority field lets the operator decide which pool should be preferred for these ambiguous requests (e.g. cheapest first).
Why this matters
- Cost: Jobs silently route to expensive pools
- Capacity: Cheap spot pools sit idle while expensive on-demand pools scale up unnecessarily
- Predictability: Same workflow can use different instance types across runs depending on config iteration order
Suggested fix
Add an optional priority field to MatcherConfig (lower = preferred; when omitted, a high default such as 999 applies). Sort pools by (exactMatch DESC, priority ASC) before dispatching. This is additive: existing configs without priority behave identically.
// sqs/index.ts
export interface MatcherConfig {
  labelMatchers: string[][];
  exactMatch: boolean;
  priority?: number; // lower = preferred
}
// dispatch.ts — sort before matching
matcherConfig.sort((a, b) => {
  if (a.matcherConfig.exactMatch !== b.matcherConfig.exactMatch) {
    return a.matcherConfig.exactMatch ? -1 : 1;
  }
  return (a.matcherConfig.priority ?? 999) - (b.matcherConfig.priority ?? 999);
});
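One way to sanity-check the "additive" claim: since ECMAScript 2019, `Array.prototype.sort` is required to be stable, so entries that compare equal keep their relative order. The sketch below assumes a wrapper object shaped like the snippet above. The `RunnerMatcherConfig` name, the `id` field, and the `sortMatchers` helper are hypothetical and exist only to make the ordering visible.

```typescript
interface MatcherConfig {
  labelMatchers: string[][];
  exactMatch: boolean;
  priority?: number; // lower = preferred
}

// Hypothetical wrapper; `id` is only here so the resulting order can be inspected.
interface RunnerMatcherConfig {
  id: string;
  matcherConfig: MatcherConfig;
}

const DEFAULT_PRIORITY = 999; // same fallback as in the snippet above

// exactMatch entries first, then ascending priority.
function compareMatchers(a: RunnerMatcherConfig, b: RunnerMatcherConfig): number {
  if (a.matcherConfig.exactMatch !== b.matcherConfig.exactMatch) {
    return a.matcherConfig.exactMatch ? -1 : 1;
  }
  return (
    (a.matcherConfig.priority ?? DEFAULT_PRIORITY) -
    (b.matcherConfig.priority ?? DEFAULT_PRIORITY)
  );
}

// Sort a copy so the caller's array is left untouched.
function sortMatchers(configs: RunnerMatcherConfig[]): RunnerMatcherConfig[] {
  return [...configs].sort(compareMatchers);
}

// Existing configs: no priority anywhere, so every comparison is a tie.
const legacy: RunnerMatcherConfig[] = [
  { id: "c", matcherConfig: { labelMatchers: [["linux"]], exactMatch: true } },
  { id: "a", matcherConfig: { labelMatchers: [["linux"]], exactMatch: true } },
  { id: "b", matcherConfig: { labelMatchers: [["linux"]], exactMatch: true } },
];
const legacyOrder = sortMatchers(legacy).map((c) => c.id); // ["c", "a", "b"]

// Adding one entry with an explicit priority moves only that entry.
const mixed: RunnerMatcherConfig[] = [
  ...legacy,
  { id: "d", matcherConfig: { labelMatchers: [["linux"]], exactMatch: true, priority: 1 } },
];
const mixedOrder = sortMatchers(mixed).map((c) => c.id); // ["d", "c", "a", "b"]
```

Sorting a copy rather than calling `sort` on the shared array is a design choice of this sketch; it avoids reordering configuration that other code may still be reading.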
Result with priority
| Pool | Labels | Priority | Instance |
| --- | --- | --- | --- |
| small-spot | [self-hosted, linux, x64, small, spot] | 1 | 2 vCPU spot |
| medium-spot | [self-hosted, linux, x64, medium, spot] | 5 | 4 vCPU spot |
| large-ondemand | [self-hosted, linux, x64, large, ondemand] | 100 | 16 vCPU on-demand |
runs-on: [self-hosted, linux, x64] → routes to small-spot (priority 1, cheapest match)
runs-on: [self-hosted, linux, x64, large, ondemand] → routes to large-ondemand (only match)
runs-on: [self-hosted, linux, x64, medium, spot] → routes to medium-spot (only match)
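The three routing results above can be reproduced end to end with a minimal sketch that sorts by (exactMatch, priority) and takes the first pool passing the subset check. The `PoolConfig` shape and the `route` function are hypothetical simplifications, not the module's dispatch code; the labels and priorities are taken from the table.

```typescript
// Hypothetical, flattened pool shape for illustration only.
interface PoolConfig {
  name: string;
  labels: string[];
  exactMatch: boolean;
  priority?: number; // lower = preferred
}

// Deliberately listed most expensive first, to show config order no longer decides.
const poolConfigs: PoolConfig[] = [
  { name: "large-ondemand", labels: ["self-hosted", "linux", "x64", "large", "ondemand"], exactMatch: true, priority: 100 },
  { name: "medium-spot", labels: ["self-hosted", "linux", "x64", "medium", "spot"], exactMatch: true, priority: 5 },
  { name: "small-spot", labels: ["self-hosted", "linux", "x64", "small", "spot"], exactMatch: true, priority: 1 },
];

// Sort by (exactMatch DESC, priority ASC), then pick the first pool whose
// label set contains every requested label.
function route(jobLabels: string[], configs: PoolConfig[]): string | undefined {
  const sorted = [...configs].sort((a, b) => {
    if (a.exactMatch !== b.exactMatch) {
      return a.exactMatch ? -1 : 1;
    }
    return (a.priority ?? 999) - (b.priority ?? 999);
  });
  const winner = sorted.find((pool) => jobLabels.every((label) => pool.labels.includes(label)));
  return winner?.name;
}

const baseline = route(["self-hosted", "linux", "x64"], poolConfigs);
const large = route(["self-hosted", "linux", "x64", "large", "ondemand"], poolConfigs);
const medium = route(["self-hosted", "linux", "x64", "medium", "spot"], poolConfigs);

console.log(baseline); // small-spot
console.log(large); // large-ondemand
console.log(medium); // medium-spot
```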
Environment
- Module version: 7.x (multi-runner)
- Webhook-based dispatch (not runner scale sets)