Bug Description
The DockerResourceMonitor.getNodeResources() method calculates system resource consumption by iterating through all containers. During each iteration, it performs a sequential inspect call to retrieve CPU and memory limits:
for (const container of stats) {
  if (container.State === "running") {
    const c = this.docker.getContainer(container.Id);
    const { HostConfig } = await c.inspect(); // Sequential blocking call
    const cpu = this.resourceParser.cpu(HostConfig.NanoCpus ?? 0);
    const memory = this.resourceParser.memory(HostConfig.Memory ?? 0);
    cpuUsed += cpu;
    memoryUsed += memory;
  }
}
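Because getNodeResources is called frequently to make container placement decisions during scheduling, this loop becomes a latency bottleneck. Each awaited inspect must complete before the next one starts, so the total time grows linearly with the number of running containers. The awaits do not block the Node.js event loop itself, but every caller waiting on getNodeResources is held up for the full chain of calls. On Docker hosts running many containers, these sequential round-trips to the Unix socket can delay a scheduling decision by seconds and add workload placement latency.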
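To make the linear growth concrete, here is a minimal sketch that simulates the sequential pattern. `fakeInspect`, `sumSequential`, and `timeIt` are hypothetical names invented for this illustration, and the fixed latency is an assumed stand-in for a real Docker socket round-trip; none of this is taken from the supervisor code.

```typescript
// Hypothetical stand-in for one inspect round-trip (not the real Docker client API).
const fakeInspect = (latencyMs: number): Promise<{ NanoCpus: number; Memory: number }> =>
  new Promise((resolve) =>
    setTimeout(() => resolve({ NanoCpus: 1e9, Memory: 256 * 1024 * 1024 }), latencyMs)
  );

// Sequential pattern: each await finishes before the next call starts.
async function sumSequential(count: number, latencyMs: number): Promise<number> {
  let nanoCpus = 0;
  for (let i = 0; i < count; i++) {
    const { NanoCpus } = await fakeInspect(latencyMs);
    nanoCpus += NanoCpus;
  }
  return nanoCpus;
}

// Runs an async function and returns the elapsed wall-clock time in milliseconds.
async function timeIt(fn: () => Promise<unknown>): Promise<number> {
  const start = Date.now();
  await fn();
  return Date.now() - start;
}
```

Under these assumptions, `timeIt(() => sumSequential(50, 20))` should take roughly 50 × 20 ms, since no two simulated calls overlap. Real inspect latency varies with host load, so the numbers are only illustrative.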
Root Cause File
apps/supervisor/src/resourceMonitor.ts
Steps to Reproduce
- Start the Trigger.dev supervisor daemon on a machine hosting 50+ active Docker containers.
- Trigger multiple rapid task scheduling executions (wouldFit).
- Notice the scheduling latency in the supervisor daemon as each check waits on sequential Docker socket round-trips.
Proposed Fix
Use Promise.all to fetch container configurations concurrently rather than sequentially:
const runningContainers = stats.filter(container => container.State === "running");
const inspections = await Promise.all(
  runningContainers.map(async (container) => {
    const c = this.docker.getContainer(container.Id);
    const { HostConfig } = await c.inspect();
    return {
      cpu: this.resourceParser.cpu(HostConfig.NanoCpus ?? 0),
      memory: this.resourceParser.memory(HostConfig.Memory ?? 0),
    };
  })
);
for (const inspect of inspections) {
  cpuUsed += inspect.cpu;
  memoryUsed += inspect.memory;
}
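One behavior to keep in mind: Promise.all rejects as soon as any single inspect rejects, for example if a container is removed between being listed and being inspected. The original loop would also throw in that case, so this is not a regression, but a tolerant variant may be worth considering. The sketch below is a self-contained illustration, not the supervisor's code: `ContainerSummary`, `HostConfig`, `Inspect`, and `sumLimits` are hypothetical names, and it sums raw limits instead of going through resourceParser.

```typescript
// Hypothetical minimal shapes; the real types come from the Docker client library.
interface ContainerSummary { Id: string; State: string }
interface HostConfig { NanoCpus?: number; Memory?: number }
type Inspect = (id: string) => Promise<{ HostConfig: HostConfig }>;

// Sums raw limits over running containers, skipping any whose inspect fails.
async function sumLimits(
  containers: ContainerSummary[],
  inspect: Inspect
): Promise<{ nanoCpus: number; memoryBytes: number; failed: number }> {
  const running = containers.filter((c) => c.State === "running");

  // Convert each rejection into null so one failure cannot reject Promise.all.
  const results = await Promise.all(
    running.map((c) => inspect(c.Id).catch(() => null))
  );

  let nanoCpus = 0;
  let memoryBytes = 0;
  let failed = 0;
  for (const result of results) {
    if (result === null) {
      failed += 1; // e.g. the container was removed after it was listed
      continue;
    }
    nanoCpus += result.HostConfig.NanoCpus ?? 0;
    memoryBytes += result.HostConfig.Memory ?? 0;
  }
  return { nanoCpus, memoryBytes, failed };
}
```

Skipping failed inspections can undercount usage if the failure was something other than a removed container, so the sketch returns a `failed` count and leaves the decision to the caller. Whether that trade-off suits the scheduler is a judgment call for the maintainers.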