Problem
SWT-Bench image build workflows are hanging indefinitely at the "Build and push SWT-Bench images" step, blocking evaluation runs.
Evidence
Stuck SWT-Bench Builds
| Run ID | Started | Duration | Expected | Status | Runner |
| --- | --- | --- | --- | --- | --- |
| 21754330186 | 14:37 UTC | 2h 52m+ | ~10 min | 🚫 Stuck | blacksmith-01kgsp5fsx8n9gacjyqhx4cmxk-32vcpu |
| 21755926698 | 15:27 UTC | 2h 2m+ | ~10 min | 🚫 Stuck | blacksmith-01kgss09n9rdbzzsaf6kv4nkvh-32vcpu |
Configuration for stuck builds:
- Stuck at step: "Build and push SWT-Bench images"
- No progress updates since start time (no heartbeats/updates from runner)
- Commit: 680ce0f564174aecd74394ef083bd421e6dbe5e1
- Max workers: 4
- Dataset: eth-sri/SWT-bench_Verified_bm25_27k_zsp
- Split: test
- Target: source-minimal
Last Successful SWT-Bench Build
| Run ID | Started | Duration | Status |
| --- | --- | --- | --- |
| 21753220072 | 14:01 UTC | 9m 42s | ✅ Success |
Configuration for successful build:
- Commit: 680ce0f564174aecd74394ef083bd421e6dbe5e1 (same)
- Max workers: 4 (same)
- Dataset: eth-sri/SWT-bench_Verified_bm25_27k_zsp (same)
- Split: test (same)
- Target: source-minimal (same)
- Completed normally in expected time
Timeline
14:01 UTC - SWT-Bench build 21753220072 starts
14:12 UTC - SWT-Bench build 21753220072 completes (9m 42s) ✅
[25 minute gap]
14:37 UTC - SWT-Bench build 21754330186 starts
- Gets stuck immediately, no progress
- Still running 2h 52m+ later
15:27 UTC - SWT-Bench build 21755926698 starts
- Gets stuck immediately, no progress
- Still running 2h 2m+ later
Something changed between 14:12 UTC and 14:37 UTC that causes builds to freeze.
Impact
Two evaluation pods are blocked waiting for SWT-Bench builds:
| Pod | Model | Benchmark | Waiting For | Time Wasted |
| --- | --- | --- | --- | --- |
| eval-21754233398-claude-4-6-mr4zr | Claude Opus 4.6 | swtbench | Run 21754330186 | 2h 52m+ |
| eval-21755837737-claude-son-9gsdl | Claude Sonnet 4.5 | swtbench | Run 21755926698 | 2h 2m+ |
These pods are stuck polling for build completion every 60 seconds and cannot start evaluation.
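The pods wait forever because their polling loop has no deadline. A minimal sketch of a bounded wait is shown below; `wait_for_build`, `get_status`, and the 30-minute limit are hypothetical names and values chosen for illustration, not taken from the eval pod code.

```python
import time
from typing import Callable


def wait_for_build(
    get_status: Callable[[], str],
    poll_seconds: float = 60.0,
    max_wait_seconds: float = 1800.0,  # assumed limit, not from the real pods
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the build leaves a pending state or the deadline passes."""
    deadline = clock() + max_wait_seconds
    while True:
        status = get_status()
        if status not in ("queued", "in_progress"):
            return status  # build reached a terminal state
        if clock() >= deadline:
            raise TimeoutError(
                f"build still '{status}' after {max_wait_seconds:.0f}s"
            )
        sleep(poll_seconds)
```

With a deadline in place, a hung build surfaces as a failed pod rather than one that polls indefinitely.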
Analysis
What's Identical Between Working and Stuck Builds
- ✅ Same code commit: 680ce0f564174aecd74394ef083bd421e6dbe5e1
- ✅ Same workflow file (no changes)
- ✅ Same configuration (max-workers, dataset, split, target)
- ✅ Same runner type (Blacksmith 32vCPU Ubuntu 22.04)
- ✅ Same Docker/BuildKit setup
The only observed difference is timing: builds started after 14:12 UTC freeze.
Evidence of Complete Freeze
GitHub Actions API shows no progress updates:
Stuck build 21754330186:
{
"status": "in_progress",
"started_at": "2026-02-06T14:37:57Z",
"completed_at": null,
"updated_at": "2026-02-06T14:37:29Z" // No updates in 2h 52m+!
}
Stuck build 21755926698:
{
"status": "in_progress",
"started_at": "2026-02-06T15:27:48Z",
"completed_at": null,
"updated_at": "2026-02-06T15:27:16Z" // No updates in 2h 2m+!
}
The absence of heartbeat updates indicates a complete freeze, not slow progress.
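The same check can be automated. The sketch below flags a run as stalled when it is still `in_progress` but its `updated_at` is older than a threshold; it assumes the field names shown in the excerpts above, and `is_stalled` and the 30-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a timestamp such as '2026-02-06T14:37:29Z' as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_stalled(
    run: dict,
    now: datetime,
    max_silence: timedelta = timedelta(minutes=30),  # assumed threshold
) -> bool:
    """True if the run is in progress but has not been updated recently."""
    if run.get("status") != "in_progress":
        return False
    return now - parse_ts(run["updated_at"]) > max_silence


# Values copied from the stuck build 21754330186 excerpt above.
run = {
    "status": "in_progress",
    "started_at": "2026-02-06T14:37:57Z",
    "completed_at": None,
    "updated_at": "2026-02-06T14:37:29Z",
}
print(is_stalled(run, now=parse_ts("2026-02-06T17:30:00Z")))
```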
Likely Causes
- Blacksmith runner infrastructure issue: Something changed on Blacksmith's side between 14:12-14:37 UTC
  - Runner allocation changed
  - Network/registry connectivity issue
  - Storage/disk issues
- Docker/BuildKit state corruption:
  - BuildKit cache corruption affecting new builds
  - Docker daemon hung/deadlocked
  - Registry (ghcr.io) connection timeout
- Concurrent build interference:
  - First build (14:01) ran alone → succeeded
  - Second/third builds (14:37, 15:27) may be interfering with each other
  - Potential resource contention or lock contention
- GitHub Actions infrastructure:
  - Runner communication issue
  - Job orchestration problem
  - Workflow dispatch timing issue
What to Check
- Blacksmith runner status at 14:12-14:37 UTC: Were there any incidents/changes?
- GitHub Container Registry (ghcr.io) status: Any outages or rate limiting?
- Concurrent builds: Is there lock contention in build_images.py with parallel workers?
- Runner disk space: BuildKit cache may have filled up
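To tell a deadlocked worker from a slow one, the parallel build loop could log each completion and warn when nothing finishes for a while. This is a sketch only: the internals of `build_images.py` are not shown in this report, so `build_all`, `build_one`, and the 5-minute stall window are hypothetical.

```python
import concurrent.futures as cf
import logging
from typing import Callable, Dict, Iterable

log = logging.getLogger("swtbench.build")


def build_all(
    images: Iterable[str],
    build_one: Callable[[str], str],
    max_workers: int = 4,
    stall_seconds: float = 300.0,  # assumed stall window
) -> Dict[str, str]:
    """Run build_one for each image, logging progress and stalls."""
    results: Dict[str, str] = {}
    with cf.ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = {pool.submit(build_one, img): img for img in images}
        total = len(pending)
        while pending:
            done, _ = cf.wait(
                pending, timeout=stall_seconds, return_when=cf.FIRST_COMPLETED
            )
            if not done:
                # Nothing finished within the window: name the suspects.
                log.warning(
                    "No build finished in %.0fs; still running: %s",
                    stall_seconds, sorted(pending.values()),
                )
                continue
            for fut in done:
                img = pending.pop(fut)
                results[img] = fut.result()  # re-raises a failed build
                log.info("Progress: built %d of %d (%s)", len(results), total, img)
    return results
```

This does not break a deadlock by itself, but the warning names the images that are still running, which narrows down where the contention is.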
Recommendations
Immediate Actions
- Cancel stuck workflows (they're consuming runner resources):
    gh run cancel 21754330186 --repo All-Hands-AI/benchmarks
    gh run cancel 21755926698 --repo All-Hands-AI/benchmarks
- Delete blocked eval pods (they'll never complete):
    kubectl delete pod eval-21754233398-claude-4-6-mr4zr -n evaluation-jobs
    kubectl delete pod eval-21755837737-claude-son-9gsdl -n evaluation-jobs
Investigation
- Review runner logs (if accessible) for both stuck and successful builds
- Check Blacksmith runner status around 14:12-14:37 UTC
- Test single vs concurrent builds: Try running one SWT-Bench build in isolation
- Check ghcr.io rate limits: Verify if registry pushes are throttled
- Inspect BuildKit cache: Look for corruption or disk space issues
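For the disk-space item, a quick check of free space on the volume holding the BuildKit cache can confirm or rule out a full disk. The sketch below uses only the standard library; the default path and the 10% threshold are assumptions, and the actual cache location on the Blacksmith runners is not stated in this report.

```python
import shutil


def free_fraction(path: str = "/") -> float:
    """Return the fraction of the filesystem at `path` that is free."""
    usage = shutil.disk_usage(path)
    return usage.free / usage.total


def has_headroom(path: str = "/", min_free_fraction: float = 0.10) -> bool:
    """True if at least `min_free_fraction` of the disk is free (assumed 10%)."""
    return free_fraction(path) >= min_free_fraction


if __name__ == "__main__":
    print(f"free: {free_fraction('/'):.1%}, ok: {has_headroom('/')}")
```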
Preventive Measures
- Add timeout to build step:
    - name: Build and push SWT-Bench images
      timeout-minutes: 30  # Fail after 30 min instead of hanging forever
      run: |
        ...
- Add progress monitoring:
    # In build_images.py or a wrapper script, every N seconds:
    echo "Progress: Building image X of Y"
- Add health checks before build:
    - name: Verify Docker/BuildKit health
      run: |
        docker info
        docker buildx inspect
        df -h
- Serialize SWT-Bench builds (prevent concurrent runs):
    concurrency:
      group: build-swt-bench-images  # Global, not per-ref
      cancel-in-progress: true       # Cancel old runs
- Add retry logic: If build hangs, auto-cancel and retry once
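The retry item could be implemented in a wrapper around the build command. The sketch below is one possible shape, assuming the build is launched as a subprocess; `run_with_retry` and its defaults are hypothetical. Note that `subprocess.run` kills only the direct child on timeout, so child processes spawned by the build (for example by Docker) may need separate cleanup.

```python
import subprocess
import sys
from typing import Sequence


def run_with_retry(
    cmd: Sequence[str],
    timeout_seconds: float = 1800.0,  # mirrors the proposed 30 min step timeout
    attempts: int = 2,                # the original run plus one retry
) -> subprocess.CompletedProcess:
    """Run cmd; if it hangs past the timeout, kill it and try again."""
    for attempt in range(1, attempts + 1):
        try:
            # check=True: a non-zero exit fails immediately, without retry.
            return subprocess.run(cmd, check=True, timeout=timeout_seconds)
        except subprocess.TimeoutExpired:
            print(
                f"attempt {attempt}/{attempts} timed out after {timeout_seconds}s",
                file=sys.stderr,
            )
            if attempt == attempts:
                raise
    raise ValueError("attempts must be at least 1")
```

Only hangs are retried here; a build that exits with an error still fails on the first attempt, so real failures are not masked.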
Environment:
- Workflow: .github/workflows/build-swtbench-images.yml
- Runner: Blacksmith 32vCPU Ubuntu 22.04
- Docker Buildx: enabled
- BuildKit: enabled (plain progress)
- Dataset: eth-sri/SWT-bench_Verified_bm25_27k_zsp
- Build script: benchmarks/swtbench/build_images.py