fix(squid): increase healthcheck tolerance to prevent intermittent startup failures#3936

Open
lpcox wants to merge 2 commits into main from fix/squid-healthcheck-timing

@lpcox
Collaborator

@lpcox lpcox commented May 27, 2026

Summary

Increases the Squid container healthcheck tolerance to prevent intermittent "container is unhealthy" failures on loaded runners.

Problem

The Squid healthcheck was configured with tight timing:

  • start_period: 2s, retries: 5, interval: 1s, timeout: 1s
  • Total window: ~7 seconds

On loaded ubuntu-24.04 runners, Squid initialization (chown preflight → base64 config decode → IPv6 check → Squid startup) can exceed this window, causing dependent containers to fail with "dependency failed to start: container awf-squid is unhealthy".

Fix

Relaxed healthcheck parameters:

  • start_period: 5s (was 2s)
  • retries: 10 (was 5)
  • interval: 2s (was 1s)
  • timeout: 2s (was 1s)

Total window: ~25s. Happy-path startup still detected within 5-7s (first successful probe during start_period or first retry).
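
The two window figures quoted above (~7s before, ~25s after) can be reproduced with a short sketch. It assumes the description's own approximation, start_period + retries × interval, and ignores the time each probe takes to run. The names HealthcheckTiming, seconds, and toleranceWindowSeconds are illustrative and do not come from the repository.

```typescript
// Healthcheck timing fields as they appear in this PR (Compose-style durations).
interface HealthcheckTiming {
  interval: string;
  timeout: string;
  retries: number;
  start_period: string;
}

// Parses second-granularity durations such as "2s" into a number of seconds.
// Only the "<integer>s" form used in this PR is handled (an assumption).
function seconds(duration: string): number {
  const match = /^(\d+)s$/.exec(duration);
  if (match === null) {
    throw new Error(`unsupported duration: ${duration}`);
  }
  return Number(match[1]);
}

// Approximate tolerance window, following the PR description's arithmetic:
// start_period + retries * interval. Probe run time is not included.
function toleranceWindowSeconds(hc: HealthcheckTiming): number {
  return seconds(hc.start_period) + hc.retries * seconds(hc.interval);
}

const before: HealthcheckTiming = {
  interval: "1s",
  timeout: "1s",
  retries: 5,
  start_period: "2s",
};

const after: HealthcheckTiming = {
  interval: "2s",
  timeout: "2s",
  retries: 10,
  start_period: "5s",
};

console.log(toleranceWindowSeconds(before)); // 7
console.log(toleranceWindowSeconds(after)); // 25
```

Docker schedules each check a full interval after the previous one completes, so probes that hang until their timeout would stretch the real window beyond this estimate. The figure is best read as a lower bound on the tolerance.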

Fixes #3934

…artup failures

The Squid container can fail its healthcheck on loaded runners when
initialization (chown, base64 decode, IPv6 check, Squid startup) takes
longer than the 7-second window. Increase start_period from 2s to 5s,
retries from 5 to 10, and interval/timeout from 1s to 2s, giving Squid
~25s total to become healthy without meaningfully slowing the happy path.

Fixes #3934

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 27, 2026 15:00
@github-actions
Contributor

github-actions Bot commented May 27, 2026

✅ Coverage Check Passed

Overall Coverage

| Metric | Base | PR | Delta |
|---|---|---|---|
| Lines | 96.48% | 96.53% | 📈 +0.05% |
| Statements | 96.35% | 96.40% | 📈 +0.05% |
| Functions | 98.22% | 98.22% | ➡️ +0.00% |
| Branches | 90.62% | 90.66% | 📈 +0.04% |

📁 Per-file Coverage Changes (1 files)

| File | Lines (Before → After) | Statements (Before → After) |
|---|---|---|
| src/config-writer.ts | 89.3% → 90.9% (+1.65%) | 89.3% → 90.9% (+1.65%) |

Coverage comparison generated by scripts/ci/compare-coverage.ts

Contributor

Copilot AI left a comment


Pull request overview

This PR adjusts the Docker Compose Squid service healthcheck timing to reduce intermittent “container is unhealthy” startup failures on slower or heavily loaded runners, improving reliability for dependent services that wait on service_healthy.

Changes:

  • Increased Squid healthcheck start_period from 2s → 5s.
  • Relaxed probe parameters: interval 1s → 2s, timeout 1s → 2s, retries 5 → 10.

Summary per file:

| File | Description |
|---|---|
| src/services/squid-service.ts | Relaxes Squid container healthcheck parameters to provide a larger startup tolerance window. |
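
For readers unfamiliar with the service_healthy dependency mentioned in the overview, the following is a minimal sketch of how a dependent service is typically gated on a healthcheck in a Compose definition. The service keys (squid, agent) and image names are hypothetical and not taken from this repository. Only the healthcheck values come from the PR.

```typescript
// Minimal shapes for the parts of a Compose file that matter here.
type DependsOnCondition =
  | "service_started"
  | "service_healthy"
  | "service_completed_successfully";

interface ComposeService {
  image: string;
  healthcheck?: {
    test: string[];
    interval: string;
    timeout: string;
    retries: number;
    start_period: string;
  };
  depends_on?: Record<string, { condition: DependsOnCondition }>;
}

const services: Record<string, ComposeService> = {
  // Hypothetical service key and image; healthcheck values are the PR's.
  squid: {
    image: "example/squid:latest",
    healthcheck: {
      test: ["CMD", "nc", "-z", "localhost", "3128"],
      interval: "2s",
      timeout: "2s",
      retries: 10,
      start_period: "5s",
    },
  },
  // Hypothetical dependent service: Compose starts it only after the
  // squid healthcheck reports healthy, and fails it if squid turns unhealthy.
  agent: {
    image: "example/agent:latest",
    depends_on: { squid: { condition: "service_healthy" } },
  },
};

console.log(JSON.stringify(services, null, 2));
```

With this kind of gating, a Squid container that is marked unhealthy stops every dependent from starting, which is why the tolerance window matters more than the probe itself.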

Copilot's findings

  • Files reviewed: 1/1 changed files
  • Comments generated: 1

Comment on lines 74 to +79
  healthcheck: {
    test: ['CMD', 'nc', '-z', 'localhost', '3128'],
-   interval: '1s',
-   timeout: '1s',
-   retries: 5,
-   start_period: '2s',
+   interval: '2s',
+   timeout: '2s',
+   retries: 10,
+   start_period: '5s',
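
The probe in this healthcheck, nc -z localhost 3128, only checks that the port accepts a TCP connection. As a rough illustration of that check and of the per-probe timeout, here is a sketch using Node's net module. The helper name probePort is hypothetical and the code is not part of the PR.

```typescript
import * as net from "node:net";

// Resolves true if a TCP connection to host:port succeeds within timeoutMs,
// and false on refusal, error, or timeout. This is roughly the condition
// that `nc -z host port` tests.
function probePort(host: string, port: number, timeoutMs: number): Promise<boolean> {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (ok: boolean): void => {
      socket.destroy(); // release the socket whatever the outcome
      resolve(ok);
    };
    socket.setTimeout(timeoutMs);
    socket.once("connect", () => finish(true));
    socket.once("timeout", () => finish(false));
    socket.once("error", () => finish(false));
  });
}

// Example: probe the Squid port with the PR's 2s timeout.
probePort("localhost", 3128, 2000).then((ok) => {
  console.log(ok ? "healthy" : "unhealthy");
});
```

A port-open check says nothing about whether Squid can actually proxy traffic, but it is cheap, and a slow start shows up simply as refused connections until the listener is up.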
@github-actions
Contributor

Smoke Test: Claude Engine ✅

  • ✅ GitHub API: 2 recent PR entries fetched
  • ✅ GitHub check: playwright_check PASS
  • ✅ File verify: smoke-test-claude-26519418580.txt exists

Result: PASS

💥 [THE END] — Illustrated by Smoke Claude

@github-actions
Contributor

Smoke Test: Gemini Engine Validation

  • GitHub MCP Testing: ❌ (Tools missing or unreachable)
  • GitHub.com Connectivity: ❌ (Status 000)
  • File Writing Testing: ✅
  • Bash Tool Testing: ✅

Overall status: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • localhost

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "localhost"

See Network Configuration for more information.

💎 Faceted by Smoke Gemini

@lpcox
Collaborator Author

lpcox commented May 27, 2026

@copilot address the review feedback

Contributor

Copilot AI commented May 27, 2026

> @copilot address the review feedback

Implemented in 1f7b8b9: added explicit assertions in src/services/squid-service.test.ts for Squid healthcheck test, interval, timeout, retries, and start_period so the tolerance settings are locked by unit tests.
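
The assertions described above might look roughly like the following sketch. The test file itself is not shown in this thread, so buildSquidHealthcheck is a hypothetical stand-in for whatever squid-service.ts exports, and Node's built-in assert is used here only to keep the example self-contained; the repository's test runner may differ.

```typescript
import { deepStrictEqual } from "node:assert";

// Hypothetical stand-in for the code under test. The values are the ones
// introduced by this PR.
function buildSquidHealthcheck() {
  return {
    test: ["CMD", "nc", "-z", "localhost", "3128"],
    interval: "2s",
    timeout: "2s",
    retries: 10,
    start_period: "5s",
  };
}

// Pin every field so that an accidental change to the tolerance settings
// fails the test suite instead of surfacing as a flaky startup.
deepStrictEqual(buildSquidHealthcheck(), {
  test: ["CMD", "nc", "-z", "localhost", "3128"],
  interval: "2s",
  timeout: "2s",
  retries: 10,
  start_period: "5s",
});

console.log("healthcheck settings match");
```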

Copilot finished work on behalf of lpcox May 27, 2026 15:16
@github-actions
Contributor

🔥 Smoke Test: Copilot BYOK (Offline) Mode

| Test | Result |
|---|---|
| GitHub MCP connectivity | ✅ PR listed: "fix(test): update smoke-claude test to match v0.76.1 action pin" |
| GitHub.com HTTP connectivity | ⚠️ Template var not expanded (pre-step data unavailable) |
| File write/read | ⚠️ Template var not expanded (pre-step data unavailable) |
| BYOK inference (api-proxy → api.githubcopilot.com) | ✅ Responding now |

Running in BYOK offline mode (COPILOT_OFFLINE=true) via api-proxy → api.githubcopilot.com

Author: @lpcox | Overall: PARTIAL (MCP + BYOK ✅; pre-step data not injected ⚠️)

🔑 BYOK report filed by Smoke Copilot BYOK

@github-actions
Contributor

🔥 Smoke Test Results

| Test | Status |
|---|---|
| GitHub MCP connectivity | |
| File write/read | ✅ (smoke-test-copilot-26520287286.txt verified) |
| GitHub.com connectivity | ✅ (MCP reachable) |

PR: fix(squid): increase healthcheck tolerance to prevent intermittent startup failures
Author: @lpcox | Assignees: none

Overall: PASS

📰 BREAKING: Report filed by Smoke Copilot

@github-actions
Contributor

Smoke Test Results — FAIL

| Check | Result |
|---|---|
| Redis PING | ❌ timeout (no response) |
| PostgreSQL pg_isready | ❌ no response |
| PostgreSQL SELECT 1 | ❌ timeout (no response) |

host.docker.internal is not reachable on ports 6379 or 5432. Service containers may not be running or network routing to the host is blocked.

🔌 Service connectivity validated by Smoke Services

@github-actions
Contributor

fix(squid): increase healthcheck tolerance to prevent intermittent startup failures ✅
fix(test): update smoke-claude test to match v0.76.1 action pin ✅
Remove dead cleanupFirewallNetwork export from host iptables network module ✅
GitHub title check ✅
Smoke file check ✅
Discussion check ✅
Build check ❌
Overall: FAIL

Warning

Firewall blocked 1 domain

The following domain was blocked by the firewall during workflow execution:

  • registry.npmjs.org

To allow these domains, add them to the network.allowed list in your workflow frontmatter:

network:
  allowed:
    - defaults
    - "registry.npmjs.org"

See Network Configuration for more information.

🔮 The oracle has spoken through Smoke Codex

@github-actions
Contributor

🏗️ Build Test Suite Results

Ecosystem Project Build/Install Tests Status
Bun elysia 1/1 passed ✅ PASS
Bun hono 1/1 passed ✅ PASS
C++ fmt N/A ✅ PASS
C++ json N/A ✅ PASS
Deno oak N/A 1/1 passed ✅ PASS
Deno std N/A 1/1 passed ✅ PASS
.NET hello-world N/A ✅ PASS
.NET json-parse N/A ✅ PASS
Go color 1/1 passed ✅ PASS
Go env 1/1 passed ✅ PASS
Go uuid 1/1 passed ✅ PASS
Java gson 1/1 passed ✅ PASS
Java caffeine 1/1 passed ✅ PASS
Node.js clsx All passed ✅ PASS
Node.js execa All passed ✅ PASS
Node.js p-limit All passed ✅ PASS
Rust fd 1/1 passed ✅ PASS
Rust zoxide 1/1 passed ✅ PASS

Overall: 8/8 ecosystems passed — ✅ PASS

Generated by Build Test Suite for issue #3936 · sonnet46 1M

@github-actions
Contributor

Chroot Runtime Version Comparison

| Runtime | Host Version | Chroot Version | Match? |
|---|---|---|---|
| Python | Python 3.12.13 | Python 3.12.3 | ❌ NO |
| Node.js | v24.16.0 | v22.22.3 | ❌ NO |
| Go | go1.22.12 | go1.22.12 | ✅ YES |

Overall: FAILED — Python and Node.js versions differ between host and chroot.

Tested by Smoke Chroot

Development

Successfully merging this pull request may close these issues.

awf-squid container is failed to start

3 participants