ci: fix flaky MongoDB "Connection refused" failures#1629
Draft
jkeuhlen wants to merge 1 commit into
The persistent-mongoDB test suite intermittently fails with `Network.Socket.connect: ... does not exist (Connection refused)`.

Unlike postgres and mysql, which are declared as health-gated `services` (the job blocks until they report healthy, and mysql has an extra explicit connection check), MongoDB is started by an action step with no readiness gate, and the test harness (MongoInit.runConn) connects once with no retry. It was also started *before* the ~10 minute `cabal v2-build all`, leaving a long window in which the mongod container could be resource-starved and become unreachable by the time the tests ran.

- Move "Start MongoDB" to immediately before `cabal v2-test`, after the build.
- Add a readiness gate that waits for the port to accept connections (a plain TCP check, mirroring the exact failure mode, with no mongo-client dependency).
- Bump supercharge/mongodb-github-action 1.8.0 -> 1.12.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
My other recent PR (#1609) was failing CI, but the failure looked unrelated to my changes. Digging into it more, the mongoDB test suite was the one failing, which was odd since I wasn't touching anything that should affect the mongo implementation.
I poked around some with Claude (summary left below) and it noticed that the workflow for mongo was implemented as a single action step rather than a service.
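This PR attempts to fix the flakiness in two ways:
- Move the "Start MongoDB" step to immediately before `cabal v2-test`, after the build.
- Add a readiness gate that waits for the MongoDB port to accept connections.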
I'll run it a few times before marking the PR ready for review, but there are no code changes, just this fix to the CI workflow for mongo.
Problem

CI intermittently fails with the `persistent-mongoDB` test suite dying on: `Network.Socket.connect: ... does not exist (Connection refused)`

One refused socket fails the entire suite (e.g. 102 examples, 89 failures), which then cancels the rest of the `fail-fast` matrix. The same branches pass on re-run, so it's a flake rather than a real regression.

Root cause
MongoDB is wired up differently from the other databases:
- postgres and mysql are declared as `services:` with `--health-cmd` health checks, so GitHub blocks the job until they're healthy. mysql additionally has an explicit `Check MySQL connection` step.
- MongoDB is started by an action step with no readiness gate, and the test harness (`persistent-mongoDB/test/MongoInit.hs` → `runConn`) connects once with no retry/backoff.

On top of that, `Start MongoDB` ran before the ~10 minute `cabal v2-build all`, leaving a long window in which the mongod container could be resource-starved (the build is memory-hungry on a 7 GB runner) and become unreachable by the time the tests connect.

Fix
- Move `Start MongoDB` to immediately before `cabal v2-test`, after the build, shrinking the kill window to near zero.
- Add a readiness gate that waits for the port to accept connections. It uses a plain `/dev/tcp` check, which mirrors the exact failure mode (a refused socket) and avoids depending on a `mongosh`/`mongo` client being present on the runner.
- Bump `supercharge/mongodb-github-action` 1.8.0 → 1.12.0.

This is workflow-only; no library or test code changes.
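
For readers who want a picture of what a `/dev/tcp` readiness gate can look like, here is a minimal sketch. It is not the exact step from this PR's diff: the function name `wait_for_port`, its argument order, and the default of 30 attempts with a 1 second delay are all hypothetical choices made for illustration. It relies on bash's `/dev/tcp/HOST/PORT` redirection, which attempts a TCP connect when opened, so it needs bash rather than a plain POSIX `sh`.

```shell
#!/usr/bin/env bash
# Hypothetical readiness gate: poll until HOST:PORT accepts a TCP connection.
# Usage: wait_for_port HOST PORT [ATTEMPTS] [DELAY_SECONDS]
wait_for_port() {
  local host="$1" port="$2" attempts="${3:-30}" delay="${4:-1}"
  local i=1
  while [ "$i" -le "$attempts" ]; do
    # Opening /dev/tcp/HOST/PORT makes bash attempt a TCP connect.
    # The subshell keeps the descriptor scoped: it is closed on exit.
    if (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; then
      echo "${host}:${port} is accepting connections (attempt ${i})"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timed out waiting for ${host}:${port} after ${attempts} attempts" >&2
  return 1
}
```

A workflow step placed between `Start MongoDB` and `cabal v2-test` could then call something like `wait_for_port localhost 27017 60 1`, assuming mongod is published on MongoDB's default port 27017 (the port used by this workflow is not shown here). A non-zero return fails the step, so a mongod that never comes up is reported as a startup problem instead of as dozens of test failures.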
🤖 Generated with Claude Code