
ci: fix flaky MongoDB "Connection refused" failures #1629

Draft
jkeuhlen wants to merge 1 commit into yesodweb:master from jkeuhlen:ci-mongodb-flaky-fix

jkeuhlen (Contributor) commented Jun 18, 2026

My other recent PR (#1609) was failing CI in a way that looked unrelated to my changes. Digging in, it turned out to be the MongoDB test suite that was failing, which was odd since I wasn't touching anything that should affect the Mongo implementation.

I poked around with Claude (summary below), and it noticed that the workflow starts MongoDB with a single action step rather than declaring it as a service.

This PR attempts to fix the flakiness in three ways:

  1. Move MongoDB startup to right before the test-suite run, so the container isn't sitting idle and unused while the build happens.
  2. Add a healthcheck.
  3. Bump the underlying action version, in case there are fixes upstream that help here.

I'll run it a few times before marking the PR ready for review. There are no code changes, only this fix to the MongoDB CI workflow.

Problem

CI intermittently fails with the persistent-mongoDB test suite dying on:

uncaught exception: IOException of type NoSuchThing
Network.Socket.connect: <socket: 10>: does not exist (Connection refused)

One refused socket fails the entire suite (e.g. 102 examples, 89 failures), which then cancels the rest of the fail-fast matrix. The same branches pass on re-run, so it's a flake rather than a real regression.

Root cause

MongoDB is wired up differently from the other databases:

  • postgres and mysql are declared as services: with --health-cmd health checks, so GitHub blocks the job until they're healthy. mysql additionally has an explicit Check MySQL connection step.
  • MongoDB is started by a plain action step with no readiness gate, and the test harness (runConn in persistent-mongoDB/test/MongoInit.hs) connects once with no retry/backoff.

On top of that, the Start MongoDB step ran before the ~10-minute cabal v2-build all, leaving a long window in which the mongod container could be resource-starved (the build is memory-hungry on a 7 GB runner) and become unreachable by the time the tests connect.
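To make the contrast concrete, here is a rough, hypothetical sketch of the two patterns side by side. It is not a copy of this repository's workflow: the job name, runner label, postgres image and password, and the bare `run:` steps are illustrative assumptions. The postgres block follows the health-check pattern from GitHub's service-container documentation.

```yaml
jobs:
  build:                        # hypothetical job name
    runs-on: ubuntu-latest      # assumed runner
    services:
      # Health-gated service: the runner waits for the container to
      # report healthy (via pg_isready) before running the job's steps.
      postgres:
        image: postgres
        env:
          POSTGRES_PASSWORD: postgres   # illustrative value
        options: >-
          --health-cmd pg_isready
          --health-interval 10s
          --health-timeout 5s
          --health-retries 5
        ports:
          - 5432:5432
    steps:
      # ... checkout and toolchain setup elided ...

      # Old MongoDB setup: a plain action step with no readiness gate,
      # started before the long build and left idle until the tests run.
      - name: Start MongoDB
        uses: supercharge/mongodb-github-action@1.8.0

      - run: cabal v2-build all   # ~10 minutes, memory-hungry
      - run: cabal v2-test        # harness connects to mongod once, no retry
```

Nothing in the MongoDB path plays the role that `--health-cmd` plays for postgres, so a mongod that died or stopped listening during the build only shows up as a refused connection at test time.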

Fix

  • Move Start MongoDB to immediately before cabal v2-test, after the build, shrinking the window in which mongod can be starved or killed to near zero.
  • Add a readiness gate that waits for the port to accept connections. It's a plain bash /dev/tcp check, which mirrors the exact failure mode (a refused socket) and avoids depending on a mongosh/mongo client being present on the runner.
  • Bump supercharge/mongodb-github-action 1.8.0 → 1.12.0.
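Put together, the reordered steps might look roughly like the sketch below. It is an illustration, not the exact diff: the step names (`Build`, `Wait for MongoDB`, `Test`), the retry budget of 30 attempts at 2-second intervals, and the address 127.0.0.1:27017 (MongoDB's default port, assumed to be what the action publishes) are assumptions.

```yaml
steps:
  # ... checkout, toolchain setup, and caching elided ...
  - name: Build                 # hypothetical step name
    run: cabal v2-build all

  # Start mongod only after the build, so it isn't left idle (and exposed
  # to memory pressure from the build) for ~10 minutes.
  - name: Start MongoDB
    uses: supercharge/mongodb-github-action@1.12.0

  # Readiness gate (hypothetical step name and retry budget): poll until
  # something accepts TCP connections on the port, i.e. until connect()
  # would no longer be refused.
  - name: Wait for MongoDB
    shell: bash
    run: |
      for attempt in $(seq 1 30); do
        # bash treats /dev/tcp/HOST/PORT in a redirection as a TCP connect;
        # timeout guards against a connect that hangs instead of failing.
        if timeout 2 bash -c 'exec 3<>/dev/tcp/127.0.0.1/27017' 2>/dev/null; then
          echo "MongoDB is accepting connections (attempt $attempt)"
          exit 0
        fi
        echo "MongoDB not reachable yet (attempt $attempt/30)"
        sleep 2
      done
      echo "MongoDB never accepted connections on 127.0.0.1:27017" >&2
      exit 1

  - name: Test                  # hypothetical step name
    run: cabal v2-test
```

A successful TCP connect only shows that mongod is listening, not that it will answer commands. That is a deliberate trade-off: it targets exactly the "Connection refused" failure seen in CI and needs nothing beyond bash and coreutils on the runner.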

This is workflow-only; no library or test code changes.

🤖 Generated with Claude Code

The persistent-mongoDB test suite intermittently fails with
`Network.Socket.connect: ... does not exist (Connection refused)`.

Unlike postgres and mysql, which are declared as health-gated `services`
(the job blocks until they report healthy, and mysql has an extra explicit
connection check), MongoDB is started by an action step with no readiness
gate, and the test harness (MongoInit.runConn) connects once with no retry.
It was also started *before* the ~10 minute `cabal v2-build all`, leaving a
long window in which the mongod container could be resource-starved and
become unreachable by the time the tests ran.

- Move "Start MongoDB" to immediately before `cabal v2-test`, after the build.
- Add a readiness gate that waits for the port to accept connections (a plain
  TCP check, mirroring the exact failure mode, with no mongo-client dependency).
- Bump supercharge/mongodb-github-action 1.8.0 -> 1.12.0.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>