Fix _sock_recv infinite loop when StatusDB TCP connection drops #322

Open
TKeji wants to merge 1 commit into pytest-dev:master from TKeji:fix-sock-recv-infinite-loop

TKeji commented Mar 30, 2026

Problem

When using --reruns with pytest-xdist, every test makes two blocking TCP calls to the StatusDB server (get_test_failures and set_test_reruns in pytest_runtest_protocol). If the server-side connection drops, _sock_recv enters an infinite loop:

def _sock_recv(self, conn) -> str:
    buf = b""
    while True:
        b = conn.recv(1)
        if b == self.delim:  # b"" != b"\n" → never breaks
            break
        buf += b
    return buf.decode()

recv(1) returns b"" (empty bytes) on a closed socket, but the code only checks for the newline delimiter. Since b"" != b"\n" is always True, the loop never exits.
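The end-of-stream behaviour described above can be checked without the plugin. The following is a minimal sketch using only the standard library; the helper name recv_after_peer_close is hypothetical and a socket.socketpair() stands in for the worker-to-server TCP link.

```python
import socket


def recv_after_peer_close(attempts: int = 2) -> list:
    """Return what recv(1) yields after the peer has closed its end.

    Hypothetical helper for illustration; not part of pytest-rerunfailures.
    """
    # A connected socket pair stands in for the worker/server connection.
    s1, s2 = socket.socketpair()
    try:
        s2.close()  # simulate the server side dropping the connection
        # Each call returns immediately with empty bytes (end of stream),
        # so a loop that only compares against b"\n" never terminates.
        return [s1.recv(1) for _ in range(attempts)]
    finally:
        s1.close()


results = recv_after_peer_close()
print(results)
```

Because every call returns immediately rather than blocking, a loop that waits only for the delimiter spins on the CPU instead of sleeping, which matches the high CPU usage reported for the hung workers.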

This causes xdist workers to hang indefinitely at ~90% CPU, appearing stuck on a test that never completes ([pytest-xdist running] ...). The hang persists until the process is manually killed.

Fix

Add a check for empty bytes from recv(1) and raise ConnectionError:

b = conn.recv(1)
if not b:
    raise ConnectionError("StatusDB connection closed unexpectedly")

The ConnectionError propagates as an INTERNALERROR that xdist handles by replacing the worker — much better than hanging forever.
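For context, here is a sketch of how the patched receive loop might look as a whole. It is written as a standalone function so it can run without the plugin; the names recv_line, DELIM, and demo are hypothetical, and the delimiter is assumed to be b"\n" as in the snippet from the problem description.

```python
import socket

DELIM = b"\n"  # assumed delimiter, mirroring the b"\n" shown in the description


def recv_line(conn: socket.socket, delim: bytes = DELIM) -> str:
    """Read one delimiter-terminated message, failing fast on a closed peer."""
    buf = b""
    while True:
        b = conn.recv(1)
        if not b:
            # Empty bytes mean the peer closed the connection (end of stream).
            raise ConnectionError("StatusDB connection closed unexpectedly")
        if b == delim:
            break
        buf += b
    return buf.decode()


def demo() -> tuple:
    """Exercise both paths: a complete message, then a dropped connection."""
    s1, s2 = socket.socketpair()
    try:
        s2.sendall(b"hello\n")
        message = recv_line(s1)  # complete message arrives normally
        s2.close()  # the peer goes away
        try:
            recv_line(s1)
        except ConnectionError as exc:
            return message, str(exc)
        return message, "no error"
    finally:
        s1.close()
        s2.close()  # closing an already closed socket is a no-op


print(demo())
```

Placing the empty-bytes check before the delimiter comparison means a message that is cut off mid-stream also raises instead of being returned as a truncated string.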

Reproduction

Minimal reproduction (proves the infinite loop on the unpatched version):

import socket
from pytest_rerunfailures import SocketDB

s1, s2 = socket.socketpair()
s2.close()  # recv on s1 will now return b""

db = SocketDB()
db._sock_recv(s1)  # hangs forever on unpatched, raises ConnectionError on patched

Full reproduction in a test run: monkey-patch ServerStatusDB.run_connection to close the server-side connection after a few requests, then run pytest --reruns=1 -n 1 --dist=loadgroup. The worker hangs on the next test's db.get_test_failures() call in _sock_recv.

Impact

Affects any xdist run with --reruns enabled. Without --reruns, the TCP protocol is never exercised (pytest_runtest_protocol returns early), so the bug doesn't manifest.

Checklist

  • Changelog entry in CHANGES.rst
  • Test added
  • Pre-commit hooks pass

When using --reruns with pytest-xdist, every test makes two blocking TCP
calls to the StatusDB server (get_test_failures and set_test_reruns in
pytest_runtest_protocol). If the server-side connection drops, _sock_recv
enters an infinite loop because recv(1) returns b'' (empty bytes) on a
closed socket, but the code only checks for the newline delimiter:

    while True:
        b = conn.recv(1)
        if b == self.delim:  # b'' != b'\n' -> never breaks
            break
        buf += b

This causes xdist workers to hang indefinitely at ~90% CPU, appearing
stuck on a test that never completes. The hang persists until the process
is manually killed.

The fix adds a check for empty bytes from recv(1) and raises
ConnectionError, which surfaces as an INTERNALERROR that xdist handles
by replacing the worker.

Copilot AI (Contributor) left a comment

Pull request overview

This PR fixes a hang in the StatusDB socket protocol used during pytest-xdist runs with --reruns by making _sock_recv detect closed TCP connections and fail fast instead of looping forever.

Changes:

  • Update SocketDB._sock_recv to raise ConnectionError when recv(1) returns empty bytes (closed connection).
  • Add a regression test covering the closed-connection behavior.
  • Document the fix in the changelog.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/pytest_rerunfailures.py | Adds an EOF (b"") check in _sock_recv and raises ConnectionError to avoid infinite loops on dropped connections. |
| tests/test_pytest_rerunfailures.py | Adds a unit test using socket.socketpair() to ensure _sock_recv raises on closed connections. |
| CHANGES.rst | Adds a 16.2 (unreleased) changelog entry describing the fix and impact in xdist runs. |

Comment on lines +1424 to +1427
db = SocketDB()
with pytest.raises(ConnectionError, match="closed unexpectedly"):
    db._sock_recv(s1)

Comment on lines +1422 to +1428
s2.close() # Close one end — recv on s1 will return b""

db = SocketDB()
with pytest.raises(ConnectionError, match="closed unexpectedly"):
    db._sock_recv(s1)

s1.close()
Comment on lines +1425 to +1428
with pytest.raises(ConnectionError, match="closed unexpectedly"):
    db._sock_recv(s1)

s1.close()

icemac (Contributor) commented Apr 10, 2026

Hi @TKeji, thank you for your PR. It looks legit to me. I had Copilot review it. Could you please check its comments?
The broken tests against main are fixed in #324.
