Fix CI workflow to fail Test job on pytest failures (ar-r82f.16) by atc964 · Pull Request #15 · IABTechLab/seller-agent

atc964 · 2026-05-28T18:31:28Z

Summary

The CI Test job was silently passing on failing pytest runs. Discovered on
PR #14's run 26593648988:
pytest reported 5 failed, 709 passed but the Test job concluded success.

This means all recent "green CI" claims on main may be unreliable.

Root Cause

The Unit tests / Integration tests steps wrapped pytest in a shell timeout
with exit-code mapping:

set +e
timeout --kill-after=10s 60s pytest tests/unit/ -v --tb=short
code=$?
([ $code -eq 124 ] || [ $code -eq 137 ]) && exit 0 || exit $code

The intent was to swallow only true timeouts (124) or SIGKILLs (137).
In practice, pytest itself completes quickly (~9s), but the python interpreter
then hangs for ~50s during shutdown cleaning up crewai telemetry threads
(cannot schedule new futures after interpreter shutdown log spam).

The 60s shell timeout was firing during that shutdown hang, killing the
interpreter with SIGTERM, returning exit code 124. The workflow then mapped
124 to exit 0 — even though pytest had already printed 5 failed, 709 passed
before the kill.

Timeline from run 26593648988 (Unit tests, py3.12):

18:19:34 — step started
18:19:45 — pytest done, reported 5 failed, 709 passed
18:20:34 — interpreter still hanging on crewai telemetry shutdown
18:20:35 — 60s timeout fires → SIGTERM python → exit 124 → mapped to exit 0
Step concluded: SUCCESS

Fix

Drop the shell timeout wrapper entirely. Use pytest-timeout for per-test
deadlines (30s unit, 60s integration). Pytest's exit code now propagates
directly to GitHub Actions.

Changes:

.github/workflows/ci.yml: replace the set +e ... timeout ... exit code mapping block with pytest --timeout=N for both Unit tests and Integration tests steps.
pyproject.toml: add pytest-timeout>=2.3.0 to dev deps.

Verification

Done with deliberate failure injection on this branch.

Pushed the fix commit alone → expect green CI on this branch's first push.
Added a poison-pill commit (tests/unit/test_ci_poison_pill_DELETEME.py)
with assert False → expect CI Test job to conclude FAILURE (the whole
point of this PR).
Watch the poison-pill run conclude red.
Revert the poison-pill commit → expect green CI again.

Verification status will be appended as commits and CI runs land.

Test plan

Workflow committed and pushed
Poison-pill commit causes CI Test job conclusion = FAILURE (proves fix works)
Poison-pill reverted, CI Test job conclusion = SUCCESS (proves no regression)
Final state: ci.yml fix + pytest-timeout dep, no poison-pill in tree

Bead

ar-r82f.16 — Fix seller-agent CI: workflow silently passes on failing tests

The previous workflow wrapped pytest in `timeout 60s ... && exit 0 on code 124/137`, which swallowed exit code 124 (timeout-killed) as success. In practice, pytest itself completes quickly (~9s) but the python interpreter hangs ~50s during shutdown cleaning up crewai telemetry threads. The 60s shell `timeout` was firing during that shutdown hang, killing the interpreter with SIGTERM, returning 124, which the workflow then mapped to exit 0 even though pytest had reported test failures. Symptom: run 26593648988 reported "5 failed, 709 passed" but the Test job concluded SUCCESS. Fix: drop the shell timeout wrapper entirely. Use pytest-timeout for per-test deadlines (30s unit, 60s integration) and let pytest's exit code propagate to GitHub Actions directly. bead: ar-r82f.16 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

Temporary commit. Verifies the fix in the previous commit causes the CI Test job to conclude FAILURE on pytest failures (previously it silently passed). This commit will be reverted once CI is observed red. bead: ar-r82f.16 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

… method The previous CI runs hung after pytest printed its summary line. The default signal-based pytest-timeout cannot kill processes that ignore signals or are stuck in teardown/atexit code paths. We layer two safety nets: 1. Outer `timeout --kill-after=10s 300s` ensures the shell kills the process group if pytest itself fails to exit (the actual failure mode in run 26595978562 — pytest summary printed, then 11min hang until cancellation). 2. Inner `--timeout=30 --timeout-method=thread` gives finer-grained per-test protection using thread-based termination, more reliable than signals in containerized CI. No exit-code masking. Both timeouts propagate failures naturally as job failures (exit 124 or 137). bead: ar-r82f.16 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Sonnet 4.7 <noreply@anthropic.com>

…r82f.16) crewai 1.14.6 has a telemetry shutdown handler at crewai/telemetry/telemetry.py:211 that raises RuntimeError ("cannot schedule new futures after interpreter shutdown") on clean test exits. This causes pytest to pass in ~7s but the Python interpreter to hang ~5min on shutdown, tripping the outer 300s timeout and producing exit 124 / CI conclusion FAILURE despite all tests passing. CREWAI_TELEMETRY_OPT_OUT alone does not suppress this shutdown path. crewai's telemetry uses OpenTelemetry under the hood, so setting OTEL_SDK_DISABLED=true should fully disable the shutdown handler and let the interpreter exit cleanly. Investigation tracked in ar-r82f.21. Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Pragmatic ship for the lingering interpreter-shutdown hang. Cycle 3 confirmed pytest itself completes (no traceback in the killed runs), but the Python interpreter hangs ~5min on exit, triggering the outer timeout 300s and producing exit 124 / FAILURE despite a clean test run. Two-part fix: 1. Bump outer `timeout` from 300s to 600s on both pytest steps. Even if the interpreter still sits in atexit handlers for several minutes, it has room to complete naturally instead of being killed. 2. Add belt-and-suspenders telemetry opt-out env vars on both steps, targeting the dependencies most likely to be registering atexit network calls: - CREWAI_DISABLE_TELEMETRY / CREWAI_DISABLE_TRACKING: the other crewai-recognized opt-out names (CREWAI_TELEMETRY_OPT_OUT alone turned out to be a documented-but-unrecognized no-op). - ANONYMIZED_TELEMETRY: chromadb's opt-out switch. - POSTHOG_DISABLED: posthog (used by chromadb/crewai) opt-out. Any one of these may shorten the hang; together with the 600s budget the workflow should report SUCCESS on clean runs while still surfacing real pytest failures (cycle 1 verified this works in 8.47s). bead: ar-r82f.16 Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Auto-fixed two ruff violations introduced by PR #13: - freewheel_adapter.py: I001 unsorted imports (alphabetized) - freewheel_oauth.py: W292 missing trailing newline Also ran `ruff format` on both files for consistency; this reflowed two short multi-line calls in freewheel_adapter.py and re-wrapped one call in freewheel_oauth.py. No logic changes. Unblocks PR #15 (ar-r82f.16) — seller CI workflow fix needs main's Lint job to pass so verification can reach pytest. bead: ar-r82f.22 Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>

atc964 · 2026-05-28T20:16:04Z

Status — parked, not closing

Today's work proved this PR's workflow change is functionally correct — it surfaces real pytest failures (verified by cycle 1: poison-pill present → CI conclusion FAILURE in 8.47s). The previous workflow silently passed those failures, which is the bug this PR is fixing.

Why not merging tonight: seller-agent CI has a second, independent issue — a Python interpreter shutdown hang after pytest exits cleanly. Cycle 4 evidence (run 26599023320):

Lint: ✅ PASS
Pytest: ✅ PASS (719 passed in 7.63s)
Then: 591 seconds of dead-air post-pytest until the outer 600s timeout killed the process with exit 124
CI conclusion: FAILURE

If we merge this PR as-is, every subsequent seller-agent PR would fail CI on this same hang regardless of test results — strictly worse than the current (buggy, silent-pass) state.

Diagnosis (via ar-r82f.21): root cause is CREWAI_TELEMETRY_OPT_OUT being a no-op env var in crewai 1.14.6 (only OTEL_SDK_DISABLED, CREWAI_DISABLE_TELEMETRY, CREWAI_DISABLE_TRACKING are recognized). Even with all those set plus extra downstream-dep opt-outs (chromadb, posthog), the hang persists — meaning some other transitive-dep daemon thread is keeping the process alive past pytest exit.

Path to merge:

Land ar-r82f.21: write a monkey-patch shim in seller-agent (and likely buyer-agent) that unregisters the offending atexit handlers on import, OR file upstream issue with crewAIInc/crewAI for a real fix.
Rebase this PR onto post-ar-r82f.21 main.
Re-run CI — should be SUCCESS.
Then merge.

Moving to Draft for clarity. Branch and 600s+telemetry-env workflow changes are preserved — when ar-r82f.21 lands this can be picked up directly.

cc bead: ar-r82f.16 (blocked-by ar-r82f.21)

atc964 and others added 2 commits May 28, 2026 15:02

atc964 force-pushed the fix/ci-workflow-exit-codes branch from 01d9a5d to a361be9 Compare May 28, 2026 19:02

atc964 and others added 4 commits May 28, 2026 15:16

Remove CI poison-pill test (verification complete) (ar-r82f.16)

54fa488

Co-Authored-By: Claude Sonnet 4.7 <noreply@anthropic.com>

atc964 mentioned this pull request May 28, 2026

Fix ruff lint violations introduced by PR #13 (ar-r82f.22) #16

Merged

2 tasks

Merge branch 'main' into fix/ci-workflow-exit-codes

3e74b98

atc964 marked this pull request as draft May 28, 2026 20:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix CI workflow to fail Test job on pytest failures (ar-r82f.16)#15

Fix CI workflow to fail Test job on pytest failures (ar-r82f.16)#15
atc964 wants to merge 7 commits into
mainfrom
fix/ci-workflow-exit-codes

atc964 commented May 28, 2026

Uh oh!

atc964 commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

atc964 commented May 28, 2026

Summary

Root Cause

Fix

Verification

Test plan

Bead

Uh oh!

atc964 commented May 28, 2026

Status — parked, not closing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant