Skip to content

Fix CI workflow to fail Test job on pytest failures (ar-r82f.16)#15

Draft
atc964 wants to merge 7 commits into
mainfrom
fix/ci-workflow-exit-codes
Draft

Fix CI workflow to fail Test job on pytest failures (ar-r82f.16)#15
atc964 wants to merge 7 commits into
mainfrom
fix/ci-workflow-exit-codes

Conversation

@atc964
Copy link
Copy Markdown
Collaborator

@atc964 atc964 commented May 28, 2026

Summary

The CI Test job was silently passing on failing pytest runs. Discovered on
PR #14's run 26593648988:
pytest reported 5 failed, 709 passed but the Test job concluded success.

This means all recent "green CI" claims on main may be unreliable.

Root Cause

The Unit tests / Integration tests steps wrapped pytest in a shell timeout
with exit-code mapping:

set +e
timeout --kill-after=10s 60s pytest tests/unit/ -v --tb=short
code=$?
([ $code -eq 124 ] || [ $code -eq 137 ]) && exit 0 || exit $code

The intent was to swallow only true timeouts (124) or SIGKILLs (137).
In practice, pytest itself completes quickly (~9s), but the python interpreter
then hangs for ~50s during shutdown cleaning up crewai telemetry threads
(cannot schedule new futures after interpreter shutdown log spam).

The 60s shell timeout was firing during that shutdown hang, killing the
interpreter with SIGTERM, returning exit code 124. The workflow then mapped
124 to exit 0 — even though pytest had already printed 5 failed, 709 passed
before the kill.

Timeline from run 26593648988 (Unit tests, py3.12):

  • 18:19:34 — step started
  • 18:19:45 — pytest done, reported 5 failed, 709 passed
  • 18:20:34 — interpreter still hanging on crewai telemetry shutdown
  • 18:20:35 — 60s timeout fires → SIGTERM python → exit 124 → mapped to exit 0
  • Step concluded: SUCCESS

Fix

Drop the shell timeout wrapper entirely. Use pytest-timeout for per-test
deadlines (30s unit, 60s integration). Pytest's exit code now propagates
directly to GitHub Actions.

Changes:

  • .github/workflows/ci.yml: replace the set +e ... timeout ... exit code mapping block with pytest --timeout=N for both Unit tests and Integration tests steps.
  • pyproject.toml: add pytest-timeout>=2.3.0 to dev deps.

Verification

Done with deliberate failure injection on this branch.

  1. Pushed the fix commit alone → expect green CI on this branch's first push.
  2. Added a poison-pill commit (tests/unit/test_ci_poison_pill_DELETEME.py)
    with assert False → expect CI Test job to conclude FAILURE (the whole
    point of this PR).
  3. Watch the poison-pill run conclude red.
  4. Revert the poison-pill commit → expect green CI again.

Verification status will be appended as commits and CI runs land.

Test plan

  • Workflow committed and pushed
  • Poison-pill commit causes CI Test job conclusion = FAILURE (proves fix works)
  • Poison-pill reverted, CI Test job conclusion = SUCCESS (proves no regression)
  • Final state: ci.yml fix + pytest-timeout dep, no poison-pill in tree

Bead

ar-r82f.16 — Fix seller-agent CI: workflow silently passes on failing tests

atc964 and others added 2 commits May 28, 2026 15:02
The previous workflow wrapped pytest in `timeout 60s ... && exit 0 on code
124/137`, which swallowed exit code 124 (timeout-killed) as success.

In practice, pytest itself completes quickly (~9s) but the python interpreter
hangs ~50s during shutdown cleaning up crewai telemetry threads. The 60s
shell `timeout` was firing during that shutdown hang, killing the interpreter
with SIGTERM, returning 124, which the workflow then mapped to exit 0 even
though pytest had reported test failures.

Symptom: run 26593648988 reported "5 failed, 709 passed" but the Test job
concluded SUCCESS.

Fix: drop the shell timeout wrapper entirely. Use pytest-timeout for per-test
deadlines (30s unit, 60s integration) and let pytest's exit code propagate
to GitHub Actions directly.

bead: ar-r82f.16

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Temporary commit. Verifies the fix in the previous commit causes the
CI Test job to conclude FAILURE on pytest failures (previously it
silently passed). This commit will be reverted once CI is observed red.

bead: ar-r82f.16

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
@atc964 atc964 force-pushed the fix/ci-workflow-exit-codes branch from 01d9a5d to a361be9 Compare May 28, 2026 19:02
atc964 and others added 4 commits May 28, 2026 15:16
… method

The previous CI runs hung after pytest printed its summary line. The default
signal-based pytest-timeout cannot kill processes that ignore signals or are
stuck in teardown/atexit code paths. We layer two safety nets:

1. Outer `timeout --kill-after=10s 300s` ensures the shell kills the process
   group if pytest itself fails to exit (the actual failure mode in run
   26595978562 — pytest summary printed, then 11min hang until cancellation).
2. Inner `--timeout=30 --timeout-method=thread` gives finer-grained per-test
   protection using thread-based termination, more reliable than signals in
   containerized CI.

No exit-code masking. Both timeouts propagate failures naturally as job
failures (exit 124 or 137).

bead: ar-r82f.16

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.7 <noreply@anthropic.com>
…r82f.16)

crewai 1.14.6 has a telemetry shutdown handler at
crewai/telemetry/telemetry.py:211 that raises RuntimeError
("cannot schedule new futures after interpreter shutdown") on clean
test exits. This causes pytest to pass in ~7s but the Python
interpreter to hang ~5min on shutdown, tripping the outer 300s
timeout and producing exit 124 / CI conclusion FAILURE despite all
tests passing.

CREWAI_TELEMETRY_OPT_OUT alone does not suppress this shutdown path.
crewai's telemetry uses OpenTelemetry under the hood, so setting
OTEL_SDK_DISABLED=true should fully disable the shutdown handler
and let the interpreter exit cleanly.

Investigation tracked in ar-r82f.21.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Pragmatic ship for the lingering interpreter-shutdown hang. Cycle 3
confirmed pytest itself completes (no traceback in the killed runs),
but the Python interpreter hangs ~5min on exit, triggering the outer
timeout 300s and producing exit 124 / FAILURE despite a clean test run.

Two-part fix:

1. Bump outer `timeout` from 300s to 600s on both pytest steps. Even
   if the interpreter still sits in atexit handlers for several
   minutes, it has room to complete naturally instead of being killed.

2. Add belt-and-suspenders telemetry opt-out env vars on both steps,
   targeting the dependencies most likely to be registering atexit
   network calls:
     - CREWAI_DISABLE_TELEMETRY / CREWAI_DISABLE_TRACKING: the other
       crewai-recognized opt-out names (CREWAI_TELEMETRY_OPT_OUT alone
       turned out to be a documented-but-unrecognized no-op).
     - ANONYMIZED_TELEMETRY: chromadb's opt-out switch.
     - POSTHOG_DISABLED: posthog (used by chromadb/crewai) opt-out.

Any one of these may shorten the hang; together with the 600s budget
the workflow should report SUCCESS on clean runs while still surfacing
real pytest failures (cycle 1 verified this works in 8.47s).

bead: ar-r82f.16

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
atc964 added a commit that referenced this pull request May 28, 2026
Auto-fixed two ruff violations introduced by PR #13:
- freewheel_adapter.py: I001 unsorted imports (alphabetized)
- freewheel_oauth.py: W292 missing trailing newline

Also ran `ruff format` on both files for consistency; this reflowed
two short multi-line calls in freewheel_adapter.py and re-wrapped one
call in freewheel_oauth.py. No logic changes.

Unblocks PR #15 (ar-r82f.16) — seller CI workflow fix needs main's
Lint job to pass so verification can reach pytest.

bead: ar-r82f.22

Co-authored-by: Claude Sonnet 4.5 <noreply@anthropic.com>
@atc964
Copy link
Copy Markdown
Collaborator Author

atc964 commented May 28, 2026

Status — parked, not closing

Today's work proved this PR's workflow change is functionally correct — it surfaces real pytest failures (verified by cycle 1: poison-pill present → CI conclusion FAILURE in 8.47s). The previous workflow silently passed those failures, which is the bug this PR is fixing.

Why not merging tonight: seller-agent CI has a second, independent issue — a Python interpreter shutdown hang after pytest exits cleanly. Cycle 4 evidence (run 26599023320):

  • Lint: ✅ PASS
  • Pytest: ✅ PASS (719 passed in 7.63s)
  • Then: 591 seconds of dead-air post-pytest until the outer 600s timeout killed the process with exit 124
  • CI conclusion: FAILURE

If we merge this PR as-is, every subsequent seller-agent PR would fail CI on this same hang regardless of test results — strictly worse than the current (buggy, silent-pass) state.

Diagnosis (via ar-r82f.21): root cause is CREWAI_TELEMETRY_OPT_OUT being a no-op env var in crewai 1.14.6 (only OTEL_SDK_DISABLED, CREWAI_DISABLE_TELEMETRY, CREWAI_DISABLE_TRACKING are recognized). Even with all those set plus extra downstream-dep opt-outs (chromadb, posthog), the hang persists — meaning some other transitive-dep daemon thread is keeping the process alive past pytest exit.

Path to merge:

  1. Land ar-r82f.21: write a monkey-patch shim in seller-agent (and likely buyer-agent) that unregisters the offending atexit handlers on import, OR file upstream issue with crewAIInc/crewAI for a real fix.
  2. Rebase this PR onto post-ar-r82f.21 main.
  3. Re-run CI — should be SUCCESS.
  4. Then merge.

Moving to Draft for clarity. Branch and 600s+telemetry-env workflow changes are preserved — when ar-r82f.21 lands this can be picked up directly.

cc bead: ar-r82f.16 (blocked-by ar-r82f.21)

@atc964 atc964 marked this pull request as draft May 28, 2026 20:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant