Skip to content

test(schedule): remove describe_schedule workaround now that server handles closed schedulers#109

Open
abhishekj720 wants to merge 4 commits into
mainfrom
fix-describe-schedule-retry
Open

test(schedule): remove describe_schedule workaround now that server handles closed schedulers#109
abhishekj720 wants to merge 4 commits into
mainfrom
fix-describe-schedule-retry

Conversation

@abhishekj720
Copy link
Copy Markdown
Contributor

@abhishekj720 abhishekj720 commented May 19, 2026

What changed?

  • Removed client-side retry loop that worked around a transient QueryFailedError from DescribeSchedule right after
    CreateSchedule
  • All call sites replaced with plain client.describe_schedule(...)

Why?
The workaround was added because DescribeSchedule is backed by a QueryWorkflow call internally, and a freshly created scheduler workflow could transiently fail the query before processing its first decision task.

cadence-workflow/cadence#8065 fixes the server-side handling of scheduler workflow state, with that fix in, the client no longer needs to paper over the error.

How did you test it?

- Existing unit tests still pass (uv run pytest tests/cadence/test_schedule.py -q — 21 passed)
- Integration tests are structurally unchanged in what they verify; only the helper wrapper was removed

Potential risks
N/A

Release notes
N/A

Documentation Changes
N/A

Signed-off-by: abhishek.jha <abhishek.jha@uber.com>
Comment thread cadence/client.py Outdated
@abhishekj720 abhishekj720 changed the title fix(schedule): handle transient QueryFailedError in describe_schedule internally fix(schedule): handle transient QueryFailedError in describe_schedule internally May 19, 2026
@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

Files with missing lines Coverage Δ
cadence/client.py 87.02% <100.00%> (+0.51%) ⬆️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Signed-off-by: abhishek.jha <abhishek.jha@uber.com>
Signed-off-by: abhishek.jha <abhishek.jha@uber.com>
@abhishekj720 abhishekj720 changed the title fix(schedule): handle transient QueryFailedError in describe_schedule internally test(schedule): remove describe_schedule workaround now that server handles closed schedulers May 19, 2026
@gitar-bot
Copy link
Copy Markdown

gitar-bot Bot commented May 19, 2026

CI failed: The `test_create_describe_delete` integration test consistently fails with a `QueryFailedError` because the workflow is queried before it has processed a decision task. This indicates that the recent removal of the workaround in the PR exposed a race condition in the test suite.

Overview

All analyzed CI jobs failed during the test_create_describe_delete integration test. The failure is caused by an INVALID_ARGUMENT gRPC error, specifically: "workflow must handle at least one decision task before it can be queried."

Failures

QueryFailedError in test_create_describe_delete (confidence: high)

  • Type: test
  • Affected jobs: 76852581769, 76852584944, 76852581757
  • Related to change: yes
  • Root cause: The test attempts to describe_schedule immediately after creation. The Cadence server requires a workflow to process at least one decision task before it accepts queries. The previous code contained a workaround for this; its removal has surfaced this race condition in the test suite.
  • Suggested fix: Update test_create_describe_delete to include a short polling mechanism or a delay after creating the schedule to ensure the workflow has processed a decision task before the describe operation is invoked.

Summary

  • Change-related failures: 1 (Integration test failure due to removed workaround)
  • Infrastructure/flaky failures: 0
  • Recommended action: Modify test_create_describe_delete to wait for the schedule to become ready, as the underlying workflow requires a decision task to be completed before it becomes queryable.
Code Review ✅ Approved 1 resolved / 1 findings

Implements internal retries for transient QueryFailedErrors in describe_schedule, simplifying caller logic. The implementation was updated to use asyncio.get_running_loop() to address the deprecation finding.

✅ 1 resolved
Quality: Use asyncio.get_running_loop() instead of deprecated get_event_loop()

📄 cadence/client.py:632 📄 cadence/client.py:642
The code uses asyncio.get_event_loop() on lines 632 and 642 to get the loop's monotonic clock. Since the project targets Python >=3.11 and describe_schedule is always called from an async context, asyncio.get_running_loop() is the correct API. get_event_loop() has been deprecated for use without a running loop since Python 3.10, and while it works here (since there's always a running loop), get_running_loop() makes the intent explicit and avoids any future deprecation warnings.

Tip

Comment Gitar fix CI or enable auto-apply: gitar auto-apply:on

Options

Auto-apply is off → Gitar will not commit updates to this branch.
Display: compact → Showing less information.

Comment with these commands to change:

Auto-apply Compact
gitar auto-apply:on         
gitar display:verbose         

Was this helpful? React with 👍 / 👎 | Gitar

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant