Skip to content

Fix deferrable execution_timeout handling in AirbyteTriggerSyncOperator#67382

Open
SameerMesiah97 wants to merge 1 commit into
apache:mainfrom
SameerMesiah97:AirbyteTriggerSyncOperator-Defer-None-Timeout
Open

Fix deferrable execution_timeout handling in AirbyteTriggerSyncOperator#67382
SameerMesiah97 wants to merge 1 commit into
apache:mainfrom
SameerMesiah97:AirbyteTriggerSyncOperator-Defer-None-Timeout

Conversation

@SameerMesiah97
Copy link
Copy Markdown
Contributor

Description

This change fixes an issue (as reported in Issue #64048) affecting AirbyteTriggerSyncOperator in deferrable mode that was not fully resolved by PR #64051.

Previously, execution_timeout was passed directly to defer(). The operator now explicitly passes timeout=None to defer() while still preserving execution_deadline handling within the trigger/operator flow.

Rationale

In deferrable mode, Airbyte job cancellation is performed within execute_complete() when the trigger emits a timeout event.

However, framework-level deferred timeout handling bypasses execute_complete() entirely and does not invoke on_kill() in the triggerer process. As a result, Airbyte jobs could continue running after the Airflow task timed out, leading to leaked workloads and excessive resource consumption.

While this weakens the framework-level deferred timeout guarantee, in this case preventing leaked external workloads takes precedence because there is currently no equivalent cleanup path in the triggerer process.

This change keeps timeout handling within the trigger/operator flow so Airbyte job cancellation can occur correctly.

Tests

Added an operator-level regression test verifying that defer() is called with timeout=None while execution_deadline handling remains preserved for Airbyte job timeout cancellation processing.

Backwards Compatibility

This change does not modify public APIs or method signatures.

Related: #64048

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: [GPT 5.5] following the guidelines

@SameerMesiah97 SameerMesiah97 force-pushed the AirbyteTriggerSyncOperator-Defer-None-Timeout branch from db25926 to f465a78 Compare May 23, 2026 18:25
Prevent framework-level deferred timeouts from bypassing
execute_complete() cancellation handling for Airbyte jobs.
@SameerMesiah97 SameerMesiah97 force-pushed the AirbyteTriggerSyncOperator-Defer-None-Timeout branch from f465a78 to 7a3d3bd Compare May 23, 2026 19:08
Copy link
Copy Markdown
Contributor

@henry3260 henry3260 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your PR!

self.log.debug("Running in deferrable mode in job state %s...", state)
if state in (JobStatusEnum.RUNNING, JobStatusEnum.PENDING, JobStatusEnum.INCOMPLETE):
self.defer(
timeout=self.execution_timeout,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is an edge case here. If execution_timeout=30s, but the Airbyte job reaches SUCCEEDED after 45s, the trigger can still mark the task as successful because it checks the Airbyte job status before checking execution_deadline:

if job_run_status == JobStatusEnum.SUCCEEDED:
yield TriggerEvent(
{
"status": "success",
"message": f"Job run {self.job_id} has completed successfully.",
"job_id": self.job_id,
}
)
return
elif job_run_status == JobStatusEnum.CANCELLED:
yield TriggerEvent(
{
"status": "cancelled",
"message": f"Job run {self.job_id} has been cancelled.",
"job_id": self.job_id,
}
)
return
if self.execution_deadline is not None:

After this PR changes defer(timeout=None), the framework-level deferred timeout no longer catches this case. Should the trigger check execution_deadline before accepting SUCCEEDED / CANCELLED, so execution_timeout remains a hard task-level limit?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@henry3260

I agree. This is a very valid edge case but this PR was submitted on short notice to fix a resource leak before an RC2 can be cut for the airbyte provider. For the fix in this PR to land in RC2, it has to be merged by 2026-05-26 so the scope is purposely being kept very tight so that even a generalist reviewer can confidently sign off on it. I believe it might be better for me to open another PR to address your comment. If you disagree with my reasoning, you are free to express yourself here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants