Accept Sequence[UserContent] in common.ai TaskFlow decorators by kaxil · Pull Request #67389 · apache/airflow

kaxil · 2026-05-23T21:03:36Z

Summary

@task.agent and the four sibling LLM decorators (@task.llm, @task.llm_branch, @task.llm_schema_compare, @task.llm_sql) currently reject any non-string return value from the user's callable:

if not isinstance(self.prompt, str) or not self.prompt.strip():
    raise TypeError("...must be a non-empty string.")

But pydantic-ai's Agent.run_sync accepts str | Sequence[UserContent], and these operators pass self.prompt straight through. The string-only constraint lives only in the decorator's execute -- there's no architectural reason for it.

This PR widens the validation so the callable may return a Sequence of pydantic-ai UserContent items (TextContent, ImageUrl, AudioUrl, DocumentUrl, VideoUrl, BinaryContent, UploadedFile, CachePoint) in addition to str. Vision, audio, and document inputs to pydantic-ai agents now work through the TaskFlow decorator path without falling back to PydanticAIHook.create_agent() inside a plain @task.
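The widened check can be pictured roughly as follows. This is a minimal sketch, not the PR's actual code: only the name `validate_prompt` comes from the PR, its signature and error messages here are assumptions, and `FakeImageUrl` is a hypothetical stand-in for pydantic-ai's `ImageUrl` so the snippet runs without the library installed.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class FakeImageUrl:
    """Hypothetical stand-in for pydantic-ai's ImageUrl (illustration only)."""

    url: str


def validate_prompt(prompt: Any) -> Union[str, Sequence[Any]]:
    """Accept a non-empty str or a non-empty sequence of content parts.

    Assumed behaviour, sketched from the PR description; the real helper may differ.
    """
    if isinstance(prompt, str):
        if not prompt.strip():
            raise TypeError("prompt must be a non-empty string or sequence.")
        return prompt
    # bytes and bytearray are Sequences too, but raw bytes alone are not a prompt.
    if isinstance(prompt, Sequence) and not isinstance(prompt, (bytes, bytearray)):
        if len(prompt) == 0:
            raise TypeError("prompt must be a non-empty string or sequence.")
        return prompt
    raise TypeError(f"prompt must be str or a sequence, got {type(prompt).__name__}.")


# A plain string and a mixed text + image list both pass through unchanged.
validate_prompt("Summarise this ticket.")
validate_prompt(["Describe:", FakeImageUrl(url="https://example.com/sample.png")])
```

The sketch checks only the container shape; whether each item is a valid UserContent is left to pydantic-ai itself, which is also an assumption about how the real check is scoped.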

Usage

from pydantic_ai.messages import ImageUrl
from airflow.sdk import dag, task

@dag(...)
def vision_pipeline():
    @task.agent(llm_conn_id="pydantic_ai_default", system_prompt="You are a careful image analyst.")
    def describe(image_url: str):
        return ["Describe what you see in this image:", ImageUrl(url=image_url)]

    describe("https://example.com/sample.png")

Design rationale

Why decorator-only widening (operator __init__ types unchanged): Direct operator instantiation (AgentOperator(prompt=...)) is supported but uncommon -- the decorator path covers the primary use case. Widening the operator __init__ annotation would also tempt direct callers into shapes the rendered-fields capture path doesn't handle well. Decorator-only widening is a clean partial step; the operator prompt: str annotation stays, and direct-multimodal callers fall back to the same hook-level pattern they had before.

Why three layers of guards (decorator preflight → operator preflight → mixin guard): each layer catches a different bypass scenario:

Decorator preflight (@task.agent + enable_hitl_review=True + Sequence): fails fast on the obvious case before render_template_fields runs.
Operator preflight (AgentOperator.execute checking self.prompt after task SDK has rendered templates): catches the native template rendering bypass -- prompt="{{ params.parts }}" rendering into a Sequence at execute time -- and direct-operator construction.
LLMApprovalMixin.defer_for_approval guard: backstop in case any path bypasses the operator-level check; also prevents raw bytes from a BinaryContent from being interpolated into the human review body.
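The operator-level preflight described above might look something like the sketch below. The helper name `reject_sequence_with_unsupported_feature` appears in the PR, but its signature here is a guess, and `SketchAgentOperator` is a toy class written for illustration, not the real `AgentOperator`.

```python
from collections.abc import Sequence
from typing import Any


def reject_sequence_with_unsupported_feature(prompt: Any, *, feature: str, enabled: bool) -> None:
    """Raise if a Sequence prompt is combined with a feature that assumes str prompts.

    Hypothetical signature; only the helper's name comes from the PR.
    """
    if enabled and not isinstance(prompt, str) and isinstance(prompt, Sequence):
        raise TypeError(
            f"{feature}=True is not supported with a Sequence prompt; "
            f"return a str prompt or disable {feature}."
        )


class SketchAgentOperator:
    """Toy operator showing where the operator-level preflight sits."""

    def __init__(self, prompt: Any, enable_hitl_review: bool = False) -> None:
        self.prompt = prompt
        self.enable_hitl_review = enable_hitl_review

    def execute(self) -> str:
        # Runs after template rendering, so a templated prompt that rendered
        # into a list is caught here as well as direct construction.
        reject_sequence_with_unsupported_feature(
            self.prompt, feature="enable_hitl_review", enabled=self.enable_hitl_review
        )
        return "agent would run here"
```

Putting the check in `execute` rather than `__init__` is what lets it see the rendered value of `self.prompt`; a check at construction time would only see the unrendered template string.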

Why HITL/approval are blocked rather than coerced: AgentSessionData.prompt: str and SessionResponse.prompt: str (plugin + frontend) assume a string today. Silently stringifying a list such as ['Describe:', ImageUrl(url='...')] would expose object reprs (and embedded bytes) in the review UI. Failing loudly is the right v1 behaviour. Widening the session model + review UI is tracked as a follow-up on the AIP-99 board.
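To make the stringification concern concrete, here is a small sketch of what naive coercion would put in a review body. `FakeBinaryContent` is a hypothetical stand-in for pydantic-ai's `BinaryContent`; the real class may format its repr differently, so treat the exact text as illustrative.

```python
from dataclasses import dataclass


@dataclass
class FakeBinaryContent:
    """Hypothetical stand-in for pydantic-ai's BinaryContent (illustration only)."""

    data: bytes
    media_type: str


prompt = ["Describe:", FakeBinaryContent(data=b"\x89PNG\r\n", media_type="image/png")]

# Naive coercion: str() on a list uses repr() for each element, so the
# object repr and the escaped raw bytes both end up in the text.
review_body = str(prompt)
print(review_body)
```

With a real image the escaped bytes would run to kilobytes or more, which is why the guards raise TypeError instead of coercing.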

Why llm_file_analysis keeps the string-only check: that operator builds request.user_content from prompt + files -- prompt is intentionally a string description and files are supplied separately. Multimodal is already supported there through the files kwarg. A one-line code comment documents this.

Gotchas / known limitations

HITL incompatibility: enable_hitl_review=True + Sequence prompt raises TypeError before the agent runs. Workaround: return a str prompt, or disable HITL review. Follow-up: widen AgentSessionData.prompt and the HITL review UI.
Approval incompatibility: require_approval=True + Sequence prompt raises TypeError before the agent runs (on @task.llm and @task.llm_sql; the inherited approval path is a no-op on @task.llm_branch and @task.llm_schema_compare -- pre-existing bug, separate follow-up).
Direct-operator type annotation drift: AgentOperator.__init__ still types prompt: str even though the runtime accepts more for the decorator path. mypy users instantiating the operator directly with a Sequence will see a type error; supported usage remains through the decorator. Widening direct-operator typing requires a safer rendered-fields representation for non-str prompts, which is out of scope for this PR.
Rendered Fields UI: for the decorator path, self.prompt is SET_DURING_EXECUTION at the pre-execute render_fields capture, so the UI shows "DYNAMIC (set during execution)" regardless of prompt shape. No bytes leak.

Follow-ups (tracked on AIP-99 board)

Widen AgentSessionData / SessionResponse to support multimodal prompts in HITL review.
Fix pre-existing require_approval=True no-op on LLMBranchOperator / LLMSchemaCompareOperator.
Render multimodal prompts safely in LLMApprovalMixin review body (remove the guard once safe).

Was generative AI tooling used to co-author this PR?

[ ]

@task.agent, @task.llm, @task.llm_branch, @task.llm_schema_compare and @task.llm_sql decorators now accept a Sequence of pydantic-ai UserContent items (ImageUrl, AudioUrl, DocumentUrl, etc.) in addition to str, mirroring Agent.run_sync's input contract. This enables vision, audio, and document inputs to pydantic-ai agents directly through the TaskFlow decorator path. Sequence prompts fail loudly before any LLM call when combined with enable_hitl_review=True (agent) or require_approval=True (llm, llm_sql) -- the HITL session model and approval review body both assume str prompts. Both are tracked as follow-ups on the AIP-99 board.

The provider changelog is regenerated by the release manager from git log at wave time; manually authoring a versioned block pre-empts that and duplicates the auto-extraction from the commit title. The HITL/approval limitations are already documented in the operator docs (agent.rst, llm.rst) where they belong.


kaxil requested a review from gopidesupavan as a code owner May 23, 2026 21:03

boring-cyborg Bot added area:providers kind:documentation provider:common-ai labels May 23, 2026

kaxil added 2 commits May 23, 2026 22:05

Add 'stringify' to global spelling_wordlist

ac9b3f6

The verb form of 'stringified' is used in the new validate_prompt / reject_sequence_with_unsupported_feature docstring; only the past-tense forms were in the wordlist. Sphinx spellcheck failed on the docstring during build-docs.

gopidesupavan approved these changes May 24, 2026


kaxil merged commit 325f377 into apache:main May 24, 2026
143 checks passed

kaxil deleted the common-ai-multimodal-prompts branch May 24, 2026 20:33

gopidesupavan mentioned this pull request May 24, 2026

Add BaseAIHook and Update usages #67438

Open

16 tasks

kaxil merged 3 commits into apache:main from astronomer:common-ai-multimodal-prompts
