
Accept Sequence[UserContent] in common.ai TaskFlow decorators #67389

Merged
kaxil merged 3 commits into apache:main from astronomer:common-ai-multimodal-prompts on May 24, 2026


kaxil commented May 23, 2026

Summary

@task.agent and the four sibling LLM decorators (@task.llm, @task.llm_branch, @task.llm_schema_compare, @task.llm_sql) currently reject any non-string return value from the user's callable:

if not isinstance(self.prompt, str) or not self.prompt.strip():
    raise TypeError("...must be a non-empty string.")

But pydantic-ai's Agent.run_sync accepts str | Sequence[UserContent], and these operators pass self.prompt straight through to it. The string-only constraint exists only in each decorator's execute; there is no architectural reason for it.

This PR widens the validation so the callable may return a Sequence of pydantic-ai UserContent items (TextContent, ImageUrl, AudioUrl, DocumentUrl, VideoUrl, BinaryContent, UploadedFile, CachePoint) in addition to str. Vision, audio, and document inputs to pydantic-ai agents now work through the TaskFlow decorator path without falling back to PydanticAIHook.create_agent() inside a plain @task.
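
A minimal sketch of the widened check, for illustration only. The helper name check_prompt_shape is hypothetical and this is not the PR's code. It only checks the outer shape (a non-empty str, or a non-empty Sequence that is not str or bytes) and does not verify that each item is one of the pydantic-ai UserContent types listed above.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def check_prompt_shape(prompt: Any) -> str | Sequence[Any]:
    """Hypothetical: accept a non-empty str or a non-empty Sequence of content parts."""
    if isinstance(prompt, str):
        if not prompt.strip():
            raise TypeError("prompt must be a non-empty string or a non-empty Sequence.")
        return prompt
    # bytes and bytearray are Sequences too, but they are not a valid prompt shape.
    if isinstance(prompt, (bytes, bytearray)) or not isinstance(prompt, Sequence):
        raise TypeError(f"prompt must be a str or a Sequence, got {type(prompt).__name__}.")
    if len(prompt) == 0:
        raise TypeError("prompt Sequence must not be empty.")
    # Item types (ImageUrl, BinaryContent, ...) are deliberately not checked in this sketch.
    return prompt
```

The previous string-only rule is the first branch alone; the Sequence branch is the widening.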

Usage

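from pydantic_ai.messages import ImageUrl
from airflow.sdk import dag, task

@dag(...)
def vision_pipeline():
    @task.agent(llm_conn_id="pydantic_ai_default", system_prompt="You are a careful image analyst.")
    def describe(image_url: str):
        # Returning a list (text + ImageUrl) instead of a str is what this PR enables;
        # the operator passes the list straight through to Agent.run_sync.
        return ["Describe what you see in this image:", ImageUrl(url=image_url)]

    describe("https://example.com/sample.png")

# Call the decorated function so the DAG is actually created.
vision_pipeline()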

Design rationale

Why decorator-only widening (operator __init__ types unchanged): direct operator instantiation (AgentOperator(prompt=...)) is supported but uncommon; the decorator path covers the primary use case. Widening the operator __init__ annotation would also invite direct callers to pass prompt shapes that the rendered-fields capture path cannot yet represent safely. Decorator-only widening is a clean partial step: the operator's prompt: str annotation stays, and callers who need multimodal input on a directly instantiated operator fall back to the same hook-level pattern they used before.

Why three layers of guards (decorator preflight → operator preflight → mixin guard): each layer catches a different bypass scenario:

  • Decorator preflight (@task.agent + enable_hitl_review=True + Sequence): fails fast on the obvious case before render_template_fields runs.
  • Operator preflight (AgentOperator.execute checking self.prompt after the Task SDK has rendered templates): catches the native template rendering bypass, where prompt="{{ params.parts }}" renders into a Sequence at execute time, as well as direct-operator construction.
  • LLMApprovalMixin.defer_for_approval guard: a backstop in case any path bypasses the operator-level check; it also keeps the raw bytes of a BinaryContent out of the human review body.
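
To make the layering concrete, here is a hedged sketch in plain Python. Every function name below is hypothetical, and fake_native_render is a toy stand-in for native template rendering, not Airflow's renderer. It illustrates why the decorator preflight alone is not enough: the callable can return the str template "{{ params.parts }}", which passes the first check and only becomes a Sequence once rendered.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def reject_sequence_prompt(prompt: Any, *, feature: str, enabled: bool) -> None:
    """Hypothetical guard: a str-only feature cannot accept a Sequence prompt."""
    if enabled and isinstance(prompt, Sequence) and not isinstance(prompt, str):
        raise TypeError(f"{feature}=True requires a str prompt, got {type(prompt).__name__}")


def fake_native_render(template: str, params: dict[str, Any]) -> Any:
    """Toy stand-in for native rendering: "{{ params.<key> }}" yields the raw object."""
    prefix, suffix = "{{ params.", " }}"
    if template.startswith(prefix) and template.endswith(suffix):
        return params[template[len(prefix):-len(suffix)]]
    return template


# Layer 1: decorator preflight on the callable's return value, before rendering.
def decorator_preflight(returned: Any, *, enable_hitl_review: bool) -> Any:
    reject_sequence_prompt(returned, feature="enable_hitl_review", enabled=enable_hitl_review)
    return returned


# Layer 2: operator preflight in execute(), after templates have been rendered.
def operator_preflight(rendered: Any, *, enable_hitl_review: bool) -> Any:
    reject_sequence_prompt(rendered, feature="enable_hitl_review", enabled=enable_hitl_review)
    return rendered


# Layer 3: approval-mixin backstop, before any review body is built.
def approval_backstop(prompt: Any) -> None:
    reject_sequence_prompt(prompt, feature="require_approval", enabled=True)
```

Walking through the bypass: decorator_preflight("{{ params.parts }}", enable_hitl_review=True) passes because the value is still a str; once fake_native_render turns it into a list, operator_preflight raises TypeError. The two preflights share logic on purpose; what differs is when they run.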

Why HITL/approval are blocked rather than coerced: AgentSessionData.prompt: str and SessionResponse.prompt: str (plugin and frontend) both assume a string today. Silently stringifying a list into repr(['Describe:', ImageUrl(url='...')]) would expose object reprs (and any embedded bytes) in the review UI. Failing loudly is the right v1 behaviour. Widening the session model and the review UI is tracked as a follow-up on the AIP-99 board.
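
A quick illustration of the leak described above. FakeBinaryContent is a hypothetical dataclass standing in for pydantic-ai's BinaryContent (the real class's repr may differ); the point is only that str() on a list falls back to each element's repr(), which can include raw bytes.

```python
from dataclasses import dataclass


@dataclass
class FakeBinaryContent:
    """Hypothetical stand-in for pydantic-ai's BinaryContent, not the real class."""

    data: bytes
    media_type: str


prompt = ["Describe:", FakeBinaryContent(data=b"\x89PNG", media_type="image/png")]

# str() on a list calls repr() on each element, so the raw bytes land in the text.
review_body = str(prompt)
print(review_body)
```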

Why llm_file_analysis keeps the string-only check: that operator builds request.user_content from prompt + files. There, prompt is intentionally a string description, and files are supplied separately. Multimodal input is already supported through the files kwarg. A one-line code comment documents this.
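
For contrast, the llm_file_analysis shape can be sketched as below. build_user_content is a hypothetical helper, and the ordering (description first, then file parts) is an assumption rather than something taken from the operator's code.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def build_user_content(prompt: str, files: Sequence[Any]) -> list[Any]:
    """Hypothetical: a str description plus separately supplied file parts."""
    # The string-only check stays: multimodal input arrives through `files`.
    if not isinstance(prompt, str) or not prompt.strip():
        raise TypeError("prompt must be a non-empty string; pass files via the files kwarg.")
    # Assumed ordering: description first, then each file part.
    return [prompt, *files]
```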

Gotchas / known limitations

  • HITL incompatibility: enable_hitl_review=True + Sequence prompt raises TypeError before the agent runs. Workaround: return a str prompt, or disable HITL review. Follow-up: widen AgentSessionData.prompt and the HITL review UI.
  • Approval incompatibility: require_approval=True + Sequence prompt raises TypeError before the agent runs. This applies to @task.llm and @task.llm_sql; on @task.llm_branch and @task.llm_schema_compare the inherited approval path is a no-op, a pre-existing bug tracked as a separate follow-up.
  • Direct-operator type annotation drift: AgentOperator.__init__ still types prompt: str even though the runtime accepts more for the decorator path. mypy users who instantiate the operator directly with a Sequence get a type error; supported usage remains through the decorator. Widening direct-operator typing requires a safer rendered-fields representation for non-str prompts, which is out of scope for this PR.
  • Rendered Fields UI: for the decorator path, self.prompt is SET_DURING_EXECUTION at the pre-execute render_fields capture, so the UI shows "DYNAMIC (set during execution)" regardless of prompt shape. No bytes leak.

Follow-ups (tracked on AIP-99 board)

  • Widen AgentSessionData / SessionResponse to support multimodal prompts in HITL review.
  • Fix pre-existing require_approval=True no-op on LLMBranchOperator / LLMSchemaCompareOperator.
  • Render multimodal prompts safely in LLMApprovalMixin review body (remove the guard once safe).
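
One possible direction for the last follow-up, sketched under assumptions: summarize each non-str part by its type name plus URL, or by media type and byte count, so nothing binary reaches the review body. summarize_prompt_for_review is hypothetical; it reads url, data and media_type via getattr instead of importing pydantic-ai types, so it only approximates how those classes expose their fields.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def summarize_prompt_for_review(prompt: str | Sequence[Any]) -> str:
    """Hypothetical: render a prompt for human review without embedding raw bytes."""
    if isinstance(prompt, str):
        return prompt
    lines: list[str] = []
    for part in prompt:
        if isinstance(part, str):
            lines.append(part)
            continue
        label = type(part).__name__
        url = getattr(part, "url", None)
        data = getattr(part, "data", None)
        if isinstance(url, str):
            # URL-based parts: the URL is safe to show as-is.
            lines.append(f"[{label}: {url}]")
        elif isinstance(data, (bytes, bytearray)):
            # Binary parts: show only metadata, never the bytes themselves.
            media_type = getattr(part, "media_type", "unknown")
            lines.append(f"[{label}: {media_type}, {len(data)} bytes]")
        else:
            lines.append(f"[{label}]")
    return "\n".join(lines)
```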

Was generative AI tooling used to co-author this PR?
  • [ ]

@task.agent, @task.llm, @task.llm_branch, @task.llm_schema_compare and
@task.llm_sql decorators now accept a Sequence of pydantic-ai UserContent
items (ImageUrl, AudioUrl, DocumentUrl, etc.) in addition to str, mirroring
Agent.run_sync's input contract. This enables vision, audio, and document
inputs to pydantic-ai agents directly through the TaskFlow decorator path.

Sequence prompts fail loudly before any LLM call when combined with
enable_hitl_review=True (agent) or require_approval=True (llm, llm_sql) --
the HITL session model and approval review body both assume str prompts.
Both are tracked as follow-ups on the AIP-99 board.
kaxil added 2 commits May 23, 2026 22:05
The provider changelog is regenerated by the release manager from git log
at wave time; manually authoring a versioned block pre-empts that and
duplicates the auto-extraction from the commit title. The HITL/approval
limitations are already documented in the operator docs (agent.rst,
llm.rst) where they belong.
The verb form of 'stringified' is used in the new validate_prompt /
reject_sequence_with_unsupported_feature docstring; only the past-tense
forms were in the wordlist. Sphinx spellcheck failed on the docstring
during build-docs.
kaxil merged commit 325f377 into apache:main on May 24, 2026
143 checks passed
kaxil deleted the common-ai-multimodal-prompts branch on May 24, 2026 20:33