Accept Sequence[UserContent] in common.ai TaskFlow decorators #67389
Merged
@task.agent, @task.llm, @task.llm_branch, @task.llm_schema_compare and @task.llm_sql decorators now accept a Sequence of pydantic-ai UserContent items (ImageUrl, AudioUrl, DocumentUrl, etc.) in addition to str, mirroring Agent.run_sync's input contract. This enables vision, audio, and document inputs to pydantic-ai agents directly through the TaskFlow decorator path. Sequence prompts fail loudly before any LLM call when combined with enable_hitl_review=True (agent) or require_approval=True (llm, llm_sql) -- the HITL session model and approval review body both assume str prompts. Both are tracked as follow-ups on the AIP-99 board.
The provider changelog is regenerated by the release manager from git log at wave time; manually authoring a versioned block pre-empts that and duplicates the auto-extraction from the commit title. The HITL/approval limitations are already documented in the operator docs (agent.rst, llm.rst) where they belong.
The verb form of 'stringified' is used in the new validate_prompt / reject_sequence_with_unsupported_feature docstring; only the past-tense forms were in the wordlist. Sphinx spellcheck failed on the docstring during build-docs.
gopidesupavan
approved these changes
May 24, 2026
Summary
@task.agent and the four sibling LLM decorators (@task.llm, @task.llm_branch, @task.llm_schema_compare, @task.llm_sql) currently reject any non-string return value from the user's callable.
But pydantic-ai's Agent.run_sync accepts str | Sequence[UserContent], and these operators pass self.prompt straight through. The string-only constraint lives only in the decorator's execute -- there's no architectural reason for it.
This PR widens the validation so the callable may return a Sequence of pydantic-ai UserContent items (TextContent, ImageUrl, AudioUrl, DocumentUrl, VideoUrl, BinaryContent, UploadedFile, CachePoint) in addition to str. Vision, audio, and document inputs to pydantic-ai agents now work through the TaskFlow decorator path without falling back to PydanticAIHook.create_agent() inside a plain @task.
Usage
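A minimal sketch of the widened contract, using only the standard library so it runs without Airflow or pydantic-ai installed. The ImageUrl dataclass is a hypothetical stand-in for the pydantic-ai type of the same name, and the validate_prompt signature and error message are assumptions (the PR names the helper but this page does not show its body). In real code the callable would carry the @task.agent decorator, whose arguments are not shown here.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageUrl:
    """Hypothetical stand-in for pydantic-ai's ImageUrl (url field only)."""

    url: str


def validate_prompt(value):
    """Assumed shape of the widened check: str or a Sequence of content items."""
    if isinstance(value, str):
        return value
    # bytes and bytearray are Sequences too, but are not valid prompts here.
    if isinstance(value, Sequence) and not isinstance(value, (bytes, bytearray)):
        return value
    raise TypeError(
        f"prompt must be str or a Sequence of user content, got {type(value).__name__}"
    )


def describe_image():
    # The kind of value a decorated callable may now return:
    # plain text followed by an image reference.
    return ["Describe this chart:", ImageUrl(url="https://example.com/chart.png")]


text_prompt = validate_prompt("Summarise the report")
multimodal_prompt = validate_prompt(describe_image())
```

Both prompt shapes pass the check unchanged; anything else (for example an int) raises TypeError before any model call would be made.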
Design rationale
Why decorator-only widening (operator __init__ types unchanged): Direct operator instantiation (AgentOperator(prompt=...)) is supported but uncommon -- the decorator path covers the primary use case. Widening the operator __init__ annotation would also tempt direct callers into shapes the rendered-fields capture path doesn't handle well. Decorator-only widening is a clean partial step; the operator prompt: str annotation stays, and direct-multimodal callers fall back to the same hook-level pattern they had before.
Why three layers of guards (decorator preflight → operator preflight → mixin guard): each layer catches a different bypass scenario:
- Decorator preflight (@task.agent + enable_hitl_review=True + Sequence): fails fast on the obvious case before render_template_fields runs.
- Operator preflight (AgentOperator.execute checking self.prompt after the Task SDK has rendered templates): catches the native template rendering bypass -- prompt="{{ params.parts }}" rendering into a Sequence at execute time -- and direct-operator construction.
- LLMApprovalMixin.defer_for_approval guard: backstop in case any path bypasses the operator-level check; also prevents raw bytes from a BinaryContent from being interpolated into the human review body.
Why HITL/approval are blocked rather than coerced: AgentSessionData.prompt: str and SessionResponse.prompt: str (plugin + frontend) assume a string today. Silently stringifying a list into repr(['Describe:', ImageUrl(url='...')]) would expose object reprs (and embedded bytes) in the review UI. Failing loudly is the right v1 behaviour. Widening the session model + review UI is tracked as a follow-up on the AIP-99 board.
Why llm_file_analysis keeps the string-only check: that operator builds request.user_content from prompt + files -- prompt is intentionally a string description and files are supplied separately. Multimodal is already supported there through the files kwarg. A one-line code comment documents this.
Gotchas / known limitations
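The fail-loudly guard from the design rationale, which produces the TypeError limitations in this section, might look roughly like the sketch below. It is not the provider's code: the signature of reject_sequence_with_unsupported_feature is an assumption (only the name appears in the PR), BinaryContent is a hypothetical stand-in dataclass, and the error wording is invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryContent:
    """Hypothetical stand-in for pydantic-ai's BinaryContent."""

    data: bytes
    media_type: str


def reject_sequence_with_unsupported_feature(prompt, *, feature, enabled):
    """Raise TypeError when a non-str prompt meets a feature that assumes str."""
    if enabled and not isinstance(prompt, str):
        raise TypeError(
            f"{feature}=True requires a str prompt, got {type(prompt).__name__}; "
            "return a str prompt or disable the feature"
        )


parts = ["Describe:", BinaryContent(data=b"\x89PNG", media_type="image/png")]

# What silent coercion would do: object reprs and raw bytes end up in the text
# that a human reviewer would see.
leaked = str(parts)

# A str prompt with the feature on, or a Sequence with it off, passes through.
reject_sequence_with_unsupported_feature(
    "plain text", feature="enable_hitl_review", enabled=True
)
reject_sequence_with_unsupported_feature(
    parts, feature="enable_hitl_review", enabled=False
)

# A Sequence with the feature on is rejected before any LLM call.
try:
    reject_sequence_with_unsupported_feature(
        parts, feature="require_approval", enabled=True
    )
except TypeError as exc:
    error = str(exc)
```

Calling the same check at the decorator, again in the operator's execute after template rendering, and once more in the approval mixin is what gives the three layers; each call sees the prompt at a different point in its lifecycle.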
- enable_hitl_review=True + Sequence prompt raises TypeError before the agent runs. Workaround: return a str prompt, or disable HITL review. Follow-up: widen AgentSessionData.prompt and the HITL review UI.
- require_approval=True + Sequence prompt raises TypeError before the agent runs (on @task.llm and @task.llm_sql; the inherited approval path is a no-op on @task.llm_branch and @task.llm_schema_compare -- pre-existing bug, separate follow-up).
- AgentOperator.__init__ still types prompt: str even though the runtime accepts more for the decorator path. mypy users instantiating the operator directly with a Sequence see the type warning; supported usage remains through the decorator. Widening direct-operator typing requires a safer rendered-fields representation for non-str prompts, which is out of scope for this PR.
- Rendered Fields UI: for the decorator path, self.prompt is SET_DURING_EXECUTION at the pre-execute render_fields capture, so the UI shows "DYNAMIC (set during execution)" regardless of prompt shape. No bytes leak.
Follow-ups (tracked on AIP-99 board)
- Widen AgentSessionData / SessionResponse to support multimodal prompts in HITL review.
- require_approval=True no-op on LLMBranchOperator / LLMSchemaCompareOperator.
- LLMApprovalMixin review body (remove the guard once safe).
Was generative AI tooling used to co-author this PR?