Introduce AWS Strands Agents hook to common AI provider and Skills integration #67450

Open
gopidesupavan wants to merge 3 commits into apache:main from gopidesupavan:add-strands-hook-gemini

@gopidesupavan
Member

Add Strands Agents hook to common AI provider

Summary

Add AWS Strands Agents as a new agent backend for AgentOperator and @task.agent in the common AI provider, building on the BaseAIHook contract.

  • Introduce StrandsHook (shared Strands SDK integration) and StrandsGeminiHook as the first backend (conn_type: strands-gemini, default connection ID: strands_default)

  • Wire toolsets through _tool_spec_to_native, converting ToolSpec instances to Strands-native tools

  • Add skills support end-to-end: SkillSpec dataclass on BaseAIHook, skills / skills_params on AgentRunRequest and AgentOperator, and Strands AgentSkills plugin integration for filesystem paths and inline skill definitions

  • Register the new connection type in provider.yaml / get_provider_info.py and add optional dependency:

    pip install 'apache-airflow-providers-common-ai[strands]'

    (strands-agents[gemini]>=1.0.0)

  • Add example DAGs (example_strands.py) covering basic operator usage, skills, inline SkillSpec + SQL toolset, direct hook usage, and @task.agent

  • Document connection setup, hook usage, and operator skills in new/updated RST pages

Depends on

BaseAIHook PR #67438


Follow-ups

Durable execution for Strands (durable=True)

StrandsHook currently sets supports_durable=False. A follow-up PR should mirror the pydantic-ai durable path so Strands agents can resume from cached steps on task retry.

Out of scope for this PR: usage limits for Strands hooks.

Skills for Pydantic AI (pydantic-ai-skills)

PydanticAIHook currently leaves supports_skills=False, so AgentOperator.skills / skills_params only work with Strands backends in this PR. A follow-up should wire the same operator-level skills API to pydantic-ai via the pydantic-ai-skills library (Agent Skills / agentskills.io spec with progressive disclosure).

Files changed

| Area | Files |
|---|---|
| Hooks | hooks/strands_ai.py, hooks/base_ai.py |
| Operator | operators/agent.py |
| Examples | example_dags/example_strands.py |
| Provider metadata | provider.yaml, get_provider_info.py, pyproject.toml |
| Docs | docs/connections/strands.rst, docs/hooks/strands_ai.rst, docs/operators/agent.rst, … |
| Tests | tests/unit/common/ai/hooks/test_strands_ai.py, test_base_ai.py, operators/test_agent.py |

Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst, in airflow-core/newsfragments. You can add this file in a follow-up commit after the PR is created so you know the PR number.

    def tool_fn(*args: Any, **kwargs: Any) -> Any:
        return fn(*args, **kwargs)

    tool_fn.__name__ = spec.name.replace("-", "_")

spec.name.replace("-", "_") only handles hyphens. Names with spaces ("my tool"), dots ("ns.tool"), leading digits ("123-tool" -> "123_tool"), or empty strings still pass through and produce invalid Python identifiers, which breaks Strands' inspect.signature()-based schema inference downstream.

Suggest a stricter normalizer plus a non-empty guard, e.g.:

    import re
    safe = re.sub(r"\W|^(?=\d)", "_", spec.name) or "tool"
    tool_fn.__name__ = safe

or reject invalid names up-front in ToolSpec validation.
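
As a rough illustration of the normalizer suggested above, the sketch below wraps the same regular expression in a helper and runs it over the names mentioned in this comment. The helper name `safe_tool_name` and its `fallback` parameter are made up for the example and are not part of the PR.

```python
import re


def safe_tool_name(name: str, fallback: str = "tool") -> str:
    """Map an arbitrary tool name to a valid Python identifier.

    `safe_tool_name` and `fallback` are illustrative names, not part of the PR.
    """
    # Replace every non-word character with "_", and insert a leading "_"
    # when the name starts with a digit.
    cleaned = re.sub(r"\W|^(?=\d)", "_", name)
    # An empty input is still empty after substitution, so use the fallback.
    return cleaned or fallback


for raw in ("my tool", "ns.tool", "123-tool", "", "pdf-reader"):
    normalized = safe_tool_name(raw)
    # Every result should be usable as a function __name__.
    print(repr(raw), "->", normalized, normalized.isidentifier())
```

One trade-off to keep in mind: distinct inputs such as "a-b" and "a.b" both normalize to "a_b", so rejecting invalid names in ToolSpec validation avoids silent collisions that normalization alone cannot.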


    def _skill_spec_to_native(self, skill: str | SkillSpec) -> Any:
        """Convert a skill source to a Strands-native skill object or path."""
        if isinstance(skill, SkillSpec) and not skill.path:

Precedence behavior here silently drops fields: when a SkillSpec has both path set and inline name/description/instructions set, the not skill.path branch is skipped and super()._skill_spec_to_native(skill) returns skill.path -- the inline fields are discarded with no warning.

Also, validation only happens inside the Strands branch, so a SkillSpec(name="x") (missing instructions/description, no path) only fails when Strands is the backend. Other hooks via the base path get a different error or none.

Recommend moving the mutually-exclusive validation into SkillSpec.__post_init__ so misuse fails fast regardless of backend, and explicitly rejecting path + inline combinations rather than silently preferring one.
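
To make the recommendation concrete, here is a minimal sketch of fail-fast validation in `__post_init__`. The dataclass below is a simplified stand-in: the field names are taken from this comment, while the defaults, the rule that a path-based skill carries no inline fields, and the error wording are assumptions, not the PR's actual definition.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SkillSpec:
    """Illustrative stand-in; the PR's real SkillSpec may differ."""

    name: Optional[str] = None
    description: Optional[str] = None
    instructions: Optional[str] = None
    path: Optional[str] = None

    def __post_init__(self) -> None:
        inline = {
            "name": self.name,
            "description": self.description,
            "instructions": self.instructions,
        }
        provided = [field for field, value in inline.items() if value]
        if self.path:
            # Assumed rule: a filesystem skill must not also carry inline fields.
            if provided:
                raise ValueError(
                    f"SkillSpec: 'path' cannot be combined with inline fields {provided}"
                )
            return
        # Without a path, every inline field is required.
        missing = [field for field in inline if field not in provided]
        if missing:
            raise ValueError(f"SkillSpec: inline skill is missing {missing}")
```

With this shape, `SkillSpec(name="x")` raises at construction time for every backend, and a spec that sets both `path` and inline fields is rejected instead of having the inline fields silently dropped.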

    )

    def test_connection(self) -> tuple[bool, str]:
        """Validate the connection by instantiating the model (no API call)."""

Minor: the "(no API call)" claim isn't guaranteed. get_model() calls GeminiModel(**kwargs), whose constructor behavior depends on the strands-agents/google-genai SDK version -- some versions probe credentials or metadata at construction.

Either drop the parenthetical or pin it: "no remote call as of strands-agents X.Y". Otherwise a future SDK bump could quietly turn test_connection into a network call without anyone noticing.
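
If the "no API call" claim is kept, one possible way to keep it honest is a unit test that fails when model construction opens a socket. The sketch below uses only the standard library. `forbid_network`, `UnexpectedNetworkCall`, and `build_model` are hypothetical names, and `build_model` stands in for the real `get_model()` call, which is not reproduced here.

```python
import socket
from contextlib import contextmanager
from unittest import mock


class UnexpectedNetworkCall(RuntimeError):
    """Raised when guarded code tries to open an outbound connection."""


@contextmanager
def forbid_network():
    """Make socket.connect() raise inside the block, then restore it on exit."""

    def _blocked_connect(self, address, *args, **kwargs):
        raise UnexpectedNetworkCall(f"unexpected connection attempt to {address!r}")

    with mock.patch.object(socket.socket, "connect", _blocked_connect):
        yield


def build_model():
    """Hypothetical stand-in for hook.get_model(); it performs no I/O."""
    return object()


# In a real unit test, wrap the actual constructor call instead of build_model().
with forbid_network():
    model = build_model()
```

This only intercepts `socket.socket.connect`; code paths that use `connect_ex` or a non-socket transport would slip through, so it is a tripwire rather than a guarantee.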

          - /docs/apache-airflow-providers-common-ai/operators/llamaindex_embedding.rst
          - /docs/apache-airflow-providers-common-ai/operators/llamaindex_retrieval.rst
        tags: [ai]
      - integration-name: Strands Agents

Missing how-to-guide: for the Strands Agents integration. The Pydantic AI and LlamaIndex entries above each list how-to-guide: pointing at their .rst files (see lines 47-50 and 58-60), but the Strands block has only integration-name, external-doc-url, and tags.

Without how-to-guide, the new strands.rst / example_strands.py won't be linked from the integration index page in the docs. Add the .rst paths once the docs file is in place.

    prompt="Extract tables from report.pdf and summarize the findings.",
    llm_conn_id="strands_default",
    system_prompt="You are a document processing assistant.",
    skills=["/opt/airflow/skills/pdf-processing"],

Hardcoded host paths (/opt/airflow/skills/pdf-processing here, /opt/airflow/skills/web-research further down) won't exist on a stock install -- anyone copy-pasting the example will hit a confusing failure at agent build time.

Suggest one of: (1) use Variable.get("strands_skill_path") so the path is configurable, (2) point to a path that ships with the provider (e.g. relative to example_dags/skills/), or (3) add an inline comment explicitly telling the user to place their skill bundle at that path before running the DAG.
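
As a sketch of option (2) combined with an early, readable failure, the helper below resolves a skill bundle relative to a configurable root and raises a clear error when the directory is absent. `resolve_skill_path` and `skills_root` are hypothetical names; the example DAG would have to supply the root itself, for instance a `skills` directory next to the DAG file.

```python
from pathlib import Path


def resolve_skill_path(skill_name: str, skills_root: Path) -> str:
    """Return the absolute path of a skill bundle, or fail with a clear message.

    Hypothetical helper for illustration; not part of the PR.
    """
    skill_dir = (skills_root / skill_name).resolve()
    if not skill_dir.is_dir():
        raise FileNotFoundError(
            f"Skill bundle '{skill_name}' not found at {skill_dir}; "
            "place the bundle there or point skills_root at its parent directory."
        )
    return str(skill_dir)


# In a DAG file the root could be derived from the file location, e.g.:
#   SKILLS_ROOT = Path(__file__).parent / "skills"
#   skills=[resolve_skill_path("pdf-processing", SKILLS_ROOT)]
```

Failing at DAG parse time with the expected location in the message is easier to diagnose than an error raised later at agent build time.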

    self.output_type = output_type
    self.toolsets = toolsets
    self.skills = skills
    self.skills_params = skills_params or {}

skills_params was added to template_fields (line ~148) but is stored and used as a dict. Jinja renders a dict to a str (e.g. "{'strict': True}"), not back to a dict, so at execute time dict(self.skills_params) either raises or yields character pairs.

Options:

  • drop skills_params from template_fields if templating it isn't a real requirement,
  • register a JSON renderer via template_fields_renderers = {"skills_params": "json"} and pre-parse,
  • or template individual values rather than the whole dict.

Same caveat already applies to the pre-existing agent_params -- worth a follow-up there.
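
For the "pre-parse" option, a minimal sketch of a defensive coercion step is shown below, assuming the field can reach execute time as a string. `coerce_params` is a hypothetical helper, not part of the PR or of Airflow; it accepts a mapping, a JSON object string, or a Python-literal dict string, and rejects anything else.

```python
import ast
import json
from collections.abc import Mapping
from typing import Any


def coerce_params(value: Any) -> dict:
    """Normalize a params field to a dict (hypothetical helper)."""
    if value is None:
        return {}
    if isinstance(value, Mapping):
        return dict(value)
    if isinstance(value, str):
        try:
            parsed = json.loads(value)
        except json.JSONDecodeError:
            # Fall back to Python-literal syntax such as "{'strict': True}".
            # literal_eval only evaluates literals, never arbitrary code.
            try:
                parsed = ast.literal_eval(value)
            except (ValueError, SyntaxError) as err:
                raise ValueError(f"cannot parse params string: {value!r}") from err
        if isinstance(parsed, Mapping):
            return dict(parsed)
    raise TypeError(f"params must be a mapping, got {type(value).__name__}")


rendered = str({"strict": True})  # what a dict looks like once stringified
print(coerce_params(rendered))
```

Calling `dict(rendered)` directly on such a string raises `ValueError`, which is why an explicit parsing step is needed if string input is to be supported at all.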
