fix: preserve discriminated-union schema in tool conversion #1104

Open
planetf1 wants to merge 2 commits into generative-computing:main from planetf1:fix/issue-989

Conversation

Contributor

planetf1 commented May 20, 2026

Bug Fix: discriminated-union tool parameters

Type of PR

  • Bug Fix
  • New Feature
  • Documentation
  • Other

Description

A tool parameter typed as a Pydantic discriminated union — Annotated[A | B, Field(discriminator="kind")], with or without | None — currently collapses to {"type": "string"} in the schema emitted by convert_function_to_ollama_tool.

Despite the function name, that schema is consumed by every backend (as_json_tool is read by Ollama, OpenAI, LiteLLM, Watsonx, and HuggingFace), so discriminated-union tool parameters are silently broken across the board: the model sees a string and hallucinates a payload, and validate_tool_arguments rejects valid dicts.

Root cause

Pydantic emits the union as oneOf plus an OAS-3 discriminator keyword:

// Required parameter:
{"discriminator": {...}, "oneOf": [{"$ref": "#/$defs/Cat"}, {"$ref": "#/$defs/Dog"}]}

// Optional parameter:
{"anyOf": [{"discriminator": {...}, "oneOf": [...]}, {"type": "null"}]}

Neither oneOf nor discriminator is in the JSON Schema subset accepted by tool-calling APIs. The existing inliner only descends into anyOf and $ref, so the structure falls through to the primitive-flattening branch and emerges as {"type": "string"}.

Fix

This PR adds a pre-pass that flattens both shapes to a plain anyOf of inlined object branches and strips the OAS discriminator keyword. The Literal constraints on the tag field already carry the discriminator signal, so dropping the keyword changes nothing semantically while making the schema acceptable to tool-calling APIs.

_flatten_discriminated_union is non-mutating (returns a new schema) and defensively merges into any pre-existing top-level anyOf rather than overwriting. Flattening is single-level; nested discriminated unions are tracked alongside the recursive $ref resolution work in #911 (follow-up filed as #1105).
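
For reviewers who want the shape of the pre-pass without opening the diff, here is a minimal stdlib-only sketch. It is not the code in mellea/backends/tools.py: flatten_union_sketch, _inline and _is_discriminated are hypothetical names, and the hand-written schemas only mirror the Pydantic output quoted under Root cause.

```python
from copy import deepcopy
from typing import Any

Schema = dict[str, Any]


def _is_discriminated(node: Any) -> bool:
    """True for the {"discriminator": ..., "oneOf": [...]} shape."""
    return isinstance(node, dict) and "oneOf" in node and "discriminator" in node


def _inline(branch: Schema, defs: Schema) -> Schema:
    """Replace a local '#/$defs/Name' ref with a copy of its definition."""
    ref = branch.get("$ref")
    if isinstance(ref, str) and ref.startswith("#/$defs/"):
        return deepcopy(defs[ref.removeprefix("#/$defs/")])
    return deepcopy(branch)


def flatten_union_sketch(schema: Schema, defs: Schema) -> Schema:
    """Return a new schema with single-level discriminated unions as anyOf."""
    existing = schema.get("anyOf", [])
    top = _is_discriminated(schema)
    if not top and not any(_is_discriminated(b) for b in existing):
        return deepcopy(schema)  # nothing to flatten: behave as a no-op

    dropped = {"anyOf", "oneOf", "discriminator"} if top else {"anyOf"}
    out = {k: deepcopy(v) for k, v in schema.items() if k not in dropped}

    branches: list[Schema] = []
    for branch in existing:
        if _is_discriminated(branch):
            # Optional shape: hoist the union arms next to {"type": "null"}.
            branches.extend(_inline(b, defs) for b in branch["oneOf"])
        else:
            branches.append(_inline(branch, defs))
    if top:
        # Required shape: merge into any pre-existing anyOf, never overwrite.
        branches.extend(_inline(b, defs) for b in schema["oneOf"])
    out["anyOf"] = branches
    return out


# Hand-written stand-ins for the schemas Pydantic emits (assumption).
defs = {
    "Cat": {"type": "object", "title": "Cat", "properties": {"kind": {"const": "cat"}}},
    "Dog": {"type": "object", "title": "Dog", "properties": {"kind": {"const": "dog"}}},
}
union = {
    "discriminator": {"propertyName": "kind"},
    "oneOf": [{"$ref": "#/$defs/Cat"}, {"$ref": "#/$defs/Dog"}],
}
required = {**union, "title": "Pet"}
optional = {"anyOf": [union, {"type": "null"}], "title": "Pet"}

flat_required = flatten_union_sketch(required, defs)
flat_optional = flatten_union_sketch(optional, defs)
print([b.get("title", b.get("type")) for b in flat_optional["anyOf"]])
# prints ['Cat', 'Dog', 'null']
```

The sketch builds a fresh dict on every path, so the inputs are never modified, and it stops at one level, matching the single-level limitation described above.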

Before / after

class Cat(BaseModel):
    kind: Literal["cat"]
    name: str

class Dog(BaseModel):
    kind: Literal["dog"]
    name: str
    breed: str

def act(pet: Annotated[Cat | Dog, Field(discriminator="kind")]) -> str:
    """Act on a pet.

    Args:
        pet: the pet
    """
    return "ok"

Before ({"type": "string"} is wrong):

"pet": {"type": "string", "description": "the pet"}

After (preserved union, discriminator stripped, refs inlined):

"pet": {
  "anyOf": [
    {"type": "object", "properties": {"kind": {"const": "cat", ...}, "name": {...}}, "required": ["kind", "name"], "title": "Cat"},
    {"type": "object", "properties": {"kind": {"const": "dog", ...}, "name": {...}, "breed": {...}}, "required": ["kind", "name", "breed"], "title": "Dog"}
  ],
  "title": "Pet",
  "description": "the pet"
}

Testing

  • Tests added to the respective file if code was changed
  • New code has 100% coverage if code was added
  • Ensure existing tests and GitHub automation pass (a maintainer will kick off the GitHub automation when the rest of the PR is populated)

Test additions:

  • test/backends/test_schema_helpers.py — extends _is_complex_anyof coverage for oneOf branches (with and without discriminator metadata).
  • test/backends/test_discriminated_union_tools.py — new file:
    • Schema-shape assertions (required + optional variants): no collapse to {"type":"string"}, both Cat and Dog branches survive as fully-inlined object schemas, OAS discriminator keyword is stripped (asserted on both required and optional paths), no dangling $ref leaks, optional variant is removed from required.
    • Three-arm union: Annotated[Cat | Dog | Fish, Field(discriminator="kind")] preserves all three branches.
    • Non-discriminated Optional[Email] regression guard: the new pre-pass must be a no-op for plain $ref + | None shapes that the existing inliner already handles.
    • Validation round-trip via validate_tool_arguments: accepts valid {"kind":"dog", ...} and {"kind":"cat", ...} payloads, rejects bare strings, rejects dicts missing the discriminator, accepts both omitted and supplied for the optional variant.
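
The schema-shape assertions listed above can be approximated with two small helpers. This is a sketch under assumptions, not the contents of test_discriminated_union_tools.py: find_refs is a hypothetical name, has_branch only imitates the intent of the _has_branch helper, and the pet dict is a hand-written stand-in for the converted parameter shown under Before / after.

```python
from typing import Any


def find_refs(node: Any) -> list[str]:
    """Collect every '$ref' target found anywhere in a schema fragment."""
    if isinstance(node, dict):
        found = [node["$ref"]] if isinstance(node.get("$ref"), str) else []
        for value in node.values():
            found.extend(find_refs(value))
        return found
    if isinstance(node, list):
        return [ref for item in node for ref in find_refs(item)]
    return []


def has_branch(param_schema: dict[str, Any], title: str) -> bool:
    """True if an inlined object branch with this title sits under anyOf.

    Only anyOf is inspected: accepting oneOf as a fallback would hide a
    regression in which the flattening pre-pass stops running.
    """
    return any(
        isinstance(b, dict) and b.get("type") == "object" and b.get("title") == title
        for b in param_schema.get("anyOf", [])
    )


# Hand-written stand-in for the converted "pet" parameter (assumption).
pet = {
    "anyOf": [
        {
            "type": "object",
            "title": "Cat",
            "properties": {"kind": {"const": "cat"}, "name": {"type": "string"}},
            "required": ["kind", "name"],
        },
        {
            "type": "object",
            "title": "Dog",
            "properties": {
                "kind": {"const": "dog"},
                "name": {"type": "string"},
                "breed": {"type": "string"},
            },
            "required": ["kind", "name", "breed"],
        },
    ],
    "title": "Pet",
    "description": "the pet",
}

assert pet.get("type") != "string"          # no collapse to a primitive
assert has_branch(pet, "Cat") and has_branch(pet, "Dog")
assert "discriminator" not in pet           # OAS keyword stripped
assert find_refs(pet) == []                 # no dangling $ref leaks
```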

Local regression: uv run pytest test/ -m "not qualitative" --ignore=test/stdlib/tools/test_mcp.py → 1847 passed, 1 pre-existing failure unrelated to this change (test_example_collection_sanity, which depends on optional extras like mcp being installed).

Notes for reviewers

  • The fix lives in one place (mellea/backends/tools.py) but applies to every tool-calling backend because they all consume MelleaTool.as_json_tool. No backend-specific changes are needed.
  • oneOf → anyOf rather than the reverse: OpenAI strict-mode tool schema rejects oneOf; the Literal tag enforces uniqueness in practice, so the choice is purely about which keyword the consumers accept.
  • The discriminator-stripping is a behaviour change for any consumer that may have been relying on it. Grep finds no internal consumer.
  • Out of scope: the function name convert_function_to_ollama_tool is misleading (it is the canonical OpenAI-tool-format converter for all backends), but renaming is a public-API change worth a separate PR.
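
To illustrate the claim that the Literal tag enforces uniqueness, the sketch below counts how many branches a payload fits. It is a deliberately simplified matcher (const tags and required keys only), not a JSON Schema validator and not validate_tool_arguments; matching_branches is a hypothetical name.

```python
from typing import Any


def matching_branches(payload: dict[str, Any], branches: list[dict[str, Any]]) -> list[str]:
    """Titles of branches whose const tags and required keys fit the payload."""
    matches = []
    for branch in branches:
        props = branch.get("properties", {})
        tag_ok = all(
            payload.get(name) == spec["const"]
            for name, spec in props.items()
            if "const" in spec
        )
        required_ok = all(key in payload for key in branch.get("required", []))
        if tag_ok and required_ok:
            matches.append(branch["title"])
    return matches


branches = [
    {"title": "Cat", "properties": {"kind": {"const": "cat"}}, "required": ["kind", "name"]},
    {"title": "Dog", "properties": {"kind": {"const": "dog"}}, "required": ["kind", "name", "breed"]},
]

print(matching_branches({"kind": "dog", "name": "Rex", "breed": "collie"}, branches))
# prints ['Dog']
print(matching_branches({"kind": "cat", "name": "Tom"}, branches))
# prints ['Cat']
print(matching_branches({"name": "Tom"}, branches))
# prints []
```

Because each branch pins a different const on the tag field, no payload fits more than one branch, so "exactly one" (oneOf) and "at least one" (anyOf) accept the same inputs here.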

Follow-ups filed during review

Attribution

  • AI coding assistants used

A tool parameter typed as a Pydantic discriminated union
(``Annotated[A | B, Field(discriminator="kind")]``, with or without
``| None``) currently collapses to ``{"type": "string"}`` in the schema
emitted by ``convert_function_to_ollama_tool``. Because that schema is
shared by every backend (Ollama, OpenAI, Watsonx, HuggingFace, LiteLLM),
discriminated-union tool parameters are silently broken across the
board: the model sees a string and hallucinates a payload, and
``validate_tool_arguments`` rejects valid dicts.

Pydantic emits the union as ``oneOf`` plus an OAS-3 ``discriminator``
keyword, neither of which is in the JSON Schema subset accepted by
tool-calling APIs. The existing inliner only descends into ``anyOf`` and
``$ref``, so the structure falls through to the primitive-flattening
branch.

Fix: add a pre-pass that flattens the discriminated-union shapes — both
top-level (required) and nested-in-anyOf (Optional) — to plain ``anyOf``
of inlined object schemas, with the OAS ``discriminator`` keyword
stripped. The ``Literal`` constraints on the tag field already carry
the discriminator signal. Also extends ``_is_complex_anyof`` to detect
``oneOf`` branches defensively.

Resolves generative-computing#989.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

github-actions bot added the bug label May 20, 2026

- Rewrite ``_flatten_discriminated_union`` to be non-mutating and update
  the docstring; the previous "rewrites in place" claim was misleading
  because the required-union path returned a new dict. Document the
  single-level limitation (nested discriminated unions are not
  recursively flattened — tracked alongside generative-computing#911).
- Defensively merge ``oneOf`` into any pre-existing top-level ``anyOf``
  rather than overwriting, so the helper is safe to call in isolation
  even on shapes Pydantic does not currently emit.
- Drop a redundant ``v.get("anyOf", [])`` default whose key existence
  was already guaranteed by the surrounding guard.

Tests:
- ``test_optional_union_strips_discriminator_keyword`` — pin the
  implicit-strip in the optional path so a refactor can't silently
  reintroduce the OAS-3 keyword.
- ``test_three_way_union_preserves_all_branches`` — three-arm
  discriminated unions are common in command-pattern tools.
- ``test_non_discriminated_optional_unchanged`` — regression guard for
  the existing ``Optional[Email]`` flow; the new pre-pass must be a no-op.
- Tighten ``_has_branch`` to ``anyOf`` only; accepting ``oneOf`` as a
  fallback would silently mask a regression of the flattening pre-pass.
- Move ``import json`` to module top.

Assisted-by: Claude Code
Signed-off-by: Nigel Jones <jonesn@uk.ibm.com>

Labels

bug Something isn't working

Development

Successfully merging this pull request may close these issues.

Discriminated-union tool parameters lose their schema and fail validation

1 participant