Skip to content

Use native SQLA UUID/Uuid instead of sqlalchemy-utils#61532

Merged
potiuk merged 1 commit intoapache:mainfrom
Dev-iL:2602/remove_sqla_utils
Feb 10, 2026
Merged

Use native SQLA UUID/Uuid instead of sqlalchemy-utils#61532
potiuk merged 1 commit intoapache:mainfrom
Dev-iL:2602/remove_sqla_utils

Conversation

@Dev-iL
Copy link
Collaborator

@Dev-iL Dev-iL commented Feb 6, 2026

This PR removes a dependency on a 3rd party library by switching to native SQLA types.

Migrations:

  • Changes to migrations 0047-0099 are python-side type hints only - should not affect underlying DB types.
  • The new migration 0103 standardizes the UUID type to sa.Uuid (that uses backend-appropriate dtypes automatically).

related (split from): #61229


Was generative AI tooling used to co-author this PR?
  • Yes (please specify the tool below)

Generated-by: Claude Opus 4.6 following the guidelines


  • Read the Pull Request Guidelines for more information. Note: commit author/co-author name and email in commits become permanently public when merged.
  • For fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
  • When adding dependency, check compliance with the ASF 3rd Party License Policy.
  • For significant user-facing changes create newsfragment: {pr_number}.significant.rst or {issue_number}.significant.rst, in airflow-core/newsfragments.

@Dev-iL Dev-iL added full tests needed We need to run full set of tests for this PR to merge kind:documentation area:deadline-alerts AIP-86 (former AIP-57) area:db-migrations PRs with DB migration area:DAG-processing and removed kind:documentation area:deadline-alerts AIP-86 (former AIP-57) area:DAG-processing area:db-migrations PRs with DB migration labels Feb 6, 2026
@Dev-iL Dev-iL closed this Feb 6, 2026
@Dev-iL Dev-iL reopened this Feb 6, 2026
@Dev-iL Dev-iL requested review from jscheffl and vincbeck February 6, 2026 13:43
@Dev-iL Dev-iL force-pushed the 2602/remove_sqla_utils branch 3 times, most recently from 1828fed to fa290a9 Compare February 7, 2026 09:03
@Dev-iL Dev-iL force-pushed the 2602/remove_sqla_utils branch 2 times, most recently from ea5f602 to 2156654 Compare February 7, 2026 19:54
@jscheffl
Copy link
Contributor

jscheffl commented Feb 8, 2026

I think a lot of small changes, would request another pair of eyes for review if I missed anything, then LGTM!

Copy link
Contributor

@bugraoz93 bugraoz93 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!
This looks like breaking for the public API on the datamodel change. It won't in most of the cases due to str translation of HTTP layer but if someone depends on our datamodels to communicate, then API communication would break. It would be amazing if you can add significant newsfragment for it so it could be descriptive to any type of Airflow users :)

@jscheffl
Copy link
Contributor

jscheffl commented Feb 8, 2026

ks great, thanks!
This looks like breaking for the public API on the datamodel change. It won't in most of the cases due to str translation of HTTP layer but if someone depends on our datamodels to communicate, then API communication would break. It would be amazing if you can add significant newsfragment for it so it could be descriptive to any type of

@bugraoz93 What do you mean in regards of public API and data model? The DB Model is for sure not a public API and we usually warn everybody who makes calls and interactions with the DB.

Rest API only makes now a bit more preceise details about the format of the UUID keys. Do you see this as a breaking change? Because content will not be different.

@jscheffl
Copy link
Contributor

jscheffl commented Feb 8, 2026

Birefly closing + re-opening to force a re-check with all versions label added.

@jscheffl jscheffl closed this Feb 8, 2026
@jscheffl jscheffl reopened this Feb 8, 2026
@bugraoz93
Copy link
Contributor

ks great, thanks!
This looks like breaking for the public API on the datamodel change. It won't in most of the cases due to str translation of HTTP layer but if someone depends on our datamodels to communicate, then API communication would break. It would be amazing if you can add significant newsfragment for it so it could be descriptive to any type of

@bugraoz93 What do you mean in regards of public API and data model? The DB Model is for sure not a public API and we usually warn everybody who makes calls and interactions with the DB.

Rest API only makes now a bit more preceise details about the format of the UUID keys. Do you see this as a breaking change? Because content will not be different.

Thanks Jens, good question! This could maybe explain more, so I can see if I missed anything too
I haven't meant to do anything with db calls. My comment was purely on bespoke clients or API extensions. Since type is still a string in the schema, I don't think this will impact most of the users.
For example, if they introduced a new endpoint in their environment, or even applications that extend and use Airflow Public API, or some process that depends on our datamodels to communicate with the Public API or extend in any way. In that case, ti.id is not str anymore for those use cases for them while using as from airflow.api_fastapi.core_api.datamodels.task_instances import TaskInstanceResponse, they may need to do some level of casting or a change in their type annotations in their code. They should do some degree of change, I assume
What do you think?

@jscheffl
Copy link
Contributor

jscheffl commented Feb 8, 2026

ks great, thanks!
This looks like breaking for the public API on the datamodel change. It won't in most of the cases due to str translation of HTTP layer but if someone depends on our datamodels to communicate, then API communication would break. It would be amazing if you can add significant newsfragment for it so it could be descriptive to any type of

@bugraoz93 What do you mean in regards of public API and data model? The DB Model is for sure not a public API and we usually warn everybody who makes calls and interactions with the DB.
Rest API only makes now a bit more preceise details about the format of the UUID keys. Do you see this as a breaking change? Because content will not be different.

Thanks Jens, good question! This could maybe explain more, so I can see if I missed anything too I haven't meant to do anything with db calls. My comment was purely on bespoke clients or API extensions. Since type is still a string in the schema, I don't think this will impact most of the users. For example, if they introduced a new endpoint in their environment, or even applications that extend and use Airflow Public API, or some process that depends on our datamodels to communicate with the Public API or extend in any way. In that case, ti.id is not str anymore for those use cases for them while using as from airflow.api_fastapi.core_api.datamodels.task_instances import TaskInstanceResponse, they may need to do some level of casting or a change in their type annotations in their code. They should do some degree of change, I assume What do you think?

Okay, then I am 95% certain... any kind of customization and type casting is non breaking in my view - just type checking. Otherwise the interpreter should do type casting under the hood automatically.

Then let's leave the PR open for a (short) moment if any other expert has a stringer opinion. For me this is an internal change only and only typing outside effect.

@bugraoz93
Copy link
Contributor

This also depends on how we categorise this. I agree that %95 won't be breaking part. For example, FastAPI changed how they return the value, which was breaking for us, even though it wasn't a breaking change and only an internal change. Of course, not saying this should wait for more versions, just trying to see if we can/need to warn some group of people using/building over Airflow :) Great idea to get more take!

@Dev-iL Dev-iL force-pushed the 2602/remove_sqla_utils branch 4 times, most recently from 3a7ca09 to 06f9077 Compare February 9, 2026 12:18
Copy link
Member

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example, if they introduced a new endpoint in their environment, or even applications that extend and use Airflow Public API, or some process that depends on our datamodels to communicate with the Public API or extend in any way. In that case, ti.id is not str anymore for those use cases for them while using as from airflow.api_fastapi.core_api.datamodels.task_instances import TaskInstanceResponse, they may need to do some level of casting or a change in their type annotations in their code. They should do some degree of change, I assume
What do you think?

I think the change from ‎str to ‎UUID in ‎datamodels will also impact ‎airflow-client-python, since it is generated from the yaml.

Maybe we could add a ‎field_serializer to ensure the responses remain compatible with the response layer.

    @field_serializer("id", when_used="always")
    def serialize_id(self, id_value: UUID) -> str:
        return str(id_value)

@Dev-iL
Copy link
Collaborator Author

Dev-iL commented Feb 9, 2026

I think the change from ‎str to ‎UUID in ‎datamodels will also impact ‎airflow-client-python, since it is generated from the yaml.

Maybe we could add a ‎field_serializer to ensure the responses remain compatible with the response layer.

    @field_serializer("id", when_used="always")
    def serialize_id(self, id_value: UUID) -> str:
        return str(id_value)

Couple of questions:

  1. Why str(id_value) (dashed) and not id_value.hex (dashless)?
  2. Should we have a test for this behavior if it is some API we must adhere to?

Copy link
Member

@jason810496 jason810496 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couple of questions:

  1. Why str(id_value) (dashed) and not id_value.hex (dashless)?
  2. Should we have a test for this behavior if it is some API we must adhere to?
  1. If we use a dashless UUID as the response (which would also affect the JSON field of the API response), it will be an even bigger breaking change IMHO.

assert msg.ti.id == uuid.UUID("4d828a62-a417-4936-a7a6-2b3fabacecab")

  1. A simple unit test to verify the data model serialization behavior would be enough, since the generated OpenAPI yaml spec will also reflect that behavior.

@Dev-iL
Copy link
Collaborator Author

Dev-iL commented Feb 10, 2026

@jason810496 Please see below Claude's analysis of the point you raised. I am going to add the extra test. Please let me know if you think this is insufficient.


Reviewer concern: field_serializer for UUID fields

The reviewer suggests adding field_serializer to UUID fields in Pydantic response models to keep responses compatible with airflow-client-python (which is generated from the OpenAPI YAML spec).

Is the concern justified? Partially — there IS an observable schema change, but the suggested fix doesn't address it:

  1. JSON wire format (NOT affected): Pydantic v2 already serializes UUID objects to dashed strings ("019c39c7-...") by default. A field_serializer returning str(id_value) is redundant — it does exactly what Pydantic already does.

  2. OpenAPI schema (IS affected): Changing id: strid: UUID causes the generated YAML spec to add format: uuid:

    # Before (str)         # After (UUID)
    id:                     id:
      type: string            type: string
      title: Id               format: uuid   # ← NEW
                               title: Id

    This is already reflected in both v2-rest-api-generated.yaml (line 12585) and _private_ui.yaml (line 2548).

  3. field_serializer does NOT fix the schema: In Pydantic v2, @field_serializer only affects runtime serialization, not JSON schema generation. The schema is derived from the type annotation (UUID), so format: uuid appears regardless of any serializer. To suppress format: uuid in the schema, you'd need Annotated[UUID, PlainSerializer(str, return_type=str)] or similar — a fundamentally different approach.

  4. Precedent already exists: DagVersionResponse.id was already UUID typed and already had format: uuid in the OpenAPI spec before this PR. The pattern is established.

  5. format: uuid is non-breaking for clients: OpenAPI's format is a hint. The base type is still string. Well-behaved client generators treat format: uuid as validation/documentation, not as a different wire type.

Recommendation: Add a serialization regression test (no field_serializer)

The field_serializer is unnecessary since Pydantic already handles the conversion. However, a unit test documenting the expected serialization behavior IS valuable as a regression guard — it protects against future Pydantic upgrades or config changes accidentally breaking UUID→string serialization.

Test to add in a new or existing test file (e.g., airflow-core/tests/unit/api_fastapi/core_api/datamodels/test_task_instances.py):

from uuid import UUID
from airflow.api_fastapi.core_api.datamodels.task_instances import TaskInstanceResponse

def test_uuid_serialized_as_dashed_string():
    """Ensure UUID fields serialize to dashed string format in JSON responses.

    This guards against regressions that could break airflow-client-python
    and other consumers of the Public API.
    See: https://github.com/apache/airflow/pull/XXXXX
    """
    test_uuid = UUID("019c39c7-4fea-76b6-891d-ebb3267d427d")
    # TaskInstanceResponse requires many fields; construct minimally via model_construct
    response = TaskInstanceResponse.model_construct(id=test_uuid)
    json_data = response.model_dump(mode="json")
    assert json_data["id"] == "019c39c7-4fea-76b6-891d-ebb3267d427d"
    assert isinstance(json_data["id"], str)

@jason810496
Copy link
Member

format: uuid is non-breaking for clients: OpenAPI's format is a hint. The base type is still string. Well-behaved client generators treat format: uuid as validation/documentation, not as a different wire type.

Oh, indeed, I oversight the generated yaml spec, the type is still string instead of uuid, so I think it still compatible for the generated client.

@jason810496 Please see below Claude's analysis of the point you raised. I am going to add the extra test. Please let me know if you think this is insufficient.

Yes, the analysis makes sense to me, then I think we don't need field_serializer at all, as the type is still string before we adding the field_serializer. And, yes, having the test to ensure the compatibility would be nice. Thanks!

@Dev-iL Dev-iL force-pushed the 2602/remove_sqla_utils branch 2 times, most recently from c8001cb to 4c63256 Compare February 10, 2026 08:50
@Dev-iL Dev-iL force-pushed the 2602/remove_sqla_utils branch from 4c63256 to b750e1d Compare February 10, 2026 11:14
Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice! Thanks for all the efforts @Dev-iL -> this is cool we can get rid of yet-another dependency holding us back

@potiuk potiuk merged commit 19c1a16 into apache:main Feb 10, 2026
436 of 437 checks passed
@potiuk
Copy link
Member

potiuk commented Feb 10, 2026

#protm

@jscheffl
Copy link
Contributor

Cool! THanks for the efforts also from my side!

@Dev-iL Dev-iL deleted the 2602/remove_sqla_utils branch February 11, 2026 04:48
@Dev-iL Dev-iL mentioned this pull request Feb 11, 2026
4 tasks
Alok-kumar-priyadarshi pushed a commit to Alok-kumar-priyadarshi/airflow that referenced this pull request Feb 11, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

all versions If set, the CI build will be forced to use all versions of Python/K8S/DBs area:DAG-processing area:db-migrations PRs with DB migration area:deadline-alerts AIP-86 (former AIP-57) full tests needed We need to run full set of tests for this PR to merge kind:documentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants