Handle pickling for generic pydantic models, fixes #210 #211
Open
NeejWeej wants to merge 2 commits into
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #211 +/- ##
==========================================
- Coverage 95.37% 95.32% -0.05%
==========================================
Files 142 143 +1
Lines 11404 11608 +204
Branches 620 633 +13
==========================================
+ Hits 10876 11065 +189
- Misses 399 412 +13
- Partials 129 131 +2
f4c4745 to
840d895
Signed-off-by: Nijat K <nijat.khanbabayev@gmail.com>
840d895 to
7404c07
Pydantic Generic Pickle and Ray Notes
TLDR
Concrete Pydantic generic `BaseModel` specializations can be fragile across fresh-process pickle/cloudpickle boundaries. ccflow works around this for concrete generic `ccflow.BaseModel` instances by pickling them as stable `origin + args + state` data and recreating the specialized class during load.

Fixes #210.
Summary
This issue is a mismatch between three things:
- Pydantic creates concrete generic specializations, such as `GenericResult[int]`, dynamically at runtime.
- Pickle and cloudpickle generally reconstruct class objects by global reference.
- A fresh process, such as a Ray worker, may not have materialized the same specialization.

The bug is not specific to `GenericResult`. Any concrete Pydantic generic `BaseModel` specialization can have the same problem if that specialized class object crosses a process boundary before the receiver has materialized it.

The fix is confusing because there are two separate objects involved:
- the model instance being pickled, such as `GenericResult[int](value=5)`
- generated classes inside generic type arguments, such as `GenericResult[int]`, `ListResult[int]`, or `CallableModelGenericType[NullContext, GenericResult[int]]`

Fixing only the top-level instance class is not enough if generated generic classes are also embedded inside the type arguments that define that instance's specialized class.
What Pydantic Does
Pydantic v2 does not require `pydantic.generics.GenericModel`; normal `BaseModel` subclasses can be generic. When code evaluates a subscription such as `GenericResult[int]`, Pydantic runs `BaseModel.__class_getitem__`. In the local environment, this is Pydantic 2.13.4.

The relevant source is pinned to the `v2.13.4` tag here:
https://github.com/pydantic/pydantic/blob/v2.13.4/pydantic/main.py#L904-L969

The relevant flow is:
1. Code evaluates `GenericResult[int]`.
2. If no cached specialization is available, Pydantic creates one with `_generics.create_generic_submodel(...)`.

The generated class carries generic metadata in `__pydantic_generic_metadata__`, with shape:

    {
        "origin": GenericResult,
        "args": (int,),
        "parameters": (),
    }

For module-level models like this repro, `origin` is stable and importable. In general, the reducer still relies on the origin class itself being importable or otherwise serializable by cloudpickle. The generated specialized class is runtime-created.
In Pydantic's `_generics.create_generic_submodel`, the new subclass is created with the origin model's `__module__` and generic metadata. The relevant source is pinned here:
https://github.com/pydantic/pydantic/blob/v2.13.4/pydantic/_internal/_generics.py#L105-L149

Then Pydantic conditionally registers the generated class in the origin module when `_get_caller_frame_info(...)` decides the specialization was created from a global context:
https://github.com/pydantic/pydantic/blob/v2.13.4/pydantic/_internal/_generics.py#L140-L147
That global registration is the key point. Sometimes a process has `ccflow.result.generic.GenericResult[int]` as a module attribute because that process already materialized it in a context Pydantic considers global. A fresh process may not.

What Pickle and Cloudpickle Do
Pickle generally reconstructs class objects by global reference, that is, by module name and qualified name. For an ordinary class this is fine, because the class can be imported in any process.

For a generated specialization, pickle/cloudpickle may see a class whose module and name look importable, such as `ccflow.result.generic.GenericResult[int]`, and serialize it by reference as if it were importable. That works in the process that created and registered the class. It can fail in a fresh process, where no module attribute with that name exists yet.

Ray makes this easy to hit because Ray workers are separate Python processes. Importing `GenericResult` in the worker does not necessarily create `GenericResult[int]`. If unpickling happens first, the generated class name is missing.

Pydantic's Instance Pickle Behavior
Pydantic already has instance pickle machinery. `BaseModel.__getstate__()` returns a dict containing Pydantic's internal model state, and `BaseModel.__setstate__()` restores that state directly. The source is pinned here:
https://github.com/pydantic/pydantic/blob/v2.13.4/pydantic/main.py#L1145-L1160

The returned state has this shape:

    {
        "__dict__": self.__dict__,
        "__pydantic_extra__": self.__pydantic_extra__,
        "__pydantic_fields_set__": self.__pydantic_fields_set__,
        "__pydantic_private__": private,
    }

That is important because pickle should preserve an already-validated object. It should not rerun normal validation, coerce values again, drop private attrs, or rebuild the object through `model_validate`.

ccflow already overrides `__getstate__`/`__setstate__` slightly to make `__pydantic_fields_set__` deterministic in pickle output.

The new fix keeps this Pydantic state-based behavior. For concrete generic specializations of `ccflow.BaseModel`, it changes only the reduce recipe.

The Exact Failure
A minimal failing shape is an instance of a concrete specialization, such as `GenericResult[int](value=5)`, serialized in one process. Loading it in a fresh process that has imported `GenericResult` but has not evaluated `GenericResult[int]` can fail, because the receiver has the origin class, `ccflow.result.generic.GenericResult`, but not the generated specialization, `ccflow.result.generic.GenericResult[int]`.

The problem is broader than the top-level class. For an instance of `GenericResult[ListResult[int]]`, the top-level class is `GenericResult[ListResult[int]]`, and the generic arg contains another generated class, `ListResult[int]`.

This also appears inside typing aliases, such as `GenericResult[typing.List[ListResult[int]]]`.

`typing.Callable` is especially annoying because its parameter types can appear as a plain Python list inside `typing.get_args()`. For `typing.Callable[[ListResult[int]], int]`, the arguments are the list `[ListResult[int]]` and `int`. So a helper that only walks `typing.get_args()` recursively can still miss generated classes inside that list.

Generated classes can also appear as field values, as in `GenericResult[type](value=ListResult[int])`. Even if the instance class is reconstructed correctly, the field value `ListResult[int]` would otherwise be pickled by its fragile generated class name. That field-value shape is not fixed by the current change. The current fix deliberately covers generated classes used as the instance class and inside generic type arguments, while leaving arbitrary class objects stored in model state to pickle/cloudpickle's normal behavior.
Pydantic-Only Repro
The smallest useful repro does not need ccflow or Ray. It only needs a plain
Pydantic generic model, two Python processes, and a cold receiver that imports
the generic origin class without first materializing the concrete specialization.
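Consider a repro built around a plain generic Pydantic model, `Box`, defined in an importable module. The repro runs three subprocess steps:
1. A creating process imports `Box`, evaluates `Box[int]`, constructs `Box[int](value=5)`, and serializes it with `cloudpickle`.
2. A cold receiver imports `Box` but does not evaluate `Box[int]` before calling `cloudpickle.loads(...)`.
3. A warm receiver imports `Box`, evaluates `Box[int]`, then calls `cloudpickle.loads(...)`.

In the observed output, the cold receiver fails and the warm receiver succeeds.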
That is the core bug in isolation. The creating process has a generated `Box[int]` class registered on the module. The cold receiving process has only the importable generic origin class `Box`, so pickle's global lookup for `Box[int]` fails. The warm receiver evaluates `Box[int]` at module/global scope before unpickling, which causes Pydantic to install the same generated class as a module attribute before pickle tries to resolve it. That proves this is an import/materialization-ordering problem rather than a ccflow model-definition problem.
The same repro was checked with several Pydantic 2.x releases, and each tested release showed the same dependence on the `Box[int]` module attribute. This is not a regression in a recent Pydantic minor release. The behavior is stable across the tested 2.x line.
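The ordering dependence can also be reproduced in a single process without Pydantic. The sketch below is a stdlib-only simulation, not the repro from this PR: the module name `demo_models` and the stand-in class named `Box[int]` are assumptions made for illustration, and removing or restoring the module attribute plays the role of the cold and warm receivers.

```python
import pickle
import sys
import types

# Hypothetical stand-in for the module that defines the generic origin class.
module = types.ModuleType("demo_models")
sys.modules[module.__name__] = module

# Stand-in for a runtime-generated specialization. Its name is not a valid
# Python identifier, so only an explicit setattr can make it resolvable.
specialized = type("Box[int]", (), {})
specialized.__module__ = module.__name__

# Creating process: the class is registered on the module, so pickle can
# serialize the instance's class by reference (module name + qualified name).
setattr(module, "Box[int]", specialized)
instance = specialized()
instance.value = 5
payload = pickle.dumps(instance)

# Cold receiver: the module exists, but the generated name was never installed.
delattr(module, "Box[int]")
try:
    pickle.loads(payload)
    cold_error = None
except (AttributeError, pickle.UnpicklingError) as exc:
    cold_error = exc

# Warm receiver: the specialization is materialized before unpickling.
setattr(module, "Box[int]", specialized)
restored = pickle.loads(payload)

print("cold receiver failed:", cold_error is not None)
print("warm receiver value:", restored.value)
```

The payload is identical in both loads; only the state of the receiving module differs, which is the same ordering dependence described for the cold and warm receivers.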
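Why the Fix Uses `__reduce_ex__`
`__reduce_ex__` is the pickle hook that returns a reconstruction recipe. For concrete generic specializations of `ccflow.BaseModel`, ccflow now returns a recipe like:

    (
        _new_ccflow_generic_model,
        (origin, portable_args),
        pydantic_state,
    )

For an instance such as `GenericResult[int](value=5)`, the recipe is conceptually the reducer function, the origin `GenericResult` with a portable form of the args `(int,)`, and the instance's Pydantic state.

On load, pickle first calls the reducer function with the origin and args, which recreates the specialized class in the receiving process and returns an instance of it. Then, because the reducer returned `pydantic_state` as the third tuple element, pickle applies that state to the object.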
This recipe uses Pydantic's own generic construction path in the receiving process, while still letting pickle apply Pydantic's normal state protocol. The receiver does not need to already have a global `GenericResult[int]` module attribute.

Why Type Arguments Need Special Handling
The generic argument portability layer is the ugly part, but it is solving a
real second-order problem.
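If we only serialize the top-level class as its origin plus raw args, then a simple case such as `GenericResult[int](value=5)` works, but shapes such as `GenericResult[ListResult[int]](...)` or `GenericResult[typing.Callable[[ListResult[int]], int]](...)` can still fail, because `ListResult[int]` is itself a generated Pydantic generic class.

The helper therefore handles:
- nested generated Pydantic generic classes, such as `ListResult[int]`
- generic aliases and special forms, such as builtin `list[...]`, `typing.List[...]`, `typing.Optional[...]`, `typing.Callable[...]`, or a PEP 604 `A | B` union
- `list` and `tuple` containers that can appear inside type expressions, such as the callable parameter list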
The raw Pydantic state remains in the outer pickle stream. That is intentional:
pickle keeps its own memo table there, which is required to preserve shared
references, cycles, and protocol-5 buffers. The reducer only changes how the
generic class is recreated; it does not recursively rewrite arbitrary model
field/private state.
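The memo-table point can be illustrated with plain pickle. This is a generic stdlib demonstration, not code from the PR; the `state` dict here is an arbitrary stand-in for model state.

```python
import pickle

shared = [1, 2, 3]
state = {"left": shared, "right": shared}
state["self"] = state  # a reference cycle

# One pickle stream: the memo table records each object once, so shared
# references and cycles survive the round trip.
together = pickle.loads(pickle.dumps(state))

# Separate streams per value: each stream has its own memo, so the two
# entries come back as independent copies.
separately = {
    key: pickle.loads(pickle.dumps(value))
    for key, value in state.items()
    if key != "self"
}

print(together["left"] is together["right"], together["self"] is together)
print(separately["left"] is separately["right"])
```

A reducer that serialized model state into its own nested byte string would behave like the second case, which is why the state is left in the outer stream.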
The helper intentionally does not treat model instances as type specs. A value like `GenericResult[int](value=5)` is still a model instance and should be pickled as an instance. Its own `__reduce_ex__` will handle its generated class. Accidentally converting instances into class specs would corrupt data.

Blast Radius
There are two related guards. The first is a predicate that identifies concrete generated Pydantic generic classes such as `GenericResult[int]`. The second is the `BaseModel.__reduce_ex__` override, which runs when pickling a ccflow model instance whose `type(self)` satisfies that predicate.

So the override applies to instances of concrete specializations, such as `GenericResult[int](...)`. It does not apply to normal non-generic `BaseModel` instances, which continue using the default reducer path, plus ccflow's existing deterministic `__getstate__`/`__setstate__` hooks.

The custom reduce recipe is created only during pickling of concrete generic ccflow `BaseModel` instances. It does not run during other operations such as `model_dump`.

The performance cost is therefore limited to pickling generic model instances. The extra work is walking the generic type arguments for the model class. The actual Pydantic instance state is still handled by the surrounding pickle operation.
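A guard of this kind could be sketched from the metadata shape shown earlier (`origin`, `args`, `parameters`). The function below is a hypothetical illustration, not the predicate in this PR: it assumes that "concrete" means the metadata has an origin and no remaining type parameters, and it uses stand-in classes that only mimic the metadata attribute.

```python
def is_concrete_generic_specialization(cls) -> bool:
    """Hypothetical guard: True for classes shaped like a concrete specialization.

    Reads the metadata attribute by name, so it works on any class that
    follows the shape described above and needs no Pydantic import.
    """
    metadata = getattr(cls, "__pydantic_generic_metadata__", None)
    if not isinstance(metadata, dict):
        return False
    return metadata.get("origin") is not None and not metadata.get("parameters")


# Stand-in classes that only mimic the metadata shape.
class Origin:
    __pydantic_generic_metadata__ = {"origin": None, "args": (), "parameters": ("T",)}


class Specialized(Origin):
    __pydantic_generic_metadata__ = {"origin": Origin, "args": (int,), "parameters": ()}


class Plain:
    pass


for candidate in (Origin, Specialized, Plain):
    print(candidate.__name__, is_concrete_generic_specialization(candidate))
```

Keeping the check this narrow is what limits the custom reduce path to generated specializations and leaves every other class on the default path.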
Why Not Simpler Alternatives
Why not call `model_validate` on restore?
Because pickle should restore object state, not validate new input. Revalidation can coerce values again or drop private attrs. Pydantic's own pickle support uses `__getstate__`/`__setstate__`, so the ccflow fix follows that model.

Why not rely on cloudpickle to serialize the generated class by value?
Sometimes cloudpickle can serialize dynamic classes by value. But Pydantic specializations are not just ordinary dynamic classes. They carry generated schemas, validators, serializers, generic metadata, and cache behavior.
Also, if the class appears to be importable by module/name in the creating process, cloudpickle can choose a global-reference path. That is exactly the fragile path that fails in a fresh receiver.
The stable representation for a Pydantic generic specialization is not the generated class object. It is the importable origin class plus its type arguments.

Why not globally register every generated specialization?
Pydantic already conditionally registers generated specializations when it thinks they were created globally. But a fresh Ray worker has not necessarily executed the same specialization expression yet.
Trying to eagerly register all possible specializations is impossible. Registering during serialization still would not help the receiver unless the receiver imports side effects in the same order.
Why not monkeypatch pickle/cloudpickle for all classes?
Generated Pydantic specializations are class objects, so a global reducer would mean changing behavior for `type` or for broad classes of model classes. That is much wider than this bug.

The ccflow fix keeps the custom behavior inside ccflow `BaseModel` instance pickling.

Why not support fields containing generated classes too?
That is a real broader issue, but fixing it at the state-value level is a much
bigger change. It requires intercepting arbitrary class objects inside the
pickle stream or walking Pydantic state manually, both of which can disturb
pickle's normal identity/cycle/buffer semantics if done carelessly.
The current fix chooses the smaller and safer boundary: generated classes that
define the model instance's own type, plus generated classes inside that type's
generic arguments. A field value like `GenericResult[type](value=ListResult[int])` can still fail in a cold receiver and remains out of scope.
How Broad Is This Problem?
It affects concrete Pydantic generic specializations crossing a process
boundary by pickle/cloudpickle when the receiver has not already materialized
the same specialization.
Shapes this PR fixes:
- `GenericResult[int](...)`
- `GenericResult[ListResult[int]](...)`
- `GenericResult[list[ListResult[int]]](...)`
- `GenericResult[dict[str, GenericContext[int]]](...)`
- `typing` alias arg: `GenericResult[typing.List[ListResult[int]]](...)`
- `GenericResult[typing.Callable[[ListResult[int]], int]](...)`
- one-argument special forms such as `typing.ClassVar[ListResult[int]]` and `typing.Final[ListResult[int]]`
- `GenericResult[GenericContext[int] | None](...)`
- `GenericResult[typing.Optional[ListResult[int]]](...)`

Not affected:
- non-generic `BaseModel` classes
- the unparametrized generic origin, such as `GenericResult`

Handled by normal nested pickling, not by rewriting field/private state:
- model instances stored in fields or private state; pickle serializes those instances normally, and each instance's own reducer handles its generated class

Helper-level coverage only:
- `CallableModelGenericType[NullContext, GenericResult[int]]` as a type argument
- `typing.Required[ListResult[int]]` and `typing.NotRequired[ListResult[int]]`; Pydantic rejects these as `GenericResult[...]` arguments in this context, but the restore helper handles the one-argument special-form shape

Known not fixed:
- generated classes stored as field values, such as `GenericResult[type](value=ListResult[int])`
- generated classes inside metadata objects, such as `Annotated[int, frozenset([ListResult[int]])]`; the helper walks normal typing args plus explicit `list`/`tuple` containers, not every object that can be embedded in metadata
Why This Is Hard To Read
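The code is confusing because it has to preserve three different pieces of pickle/type state:
- the instance state, through Pydantic's `__getstate__`/`__setstate__`
- the specialized class, recreated as `origin[args]`
- type expressions inside the args, whether they appear as builtin `list[...]`, `typing.List[...]`, `typing.Optional[...]`, `typing.Callable[...]`, or a PEP 604 `A | B` union

It also has to avoid a dangerous false positive: treating a model instance as if it were a type spec.

That is why there are separate helpers for the type arguments and for recreating the class, while the instance state is still applied through Pydantic's normal `__setstate__`.

The resulting implementation is not aesthetically simple, but each piece exists to handle a real pickle/Ray failure mode.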
References
- Pydantic dynamic model docs. Pydantic explicitly notes that dynamically created models must be globally defined and have `__module__` provided to be pickleable: https://docs.pydantic.dev/latest/concepts/models/#dynamic-model-creation
- Pydantic `BaseModel.__class_getitem__` source for generic specialization: https://github.com/pydantic/pydantic/blob/v2.13.4/pydantic/main.py#L904-L969
- Pydantic `_generics.create_generic_submodel` source for dynamic generated generic subclasses and conditional module registration: https://github.com/pydantic/pydantic/blob/v2.13.4/pydantic/_internal/_generics.py#L105-L149
- Pydantic `BaseModel.__getstate__`/`BaseModel.__setstate__` source showing state-based pickle behavior: https://github.com/pydantic/pydantic/blob/v2.13.4/pydantic/main.py#L1145-L1160
- Pydantic issue #9668, broad background on Python compatibility work that mentions generic models and pickling among affected areas. It is not this exact bug: Support Python 3.13 pydantic/pydantic#9668
- Prefect's `add_cloudpickle_reduction`, an analogous downstream precedent for adding reducer logic around Pydantic model classes in workflow/distributed execution contexts: https://reference.prefect.io/prefect/utilities/pydantic/#add_cloudpickle_reduction