Increased validation and serialisation performance #145

Open
NaqGuug wants to merge 13 commits into python-scim:main from NaqGuug:perf/validation-and-serialization-performance
NaqGuug (Contributor) commented May 21, 2026

Performance improvements

Caching

The main reason model validation and serialisation used to be very slow was that the same information was recomputed over and over during each pass. To avoid this, commonly used field metadata is now cached in a class variable when the model class is created. Normalising attribute names alone ate almost all of the processing time, which is not that surprising given that a regex pattern was being called millions of times.
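To illustrate the caching idea, here is a minimal stdlib-only sketch, not the actual code from this PR. `CachedModel`, `ToyUser`, `fields`, `_metadata` and `compute_metadata` are all hypothetical names; the point is only that per-field metadata is computed once at class creation, so the hot path does plain dictionary lookups.

```python
from typing import ClassVar

computations = 0  # counts how often metadata is actually computed


def compute_metadata(name: str) -> dict:
    """Stand-in for an expensive per-field computation (hypothetical)."""
    global computations
    computations += 1
    head, *rest = name.split("_")
    return {"alias": head + "".join(part.title() for part in rest)}


class CachedModel:
    """Toy base class; `fields` stands in for real model fields."""

    fields: ClassVar[tuple] = ()
    _metadata: ClassVar[dict] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once, when the subclass is created.
        cls._metadata = {name: compute_metadata(name) for name in cls.fields}

    def dump(self, values: dict) -> dict:
        # Hot path: dictionary lookups only, no recomputation.
        return {self._metadata[key]["alias"]: value for key, value in values.items()}


class ToyUser(CachedModel):
    fields = ("user_name", "display_name")


user = ToyUser()
for _ in range(1000):
    result = user.dump({"user_name": "bjensen", "display_name": "Babs"})
```

After 1000 dumps, `compute_metadata` has still only run once per field.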

I'm actually wondering whether the name normalisation is needed at all, as RFC 7643 §2.1 is quite lenient about accepted attribute names. Just calling lower() could be enough, since "user-name", "user_name" and "username" are considered different names. Of course we still want to convert Python snake_case variables to camelCase automatically, so we need to think about what kind of normalisation is actually required.
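A rough sketch of the trade-off, under the assumption that the regex-based normalisation strips every non-alphanumeric character before lowercasing (the real function may differ). `normalize_regex`, `normalize_lower` and `to_camel` are hypothetical helper names; only `functools.lru_cache` is the mechanism named in the commits.

```python
import re
from functools import lru_cache

_NON_ALNUM = re.compile(r"[^0-9a-zA-Z]")


@lru_cache(maxsize=None)
def normalize_regex(name: str) -> str:
    """Assumed behaviour: strip non-alphanumerics, then lowercase."""
    return _NON_ALNUM.sub("", name).lower()


def normalize_lower(name: str) -> str:
    """The lighter alternative: case folding only."""
    return name.lower()


def to_camel(name: str) -> str:
    """Convert a Python snake_case name to camelCase."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


names = ["user-name", "user_name", "userName"]
collapsed = {normalize_regex(n) for n in names}  # all three become one name
distinct = {normalize_lower(n) for n in names}   # all three stay different

# Repeated lookups are served from the cache instead of re-running the regex.
for _ in range(1000):
    normalize_regex("userName")
info = normalize_regex.cache_info()
```

With the regex variant the three spellings collapse into a single name, while `lower()` keeps them apart, so the snake_case to camelCase conversion would have to be handled separately, for example by something like `to_camel`.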

Validation

Every context validator was its own validator, and each one ran for ALL fields. This is simply a waste of time, so I collapsed the whole context validation into one model validator. The cached values are used here as well, which works quite nicely.
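As a sketch of what a single context-aware model validator can look like in Pydantic v2 (this is not the code from the PR): `ToyResource`, `read_only_fields` and the `"scim"` context key are hypothetical, and dropping read-only fields on creation is just an example rule chosen for illustration.

```python
from typing import Any, ClassVar, Optional

from pydantic import BaseModel, ValidationInfo, model_validator


class ToyResource(BaseModel):
    # Hypothetical cached metadata: names of read-only fields.
    read_only_fields: ClassVar[frozenset] = frozenset({"id"})

    id: Optional[str] = None
    user_name: Optional[str] = None

    @model_validator(mode="before")
    @classmethod
    def apply_context(cls, data: Any, info: ValidationInfo) -> Any:
        # One pass for the whole model instead of one validator per rule.
        context = (info.context or {}).get("scim")
        if context == "creation_request" and isinstance(data, dict):
            # Keys are checked against the cached set of read-only names.
            return {k: v for k, v in data.items() if k not in cls.read_only_fields}
        return data


payload = {"id": "123", "user_name": "bjensen"}
created = ToyResource.model_validate(payload, context={"scim": "creation_request"})
default = ToyResource.model_validate(payload)
```

In the creation context `id` is dropped before field validation runs; without a context the payload passes through untouched.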

Serialisation

Same story for serialisation: I collapsed the whole serialisation process into a single model serialiser. I also completely removed model_serializer_exclude_none, as we want Pydantic to exclude the None fields for us. The new serialiser only checks for deletion on the specific fields that could be None after Pydantic's exclusion. As mentioned before, caching really comes into play here.
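A comparable sketch for the serialisation side, again assuming Pydantic v2 (2.7 or later for the `context` argument of `model_dump`) and not taken from the PR. `ToyResource`, `never_returned` and the `"scim"` context key are hypothetical; the wrap-mode serialiser lets Pydantic do the default work, including `exclude_none`, and then only touches the cached field names.

```python
from typing import Any, ClassVar, Optional

from pydantic import (
    BaseModel,
    SerializationInfo,
    SerializerFunctionWrapHandler,
    model_serializer,
)


class ToyResource(BaseModel):
    # Hypothetical cached metadata: fields never returned in responses.
    never_returned: ClassVar[frozenset] = frozenset({"password"})

    user_name: Optional[str] = None
    nick_name: Optional[str] = None
    password: Optional[str] = None

    @model_serializer(mode="wrap")
    def serialize(
        self, handler: SerializerFunctionWrapHandler, info: SerializationInfo
    ) -> dict[str, Any]:
        # Pydantic performs the default serialisation, honouring exclude_none.
        data = handler(self)
        if (info.context or {}).get("scim") == "query_response":
            # Only the cached names are visited, not every field.
            for name in self.never_returned:
                data.pop(name, None)
        return data


user = ToyResource(user_name="bjensen", password="t1meMa$heen")
response = user.model_dump(exclude_none=True, context={"scim": "query_response"})
plain = user.model_dump(exclude_none=True)
```

Here `nick_name` is dropped by Pydantic itself because it is None, and `password` is removed only in the response context.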

Fixes/Misc

One fix for checking the replace constraints of extensions. Previously extensions were skipped by this check, so I added tests and fixed the code. Now replace() recursively checks both complex attributes and extensions.
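The recursive check can be pictured roughly as follows. This is a stdlib-only sketch on plain dictionaries, not the library's implementation: `find_violations` and the nested `spec` layout are hypothetical, and marking `employee_number` as immutable is an assumption made purely so the extension branch has something to report.

```python
ENTERPRISE = "urn:ietf:params:scim:schemas:extension:enterprise:2.0:User"


def find_violations(spec: dict, original: dict, replacement: dict, path: str = "") -> list:
    """Return paths of immutable attributes whose value would change."""
    violations = []
    for name, rule in spec.items():
        old, new = original.get(name), replacement.get(name)
        here = f"{path}.{name}" if path else name
        if isinstance(rule, dict):
            # Complex attribute or extension: recurse into its sub-attributes.
            if isinstance(old, dict) and isinstance(new, dict):
                violations += find_violations(rule, old, new, here)
        elif rule == "immutable" and old is not None and new != old:
            violations.append(here)
    return violations


# Hypothetical mutability spec: nested dicts describe complex attributes
# and extensions, strings describe the mutability of a simple attribute.
spec = {
    "user_name": "readWrite",
    "name": {"family_name": "immutable"},
    ENTERPRISE: {"employee_number": "immutable"},
}
original = {
    "user_name": "bjensen",
    "name": {"family_name": "Jensen"},
    ENTERPRISE: {"employee_number": "701984"},
}
replacement = {
    "user_name": "babs",
    "name": {"family_name": "Jensen"},
    ENTERPRISE: {"employee_number": "000000"},
}
violations = find_violations(spec, original, replacement)
```

Only the changed immutable value inside the extension is reported; the unchanged `family_name` and the read-write `user_name` pass.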

Overall I refactored and simplified the whole of base.py. There are still improvements left, mainly caching the values from get_field_annotation, get_field_root_type and get_field_multiplicity, as these are completely static metadata and calling these functions is surprisingly expensive. We could also cache the immutable fields, always-returned fields, never-returned fields etc., so that during validation/serialisation we never have to loop through all the fields, only the ones that actually matter. However, I didn't include these in this PR, as there is already a lot to review.


Script used for performance checking

import json

from pyinstrument import Profiler
from scim2_models import Context, User

REPETITIONS = 10000


def main():
    # Load the full User example from RFC 7643 §8.2 and parse it once up front.
    with open("rfc7643-8.2-user-full.json") as user_file:
        user_dict = json.load(user_file)
    scim_user = User.model_validate(user_dict)

    scim2_profiler = Profiler()
    scim2_profiler.start()

    context = Context.DEFAULT
    # context = Context.RESOURCE_CREATION_REQUEST
    # context = Context.RESOURCE_QUERY_RESPONSE
    # Profile a serialise-then-validate round trip in the chosen context.
    for _ in range(REPETITIONS):
        User.model_validate(
            scim_user.model_dump(scim_ctx=context),
            scim_ctx=context,
        )

    scim2_profiler.stop()
    scim2_profiler.write_html("scim2-models.html")


if __name__ == "__main__":
    main()

Results

DEFAULT

Speedup: ~4x

[Images: default-old (before), default-new (after)]

CREATION REQUEST

Speedup: ~2x

[Images: creation_request-old (before), creation_request-new (after)]

QUERY RESPONSE

Speedup: ~2x

[Images: query_response-old (before), query_response-new (after)]

NaqGuug added 13 commits May 12, 2026 19:38
Updated attribute urns get/set
Cache normalized names with lru_cache
This allows us to delete the dict comprehension from
`scim_serializer` and pydantic's none exclusion preserved
Mainly removed unused checks
Simplified `_set_complex_attribute_urns` even more
Test model serialization and validation with extensions
Added `extensions` to lookup table.
In `_apply_replace_constraints` we just loop through
complex attributes and extensions for deep replace check
azmeuk (Member) commented May 22, 2026

Hello. Thank you for your contributions. I am quite busy currently but I will try to review your patches in the coming weeks.
