Increased validation and serialisation performance#145
Open
NaqGuug wants to merge 13 commits into
Open
Conversation
Updated attribute urns get/set
Cache normalized names with lru_cache
This allows us to delete the dict comprehension from `scim_serializer` and pydantic's none exclusion preserved
Mainly removed unused checks
Simplified `_set_complex_attribute_urns` even more
Test model serialization and validation with extensions
Added `extensions` to lookup table. In `_apply_replace_constraints` we just loop through complex attributes and extensions for deep replace check
Member
|
Hello. Thank you for your contributions. I am quite busy currently but I will try to review your patches in the coming weeks. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Performance improvements
Caching
The main reason why model validation/serialisation used to be very slow was unnecessarily computing same information during validation/serialisation. This is why we cache commonly used metadata of fields to a class variable when creating a model. Just normalising attribute names alone ate all of the processing time, which is not that surprising because we are calling a regex pattern millions of times.
I'm actually thinking if the name normalisation is actually needed, as RFC 7643 §2.1 is quite lenient about accepted attribute names. Just calling
lower()could be enough, as "user-name", "user_name" and "username" are considered different. Of course we want to convert Python snake_case variables to camelCase automatically, so we need to think what kind of normalisation is needed.Validation
Every context validators were it's own validators and each validator ran for ALL fields. This is simply waste of time, so I just collapsed the whole context validation to one model validator. Here the cached values are also used, which works quite nicely.
Serialisation
Same story for serialisation, collapsed whole serialisation process to single model serialiser. Also completely removed
model_serializer_exclude_noneas we want Pydantic to exclude the None fields for us. In the new serialiser we only check the deletion for specific fields which could be None after Pydantic's exclusion. Also as mention before, here caching really comes to play.Fixes/Misc
One fix regarding for checking replace constraints of extensions. Previously extensions were skipped for this check, so made some tests and fixes to the code. Now when calling
replace()we recursively check both complex attributes and extensions.Overall I refactored and simplified whole
base.py. There are still improvements left, mainly caching values fromget_field_annotation,get_field_root_typeandget_field_multiplicity, as those are completely static metadata and calling these functions are surprisingly expensive. Also we could cache immutable fields, always returned fields, never returned fields etc. so during validation/serialisation we never have to loop through all the fields, just the ones that actually matter. However, I didn't include these in this PR, as there is already much to review through.Script used for performance checking
Results
DEFAULT
Speedup: ~4x
CREATION REQUEST
Speedup: ~2x
QUERY RESPONSE
Speedup: ~2x