gh-150583: Fix zstd compression with digested ZstdDict#150586
Conversation
sobolevn
left a comment
There was a problem hiding this comment.
Please, add a NEWS entry for the fixed bug :)
Also, add a unit test.
|
Noted for the news entry, I will add it. For the unit test, I am unsure about what to add. Because the bug was using whatever was left in memory when the object was initialized, testing this bug means making a non-deterministic test (it could work on not from one run to the other). The reproducer from the ticket worked around that by calling the code many times in a loop, but this is not ideal either for tests because it increases the time to run tests (plus it is still non-deterministic, although with decreased probability), and it is unclear how many times it should loop. And then the question of why we would test this case specifically and not any other possibility of arguments during init (or other methods) appears. Should that be a job of a static code analyzer instead? But maybe this is my lack of experience in C projects that is showing. Anyways, I am unsure if/how such cases are tested in cpython codebase, and what should I do for tests here. I would appreciate if you could point me in the right direction. |
Changes
Force setting a
compression_levelofZSTD_CLEVEL_DEFAULTwhen creating theZstdCompressorobject.Question for reviewers: should I use
_zstd_set_c_level(ZSTD_CLEVEL_DEFAULT)instead?See reproducer in #150583.
Analysis
When creating a
ZstdCompressorobject (_zstd_ZstdCompressor_new_impl), thecompression_levelfield is unset, and as a result keeps its value from whatever was in memory (not deterministic), assuming no level is passed in the constructor.Then, when set, the C dict is loaded into the compression context with
_zstd_load_c_dict, which in turn calls_zstd_load_impl.In the case of a digested dictionary,
_get_CDict(zd, self->compression_level)is now called, which in turns callsZSTD_createCDict(..., compressionLevel).So an arbitrary compression level is passed to that call to
libzstd.Bug impact
Prerequisites:
compression.zstdlevelnor throughoptions)zstd_dict(e.g. withZstdDict(...).as_digested_dict)Consequence: the compression level used is arbitrary, and may change from one run to another.
Misc
Backport suggestion: to 3.15 and 3.14.
AI disclosure: analysis was performed with help of an LLM.
@emmatyping 👋