[codex] Reduce serializer peak memory#312
Merged
Merged
Conversation
Contributor
Reviewer's GuideOptimizes both the Python and Rust XML serializers to reduce peak memory by streaming directly into bytes buffers, updates Rust crate features/toolchain configuration, and adds documentation plus a reproducible memory benchmark for the Rust backend. Sequence diagram for Rust dicttoxml streaming into Python bytessequenceDiagram
actor Caller
participant PyDicttoxml as python_dicttoxml
participant RsDicttoxml as rust_dicttoxml
participant PyBytesWriter as PyBytes_new_with_writer
participant WriteHelpers as write_value_write_dict_contents_write_list_contents
Caller->>PyDicttoxml: dicttoxml(obj,...,fast_path)
PyDicttoxml->>RsDicttoxml: dicttoxml(obj, custom_root, root, attr_type, item_wrap, cdata, list_headers)
RsDicttoxml->>PyBytesWriter: PyBytes::new_with_writer(py, 0, writer)
activate PyBytesWriter
PyBytesWriter->>WriteHelpers: write_dict_contents / write_list_contents / write_value
activate WriteHelpers
WriteHelpers-->>PyBytesWriter: write_str, write_byte, write_cdata, write_escaped_text
deactivate WriteHelpers
PyBytesWriter-->>RsDicttoxml: Py<PyBytes>
deactivate PyBytesWriter
RsDicttoxml-->>PyDicttoxml: PyBytes result
PyDicttoxml-->>Caller: bytes XML
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #312 +/- ##
=========================================
Coverage 100.00% 100.00%
=========================================
Files 6 6
Lines 570 569 -1
=========================================
- Hits 570 569 -1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. |
Contributor
There was a problem hiding this comment.
Hey - I've left some high level feedback:
- The new
write_escaped_text/write_cdatahelpers duplicate the logic inpush_escaped_text/wrap_cdata; consider centralizing the escaping/CDATA handling so both Rust backends share the same implementation and future changes don’t need to be kept in sync manually. - The
benchmark_memory_rust.pyscript importsresourceunconditionally, which will fail on Windows; if you intend this script to be usable or at least importable there, guard the import andmax_rss_bytesimplementation by platform.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- The new `write_escaped_text` / `write_cdata` helpers duplicate the logic in `push_escaped_text` / `wrap_cdata`; consider centralizing the escaping/CDATA handling so both Rust backends share the same implementation and future changes don’t need to be kept in sync manually.
- The `benchmark_memory_rust.py` script imports `resource` unconditionally, which will fail on Windows; if you intend this script to be usable or at least importable there, guard the import and `max_rss_bytes` implementation by platform.Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
json2xml6.2.0 andjson2xml-rs0.3.0 for the memory-savings releasejson2xml[fast]requirejson2xml-rs>=0.3.0so fast installs receive the memory-saving Rust packageWhy
The serializers already return UTF-8 bytes. The Python path can release temporary XML body state before final encoding, and the Rust path can stream into the eventual Python bytes object via PyO3. Together these keep peak output memory closer to the returned payload size while preserving existing behavior.
The Rust crate now targets the current stable toolchain used for this work: Rust/Cargo 1.96.0 with the Rust 2024 edition. The PyO3 extension-module linker mode is enabled only for maturin extension builds, so plain
cargo testworks on macOS without unresolved Python symbols.Release prep
json2xml6.2.0json2xml-rs0.3.0/private/tmp/json2xml-dist/json2xml-6.2.0.tar.gz,/private/tmp/json2xml-dist/json2xml-6.2.0-py3-none-any.whl/private/tmp/json2xml-rust-dist/json2xml_rs-0.3.0-cp314-cp314-macosx_11_0_arm64.whljson2xml-rs0.3.0 first with therust-v0.3.0tag, then publishjson2xml6.2.0 with thev6.2.0GitHub release/tag so thefastextra resolves on PyPI.Memory benchmark
docs/rust_memory_benchmark.rstValidation
rustc --version: 1.96.0cargo --version: 1.96.0cargo fmt --checkcargo checkcargo check --features extension-modulecargo test: 46 passedpython3 -m maturin develop --offlinepython3 -m maturin develop --release --offlinepython3 -m maturin build --release --offline --out /private/tmp/json2xml-rust-distUV_CACHE_DIR=/private/tmp/json2xml-uv-cache uv run --no-project --with build python -m build --sdist --wheel --outdir /private/tmp/json2xml-dist .python3 benchmark_memory_rust.py --records 100000 --label previous/current release samplespython3 benchmark_memory_rust.py --records 1000 --label validationpython3 -m py_compile benchmark_memory_rust.pypython3 -m pytest tests/test_rust_dicttoxml.py tests/test_dicttoxml_fast_fallback.py: 104 passedpython3 -m pytest: 397 passedlat check: passed