Skip to content

[codex] Reduce serializer peak memory#312

Merged
vinitkumar merged 6 commits into
masterfrom
fix/optimise-memory
Jun 5, 2026
Merged

[codex] Reduce serializer peak memory#312
vinitkumar merged 6 commits into
masterfrom
fix/optimise-memory

Conversation

@vinitkumar

@vinitkumar vinitkumar commented Jun 5, 2026

Copy link
Copy Markdown
Owner

Summary

  • prepare json2xml 6.2.0 and json2xml-rs 0.3.0 for the memory-savings release
  • build the Python rooted XML document without keeping an intermediate output list
  • avoid decoding XML bytes before pretty-print parsing
  • write Rust serializer output directly into Python bytes writer instead of first materializing a Rust string for the extension boundary
  • split Rust Cargo features so cargo tests do not enable PyO3 extension-module linking on macOS, while maturin still enables the extension-module feature
  • update the Rust extension crate to Rust 2024 with rust-version 1.96
  • make json2xml[fast] require json2xml-rs>=0.3.0 so fast installs receive the memory-saving Rust package
  • add a reproducible Rust memory benchmark and report documenting the measured savings
  • document memory, release packaging, toolchain, and Rust feature behavior in lat.md architecture notes

Why

The serializers already return UTF-8 bytes. The Python path can release temporary XML body state before final encoding, and the Rust path can stream into the eventual Python bytes object via PyO3. Together these keep peak output memory closer to the returned payload size while preserving existing behavior.

The Rust crate now targets the current stable toolchain used for this work: Rust/Cargo 1.96.0 with the Rust 2024 edition. The PyO3 extension-module linker mode is enabled only for maturin extension builds, so plain cargo test works on macOS without unresolved Python symbols.

Release prep

  • Python package version: json2xml 6.2.0
  • Rust package version: json2xml-rs 0.3.0
  • Local Python artifacts built: /private/tmp/json2xml-dist/json2xml-6.2.0.tar.gz, /private/tmp/json2xml-dist/json2xml-6.2.0-py3-none-any.whl
  • Local Rust artifact built: /private/tmp/json2xml-rust-dist/json2xml_rs-0.3.0-cp314-cp314-macosx_11_0_arm64.whl
  • Publish order after merge: publish json2xml-rs 0.3.0 first with the rust-v0.3.0 tag, then publish json2xml 6.2.0 with the v6.2.0 GitHub release/tag so the fast extra resolves on PyPI.

Memory benchmark

  • Benchmark report: docs/rust_memory_benchmark.rst
  • Payload: 100,000 generated nested records; 44.31 MiB input JSON equivalent; 78.17 MiB XML output
  • Previous Rust commit 7dd86b0: 157.70 MiB average serializer RSS delta
  • Current Rust commit 07d840f: 80.26 MiB average serializer RSS delta
  • Savings: 77.44 MiB, about 49.1% lower serializer peak delta
  • Tradeoff observed: release-build conversion time increased from 0.180s to 0.265s for this payload

Validation

  • rustc --version: 1.96.0
  • cargo --version: 1.96.0
  • cargo fmt --check
  • cargo check
  • cargo check --features extension-module
  • cargo test: 46 passed
  • python3 -m maturin develop --offline
  • python3 -m maturin develop --release --offline
  • python3 -m maturin build --release --offline --out /private/tmp/json2xml-rust-dist
  • UV_CACHE_DIR=/private/tmp/json2xml-uv-cache uv run --no-project --with build python -m build --sdist --wheel --outdir /private/tmp/json2xml-dist .
  • python3 benchmark_memory_rust.py --records 100000 --label previous/current release samples
  • python3 benchmark_memory_rust.py --records 1000 --label validation
  • python3 -m py_compile benchmark_memory_rust.py
  • python3 -m pytest tests/test_rust_dicttoxml.py tests/test_dicttoxml_fast_fallback.py: 104 passed
  • python3 -m pytest: 397 passed
  • lat check: passed

@sourcery-ai

sourcery-ai Bot commented Jun 5, 2026

Copy link
Copy Markdown
Contributor

Reviewer's Guide

Optimizes both the Python and Rust XML serializers to reduce peak memory by streaming directly into bytes buffers, updates Rust crate features/toolchain configuration, and adds documentation plus a reproducible memory benchmark for the Rust backend.

Sequence diagram for Rust dicttoxml streaming into Python bytes

sequenceDiagram
    actor Caller
    participant PyDicttoxml as python_dicttoxml
    participant RsDicttoxml as rust_dicttoxml
    participant PyBytesWriter as PyBytes_new_with_writer
    participant WriteHelpers as write_value_write_dict_contents_write_list_contents

    Caller->>PyDicttoxml: dicttoxml(obj,...,fast_path)
    PyDicttoxml->>RsDicttoxml: dicttoxml(obj, custom_root, root, attr_type, item_wrap, cdata, list_headers)
    RsDicttoxml->>PyBytesWriter: PyBytes::new_with_writer(py, 0, writer)
    activate PyBytesWriter
    PyBytesWriter->>WriteHelpers: write_dict_contents / write_list_contents / write_value
    activate WriteHelpers
    WriteHelpers-->>PyBytesWriter: write_str, write_byte, write_cdata, write_escaped_text
    deactivate WriteHelpers
    PyBytesWriter-->>RsDicttoxml: Py<PyBytes>
    deactivate PyBytesWriter
    RsDicttoxml-->>PyDicttoxml: PyBytes result
    PyDicttoxml-->>Caller: bytes XML
Loading

File-Level Changes

Change Details Files
Stream Rust serializer output directly into a Python-owned bytes buffer instead of building an intermediate Rust String.
  • Introduce writer-style helper functions that write XML fragments and escaping logic to a generic std::io::Write sink under the python feature.
  • Change write_value, write_dict_contents, and write_list_contents to take &mut dyn Write and propagate PyResult for I/O-style errors.
  • Update dicttoxml to construct the final payload via PyBytes::new_with_writer, writing the XML declaration, root wrapper, and body directly into PyBytes and returning Py instead of Vec.
rust/src/lib.rs
Reduce Python dicttoxml peak memory by avoiding an intermediate string list for the root-wrapped path.
  • Remove the shared output list and build the root-wrapped XML string as a single formatted expression before encoding.
  • Ensure the non-root path still returns the convert() result encoded directly to UTF-8.
  • Explicitly delete the large body string before encoding to encourage earlier memory release.
json2xml/dicttoxml.py
Avoid an extra decode/encode cycle during pretty-printing in the Python frontend.
  • Call minidom.parseString with the raw bytes from the serializer and only decode after toprettyxml returns bytes.
json2xml/json2xml.py
Adjust Rust crate features and toolchain settings to separate extension-module builds from core tests and target Rust 2024.
  • Bump the crate edition to 2024 and set rust-version to 1.96 in Cargo.toml.
  • Split the python feature from a new extension-module feature so cargo test uses non-extension PyO3 linking while maturin builds enable the extension-module feature.
  • Update rust/pyproject.toml maturin config to enable the new extension-module feature instead of directly enabling pyo3/extension-module.
rust/Cargo.toml
rust/pyproject.toml
Document the new memory behavior and benchmarking approach for the Rust backend and add a reproducible memory benchmark script.
  • Add benchmark_memory_rust.py to generate a deterministic large payload, run the Rust dicttoxml, and report RSS deltas plus timing as JSON.
  • Extend architecture notes to describe Python root wrapper memory behavior, Rust bytes-writer design, toolchain targeting, and Cargo feature separation.
  • Add a new rust_memory_benchmark.rst document summarizing the June 2026 Rust memory benchmark methodology and results, referencing the new script.
benchmark_memory_rust.py
lat.md/architecture.md
docs/rust_memory_benchmark.rst
docs/benchmarks.rst

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@codecov

codecov Bot commented Jun 5, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 100.00%. Comparing base (98d299d) to head (8e3bae9).

Additional details and impacted files
@@            Coverage Diff            @@
##            master      #312   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          570       569    -1     
=========================================
- Hits           570       569    -1     
Flag Coverage Δ
unittests 100.00% <100.00%> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

@vinitkumar vinitkumar marked this pull request as ready for review June 5, 2026 06:01

@sourcery-ai sourcery-ai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've left some high level feedback:

  • The new write_escaped_text / write_cdata helpers duplicate the logic in push_escaped_text / wrap_cdata; consider centralizing the escaping/CDATA handling so both Rust backends share the same implementation and future changes don’t need to be kept in sync manually.
  • The benchmark_memory_rust.py script imports resource unconditionally, which will fail on Windows; if you intend this script to be usable or at least importable there, guard the import and max_rss_bytes implementation by platform.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The new `write_escaped_text` / `write_cdata` helpers duplicate the logic in `push_escaped_text` / `wrap_cdata`; consider centralizing the escaping/CDATA handling so both Rust backends share the same implementation and future changes don’t need to be kept in sync manually.
- The `benchmark_memory_rust.py` script imports `resource` unconditionally, which will fail on Windows; if you intend this script to be usable or at least importable there, guard the import and `max_rss_bytes` implementation by platform.

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@vinitkumar vinitkumar merged commit 97597f0 into master Jun 5, 2026
62 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant