fix(streaming): make deserialized numpy arrays and tensors writable #820

Open
amlyczz wants to merge 2 commits into Lightning-AI:main from amlyczz:fix/writable-numpy-tensor-deserialize


amlyczz commented May 9, 2026

Summary

Fixes #818

np.frombuffer() called on a bytes object returns a read-only array, and torch.frombuffer() called on a bytes object returns a tensor backed by that immutable buffer. The latter triggers a UserWarning from PyTorch:

"The given buffer is not writable, and PyTorch does not support non-writable tensors."

This PR addresses the root cause by ensuring all deserialized arrays and tensors are writable.

Changes

  • serializers.py:

    • NumpySerializer.deserialize: Added .copy() to np.frombuffer() result
    • NoHeaderNumpySerializer.deserialize: Added .copy() to np.frombuffer() result
    • TensorSerializer.deserialize: Used bytearray() wrapper for torch.frombuffer() to provide a writable buffer
    • NoHeaderTensorSerializer.deserialize: Used bytearray() wrapper for torch.frombuffer()
    • JPEGSerializer.deserialize: Used bytearray() wrapper for torch.frombuffer()
  • item_loader.py: Added .clone() to torch.frombuffer() and .copy() to np.frombuffer() in TokensLoader

  • reader.py: Removed the two warnings.filterwarnings("ignore", ...) lines that were suppressing the symptom rather than fixing the root cause
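
The two strategies listed above can be illustrated with NumPy alone. This is a minimal sketch, not the code from this PR: the `payload` variable and the `int32` dtype are assumptions chosen for the demonstration. It shows that a zero-copy view over `bytes` is read-only, while either copying the view or wrapping the bytes in a `bytearray` first yields a writable result. `torch.frombuffer()` follows the same buffer-protocol rule, which is why the `bytearray()` wrapper avoids the warning there.

```python
import numpy as np

# Hypothetical serialized payload; bytes objects are immutable.
payload = np.arange(4, dtype=np.int32).tobytes()

# A zero-copy view over bytes is read-only, because the memory cannot be mutated.
view = np.frombuffer(payload, dtype=np.int32)
print(view.flags.writeable)  # False

# Strategy 1: copy the view so the array owns fresh, writable memory.
owned = np.frombuffer(payload, dtype=np.int32).copy()
owned[0] = 99

# Strategy 2: copy the bytes into a mutable bytearray, then view that buffer.
backed = np.frombuffer(bytearray(payload), dtype=np.int32)
backed[0] = 99

# The buffer protocol exposes the same distinction directly.
print(memoryview(payload).readonly)             # True
print(memoryview(bytearray(payload)).readonly)  # False
```

Both strategies cost one extra copy of the payload; the original bytes are left untouched in either case.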

Testing

Added 8 new tests in TestWritableDeserializedArrays:

  • Verifies numpy arrays are writable after deserialization
  • Verifies no UserWarning about non-writable buffers from torch deserializers
  • Verifies correctness (deserialized values match originals)
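
The kind of check described above might look roughly like the following. This is a hedged sketch, not the tests added in this PR: `deserialize_array`, `check_writable_roundtrip`, and `check_no_writable_warning` are hypothetical names, and the helper only stands in for a serializer's `deserialize` method. The torch check is skipped when PyTorch is not installed.

```python
import warnings

import numpy as np


def deserialize_array(data: bytes, dtype=np.float32) -> np.ndarray:
    """Hypothetical stand-in for a serializer's deserialize(); not litdata's code."""
    return np.frombuffer(data, dtype=dtype).copy()


def check_writable_roundtrip() -> None:
    original = np.arange(4, dtype=np.float32)
    restored = deserialize_array(original.tobytes())
    # Values must match the original, and the array must accept writes.
    np.testing.assert_array_equal(restored, original)
    assert restored.flags.writeable
    restored[0] = 42.0  # would raise ValueError on a read-only array


def check_no_writable_warning() -> None:
    try:
        import torch
    except ImportError:
        return  # PyTorch not available; nothing to check
    data = np.arange(4, dtype=np.float32).tobytes()
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        tensor = torch.frombuffer(bytearray(data), dtype=torch.float32)
    # No "not writable" warning should be recorded for a bytearray-backed tensor.
    assert not [w for w in caught if "not writable" in str(w.message)]
    assert tensor.tolist() == [0.0, 1.0, 2.0, 3.0]


check_writable_roundtrip()
check_no_writable_warning()
```

Recording warnings with `simplefilter("always")` keeps the check independent of any filters installed elsewhere in the process.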

All 29 tests in test_serializer.py pass.



- Add .copy() to np.frombuffer() calls in NumpySerializer and NoHeaderNumpySerializer
- Use bytearray() with torch.frombuffer() to avoid non-writable buffer warnings
- Add .clone()/.copy() to torch.frombuffer()/np.frombuffer() in TokensLoader (item_loader.py)
- Remove warning suppression filters in reader.py since the root cause is fixed

Closes Lightning-AI#818

codecov-commenter commented May 29, 2026


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81%. Comparing base (808fe07) to head (81d0996).
⚠️ Report is 9 commits behind head on main.

Additional details and impacted files
@@         Coverage Diff         @@
##           main   #820   +/-   ##
===================================
- Coverage    81%    81%   -0%     
===================================
  Files        54     54           
  Lines      7617   7617           
===================================
- Hits       6144   6142    -2     
- Misses     1473   1475    +2     


Development

Successfully merging this pull request may close these issues.

UserWarning: The given NumPy array is not writable, and PyTorch does not support non-writable tensors.

2 participants