(Improvements) Ring-buffer and XdrReaderWriter optimizations, assorted Span/Memory method additions #1253
Decided to split my previous pull request (#1247) into multiple PRs, since its set of changes was too broad in my opinion. Ran tests as usual for the fb3 server and embedded. Updated to the current master branch state. Also decided to throw in reworked ring-buffer queues to be more large-object friendly.
TL;DR: Eliminates most of the repeated allocations for intermediate conversions, adds in-place copy methods for I/O, and reworks the intermediate network buffers to use proper resizable ring-buffer arrays with unitary (bulk) copying instead of per-byte queueing, massively boosting large-object throughput.
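To make the ring-buffer point concrete, here is a minimal sketch (my illustration, not the PR's actual implementation; the class name and initial size are made up) of a resizable byte ring buffer that moves data with at most two bulk `Span` copies per operation instead of one queue operation per byte:

```csharp
using System;

// Hypothetical sketch of the technique: a resizable byte ring buffer
// with bulk Span-based reads/writes instead of per-byte Enqueue/Dequeue.
public sealed class ByteRingBuffer
{
	byte[] _buffer = new byte[1024];
	int _head;  // read position
	int _count; // bytes currently stored

	public int Count => _count;

	public void Write(ReadOnlySpan<byte> data)
	{
		EnsureCapacity(_count + data.Length);
		var tail = (_head + _count) % _buffer.Length;
		// At most two segment copies (handles wrap-around).
		var firstPart = Math.Min(data.Length, _buffer.Length - tail);
		data.Slice(0, firstPart).CopyTo(_buffer.AsSpan(tail));
		data.Slice(firstPart).CopyTo(_buffer);
		_count += data.Length;
	}

	public int Read(Span<byte> destination)
	{
		var toRead = Math.Min(destination.Length, _count);
		var firstPart = Math.Min(toRead, _buffer.Length - _head);
		_buffer.AsSpan(_head, firstPart).CopyTo(destination);
		_buffer.AsSpan(0, toRead - firstPart).CopyTo(destination.Slice(firstPart));
		_head = (_head + toRead) % _buffer.Length;
		_count -= toRead;
		return toRead;
	}

	void EnsureCapacity(int required)
	{
		if (required <= _buffer.Length)
			return;
		var newBuffer = new byte[Math.Max(_buffer.Length * 2, required)];
		// Linearize existing contents into the new array.
		var firstPart = Math.Min(_count, _buffer.Length - _head);
		_buffer.AsSpan(_head, firstPart).CopyTo(newBuffer);
		_buffer.AsSpan(0, _count - firstPart).CopyTo(newBuffer.AsSpan(firstPart));
		_buffer = newBuffer;
		_head = 0;
	}
}
```

Each `Write`/`Read` here is at most two `Span.CopyTo` calls, which the runtime turns into wide, vectorized memory moves, rather than O(n) per-byte queue operations.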
Practical benchmarks (3 int columns and 1 char(100) column; real I/O against an fb3 server on localhost, NVMe storage). Result figures omitted:
- Master:
- New (isolated from the Rune optimization, #1252):
- Combined with #1252:
- Perf (defaults):
The boost for large-object I/O is significant. Benchmarks were performed without compression and encryption (typical for a walled LAN / a server sharing the db with apps in a single environment). For small data types and write operations the practical changes are much smaller, but some difference in allocations can still be observed. The performance effect comes mostly from eliminating copies and letting the system use packed operations (reads/writes with one 64-bit op instead of eight 8-bit ops) or even SIMD when dealing with the buffers; the JIT should also be much happier with the new arrangement. For large blobs in the same mode the boost can be even bigger with the bypass mode (fewer copies).
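As an illustration of the packed-operations point (again a hedged sketch in the spirit of the change, not the actual XdrReaderWriter code; the class and method names are invented), a contiguous span lets an XDR-style big-endian integer be read with a single wide load via `BinaryPrimitives` instead of being composed byte by byte:

```csharp
using System;
using System.Buffers.Binary;

static class XdrReadSketch
{
	// Per-byte composition: four 8-bit loads plus shifts and ORs.
	public static int ReadInt32PerByte(ReadOnlySpan<byte> buffer) =>
		(buffer[0] << 24) | (buffer[1] << 16) | (buffer[2] << 8) | buffer[3];

	// Packed read: one 32-bit load (plus a byte swap on little-endian CPUs).
	public static int ReadInt32Packed(ReadOnlySpan<byte> buffer) =>
		BinaryPrimitives.ReadInt32BigEndian(buffer);
}
```

The same idea scales up: bulk `Span.CopyTo` over the buffers gives the JIT room to emit 64-bit or SIMD moves instead of byte loops.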