(Improvements) Ring-buffer and XdrReaderWriter optimizations, assorted Span/Memory method additions #1253
Decided to split my previous pull request (#1247) into multiple PRs, since its set of changes was too broad in my opinion. Ran tests as usual for the fb3 server and embedded. Updated to the current master branch state. Also decided to throw in reworked ring-buffer queues to be more large-object friendly.
TL;DR: Eliminates most of the repeated allocations for intermediate conversions, adds in-place copy methods for I/O, and reworks the intermediate network buffers to use proper resizable ring-buffer arrays with unitary (bulk) copying instead of per-byte queueing, massively boosting large-object throughput.
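To make the ring-buffer point concrete, here is a minimal sketch (my illustration, not the PR's actual implementation; the class name and initial size are made up) of a resizable byte ring buffer that moves data with at most two bulk `Span` copies per operation instead of one queue operation per byte:

```csharp
using System;

// Hypothetical sketch of the technique: a resizable byte ring buffer
// with bulk Span-based reads/writes instead of per-byte Enqueue/Dequeue.
public sealed class ByteRingBuffer
{
	byte[] _buffer = new byte[1024];
	int _head;  // read position
	int _count; // bytes currently stored

	public int Count => _count;

	public void Write(ReadOnlySpan<byte> data)
	{
		EnsureCapacity(_count + data.Length);
		var tail = (_head + _count) % _buffer.Length;
		// At most two segment copies (handles wrap-around).
		var firstPart = Math.Min(data.Length, _buffer.Length - tail);
		data.Slice(0, firstPart).CopyTo(_buffer.AsSpan(tail));
		data.Slice(firstPart).CopyTo(_buffer);
		_count += data.Length;
	}

	public int Read(Span<byte> destination)
	{
		var toRead = Math.Min(destination.Length, _count);
		var firstPart = Math.Min(toRead, _buffer.Length - _head);
		_buffer.AsSpan(_head, firstPart).CopyTo(destination);
		_buffer.AsSpan(0, toRead - firstPart).CopyTo(destination.Slice(firstPart));
		_head = (_head + toRead) % _buffer.Length;
		_count -= toRead;
		return toRead;
	}

	void EnsureCapacity(int required)
	{
		if (required <= _buffer.Length)
			return;
		var newBuffer = new byte[Math.Max(_buffer.Length * 2, required)];
		// Linearize existing contents into the new array.
		var firstPart = Math.Min(_count, _buffer.Length - _head);
		_buffer.AsSpan(_head, firstPart).CopyTo(newBuffer);
		_buffer.AsSpan(0, _count - firstPart).CopyTo(newBuffer.AsSpan(firstPart));
		_buffer = newBuffer;
		_head = 0;
	}
}
```

Each `Write`/`Read` here is at most two `Span.CopyTo` calls, which the runtime turns into wide, vectorized memory moves, rather than O(n) per-byte queue operations.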
Practical benchmarks (3 int columns and 1 char(100) column; real I/O against an fb3 server on localhost, NVMe storage). Result figures omitted:
- Master:
- New (isolated from the Rune optimization, #1252):
- Combined with #1252:
- Perf (defaults):
The boost for large-object I/O is significant. Benchmarks were performed without compression and encryption (typical for a walled LAN / a server sharing the db with apps in a single environment). For small data types and write operations the practical changes are much smaller, but some difference in allocations can still be observed. The performance effect comes mostly from eliminating copies and letting the system use packed operations (reads/writes with one 64-bit op instead of eight 8-bit ops) or even SIMD when dealing with the buffers; the JIT should also be much happier with the new arrangement. For large blobs in the same mode the boost can be even bigger with the bypass mode (fewer copies).
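As an illustration of the packed-operations point (again a hedged sketch in the spirit of the change, not the actual XdrReaderWriter code; the class and method names are invented), a contiguous span lets an XDR-style big-endian integer be read with a single wide load via `BinaryPrimitives` instead of being composed byte by byte:

```csharp
using System;
using System.Buffers.Binary;

static class XdrReadSketch
{
	// Per-byte composition: four 8-bit loads plus shifts and ORs.
	public static int ReadInt32PerByte(ReadOnlySpan<byte> buffer) =>
		(buffer[0] << 24) | (buffer[1] << 16) | (buffer[2] << 8) | buffer[3];

	// Packed read: one 32-bit load (plus a byte swap on little-endian CPUs).
	public static int ReadInt32Packed(ReadOnlySpan<byte> buffer) =>
		BinaryPrimitives.ReadInt32BigEndian(buffer);
}
```

The same idea scales up: bulk `Span.CopyTo` over the buffers gives the JIT room to emit 64-bit or SIMD moves instead of byte loops.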