@pl752 pl752 commented Dec 29, 2025

Decided to split my previous pull request (#1247) into multiple PRs because, in my opinion, its set of changes was too broad. Ran the tests as usual for fb3 server and embedded. Updated to the current state of the master branch. Also threw in reworked ring-buffer queues that are friendlier to large objects.

TL;DR: Eliminates most of the repeated allocations for intermediate conversions, adds in-place copy methods for I/O, and reworks the intermediate network buffers to use proper resizable ring-buffer arrays with whole-block copying instead of per-byte queueing, which massively boosts throughput for large objects.
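To make the ring-buffer idea concrete, here is a minimal sketch of a resizable byte ring buffer that moves data in at most two block copies per call instead of one queue operation per byte. It is an illustration under assumptions, not the code from this PR: the `ByteRingBuffer` type, its members, and the doubling growth policy are all hypothetical.

```csharp
using System;

// Usage: enqueue and dequeue whole blocks; wrap-around is handled internally.
var queue = new ByteRingBuffer(initialCapacity: 8);
queue.Enqueue(new byte[] { 1, 2, 3, 4, 5, 6 });
var head = new byte[4];
queue.Dequeue(head);                            // takes the four oldest bytes
queue.Enqueue(new byte[] { 7, 8, 9, 10, 11 });  // wraps past the end of the array
var rest = new byte[queue.Count];
queue.Dequeue(rest);
Console.WriteLine(string.Join(",", rest));

// Hypothetical type for illustration only; not the class used in this PR.
sealed class ByteRingBuffer
{
    private byte[] _buffer;
    private int _head;   // index of the oldest stored byte
    private int _count;  // number of bytes currently stored

    public ByteRingBuffer(int initialCapacity = 4096)
    {
        if (initialCapacity <= 0) throw new ArgumentOutOfRangeException(nameof(initialCapacity));
        _buffer = new byte[initialCapacity];
    }

    public int Count => _count;

    // Appends the whole span using at most two block copies.
    public void Enqueue(ReadOnlySpan<byte> source)
    {
        EnsureCapacity(_count + source.Length);
        int tail = (_head + _count) % _buffer.Length;
        int firstPart = Math.Min(source.Length, _buffer.Length - tail);
        source.Slice(0, firstPart).CopyTo(_buffer.AsSpan(tail));
        source.Slice(firstPart).CopyTo(_buffer);  // remainder wraps to index 0
        _count += source.Length;
    }

    // Fills the destination with the oldest bytes; returns how many were copied.
    public int Dequeue(Span<byte> destination)
    {
        int toRead = Math.Min(destination.Length, _count);
        int firstPart = Math.Min(toRead, _buffer.Length - _head);
        _buffer.AsSpan(_head, firstPart).CopyTo(destination);
        _buffer.AsSpan(0, toRead - firstPart).CopyTo(destination.Slice(firstPart));
        _head = (_head + toRead) % _buffer.Length;
        _count -= toRead;
        return toRead;
    }

    // Grows the backing array (assumed policy: at least double) and
    // linearizes the stored bytes so they start at index 0.
    private void EnsureCapacity(int required)
    {
        if (required <= _buffer.Length) return;
        var grown = new byte[Math.Max(required, _buffer.Length * 2)];
        int firstPart = Math.Min(_count, _buffer.Length - _head);
        _buffer.AsSpan(_head, firstPart).CopyTo(grown);
        _buffer.AsSpan(0, _count - firstPart).CopyTo(grown.AsSpan(firstPart));
        _buffer = grown;
        _head = 0;
    }
}
```

Storing bytes in one contiguous array keeps each transfer to one or two `Span<byte>.CopyTo` calls regardless of payload size, which is where large objects would be expected to benefit most.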

Practical benchmarks (3 int columns and 1 char(100) column, real I/O with an fb3 server on localhost and NVMe):

Master:

| Method                              | Rows   | Mean         | Error      | StdDev      | Median       | Gen0        | Gen1        | Allocated     |
|------------------------------------ |------- |-------------:|-----------:|------------:|-------------:|------------:|------------:|--------------:|
| SelectAndMap_Main_ReusedBufferAsync | 10     |     1.796 ms |  0.0649 ms |   0.1904 ms |     1.784 ms |           - |           - |     457.27 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100    |     8.326 ms |  0.5724 ms |   1.5955 ms |     7.472 ms |           - |           - |    4502.13 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000   |    31.037 ms |  3.7512 ms |  10.8231 ms |    24.973 ms |   5000.0000 |   1000.0000 |   44985.72 KB |
| SelectAndMap_Main_ReusedBufferAsync | 10000  |   337.278 ms | 16.5461 ms |  48.7865 ms |   334.254 ms |  55000.0000 |  10000.0000 |  449543.01 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100000 | 3,467.848 ms | 98.9785 ms | 288.7249 ms | 3,435.619 ms | 550000.0000 | 114000.0000 | 4494685.62 KB |

New (isolated from Rune opt (#1252)):

| Method                              | Rows   | Mean         | Error      | StdDev      | Median       | Gen0        | Gen1        | Allocated     |
|------------------------------------ |------- |-------------:|-----------:|------------:|-------------:|------------:|------------:|--------------:|
| SelectAndMap_Main_ReusedBufferAsync | 10     |     1.614 ms |  0.0500 ms |   0.1450 ms |     1.584 ms |           - |           - |        447 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100    |     9.760 ms |  0.1127 ms |   0.0880 ms |     9.746 ms |           - |           - |    4411.49 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000   |    22.283 ms |  2.1941 ms |   6.1524 ms |    19.432 ms |   5000.0000 |   1000.0000 |   43999.39 KB |
| SelectAndMap_Main_ReusedBufferAsync | 10000  |   221.000 ms |  9.0103 ms |  26.1406 ms |   221.383 ms |  53000.0000 |  10000.0000 |  439534.85 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100000 | 2,317.527 ms | 46.2359 ms | 108.0750 ms | 2,342.930 ms | 538000.0000 | 108000.0000 | 4394711.29 KB |

Combined with #1252:

| Method                              | Rows   | Mean         | Error        | StdDev       | Median       | Gen0       | Gen1      | Allocated    |
|------------------------------------ |------- |-------------:|-------------:|-------------:|-------------:|-----------:|----------:|-------------:|
| SelectAndMap_Main_ReusedBufferAsync | 10     |     471.8 us |     15.22 us |     43.68 us |     462.0 us |          - |         - |     43.98 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100    |   1,638.7 us |     62.67 us |    182.81 us |   1,590.5 us |          - |         - |    383.96 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000   |  13,283.3 us |  2,299.73 us |  6,561.26 us |   9,672.6 us |          - |         - |   3785.55 KB |
| SelectAndMap_Main_ReusedBufferAsync | 10000  |  92,663.6 us |  2,673.87 us |  7,841.99 us |  91,372.1 us |  4000.0000 |         - |  37583.41 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100000 | 897,179.0 us | 17,724.27 us | 24,846.95 us | 900,061.5 us | 45000.0000 | 5000.0000 | 375392.09 KB |

Perf (defaults):

| Method  | Job     | BuildConfiguration | DataType             | Count | Mean        | Error     | StdDev    | Ratio | RatioSD | Gen0    | Allocated | Alloc Ratio |
|-------- |-------- |------------------- |--------------------- |------ |------------:|----------:|----------:|------:|--------:|--------:|----------:|------------:|
| Execute | NuGet   | ReleaseNuGet       | bigint               | 100   | 19,426.6 us | 195.03 us | 172.89 us |  1.00 |    0.01 | 31.2500 | 290.31 KB |        1.00 |
| Execute | Project | Release            | bigint               | 100   | 19,731.2 us | 376.18 us | 369.46 us |  1.02 |    0.02 |       - | 192.32 KB |        0.66 |
|         |         |                    |                      |       |             |           |           |       |         |         |           |             |
| Fetch   | NuGet   | ReleaseNuGet       | bigint               | 100   |    461.9 us |   5.48 us |   5.12 us |  1.00 |    0.02 |  5.8594 |  52.02 KB |        1.00 |
| Fetch   | Project | Release            | bigint               | 100   |    449.6 us |   3.00 us |   2.81 us |  0.97 |    0.01 |  3.9063 |  39.03 KB |        0.75 |
|         |         |                    |                      |       |             |           |           |       |         |         |           |             |
| Execute | NuGet   | ReleaseNuGet       | varch(...) utf8 [30] | 100   | 19,592.1 us | 172.88 us | 144.36 us |  1.00 |    0.01 | 31.2500 | 294.24 KB |        1.00 |
| Execute | Project | Release            | varch(...) utf8 [30] | 100   | 19,419.8 us |  96.03 us |  80.19 us |  0.99 |    0.01 |       - | 194.64 KB |        0.66 |
|         |         |                    |                      |       |             |           |           |       |         |         |           |             |
| Fetch   | NuGet   | ReleaseNuGet       | varch(...) utf8 [30] | 100   |    466.2 us |   2.37 us |   2.22 us |  1.00 |    0.01 |  6.8359 |   55.9 KB |        1.00 |
| Fetch   | Project | Release            | varch(...) utf8 [30] | 100   |    458.4 us |   3.01 us |   2.67 us |  0.98 |    0.01 |  3.9063 |  39.77 KB |        0.71 |

The boost for large-object I/O is significant. Benchmarks were run without compression or encryption (typical for a walled-off LAN, or for a server that shares a single environment with the apps using the database). For small data types and write operations the practical changes are much smaller, though some difference in allocations can still be observed. The performance effect comes mostly from eliminating copies and from letting the system use wider operations when working with the buffers (reads and writes with 64-bit operations instead of eight 8-bit ones, or even SIMD); the JIT should also be much happier with the new implementation. For large blobs in the same mode, the boost can be even bigger with the bypass mode (fewer copies).
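As a rough illustration of the per-byte versus block distinction described above, the following sketch moves the same payload through a `Queue<byte>` one element at a time and through a single `Span<byte>.CopyTo` call. It is not taken from this PR and is not a benchmark: the payload size and variable names are arbitrary assumptions, and it only shows that both paths yield identical bytes while the block path leaves the runtime free to copy in wide chunks.

```csharp
using System;
using System.Collections.Generic;

// Arbitrary 64 KiB test payload (assumption for illustration).
byte[] payload = new byte[64 * 1024];
new Random(42).NextBytes(payload);

// Per-byte path: one Enqueue and one Dequeue call for every byte.
var byteQueue = new Queue<byte>();
foreach (byte b in payload)
    byteQueue.Enqueue(b);
var perByte = new byte[payload.Length];
for (int i = 0; i < perByte.Length; i++)
    perByte[i] = byteQueue.Dequeue();

// Block path: a single copy over the whole span.
var block = new byte[payload.Length];
payload.AsSpan().CopyTo(block);

// Both paths must produce the same bytes.
Console.WriteLine(perByte.AsSpan().SequenceEqual(block.AsSpan()));
```

To measure the actual difference, each path would need to go into its own BenchmarkDotNet method, as in the tables above.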
