@pl752 pl752 commented Dec 29, 2025

Decided to split my previous pull request (#1247) into multiple PRs, because in my opinion its set of changes was too broad. Ran the tests as usual against the fb3 server and embedded. Updated to the current state of the master branch.

TL;DR: A small set of optimizations that improves the speed and allocation volume of the string methods roughly tenfold (up to 3x when I/O and a real DB engine are included, for select queries containing a char100 field in my case). It also makes better use of modern processors' SIMD instructions for BMP-only strings (a single char per rune).
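For readers unfamiliar with the rune/char distinction, here is a minimal sketch of what allocation-free rune counting and truncation over a span can look like. It is not the code from this PR: the class name `RuneOpsSketch`, the method bodies, and the choice to count a lone surrogate as one rune are all assumptions made for illustration.

```csharp
using System;

// Hypothetical illustration, not the PR's implementation.
public static class RuneOpsSketch
{
    // Counts runes (Unicode scalar values) in UTF-16 text without allocating.
    // A well-formed surrogate pair counts as one rune.
    // Assumption: a lone surrogate also counts as one rune.
    public static int CountRunes(ReadOnlySpan<char> text)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (char.IsHighSurrogate(text[i])
                && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                i++; // skip the low half of the pair
            }
            count++;
        }
        return count;
    }

    // Returns a slice holding at most maxRuneCount runes.
    // The slice never ends between the two halves of a surrogate pair.
    public static ReadOnlySpan<char> TruncateToRuneCount(ReadOnlySpan<char> text, int maxRuneCount)
    {
        int runes = 0;
        int i = 0;
        while (i < text.Length && runes < maxRuneCount)
        {
            bool isPair = char.IsHighSurrogate(text[i])
                && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]);
            i += isPair ? 2 : 1;
            runes++;
        }
        return text.Slice(0, i);
    }
}
```

Because both methods work on `ReadOnlySpan<char>`, the only allocation left in a caller is the final `ToString()` when a new string is actually needed.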

Synthetic (serverless, no I/O) benchmarks

Implemented rune ops:

// * Summary *

BenchmarkDotNet v0.15.8, Windows 10 (10.0.19044.6691/21H2/November2021Update)
AMD Ryzen 7 5800H with Radeon Graphics 3.20GHz, 1 CPU, 16 logical and 8 physical cores
.NET SDK 10.0.101
  [Host]     : .NET 8.0.22 (8.0.22, 8.0.2225.52707), X64 RyuJIT x86-64-v3
  Job-JMDAGQ : .NET 10.0.1 (10.0.1, 10.0.125.57005), X64 RyuJIT x86-64-v3
  Job-OOTPKI : .NET 8.0.22 (8.0.22, 8.0.2225.52707), X64 RyuJIT x86-64-v3


| Method                                       | Job        | Toolchain | Kind                 | RuneLength | MaxRuneCount | Mean        | Error     | StdDev    | Gen0   | Allocated |
|--------------------------------------------- |----------- |---------- |--------------------- |----------- |------------- |------------:|----------:|----------:|-------:|----------:|
| 'old truncate via EnumerateRunesToChars'     | Job-JMDAGQ | .NET 10.0 | Ascii                | 128        | 512          |  2,490.0 ns |  39.22 ns |  36.69 ns | 1.0414 |    8712 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | Ascii                | 128        | 512          |    110.8 ns |   2.15 ns |   1.91 ns | 0.0334 |     280 B |
| 'old truncate via EnumerateRunesToChars'     | Job-OOTPKI | .NET 8.0  | Ascii                | 128        | 512          |  3,002.9 ns |  54.82 ns |  51.27 ns | 1.0414 |    8712 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | Ascii                | 128        | 512          |    108.2 ns |   1.94 ns |   1.81 ns | 0.0334 |     280 B |
| 'old truncate via EnumerateRunesToChars'     | Job-JMDAGQ | .NET 10.0 | Ascii                | 1024       | 512          |  9,842.2 ns | 196.62 ns | 201.91 ns | 4.0588 |   34056 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | Ascii                | 1024       | 512          |    376.1 ns |   4.77 ns |   4.46 ns | 0.1249 |    1048 B |
| 'old truncate via EnumerateRunesToChars'     | Job-OOTPKI | .NET 8.0  | Ascii                | 1024       | 512          | 11,843.6 ns | 178.52 ns | 166.99 ns | 4.0588 |   34056 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | Ascii                | 1024       | 512          |    423.3 ns |   7.58 ns |   6.72 ns | 0.1249 |    1048 B |
| 'old truncate via EnumerateRunesToChars'     | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 128        | 512          |  2,749.5 ns |  53.15 ns |  49.72 ns | 1.0490 |    8776 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 128        | 512          |    133.1 ns |   2.38 ns |   2.11 ns | 0.0410 |     344 B |
| 'old truncate via EnumerateRunesToChars'     | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 128        | 512          |  3,315.5 ns |  61.21 ns |  57.26 ns | 1.0490 |    8776 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 128        | 512          |    126.6 ns |   1.03 ns |   0.91 ns | 0.0410 |     344 B |
| 'old truncate via EnumerateRunesToChars'     | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 1024       | 512          | 10,957.8 ns | 157.33 ns | 139.47 ns | 4.0894 |   34312 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 1024       | 512          |    453.9 ns |   8.40 ns |   7.86 ns | 0.1554 |    1304 B |
| 'old truncate via EnumerateRunesToChars'     | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 1024       | 512          | 12,647.8 ns | 113.42 ns | 100.54 ns | 4.0894 |   34312 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 1024       | 512          |    449.2 ns |   5.65 ns |   5.01 ns | 0.1554 |    1304 B |
| 'old truncate via EnumerateRunesToChars'     | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 128        | 512          |  3,225.2 ns |  28.44 ns |  25.21 ns | 1.0681 |    8936 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 128        | 512          |    165.1 ns |   3.34 ns |   5.67 ns | 0.0601 |     504 B |
| 'old truncate via EnumerateRunesToChars'     | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 128        | 512          |  3,867.0 ns |  40.20 ns |  37.61 ns | 1.0681 |    8936 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 128        | 512          |    161.0 ns |   3.25 ns |   7.84 ns | 0.0601 |     504 B |
| 'old truncate via EnumerateRunesToChars'     | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 1024       | 512          | 13,368.1 ns | 239.05 ns | 211.91 ns | 4.1656 |   34952 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 1024       | 512          |    584.4 ns |  10.15 ns |   9.00 ns | 0.2317 |    1944 B |
| 'old truncate via EnumerateRunesToChars'     | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 1024       | 512          | 15,874.9 ns | 213.65 ns | 199.85 ns | 4.1656 |   34952 B |
| 'new TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 1024       | 512          |    592.7 ns |  11.31 ns |  10.58 ns | 0.2317 |    1944 B |
  
| Method                                   | Job        | Toolchain | Kind                 | RuneLength | Mean         | Error        | StdDev       | Median       | Gen0    | Allocated |
|----------------------------------------- |----------- |---------- |--------------------- |----------- |-------------:|-------------:|-------------:|-------------:|--------:|----------:|
| 'old Count() over EnumerateRunesToChars' | Job-JMDAGQ | .NET 10.0 | Ascii                | 128        |    619.63 ns |    11.920 ns |    11.707 ns |    620.95 ns |  0.5035 |    4216 B |
| 'new CountRunes(span)'                   | Job-JMDAGQ | .NET 10.0 | Ascii                | 128        |     64.97 ns |     0.332 ns |     0.294 ns |     64.97 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-OOTPKI | .NET 8.0  | Ascii                | 128        |    754.45 ns |    14.865 ns |    31.678 ns |    745.93 ns |  0.5035 |    4216 B |
| 'new CountRunes(span)'                   | Job-OOTPKI | .NET 8.0  | Ascii                | 128        |     66.31 ns |     0.209 ns |     0.196 ns |     66.30 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-JMDAGQ | .NET 10.0 | Ascii                | 8192       | 35,601.63 ns |   693.533 ns | 1,886.808 ns | 34,675.31 ns | 31.3110 |  262264 B |
| 'new CountRunes(span)'                   | Job-JMDAGQ | .NET 10.0 | Ascii                | 8192       |  3,909.67 ns |    23.210 ns |    21.711 ns |  3,901.30 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-OOTPKI | .NET 8.0  | Ascii                | 8192       | 46,619.68 ns |   920.902 ns | 1,752.112 ns | 46,408.94 ns | 31.3110 |  262264 B |
| 'new CountRunes(span)'                   | Job-OOTPKI | .NET 8.0  | Ascii                | 8192       |  3,924.67 ns |    15.360 ns |    14.368 ns |  3,921.15 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 128        |    649.58 ns |    11.920 ns |    11.150 ns |    651.50 ns |  0.5035 |    4216 B |
| 'new CountRunes(span)'                   | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 128        |     74.12 ns |     1.125 ns |     0.939 ns |     74.00 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 128        |    745.95 ns |    14.848 ns |    14.583 ns |    749.10 ns |  0.5035 |    4216 B |
| 'new CountRunes(span)'                   | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 128        |     81.77 ns |     0.479 ns |     0.374 ns |     81.83 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 8192       | 38,241.41 ns |   592.292 ns |   462.423 ns | 38,313.08 ns | 31.3110 |  262264 B |
| 'new CountRunes(span)'                   | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 8192       |  4,373.03 ns |    11.984 ns |    10.623 ns |  4,372.00 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 8192       | 45,407.27 ns |   876.842 ns | 1,140.142 ns | 45,305.08 ns | 31.3110 |  262264 B |
| 'new CountRunes(span)'                   | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 8192       |  4,841.84 ns |    13.875 ns |    12.300 ns |  4,844.33 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 128        |    715.61 ns |    13.934 ns |    12.352 ns |    718.27 ns |  0.5035 |    4216 B |
| 'new CountRunes(span)'                   | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 128        |     92.80 ns |     0.554 ns |     0.492 ns |     92.63 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 128        |    814.53 ns |    16.191 ns |    31.579 ns |    796.44 ns |  0.5035 |    4216 B |
| 'new CountRunes(span)'                   | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 128        |     97.37 ns |     0.291 ns |     0.258 ns |     97.33 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 8192       | 43,324.08 ns |   835.526 ns |   781.551 ns | 43,488.72 ns | 31.3110 |  262264 B |
| 'new CountRunes(span)'                   | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 8192       |  5,585.16 ns |    25.642 ns |    21.412 ns |  5,584.12 ns |       - |         - |
| 'old Count() over EnumerateRunesToChars' | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 8192       | 50,161.87 ns | 1,000.371 ns | 1,466.331 ns | 50,378.08 ns | 31.3110 |  262264 B |
| 'new CountRunes(span)'                   | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 8192       |  5,831.10 ns |    30.611 ns |    27.135 ns |  5,827.66 ns |       - |         - |

Added SIMD search and early bailout:

| Method                                        | Job        | Toolchain | Kind                 | RuneLength | MaxRuneCount | Mean      | Error     | StdDev    | Median    | Ratio | RatioSD | Gen0   | Allocated | Alloc Ratio |
|---------------------------------------------- |----------- |---------- |--------------------- |----------- |------------- |----------:|----------:|----------:|----------:|------:|--------:|-------:|----------:|------------:|
| 'prev TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | Ascii                | 128        | 512          | 117.83 ns |  3.247 ns |  9.472 ns | 118.01 ns |  1.01 |    0.11 | 0.0334 |     280 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-JMDAGQ | .NET 10.0 | Ascii                | 128        | 512          |  20.53 ns |  1.192 ns |  3.458 ns |  21.20 ns |  0.18 |    0.03 | 0.0335 |     280 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | Ascii                | 128        | 512          | 113.01 ns |  2.309 ns |  5.836 ns | 112.96 ns |  1.00 |    0.07 | 0.0334 |     280 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-OOTPKI | .NET 8.0  | Ascii                | 128        | 512          |  17.22 ns |  0.272 ns |  0.302 ns |  17.14 ns |  0.15 |    0.01 | 0.0335 |     280 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | Ascii                | 1024       | 512          | 356.61 ns |  4.256 ns |  3.772 ns | 356.42 ns |  1.00 |    0.01 | 0.1249 |    1048 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-JMDAGQ | .NET 10.0 | Ascii                | 1024       | 512          |  58.19 ns |  1.418 ns |  4.024 ns |  57.35 ns |  0.16 |    0.01 | 0.1253 |    1048 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | Ascii                | 1024       | 512          | 447.18 ns |  8.996 ns | 23.540 ns | 454.22 ns |  1.00 |    0.08 | 0.1249 |    1048 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-OOTPKI | .NET 8.0  | Ascii                | 1024       | 512          |  79.19 ns |  5.037 ns | 14.853 ns |  77.34 ns |  0.18 |    0.03 | 0.1253 |    1048 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 128        | 512          | 128.13 ns |  3.666 ns | 10.808 ns | 126.82 ns |  1.01 |    0.12 | 0.0410 |     344 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 128        | 512          |  16.93 ns |  0.387 ns |  0.791 ns |  16.80 ns |  0.13 |    0.01 | 0.0411 |     344 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 128        | 512          | 121.79 ns |  2.492 ns |  5.523 ns | 120.67 ns |  1.00 |    0.06 | 0.0410 |     344 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 128        | 512          |  18.89 ns |  0.391 ns |  0.858 ns |  18.66 ns |  0.16 |    0.01 | 0.0411 |     344 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 1024       | 512          | 436.43 ns |  8.777 ns | 10.449 ns | 439.27 ns |  1.00 |    0.03 | 0.1554 |    1304 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 1024       | 512          | 376.56 ns |  7.524 ns | 14.675 ns | 374.07 ns |  0.86 |    0.04 | 0.1554 |    1304 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 1024       | 512          | 422.26 ns |  5.899 ns |  5.229 ns | 423.12 ns |  1.00 |    0.02 | 0.1554 |    1304 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 1024       | 512          | 393.42 ns |  5.567 ns |  4.649 ns | 392.34 ns |  0.93 |    0.02 | 0.1554 |    1304 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 128        | 512          | 150.52 ns |  2.176 ns |  1.817 ns | 150.22 ns |  1.00 |    0.02 | 0.0601 |     504 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 128        | 512          |  21.68 ns |  0.482 ns |  1.296 ns |  20.99 ns |  0.14 |    0.01 | 0.0602 |     504 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 128        | 512          | 154.18 ns |  3.114 ns |  6.361 ns | 155.56 ns |  1.00 |    0.06 | 0.0601 |     504 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 128        | 512          |  23.86 ns |  0.502 ns |  1.390 ns |  23.22 ns |  0.16 |    0.01 | 0.0602 |     504 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 1024       | 512          | 556.12 ns | 10.914 ns | 11.208 ns | 556.83 ns |  1.00 |    0.03 | 0.2317 |    1944 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 1024       | 512          | 492.66 ns |  9.859 ns |  9.222 ns | 490.17 ns |  0.89 |    0.02 | 0.2317 |    1944 B |        1.00 |
|                                               |            |           |                      |            |              |           |           |           |           |       |         |        |           |             |
| 'prev TruncateStringToRuneCount().ToString()' | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 1024       | 512          | 545.49 ns |  8.397 ns |  7.444 ns | 544.45 ns |  1.00 |    0.02 | 0.2317 |    1944 B |        1.00 |
| 'new TruncateStringToRuneCount().ToString()'  | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 1024       | 512          | 557.46 ns | 10.655 ns | 10.464 ns | 556.98 ns |  1.02 |    0.02 | 0.2317 |    1944 B |        1.00 |

| Method                  | Job        | Toolchain | Kind                 | RuneLength | Mean         | Error      | StdDev     | Ratio | RatioSD | Allocated | Alloc Ratio |
|------------------------ |----------- |---------- |--------------------- |----------- |-------------:|-----------:|-----------:|------:|--------:|----------:|------------:|
| 'prev CountRunes(span)' | Job-JMDAGQ | .NET 10.0 | Ascii                | 128        |    70.216 ns |  1.3830 ns |  2.3484 ns |  1.00 |    0.05 |         - |          NA |
| 'new CountRunes(span)'  | Job-JMDAGQ | .NET 10.0 | Ascii                | 128        |     4.904 ns |  0.0343 ns |  0.0321 ns |  0.07 |    0.00 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-OOTPKI | .NET 8.0  | Ascii                | 128        |    68.014 ns |  1.3878 ns |  1.4849 ns |  1.00 |    0.03 |         - |          NA |
| 'new CountRunes(span)'  | Job-OOTPKI | .NET 8.0  | Ascii                | 128        |     6.129 ns |  0.0646 ns |  0.0605 ns |  0.09 |    0.00 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-JMDAGQ | .NET 10.0 | Ascii                | 8192       | 3,884.546 ns | 31.2551 ns | 29.2360 ns |  1.00 |    0.01 |         - |          NA |
| 'new CountRunes(span)'  | Job-JMDAGQ | .NET 10.0 | Ascii                | 8192       |   254.202 ns |  3.0534 ns |  2.5497 ns |  0.07 |    0.00 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-OOTPKI | .NET 8.0  | Ascii                | 8192       | 4,064.459 ns | 42.3702 ns | 37.5601 ns |  1.00 |    0.01 |         - |          NA |
| 'new CountRunes(span)'  | Job-OOTPKI | .NET 8.0  | Ascii                | 8192       |   254.313 ns |  4.4622 ns |  4.1739 ns |  0.06 |    0.00 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 128        |    74.100 ns |  0.8221 ns |  0.7690 ns |  1.00 |    0.01 |         - |          NA |
| 'new CountRunes(span)'  | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 128        |    77.034 ns |  1.5091 ns |  1.8533 ns |  1.04 |    0.03 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 128        |    85.025 ns |  1.7147 ns |  2.4037 ns |  1.00 |    0.04 |         - |          NA |
| 'new CountRunes(span)'  | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 128        |    85.963 ns |  0.7137 ns |  0.5960 ns |  1.01 |    0.03 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 8192       | 4,396.650 ns | 58.2950 ns | 54.5292 ns |  1.00 |    0.02 |         - |          NA |
| 'new CountRunes(span)'  | Job-JMDAGQ | .NET 10.0 | Mixed(...)gates [21] | 8192       | 4,434.144 ns | 22.8985 ns | 21.4193 ns |  1.01 |    0.01 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 8192       | 4,904.013 ns | 41.1963 ns | 34.4008 ns |  1.00 |    0.01 |         - |          NA |
| 'new CountRunes(span)'  | Job-OOTPKI | .NET 8.0  | Mixed(...)gates [21] | 8192       | 4,822.738 ns | 31.6198 ns | 29.5771 ns |  0.98 |    0.01 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 128        |    94.261 ns |  0.9110 ns |  0.7607 ns |  1.00 |    0.01 |         - |          NA |
| 'new CountRunes(span)'  | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 128        |    99.823 ns |  1.4972 ns |  1.4005 ns |  1.06 |    0.02 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 128        |    96.153 ns |  0.3468 ns |  0.3074 ns |  1.00 |    0.00 |         - |          NA |
| 'new CountRunes(span)'  | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 128        |   101.584 ns |  1.0409 ns |  0.8692 ns |  1.06 |    0.01 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 8192       | 5,546.374 ns | 25.6919 ns | 21.4539 ns |  1.00 |    0.01 |         - |          NA |
| 'new CountRunes(span)'  | Job-JMDAGQ | .NET 10.0 | MostlySurrogates     | 8192       | 5,574.675 ns | 79.9365 ns | 70.8616 ns |  1.01 |    0.01 |         - |          NA |
|                         |            |           |                      |            |              |            |            |       |         |           |             |
| 'prev CountRunes(span)' | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 8192       | 5,686.914 ns | 37.1821 ns | 31.0488 ns |  1.00 |    0.01 |         - |          NA |
| 'new CountRunes(span)'  | Job-OOTPKI | .NET 8.0  | MostlySurrogates     | 8192       | 5,720.962 ns | 76.7373 ns | 71.7802 ns |  1.01 |    0.01 |         - |          NA |

The synthetic benchmarks demonstrate an order-of-magnitude improvement for the methods themselves (two orders of magnitude for BMP-only strings, and in theory with AVX-512). Counting and span truncation themselves are also allocation-free now.
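As an illustration of the BMP-only fast path and early bailout, here is a hedged sketch. It assumes .NET 8 or later, where `MemoryExtensions.IndexOfAnyInRange` is available and vectorized by the runtime. The class name `BmpFastPathSketch` and the method bodies are hypothetical, and the PR's actual implementation may differ.

```csharp
using System;

// Hypothetical illustration, not the PR's implementation.
public static class BmpFastPathSketch
{
    public static int CountRunes(ReadOnlySpan<char> text)
    {
        // Vectorized search for the first surrogate code unit (U+D800..U+DFFF).
        int first = text.IndexOfAnyInRange('\uD800', '\uDFFF');
        if (first < 0)
            return text.Length; // BMP-only: one char per rune

        // The prefix before the first surrogate is one char per rune;
        // the remainder is counted with a scalar loop.
        int count = first;
        for (int i = first; i < text.Length; i++)
        {
            if (char.IsHighSurrogate(text[i])
                && i + 1 < text.Length
                && char.IsLowSurrogate(text[i + 1]))
            {
                i++; // skip the low half of the pair
            }
            count++;
        }
        return count;
    }

    public static ReadOnlySpan<char> Truncate(ReadOnlySpan<char> text, int maxRuneCount)
    {
        if (maxRuneCount <= 0)
            return ReadOnlySpan<char>.Empty;

        // Early bailout: text with at most maxRuneCount chars
        // cannot contain more than maxRuneCount runes.
        if (text.Length <= maxRuneCount)
            return text;

        // If the first maxRuneCount chars contain no surrogate,
        // they are exactly maxRuneCount runes.
        ReadOnlySpan<char> window = text.Slice(0, maxRuneCount);
        int first = window.IndexOfAnyInRange('\uD800', '\uDFFF');
        if (first < 0)
            return window;

        // Scalar fallback starting at the first surrogate.
        int runes = first;
        int pos = first;
        while (pos < text.Length && runes < maxRuneCount)
        {
            bool isPair = char.IsHighSurrogate(text[pos])
                && pos + 1 < text.Length
                && char.IsLowSurrogate(text[pos + 1]);
            pos += isPair ? 2 : 1;
            runes++;
        }
        return text.Slice(0, pos);
    }
}
```

Under these assumptions the sketch behaves the way the tables suggest: ASCII input and input shorter than `MaxRuneCount` take the fast path, while long strings containing surrogates fall back to the scalar loop and see little change.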

Practical benchmarks (3 int columns and 1 char100 column, real I/O against an fb3 server on localhost with NVMe storage)

Master:

| Method                              | Rows   | Mean         | Error      | StdDev      | Median       | Gen0        | Gen1        | Allocated     |
|------------------------------------ |------- |-------------:|-----------:|------------:|-------------:|------------:|------------:|--------------:|
| SelectAndMap_Main_ReusedBufferAsync | 10     |     1.796 ms |  0.0649 ms |   0.1904 ms |     1.784 ms |           - |           - |     457.27 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100    |     8.326 ms |  0.5724 ms |   1.5955 ms |     7.472 ms |           - |           - |    4502.13 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000   |    31.037 ms |  3.7512 ms |  10.8231 ms |    24.973 ms |   5000.0000 |   1000.0000 |   44985.72 KB |
| SelectAndMap_Main_ReusedBufferAsync | 10000  |   337.278 ms | 16.5461 ms |  48.7865 ms |   334.254 ms |  55000.0000 |  10000.0000 |  449543.01 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100000 | 3,467.848 ms | 98.9785 ms | 288.7249 ms | 3,435.619 ms | 550000.0000 | 114000.0000 | 4494685.62 KB |

New:

| Method                              | Rows   | Mean           | Error        | StdDev        | Gen0       | Gen1       | Allocated    |
|------------------------------------ |------- |---------------:|-------------:|--------------:|-----------:|-----------:|-------------:|
| SelectAndMap_Main_ReusedBufferAsync | 10     |       650.8 us |     19.56 us |      55.16 us |          - |          - |     60.55 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100    |     2,833.8 us |     63.37 us |     178.72 us |          - |          - |    527.99 KB |
| SelectAndMap_Main_ReusedBufferAsync | 1000   |    24,552.9 us |  4,380.22 us |  12,777.32 us |          - |          - |   5143.66 KB |
| SelectAndMap_Main_ReusedBufferAsync | 10000  |   137,303.6 us |  7,622.68 us |  22,235.71 us |  6000.0000 |  1000.0000 |  51342.63 KB |
| SelectAndMap_Main_ReusedBufferAsync | 100000 | 3,369,344.7 us | 95,721.74 us | 282,237.74 us | 62000.0000 | 10000.0000 | 513447.63 KB |

In practice the differences are still noticeable: up to 3x in speed, tenfold in memory volume, and a ~100x reduction in allocation events can be observed (from profiling a real run, not the benchmark).
