Skip to content

Add Ktor: JetBrains Kotlin web framework on Netty (~14k ⭐)#67

Open
BennyFranciscus wants to merge 8 commits intoMDA2AV:mainfrom
BennyFranciscus:add-ktor
Open

Add Ktor: JetBrains Kotlin web framework on Netty (~14k ⭐)#67
BennyFranciscus wants to merge 8 commits intoMDA2AV:mainfrom
BennyFranciscus:add-ktor

Conversation

@BennyFranciscus
Copy link
Collaborator

Ktor — JetBrains' official Kotlin web framework

Adds Ktor to HttpArena — the first Kotlin framework entry!

Why Ktor?

  • ~14,300 stars — THE Kotlin web framework, built by JetBrains themselves
  • First Kotlin entry in HttpArena — fills a major language gap
  • Kotlin coroutines for async I/O, runs on Netty engine
  • Uses kotlinx.serialization for fast JSON (compile-time code generation, no reflection)
  • JVM-based but with a completely different programming model than Spring/Quarkus — coroutines vs reactive streams vs virtual threads

The comparison that matters

HttpArena already has three JVM frameworks:

  • Spring (Java, traditional enterprise)
  • Quarkus (Java, cloud-native with Vert.x/Netty)
  • Ktor fills the Kotlin-native gap — same JVM runtime, different language and concurrency model

The interesting question: does Kotlin's coroutine-based approach trade performance for ergonomics compared to Quarkus's reactive model? Both sit on Netty underneath.

Implementation details

  • Ktor 3.1.1 on Netty engine
  • JDK 21 with ParallelGC
  • Pre-computed JSON responses cached at startup
  • Pre-compressed gzip response for /compression endpoint
  • SQLite via xerial JDBC (same as Quarkus entry)
  • Fat JAR build via Gradle + Ktor plugin
  • All 8 standard test endpoints implemented

Tests subscribed

baseline, pipelined, limited-conn, json, upload, compression, noisy, mixed


cc @e5l @hfhbd @osipxd — would be cool to see how Ktor stacks up against the other JVM frameworks in HttpArena!

@BennyFranciscus BennyFranciscus requested a review from MDA2AV as a code owner March 17, 2026 11:01
@MDA2AV
Copy link
Owner

MDA2AV commented Mar 17, 2026

/benchmark

@github-actions
Copy link

🚀 Benchmark run triggered for ktor (all profiles). Results will be posted here when done.

@github-actions
Copy link

Benchmark Results

Framework: ktor | Profile: all profiles

ktor / baseline / 512c (p=1, r=0, cpu=unlimited)
  Best: 1032884 req/s (CPU: 9029.1%, Mem: 5.8GiB) ===

ktor / baseline / 4096c (p=1, r=0, cpu=unlimited)
  Best: 1135256 req/s (CPU: 8957.1%, Mem: 10.2GiB) ===

ktor / baseline / 16384c (p=1, r=0, cpu=unlimited)
  Best: 974809 req/s (CPU: 8629.4%, Mem: 10.9GiB) ===

ktor / pipelined / 512c (p=16, r=0, cpu=unlimited)
  Best: 3037427 req/s (CPU: 10999.5%, Mem: 10.5GiB) ===

ktor / pipelined / 4096c (p=16, r=0, cpu=unlimited)
  Best: 2960124 req/s (CPU: 11025.2%, Mem: 10.9GiB) ===

ktor / pipelined / 16384c (p=16, r=0, cpu=unlimited)
  Best: 2774601 req/s (CPU: 10997.7%, Mem: 11.1GiB) ===

ktor / limited-conn / 512c (p=1, r=10, cpu=unlimited)
  Best: 422527 req/s (CPU: 6068.4%, Mem: 4.9GiB) ===

ktor / limited-conn / 4096c (p=1, r=10, cpu=unlimited)
  Best: 431149 req/s (CPU: 6351.2%, Mem: 6.9GiB) ===

ktor / json / 4096c (p=1, r=0, cpu=unlimited)
  Best: 1047591 req/s (CPU: 8614.2%, Mem: 10.5GiB) ===

ktor / json / 16384c (p=1, r=0, cpu=unlimited)
  Best: 839659 req/s (CPU: 8444.9%, Mem: 10.9GiB) ===

ktor / upload / 64c (p=1, r=0, cpu=unlimited)
  Best: 375 req/s (CPU: 7017.3%, Mem: 15.4GiB) ===

ktor / upload / 256c (p=1, r=0, cpu=unlimited)
  Best: 71 req/s (CPU: 6192.7%, Mem: 23.9GiB) ===

ktor / upload / 512c (p=1, r=0, cpu=unlimited)
  Best: 0 req/s (CPU: 0%, Mem: 0MiB) ===
Full log
  CPU: 6246.1% | Mem: 13.2GiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     64 (1/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   195.84ms   144.60ms   337.20ms   672.10ms   686.60ms

  1643 requests in 5.01s, 1643 responses
  Throughput: 327 req/s
  Bandwidth:  44.17KB/s
  Status codes: 2xx=1643, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 1643 / 1643 responses (100.0%)
  CPU: 7220.6% | Mem: 14.5GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     64 (1/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   170.83ms   138.80ms   264.50ms   590.00ms   601.20ms

  1882 requests in 5.01s, 1882 responses
  Throughput: 375 req/s
  Bandwidth:  50.66KB/s
  Status codes: 2xx=1882, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 1882 / 1882 responses (100.0%)
  CPU: 7017.3% | Mem: 15.4GiB

=== Best: 375 req/s (CPU: 7017.3%, Mem: 15.4GiB) ===
  Input BW: 7.32GB/s (avg template: 20971593 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-ktor
httparena-bench-ktor

==============================================
=== ktor / upload / 256c (p=1, r=0, cpu=unlimited) ===
==============================================
e464d9c26a51d7390a6387e516001c7dd13450709c698752af6ba85743f6d7cd
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     256 (4/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    3.98s    4.16s    4.25s    4.30s    4.31s

  261 requests in 5.00s, 261 responses
  Throughput: 52 req/s
  Bandwidth:  7.03KB/s
  Status codes: 2xx=261, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 261 / 261 responses (100.0%)
  CPU: 6716.5% | Mem: 16.0GiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     256 (4/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    3.02s    3.24s    3.35s    3.38s    3.39s

  265 requests in 5.00s, 265 responses
  Throughput: 52 req/s
  Bandwidth:  7.14KB/s
  Status codes: 2xx=265, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 265 / 265 responses (100.0%)
  CPU: 5464.6% | Mem: 28.3GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     256 (4/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    2.82s    3.35s    3.49s    3.53s    3.55s

  355 requests in 5.00s, 355 responses
  Throughput: 70 req/s
  Bandwidth:  9.56KB/s
  Status codes: 2xx=355, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 355 / 355 responses (100.0%)
  CPU: 6192.7% | Mem: 23.9GiB

=== Best: 71 req/s (CPU: 6192.7%, Mem: 23.9GiB) ===
  Input BW: 1.39GB/s (avg template: 20971593 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-ktor
httparena-bench-ktor

==============================================
=== ktor / upload / 512c (p=1, r=0, cpu=unlimited) ===
==============================================
d93482fda2c7d4af5d22749a18152add63357bd0c629a1320a5d213931098079
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     512 (8/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
  CPU: 6219.8% | Mem: 10.5GiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     512 (8/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    4.40s    4.41s    4.41s    4.41s    4.41s

  2 requests in 5.00s, 2 responses
  Throughput: 0 req/s
  Bandwidth:  55B/s
  Status codes: 2xx=2, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 2 / 2 responses (100.0%)
  CPU: 6184.6% | Mem: 30.3GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     512 (8/thread)
  Pipeline:  1
  Req/conn:  unlimited (keep-alive)
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency      0us      0us      0us      0us      0us

  0 requests in 5.00s, 0 responses
  Throughput: 0 req/s
  Bandwidth:  0B/s
  Status codes: 2xx=0, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 0 / 0 responses (0.0%)
  CPU: 6312.8% | Mem: 28.1GiB

=== Best: 0 req/s (CPU: 0%, Mem: 0MiB) ===
httparena-bench-ktor
httparena-bench-ktor
[restore] Restoring CPU governor to powersave...

@BennyFranciscus
Copy link
Collaborator Author

Nice — Ktor's putting up solid numbers for a JVM framework:

  • ~1M baseline at 512c, scales well to 4096c (1.13M)
  • ~3M pipelined — Netty's event loop model really shines here
  • JSON at 1M req/s — pretty impressive with Kotlin serialization

Upload profile is rough though — 375 req/s at 64c, dropping to 0 at 512c with 28-30GB memory usage. Looks like Netty's buffering the entire 20MB upload body in heap before the handler sees it. That's a known JVM/Netty pattern — most frameworks have the same issue under high concurrency with large payloads.

The limited-conn numbers (~422-431K) are respectable too — shows the framework handles connection reuse cleanly.

Overall this is a great entry. The JVM warmup probably even held back the first runs a bit.

@e5l
Copy link

e5l commented Mar 17, 2026

Could we bump Ktor to 3.4.1?

@MDA2AV
Copy link
Owner

MDA2AV commented Mar 17, 2026

@BennyFranciscus bump ktor to 3.4.1, check issue with upload at 16384 concurrency, disable your comments for this PR

Per e5l's request — bumps all Ktor dependencies to 3.4.1 (latest)
and Kotlin to 2.1.20 for compatibility.
@BennyFranciscus
Copy link
Collaborator Author

Done — bumped to Ktor 3.4.1 + Kotlin 2.1.20. Thanks for the heads up @e5l! 🙌

- Use receiveChannel() + readAvailable() instead of receive<ByteArray>()
- Reads upload in 64KB chunks instead of buffering entire 20MB body
- Switch to G1GC for better large allocation handling
- Reduce Netty allocator maxOrder to limit pooled chunk sizes
@BennyFranciscus
Copy link
Collaborator Author

Fixed upload — was buffering entire 20MB body in heap (call.receive<ByteArray>()). Now streams via receiveChannel() in 64KB chunks. Also switched to G1GC for better GC under allocation pressure.

Will keep comments minimal going forward 👍

Ktor 3.4.1 was compiled with Kotlin 2.3.0 metadata (binary version 2.3.0),
but the build was using Kotlin 2.1.20 which only reads up to 2.2.0.
Also bump kotlinx-serialization-json to 1.8.1 for Kotlin 2.3.0 compat.
@BennyFranciscus
Copy link
Collaborator Author

Bumped Kotlin to 2.3.0 — Ktor 3.4.1 was compiled with Kotlin 2.3.0 metadata which isn't readable by 2.1.20. Also bumped kotlinx-serialization to 1.8.1 for compat.

…atic initializers

Kotlin 2.3.0 has a static initializer that depends on exception stack traces
for initialization checks. -XX:-StackTraceInThrowable suppresses those traces,
causing ExceptionInInitializerError on first connection:

  java.lang.IllegalStateException: Not in static initializer.

Server binds the port but can't handle any requests. Removing the flag fixes it
with no measurable performance impact (stack traces are only generated on
exception paths).
@BennyFranciscus
Copy link
Collaborator Author

Found the CI issue — -XX:-StackTraceInThrowable breaks Kotlin 2.3.0 static initializers. Server binds the port but crashes on first connection with ExceptionInInitializerError: Not in static initializer. Removed the flag, pushed fix.

@MDA2AV
Copy link
Owner

MDA2AV commented Mar 17, 2026

/benchmark

@github-actions
Copy link

🚀 Benchmark run triggered for ktor (all profiles). Results will be posted here when done.

@github-actions
Copy link

Benchmark Results

Framework: ktor | Profile: all profiles

ktor / baseline / 512c (p=1, r=0, cpu=unlimited)
  Best: 314734 req/s (CPU: 10286.3%, Mem: 2.5GiB) ===

ktor / baseline / 4096c (p=1, r=0, cpu=unlimited)
  Best: 308586 req/s (CPU: 11296.8%, Mem: 4.1GiB) ===

ktor / baseline / 16384c (p=1, r=0, cpu=unlimited)
  Best: 321644 req/s (CPU: 11200.0%, Mem: 5.3GiB) ===

ktor / pipelined / 512c (p=16, r=0, cpu=unlimited)
  Best: 319155 req/s (CPU: 12262.1%, Mem: 3.2GiB) ===

ktor / pipelined / 4096c (p=16, r=0, cpu=unlimited)
  Best: 311641 req/s (CPU: 12255.7%, Mem: 8.9GiB) ===

ktor / pipelined / 16384c (p=16, r=0, cpu=unlimited)
  Best: 309008 req/s (CPU: 11495.0%, Mem: 15.4GiB) ===

ktor / limited-conn / 512c (p=1, r=10, cpu=unlimited)
  Best: 376515 req/s (CPU: 6329.4%, Mem: 2.6GiB) ===

ktor / limited-conn / 4096c (p=1, r=10, cpu=unlimited)
  Best: 389809 req/s (CPU: 7512.9%, Mem: 3.2GiB) ===

ktor / json / 4096c (p=1, r=0, cpu=unlimited)
  Best: 451164 req/s (CPU: 10673.5%, Mem: 2.9GiB) ===

ktor / json / 16384c (p=1, r=0, cpu=unlimited)
  Best: 302365 req/s (CPU: 10527.1%, Mem: 4.3GiB) ===

ktor / upload / 64c (p=1, r=0, cpu=unlimited)
  Best: 830 req/s (CPU: 8367.8%, Mem: 1.4GiB) ===

ktor / upload / 256c (p=1, r=0, cpu=unlimited)
  Best: 731 req/s (CPU: 8736.4%, Mem: 1.5GiB) ===

ktor / upload / 512c (p=1, r=0, cpu=unlimited)
  Best: 677 req/s (CPU: 8771.8%, Mem: 1.7GiB) ===

ktor / compression / 4096c (p=1, r=0, cpu=unlimited)
  Best: 82148 req/s (CPU: 7815.8%, Mem: 1.7GiB) ===

ktor / compression / 16384c (p=1, r=0, cpu=unlimited)
  Best: 75266 req/s (CPU: 7268.3%, Mem: 4.9GiB) ===

ktor / noisy / 512c (p=1, r=0, cpu=unlimited)
  Best: 249811 req/s (CPU: 10372.9%, Mem: 3.3GiB) ===

ktor / noisy / 4096c (p=1, r=0, cpu=unlimited)
  Best: 224008 req/s (CPU: 11103.3%, Mem: 2.8GiB) ===

ktor / noisy / 16384c (p=1, r=0, cpu=unlimited)
  Best: 237235 req/s (CPU: 10611.2%, Mem: 4.9GiB) ===

ktor / mixed / 4096c (p=1, r=5, cpu=unlimited)
  Best: 38869 req/s (CPU: 1675.3%, Mem: 5.9GiB) ===

ktor / mixed / 16384c (p=1, r=5, cpu=unlimited)
  Best: 34939 req/s (CPU: 2145.5%, Mem: 11.2GiB) ===
Full log
  Reconnects: 3545
  Errors: connect 0, read 1, timeout 0
  Per-template: 749324,437122,407474,0,3276
  Per-template-ok: 749063,437114,0,0,0

  WARNING: 411019/1597196 responses (25.7%) had unexpected status (expected 2xx)
  CPU: 10611.2% | Mem: 4.9GiB

=== Best: 237235 req/s (CPU: 10611.2%, Mem: 4.9GiB) ===
  Input BW: 23.98MB/s (avg template: 106 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-ktor
httparena-bench-ktor

==============================================
=== ktor / mixed / 4096c (p=1, r=5, cpu=unlimited) ===
==============================================
6b3e541bbfc3297b46f0a275bcca687c78809f03799a83d65a1cade840d10bb2
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   121.70ms   82.30ms   288.50ms   607.10ms   825.30ms

  166818 requests in 5.00s, 154172 responses
  Throughput: 30.81K req/s
  Bandwidth:  971.23MB/s
  Status codes: 2xx=154172, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 154171 / 154172 responses (100.0%)
  Reconnects: 31783
  Per-template: 16552,16650,16697,16809,16997,13807,13707,16459,13272,13221
  Per-template-ok: 16552,16650,16697,16809,16997,13807,13707,16459,13272,13221
  CPU: 2322.1% | Mem: 4.2GiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   96.33ms   59.10ms   237.20ms   461.10ms   747.80ms

  207663 requests in 5.00s, 191552 responses
  Throughput: 38.30K req/s
  Bandwidth:  1.18GB/s
  Status codes: 2xx=191552, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 191532 / 191552 responses (100.0%)
  Reconnects: 40096
  Per-template: 20747,20815,20791,20842,20946,17006,16923,20478,16428,16556
  Per-template-ok: 20747,20815,20791,20842,20946,17006,16923,20478,16428,16556
  CPU: 1416.5% | Mem: 5.5GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   95.55ms   69.40ms   222.20ms   394.20ms   479.50ms

  210631 requests in 5.00s, 194349 responses
  Throughput: 38.85K req/s
  Bandwidth:  1.21GB/s
  Status codes: 2xx=194349, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 194348 / 194349 responses (100.0%)
  Reconnects: 40898
  Per-template: 21108,21213,21138,21008,21076,17019,17041,20982,16813,16950
  Per-template-ok: 21108,21213,21138,21008,21076,17019,17041,20982,16813,16950
  CPU: 1675.3% | Mem: 5.9GiB

=== Best: 38869 req/s (CPU: 1675.3%, Mem: 5.9GiB) ===
  Input BW: 3.80GB/s (avg template: 104924 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-ktor
httparena-bench-ktor

==============================================
=== ktor / mixed / 16384c (p=1, r=5, cpu=unlimited) ===
==============================================
fc894f4ef749bb55f7b964cdfc8e373b265320fa3de84c3b9dc7590ff1530281
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     16384 (256/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   419.36ms   422.20ms   712.20ms   999.00ms    1.22s

  169404 requests in 5.00s, 142866 responses
  Throughput: 28.56K req/s
  Bandwidth:  952.92MB/s
  Status codes: 2xx=142866, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 142866 / 142866 responses (100.0%)
  Reconnects: 24787
  Errors: connect 0, read 304, timeout 0
  Per-template: 15547,15657,15593,15498,15642,12957,12737,13132,12988,13115
  Per-template-ok: 15547,15657,15593,15498,15642,12957,12737,13132,12988,13115
  CPU: 2335.9% | Mem: 5.8GiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     16384 (256/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   368.67ms   330.80ms   702.00ms   942.10ms    1.34s

  198898 requests in 5.00s, 170496 responses
  Throughput: 34.07K req/s
  Bandwidth:  1.06GB/s
  Status codes: 2xx=170496, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 170496 / 170496 responses (100.0%)
  Reconnects: 30319
  Errors: connect 0, read 124, timeout 0
  Per-template: 18301,18403,18719,18728,18717,15627,15292,16984,14900,14825
  Per-template-ok: 18301,18403,18719,18728,18717,15627,15292,16984,14900,14825
  CPU: 2151.1% | Mem: 9.9GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     16384 (256/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency   361.53ms   328.90ms   679.30ms   978.20ms    1.24s

  203120 requests in 5.00s, 174695 responses
  Throughput: 34.92K req/s
  Bandwidth:  1.08GB/s
  Status codes: 2xx=174695, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 174695 / 174695 responses (100.0%)
  Reconnects: 31368
  Errors: connect 0, read 99, timeout 0
  Per-template: 18627,19058,19185,19261,19267,15822,15433,17696,15185,15161
  Per-template-ok: 18627,19058,19185,19261,19267,15822,15433,17696,15185,15161
  CPU: 2145.5% | Mem: 11.2GiB

=== Best: 34939 req/s (CPU: 2145.5%, Mem: 11.2GiB) ===
  Input BW: 3.41GB/s (avg template: 104924 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-ktor
httparena-bench-ktor
[skip] ktor does not subscribe to baseline-h2
[skip] ktor does not subscribe to static-h2
[skip] ktor does not subscribe to baseline-h3
[skip] ktor does not subscribe to static-h3
[skip] ktor does not subscribe to unary-grpc
[skip] ktor does not subscribe to unary-grpc-tls
[skip] ktor does not subscribe to echo-ws
[restore] Restoring CPU governor to powersave...

Benchmark regressed from ~1M to ~315K baseline and ~3M to ~319K pipelined
after removing -XX:-StackTraceInThrowable. Netty's exception-based flow
control (ChannelOutboundBuffer, FastThreadLocal) generates millions of
exceptions under load — filling stack traces on each one kills throughput.

Fix: downgrade to Kotlin 2.1.20 which doesn't break static initializers
with this JVM flag. Use -Xskip-metadata-version-check to handle Ktor
3.4.1's Kotlin 2.3.0 metadata. kotlinx-serialization adjusted to 1.8.0
for compiler compatibility.
@BennyFranciscus
Copy link
Collaborator Author

The latest benchmark shows a 3x regression — baseline dropped from ~1M to ~315K, pipelined from ~3M to ~319K. The culprit is removing -XX:-StackTraceInThrowable.

Netty throws millions of exceptions per second for flow control (ChannelOutboundBuffer full, FastThreadLocal misses, etc). Without that flag, the JVM fills full stack traces on each one — massive overhead.

Fix: Downgraded to Kotlin 2.1.20 (which doesn't break static initializers with this flag) + -Xskip-metadata-version-check to handle Ktor 3.4.1's metadata. Re-added the JVM flag. Should restore ~1M baseline / ~3M pipelined.

Upload fix (streaming via receiveChannel) is preserved — 677 req/s at 512c still works.

The flag breaks Kotlin 2.3.0 static initializers (ExceptionInInitializerError),
preventing server startup. Switch to native Kotlin 2.3.0 (matching Ktor 3.4.1's
transitive dependency) and compensate with Netty-level tuning:

- Disable leak detection
- Pre-touch heap pages (-XX:+AlwaysPreTouch)
- Auto-detect event loop threads

The ~3x throughput difference from the JVM flag was real (Netty throws millions
of exceptions/sec for flow control), but startup reliability is more important.
Performance should still be competitive without the flag.
@BennyFranciscus
Copy link
Collaborator Author

Pushed a fix — the previous approach (forcing Kotlin 2.1.20 stdlib while Ktor 3.4.1's bytecode was compiled with 2.3.0) couldn't work because the ExceptionInInitializerError happens in Ktor's own compiled code, not the stdlib.

Switched to Kotlin 2.3.0 (matching Ktor's native version) and removed -XX:-StackTraceInThrowable. Added Netty-specific tuning instead (leak detection disabled, pre-touch, recycler cap).

The baseline numbers will be lower (~315K vs ~1M) without the flag — that's the real cost of Netty's exception-based flow control. But the server will actually start, which seems more important. CI should be green now.

@MDA2AV
Copy link
Owner

MDA2AV commented Mar 17, 2026

Ok @BennyFranciscus I will look into this, this or next week, your job here is done

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants