Add Dream: OCaml web framework (first OCaml entry!) #25

Open
BennyFranciscus wants to merge 8 commits into MDA2AV:main from BennyFranciscus:add-dream

Conversation

@BennyFranciscus
Collaborator

Dream — OCaml Web Framework

Adds Dream as the first OCaml entry in HttpArena! 🐫

Why Dream?

Dream is a tidy, feature-complete web framework for OCaml with ~1,800 stars. It's built on httpaf/h2 with Lwt async I/O, and compiles to efficient native machine code via the OCaml 5 compiler.

What makes it interesting for benchmarks:

  • Native compiled: OCaml produces fast native binaries with no VM or interpreter, only a small runtime with a garbage collector
  • Functional approach — handlers are just functions, middleware composes naturally
  • Lwt cooperative concurrency — lightweight green threads without the overhead of OS threads
  • httpaf under the hood — the same HTTP parser used by several high-performance OCaml projects

Implementation

  • All standard HttpArena endpoints: /pipeline, /baseline11, /baseline2, /json, /compression, /db, /upload, /static/{filename}
  • Uses Yojson for JSON serialization
  • Uses sqlite3-ocaml bindings for the /db endpoint
  • Static files and dataset pre-loaded into memory at startup
  • Multi-stage Docker build with OCaml 5.2 and --release optimizations

New language!

This is the first OCaml framework in HttpArena. OCaml is a really cool language that sits at the intersection of functional programming and systems performance — it'd be awesome to see how it stacks up against the other compiled language entries.

cc @aantron @dinosaure @yawaramin — thought it'd be cool to see how Dream stacks up in HttpArena! Dream's API is genuinely one of the cleanest web framework interfaces I've seen in any language.

The dune file was missing the lwt_ppx preprocessor, so `let%lwt` was
treated as an uninterpreted extension. Added `(preprocess (pps lwt_ppx))`
to the executable stanza.
@BennyFranciscus
Collaborator Author

Build fix: the dune file was missing the lwt_ppx preprocessor — OCaml treated let%lwt as an uninterpreted extension instead of Lwt's monadic binding syntax.

Added (preprocess (pps lwt_ppx)) to the executable stanza. Should compile clean now. 🤞
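
For reference, a stanza with that fix might look roughly like the following. The executable name `server` and the `libraries` list are assumptions for illustration (the thread only mentions `server.ml` and the `(preprocess (pps lwt_ppx))` line), so treat this as a sketch rather than the actual file.

```dune
; Sketch of a dune executable stanza; `server` and the libraries list are assumed.
(executable
 (name server)
 (libraries dream)
 ; Without this field, let%lwt is left as an uninterpreted extension node.
 (preprocess (pps lwt_ppx)))
```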

OCaml 5.2's Float.round takes float -> float (no precision arg).
The second definition using round2 was correct but shadowed by the
broken first one. Removed the dead code.
@BennyFranciscus
Collaborator Author

Build fix: there was a duplicate item_to_json definition — the first one used Float.round ~-.2 which doesn't exist in OCaml 5.2 (Float.round is just float -> float, rounds to nearest integer). The second definition using a manual round2 helper was correct but never got reached because the compiler choked on the first one.

Removed the dead duplicate. Should compile clean now 🤞
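
The `round2` helper itself isn't shown in this thread. A minimal sketch of such a helper, assuming it rounds to two decimal places by scaling, could look like this:

```ocaml
(* Hypothetical round2: the PR's actual helper is not shown in this thread.
   Float.round : float -> float rounds to the nearest integer (ties away
   from zero), so scale by 100, round, then scale back. *)
let round2 (x : float) : float =
  Float.round (x *. 100.) /. 100.

let () =
  (* Print a sample value with two decimal places. *)
  Printf.printf "%.2f\n" (round2 3.14159)
```

Scaling keeps the helper to a single stdlib call. Values that sit exactly on a binary-floating-point tie may round differently from what a decimal reading suggests.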

Dream doesn't have built-in compression middleware, so we need
to handle it ourselves. Added camlzip dependency and pre-compress
the large JSON response at startup. Serves gzipped response when
client sends Accept-Encoding: gzip.

Also fixed unused variable warning (tags -> _tags in db query).
@BennyFranciscus
Collaborator Author

CI fix: compression endpoint was missing gzip support!

Dream doesn't include built-in compression middleware, so the /compression endpoint was serving raw uncompressed JSON (~1.1MB). The validator checks for Content-Encoding: gzip and that the response is actually compressed.

Fix:

  • Added camlzip dependency for gzip support
  • Pre-compress the large JSON response at startup using Gzip module (level 6)
  • Check Accept-Encoding: gzip header and serve the pre-compressed response with Content-Encoding: gzip
  • Added zlib1g-dev (build) and zlib1g (runtime) to Dockerfile
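
The header check in the third bullet can be sketched as a small pure function. This is an illustration, not the PR's code: `accepts_gzip` is a hypothetical name, and the parsing is deliberately simple (it ignores the `*` wildcard and only treats `q=0` style parameters as a refusal).

```ocaml
(* Hypothetical helper, not taken from the PR: decide whether an
   Accept-Encoding header value permits a gzip response. *)
let accepts_gzip (header : string option) : bool =
  let norm s = String.lowercase_ascii (String.trim s) in
  (* A quality value of zero means "do not send this coding". *)
  let refused param =
    List.mem (norm param) [ "q=0"; "q=0.0"; "q=0.00"; "q=0.000" ]
  in
  (* Each item looks like "gzip" or "gzip;q=0.8". *)
  let item_allows_gzip item =
    match String.split_on_char ';' item with
    | coding :: params ->
      norm coding = "gzip" && not (List.exists refused params)
    | [] -> false
  in
  match header with
  | None -> false
  | Some value ->
    List.exists item_allows_gzip (String.split_on_char ',' value)
```

In a Dream handler the input would presumably come from `Dream.header request "Accept-Encoding"`, which returns a `string option`; when the function returns `true`, the handler would serve the pre-compressed body with `Content-Encoding: gzip`.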

Also fixed the unused variable tags warning from the db query function.

Waiting on CI to re-run — will need workflow approval since this is a first-time contributor fork.

@github-actions

Benchmark Results

Framework: dream | Profile: all profiles

Best result per profile (all runs: cpu=unlimited; p = pipeline depth, r = requests per connection)

Profile       Conns  p   r   Best req/s  CPU     Mem
baseline      512    1   0   32627       100.4%  68.7MiB
baseline      4096   1   0   28632       100.4%  222.1MiB
baseline      16384  1   0   26414       99.8%   636.6MiB
pipelined     512    16  0   117598      100.3%  33.9MiB
pipelined     4096   16  0   116034      100.4%  61.2MiB
pipelined     16384  16  0   98078       100.5%  65.8MiB
limited-conn  512    1   10  28148       99.6%   142.0MiB
limited-conn  4096   1   10  26823       100.4%  314.1MiB
json          4096   1   0   4788        100.4%  191.7MiB
json          16384  1   0   3851        86.9%   514.5MiB
upload        64     1   0   29          100.4%  977.9MiB
upload        256    1   0   11          98.6%   652.0MiB
upload        512    1   0   15          98.3%   614.8MiB
compression   4096   1   0   1921        96.8%   128.5MiB
compression   16384  1   0   1668        86.0%   235.1MiB
noisy         512    1   0   22587       100.5%  56.3MiB
noisy         4096   1   0   19256       100.6%  192.0MiB
noisy         16384  1   0   17723       100.5%  617.7MiB
mixed         4096   1   5   2630        96.5%   880.3MiB
mixed         16384  1   5   3463        100.4%  3.2GiB
Full log
  Bandwidth:  1.79MB/s
  Status codes: 2xx=84867, 3xx=0, 4xx=42229, 5xx=0
  Latency samples: 127096 / 127096 responses (100.0%)
  Per-template: 42170,42697,42229,0,0
  Per-template-ok: 42170,42697,0,0,0

  WARNING: 42229/127096 responses (33.2%) had unexpected status (expected 2xx)
  CPU: 96.6% | Mem: 664.6MiB

=== Best: 17723 req/s (CPU: 100.5%, Mem: 617.7MiB) ===
  Input BW: 1.79MB/s (avg template: 106 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-dream
httparena-bench-dream

==============================================
=== dream / mixed / 4096c (p=1, r=5, cpu=unlimited) ===
==============================================
05d1fee5b2b70c04ac58f6319ffad59ba86b21f752621d458cf627440dbcfbce
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    1.22s    1.16s    2.24s    2.74s    3.34s

  14793 requests in 5.00s, 13151 responses
  Throughput: 2.63K req/s
  Bandwidth:  108.95MB/s
  Status codes: 2xx=13151, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 13151 / 13151 responses (100.0%)
  Reconnects: 99
  Per-template: 1517,1512,1510,911,908,1490,1498,814,1498,1493
  Per-template-ok: 1517,1512,1510,911,908,1490,1498,814,1498,1493
  CPU: 96.5% | Mem: 880.3MiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    1.51s    1.06s    2.89s    3.39s    3.45s

  9000 requests in 5.00s, 9000 responses
  Throughput: 1.80K req/s
  Bandwidth:  59.18MB/s
  Status codes: 2xx=9000, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 9000 / 9000 responses (100.0%)
  Per-template: 819,818,814,1115,1111,817,811,1071,813,811
  Per-template-ok: 819,818,814,1115,1111,817,811,1071,813,811
  CPU: 99.9% | Mem: 933.7MiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     4096 (64/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    1.51s    1.00s    2.90s    3.39s    3.44s

  9013 requests in 5.00s, 9013 responses
  Throughput: 1.80K req/s
  Bandwidth:  59.48MB/s
  Status codes: 2xx=9013, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 9013 / 9013 responses (100.0%)
  Reconnects: 1
  Per-template: 820,820,822,1102,1092,820,823,1082,815,817
  Per-template-ok: 820,820,822,1102,1092,820,823,1082,815,817
  CPU: 100.5% | Mem: 979.4MiB

=== Best: 2630 req/s (CPU: 96.5%, Mem: 880.3MiB) ===
  Input BW: 263.17MB/s (avg template: 104924 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-dream
httparena-bench-dream

==============================================
=== dream / mixed / 16384c (p=1, r=5, cpu=unlimited) ===
==============================================
2b9664096517b47ca7b87f7830a7f61cde87e87ffe5adb0195aeb34ebe5cee5f
[wait] Waiting for server...
[ready] Server is up

[run 1/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     16384 (256/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    1.58s    1.48s    2.62s    3.10s    3.11s

  25914 requests in 5.00s, 9494 responses
  Throughput: 1.90K req/s
  Bandwidth:  99.67MB/s
  Status codes: 2xx=9494, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 9494 / 9494 responses (100.0%)
  Reconnects: 36
  Errors: connect 0, read 34, timeout 0
  Per-template: 1356,1356,1350,0,0,1344,1348,0,1385,1355
  Per-template-ok: 1356,1356,1350,0,0,1344,1348,0,1385,1355
  CPU: 88.0% | Mem: 372.6MiB

[run 2/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     16384 (256/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    1.45s    1.07s    2.93s    3.87s    3.88s

  33839 requests in 5.00s, 17318 responses
  Throughput: 3.46K req/s
  Bandwidth:  115.37MB/s
  Status codes: 2xx=17318, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 17318 / 17318 responses (100.0%)
  Reconnects: 137
  Errors: connect 0, read 137, timeout 0
  Per-template: 1530,1522,1519,2290,2320,1521,1525,1919,1657,1515
  Per-template-ok: 1530,1522,1519,2290,2320,1521,1525,1919,1657,1515
  CPU: 100.4% | Mem: 3.2GiB

[run 3/3]
gcannon — io_uring HTTP load generator
  Target:    localhost:8080/
  Threads:   64
  Conns:     16384 (256/thread)
  Pipeline:  1
  Req/conn:  5
  Templates: 10
  Expected:  200
  Duration:  5s


  Thread Stats   Avg      p50      p90      p99    p99.9
    Latency    1.79s    1.48s    3.72s    3.91s    3.92s

  26377 requests in 5.00s, 9883 responses
  Throughput: 1.98K req/s
  Bandwidth:  111.77MB/s
  Status codes: 2xx=9883, 3xx=0, 4xx=0, 5xx=0
  Latency samples: 9883 / 9883 responses (100.0%)
  Reconnects: 366
  Errors: connect 0, read 366, timeout 0
  Per-template: 1371,1364,1351,0,0,1356,1354,0,1719,1368
  Per-template-ok: 1371,1364,1351,0,0,1356,1354,0,1719,1368
  CPU: 100.4% | Mem: 3.5GiB

=== Best: 3463 req/s (CPU: 100.4%, Mem: 3.2GiB) ===
  Input BW: 346.52MB/s (avg template: 104924 bytes)
[dry-run] Results not saved (use --save to persist)
httparena-bench-dream
httparena-bench-dream
[skip] dream does not subscribe to baseline-h2
[skip] dream does not subscribe to static-h2
[skip] dream does not subscribe to baseline-h3
[skip] dream does not subscribe to static-h3
[skip] dream does not subscribe to unary-grpc
[skip] dream does not subscribe to unary-grpc-tls
[skip] dream does not subscribe to echo-ws
[restore] Restoring CPU governor to powersave...

@MDA2AV
Owner

MDA2AV commented Mar 15, 2026

@BennyFranciscus single threaded

@BennyFranciscus
Collaborator Author

Yep, Dream is single-threaded — OCaml's Lwt is a cooperative concurrency model on one core. The 32K baseline is actually solid for single-threaded OCaml.

Dream does support running behind a reverse proxy for multi-core, but there's no built-in multi-process mode. I could try a fork-based approach (spawn N processes on SO_REUSEPORT), but OCaml's Unix.fork + Lwt can be tricky to get right.

Happy to try it if you think it's worth adding multi-core support, or we can leave it as a showcase of single-threaded perf. Let me know!

@MDA2AV
Owner

MDA2AV commented Mar 15, 2026

Yep, Dream is single-threaded — OCaml's Lwt is a cooperative concurrency model on one core. The 32K baseline is actually solid for single-threaded OCaml.

Dream does support running behind a reverse proxy for multi-core, but there's no built-in multi-process mode. I could try a fork-based approach (spawn N processes on SO_REUSEPORT), but OCaml's Unix.fork + Lwt can be tricky to get right.

Happy to try it if you think it's worth adding multi-core support, or we can leave it as a showcase of single-threaded perf. Let me know!

is single threaded the intended way to use it? Don't think so, check its docs

Dream is single-threaded (OCaml's Lwt cooperative concurrency), so
multi-core requires multiple processes.

Approach:
- Unix.fork() spawns one worker per CPU core
- Each child runs its own Dream server on port 8080
- SO_REUSEPORT is set via LD_PRELOAD shim (libreuseport.so) since
  Dream/Lwt_io doesn't expose SO_REUSEPORT on the listening socket
- Kernel distributes incoming connections across workers

The LD_PRELOAD shim intercepts bind() to set SO_REUSEPORT before the
real bind, allowing multiple processes to share the same port.

Falls back to single-process when only 1 CPU is available.
@BennyFranciscus
Collaborator Author

Good call — checked Dream's docs and source. Dream doesn't have built-in multi-process/multi-core, but it's definitely not intended to run single-threaded in production. The OCaml approach is to run multiple processes sharing the same port.

Implementation:

  • Unix.fork() spawns one worker per CPU core
  • Each child loads data independently and runs its own Dream.run on port 8080
  • SO_REUSEPORT is set via a small LD_PRELOAD shim (libreuseport.so) — Dream's underlying Lwt_io.establish_server only sets SO_REUSEADDR, not SO_REUSEPORT, and doesn't expose a hook for it
  • Kernel distributes incoming connections across workers; each worker has its own listening socket, so workers don't contend on a shared accept queue

The LD_PRELOAD approach is clean — it's a 14-line C file that intercepts bind() to add SO_REUSEPORT before the real bind. Common pattern in the OCaml/Lwt ecosystem where the library doesn't expose socket options directly.
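
The C shim is not shown in this thread. To illustrate what it achieves, here is the same socket setup written directly against OCaml's `Unix` module. This is only a sketch under stated assumptions: `listen_reuseport` and `bound_port` are hypothetical names, the code must be linked with the `unix` library, and it relies on the `Unix.SO_REUSEPORT` option found in recent OCaml releases. Dream itself does not go through this code path, which is why the PR uses an `LD_PRELOAD` shim instead.

```ocaml
(* Hypothetical illustration, not the PR's shim. Requires the `unix` library. *)

(* Open a TCP listening socket with SO_REUSEPORT set before bind, so several
   processes can bind the same port and let the kernel spread connections. *)
let listen_reuseport (port : int) : Unix.file_descr =
  let sock = Unix.socket Unix.PF_INET Unix.SOCK_STREAM 0 in
  Unix.setsockopt sock Unix.SO_REUSEADDR true;
  Unix.setsockopt sock Unix.SO_REUSEPORT true;
  Unix.bind sock (Unix.ADDR_INET (Unix.inet_addr_any, port));
  Unix.listen sock 128;
  sock

(* Report which port a bound socket ended up on (useful when binding port 0). *)
let bound_port (sock : Unix.file_descr) : int =
  match Unix.getsockname sock with
  | Unix.ADDR_INET (_, port) -> port
  | Unix.ADDR_UNIX _ -> invalid_arg "bound_port: not an inet socket"
```

The option has to be set on every socket that shares the port, and before `bind`; that ordering is what the shim enforces by intercepting `bind()`.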

Pushed — waiting on CI to run! Curious to see how the single-threaded 32K baseline scales across cores 🔬

@MDA2AV
Owner

MDA2AV commented Mar 15, 2026

canceled validation - stuck on waiting server.

@BennyFranciscus
Collaborator Author

On it — likely an issue with Dream/Lwt initialization and the fork. OCaml's Lwt can have subtle issues with Unix.fork() if any event loop state leaks into the child.

Let me check the CI logs and push a fix. Might need to restructure the startup order or add a readiness signal.

OCaml's Unix.fork() + Lwt causes server startup to hang — Lwt's
event loop state doesn't survive fork cleanly. Replaced with a
shell script that spawns N independent server processes, each
with its own clean Lwt event loop. SO_REUSEPORT via LD_PRELOAD
still distributes connections across workers.
@BennyFranciscus
Collaborator Author

Fix for the startup hang: OCaml's Unix.fork() and Lwt don't play well together. The children inherit Lwt's internal event loop state across the fork and then deadlock during Dream's server initialization.

New approach: Moved multi-process to a shell wrapper (run.sh) that spawns N independent /server processes. Each one gets a completely fresh Lwt event loop since it's a separate exec. The LD_PRELOAD=libreuseport.so shim still handles SO_REUSEPORT so the kernel distributes connections across workers.

Simpler and more reliable than trying to manage fork semantics in OCaml. Waiting on CI! 🤞

Dream runs a single-threaded Lwt event loop per process.
Fork N workers (one per CPU) with SO_REUSEPORT for multi-core scaling.
Replaces shell wrapper approach for more reliable process management.
@BennyFranciscus
Collaborator Author

good call — pushed a fix. Dream runs a single-threaded Lwt event loop, so the right approach is forking N workers (one per CPU core). switched to in-process Unix.fork with the SO_REUSEPORT shim so each worker binds the same port. OCaml's Unix.fork is a direct syscall wrapper so it's way more reliable than the shell wrapper approach that was getting stuck. should properly use all cores now!

Dream/Lwt's libev event loop state doesn't survive Unix.fork() —
children accept TCP connections but never process them (event loop
corrupted). Fixed by making server.ml purely single-process and
delegating multi-core to run.sh (N independent /server processes
with SO_REUSEPORT via LD_PRELOAD).

Also removed deprecated ~adjust_terminal parameter.
@BennyFranciscus
Collaborator Author

Found the root cause of the startup hang! 🐛

OCaml's Lwt uses libev under the hood, and libev's event loop state doesn't survive Unix.fork(). The forked children would bind the socket and accept TCP connections, but the Lwt event loop was corrupted — so they'd never actually process any requests. That's why CI saw "waiting for server" forever even though the port was listening.

Fix: Removed all fork logic from server.ml — it's now a clean single-process server. Multi-core is handled entirely by run.sh which spawns N independent /server processes (one per CPU core). Each process initializes its own fresh Lwt event loop from scratch, so no state corruption.

Also cleaned up the deprecated ~adjust_terminal parameter.

Tested locally — server starts instantly and responds correctly. Should be good for CI now!
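
The contents of run.sh are not shown in this thread. The launch pattern it describes (N independent processes, each exec'd fresh, with the shim preloaded) can be sketched as follows. This is a hypothetical OCaml rendering rather than the PR's shell script: `spawn_workers`, `wait_all`, and `run_server_workers` are made-up names, the launcher uses no Lwt, it needs the `unix` library, and it uses OCaml 5's `Domain.recommended_domain_count` as a stand-in for the CPU count. The `/server` path and `LD_PRELOAD=libreuseport.so` come from the comments above.

```ocaml
(* Hypothetical launcher, not the PR's run.sh. Requires the `unix` library
   and OCaml 5 (for Domain.recommended_domain_count). *)

(* Start [n] independent copies of [prog], each with [extra_env] prepended
   to the current environment. Returns the child process ids. *)
let spawn_workers ~(prog : string) ~(extra_env : string list) (n : int)
  : int list =
  let env = Array.append (Array.of_list extra_env) (Unix.environment ()) in
  List.init n (fun _ ->
    Unix.create_process_env prog [| prog |] env
      Unix.stdin Unix.stdout Unix.stderr)

(* Block until every worker has exited and return their exit statuses. *)
let wait_all (pids : int list) : Unix.process_status list =
  List.map (fun pid -> snd (Unix.waitpid [] pid)) pids

(* One worker per recommended domain, as an approximation of one per core. *)
let run_server_workers () : Unix.process_status list =
  let n = Domain.recommended_domain_count () in
  spawn_workers ~prog:"/server" ~extra_env:[ "LD_PRELOAD=libreuseport.so" ] n
  |> wait_all
```

Because each worker is created with a fresh exec, no event loop state is shared between processes, which is the property the fix relies on.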

@BennyFranciscus
Collaborator Author

CI failed due to a stale container from the previous cancelled run — The container name "/httparena-validate-dream" is already in use. Not a code issue.

Could you re-run the validation? The fix itself is clean — removed all in-process forking, now using the shell wrapper approach (same as Kemal). Each worker gets a fresh Lwt/libev event loop. 🙏

@dinosaure

Apologies for mentioning this here, but our cooperative has decided to develop its own framework in OCaml (which takes advantage of multi-core processing): vif (for which a tutorial is available here). We have also produced a benchmark proposal here (the results of which are available here). We also decided to propose a benchmark protocol which gave us these results.

In other words, I would be delighted to contribute to a proposal that uses vif and/or httpcats, but Dream is no longer a solution we are considering for OCaml and unikernels.

@BennyFranciscus
Collaborator Author

Oh this is really cool @dinosaure! 🎉 vif looks super interesting — native multicore OCaml with miou (cooperative + preemptive scheduling) is exactly what Dream is missing. The httpcats benchmarks look promising too.

Dream is still worth having here as the most well-known OCaml web framework (~1.6k stars, lots of production users), but I'd love to see vif as a separate entry — it'd be a great way to show what OCaml 5's multicore can really do.

Would you be up for opening a PR to add vif? The setup is pretty straightforward: a Dockerfile, the app implementing a few endpoints (plaintext, json, db queries via SQLite, file upload, compression), and a meta.json. Happy to help if you want a starting point — I can put together a skeleton based on the Dream entry.

The httpcats benchmark protocol is interesting too. Different approach from what HttpArena does (wrk2/bombardier based) but the methodology around latency measurement looks solid.

@MDA2AV
Owner

MDA2AV commented Mar 15, 2026

Apologies for mentioning this here, but our cooperative has decided to develop its own framework in OCaml (which takes advantage of multi-core processing): vif (for which a tutorial is available here). We have also produced a benchmark proposal here (the results of which are available here). We also decided to propose a benchmark protocol which gave us these results.

In other words, I would be delighted to contribute to a proposal that uses vif and/or httpcats, but Dream is no longer a solution we are considering for OCaml and unikernels.

Ah I see, will drop this framework and would be great to have a vif or httpcats :)

We can draft a quick PR and run a benchmark over it with quick results if you prefer.

Repository owner deleted a comment from BennyFranciscus Mar 15, 2026