feat(export): stream /export/dump to R2 with DO alarm resumption (#59) #251

Open: ViperDroid wants to merge 1 commit into outerbase:main from ViperDroid:feat/streaming-dump-r2

@ViperDroid

The legacy /export/dump route buffers the entire dump in memory and runs synchronously, so it fails on any database whose dump takes longer than the 30s Worker timeout or does not fit under the Durable Object memory ceiling (currently 1GB, soon 10GB).

This change adds a streaming path that lives inside the Durable Object:

  • POST /export/dump kicks off a job, opens an R2 multipart upload, and returns 202 with a jobId. Supports format=sql|csv|json plus optional callbackUrl/table/chunkSize.
  • GET /export/dump/status/:jobId returns progress (tables, rows, bytes, parts uploaded) and a downloadUrl once status is 'completed'.
  • GET /export/dump/download/:jobId streams the finished object back from R2 to the client.
  • DELETE /export/dump/:jobId aborts an in-flight upload.
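
For reviewers who want to exercise the flow end to end, here is a minimal client sketch. The endpoint paths, the 202 response, and the `jobId`/`status`/`downloadUrl` fields come from the list above; passing `format` as a query-string parameter, the exact JSON shapes, the `'failed'` status value, and the helper names `startDumpJob`/`waitForDumpJob` are assumptions, not part of this PR.

```typescript
// Hypothetical client for the job-based export flow (not code from this PR).
// The HTTP function is injected so the sketch can run against a fake in tests.
type HttpResponse = { status: number; json(): Promise<unknown> };
type HttpFetch = (url: string, init?: { method?: string }) => Promise<HttpResponse>;
type DumpFormat = 'sql' | 'csv' | 'json';

interface DumpJobStatus {
  status: string;        // 'completed' is named in the PR; other values are assumed
  downloadUrl?: string;  // present once status is 'completed'
}

// Kick off a job and return its jobId. Assumes format is a query parameter.
async function startDumpJob(baseUrl: string, format: DumpFormat, http: HttpFetch): Promise<string> {
  const res = await http(`${baseUrl}/export/dump?format=${format}`, { method: 'POST' });
  if (res.status !== 202) {
    throw new Error(`expected 202 Accepted, got ${res.status}`);
  }
  const body = (await res.json()) as { jobId: string };
  return body.jobId;
}

// Poll the status route until the job completes, then return downloadUrl.
async function waitForDumpJob(
  baseUrl: string,
  jobId: string,
  http: HttpFetch,
  intervalMs = 1000,
  maxPolls = 600,
): Promise<string> {
  for (let i = 0; i < maxPolls; i++) {
    const res = await http(`${baseUrl}/export/dump/status/${jobId}`);
    const job = (await res.json()) as DumpJobStatus;
    if (job.status === 'completed' && job.downloadUrl) return job.downloadUrl;
    if (job.status === 'failed') throw new Error(`dump job ${jobId} failed`); // assumed status value
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`dump job ${jobId} did not complete after ${maxPolls} polls`);
}
```

In a real deployment the injected `http` function would likely be a thin wrapper around `fetch` that adds whatever authorization header the instance requires.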

The engine paginates 1000 rows at a time, buffers up to the R2 multipart 5 MiB minimum, flushes parts as they fill, and budgets each tick at 20s. When a tick yields, the leftover bytes are persisted to a temp R2 object (DO storage values are capped at 128 KiB and cannot hold the buffer directly). The DO alarm() handler dispatches dump work first, then falls through to the existing cron logic, so the two co-exist on the same alarm channel.
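
The tick loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the engine in this PR: `DumpDeps`, `DumpState`, and `runTick` are hypothetical names, rows are assumed to arrive already serialized as strings, and persisting the leftover buffer to a temp R2 object and scheduling the next alarm are left to the caller.

```typescript
// Hypothetical sketch of one budgeted tick (names are not from this PR).
const PAGE_SIZE = 1000;             // rows per page, per the description
const PART_SIZE = 5 * 1024 * 1024;  // R2 multipart minimum part size (5 MiB)
const TICK_BUDGET_MS = 20_000;      // time budget per tick

interface DumpDeps {
  fetchPage(offset: number, limit: number): Promise<string[]>;  // serialized rows
  uploadPart(partNumber: number, data: Uint8Array): Promise<void>;
  now(): number;                                                 // milliseconds
}

interface DumpState {
  offset: number;       // rows consumed so far
  nextPart: number;     // next multipart part number (1-based)
  pending: Uint8Array;  // bytes buffered but not yet uploaded
  done: boolean;
}

function concatBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  const out = new Uint8Array(a.length + b.length);
  out.set(a, 0);
  out.set(b, a.length);
  return out;
}

async function runTick(
  state: DumpState,
  deps: DumpDeps,
  partSize = PART_SIZE,
  budgetMs = TICK_BUDGET_MS,
): Promise<DumpState> {
  const started = deps.now();
  const encoder = new TextEncoder();
  let { offset, nextPart, pending } = state;

  while (deps.now() - started < budgetMs) {
    const rows = await deps.fetchPage(offset, PAGE_SIZE);
    if (rows.length === 0) {
      // No more rows: the final part is allowed to be smaller than partSize.
      if (pending.length > 0) {
        await deps.uploadPart(nextPart, pending);
        nextPart++;
        pending = new Uint8Array(0);
      }
      return { offset, nextPart, pending, done: true };
    }
    offset += rows.length;
    pending = concatBytes(pending, encoder.encode(rows.join('')));
    // Flush full, equal-sized parts as soon as the buffer reaches partSize.
    while (pending.length >= partSize) {
      await deps.uploadPart(nextPart, pending.slice(0, partSize));
      nextPart++;
      pending = pending.slice(partSize);
    }
  }
  // Budget spent: hand the leftover bytes back so the caller can persist them
  // and resume from `offset` on the next alarm.
  return { offset, nextPart, pending, done: false };
}
```

Returning the leftover buffer instead of writing it from inside the loop keeps the tick a function of its inputs, which is what makes mid-tick yield and resume straightforward to unit test with a fake clock.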

A new [[r2_buckets]] binding named DATABASE_DUMPS gates the streaming path. The legacy GET /export/dump remains untouched for small databases and existing clients.
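
One way to picture the gating, as a sketch only: the binding name DATABASE_DUMPS is from this PR, while `pickDumpRoute`, the `DumpRoute` type, and the "unavailable" outcome for a missing binding are assumptions about behavior the description does not spell out.

```typescript
// Hypothetical routing decision (not code from this PR).
type DumpRoute = 'streaming' | 'legacy' | 'unavailable';

interface DumpEnv {
  DATABASE_DUMPS?: unknown; // R2 bucket binding; absent when not configured
}

function pickDumpRoute(method: string, env: DumpEnv): DumpRoute {
  // The legacy synchronous route stays on GET and needs no binding.
  if (method === 'GET') return 'legacy';
  // The streaming path needs the R2 bucket. What the real handler returns
  // when the binding is missing is an assumption here.
  if (method === 'POST') return env.DATABASE_DUMPS ? 'streaming' : 'unavailable';
  return 'unavailable';
}
```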

Tests: 17 new unit tests covering the engine (mid-tick yield/resume, multipart flushing at the 5 MiB threshold, error abort, BLOB literals, empty databases, CSV/JSON formats) and the HTTP routes.

@ViperDroid force-pushed the feat/streaming-dump-r2 branch from 08847db to 2099316 on May 26, 2026 23:26