This repository contains a pair of WebAssembly components that wrap the DuckDB C API (libduckdb) and expose it through the Wasm component model.
- ducklink-core: Implements the duckdb:component/database world and provides structured access to DuckDB connections and SQL execution.
- ducklink-cli: Implements the wasi:cli/run world and offers a WASI-native command line interface that mirrors the behaviour of the native DuckDB shell while delegating database access through the component interface.
Both components are intended to run in preview2-capable runtimes such as wasmtime 16.0+.
Documentation: a Docusaurus site organizing these docs lives in
website/. Build it with cd website && npm install && npm run build, or run it locally with cd website && npm start.
The repo also ships 111 component extensions (254 SQL functions) — Rust
wasm32-wasip2 components implementing the duckdb:extension WIT world, loadable
at runtime with LOAD <name> and verified by tooling/smoke.py. They span text
& NLP, encodings, crypto, aggregates (bloom/minhash/count-min sketches), and gated
network (dns/http). See CATALOG.md for the full index
(regenerate with python3 tooling/gen-catalog.py; verify integrity with
python3 tooling/verify-catalog.py).
Track what builds embed which extensions — sqlink's Bundles model, adapted
to ducklink's two embedding layers and stored as JSON. A build record is a
NAMED, content-hashed set of embedding members keyed by a set_hash:
- core_embedded: the wasm core's statically embedded set (EMBED_EXTENSIONS at build time; the lean default embeds nothing optional, just core_functions + parquet).
- components: the loaded / autoloaded / COMPOSED component extensions (e.g. jsonfns autoloads; spatialproj composes the GDAL component via wac).
Records live in registry/builds.json; the human-readable index is
BUILDS.md. The recorder/query tool is tooling/builds.py:
# Record an ad-hoc bundle (core embed set + a component, content-hashed)
python3 tooling/builds.py record lean-default --kind core \
--embed core_functions,parquet \
--component jsonfns@artifacts/extensions/jsonfns.wasm
python3 tooling/builds.py list # NAME | KIND | CORE-EMBEDDED | #COMP | SET-HASH | CREATED
python3 tooling/builds.py show lean-default # full detail incl. composed-of graph
python3 tooling/builds.py gen # (re)write BUILDS.md
python3 tooling/builds.py verify # every set_hash recomputes; every artifact present
Members are content-hashed with BLAKE2b-256 (stdlib hashlib.blake2b; sqlink
uses blake3, unavailable in the Python stdlib). The set_hash is BLAKE2b-256 over
the sorted, newline-terminated name\thash member lines — the same named-set /
content-hashed-member / set_hash identity as sqlink. Re-recording an unchanged set
is idempotent (created_at is preserved); reusing a name for a different set is an
error (sqlink's alias-conflict rule).
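The hashing and recording rules above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in tooling/builds.py: the function names (member_hash, set_hash, record), the hex encoding of digests, the UTF-8 encoding of member lines, and the in-memory registry dict are all hypothetical.

```python
import hashlib


def member_hash(data: bytes) -> str:
    """BLAKE2b-256 content hash of one member's bytes (hex output is an assumption)."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


def set_hash(members: dict) -> str:
    """BLAKE2b-256 over the sorted, newline-terminated name\\thash member lines."""
    lines = sorted(f"{name}\t{digest}" for name, digest in members.items())
    payload = "".join(line + "\n" for line in lines).encode("utf-8")
    return hashlib.blake2b(payload, digest_size=32).hexdigest()


def record(registry: dict, name: str, members: dict, created_at: str) -> dict:
    """Store a named set; idempotent for an unchanged set, error on a name conflict."""
    digest = set_hash(members)
    existing = registry.get(name)
    if existing is not None:
        if existing["set_hash"] != digest:
            # Same name, different members: the alias-conflict case.
            raise ValueError(f"name {name!r} already records a different set")
        return existing  # unchanged set: keep the original created_at
    registry[name] = {
        "set_hash": digest,
        "members": dict(members),
        "created_at": created_at,
    }
    return registry[name]
```

Because the member lines are sorted before hashing, the order in which members are supplied does not affect the resulting set_hash.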
Self-recording build hooks make the embedding sets capture themselves:
- The wasm build script (../duckdb-wasm/scripts/build-libduckdb-wasm.sh) writes registry/last-core-build.json (the EMBED_EXTENSIONS split + the core artifact hash) after a successful build (guarded on $DUCKLINK).
- extensions/spatialproj-component/compose.sh writes spatialproj.compose.json after wac plug, recording the GDAL composition (gdal embeds PROJ/proj.db).
Ingest either with tooling/builds.py record <name> --from-manifest <file>.
The .bundle dot command (extensions/bundle-dotcmd/, in artifacts/dotcmds/)
is the interactive surface: .bundle loaded introspects the live loaded-extension
set (via duckdb_extensions(), a core builtin — works on the lean core, no JSON
extension needed), .bundle members renders it as set_hash member lines, and
.bundle help points at tooling/builds.py for the persisted records (the dotcmd
has no filesystem access, and read_json needs the de-embedded JSON extension).
wit/
core/ Shared database interface definitions
standalone/ WASI-oriented worlds (standalone DB + CLI)
browser/ Browser-oriented database world
crates/
libduckdb-sys/ bindgen-based bindings to the DuckDB C API
ducklink-core/ Component implementation of the DuckDB API
ducklink-cli/ WASI CLI component built on top of the exported API
scripts/
build-libduckdb-wasm.sh Helper for cross-compiling DuckDB to wasm32-wasi
cmake/toolchains/
wasi-sdk.cmake Toolchain file for building DuckDB with wasi-sdk
- DuckDB source at DUCKDB_SOURCE_DIR (e.g. ~/src/duckdb). A shallow clone is sufficient:
  git clone https://github.com/duckdb/duckdb.git ~/src/duckdb
- wasi-sdk (tested with 33.0; exception handling requires >= 33) with WASI_SDK_PREFIX pointing at the installation root. A predownloaded copy lives under external/wasi-sdk-33.0-<platform>; point the variable there if you do not have a global install.
- Rust tooling:
  rustup target add wasm32-wasi
  cargo install cargo-component
- wit-bindgen tooling (included automatically by cargo-component).
Network access is required only when fetching DuckDB or installing the toolchain.
The component links against a statically built libduckdb compiled for wasm32-wasi. Use the helper script to cross-compile the library:
export DUCKDB_SOURCE_DIR=~/src/duckdb
export WASI_SDK_PREFIX="$(pwd)/external/wasi-sdk-33.0-arm64-macos"
export WASI_TARGET_TRIPLE=wasm32-wasip2
export WASM_EXTENSIONS=json # defaults to json if unset; comma-separate to add more later
scripts/build-libduckdb-wasm.sh
The script places libduckdb-wasi.a under artifacts/. Afterwards set the following environment variables so the Rust build can locate the headers and the archive:
export DUCKDB_INCLUDE_DIR="$DUCKDB_SOURCE_DIR/src/include"
export DUCKDB_STATIC_LIB="$(pwd)/artifacts/libduckdb-wasi.a"
For the browser component you will need a DuckDB archive compiled for the appropriate wasm32-unknown-unknown (or equivalent) target. Once built, point DUCKDB_STATIC_LIB at that archive and use the make core-browser target to produce ducklink_core.wasm with the browser feature enabled.
Compile both components using the make targets (they call cargo component under the hood):
make
Individual targets are also available:
make core
make ducklink-cli
# Build the browser-oriented core (requires a browser-compatible DuckDB static archive)
make core-browser BROWSER_TARGET=wasm32-unknown-unknown
The resulting component binaries are generated in target/wasm32-wasi/release/:
- ducklink_core.wasm
- ducklink_cli.wasm
Extensions live under extensions/<name>-component, register imperatively in
load() against the duckdb:extension world, and are tracked by the tooling in
tooling/ + registry/ (mirrors ~/git/sqlite-wasm's system). The full
roadmap is in PLAN-duckdb-extensions.md.
# Scaffold a skeleton (consults tooling/compat-registry.json for crate status,
# registers the workspace member, and cargo-checks that it compiles):
make ext-scaffold NAME=myext CRATE=base32,bs58
# Edit extensions/myext-component/src/lib.rs + smoke.sql, then build + smoke:
make ext NAME=myext-component
# Seed assertions from current output, review, and re-run to assert:
python3 tooling/smoke.py --seed-expected myext
python3 tooling/smoke.py myext
make ext-smoke-all # smoke every extension
make ext-list-broken # crates flagged un-buildable on wasm32-wasip2
python3 tooling/t-status.py # tooling-improvement items from build experience
Extensions load through the native host runner (ducklink); the
wac-composed standalone CLI links a no-op loader stub and cannot instantiate
them. isin (hand-rolled) and baseN (crate-backed) are worked examples. See
docs/component-extension-guide.md for the
capability surface and packaging details.
- wit/core/duckdb-core.wit defines the shared duckdb:component/database interface implemented by the core component.
- wit/standalone/duckdb-standalone.wit exports the database world for WASI runtimes, while wit/standalone/duckdb-cli.wit wires in the CLI experience on top of it.
- wit/browser/duckdb-browser.wit will back the browser-friendly component variant, sharing the same database surface but relying on host-provided storage and networking.
Instantiate the database component with a runtime that supports the component model. For example, using wasmtime:
wasmtime component run target/wasm32-wasi/release/ducklink_core.wasm --dir .
Pre-open directories that contain database files (e.g. --dir .) so the component can access them via WASI.
The CLI component imports the database world and exposes a wasi:cli entry point. To run it with wasmtime you can compose the CLI and core components using the wac tool:
# Install the wac CLI once
cargo install wac-cli
# Compose the CLI + core component pair
wac plug target/wasm32-wasip2/release/ducklink_cli.wasm \
--plug target/wasm32-wasip2/release/ducklink_core.wasm \
-o artifacts/duckdb-cli.wasm
# Execute a query (grant directory access for any on-disk database file)
wasmtime run artifacts/duckdb-cli.wasm --dir . -- :memory: -c "select 42;"
For quick validation there is also a helper script that performs the wac plug
step and executes a simple query:
scripts/smoke-cli.sh
The script accepts optional environment variables (SQL, DB_PATH, EXTRA_WASMTIME_FLAGS, EXTENSIONS)
to tailor the smoke test.
For example, set EXTENSIONS="sample_extension" to pass --load-extension sample_extension
to the CLI before the query runs.
The CLI supports:
- Connecting to a database file or running purely in-memory (ducklink_cli.wasm :memory:)
- Executing a single command via -c "SQL"
- Preloading componentized extensions via --load-extension <name> (repeat for multiple extensions); this issues a LOAD <name> statement before user SQL runs
- Interactive REPL with .help, .exit, and .quit
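To make the ordering of --load-extension and -c concrete, the following Python sketch models how the documented flags could translate into the statements that run. It is only an illustration: the function name build_statements is hypothetical, the trailing semicolon on each LOAD is an assumption, and the real CLI component is not implemented in Python.

```python
import argparse


def build_statements(argv):
    """Model the documented flags: LOAD statements first, then the user's SQL."""
    parser = argparse.ArgumentParser(prog="ducklink_cli.wasm")
    parser.add_argument("database", help="database file path or :memory:")
    parser.add_argument("-c", dest="command", help="single SQL command to run")
    parser.add_argument(
        "--load-extension",
        action="append",  # repeating the flag accumulates names in order
        default=[],
        metavar="NAME",
    )
    args = parser.parse_args(argv)

    # One LOAD per requested extension, issued before any user SQL.
    statements = [f"LOAD {name};" for name in args.load_extension]
    if args.command:
        statements.append(args.command)
    return args.database, statements
```

For instance, passing :memory: with --load-extension sample_extension and -c "select 42;" yields the LOAD statement followed by the query.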
Result sets are rendered in a text table that mirrors the native DuckDB shell.
All WIT interfaces live under wit/ at the repository root. That directory
vendors the WASI Preview 2 packages at version 0.2.6 (the latest preview
supported by Wasmtime 37.0.2), along with the DuckDB-specific packages. The
crate-local copies under crates/*/wit/ are generated from this canonical tree
via scripts/sync-core-wit.sh and scripts/sync-cli-wit.sh. Always edit the WIT
files in wit/ first, then re-run the sync scripts to propagate changes before
building.
External extensions can depend on the definitions in wit/duckdb-extension/
to stay in sync with the host runtime without having to vendor their own copies
of the extension interfaces.
The ducklink-host crate provides a reusable Wasmtime runner that composes the CLI
and core components along with the componentized extension loader. Build and execute it via:
cargo run -p ducklink-host --bin ducklink -- -- duckdb-cli :memory: -c "select 42 as answer;"
Additional directories can be exposed to the CLI with --dir /host/path::/guest/path, and
custom component artifacts can be supplied with --core-component / --cli-component. The
host automatically preopens the current working directory as . so relative database paths
continue to work.
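The --dir mapping syntax can be illustrated with a small Python sketch. It assumes only the documented /host/path::/guest/path form and rejects anything else; the function names (parse_dir_mapping, preopens) are hypothetical, and the actual host may accept other forms.

```python
import os


def parse_dir_mapping(spec):
    """Split a '/host/path::/guest/path' spec into (host, guest)."""
    host, sep, guest = spec.partition("::")
    if not sep or not host or not guest:
        raise ValueError(f"expected /host/path::/guest/path, got {spec!r}")
    return host, guest


def preopens(dir_specs, cwd=None):
    """Current working directory mapped to '.', followed by any --dir mappings."""
    mappings = [(cwd or os.getcwd(), ".")]
    mappings.extend(parse_dir_mapping(spec) for spec in dir_specs)
    return mappings
```

Listing the working directory first mirrors the automatic preopen described above, so relative database paths resolve before any extra mappings are considered.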
DuckDB’s extension loader is gaining support for resolving WebAssembly components from artifacts/extensions/. When an extension registers itself with the core component, the name is sanitized to [A-Za-z0-9_-] and mapped to <name>.wasm inside that directory. As the loader matures, dropping a compiled extension there will allow LOAD <name> to instantiate it through the preview2 runtime rather than the native shared-library path.
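The name-to-artifact mapping can be sketched as follows. This is an assumption-laden illustration rather than the loader's code: the function name extension_artifact is hypothetical, and the text does not say how disallowed characters are handled, so this sketch simply drops them (replacing them would be an equally plausible reading).

```python
import re
from pathlib import PurePosixPath


def extension_artifact(name, root="artifacts/extensions"):
    """Map an extension name to <root>/<sanitized-name>.wasm."""
    # Assumption: characters outside [A-Za-z0-9_-] are removed, not replaced.
    safe = re.sub(r"[^A-Za-z0-9_-]", "", name)
    if not safe:
        raise ValueError(f"extension name {name!r} has no usable characters")
    return PurePosixPath(root) / f"{safe}.wasm"
```

Restricting the name to that character set also keeps path separators and dots out of the file name, so a requested name cannot point outside the extensions directory.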
This repository ships a minimal sample extension under extensions/sample-extension-component/ that exercises the component interface. You can build and validate it end-to-end via:
make smoke-extension
The target runs the ducklink-host test load_sample_extension_component, which:
- Builds the sample extension (if it is not already present).
- Copies the resulting component to artifacts/extensions/sample_extension.wasm.
- Instantiates it with Wasmtime using the preview2 bindings and asserts that load() returns the expected metadata.
Currently the project does not ship a full integration test suite because executing the components requires a preview2 runtime plus a wasm32-wasi build of DuckDB. Manual smoke testing can be done after building:
wasmtime component run artifacts/duckdb-cli.wasm --dir . -- in_memory_db.duckdb -c "select 42 as answer;"
There are also convenience targets:
make smoke-cli # :memory: query via scripts/smoke-cli.sh
make smoke-cli-disk # same but forces an on-disk temp database
make sample-extension # builds the sample component and copies it to artifacts/extensions/
make smoke-extension # runs Cargo test to build + load the sample extension component
To validate the preview2 filesystem adapter against real storage outside of make, set ON_DISK_SMOKE=1 when running scripts/smoke-cli.sh; the helper will create a temporary on-disk database, grant Wasmtime access to that directory, and delete it after the query completes.
Continuous smoke coverage runs in CI via .github/workflows/smoke-tests.yml, which builds the components and executes both the in-memory and on-disk runs of scripts/smoke-cli.sh on every push and pull request.
Until hosted Actions are available (public repo / billing), the same workflow can run locally with nektos/act in Docker:
brew install act # one-time (Docker must be running)
make ci-local # runs .github/workflows/smoke-tests.yml
scripts/ci-local.sh -l # list jobs without running
.actrc maps ubuntu-latest to catthehacker/ubuntu:act-latest and enables
--reuse so caches persist between runs. The wasi-sdk download in the workflow
is architecture-aware (x86_64/arm64), so it runs natively under act on Apple
silicon as well as on GitHub's x86_64 runners. The first run is slow (it pulls
the runner image, compiles the component tooling, and builds the patched DuckDB
archive); afterwards the cached archive makes runs fast.
Beyond execute / open-stream, the database interface exposes:
- Prepared statements: prepare(conn, sql) returns a reusable prepared-statement resource; execute(params) binds positional parameters ($1, $2, ...) and runs it, rebinding from scratch each call.
- Configuration: open-with-config(path, options) opens a database applying (name, value) options (e.g. access_mode, default_order, max_memory).
- Arrow: query-arrow(conn, sql) returns the result as an Arrow IPC stream (list<u8>), decodable by any Arrow implementation (apache-arrow in JS, arrow-rs in Rust). Zero-copy is not possible across the component boundary, so buffers are serialized once into IPC bytes.
- Flesh out remaining CLI scripting parity with the native shell
- Resolve GitHub Actions billing so the smoke-tests workflow can run
This project owes a clear debt to Simon Willison
and sqlite-utils. The extension catalog,
the scaffold → smoke → feedback tooling loop, and much of the CLI ergonomics here
follow the patterns Simon established with sqlite-utils and the wider Datasette
ecosystem for making a database pleasant to extend and script from the command
line. Many of the component extensions also mirror utilities first popularized in
that ecosystem. Thank you.
