227 changes: 226 additions & 1 deletion API.md
@@ -1,11 +1,12 @@
# API Reference

This document provides a reference for the SQLite functions provided by the `sqlite-sync` extension.
This document provides a reference for the SQL functions provided by the `sqlite-sync` extension. Unless noted otherwise, the APIs are available on both SQLite and PostgreSQL builds.

## Index

- [Configuration Functions](#configuration-functions)
- [`cloudsync_init()`](#cloudsync_inittable_name-crdt_algo-init_flags)
- [`cloudsync_set()`](#cloudsync_setkey-value)
- [`cloudsync_enable()`](#cloudsync_enabletable_name)
- [`cloudsync_disable()`](#cloudsync_disabletable_name)
- [`cloudsync_is_enabled()`](#cloudsync_is_enabledtable_name)
@@ -21,9 +22,15 @@ This document provides a reference for the SQLite functions provided by the `sql
- [`cloudsync_siteid()`](#cloudsync_siteid)
- [`cloudsync_db_version()`](#cloudsync_db_version)
- [`cloudsync_uuid()`](#cloudsync_uuid)
- [`cloudsync_uuid_text()`](#cloudsync_uuid_textuuid-dash_format)
- [`cloudsync_uuid_blob()`](#cloudsync_uuid_blobuuid)
- [Schema Alteration Functions](#schema-alteration-functions)
- [`cloudsync_begin_alter()`](#cloudsync_begin_altertable_name)
- [`cloudsync_commit_alter()`](#cloudsync_commit_altertable_name)
- [Payload Functions](#payload-functions)
- [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq)
- [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version-exclude_filter_site_id)
- [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload)
- [Network Functions](#network-functions)
- [`cloudsync_network_init()`](#cloudsync_network_initmanageddatabaseid)
- [`cloudsync_network_cleanup()`](#cloudsync_network_cleanup)
@@ -40,6 +47,37 @@ This document provides a reference for the SQLite functions provided by the `sql

## Configuration Functions

### `cloudsync_set(key, value)`

**Description:** Stores a global CloudSync setting in the current database. Settings persist across database reopens and are loaded automatically by the extension.

The following payload setting is supported:

| Key | Description | Default | Minimum |
|---|---|---:|---:|
| `payload_max_chunk_size` | Maximum transport payload size generated by [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version-exclude_filter_site_id). Values below the minimum are clamped. | `5242880` (5 MB) | `262144` (256 KB) |

`payload_max_chunk_size` affects only chunk generation. [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload) continues to accept legacy payloads, monolithic payloads, and v3 chunk-fragment payloads even when they are larger than the local setting. This preserves compatibility between peers using different settings.

**Parameters:**

- `key` (TEXT): The setting key.
- `value` (TEXT): The setting value. For `payload_max_chunk_size`, pass the value in bytes.

**Returns:** SQLite returns no value. PostgreSQL returns `true` on success.

**Example:**

```sql
-- Use 1 MB transport chunks
SELECT cloudsync_set('payload_max_chunk_size', '1048576');

-- Restore the default 5 MB transport chunks
SELECT cloudsync_set('payload_max_chunk_size', '5242880');
```
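
As a rough illustration of the clamping rule in the table above, the following Python sketch models how a requested value maps to the effective chunk size. The helper name `effective_chunk_size` and the constant names are hypothetical and not part of the extension; only the two numeric limits come from this document.

```python
# Hypothetical model of the documented clamping rule; not extension code.
DEFAULT_CHUNK_SIZE = 5242880  # documented default, in bytes
MIN_CHUNK_SIZE = 262144       # documented minimum, in bytes


def effective_chunk_size(requested=None):
    """Return the chunk size that would apply for a requested setting."""
    if requested is None:
        return DEFAULT_CHUNK_SIZE
    # Values below the minimum are raised to the minimum.
    return max(int(requested), MIN_CHUNK_SIZE)


print(effective_chunk_size("1048576"))  # 1048576
print(effective_chunk_size("1000"))     # 262144
```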

---

### `cloudsync_init(table_name, [crdt_algo], [init_flags])`

**Description:** Initializes a table for `sqlite-sync` synchronization. This function is idempotent and needs to be called only once per table on each site; configurations are stored in the database and automatically loaded with the extension.
@@ -363,6 +401,45 @@ INSERT INTO products (id, name) VALUES (cloudsync_uuid(), 'New Product');

---

### `cloudsync_uuid_text(uuid, [dash_format])`

**Description:** Converts a 16-byte binary UUID (such as the `site_id` stored in `cloudsync_changes`, or the value returned by [`cloudsync_siteid()`](#cloudsync_siteid)) into its canonical string form.

**Parameters:**

- `uuid` (BLOB/BYTEA): The 16-byte UUID. Returns `NULL` if `uuid` is `NULL`; raises an error if it is not exactly 16 bytes.
- `dash_format` (BOOLEAN, optional, default `true`): When `true`, returns the canonical 36-character dashed form (`xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx`); when `false`, returns the bare 32-character hex form.

**Returns:** The UUID as a TEXT value (lowercase hex).

**Example:**

```sql
SELECT cloudsync_uuid_text(cloudsync_siteid()); -- 0190a1b2-c3d4-7e5f-8a9b-001122334455
SELECT cloudsync_uuid_text(cloudsync_siteid(), false); -- 0190a1b2c3d47e5f8a9b001122334455
```
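
For callers that hold the 16-byte value on the client side, the same two output forms can be reproduced with Python's standard `uuid` module. This is only a sketch of the documented behavior; the helper name `uuid_text` is hypothetical and is not the extension's implementation.

```python
import uuid


def uuid_text(raw, dash_format=True):
    """Client-side model of the documented conversion (hypothetical helper)."""
    if raw is None:
        return None
    # uuid.UUID raises ValueError unless exactly 16 bytes are supplied.
    value = uuid.UUID(bytes=bytes(raw))
    return str(value) if dash_format else value.hex


site_id = bytes.fromhex("0190a1b2c3d47e5f8a9b001122334455")
print(uuid_text(site_id))                     # 0190a1b2-c3d4-7e5f-8a9b-001122334455
print(uuid_text(site_id, dash_format=False))  # 0190a1b2c3d47e5f8a9b001122334455
```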

---

### `cloudsync_uuid_blob(uuid)`

**Description:** Converts a UUID string into its 16-byte binary form. This is the inverse of [`cloudsync_uuid_text()`](#cloudsync_uuid_textuuid-dash_format) and lets string-based callers (for example, an HTTP `/check` endpoint holding a stringified `site_id`) pass a `site_id` to [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version-exclude_filter_site_id).

**Parameters:**

- `uuid` (TEXT): A UUID string. Parsing is lenient: both the canonical dashed form and the bare 32-character hex form are accepted, in upper or lower case. Returns `NULL` if `uuid` is `NULL`; raises an error on malformed input.

**Returns:** The 16-byte UUID as a BLOB/BYTEA.

**Example:**

```sql
SELECT cloudsync_uuid_blob('0190a1b2-c3d4-7e5f-8a9b-001122334455');
SELECT cloudsync_uuid_blob('0190A1B2C3D47E5F8A9B001122334455');
```
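
The sketch below models the same string-to-bytes conversion on the client side with Python's standard `uuid` module, and checks that the dashed and bare spellings yield identical bytes. The helper name `uuid_blob` is hypothetical. Note that Python's parser is looser than the behavior described here (it also accepts braces and a `urn:uuid:` prefix), so it is not a faithful validator for this function.

```python
import uuid


def uuid_blob(text):
    """Client-side model of the documented conversion (hypothetical helper)."""
    if text is None:
        return None
    # Accepts dashed or bare hex in any case; raises ValueError if malformed.
    return uuid.UUID(hex=text).bytes


dashed = uuid_blob("0190a1b2-c3d4-7e5f-8a9b-001122334455")
bare = uuid_blob("0190A1B2C3D47E5F8A9B001122334455")
print(len(dashed))     # 16
print(dashed == bare)  # True
```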

---

## Schema Alteration Functions

### `cloudsync_begin_alter(table_name)`
@@ -409,6 +486,150 @@ SELECT cloudsync_commit_alter('my_table');

---

## Payload Functions

### `cloudsync_payload_encode(tbl, pk, col_name, col_value, col_version, db_version, site_id, cl, seq)`

**Description:** Encodes rows from `cloudsync_changes` into a single monolithic payload. This is the legacy payload API and remains fully supported for backward compatibility.

Use this API when the expected payload size is modest or when you need to interoperate with callers that expect a single BLOB. For large rowsets or large individual BLOB/TEXT values, prefer [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version-exclude_filter_site_id), which splits transport payloads according to `payload_max_chunk_size`.

**Parameters:** The function is an aggregate over the columns returned by `cloudsync_changes`:

- `tbl` (TEXT): Source table name.
- `pk` (BLOB): Encoded primary key.
- `col_name` (TEXT): Changed column name.
- `col_value` (BLOB): Encoded column value.
- `col_version` (INTEGER/BIGINT): Column version.
- `db_version` (INTEGER/BIGINT): Source database version.
- `site_id` (BLOB): Source site identifier.
- `cl` (INTEGER/BIGINT): Causal length.
- `seq` (INTEGER/BIGINT): Sequence number within the source database version.

**Returns:** A single payload BLOB.

**Example:**

```sql
SELECT cloudsync_payload_encode(
tbl, pk, col_name, col_value, col_version, db_version, site_id, cl, seq
) AS payload
FROM cloudsync_changes;
```

---

### `cloudsync_payload_chunks([since_db_version], [filter_site_id], [until_db_version], [exclude_filter_site_id])`

**Description:** Generates sync payloads as a stream of transport-sized chunks. It is the chunk-aware evolution of [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq), designed for large rowsets and for single BLOB/TEXT values that are larger than the configured chunk size.

The maximum generated chunk size is controlled by the global `payload_max_chunk_size` setting. The default is 5 MB and the technical minimum is 256 KB:

```sql
SELECT cloudsync_set('payload_max_chunk_size', '5242880');
```

When a single encoded column value does not fit in one chunk, CloudSync transparently emits v3 payload fragments for that value. The receiver stages fragments internally and applies the value when all parts arrive. Fragments can arrive out of order; incomplete stale fragment groups are cleaned up automatically.

`cloudsync_payload_chunks()` does not change the apply contract: [`cloudsync_payload_apply()`](#cloudsync_payload_applypayload) accepts legacy payloads, monolithic payloads, and v3 chunk-fragment payloads. The local `payload_max_chunk_size` setting is not used to reject incoming payloads.

**Important memory note:** chunking limits the size of each transport payload that CloudSync generates. It does not remove the database engine's need to materialize a single final cell value when applying a very large BLOB/TEXT column. In other words, a 500 MB BLOB can be transported in smaller chunks, but the receiving database must still be able to store and bind the completed 500 MB value when that row is applied.
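
To put numbers on the note above, this small Python calculation estimates how many transport chunks a single large value needs. It is a lower bound under stated assumptions: "500 MB" is read as 500 × 1024 × 1024 bytes, and any per-fragment framing overhead (not specified in this document) is ignored. The helper name `min_chunks` is hypothetical.

```python
def min_chunks(value_bytes, chunk_size=5242880):
    """Lower bound on the number of chunks, ignoring framing overhead."""
    # Integer ceiling division avoids floating-point rounding.
    return (value_bytes + chunk_size - 1) // chunk_size


blob_size = 500 * 1024 * 1024
print(min_chunks(blob_size))          # 100 chunks at the default size
print(min_chunks(blob_size, 262144))  # 2000 chunks at the minimum size
```

However many chunks are used in transit, the receiver still has to hold the full `blob_size` bytes as one value when the row is applied.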

**Parameters:**

- `since_db_version` (INTEGER/BIGINT, optional): Start after this source database version. If omitted, CloudSync uses the stored send checkpoint.
- `filter_site_id` (BLOB, optional): Site ID to filter on. With `exclude_filter_site_id` unset/`false` it selects changes **from** this site; with `exclude_filter_site_id` `true` it selects changes from every site **except** this one. If omitted (and not excluding), CloudSync uses the local site ID.
- `until_db_version` (INTEGER/BIGINT, optional): Upper watermark to include. If omitted or `0`, CloudSync captures the current maximum source database version before streaming chunks.
- `exclude_filter_site_id` (BOOLEAN, optional, default `false`): When `true`, stream changes from all sites **except** `filter_site_id`. This is what the `/check` download path needs, because a peer must not receive its own changes back. Setting it to `true` without a `filter_site_id` is an error. The `site_id` stored in `cloudsync_changes` is the 16-byte binary UUID; string callers can convert with [`cloudsync_uuid_blob()`](#cloudsync_uuid_blobuuid).

**Returns:** A rowset with one row per chunk:

| Column | Description |
|---|---|
| `payload` | Payload BLOB to pass to `cloudsync_payload_apply()`. |
| `chunk_index` | Zero-based chunk index for this stream. |
| `payload_size` | Payload size in bytes. |
| `rows` | Number of encoded payload rows in this chunk. Fragment chunks usually contain one fragment row. |
| `db_version_min` | Minimum source `db_version` represented by this chunk. |
| `db_version_max` | Maximum source `db_version` represented by this chunk. |
| `watermark_db_version` | Stable upper watermark captured for this chunk stream. Store this after all chunks are durably transferred/applied. |

**SQLite usage:** `cloudsync_payload_chunks` is exposed as a virtual table with hidden constraint columns:

```sql
-- Default: uses the stored send checkpoint and local site id
SELECT payload, chunk_index, payload_size, watermark_db_version
FROM cloudsync_payload_chunks
ORDER BY chunk_index;

-- Explicit arguments through hidden columns
SELECT payload, chunk_index, payload_size, watermark_db_version
FROM cloudsync_payload_chunks
WHERE since_db_version = 100
AND site_id = cloudsync_siteid()
AND until_db_version = 200
ORDER BY chunk_index;

-- /check download: all changes EXCEPT the requesting peer's site
SELECT payload, chunk_index, watermark_db_version
FROM cloudsync_payload_chunks
WHERE since_db_version = 100
AND site_id = cloudsync_uuid_blob('0190a1b2-c3d4-7e5f-8a9b-001122334455')
AND exclude_filter_site_id = 1
ORDER BY chunk_index;
```

**PostgreSQL usage:** `cloudsync_payload_chunks` is exposed as a set-returning function with optional arguments:

```sql
-- Default: uses the stored send checkpoint and local site id
SELECT *
FROM cloudsync_payload_chunks();

-- Explicit arguments
SELECT *
FROM cloudsync_payload_chunks(100, cloudsync_siteid(), 200);

-- /check download: all changes EXCEPT the requesting peer's site
SELECT *
FROM cloudsync_payload_chunks(100, cloudsync_uuid_blob('0190a1b2-c3d4-7e5f-8a9b-001122334455'), NULL, true);
```

**Apply example:**

```sql
-- Apply chunks on a receiving peer. Chunks may be applied one at a time.
SELECT cloudsync_payload_apply(?);
```

On PostgreSQL, apply chunks as individual statements from the transport/client layer. Do not use a set-based statement such as `SELECT cloudsync_payload_apply(payload) FROM chunks_table;` while reading payloads from a table in the same database session. `cloudsync_payload_apply()` performs writes through SPI, and applying while the same statement is still scanning a payload table can conflict with PostgreSQL executor resource ownership. Fetch each payload into the client (or into a local procedural variable after the read completes) and then call `cloudsync_payload_apply()` for that single payload.
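
The client-side pattern described above can be sketched in Python as follows. The function `apply_chunk_stream` and its callable parameters are hypothetical: `apply_payload` stands in for whatever driver call runs `SELECT cloudsync_payload_apply(?)` for one payload, and `save_watermark` for however the caller persists `watermark_db_version`. The demo uses in-memory fakes rather than a real database.

```python
def apply_chunk_stream(chunks, apply_payload, save_watermark):
    """Apply chunks one statement at a time; persist the watermark last.

    chunks: iterable of dicts with 'payload' and 'watermark_db_version'
            keys, already fetched into the client.
    apply_payload: callable that applies one payload and returns the
                   number of rows applied.
    save_watermark: callable that durably stores the watermark.
    """
    watermark = None
    applied = 0
    for chunk in chunks:
        # One payload per call; an exception here skips the watermark save.
        applied += apply_payload(chunk["payload"])
        watermark = chunk["watermark_db_version"]
    if watermark is not None:
        save_watermark(watermark)
    return applied


# Demo with fakes: two chunks, each reporting one applied row.
saved = []
rows = apply_chunk_stream(
    [
        {"payload": b"chunk-0", "watermark_db_version": 200},
        {"payload": b"chunk-1", "watermark_db_version": 200},
    ],
    apply_payload=lambda payload: 1,
    save_watermark=saved.append,
)
print(rows, saved)  # 2 [200]
```

Saving the watermark only after the loop finishes mirrors the guidance in the returns table: store it once all chunks have been durably transferred and applied.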

---

### `cloudsync_payload_apply(payload)`

**Description:** Applies a sync payload to the current database. The function accepts all supported payload formats:

- Legacy payloads generated by older SQLite Sync versions.
- Monolithic payloads generated by [`cloudsync_payload_encode()`](#cloudsync_payload_encodetbl-pk-col_name-col_value-col_version-db_version-site_id-cl-seq).
- Chunk-fragment payloads generated by [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version-exclude_filter_site_id).

When a v3 fragment payload is received, CloudSync stores the fragment in an internal table and returns after applying zero or more completed values. Once the final fragment for a value is received, the completed value is validated and applied. Duplicate fragment delivery is idempotent.
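
The staging behavior described above can be modeled with a small in-memory sketch. Everything here is hypothetical: the class `FragmentStager` and the `group_id`, `index`, and `total` fields are illustrative names, not the actual v3 fragment format or the internal table layout, and cleanup of stale groups is not modeled.

```python
class FragmentStager:
    """Toy model of out-of-order, idempotent fragment staging."""

    def __init__(self):
        self._groups = {}  # group_id -> {fragment index: bytes}

    def add(self, group_id, index, total, data):
        """Stage one fragment; return the full value once all parts exist."""
        if not 0 <= index < total:
            raise ValueError("fragment index out of range")
        parts = self._groups.setdefault(group_id, {})
        # A duplicate delivery overwrites the same slot, so it is harmless.
        parts[index] = data
        if len(parts) < total:
            return None  # still incomplete, nothing to apply yet
        del self._groups[group_id]
        return b"".join(parts[i] for i in range(total))


stager = FragmentStager()
print(stager.add("value-1", 1, 3, b"b"))  # None
print(stager.add("value-1", 1, 3, b"b"))  # None (duplicate)
print(stager.add("value-1", 2, 3, b"c"))  # None
print(stager.add("value-1", 0, 3, b"a"))  # b'abc'
```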

**Parameters:**

- `payload` (BLOB/BYTEA): Payload BLOB to apply.

**Returns:** Number of payload rows applied. Fragment payloads that are staged but not yet complete can return `0`.

**Example:**

```sql
SELECT cloudsync_payload_apply(:payload);
```

---

## Network Functions

### `cloudsync_network_init(managedDatabaseId)`
@@ -500,6 +721,10 @@ This means: if you get JSON back, the server was reachable and the network proto

**Description:** Sends all unsent local changes to the remote server.

The send path streams payloads through [`cloudsync_payload_chunks()`](#cloudsync_payload_chunkssince_db_version-filter_site_id-until_db_version-exclude_filter_site_id), so `payload_max_chunk_size` also limits the payloads generated for network transport. Each generated chunk is uploaded/applied independently; the local send checkpoint is advanced only after the chunk stream completes successfully.

Chunk transport is transparent to the CloudSync backend. Each chunk is sent as a normal `/apply` payload, either inline as a base64 `blob` or through the upload `url` path. There is no separate chunk flag: old payloads, monolithic payloads, and v3 fragment payloads are distinguished by the payload format itself.

**Parameters:** None.

**Returns:** A JSON string with the send result:
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -4,6 +4,22 @@ All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).

## [Unreleased]

### Added

- **Chunked payload generation** via `cloudsync_payload_chunks()`, available as a SQLite virtual table and as a PostgreSQL set-returning function. The API emits transport-sized payload chunks and transparently fragments oversized BLOB/TEXT values into v3 fragment payloads.
- **`payload_max_chunk_size` global setting** for controlling generated chunk size. The default is 5 MB and values below the 256 KB technical minimum are clamped.
- **`exclude_filter_site_id` argument** for `cloudsync_payload_chunks()`. When set, the function streams changes from every site **except** `filter_site_id`, which is what the `/check` download path needs (a peer must not receive its own changes back). The default (omitted/`false`) preserves the existing single-site behavior. Passing the flag without a `filter_site_id` is an error.
- **`cloudsync_uuid_text()` / `cloudsync_uuid_blob()`** scalar functions on both SQLite and PostgreSQL, converting between the 16-byte binary `site_id` and its canonical UUID string. `cloudsync_uuid_text()` takes an optional `dash_format` argument (default `true`); `cloudsync_uuid_blob()` accepts dashed or undashed, case-insensitive input. These let string-based callers (e.g. the `/check` endpoint) pass a `site_id` to `cloudsync_payload_chunks()`.
- **Payload chunking documentation** in `API.md` and `PERFORMANCE.md`, including the explicit memory note that chunking bounds transport payloads but the database must still materialize a completed single BLOB/TEXT value when it is applied.
- **PostgreSQL `1.0 -> 1.1` upgrade script** (`migrations/cloudsync--1.0--1.1.sql`) for the new chunked-payload SQL surface, so existing deployments can `ALTER EXTENSION cloudsync UPDATE`.

### Changed

- `cloudsync_payload_apply()` now accepts legacy payloads, monolithic payloads, and v3 fragment payloads without enforcing the local `payload_max_chunk_size`, preserving compatibility between peers with different settings.
- `cloudsync_network_send_changes()` now streams outgoing changes through `cloudsync_payload_chunks()` instead of first building one monolithic payload. This bounds transport payload size for the built-in network path and lets large rowsets or oversized BLOB/TEXT values flow through the same `/apply` endpoint as regular payloads.

## [1.0.20] - 2026-05-26

### Changed
8 changes: 7 additions & 1 deletion Makefile
@@ -86,6 +86,12 @@ COV_FILES = $(filter-out $(SRC_DIR)/lz4.c $(NETWORK_DIR)/network.c $(SQLITE_IMPL
CURL_LIB = $(CURL_DIR)/$(PLATFORM)/libcurl.a
TEST_TARGET = $(patsubst %.c,$(DIST_DIR)/%$(EXE), $(notdir $(TEST_SRC)))

# Build curl hermetically: neutralize the developer's ambient build env so
# curl's ./configure compile tests aren't broken by overrides leaking in
# (e.g. exported LDFLAGS/CPPFLAGS/LIBS pointing at Homebrew). Build flags for
# curl are supplied explicitly via CURL_CONFIG.
CURL_CONFIG_ENV = LDFLAGS= CPPFLAGS= LIBS= CFLAGS=

# Platform-specific settings
ifeq ($(PLATFORM),windows)
TARGET := $(DIST_DIR)/cloudsync.dll
@@ -326,7 +332,7 @@ else
unzip $(CURL_DIR)/src/curl.zip -d $(CURL_DIR)/src/.
endif

cd $(CURL_SRC) && ./configure \
cd $(CURL_SRC) && $(CURL_CONFIG_ENV) ./configure \
--without-libpsl \
--disable-alt-svc \
--disable-ares \