RDoc-3836 CDC documentation by Lwiel · Pull Request #2539 · ravendb/docs

Lwiel · 2026-06-24T13:26:07Z

Issue link

https://issues.hibernatingrhinos.com/issue/RDoc-3836/CDC-Feature-Documentation

Additional description

Documentation for the CDC feature

Type of change

Changes in docs URLs

No changes in docs URLs
Articles are restructured, URLs will change, mapping is required (update /scripts/redirects.json file, set Documents Moved PR label)

Changes in UX/UI

No changes in UX/UI
Changes in UX/UI (include screenshots and description)

Adds full CDC Sink ongoing task documentation in Docusaurus MDX format:
- 16 core pages: overview, how-it-works, schema-design, embedded-tables, linked-tables, column-mapping, patching, delete-strategies, property-retention, attachment-handling, configuration-reference, api-reference, monitoring, failover-and-consistency, troubleshooting, server-configuration
- 9 PostgreSQL pages: prerequisites-checklist, wal-configuration, permissions-and-roles, initial-setup, replica-identity, replica-identity-manual-setup, cleanup-and-maintenance, monitoring-postgres, studio-ui
- 4 PostgreSQL examples: simple-migration, denormalization, event-sourcing, complex-nesting
- 1 SQL Server stub: overview
- 4 _category_.json navigation files

…al features
- Replace ColumnsMapping (Dictionary) + AttachmentNameMapping (Dictionary) with unified Columns list of CdcColumnMapping { Column, Name, Type } across all files
- Add CdcColumnType enum documentation (Default, Json, Attachment)
- Add REST API endpoints table to configuration-reference
- Add CdcSink.PollIntervalInSec to server-configuration
- Add error handling details to monitoring (threshold, fallback, exponential backoff)
- Add ALTER PUBLICATION auto-fix note to postgres/initial-setup
- Fix how-it-works: sequential scan description, Child Before Parent section
- Fix Startup and Verification: split into per-database subsections
- Update all prose references from ColumnsMapping to Columns list
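The monitoring changes above mention exponential backoff capped by a fallback limit. The sketch below is a rough illustration only. It doubles the wait after each consecutive failure and caps it at a maximum fallback time.

The base delay, the cap, and the doubling factor are assumptions made for illustration. The real schedule is governed by CdcSink server configuration such as MaxFallbackTimeInSec and is not reproduced here.

```python
from typing import List


def backoff_delays(base_sec: float, max_fallback_sec: float, failures: int) -> List[float]:
    """Delay (in seconds) to wait after each of `failures` consecutive errors.

    Assumed schedule (hypothetical, not CDC Sink's documented formula):
    base_sec * 2**n, never exceeding max_fallback_sec.
    """
    delays = []
    for n in range(failures):
        delays.append(min(base_sec * 2 ** n, max_fallback_sec))
    return delays


print(backoff_delays(1, 15, 6))  # [1, 2, 4, 8, 15, 15]
```

The cap keeps a long outage from pushing retries arbitrarily far apart. Once the limit is reached, the task keeps retrying at a steady interval.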

…chment handling
- Replace all new() shorthand with new CdcColumnMapping() across all files
- attachment-handling: clarify that text columns (text, nvarchar, etc.) as well as binary columns can use Type = CdcColumnType.Attachment

…$old documentation
- Add postgres/type-mapping.mdx: full reference table of PostgreSQL column types and their JavaScript/CLR equivalents (scalars, arrays, json/jsonb, bytea, pgvector)
- Add patching.mdx "$row and $old: Names and Types"
- Fix cleanup-and-maintenance.mdx: replace obsolete "Configuration Changes That Rename Slots" section (described hash-based naming, no longer accurate) with correct "Slot and Publication Names Are Immutable" section reflecting enforced immutability

- Name → CollectionName on CdcSinkTableConfig across all files
- Remove Type from CdcSinkLinkedTableConfig (linked tables have no relation type)
- Remove Disabled from CdcSinkEmbeddedTableConfig, add LinkedTables
- Add FactoryName table (Npgsql, SqlClient, MySql) to configuration-reference
- Add CdcColumnMapping and CdcColumnType reference sections
- Add put(id, document) and del(id) to patch capabilities
- Add JSON Columns section to column-mapping
- Remove non-existent Array References section from linked-tables
- Remove non-existent Disabling an Embedded Table section from embedded-tables
- Update server-configuration descriptions (MaxBatchSize, MaxFallbackTimeInSec, PollIntervalInSec applies to SQL Server only)
- Fix licensing link in overview
- Add SQL Server and MySQL/MariaDB as supported source databases

- postgres/overview.mdx: connection string, logical replication explanation, prerequisites summary, section index
- sql-server/overview.mdx: expand from stub to full page with connection string, CDC prerequisites, polling behavior, SourceTableSchema default
- mysql/overview.mdx + _category_.json: connection string, binlog prerequisites, streaming behavior, required privileges

…roubleshooting sections
- Source Schema Changes: how each database engine handles DDL changes on source tables while CDC Sink is running (adding/removing/renaming columns, SQL Server capture instance limitations)
- Partial Export/Import and State Loss: @cdc-states collection, recovery guidance, SkipInitialLoad workaround, LSN editing risks

…update behavior

Broken links fixed:
- api-reference: send-multiple-operations → what-are-operations
- attachment-handling: what-are-attachments → attachments/overview
- column-mapping, patching: postgres/type-mapping.mdx (nonexistent) → cross-reference to patching.mdx#row-column-types

PR comment fixes:
- MySQL overview: rename MyMySqlConnection → MySqlConnection
- Postgres overview: slot/publication names are GUID-based on first use, not deterministic hash-based

New content:
- how-it-works: "Updating the Task Configuration" section explaining that config changes only apply to new CDC events going forward. Existing documents are not retroactively re-processed. To apply changes to all documents, delete and recreate the task.

Move schema change documentation from troubleshooting into its own page with per-engine detail:
- PostgreSQL: auto-detects via RelationMessage, most resilient
- MySQL: detects via TableMapEvent column types, auto-recovers
- SQL Server: requires explicit capture instance procedure (create new instance, drain old, then drop)
- Quick reference table, SQL examples, recovery mechanism
- Troubleshooting retains a short summary with link to the new page

MySQL CDC detects changes by column position. Compound ALTER TABLE statements (add + drop, ADD COLUMN ... AFTER ...) cause positional shifts that are hard to resolve. Apply one change at a time and let CDC Sink catch up between each.
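A small Python illustration makes the positional problem concrete. It is not CDC Sink's detection code. It assumes only what is stated above:
- row values are matched to columns by position;
- a change is noticed when the column count or column types (from TableMapEvent) differ.

The `customers` table and its columns are hypothetical.

```python
from typing import Dict, List, Sequence


def map_by_position(columns: Sequence[str], values: Sequence[object]) -> Dict[str, object]:
    """Pair the i-th value of a binlog row with the i-th column last known for the table."""
    return dict(zip(columns, values))


def looks_changed(known_types: List[str], event_types: List[str]) -> bool:
    """A count/type comparison can only notice a schema change if the type list differs."""
    return known_types != event_types


# Hypothetical table `customers` as last seen: id INT, name VARCHAR, email VARCHAR
known_columns = ["id", "name", "email"]
known_types = ["INT", "VARCHAR", "VARCHAR"]

# One compound statement:
#   ALTER TABLE customers DROP COLUMN name, ADD COLUMN phone VARCHAR(20) AFTER id;
# The new physical layout is id, phone, email: same column count, same types.
event_types = ["INT", "VARCHAR", "VARCHAR"]
row = (1, "555-0100", "ada@example.com")

print(looks_changed(known_types, event_types))  # False: nothing to detect
print(map_by_position(known_columns, row))      # the phone number lands under "name"
```

In this simplified model, running the DROP and the ADD as separate statements changes the column count at each step. That gives a comparison like `looks_changed` something to detect, which is why applying one change at a time is recommended.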

Fix typo, missing heading, misleading wording, and inconsistent descriptions flagged during code review.

…etting
- Rename CdcSink.PollIntervalInSec to CdcSink.SqlServer.PollIntervalInSec to match actual code (3 files)
- Add missing CdcSink.Postgres.ReplicationTimeoutInSec (default 10s)
- Fix type conversion: decimal→double should be numeric/decimal→decimal

Lwiel · 2026-07-02T12:22:51Z

+* CDC Sink is the reverse of ETL: instead of pushing data _from_ RavenDB _to_ SQL,
+  CDC Sink _pulls_ data _from_ SQL _into_ RavenDB.


SQL -> relational database. Though it could suggest that ETLs support relational databases only

Lwiel · 2026-07-02T12:24:45Z

+  The relational database is the source of truth; RavenDB receives a continuously-updated
+  document model derived from it.
+
+* CDC Sink maps normalized relational tables into rich, nested RavenDB documents -


I don't think "rich, nested" means much in this context

Lwiel · 2026-07-02T12:27:10Z

+* **Migrate from SQL to RavenDB**
+  Transform normalized SQL tables (orders, order_lines, customers) into rich RavenDB
+  documents where an Order contains embedded LineItems and a reference to the Customer -
+  automatically and continuously, without changing your SQL application.


Formatting isn't displayed correctly - no newline

Lwiel · 2026-07-02T12:53:39Z

+without requiring changes to the source system.
+
+* **Migrate from SQL to RavenDB**
+  Transform normalized SQL tables (orders, order_lines, customers) into rich RavenDB


No point mentioning Northwind tables here explicitly

Lwiel · 2026-07-02T12:57:20Z

+1. **Create** - Define the task in Studio or via the Client API
+   Specify the connection string, table mappings, and transformation options
+
+2. **Verify** - CDC Sink verifies the source database is properly configured
+   Checks permissions, replication prerequisites, and table configuration
+
+3. **Initial Load** - Full table scan populates RavenDB with current data
+   Progress is tracked per-table and persists across restarts
+
+4. **Stream** - Real-time change streaming begins
+   All INSERTs, UPDATEs, and DELETEs are applied to RavenDB documents as they occur
+
+5. **Monitor** - View statistics, errors, and progress in Studio
+
+6. **Retire** - Delete the task in RavenDB when no longer needed
+   PostgreSQL artifacts (replication slot, publication) must be cleaned up by
+   the database administrator separately
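Step 3 says initial-load progress is tracked per table and survives restarts. One common way to get that property is keyset pagination: read rows in primary-key order and record the last key loaded after each batch. The sketch shows this generic technique against an in-memory SQLite table. CDC Sink's own scan strategy is described in how-it-works and may differ.

The following are assumptions of the sketch, not how CDC Sink stores its state:
- the `orders` table;
- the `progress` dict standing in for persisted state;
- integer primary keys greater than 0.

```python
import sqlite3
from typing import Dict, List, Tuple


def load_orders(conn: sqlite3.Connection, progress: Dict[str, int], batch_size: int,
                max_batches: int = -1) -> List[Tuple[int, float]]:
    """Keyset-paginated load of the hypothetical `orders` table.

    After each batch the last primary key is stored in `progress`, so calling this
    again after an interruption resumes instead of rescanning the table.
    A negative max_batches means "run until the table is exhausted".
    """
    loaded: List[Tuple[int, float]] = []
    batches = 0
    while max_batches < 0 or batches < max_batches:
        last_pk = progress.get("orders", 0)  # assumes positive integer primary keys
        rows = conn.execute(
            "SELECT id, total FROM orders WHERE id > ? ORDER BY id LIMIT ?",
            (last_pk, batch_size),
        ).fetchall()
        if not rows:
            break
        loaded.extend(rows)               # stand-in for writing documents to RavenDB
        progress["orders"] = rows[-1][0]  # record progress only after the batch is written
        batches += 1
    return loaded


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(i, i * 1.5) for i in range(1, 8)])

progress: Dict[str, int] = {}
first = load_orders(conn, progress, batch_size=3, max_batches=1)  # simulate a stop after one batch
rest = load_orders(conn, progress, batch_size=3)                  # the "restart" resumes from id > 3
print([r[0] for r in first], [r[0] for r in rest])  # [1, 2, 3] [4, 5, 6, 7]
```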


Lwiel · 2026-07-02T12:58:05Z

+   PostgreSQL artifacts (replication slot, publication) must be cleaned up by
+   the database administrator separately


Should be an admonition I think

Lwiel · 2026-07-02T12:58:40Z

@@ -0,0 +1 @@
+{"position": 10, "label": "Examples"}


Lwiel · 2026-07-02T12:58:58Z

@@ -0,0 +1 @@
+{"position": 16, "label": "PostgreSQL"}


Lwiel · 2026-07-02T12:59:13Z

@@ -0,0 +1 @@
+{"position": 2, "label": "CDC Sink"}


Lwiel · 2026-07-02T13:01:26Z

+After verification, CDC Sink creates the necessary change-tracking infrastructure
+in the source database, then begins the initial load.
+
+### PostgreSQL


We don't mention other supported providers

Lwiel · 2026-07-02T13:03:47Z

+
+When a CDC Sink task starts, it verifies that the source database is properly configured
+before doing anything else. If any check fails, CDC Sink reports the exact issue and the
+SQL an administrator needs to run to fix it. The task does not start until all checks pass.


I really, really don't like that there's a generic "SQL" thrown all around these articles - we should be precise about what we are talking about
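The following is a PostgreSQL-only illustration of what a pre-start check can look like, naming the specific database as the comment above asks. It inspects three server settings that logical replication depends on:
- `wal_level` (from `SHOW wal_level;`)
- `max_replication_slots` (from `SHOW max_replication_slots;`)
- `max_wal_senders` (from `SHOW max_wal_senders;`)

For each setting that fails, it returns the problem together with a statement that would fix it. The function name, messages, and the example value 10 are hypothetical. CDC Sink's actual checks also cover permissions and table configuration, and are covered on the PostgreSQL prerequisites pages.

```python
from typing import Dict, List, Tuple


def check_postgres_replication_settings(settings: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (problem, suggested fix) pairs; an empty list means these checks pass.

    `settings` holds raw values as returned by SHOW, e.g. {"wal_level": "logical"}.
    """
    problems: List[Tuple[str, str]] = []

    wal_level = settings.get("wal_level", "")
    if wal_level != "logical":
        problems.append((
            f"wal_level is '{wal_level}', but logical replication requires 'logical'",
            "ALTER SYSTEM SET wal_level = logical;  -- takes effect after a server restart",
        ))

    for name in ("max_replication_slots", "max_wal_senders"):
        value = int(settings.get(name, "0"))
        if value < 1:
            problems.append((
                f"{name} is {value}, but it must be at least 1",
                f"ALTER SYSTEM SET {name} = 10;  -- example value; takes effect after a server restart",
            ))
    return problems


for problem, fix in check_postgres_replication_settings(
        {"wal_level": "replica", "max_replication_slots": "10", "max_wal_senders": "10"}):
    print(problem)
    print("  fix:", fix)
```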

Lwiel · 2026-07-02T13:05:06Z

+re-scanning the entire table.
+
+**Batch pipelining:** While one batch is being written to RavenDB, the next batch is
+being read from the source database, keeping both systems busy.


Maybe not the best wording
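The batch-pipelining sentence under review describes overlapping two operations: reading batch N+1 from the source and writing batch N to RavenDB. Below is a small Python sketch of that overlap using one background reader thread. `read_batch` and `write_batch` are hypothetical callbacks standing in for the source query and the RavenDB write, not CDC Sink APIs.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def pipelined_copy(read_batch: Callable[[int], Optional[List[object]]],
                   write_batch: Callable[[List[object]], None]) -> int:
    """Copy batches while reading the next batch in the background.

    read_batch(i) returns the i-th batch, or None once the source is exhausted.
    Returns the number of items written.
    """
    written = 0
    with ThreadPoolExecutor(max_workers=1) as reader:
        index = 0
        pending = reader.submit(read_batch, index)
        while True:
            batch = pending.result()  # wait for the batch read in the background
            if batch is None:
                break
            index += 1
            pending = reader.submit(read_batch, index)  # start reading the next batch...
            write_batch(batch)                          # ...while this one is being written
            written += len(batch)
    return written


rows = list(range(10))
batches = [rows[i:i + 3] for i in range(0, len(rows), 3)]
target: List[object] = []
count = pipelined_copy(lambda i: batches[i] if i < len(batches) else None, target.extend)
print(count, target == rows)  # 10 True
```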

Lwiel · 2026-07-02T13:07:20Z

+
+<Tabs>
+<TabItem value="sql" label="sql">
+<CodeBlock language="sql">


Should be capitalized

Lwiel · 2026-07-02T13:11:15Z

+To apply configuration changes to **all** documents (not just new events), delete
+the CDC Sink task and recreate it. The new task will perform a fresh initial load,


Don't we have a restart option?

Lwiel · 2026-07-02T13:17:35Z

+</TabItem>
+</Tabs>
+
+**Document ID generation:** `\{CollectionName\}/\{pk1\}/\{pk2\}/...`


Why the escaped characters?
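The ID pattern quoted above (`{CollectionName}/{pk1}/{pk2}/...`) can be sketched as follows. Per the commit that fixed document ID casing, the collection name is used verbatim. Two points are assumptions of the sketch, not documented behavior:
- primary-key values are rendered with `str()`;
- no escaping is applied to values that themselves contain `/`.

The `OrderLines` collection is a hypothetical example.

```python
from typing import Sequence


def document_id(collection_name: str, pk_values: Sequence[object]) -> str:
    """Build an ID of the form {CollectionName}/{pk1}/{pk2}/...

    The collection name is kept exactly as given (no case changes).
    Rendering PK values with str() is an assumption of this sketch.
    """
    if not pk_values:
        raise ValueError("at least one primary key value is required")
    return "/".join([collection_name, *(str(v) for v in pk_values)])


print(document_id("OrderLines", [10248, 1]))  # OrderLines/10248/1
```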

Lwiel · 2026-07-02T13:33:54Z

+
+## Connection String
+
+Use a `SqlConnectionString` with `FactoryName` set to `"System.Data.SqlClient"` or


One of them is obsolete

ppekrol mentioned this pull request Jun 24, 2026

RavenDB-26046 - Add CDC Sink documentation #2387 (closed)

ayende and others added 25 commits July 2, 2026 14:04

attachment-handling: add SQL schema examples for root and embedded table sections (bfa26e9)

attachment-handling: text columns use text/plain content type, binary uses application/octet-stream (81faab4)

CdcColumnType.Json: any JSON value, not just objects/arrays (7619fff)

patching, troubleshooting: replace get() with load() for loading related documents (4db186e)

Reformat long lines in code blocks; fix CdcColumnMapping Type= continuation style (5467b13)

RavenDB-26046 Address PR ravendb#2387 review comments (955070e)
Fix typo, missing heading, misleading wording, and inconsistent descriptions flagged during code review.

RavenDB-26046 Fix CDC Sink API reference: replace non-existent /verify endpoint with /schema (20f9033)

RavenDB-26046 Fix deep-nesting JoinColumns to reference direct parent PK only (5a52100)

RavenDB-26046 Fix document ID casing in examples (CollectionName is used verbatim) (e327006)

RavenDB-26046 Correct REPLICA IDENTITY permission behavior (task fails to start) (d81faf0)

RavenDB-26046 Remove unsupported 'set MaxFallbackTime to 0' claim; describe real backoff (4774c09)

RavenDB-26046 Document OnDelete.IgnoreDeletes as REPLICA IDENTITY escape hatch (b2fcebc)

RavenDB-26046 Fix slot/publication naming timing (at task creation, not first use) (88a219a)

RavenDB-26046 Normalize em dashes to spaced hyphens (match docs house style) (04442cf)

Lwiel commented Jul 2, 2026


Lwiel force-pushed the RDoc-3836 branch from 5cec537 to 04442cf on July 2, 2026 14:05

