RDoc-3836 CDC documentation #2539

Open
Lwiel wants to merge 25 commits into ravendb:main from Lwiel:RDoc-3836

Lwiel commented Jun 24, 2026

Issue link

https://issues.hibernatingrhinos.com/issue/RDoc-3836/CDC-Feature-Documentation

Additional description

Documentation for the CDC feature

Type of change

  • Content - docs
  • Content - cloud
  • Content - guides
  • Content - start pages/other
  • New docs feature (consider updating /templates or readme)
  • Bug fix
  • Optimization
  • Other

Changes in docs URLs

  • No changes in docs URLs
  • Articles are restructured, URLs will change, mapping is required (update /scripts/redirects.json file, set Documents Moved PR label)

Changes in UX/UI

  • No changes in UX/UI
  • Changes in UX/UI (include screenshots and description)

ayende and others added 25 commits July 2, 2026 14:04
Adds full CDC Sink ongoing task documentation in Docusaurus MDX format:

- 16 core pages: overview, how-it-works, schema-design, embedded-tables,
  linked-tables, column-mapping, patching, delete-strategies,
  property-retention, attachment-handling, configuration-reference,
  api-reference, monitoring, failover-and-consistency, troubleshooting,
  server-configuration
- 9 PostgreSQL pages: prerequisites-checklist, wal-configuration,
  permissions-and-roles, initial-setup, replica-identity,
  replica-identity-manual-setup, cleanup-and-maintenance,
  monitoring-postgres, studio-ui
- 4 PostgreSQL examples: simple-migration, denormalization,
  event-sourcing, complex-nesting
- 1 SQL Server stub: overview
- 4 _category_.json navigation files
…al features

- Replace ColumnsMapping (Dictionary) + AttachmentNameMapping (Dictionary) with
  unified Columns list of CdcColumnMapping { Column, Name, Type } across all files
- Add CdcColumnType enum documentation (Default, Json, Attachment)
- Add REST API endpoints table to configuration-reference
- Add CdcSink.PollIntervalInSec to server-configuration
- Add error handling details to monitoring (threshold, fallback, exponential backoff)
- Add ALTER PUBLICATION auto-fix note to postgres/initial-setup
- Fix how-it-works: sequential scan description, Child Before Parent section
- Fix Startup and Verification: split into per-database subsections
- Update all prose references from ColumnsMapping to Columns list
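
To make the unified `Columns` list concrete, here is a small Python model of the shape named in this commit (`Column`, `Name`, `Type` with `Default`, `Json`, `Attachment`). It is a hypothetical illustration, not the RavenDB client API: the `apply_columns` helper, the choice to copy only listed columns, and the exact handling of each type are assumptions.

```python
import json
from dataclasses import dataclass
from enum import Enum


class CdcColumnType(Enum):
    # Mirrors the enum values listed in the commit message (illustrative model only)
    DEFAULT = "Default"
    JSON = "Json"
    ATTACHMENT = "Attachment"


@dataclass
class CdcColumnMapping:
    column: str                                   # source column name
    name: str                                     # document property or attachment name
    type: CdcColumnType = CdcColumnType.DEFAULT


def apply_columns(row: dict, columns: list[CdcColumnMapping]) -> tuple[dict, dict]:
    """Hypothetical helper: split one source row into document fields and attachments.

    Assumption: only columns listed in `columns` are copied; others are ignored.
    """
    document, attachments = {}, {}
    for mapping in columns:
        value = row.get(mapping.column)
        if mapping.type is CdcColumnType.JSON:
            # Assumption: JSON columns arrive as text and are embedded as nested objects
            document[mapping.name] = json.loads(value) if value is not None else None
        elif mapping.type is CdcColumnType.ATTACHMENT:
            # Binary or text columns become named attachments; text is stored as UTF-8 bytes
            if value is not None:
                attachments[mapping.name] = value.encode() if isinstance(value, str) else value
        else:
            document[mapping.name] = value
    return document, attachments
```

With this model, one entry per column replaces the two former dictionaries, and the attachment name lives in the same `Name` field used for regular properties.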
…chment handling

- Replace all new() shorthand with new CdcColumnMapping() across all files
- attachment-handling: clarify that text columns (text, nvarchar, etc.) as well
  as binary columns can use Type = CdcColumnType.Attachment
…$old documentation

- Add postgres/type-mapping.mdx: full reference table of PostgreSQL column types
  and their JavaScript/CLR equivalents (scalars, arrays, json/jsonb, bytea, pgvector)

- Add "$row and $old: Names and Types" section to patching.mdx

- Fix cleanup-and-maintenance.mdx: replace obsolete "Configuration Changes That Rename
  Slots" section (described hash-based naming, no longer accurate) with correct
  "Slot and Publication Names Are Immutable" section reflecting enforced immutability
- Name → CollectionName on CdcSinkTableConfig across all files
- Remove Type from CdcSinkLinkedTableConfig (linked tables have no relation type)
- Remove Disabled from CdcSinkEmbeddedTableConfig, add LinkedTables
- Add FactoryName table (Npgsql, SqlClient, MySql) to configuration-reference
- Add CdcColumnMapping and CdcColumnType reference sections
- Add put(id, document) and del(id) to patch capabilities
- Add JSON Columns section to column-mapping
- Remove non-existent Array References section from linked-tables
- Remove non-existent Disabling an Embedded Table section from embedded-tables
- Update server-configuration descriptions (MaxBatchSize, MaxFallbackTimeInSec,
  PollIntervalInSec applies to SQL Server only)
- Fix licensing link in overview
- Add SQL Server and MySQL/MariaDB as supported source databases
- postgres/overview.mdx: connection string, logical replication explanation,
  prerequisites summary, section index
- sql-server/overview.mdx: expand from stub to full page with connection string,
  CDC prerequisites, polling behavior, SourceTableSchema default
- mysql/overview.mdx + _category_.json: connection string, binlog prerequisites,
  streaming behavior, required privileges
…roubleshooting sections

- Source Schema Changes: how each database engine handles DDL changes
  on source tables while CDC Sink is running (adding/removing/renaming
  columns, SQL Server capture instance limitations)
- Partial Export/Import and State Loss: @cdc-states collection, recovery
  guidance, SkipInitialLoad workaround, LSN editing risks
…update behavior

Broken links fixed:
- api-reference: send-multiple-operations → what-are-operations
- attachment-handling: what-are-attachments → attachments/overview
- column-mapping, patching: postgres/type-mapping.mdx (nonexistent)
  → cross-reference to patching.mdx#row-column-types

PR comment fixes:
- MySQL overview: rename MyMySqlConnection → MySqlConnection
- Postgres overview: slot/publication names are GUID-based on first
  use, not deterministic hash-based
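
One way to picture "GUID-based on first use" together with the immutability rule: the name is generated once, stored with the task state, and reused on every later start. In this sketch the `ravendb_cdc_` prefix, the `state` dictionary, and `get_or_create_slot_name` are hypothetical; only the PostgreSQL naming limits (lowercase letters, digits, and underscores, at most 63 characters) are real constraints.

```python
import re
import uuid

# PostgreSQL replication slot names may contain only lowercase letters, digits
# and underscores, up to 63 characters (NAMEDATALEN - 1).
SLOT_NAME_PATTERN = re.compile(r"^[a-z0-9_]{1,63}$")


def get_or_create_slot_name(state: dict) -> str:
    """Return the persisted slot name, generating it only on first use (hypothetical)."""
    name = state.get("slot_name")
    if name is None:
        # uuid4().hex is 32 lowercase hex characters, so the name stays a valid identifier.
        # The "ravendb_cdc_" prefix is an illustrative assumption.
        name = f"ravendb_cdc_{uuid.uuid4().hex}"
        state["slot_name"] = name
    # Later configuration changes never touch state["slot_name"], so the name is immutable.
    return name
```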

New content:
- how-it-works: "Updating the Task Configuration" section explaining
  that config changes only apply to new CDC events going forward.
  Existing documents are not retroactively re-processed. To apply
  changes to all documents, delete and recreate the task.
Move schema change documentation from troubleshooting into its own
page with per-engine detail:
- PostgreSQL: auto-detects via RelationMessage, most resilient
- MySQL: detects via TableMapEvent column types, auto-recovers
- SQL Server: requires explicit capture instance procedure (create
  new instance, drain old, then drop)
- Quick reference table, SQL examples, recovery mechanism
- Troubleshooting retains a short summary with link to the new page
MySQL CDC detects changes by column position. Compound ALTER TABLE
statements (add + drop, ADD COLUMN ... AFTER ...) cause positional
shifts that are hard to resolve. Apply one change at a time and let
CDC Sink catch up between each.
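
A toy illustration of why positional detection struggles with compound statements: the sketch decodes a row's values by ordinal position against the column list seen before the change. The table, the columns, and `decode_by_position` are made up for the example, and real binlog decoding is considerably more involved.

```python
def decode_by_position(cached_columns: list[str], values: list) -> dict:
    """Pair values with column names purely by ordinal position (toy model)."""
    return dict(zip(cached_columns, values))


# Column order CDC Sink last saw for a hypothetical `customers` table
cached = ["id", "name", "email", "phone"]

# One compound statement drops `email` and adds `city` right after `name`:
#   ALTER TABLE customers DROP COLUMN email, ADD COLUMN city VARCHAR(50) AFTER name;
# The physical order becomes (id, name, city, phone): same column count.
new_row_values = [7, "Ada", "London", "555-0100"]

decoded = decode_by_position(cached, new_row_values)
# The city value lands under `email`. If `email` was also a VARCHAR, neither the
# column count nor the column types changed, so the shift is hard to detect.
```

Applying the drop and the add as two separate statements, with CDC Sink catching up in between, gives it a chance to observe each shift on its own.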
Fix typo, missing heading, misleading wording, and inconsistent
descriptions flagged during code review.
…etting

- Rename CdcSink.PollIntervalInSec to CdcSink.SqlServer.PollIntervalInSec
  to match actual code (3 files)
- Add missing CdcSink.Postgres.ReplicationTimeoutInSec (default 10s)
- Fix type conversion: decimal→double should be numeric/decimal→decimal
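
Why `numeric`/`decimal` should map to `decimal` rather than `double`: binary floating point cannot represent most decimal fractions exactly. In this sketch, Python's `float` (IEEE 754 binary64, like the CLR `double`) and `decimal.Decimal` stand in for the two CLR types.

```python
from decimal import Decimal

# A binary double cannot hold 0.1 or 0.2 exactly, so the sum drifts.
as_double = 0.1 + 0.2
# A decimal type keeps base-10 digits exactly, as a NUMERIC(10, 2) column does.
as_decimal = Decimal("0.10") + Decimal("0.20")

print(as_double)   # 0.30000000000000004
print(as_decimal)  # 0.30
```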
Comment on lines +22 to +23
* CDC Sink is the reverse of ETL: instead of pushing data _from_ RavenDB _to_ SQL,
CDC Sink _pulls_ data _from_ SQL _into_ RavenDB.

Suggest "SQL" → "relational database". Note, though, that this wording could imply that ETL tasks support only relational databases as targets.

The relational database is the source of truth; RavenDB receives a continuously-updated
document model derived from it.

* CDC Sink maps normalized relational tables into rich, nested RavenDB documents -

I don't think "rich, nested" means much in this context

Comment on lines +50 to +53
* **Migrate from SQL to RavenDB**
Transform normalized SQL tables (orders, order_lines, customers) into rich RavenDB
documents where an Order contains embedded LineItems and a reference to the Customer -
automatically and continuously, without changing your SQL application.

Formatting isn't displayed correctly - no newline

without requiring changes to the source system.

* **Migrate from SQL to RavenDB**
Transform normalized SQL tables (orders, order_lines, customers) into rich RavenDB

No point mentioning Northwind tables here explicitly
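
The denormalization in the quoted bullet (an order with embedded line items and a reference to its customer) can be sketched as grouping child rows under their parent. The table and column names, the `Orders/...` and `Customers/...` IDs, and `build_order_documents` are illustrative assumptions; CDC Sink itself is configured declaratively rather than with code like this.

```python
from collections import defaultdict


def build_order_documents(orders: list[dict], order_lines: list[dict]) -> dict[str, dict]:
    """Group order_lines rows under their parent order (illustrative only)."""
    lines_by_order = defaultdict(list)
    for line in order_lines:
        lines_by_order[line["order_id"]].append(
            {"Product": line["product"], "Quantity": line["quantity"]}
        )

    documents = {}
    for order in orders:
        documents[f"Orders/{order['id']}"] = {
            # A reference to the customer document, not an embedded copy
            "Customer": f"Customers/{order['customer_id']}",
            # Embedded child rows; an order without lines gets an empty list
            "LineItems": lines_by_order.get(order["id"], []),
        }
    return documents
```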

Comment on lines +106 to +122
1. **Create** - Define the task in Studio or via the Client API
Specify the connection string, table mappings, and transformation options

2. **Verify** - CDC Sink verifies the source database is properly configured
Checks permissions, replication prerequisites, and table configuration

3. **Initial Load** - Full table scan populates RavenDB with current data
Progress is tracked per-table and persists across restarts

4. **Stream** - Real-time change streaming begins
All INSERTs, UPDATEs, and DELETEs are applied to RavenDB documents as they occur

5. **Monitor** - View statistics, errors, and progress in Studio

6. **Retire** - Delete the task in RavenDB when no longer needed
PostgreSQL artifacts (replication slot, publication) must be cleaned up by
the database administrator separately

Newlines
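
Step 3 says initial-load progress is tracked per table and survives restarts. A common way to get that property is keyset pagination: checkpoint the last primary key written and resume after it. This is an assumed approach for illustration, not necessarily how CDC Sink stores its state; `progress`, `fetch_batch`, and `initial_load` are hypothetical stand-ins for the persisted state and the source query.

```python
def fetch_batch(rows: list[dict], after_key, batch_size: int) -> list[dict]:
    """Stand-in for: SELECT ... WHERE id > after_key ORDER BY id LIMIT batch_size."""
    ordered = sorted(rows, key=lambda r: r["id"])
    return [r for r in ordered if after_key is None or r["id"] > after_key][:batch_size]


def initial_load(table: str, rows: list[dict], progress: dict, sink: list,
                 batch_size: int = 2, max_batches=None) -> None:
    """Copy rows in key order, checkpointing the last key after every batch."""
    state = progress.setdefault(table, {"last_key": None, "done": False})
    batches = 0
    while not state["done"]:
        if max_batches is not None and batches >= max_batches:
            return  # simulate the task stopping mid-load
        batch = fetch_batch(rows, state["last_key"], batch_size)
        if not batch:
            state["done"] = True
            break
        sink.extend(batch)
        state["last_key"] = batch[-1]["id"]  # checkpoint; persisted in a real system
        batches += 1
```

After a restart, the load continues from `last_key` instead of re-scanning the whole table.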

Comment on lines +121 to +122
PostgreSQL artifacts (replication slot, publication) must be cleaned up by
the database administrator separately

Should be an admonition I think

@@ -0,0 +1 @@
{"position": 10, "label": "Examples"}

Formatting

@@ -0,0 +1 @@
{"position": 16, "label": "PostgreSQL"}

Formatting

@@ -0,0 +1 @@
{"position": 2, "label": "CDC Sink"}

Formatting

After verification, CDC Sink creates the necessary change-tracking infrastructure
in the source database, then begins the initial load.

### PostgreSQL

We don't mention other supported providers


When a CDC Sink task starts, it verifies that the source database is properly configured
before doing anything else. If any check fails, CDC Sink reports the exact issue and the
SQL an administrator needs to run to fix it. The task does not start until all checks pass.

I really, really don't like the generic "SQL" used all over these articles; we should be precise about what we're talking about.
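
The verify-then-report behavior in the quoted paragraph can be sketched as a list of checks, each paired with the statement an administrator would run to fix it. Only the PostgreSQL settings and `ALTER SYSTEM` statements are real; the `Check` structure, the `settings` dictionary, and `verify` are hypothetical, and the two checks are an illustrative subset, not CDC Sink's actual list.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    description: str
    passes: Callable[[dict], bool]
    fix_sql: str  # what an administrator would run to fix the problem


# Real PostgreSQL settings; the selection of checks is a hypothetical subset.
CHECKS = [
    Check("wal_level must be 'logical'",
          lambda s: s.get("wal_level") == "logical",
          "ALTER SYSTEM SET wal_level = logical;  -- requires a server restart"),
    Check("max_replication_slots must allow at least one slot",
          lambda s: int(s.get("max_replication_slots", 0)) > 0,
          "ALTER SYSTEM SET max_replication_slots = 10;  -- requires a server restart"),
]


def verify(settings: dict) -> list[str]:
    """Return one message per failed check; an empty list means the task may start."""
    return [f"{c.description}. Fix: {c.fix_sql}" for c in CHECKS if not c.passes(settings)]
```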

re-scanning the entire table.

**Batch pipelining:** While one batch is being written to RavenDB, the next batch is
being read from the source database, keeping both systems busy.

Maybe not the best wording
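
The batch-pipelining idea is a bounded producer/consumer: a reader thread fetches the next batch while the writer stores the current one. This generic sketch uses Python's `threading` and `queue`; `pipeline`, `read_batches`, and `write_batch` are hypothetical stand-ins for the source read and the RavenDB write.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def pipeline(read_batches, write_batch, depth: int = 1) -> None:
    """Overlap reading and writing: up to `depth` read batches wait while one is written."""
    buffer: queue.Queue = queue.Queue(maxsize=depth)

    def reader():
        try:
            for batch in read_batches():
                buffer.put(batch)  # blocks when the writer falls behind (back-pressure)
        finally:
            buffer.put(_DONE)  # always unblock the writer, even if reading fails

    t = threading.Thread(target=reader)
    t.start()
    while (batch := buffer.get()) is not _DONE:
        write_batch(batch)
    t.join()
```

The bounded queue keeps the reader at most `depth` batches ahead, so neither side sits idle while memory use stays capped.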


<Tabs>
<TabItem value="sql" label="sql">
<CodeBlock language="sql">

Should be capitalized

Comment on lines +203 to +204
To apply configuration changes to **all** documents (not just new events), delete
the CDC Sink task and recreate it. The new task will perform a fresh initial load,

Don't we have a restart option?

</TabItem>
</Tabs>

**Document ID generation:** `\{CollectionName\}/\{pk1\}/\{pk2\}/...`

Why the escaped characters?
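
The document ID pattern in the quoted line can be sketched as joining the collection name with the primary-key values in key-column order. `make_document_id` is a hypothetical helper; how CDC Sink formats non-string key values (dates, GUIDs) or handles a `/` inside a key value is not covered here.

```python
def make_document_id(collection_name: str, *primary_key_values) -> str:
    """Build '{CollectionName}/{pk1}/{pk2}/...' from key values in key-column order."""
    if not primary_key_values:
        raise ValueError("at least one primary key value is required")
    return "/".join([collection_name, *map(str, primary_key_values)])


print(make_document_id("OrderLines", 10248, 11))  # OrderLines/10248/11
```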


## Connection String

Use a `SqlConnectionString` with `FactoryName` set to `"System.Data.SqlClient"` or

One of them is obsolete
