RDoc-3836 CDC documentation #2539
Adds full CDC Sink ongoing task documentation in Docusaurus MDX format:
- 16 core pages: overview, how-it-works, schema-design, embedded-tables, linked-tables, column-mapping, patching, delete-strategies, property-retention, attachment-handling, configuration-reference, api-reference, monitoring, failover-and-consistency, troubleshooting, server-configuration
- 9 PostgreSQL pages: prerequisites-checklist, wal-configuration, permissions-and-roles, initial-setup, replica-identity, replica-identity-manual-setup, cleanup-and-maintenance, monitoring-postgres, studio-ui
- 4 PostgreSQL examples: simple-migration, denormalization, event-sourcing, complex-nesting
- 1 SQL Server stub: overview
- 4 _category_.json navigation files
…al features
- Replace ColumnsMapping (Dictionary) + AttachmentNameMapping (Dictionary) with
unified Columns list of CdcColumnMapping { Column, Name, Type } across all files
- Add CdcColumnType enum documentation (Default, Json, Attachment)
- Add REST API endpoints table to configuration-reference
- Add CdcSink.PollIntervalInSec to server-configuration
- Add error handling details to monitoring (threshold, fallback, exponential backoff)
- Add ALTER PUBLICATION auto-fix note to postgres/initial-setup
- Fix how-it-works: sequential scan description, Child Before Parent section
- Fix Startup and Verification: split into per-database subsections
- Update all prose references from ColumnsMapping to Columns list
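The error-handling bullet above mentions a threshold, a fallback, and exponential backoff. The Python sketch below shows the general shape of capped exponential backoff.
- The threshold of 3, the 1-second base delay, and the 60-second cap are placeholder values.
- Reading the cap as something like MaxFallbackTimeInSec is an assumption.
- This is not CDC Sink's actual retry code.

```python
def fallback_delay(consecutive_errors: int,
                   error_threshold: int = 3,
                   base_sec: float = 1.0,
                   max_fallback_sec: float = 60.0) -> float:
    """Return how many seconds to wait before the next retry.

    Hypothetical policy: errors below the threshold are retried immediately;
    from the threshold on, the delay doubles per additional error and is
    capped at max_fallback_sec.
    """
    if consecutive_errors < error_threshold:
        return 0.0
    # Clamp the exponent so 2 ** exponent never overflows a float conversion.
    exponent = min(consecutive_errors - error_threshold, 30)
    return min(base_sec * (2 ** exponent), max_fallback_sec)


if __name__ == "__main__":
    for errors in range(1, 11):
        print(f"{errors} consecutive errors -> wait {fallback_delay(errors)} s")
```

The cap keeps a long outage from pushing retries arbitrarily far apart. Once the source recovers, the task resumes within at most one capped interval.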
…chment handling
- Replace all new() shorthand with new CdcColumnMapping() across all files
- attachment-handling: clarify that text columns (text, nvarchar, etc.) as well as binary columns can use Type = CdcColumnType.Attachment
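The unified Columns list and the CdcColumnType values described above can be modeled roughly as follows. This Python sketch only mirrors the shape { Column, Name, Type }; ColumnMapping, ColumnType, and apply_columns are hypothetical names, not the RavenDB client API.

The following behaviors are assumptions about the semantics, not confirmed behavior:
- Json columns are parsed into nested objects.
- Text columns used as attachments are UTF-8 encoded.
- Only the listed columns are copied.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from enum import Enum
from typing import Any


class ColumnType(Enum):
    # Stand-in for the three CdcColumnType values listed in the PR.
    DEFAULT = "Default"
    JSON = "Json"
    ATTACHMENT = "Attachment"


@dataclass
class ColumnMapping:
    # Stand-in for CdcColumnMapping { Column, Name, Type }.
    column: str                           # source column name
    name: str                             # target property or attachment name
    type: ColumnType = ColumnType.DEFAULT


def apply_columns(row: dict[str, Any],
                  columns: list[ColumnMapping]) -> tuple[dict[str, Any], dict[str, bytes]]:
    """Split one source row into document properties and attachments."""
    document: dict[str, Any] = {}
    attachments: dict[str, bytes] = {}
    for mapping in columns:
        value = row.get(mapping.column)
        if mapping.type is ColumnType.JSON:
            # JSON text becomes a nested object rather than a string property.
            document[mapping.name] = None if value is None else json.loads(value)
        elif mapping.type is ColumnType.ATTACHMENT:
            if value is None:
                continue  # no attachment for NULL values in this sketch
            # Text columns are UTF-8 encoded; binary columns pass through as bytes.
            attachments[mapping.name] = value.encode("utf-8") if isinstance(value, str) else bytes(value)
        else:
            document[mapping.name] = value
    return document, attachments


if __name__ == "__main__":
    row = {"id": 7, "title": "Widget", "specs": '{"color": "red"}', "manual": "Read me"}
    columns = [
        ColumnMapping("title", "Title"),
        ColumnMapping("specs", "Specs", ColumnType.JSON),
        ColumnMapping("manual", "manual.txt", ColumnType.ATTACHMENT),
    ]
    print(apply_columns(row, columns))
```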
… uses application/octet-stream
…$old documentation
- Add postgres/type-mapping.mdx: full reference table of PostgreSQL column types and their JavaScript/CLR equivalents (scalars, arrays, json/jsonb, bytea, pgvector)
- Add patching.mdx section "$row and $old: Names and Types"
- Fix cleanup-and-maintenance.mdx: replace obsolete "Configuration Changes That Rename Slots" section (described hash-based naming, no longer accurate) with correct "Slot and Publication Names Are Immutable" section reflecting enforced immutability
- Name → CollectionName on CdcSinkTableConfig across all files
- Remove Type from CdcSinkLinkedTableConfig (linked tables have no relation type)
- Remove Disabled from CdcSinkEmbeddedTableConfig, add LinkedTables
- Add FactoryName table (Npgsql, SqlClient, MySql) to configuration-reference
- Add CdcColumnMapping and CdcColumnType reference sections
- Add put(id, document) and del(id) to patch capabilities
- Add JSON Columns section to column-mapping
- Remove non-existent Array References section from linked-tables
- Remove non-existent Disabling an Embedded Table section from embedded-tables
- Update server-configuration descriptions (MaxBatchSize, MaxFallbackTimeInSec; PollIntervalInSec applies to SQL Server only)
- Fix licensing link in overview
- Add SQL Server and MySQL/MariaDB as supported source databases
- postgres/overview.mdx: connection string, logical replication explanation, prerequisites summary, section index
- sql-server/overview.mdx: expand from stub to full page with connection string, CDC prerequisites, polling behavior, SourceTableSchema default
- mysql/overview.mdx + _category_.json: connection string, binlog prerequisites, streaming behavior, required privileges
…roubleshooting sections
- Source Schema Changes: how each database engine handles DDL changes on source tables while CDC Sink is running (adding/removing/renaming columns, SQL Server capture instance limitations)
- Partial Export/Import and State Loss: @cdc-states collection, recovery guidance, SkipInitialLoad workaround, LSN editing risks
…update behavior

Broken links fixed:
- api-reference: send-multiple-operations → what-are-operations
- attachment-handling: what-are-attachments → attachments/overview
- column-mapping, patching: postgres/type-mapping.mdx (nonexistent) → cross-reference to patching.mdx#row-column-types

PR comment fixes:
- MySQL overview: rename MyMySqlConnection → MySqlConnection
- Postgres overview: slot/publication names are GUID-based on first use, not deterministic hash-based

New content:
- how-it-works: "Updating the Task Configuration" section explaining that config changes only apply to new CDC events going forward. Existing documents are not retroactively re-processed. To apply changes to all documents, delete and recreate the task.
Move schema change documentation from troubleshooting into its own page with per-engine detail:
- PostgreSQL: auto-detects via RelationMessage, most resilient
- MySQL: detects via TableMapEvent column types, auto-recovers
- SQL Server: requires explicit capture instance procedure (create new instance, drain old, then drop)
- Quick reference table, SQL examples, recovery mechanism
- Troubleshooting retains a short summary with link to the new page
MySQL CDC detects changes by column position. Compound ALTER TABLE statements (add + drop, ADD COLUMN ... AFTER ...) cause positional shifts that are hard to resolve. Apply one change at a time and let CDC Sink catch up before applying the next one.
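The note above can be illustrated with a toy model. The Python sketch below compares column types by position only, which is roughly the information the MySQL section attributes to TableMapEvent. It is not MySQL binlog parsing and not CDC Sink's detection logic, and explain_single_change is a hypothetical name.

The sketch shows two things:
- A single added or dropped column can usually be located. When neighboring columns share a type, even one change is ambiguous, and the sketch just reports the first fit.
- A compound ALTER TABLE can leave the type list unchanged, so the change is invisible.

```python
from typing import Optional, Sequence, Tuple


def explain_single_change(old: Sequence[str], new: Sequence[str]) -> Optional[Tuple[str, int]]:
    """Infer a schema change from column types compared by position only.

    Returns ("unchanged", -1), ("added", i) or ("dropped", i) for the first
    position i that explains the difference, or None when no single added
    or dropped column accounts for it.
    """
    old_list, new_list = list(old), list(new)
    if old_list == new_list:
        return ("unchanged", -1)
    if len(new_list) == len(old_list) + 1:
        for i in range(len(new_list)):
            if new_list[:i] + new_list[i + 1:] == old_list:
                return ("added", i)
    if len(old_list) == len(new_list) + 1:
        for i in range(len(old_list)):
            if old_list[:i] + old_list[i + 1:] == new_list:
                return ("dropped", i)
    return None


if __name__ == "__main__":
    before = ["int", "varchar", "decimal"]                   # id, name, price
    # One change at a time: ADD COLUMN created_at DATETIME at the end.
    print(explain_single_change(before, before + ["datetime"]))
    # Compound change: DROP COLUMN name, ADD COLUMN sku VARCHAR AFTER id.
    # The resulting type list is identical, so the change is invisible by position.
    print(explain_single_change(before, ["int", "varchar", "decimal"]))
```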
Fix typo, missing heading, misleading wording, and inconsistent descriptions flagged during code review.
…etting
- Rename CdcSink.PollIntervalInSec to CdcSink.SqlServer.PollIntervalInSec to match actual code (3 files)
- Add missing CdcSink.Postgres.ReplicationTimeoutInSec (default 10s)
- Fix type conversion: the decimal→double mapping should be numeric/decimal→decimal
…y endpoint with /schema
…scribe real backoff
* CDC Sink is the reverse of ETL: instead of pushing data _from_ RavenDB _to_ SQL,
CDC Sink _pulls_ data _from_ SQL _into_ RavenDB.
SQL -> relational database. Though it could suggest that ETLs support relational databases only
The relational database is the source of truth; RavenDB receives a continuously-updated
document model derived from it.

* CDC Sink maps normalized relational tables into rich, nested RavenDB documents -
I don't think "rich, nested" means much in this context
* **Migrate from SQL to RavenDB**
Transform normalized SQL tables (orders, order_lines, customers) into rich RavenDB
documents where an Order contains embedded LineItems and a reference to the Customer -
automatically and continuously, without changing your SQL application.
Formatting isn't displayed correctly - no newline
without requiring changes to the source system.

* **Migrate from SQL to RavenDB**
Transform normalized SQL tables (orders, order_lines, customers) into rich RavenDB
No point mentioning Northwind tables here explicitly
1. **Create** - Define the task in Studio or via the Client API
Specify the connection string, table mappings, and transformation options

2. **Verify** - CDC Sink verifies the source database is properly configured
Checks permissions, replication prerequisites, and table configuration

3. **Initial Load** - Full table scan populates RavenDB with current data
Progress is tracked per-table and persists across restarts

4. **Stream** - Real-time change streaming begins
All INSERTs, UPDATEs, and DELETEs are applied to RavenDB documents as they occur

5. **Monitor** - View statistics, errors, and progress in Studio

6. **Retire** - Delete the task in RavenDB when no longer needed
PostgreSQL artifacts (replication slot, publication) must be cleaned up by
the database administrator separately
Should be an admonition I think
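Step 3 in the snippet above says initial-load progress is tracked per table and persists across restarts. A minimal Python sketch of that idea follows, assuming keyset pagination on a single primary-key column.

The function names, the shape of the progress record, and the in-memory demo source are hypothetical. The PR mentions an @cdc-states collection, but this is not its actual format.

```python
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
ReadBatch = Callable[[str, Optional[object], int], List[Row]]
WriteBatch = Callable[[str, List[Row]], None]


def initial_load(tables: List[str], read_batch: ReadBatch, write_batch: WriteBatch,
                 state: Dict[str, dict], key: str = "id", batch_size: int = 1000) -> None:
    """Copy each table in key order, saving progress after every written batch.

    state[table] holds {"last_key": ..., "done": bool}. Because last_key is
    updated only after write_batch succeeds, a restart resumes from the last
    completed batch instead of re-scanning the whole table.
    """
    for table in tables:
        progress = state.setdefault(table, {"last_key": None, "done": False})
        if progress["done"]:
            continue
        while True:
            rows = read_batch(table, progress["last_key"], batch_size)
            if not rows:
                progress["done"] = True
                break
            write_batch(table, rows)
            progress["last_key"] = rows[-1][key]


# Hypothetical in-memory source standing in for the relational database.
demo_source: Dict[str, List[Row]] = {"orders": [{"id": i} for i in range(1, 6)]}


def demo_read_batch(table: str, after_key: Optional[object], limit: int) -> List[Row]:
    # Keyset pagination: only rows whose key is greater than the last one written.
    rows = [r for r in demo_source[table] if after_key is None or r["id"] > after_key]
    return rows[:limit]
```

Saving progress only after a successful write means a crash can at worst repeat the batch that was in flight. It never skips rows.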
@@ -0,0 +1 @@
{"position": 10, "label": "Examples"}
@@ -0,0 +1 @@
{"position": 16, "label": "PostgreSQL"}
@@ -0,0 +1 @@
{"position": 2, "label": "CDC Sink"}
After verification, CDC Sink creates the necessary change-tracking infrastructure
in the source database, then begins the initial load.

### PostgreSQL
We don't mention other supported providers
When a CDC Sink task starts, it verifies that the source database is properly configured
before doing anything else. If any check fails, CDC Sink reports the exact issue and the
SQL an administrator needs to run to fix it. The task does not start until all checks pass.
I really, really don't like that there's a generic "SQL" thrown all around these articles - we should be precise about what we are talking about
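The snippet above describes verification that reports the exact issue together with the statement an administrator needs to run. A minimal Python sketch of that pattern for PostgreSQL follows.

wal_level, max_replication_slots, max_wal_senders, pg_settings, and ALTER SYSTEM are real PostgreSQL features. The function name, the example value of 10, and this particular set of checks are assumptions, not CDC Sink's actual verification list.

```python
from typing import Dict, List, Tuple


def check_logical_replication_settings(settings: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (problem, suggested fix) pairs for PostgreSQL server settings.

    `settings` maps setting names to values, e.g. the rows returned by
    SELECT name, setting FROM pg_settings
    WHERE name IN ('wal_level', 'max_replication_slots', 'max_wal_senders');
    """
    problems: List[Tuple[str, str]] = []
    if settings.get("wal_level") != "logical":
        problems.append((
            f"wal_level is '{settings.get('wal_level')}', but logical replication requires 'logical'",
            "ALTER SYSTEM SET wal_level = logical;  -- takes effect after a server restart",
        ))
    for name in ("max_replication_slots", "max_wal_senders"):
        if int(settings.get(name, "0")) < 1:
            problems.append((
                f"{name} is {settings.get(name, '0')}, but at least 1 is required",
                f"ALTER SYSTEM SET {name} = 10;  -- example value; takes effect after a restart",
            ))
    return problems


if __name__ == "__main__":
    current = {"wal_level": "replica", "max_replication_slots": "10", "max_wal_senders": "10"}
    for problem, fix in check_logical_replication_settings(current):
        print(problem)
        print("  fix:", fix)
```

Returning the fix alongside each problem lets a single failed verification tell the administrator everything that needs to change. It avoids one restart per discovered issue.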
re-scanning the entire table.

**Batch pipelining:** While one batch is being written to RavenDB, the next batch is
being read from the source database, keeping both systems busy.
Maybe not the best wording
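The batch pipelining described in the snippet above is a bounded producer/consumer pattern: the next batch is read from the source while the current one is written to RavenDB. A minimal Python sketch using a reader thread and a bounded queue follows. The function name and the default look-ahead of one batch are assumptions, not CDC Sink's implementation.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")


def pipelined_copy(read_batches: Iterable[T], write_batch: Callable[[T], None], depth: int = 1) -> None:
    """Write each batch while a reader thread fetches the next one(s).

    The bounded queue lets the reader run at most `depth` batches ahead,
    so reads and writes overlap without buffering the whole table.
    Batches must not be None, because None marks the end of the stream.
    """
    handoff: "queue.Queue[Optional[T]]" = queue.Queue(maxsize=depth)
    stop = threading.Event()
    reader_errors: List[BaseException] = []

    def reader() -> None:
        try:
            for batch in read_batches:
                if stop.is_set():
                    break
                handoff.put(batch)       # blocks while `depth` batches are waiting
        except Exception as exc:         # surface read failures to the caller
            reader_errors.append(exc)
        finally:
            handoff.put(None)            # sentinel: no more batches

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    try:
        while (batch := handoff.get()) is not None:
            write_batch(batch)
    except BaseException:
        stop.set()
        while handoff.get() is not None:  # drain so the blocked reader can finish
            pass
        raise
    finally:
        thread.join()
    if reader_errors:
        raise reader_errors[0]
```

A depth of 1 keeps only a few batches in memory at once: one queued, one being written, and one being read. It still keeps the reader and the writer busy in parallel.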
<Tabs>
<TabItem value="sql" label="sql">
<CodeBlock language="sql">
To apply configuration changes to **all** documents (not just new events), delete
the CDC Sink task and recreate it. The new task will perform a fresh initial load,
Don't we have a restart option?
</TabItem>
</Tabs>

**Document ID generation:** `\{CollectionName\}/\{pk1\}/\{pk2\}/...`
Why the escaped characters?
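The ID pattern in the snippet above is {CollectionName}/{pk1}/{pk2}/.... A minimal Python sketch of building such an ID from a row's primary-key values follows.

make_document_id is a hypothetical helper. Formatting values with str() and ignoring "/" inside key values are assumptions, so CDC Sink may render some types or special characters differently.

```python
from typing import Sequence


def make_document_id(collection_name: str, primary_key_values: Sequence[object]) -> str:
    """Build an ID of the form CollectionName/pk1/pk2/... from a row's key values."""
    if not primary_key_values:
        raise ValueError("a row needs at least one primary key value")
    # Key values are rendered with str(); a '/' inside a value is not escaped here.
    return "/".join([collection_name, *(str(value) for value in primary_key_values)])


if __name__ == "__main__":
    print(make_document_id("Orders", [1]))           # single-column key
    print(make_document_id("OrderLines", [1, 2]))    # composite key
```

Deriving the ID only from the primary key makes each row map to exactly one document. Later UPDATE and DELETE events for that row can then find the same document without a lookup.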
## Connection String

Use a `SqlConnectionString` with `FactoryName` set to `"System.Data.SqlClient"` or
One of them is obsolete
Issue link
https://issues.hibernatingrhinos.com/issue/RDoc-3836/CDC-Feature-Documentation
Additional description
Documentation for the CDC feature
Type of change
…/templates or readme)
Changes in docs URLs
…/scripts/redirects.json file, set Documents Moved PR label)
Changes in UX/UI