ticdc: add scheduler configuration recommendations for table split mode#22483

Merged
ti-chi-bot[bot] merged 21 commits into pingcap:master from hongyunyan:0225
Feb 26, 2026

Conversation

@hongyunyan
Contributor

@hongyunyan hongyunyan commented Feb 25, 2026

Update TOC and the ticdc integration guide to include ksqlDB and SQL Server. The TOC entry for the Confluent/Snowflake integration is expanded with nested links for Confluent Cloud, Snowflake, ksqlDB, and SQL Server, and the document title, H1, and summary in ticdc/integrate-confluent-using-ticdc.md are updated to reflect the broader integration coverage.

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)

What is the related PR or file link(s)?

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@ti-chi-bot ti-chi-bot bot added the missing-translation-status (This PR does not have translation status info.) and size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) labels Feb 25, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @hongyunyan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request synchronizes the English documentation with recent updates made in the Chinese documentation, specifically by adding detailed configuration recommendations for TiCDC's table split mode within the new architecture. The changes aim to provide users with consistent and up-to-date guidance on optimizing TiCDC settings for table splitting, thereby improving replication efficiency and stability. Additionally, a new document has been added to comprehensively describe TiCDC's data synchronization capabilities.

Highlights

  • New Configuration Recommendations Section: Introduced a new section in the ticdc-architecture.md document titled 'Configuration recommendations for table split mode'.
  • Scheduler Parameter Documentation: Documented recommended usage and default values for scheduler.region-threshold, scheduler.region-count-per-span, and scheduler.write-key-threshold.
  • Critical Configuration Caution: Added a crucial caution regarding the relationship between scheduler.region-threshold and scheduler.region-count-per-span to prevent repeated scheduling and increased replication latency.
  • New Data Synchronization Capabilities Document: Added a new comprehensive document, ticdc-data-sync-capabilities.md, detailing TiCDC's data synchronization principles, consistency guarantees, and DDL/DML handling.

Changelog
  • ticdc/ticdc-architecture.md
    • Added a new section 'Configuration recommendations for table split mode'.
    • Detailed recommended values and usage for scheduler.region-threshold, scheduler.region-count-per-span, and scheduler.write-key-threshold.
    • Included a warning about scheduler.region-threshold being less than scheduler.region-count-per-span to avoid issues.
  • ticdc/ticdc-data-sync-capabilities.md
    • New file added: 'TiCDC's Data Synchronization Capability'.
    • Introduced background on TiCDC's role in real-time data synchronization.
    • Described how TiCDC monitors Raft Log, generates logical operations, and provides consistency guarantees.
    • Listed supported downstreams for data synchronization.
    • Clarified synchronization support for DDL/DML, DQL/DCL, system tables, and temporary tables.
    • Explained handling of index and foreign key constraints, including asynchronous DDL execution for indexes.
    • Stated that TiCDC checks internal integrity but not downstream constraint compliance.
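
The caution about `scheduler.region-threshold` and `scheduler.region-count-per-span` summarized above can be expressed as a simple pre-flight check. The following is a minimal Python sketch and is not part of the PR: the function name `check_split_scheduler` and the sample values are hypothetical, and the only rule it encodes is the one stated in the changelog (flag `scheduler.region-threshold` when it is less than `scheduler.region-count-per-span`). The summary does not say how the equal case should be treated, so the sketch does not flag it.

```python
def check_split_scheduler(scheduler: dict) -> list[str]:
    """Return warnings for a changefeed's scheduler settings.

    `scheduler` is a plain dict keyed by the parameter names mentioned in the
    PR summary. Only the relationship described there is checked.
    """
    warnings = []
    threshold = scheduler.get("region-threshold")
    per_span = scheduler.get("region-count-per-span")

    # Skip the check when either value is left at its default (not set here).
    if threshold is not None and per_span is not None and threshold < per_span:
        warnings.append(
            "region-threshold (%d) is less than region-count-per-span (%d); "
            "the PR summary warns this can cause repeated scheduling and "
            "higher replication latency" % (threshold, per_span)
        )
    return warnings


# Hypothetical values chosen only to trigger the warning.
risky = {"region-threshold": 1000, "region-count-per-span": 5000}
# Hypothetical values that satisfy the relationship.
safe = {"region-threshold": 10000, "region-count-per-span": 5000}

for name, cfg in (("risky", risky), ("safe", safe)):
    print(name, check_split_scheduler(cfg))
```

For the actual recommended and default values, refer to the new section in ticdc/ticdc-architecture.md rather than the numbers in this sketch.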
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request adds configuration recommendations for the table split mode and introduces a new document about TiCDC's data synchronization capabilities. The changes are valuable for keeping the English documentation aligned with the Chinese version. I have provided several suggestions, primarily for the new document, to enhance clarity, correct minor grammatical issues, and ensure adherence to the repository's documentation style guide. Key suggestions include fixing a typo, correcting a broken link, and rephrasing sentences for better readability and consistency.


3. TiCDC provides the guarantee of eventual consistency of transactions. [redo log](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios) provides the final consistency guarantee in disaster recovery scenarios. [Syncpoint](/ticdc/ticdc-upstream-downstream-check.md#enable-syncpoint) provides consistent snapshot reads and data consistency checks.

4. TiCDC supports synchronizing data to multiple downstreams, including [TiDB and MySQL-compatible databases](/ticdc/ticdc-sink-to-mysql.md), [Kafka](/ticdc/ticdc-sink-to-kafka.md), [Pulsar](/ticdc/ticdc-sink-to-pulsar), [storage services (Amazon S3, GCS, Azure Blob Storage, and NFS](/ticdc/ticdc-sink-to-cloud-storage.md).

medium

There's a missing closing parenthesis in the Markdown link, which breaks the link for storage services. I've also added "and" for grammatical correctness.

Suggested change
- 4. TiCDC supports synchronizing data to multiple downstreams, including [TiDB and MySQL-compatible databases](/ticdc/ticdc-sink-to-mysql.md), [Kafka](/ticdc/ticdc-sink-to-kafka.md), [Pulsar](/ticdc/ticdc-sink-to-pulsar), [storage services (Amazon S3, GCS, Azure Blob Storage, and NFS](/ticdc/ticdc-sink-to-cloud-storage.md).
+ 4. TiCDC supports synchronizing data to multiple downstreams, including [TiDB and MySQL-compatible databases](/ticdc/ticdc-sink-to-mysql.md), [Kafka](/ticdc/ticdc-sink-to-kafka.md), [Pulsar](/ticdc/ticdc-sink-to-pulsar), and [storage services (Amazon S3, GCS, Azure Blob Storage, and NFS)](/ticdc/ticdc-sink-to-cloud-storage.md).
References
  1. The style guide requires correcting grammar and ensuring technical accuracy, which includes working links. (link)
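
A broken link of this kind can be caught mechanically before review. The following is a minimal Python sketch, not a tool used in this repository: the function name `links_with_unbalanced_text` is hypothetical, and the regular expression only handles simple inline links whose targets contain no parentheses. It flags any link whose text has a different number of opening and closing parentheses.

```python
import re

# Simple inline Markdown link: [text](target). Targets with nested
# parentheses are out of scope for this sketch.
INLINE_LINK = re.compile(r"\[([^\]]*)\]\(([^)]*)\)")


def links_with_unbalanced_text(markdown: str) -> list[str]:
    """Return the text of every inline link whose parentheses do not balance."""
    bad = []
    for text, _target in INLINE_LINK.findall(markdown):
        if text.count("(") != text.count(")"):
            bad.append(text)
    return bad


# Shortened versions of the original and suggested lines from the review.
original = (
    "[Kafka](/ticdc/ticdc-sink-to-kafka.md), "
    "[storage services (Amazon S3, GCS, Azure Blob Storage, and NFS]"
    "(/ticdc/ticdc-sink-to-cloud-storage.md)."
)
suggested = (
    "[Kafka](/ticdc/ticdc-sink-to-kafka.md), and "
    "[storage services (Amazon S3, GCS, Azure Blob Storage, and NFS)]"
    "(/ticdc/ticdc-sink-to-cloud-storage.md)."
)

print(links_with_unbalanced_text(original))
print(links_with_unbalanced_text(suggested))
```

The check is deliberately narrow: it reports the unbalanced link text so that a person can decide where the missing parenthesis belongs.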

Comment on lines 2 to 3
title: TiCDC's Data Synchronization Capability
summary: Learn the TiCDC's data synchronization capabilities.

low

For better clarity and style, I suggest minor tweaks to the title and summary.

Suggested change
- title: TiCDC's Data Synchronization Capability
- summary: Learn the TiCDC's data synchronization capabilities.
+ title: TiCDC Data Synchronization Capabilities
+ summary: Learn about TiCDC's data synchronization capabilities.
References
  1. The style guide prioritizes clarity and simplicity in documentation. (link)


# TiCDC's Data Synchronization Capability

## Backgroud

low

There's a typo in the heading.

Suggested change
- ## Backgroud
+ ## Background
References
  1. The style guide requires correcting spelling mistakes. (link)


2. TiCDC generates logical operations (such as INSERT/UPDATE/DELETE) that are equivalent to SQL semantics, rather than restoring the original SQL executed upstream one by one. Refer to [TiCDC's Implementation Principles for Processing Data Changes](/ticdc/ticdc-overview.md#implementation-of-processing-data-changes).

3. TiCDC provides the guarantee of eventual consistency of transactions. [redo log](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios) provides the final consistency guarantee in disaster recovery scenarios. [Syncpoint](/ticdc/ticdc-upstream-downstream-check.md#enable-syncpoint) provides consistent snapshot reads and data consistency checks.

low

The phrase "provides the guarantee of" can be simplified to "guarantees". Also, for consistency, "final consistency" should be "eventual consistency".

Suggested change
- 3. TiCDC provides the guarantee of eventual consistency of transactions. [redo log](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios) provides the final consistency guarantee in disaster recovery scenarios. [Syncpoint](/ticdc/ticdc-upstream-downstream-check.md#enable-syncpoint) provides consistent snapshot reads and data consistency checks.
+ 3. TiCDC guarantees eventual consistency of transactions. [redo log](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios) provides eventual consistency in disaster recovery scenarios. [Syncpoint](/ticdc/ticdc-upstream-downstream-check.md#enable-syncpoint) provides consistent snapshot reads and data consistency checks.
References
  1. The style guide recommends avoiding unnecessary words and using consistent terminology. (link)


## Data synchronization capabilities of TiCDC

1. TiCDC supports synchronizing DDL and DML statements executed upstream, but does not synchronize DDL and DML executed in upstream system tables (including `mysql.*` and `information_schema.*`), nor does it synchronize temporary tables created in the upstream.

low

For better clarity and consistency, I suggest minor wording improvements.

Suggested change
- 1. TiCDC supports synchronizing DDL and DML statements executed upstream, but does not synchronize DDL and DML executed in upstream system tables (including `mysql.*` and `information_schema.*`), nor does it synchronize temporary tables created in the upstream.
+ 1. TiCDC supports synchronizing DDL and DML statements executed on the upstream, but does not synchronize DDL and DML executed on upstream system tables (including `mysql.*` and `information_schema.*`), nor does it synchronize temporary tables created on the upstream.
References
  1. The style guide emphasizes clarity and readability. (link)


2. TiCDC does not support synchronizing DQL (Data Query Language) statements, nor does it support synchronizing DCL (Data Control Language) statements.

3. TiCDC supports synchronizing the settings of the index in the upstream table through DDL (`add index`, `create index`), and in order to reduce the impact on the synchronization delay of Changefeed, if the downstream is TiDB, TiCDC will [asynchronously execute the DDL operations of creating and adding indexes](/ticdc/ticdc-ddl.md#asynchronous-execution-of-add-index-and-create-index-ddls).

low

According to the style guide, it's better to use the present tense and simplify wording for better readability.

Suggested change
- 3. TiCDC supports synchronizing the settings of the index in the upstream table through DDL (`add index`, `create index`), and in order to reduce the impact on the synchronization delay of Changefeed, if the downstream is TiDB, TiCDC will [asynchronously execute the DDL operations of creating and adding indexes](/ticdc/ticdc-ddl.md#asynchronous-execution-of-add-index-and-create-index-ddls).
+ 3. TiCDC supports synchronizing index settings on the upstream table through DDL (`add index`, `create index`). To reduce the impact on changefeed synchronization delay, if the downstream is TiDB, TiCDC [asynchronously executes the DDL operations of creating and adding indexes](/ticdc/ticdc-ddl.md#asynchronous-execution-of-add-index-and-create-index-ddls).
References
  1. The style guide recommends preferring present tense and avoiding unnecessary words. (link)


3. TiCDC supports synchronizing the settings of the index in the upstream table through DDL (`add index`, `create index`), and in order to reduce the impact on the synchronization delay of Changefeed, if the downstream is TiDB, TiCDC will [asynchronously execute the DDL operations of creating and adding indexes](/ticdc/ticdc-ddl.md#asynchronous-execution-of-add-index-and-create-index-ddls).

4. For the foreign key constraints set in the table, TiCDC will synchronize the corresponding DDL (`add foreign key`) statements, but TiCDC is not responsible for synchronizing the settings of upstream system variables, such as [foreign_key_checks](/system-variables.md#foreign_key_checks). Therefore, customers need to set appropriate system variables in the downstream to determine whether the downstream foreign key constraint check is enabled.

low

According to the style guide, it's better to use the present tense and address the user as "you".

Suggested change
- 4. For the foreign key constraints set in the table, TiCDC will synchronize the corresponding DDL (`add foreign key`) statements, but TiCDC is not responsible for synchronizing the settings of upstream system variables, such as [foreign_key_checks](/system-variables.md#foreign_key_checks). Therefore, customers need to set appropriate system variables in the downstream to determine whether the downstream foreign key constraint check is enabled.
+ 4. For the foreign key constraints set on the table, TiCDC synchronizes the corresponding DDL (`add foreign key`) statements but is not responsible for synchronizing the settings of upstream system variables, such as [foreign_key_checks](/system-variables.md#foreign_key_checks). Therefore, you need to set appropriate system variables in the downstream to determine whether the downstream foreign key constraint check is enabled.
References
  1. The style guide recommends writing in the second person ('you') and preferring the present tense. (link)


4. For the foreign key constraints set in the table, TiCDC will synchronize the corresponding DDL (`add foreign key`) statements, but TiCDC is not responsible for synchronizing the settings of upstream system variables, such as [foreign_key_checks](/system-variables.md#foreign_key_checks). Therefore, customers need to set appropriate system variables in the downstream to determine whether the downstream foreign key constraint check is enabled.

5. TiCDC only checks the integrity of the upstream changes received internally, and does not participate in checking whether the data changes meet the various downstream constraints. If a data change that does not meet the downstream constraints is encountered, TiCDC will report an error when writing downstream.

low

According to the style guide, it's better to use the present tense for improved readability.

Suggested change
- 5. TiCDC only checks the integrity of the upstream changes received internally, and does not participate in checking whether the data changes meet the various downstream constraints. If a data change that does not meet the downstream constraints is encountered, TiCDC will report an error when writing downstream.
+ 5. TiCDC only checks the integrity of the upstream changes received internally and does not participate in checking whether the data changes meet the various downstream constraints. If a data change that does not meet the downstream constraints is encountered, TiCDC reports an error when writing to the downstream.
References
  1. The style guide recommends preferring the present tense. (link)

@ti-chi-bot ti-chi-bot bot added size/S (Denotes a PR that changes 10-29 lines, ignoring generated files.) and removed size/M (Denotes a PR that changes 30-99 lines, ignoring generated files.) labels Feb 25, 2026
hongyunyan and others added 3 commits February 25, 2026 13:27
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@ti-chi-bot

ti-chi-bot bot commented Feb 25, 2026

@3AceShowHand: adding LGTM is restricted to approvers and reviewers in OWNERS files.

@hfxsd hfxsd self-assigned this Feb 25, 2026
@hfxsd hfxsd added translation/from-docs-cn (This PR is translated from a PR in pingcap/docs-cn.) and needs-cherry-pick-release-8.5 (Should cherry pick this PR to release-8.5 branch.), and removed missing-translation-status (This PR does not have translation status info.) labels Feb 25, 2026
@hfxsd hfxsd requested a review from qiancai February 26, 2026 07:48
Collaborator

@qiancai qiancai left a comment

Rest LGTM

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm (Indicates a PR needs 1 more LGTM.) label Feb 26, 2026
hfxsd and others added 2 commits February 26, 2026 16:29
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm (Indicates a PR needs 1 more LGTM.) labels Feb 26, 2026
@ti-chi-bot

ti-chi-bot bot commented Feb 26, 2026

[LGTM Timeline notifier]

Timeline:

  • 2026-02-26 08:22:22 UTC: ☑️ agreed by qiancai.
  • 2026-02-26 08:48:03 UTC: ☑️ agreed by hfxsd.

@hfxsd
Collaborator

hfxsd commented Feb 26, 2026

/approve

@ti-chi-bot

ti-chi-bot bot commented Feb 26, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hfxsd


@ti-chi-bot ti-chi-bot bot added the approved label Feb 26, 2026
@ti-chi-bot ti-chi-bot bot merged commit e3a2d83 into pingcap:master Feb 26, 2026
10 of 12 checks passed
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-8.5: #22501.

Labels

  • approved
  • lgtm
  • needs-cherry-pick-release-8.5: Should cherry pick this PR to release-8.5 branch.
  • size/S: Denotes a PR that changes 10-29 lines, ignoring generated files.
  • translation/from-docs-cn: This PR is translated from a PR in pingcap/docs-cn.

5 participants