ticdc: add scheduler configuration recommendations for table split mode#22483
ti-chi-bot[bot] merged 21 commits into pingcap:master
Summary of Changes

Hello @hongyunyan, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request synchronizes the English documentation with recent updates made in the Chinese documentation, specifically by adding detailed configuration recommendations for TiCDC's table split mode within the new architecture. The changes aim to provide users with consistent and up-to-date guidance on optimizing TiCDC settings for table splitting, thereby improving replication efficiency and stability. Additionally, a new document has been added to comprehensively describe TiCDC's data synchronization capabilities.
Code Review
This pull request adds configuration recommendations for the table split mode and introduces a new document about TiCDC's data synchronization capabilities. The changes are valuable for keeping the English documentation aligned with the Chinese version. I have provided several suggestions, primarily for the new document, to enhance clarity, correct minor grammatical issues, and ensure adherence to the repository's documentation style guide. Key suggestions include fixing a typo, correcting a broken link, and rephrasing sentences for better readability and consistency.
| 3. TiCDC provides the guarantee of eventual consistency of transactions. [redo log](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios) provides the final consistency guarantee in disaster recovery scenarios. [Syncpoint](/ticdc/ticdc-upstream-downstream-check.md#enable-syncpoint) provides consistent snapshot reads and data consistency checks. | ||
| 4. TiCDC supports synchronizing data to multiple downstreams, including [TiDB and MySQL-compatible databases](/ticdc/ticdc-sink-to-mysql.md), [Kafka](/ticdc/ticdc-sink-to-kafka.md), [Pulsar](/ticdc/ticdc-sink-to-pulsar), [storage services (Amazon S3, GCS, Azure Blob Storage, and NFS](/ticdc/ticdc-sink-to-cloud-storage.md). |
There's a missing closing parenthesis in the Markdown link, which breaks the link for storage services. I've also added "and" for grammatical correctness.
| 4. TiCDC supports synchronizing data to multiple downstreams, including [TiDB and MySQL-compatible databases](/ticdc/ticdc-sink-to-mysql.md), [Kafka](/ticdc/ticdc-sink-to-kafka.md), [Pulsar](/ticdc/ticdc-sink-to-pulsar), [storage services (Amazon S3, GCS, Azure Blob Storage, and NFS](/ticdc/ticdc-sink-to-cloud-storage.md). | |
| 4. TiCDC supports synchronizing data to multiple downstreams, including [TiDB and MySQL-compatible databases](/ticdc/ticdc-sink-to-mysql.md), [Kafka](/ticdc/ticdc-sink-to-kafka.md), [Pulsar](/ticdc/ticdc-sink-to-pulsar), and [storage services (Amazon S3, GCS, Azure Blob Storage, and NFS)](/ticdc/ticdc-sink-to-cloud-storage.md). |
References
- The style guide requires correcting grammar and ensuring technical accuracy, which includes working links. (link)
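The kind of slip flagged in this comment can be caught mechanically. The following is a minimal sketch, not part of this PR or of the repository's tooling: the helper name `unbalanced_link_texts` and the regular expression are illustrative assumptions, and the check only handles simple inline links whose text contains no nested square brackets.

```python
import re

# Matches simple inline Markdown links of the form [text](destination).
# Assumption: link text has no nested square brackets, and the
# destination contains no parentheses or whitespace.
LINK_RE = re.compile(r"\[([^\[\]]*)\]\(([^()\s]*)\)")


def unbalanced_link_texts(markdown: str) -> list[str]:
    """Return the text of every inline link whose parentheses do not balance."""
    flagged = []
    for match in LINK_RE.finditer(markdown):
        text = match.group(1)
        depth = 0
        balanced = True
        for ch in text:
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    # A closing parenthesis appeared before any opening one.
                    balanced = False
                    break
        if depth != 0:
            # At least one opening parenthesis was never closed.
            balanced = False
        if not balanced:
            flagged.append(text)
    return flagged


# The two link fragments below are copied from the suggestion above.
before = (
    "[Pulsar](/ticdc/ticdc-sink-to-pulsar), "
    "[storage services (Amazon S3, GCS, Azure Blob Storage, and NFS]"
    "(/ticdc/ticdc-sink-to-cloud-storage.md)."
)
after = (
    "[Pulsar](/ticdc/ticdc-sink-to-pulsar), and "
    "[storage services (Amazon S3, GCS, Azure Blob Storage, and NFS)]"
    "(/ticdc/ticdc-sink-to-cloud-storage.md)."
)

print(unbalanced_link_texts(before))
print(unbalanced_link_texts(after))
```

Run against the original line, the helper flags the storage services link; run against the suggested line, it flags nothing.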
| title: TiCDC's Data Synchronization Capability | ||
| summary: Learn the TiCDC's data synchronization capabilities. |
For better clarity and style, I suggest minor tweaks to the title and summary.
| title: TiCDC's Data Synchronization Capability | |
| summary: Learn the TiCDC's data synchronization capabilities. | |
| title: TiCDC Data Synchronization Capabilities | |
| summary: Learn about TiCDC's data synchronization capabilities. |
References
- The style guide prioritizes clarity and simplicity in documentation. (link)
| # TiCDC's Data Synchronization Capability | ||
| ## Backgroud |
There's a typo in the heading.
| ## Backgroud | |
| ## Background |
References
- The style guide requires correcting spelling mistakes. (link)
| 2. TiCDC generates logical operations (such as INSERT/UPDATE/DELETE) that are equivalent to SQL semantics, rather than restoring the original SQL executed upstream one by one. Refer to [TiCDC's Implementation Principles for Processing Data Changes](/ticdc/ticdc-overview.md#implementation-of-processing-data-changes). | ||
| 3. TiCDC provides the guarantee of eventual consistency of transactions. [redo log](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios) provides the final consistency guarantee in disaster recovery scenarios. [Syncpoint](/ticdc/ticdc-upstream-downstream-check.md#enable-syncpoint) provides consistent snapshot reads and data consistency checks. |
The phrase "provides the guarantee of" can be simplified to "guarantees". Also, for consistency, "final consistency" should be "eventual consistency".
| 3. TiCDC provides the guarantee of eventual consistency of transactions. [redo log](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios) provides the final consistency guarantee in disaster recovery scenarios. [Syncpoint](/ticdc/ticdc-upstream-downstream-check.md#enable-syncpoint) provides consistent snapshot reads and data consistency checks. | |
| 3. TiCDC guarantees eventual consistency of transactions. [redo log](/ticdc/ticdc-sink-to-mysql.md#eventually-consistent-replication-in-disaster-scenarios) provides eventual consistency in disaster recovery scenarios. [Syncpoint](/ticdc/ticdc-upstream-downstream-check.md#enable-syncpoint) provides consistent snapshot reads and data consistency checks. |
References
- The style guide recommends avoiding unnecessary words and using consistent terminology. (link)
| ## Data synchronization capabilities of TiCDC | ||
| 1. TiCDC supports synchronizing DDL and DML statements executed upstream, but does not synchronize DDL and DML executed in upstream system tables (including `mysql.*` and `information_schema.*`), nor does it synchronize temporary tables created in the upstream. |
For better clarity and consistency, I suggest minor wording improvements.
| 1. TiCDC supports synchronizing DDL and DML statements executed upstream, but does not synchronize DDL and DML executed in upstream system tables (including `mysql.*` and `information_schema.*`), nor does it synchronize temporary tables created in the upstream. | |
| 1. TiCDC supports synchronizing DDL and DML statements executed on the upstream, but does not synchronize DDL and DML executed on upstream system tables (including `mysql.*` and `information_schema.*`), nor does it synchronize temporary tables created on the upstream. |
References
- The style guide emphasizes clarity and readability. (link)
| 2. TiCDC does not support synchronizing DQL (Data Query Language) statements, nor does it support synchronizing DCL (Data Control Language) statements. | ||
| 3. TiCDC supports synchronizing the settings of the index in the upstream table through DDL (`add index`, `create index`), and in order to reduce the impact on the synchronization delay of Changefeed, if the downstream is TiDB, TiCDC will [asynchronously execute the DDL operations of creating and adding indexes](/ticdc/ticdc-ddl.md#asynchronous-execution-of-add-index-and-create-index-ddls). |
According to the style guide, it's better to use the present tense and simplify wording for better readability.
| 3. TiCDC supports synchronizing the settings of the index in the upstream table through DDL (`add index`, `create index`), and in order to reduce the impact on the synchronization delay of Changefeed, if the downstream is TiDB, TiCDC will [asynchronously execute the DDL operations of creating and adding indexes](/ticdc/ticdc-ddl.md#asynchronous-execution-of-add-index-and-create-index-ddls). | |
| 3. TiCDC supports synchronizing index settings on the upstream table through DDL (`add index`, `create index`). To reduce the impact on changefeed synchronization delay, if the downstream is TiDB, TiCDC [asynchronously executes the DDL operations of creating and adding indexes](/ticdc/ticdc-ddl.md#asynchronous-execution-of-add-index-and-create-index-ddls). |
References
- The style guide recommends preferring present tense and avoiding unnecessary words. (link)
| 3. TiCDC supports synchronizing the settings of the index in the upstream table through DDL (`add index`, `create index`), and in order to reduce the impact on the synchronization delay of Changefeed, if the downstream is TiDB, TiCDC will [asynchronously execute the DDL operations of creating and adding indexes](/ticdc/ticdc-ddl.md#asynchronous-execution-of-add-index-and-create-index-ddls). | ||
| 4. For the foreign key constraints set in the table, TiCDC will synchronize the corresponding DDL (`add foreign key`) statements, but TiCDC is not responsible for synchronizing the settings of upstream system variables, such as [foreign_key_checks](/system-variables.md#foreign_key_checks). Therefore, customers need to set appropriate system variables in the downstream to determine whether the downstream foreign key constraint check is enabled. |
According to the style guide, it's better to use the present tense and address the user as "you".
| 4. For the foreign key constraints set in the table, TiCDC will synchronize the corresponding DDL (`add foreign key`) statements, but TiCDC is not responsible for synchronizing the settings of upstream system variables, such as [foreign_key_checks](/system-variables.md#foreign_key_checks). Therefore, customers need to set appropriate system variables in the downstream to determine whether the downstream foreign key constraint check is enabled. | |
| 4. For the foreign key constraints set on the table, TiCDC synchronizes the corresponding DDL (`add foreign key`) statements but is not responsible for synchronizing the settings of upstream system variables, such as [foreign_key_checks](/system-variables.md#foreign_key_checks). Therefore, you need to set appropriate system variables in the downstream to determine whether the downstream foreign key constraint check is enabled. |
References
- The style guide recommends writing in the second person ('you') and preferring the present tense. (link)
| 4. For the foreign key constraints set in the table, TiCDC will synchronize the corresponding DDL (`add foreign key`) statements, but TiCDC is not responsible for synchronizing the settings of upstream system variables, such as [foreign_key_checks](/system-variables.md#foreign_key_checks). Therefore, customers need to set appropriate system variables in the downstream to determine whether the downstream foreign key constraint check is enabled. | ||
| 5. TiCDC only checks the integrity of the upstream changes received internally, and does not participate in checking whether the data changes meet the various downstream constraints. If a data change that does not meet the downstream constraints is encountered, TiCDC will report an error when writing downstream.
According to the style guide, it's better to use the present tense for improved readability.
| 5. TiCDC only checks the integrity of the upstream changes received internally, and does not participate in checking whether the data changes meet the various downstream constraints. If a data change that does not meet the downstream constraints is encountered, TiCDC will report an error when writing downstream. | |
| 5. TiCDC only checks the integrity of the upstream changes received internally and does not participate in checking whether the data changes meet the various downstream constraints. If a data change that does not meet the downstream constraints is encountered, TiCDC reports an error when writing to the downstream. |
References
- The style guide recommends preferring the present tense. (link)
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
@3AceShowHand: adding LGTM is restricted to approvers and reviewers in OWNERS files.
Co-authored-by: Grace Cai <qqzczy@126.com>
/approve
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: hfxsd
In response to a cherrypick label: new pull request created to branch
Update TOC and the ticdc integration guide to include ksqlDB and SQL Server. The TOC entry for the Confluent/Snowflake integration is expanded with nested links for Confluent Cloud, Snowflake, ksqlDB, and SQL Server, and the document title, H1, and summary in ticdc/integrate-confluent-using-ticdc.md are updated to reflect the broader integration coverage.