v8.5.6 ticdc: add scheduler configuration recommendations for table split mode (#22483) #22501

Open
ti-chi-bot wants to merge 16 commits into pingcap:release-8.5 from ti-chi-bot:cherry-pick-22483-to-release-8.5

Conversation

@ti-chi-bot
Member

This is an automated cherry-pick of #22483

Update TOC and the ticdc integration guide to include ksqlDB and SQL Server. The TOC entry for the Confluent/Snowflake integration is expanded with nested links for Confluent Cloud, Snowflake, ksqlDB, and SQL Server, and the document title, H1, and summary in ticdc/integrate-confluent-using-ticdc.md are updated to reflect the broader integration coverage.

First-time contributors' checklist

What is changed, added or deleted? (Required)

Which TiDB version(s) do your changes apply to? (Required)

Tips for choosing the affected version(s):

By default, CHOOSE MASTER ONLY so your changes will be applied to the next TiDB major or minor releases. If your PR involves a product feature behavior change or a compatibility change, CHOOSE THE AFFECTED RELEASE BRANCH(ES) AND MASTER.

For details, see tips for choosing the affected versions.

  • master (the latest development version)
  • v9.0 (TiDB 9.0 versions)
  • v8.5 (TiDB 8.5 versions)
  • v8.1 (TiDB 8.1 versions)
  • v7.5 (TiDB 7.5 versions)
  • v7.1 (TiDB 7.1 versions)
  • v6.5 (TiDB 6.5 versions)
  • v6.1 (TiDB 6.1 versions)
  • v5.4 (TiDB 5.4 versions)

What is the related PR or file link(s)?

Do your changes match any of the following descriptions?

  • Delete files
  • Change aliases
  • Need modification after applied to another branch
  • Might cause conflicts after applied to another branch

@ti-chi-bot added the lgtm, size/S, and type/cherry-pick-for-release-8.5 labels Feb 26, 2026
ti-chi-bot bot commented Feb 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign oreoxmt for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gemini-code-assist
Contributor

Summary of Changes

Hello @ti-chi-bot, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enhances the TiCDC documentation by providing crucial guidance on configuring table-level task splitting. It introduces recommended settings for optimizing performance and understanding the behavior of the new TiCDC architecture, ensuring users can effectively manage data replication.

Highlights

  • TiCDC Table Splitting Configurations: Introduced a new section in the TiCDC architecture documentation detailing recommended configurations for table-level task splitting, including guidance on scheduler.region-threshold, scheduler.region-count-per-span, and scheduler.write-key-threshold.
  • New Configuration Parameter Documentation: Added documentation for the scheduler.region-count-per-span parameter in the TiCDC changefeed configuration guide, explaining its purpose and default value in the new architecture.

Changelog
  • ticdc/ticdc-architecture.md
    • Added a new section "Recommended configurations for table-level task splitting" to provide detailed guidance on scheduler parameters.
    • Updated the note regarding MySQL sink changefeeds to clarify "table-level task splitting mode".
  • ticdc/ticdc-changefeed-config.md
    • Documented the scheduler.region-count-per-span parameter, including its introduction version, purpose, and default value.
Contributor

gemini-code-assist bot left a comment

Code Review

This pull request adds helpful documentation on the recommended scheduler configurations for table-level task splitting in TiCDC's new architecture. The changes are clear and provide valuable guidance for users. I've made a few minor suggestions to improve wording for better clarity and consistency, following the repository's documentation style guide.

In table split mode, pay attention to the following settings:

- [`scheduler.region-threshold`](/ticdc/ticdc-changefeed-config.md#region-threshold): the default value is `10000`. When the number of Regions in a table exceeds this threshold, TiCDC splits the table. For tables with relatively few Regions but high overall write throughput, you can reduce this value appropriately. This parameter must be greater than or equal to `scheduler.region-count-per-span`. Otherwise, tasks might be rescheduled repeatedly, which increases replication latency.
- [`scheduler.region-count-per-span`](/ticdc/ticdc-changefeed-config.md#region-count-per-span-new-in-v854): the default value is `100`. During changefeed initialization, tables that meet the split conditions are split according to this parameter. After splitting, each split sub-table contains at most `region-count-per-span` regions.
Contributor

low

To improve clarity and use active voice as recommended by the style guide, consider rephrasing this sentence. Also, for consistency with the rest of the document, `Regions` should be capitalized.

Suggested change
- [`scheduler.region-count-per-span`](/ticdc/ticdc-changefeed-config.md#region-count-per-span-new-in-v854): the default value is `100`. During changefeed initialization, tables that meet the split conditions are split according to this parameter. After splitting, each split sub-table contains at most `region-count-per-span` regions.
- [`scheduler.region-count-per-span`](/ticdc/ticdc-changefeed-config.md#region-count-per-span-new-in-v854): the default value is `100`. During changefeed initialization, TiCDC splits tables that meet the split conditions according to this parameter. After splitting, each sub-table contains at most `region-count-per-span` Regions.
References
  1. The style guide recommends using active voice over passive voice (line 43) and using consistent terminology (line 25). The suggested change improves on both aspects. (link)
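
To make the "at most `region-count-per-span` Regions" wording concrete, here is a minimal Python sketch of the arithmetic. It is not TiCDC code: the function name `min_sub_tables` is hypothetical. The sketch assumes a table is split only when its Region count exceeds `region-threshold`, and it ignores any splitting triggered by write throughput. It yields only a lower bound, since the actual scheduler may create more sub-tables.

```python
import math


def min_sub_tables(region_count: int,
                   region_threshold: int = 10000,
                   region_count_per_span: int = 100) -> int:
    """Lower bound on the number of sub-tables for one table.

    Assumptions (illustrative only, taken from the quoted description):
    - a table is split only when its Region count exceeds region_threshold;
    - after splitting, each sub-table holds at most region_count_per_span Regions.
    """
    if region_count <= region_threshold:
        # Not split: the table stays a single replication task.
        return 1
    # Each sub-table covers at most region_count_per_span Regions,
    # so at least ceil(region_count / region_count_per_span) are needed.
    return math.ceil(region_count / region_count_per_span)
```

Under these assumptions and the default values, a table with 25,000 Regions would be split into at least 250 sub-tables, while a table with 5,000 Regions would not be split at all.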


- [`scheduler.region-threshold`](/ticdc/ticdc-changefeed-config.md#region-threshold): the default value is `10000`. When the number of Regions in a table exceeds this threshold, TiCDC splits the table. For tables with relatively few Regions but high overall write throughput, you can reduce this value appropriately. This parameter must be greater than or equal to `scheduler.region-count-per-span`. Otherwise, tasks might be rescheduled repeatedly, which increases replication latency.
- [`scheduler.region-count-per-span`](/ticdc/ticdc-changefeed-config.md#region-count-per-span-new-in-v854): the default value is `100`. During changefeed initialization, tables that meet the split conditions are split according to this parameter. After splitting, each split sub-table contains at most `region-count-per-span` regions.
- [`scheduler.write-key-threshold`](/ticdc/ticdc-changefeed-config.md#write-key-threshold): the default value is `0` (disabled). When the sink write throughput of a table exceeds this threshold, TiCDC triggers table splitting. In most cases, keep this parameter to `0`.
Contributor

low

For better readability, `at` is more idiomatic than `to` in this context.

Suggested change
- [`scheduler.write-key-threshold`](/ticdc/ticdc-changefeed-config.md#write-key-threshold): the default value is `0` (disabled). When the sink write throughput of a table exceeds this threshold, TiCDC triggers table splitting. In most cases, keep this parameter to `0`.
- [`scheduler.write-key-threshold`](/ticdc/ticdc-changefeed-config.md#write-key-threshold): the default value is `0` (disabled). When the sink write throughput of a table exceeds this threshold, TiCDC triggers table splitting. In most cases, keep this parameter at `0`.
References
  1. The style guide recommends correcting grammar mistakes for better readability (line 20). (link)
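
The relationship between the three settings quoted above can be checked mechanically before a changefeed is created. The following Python sketch is an illustration, not part of TiCDC: the function name `check_scheduler_settings` and the warning texts are hypothetical, and the defaults are the ones stated in the quoted documentation.

```python
def check_scheduler_settings(region_threshold: int = 10000,
                             region_count_per_span: int = 100,
                             write_key_threshold: int = 0) -> list[str]:
    """Return warnings for a set of scheduler values (illustrative only)."""
    warnings = []
    # The quoted docs require region-threshold >= region-count-per-span;
    # otherwise tasks might be rescheduled repeatedly.
    if region_threshold < region_count_per_span:
        warnings.append(
            "region-threshold is smaller than region-count-per-span; "
            "tasks might be rescheduled repeatedly"
        )
    # The quoted docs recommend keeping write-key-threshold at 0 in most cases.
    if write_key_threshold != 0:
        warnings.append(
            "write-key-threshold is non-zero; the recommendation is 0 in most cases"
        )
    return warnings
```

With the default values the function returns an empty list. Passing `region_threshold=50` with `region_count_per_span=100` returns one warning about the violated constraint.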


#### `region-count-per-span` <span class="version-mark">New in v8.5.4</span>

- Introduced in the [TiCDC new architecture](/ticdc/ticdc-architecture.md). During changefeed initialization, tables that meet the split conditions are split according to this parameter. After splitting, each split sub-table contains at most `region-count-per-span` Regions.
Contributor

low

To improve clarity and use active voice as recommended by the style guide, consider rephrasing this sentence.

Suggested change
- Introduced in the [TiCDC new architecture](/ticdc/ticdc-architecture.md). During changefeed initialization, tables that meet the split conditions are split according to this parameter. After splitting, each split sub-table contains at most `region-count-per-span` Regions.
- Introduced in the [TiCDC new architecture](/ticdc/ticdc-architecture.md). During changefeed initialization, TiCDC splits tables that meet the split conditions according to this parameter. After splitting, each sub-table contains at most `region-count-per-span` Regions.
References
  1. The style guide recommends using active voice over passive voice (line 43) and avoiding unnecessary words (line 24). The suggested change improves on both aspects. (link)

#### `region-count-per-span` <span class="version-mark">New in v8.5.4</span>

- Introduced in the [TiCDC new architecture](/ticdc/ticdc-architecture.md). During changefeed initialization, tables that meet the split conditions are split according to this parameter. After splitting, each split sub-table contains at most `region-count-per-span` Regions.
- Default value: `100`
Contributor

low

Please remove the trailing space for consistent formatting.

Suggested change
- Default value: `100`
- Default value: `100`
References
  1. The style guide recommends correcting grammar, spelling, and punctuation mistakes (line 20), which extends to minor formatting issues like trailing spaces. (link)

@hfxsd changed the title from "ticdc: add scheduler configuration recommendations for table split mode (#22483)" to "v8.5.6 ticdc: add scheduler configuration recommendations for table split mode (#22483)" Feb 26, 2026
@hfxsd assigned hfxsd and unassigned hongyunyan Feb 26, 2026
@hfxsd added the do-not-merge/hold label Feb 26, 2026