Improve chunking strategy for tables with the composite PK by kamil-holubicki · Pull Request #28 · mysql/mysql-shell

kamil-holubicki · 2026-05-28T11:30:41Z

This PR addresses two issues:

Issue 1 (the big one):

PS-10413: Improve chunking strategy for tables with the composite PK

https://perconadev.atlassian.net/browse/PS-10413

Introduced an enhanced chunking algorithm.

Problem:
The original algorithm considers only the first column of the primary key
when chunking a table for a parallel dump. If the table has a composite PK,
a single value of that first key part may cover a huge number of rows.
As a result, chunk sizes are not well balanced.
The dump is delegated to parallel workers, and each chunk is dumped
by a separate thread. If there is one huge chunk and many small
chunks, the small chunks are processed quickly in parallel, but
the huge one occupies a thread for a long time while the other worker
threads sit idle.
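
To make the cost of unbalanced chunks concrete, here is a minimal sketch that models the workers as a pool where each idle worker takes the next chunk. The function name `makespan`, the chunk sizes, and the assumption that dump time is proportional to row count are illustrative only and are not taken from the PR.

```python
import heapq

def makespan(chunk_costs, workers):
    """Time until all chunks are dumped when each idle worker takes the next chunk."""
    finish_times = [0] * workers              # min-heap of per-worker finish times
    heapq.heapify(finish_times)
    for cost in chunk_costs:
        earliest = heapq.heappop(finish_times)    # worker that frees up first
        heapq.heappush(finish_times, earliest + cost)
    return max(finish_times)

skewed = [1000] + [10] * 20     # one huge chunk plus twenty small ones (1200 rows)
balanced = [100] * 12           # the same 1200 rows in evenly sized chunks

print(makespan(skewed, 4))      # dominated by the single huge chunk
print(makespan(balanced, 4))
```

With four workers the skewed layout takes as long as the huge chunk alone, while the balanced layout finishes in a fraction of that time.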

Solution:
Implemented a chunking algorithm that uses the other key parts to produce
table chunks.

Chunking Algorithm Overview

The chunking mechanism divides a large table into manageable chunks to
enable parallel data extraction and optimal memory usage.
The algorithm supports two strategies:
ORIGINAL and ENHANCED. ORIGINAL is the default behavior of mysqlsh.

Integer Column Chunking:

For numeric primary keys (INTEGER, UNSIGNED INTEGER, DECIMAL):

Phase 1: Range Expansion (Linear)

- Starts with an estimated step size based on: index_range / estimated_chunks
- Expands the search range until enough rows are found (rows >= rows_per_chunk)
- Stops if maximum range is reached or sufficient rows are found

Phase 2: Binary Search (Shrinking)

- If too many rows are found (rows > rows_per_chunk + accuracy)
- Uses binary search to narrow the range to match target row count
- Continues until: delta <= accuracy OR range shrinks to 1
- Falls back to nested chunking if single value still exceeds row limit
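
The two phases can be sketched roughly as follows. This is a simplified illustration, not the code from this PR: `find_chunk_end` is a hypothetical name, and a sorted in-memory list with `bisect` stands in for the row-count estimate that the dumper obtains from the server.

```python
from bisect import bisect_left, bisect_right

def find_chunk_end(keys, start, max_key, rows_per_chunk, accuracy, step):
    """Return (end, rows) so that keys in [start, end] hold about rows_per_chunk rows.

    keys is a sorted list standing in for the index (an assumption of this sketch).
    """
    def count(lo, hi):
        return bisect_right(keys, hi) - bisect_left(keys, lo)

    # Phase 1: expand linearly until enough rows are covered or the range ends.
    end = min(start + step - 1, max_key)
    while count(start, end) < rows_per_chunk and end < max_key:
        end = min(end + step, max_key)

    rows = count(start, end)
    if rows <= rows_per_chunk + accuracy:
        return end, rows

    # Phase 2: binary search; count(start, hi) is known to be too large.
    lo, hi = start, end
    while hi - lo > 1:
        mid = (lo + hi) // 2
        rows = count(start, mid)
        if abs(rows - rows_per_chunk) <= accuracy:
            return mid, rows
        if rows > rows_per_chunk:
            hi = mid
        else:
            lo = mid
    # The range shrank to a single step. If it covers one value that still
    # exceeds the limit, the caller would fall back to nested chunking.
    return lo, count(start, lo)

uniform = list(range(1, 1001))        # 1000 distinct key values
skewed = [5] * 500 + [6, 7, 8]        # one key value holds almost all rows
print(find_chunk_end(uniform, 1, 1000, 100, 5, 64))
print(find_chunk_end(skewed, 5, 8, 100, 5, 2))
```

For the skewed input the search ends on the single value 5 with 500 rows, which is the situation that nested chunking is meant to handle.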

Nested Chunking (Deep Chunking):

When a single value in the current key part exceeds rows_per_chunk:

- Recursively chunks the next key part with boundary condition
- Only applies if next column is optimizable (INT-like type)
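
A rough sketch of the recursion, assuming every key part is an integer and the key tuples fit in memory. The name `chunk_rows` and the `max_depth` parameter are hypothetical; `max_depth` only loosely mirrors the role of the maxKeyPrefixLength option.

```python
from collections import Counter

def chunk_rows(rows, rows_per_chunk, depth=0, max_depth=2, prefix=()):
    """Group key tuples into chunks of at most rows_per_chunk rows where possible.

    Returns (prefix, first_value, last_value, row_count) tuples; prefix holds the
    fixed values of earlier key parts, i.e. the boundary condition.
    """
    counts = Counter(row[depth] for row in rows)
    chunks, first, last, acc = [], None, None, 0

    def flush():
        nonlocal first, last, acc
        if acc:
            chunks.append((prefix, first, last, acc))
        first, last, acc = None, None, 0

    for value in sorted(counts):
        n = counts[value]
        can_go_deeper = depth + 1 < max_depth
        if n > rows_per_chunk and can_go_deeper:
            flush()
            # A single value is too big: recurse into the next key part.
            subset = [r for r in rows if r[depth] == value]
            chunks += chunk_rows(subset, rows_per_chunk, depth + 1,
                                 max_depth, prefix + (value,))
            continue
        if acc and acc + n > rows_per_chunk:
            flush()
        if first is None:
            first = value
        last = value
        acc += n
    flush()
    return chunks

# Composite key (a, b): a = 2 alone holds 10 rows, more than the limit of 4.
rows = ([(1, b) for b in range(3)] + [(2, b) for b in range(10)]
        + [(3, 0), (4, 0)])
for chunk in chunk_rows(rows, rows_per_chunk=4):
    print(chunk)
```

The rows with a = 2 come back as three chunks that carry the prefix (2,) and split on the second key part, while the other values are chunked on the first key part only.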

Chunk Gluing (ENHANCED strategy only):

The Gluer<T> class merges small chunks to optimize dump file count:

- Accumulates consecutive chunks when row count < max_rows_cnt
- Flushes when accumulated size > 3 * max_rows_cnt or at table end
- Prevents fragmentation by combining undersized chunks
- DummyGluer (for ORIGINAL) disables this optimization
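
The gluing rules might look like this in outline. The class and method names below mirror the description, but the interface (`add`, `flush`, an `emit` callback) and the handling of full-sized chunks are assumptions of this sketch, not the actual Gluer<T> implementation.

```python
class Gluer:
    """Merge consecutive undersized chunks before they are emitted."""

    def __init__(self, max_rows_cnt, emit):
        self.max_rows_cnt = max_rows_cnt
        self.emit = emit          # callback receiving (begin, end, rows)
        self.pending = None       # accumulated (begin, end, rows) or None

    def add(self, begin, end, rows):
        if rows >= self.max_rows_cnt:
            # Assumption: a full-sized chunk is never glued. Emit what was
            # pending first, so chunk order is preserved.
            self.flush()
            self.emit((begin, end, rows))
            return
        if self.pending is None:
            self.pending = (begin, end, rows)
        else:
            p_begin, _, p_rows = self.pending
            self.pending = (p_begin, end, p_rows + rows)
        if self.pending[2] > 3 * self.max_rows_cnt:
            self.flush()

    def flush(self):
        """Emit the accumulated chunk, if any; also called at table end."""
        if self.pending is not None:
            self.emit(self.pending)
            self.pending = None


class DummyGluer:
    """Pass-through variant: every chunk is emitted unchanged."""

    def __init__(self, emit):
        self.emit = emit

    def add(self, begin, end, rows):
        self.emit((begin, end, rows))

    def flush(self):
        pass


glued = []
gluer = Gluer(100, glued.append)
for chunk in [(1, 10, 40), (11, 20, 30), (21, 30, 250), (31, 40, 20)]:
    gluer.add(*chunk)
gluer.flush()     # table end
print(glued)
```

Here the first two small chunks are merged into one, the large chunk passes through untouched, and the trailing small chunk is flushed at table end.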

Introduced dump configuration options:

- adaptiveStepStrategy - strategy used for determining chunk boundaries
  - original - Default. Use the original implementation
  - enhanced - Use the new approach for calculations
- maxKeyPrefixLength - limits the number of key parts used for chunking (depth)
  - 0 - Use the whole length of the key (up to the first non-compatible column)
  - Default: 1 to keep the original behavior
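
As a usage illustration, the options could be passed to a dump in MySQL Shell's Python mode roughly like this. The output path and the thread count are placeholders, and the two new option names are taken from the description above, so they only exist in a build that contains this PR.

```python
# Options dictionary for util.dump_instance() in MySQL Shell's Python mode.
dump_options = {
    "threads": 8,                         # existing option: parallel dump workers
    "adaptiveStepStrategy": "enhanced",   # new in this PR; "original" is the default
    "maxKeyPrefixLength": 0,              # new in this PR; 0 = use as many key parts as possible
}

# Inside mysqlsh, where the global `util` object exists, the call would be:
# util.dump_instance("/path/to/dump", dump_options)
```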

Issue 2 (the small one):

PS-10416: Calculate and send checksum header for uploads to support S3 Object Lock

https://perconadev.atlassian.net/browse/PS-10416

The AWS S3 Object Lock feature requires the Content-MD5 request header
to be present in PUT requests.

Added calculation of this header. It is always calculated, as this
simplifies the logic and does no harm when the header is not needed.
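
For reference, a Content-MD5 value is the base64 encoding of the raw 16-byte MD5 digest of the request body. The sketch below shows only that calculation; the helper name `content_md5` is illustrative and this is not the C++ code added by the PR.

```python
import base64
import hashlib

def content_md5(body: bytes) -> str:
    """Return the Content-MD5 header value: base64 of the raw 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

# Header as it would accompany a PUT request with an empty body.
headers = {"Content-MD5": content_md5(b"")}
print(headers)
```

Note that the digest is encoded from its binary form, not from its hexadecimal string, so the header value is always 24 characters long.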



…eStepStrategy: "enhanced"
PS-10912: Partition table dumpInstance o/p shows incorrect no of rows w/ adaptiveStepStrategy: "enhanced"
PS-10935: dumpInstance: rows written does not match for Unique Indexs w/ adaptiveStepStrategy: "enhanced"

https://perconadev.atlassian.net/browse/PS-10897
https://perconadev.atlassian.net/browse/PS-10912
https://perconadev.atlassian.net/browse/PS-10935

Problem:
When the last PK(0) value is processed by nested chunking, the nested chunk is the last chunk in the dump. In that case, when we return from the nested chunking logic, there is nothing else to be chunked on the top level. However, the top-level logic was not aware of this and attempted to dump the last chunk, which was the whole PK(0) key. Effectively, PK(0) was dumped twice: first by nested chunking, then by the top level.

Solution:
The top level now generates an empty last chunk. Generating a last chunk is required by the protocol.

…tegy: "enhanced"

https://perconadev.atlassian.net/browse/PS-10922

Problem:
When estimating the row count in a given range, parsing of the EXPLAIN query result fails. This is because the original implementation of parsing the EXPLAIN output JSON does not cover all possible return values. In such a case an exception is raised and execution stops with an error.

Solution:
Added handling of the case when the EXPLAIN output says 'zero_rows_aggregated', which means zero rows in the range.


…g index in adaptive_step_v2

Problem:
When chunking integer columns with adaptiveStepStrategy: "enhanced", adaptive_step_v2() asks the server for a per-range row-count estimate via EXPLAIN FORMAT=JSON SELECT COUNT(*) and uses that estimate to drive a binary chop on the chunking range. This relies on the estimate being roughly monotonic in the range width.

In practice, on tables with composite keys and additional secondary indexes covering a leading key part, the optimizer can pick a different access path for the EXPLAIN'd COUNT(*) than the index the chunker is iterating on (e.g. a `ref` lookup on a shorter index that ignores the BETWEEN predicate on a later key part). When that happens, EXPLAIN returns a constant cardinality (~ rows for the leading key part) regardless of the BETWEEN range, while for narrower ranges it can flip to the primary key and report 0.

The chop loop sees this 0/N flapping, its `left` cursor is raised on the first false "expand" probe, and the loop exits via `left >= right` with rows == 0 and a wide step. Because new_step != 1, the deep-chunking branch (chunk by next key part) is never entered and the chunk is emitted as one big slice.

Solution:
Pin the EXPLAIN COUNT(*) probe in adaptive_step_v2() to the same index the chunker is iterating on by appending a FORCE INDEX (...) clause after the table reference. With that, EXPLAIN's `rows` is the records_in_range estimate against the chunking index, monotone in the range width (modulo small dive noise). The chop converges to range 1 when the range really is too big, which lets the existing deep-chunking path engage on the next key part as designed.

The change is intentionally narrow:
* only adaptive_step_v2 (the "enhanced" strategy);
* only the EXPLAIN probe (not the dump-data SELECT, not the boundary SELECTs in chunk_column / chunk_non_integer_column, not the original adaptive_step v1);
* only the integer column path (adaptive_step_v2 is reachable only from chunk_integer_column).

To carry the index name into the chunker, Instance_cache::Index gains a quoted_name() accessor populated via set_name() at build time from the index map key in the cache. A small helper force_index_clause() in dumper.cc formats the clause and returns "" when the table has no usable chunking index, so behavior is unchanged in that case.
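
The shape of the pinned probe can be sketched as follows. Only the name force_index_clause() and its empty-string behavior come from the description above; the Python signature, the helper `explain_count_query`, and the exact WHERE clause are assumptions made for illustration.

```python
def force_index_clause(quoted_index_name):
    """Return a FORCE INDEX clause, or "" when there is no usable chunking index."""
    if not quoted_index_name:
        return ""
    return f" FORCE INDEX ({quoted_index_name})"

def explain_count_query(quoted_table, quoted_index_name, quoted_column, begin, end):
    """Build the row-estimate probe for the key range [begin, end]."""
    return (
        f"EXPLAIN FORMAT=JSON SELECT COUNT(*) FROM {quoted_table}"
        f"{force_index_clause(quoted_index_name)}"
        f" WHERE {quoted_column} BETWEEN {int(begin)} AND {int(end)}"
    )

print(explain_count_query("`db`.`t`", "`PRIMARY`", "`id`", 1, 100))
print(explain_count_query("`db`.`t`", None, "`id`", 1, 100))
```

When no index name is available the query is identical to the unpinned probe, which matches the stated goal of leaving behavior unchanged in that case.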

mysql-oca-bot · 2026-05-28T13:55:27Z

Hi, thank you for your contribution. Please confirm this code is submitted under the terms of the OCA (Oracle's Contribution Agreement) you have previously signed by cutting and pasting the following text as a comment:
"I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it."
Thanks

kamil-holubicki added 9 commits April 23, 2026 12:54

PS-10898: Missing help for new options

ece24b1

https://perconadev.atlassian.net/browse/PS-10898

PS-10933: dumpInstance() when chunking:false skip messages for step s…

33881b4

…trategy & chunking nesting depth https://perconadev.atlassian.net/browse/PS-10933 Improved config options dependencies handling.

Addressed review comments.

d0ea7a0

1. Addressed review comments.

e5e57ab

2. The project default language mode moved to C++23 in 9.7.0, which changes how 0 -> std::string is resolved, and that trips the deleted std::string(nullptr_t) constructor when Decimal is constructed from 0.

kamil-holubicki changed the title ~~Ps 10413 and ps 10416 9.7~~ Improve chunking strategy for tables with the composite PK May 28, 2026

kamil-holubicki wants to merge 9 commits into mysql:9.7 from kamil-holubicki:PS-10413_and_PS-10416_9.7
