
Improve chunking strategy for tables with the composite PK #28

Open
kamil-holubicki wants to merge 9 commits into mysql:9.7 from kamil-holubicki:PS-10413_and_PS-10416_9.7

@kamil-holubicki

This PR addresses two issues:

Issue 1 (the big one):

PS-10413: Improve chunking strategy for tables with the composite PK

https://perconadev.atlassian.net/browse/PS-10413

Introduced an enhanced chunking algorithm.

Problem:
The original algorithm considers only the first column of the primary key
when chunking a table for a parallel dump. If the table has a composite PK,
a single value of that first key part may cover a huge number of rows.
As a result, chunk sizes are not well balanced.
Dumping is delegated to parallel workers, and each chunk is dumped by a
separate thread. If there is one huge chunk and many small chunks, the
small chunks are processed quickly in parallel, but the huge one occupies
a thread for a long time while the other worker threads sit idle.
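
To make the imbalance concrete, here is a minimal sketch (not part of the patch) that compares total dump time for a skewed and a balanced chunk list. It assumes dump cost is proportional to row count and that chunks are handed to the next free worker in order; `makespan` and the row counts are hypothetical.

```python
import heapq


def makespan(chunk_rows, workers):
    """Time until the last worker finishes.

    Assumption: cost is proportional to rows, and each chunk goes to
    whichever worker becomes free first.
    """
    finish_times = [0] * workers
    heapq.heapify(finish_times)
    for rows in chunk_rows:
        earliest = heapq.heappop(finish_times)
        heapq.heappush(finish_times, earliest + rows)
    return max(finish_times)


skewed = [900_000] + [10_000] * 10   # one huge chunk, ten small ones
balanced = [100_000] * 10            # the same 1,000,000 rows, evenly split

print(makespan(skewed, 4))    # 900000: three workers idle most of the time
print(makespan(balanced, 4))  # 300000
```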

Solution:
Implemented a chunking algorithm that uses the other key parts to produce
table chunks.

Chunking Algorithm Overview

The chunking mechanism divides a large table into manageable chunks to
enable parallel data extraction and optimal memory usage.
The algorithm supports two strategies: ORIGINAL and ENHANCED.
ORIGINAL is the default behavior of mysqlsh.

Integer Column Chunking:

For numeric primary keys (INTEGER, UNSIGNED INTEGER, DECIMAL):

Phase 1: Range Expansion (Linear)

  • Starts with an estimated step size based on:
    index_range / estimated_chunks
  • Expands the search range until enough rows are found
    (rows >= rows_per_chunk)
  • Stops if the maximum range is reached or sufficient rows are found

Phase 2: Binary Search (Shrinking)

  • Runs if too many rows are found (rows > rows_per_chunk + accuracy)
  • Uses binary search to narrow the range to match the target row count
  • Continues until: delta <= accuracy OR range shrinks to 1
  • Falls back to nested chunking if a single value still exceeds the row limit
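
The two phases can be sketched roughly as follows. This is an illustration only, not the code from the patch: `count_rows` stands in for the server-side row estimate by counting in a sorted in-memory list, and `next_chunk_end` and its parameters are hypothetical names.

```python
from bisect import bisect_left, bisect_right


def count_rows(keys, lo, hi):
    """Rows with lo <= key <= hi. Stand-in for the server's estimate."""
    return bisect_right(keys, hi) - bisect_left(keys, lo)


def next_chunk_end(keys, start, max_key, step, rows_per_chunk, accuracy):
    """Return (end, rows) for the chunk [start, end]."""
    # Phase 1: expand linearly until enough rows or the maximum is reached.
    end = min(start + step - 1, max_key)
    rows = count_rows(keys, start, end)
    while rows < rows_per_chunk and end < max_key:
        end = min(end + step, max_key)
        rows = count_rows(keys, start, end)

    # Phase 2: too many rows, so shrink the range with a binary search.
    if rows > rows_per_chunk + accuracy:
        left, right = start, end
        while left < right:
            mid = (left + right) // 2
            rows = count_rows(keys, start, mid)
            if abs(rows - rows_per_chunk) <= accuracy:
                return mid, rows
            if rows < rows_per_chunk:
                left = mid + 1
            else:
                right = mid
        end = left
        rows = count_rows(keys, start, end)
    return end, rows


dense = list(range(1, 1001))  # 1000 rows, one per key value
print(next_chunk_end(dense, 1, 1000, 400, 250, 10))  # (250, 250)

single = [5] * 1000           # every row shares one key value
print(next_chunk_end(single, 5, 5, 1, 250, 10))      # (5, 1000)
```

In the second call the range cannot shrink below one value while still exceeding the limit, which is the situation where nested chunking takes over.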

Nested Chunking (Deep Chunking):

When a single value of the current key part covers more than rows_per_chunk rows:

  • Recursively chunks the next key part, with the current value as a boundary condition
  • Only applies if the next column is optimizable (INT-like type)
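
A simplified sketch of the idea, assuming a two-column integer PK (k0, k1) held in memory. It is not the patch's implementation: `chunk_conditions` is a hypothetical helper, it emits one chunk per k0 value instead of a range, and it relies on k1 being unique within a given k0.

```python
from collections import Counter


def chunk_conditions(rows, rows_per_chunk):
    """rows: list of (k0, k1) tuples. Returns [(condition, row_count), ...]."""
    chunks = []
    per_k0 = Counter(k0 for k0, _ in rows)
    for k0 in sorted(per_k0):
        if per_k0[k0] <= rows_per_chunk:
            # The value fits into one chunk: no nesting needed.
            chunks.append((f"k0 = {k0}", per_k0[k0]))
            continue
        # A single k0 value is too big: chunk on k1 with k0 fixed
        # as the boundary condition.
        k1_values = sorted(k1 for a, k1 in rows if a == k0)
        for i in range(0, len(k1_values), rows_per_chunk):
            part = k1_values[i:i + rows_per_chunk]
            cond = f"k0 = {k0} AND k1 BETWEEN {part[0]} AND {part[-1]}"
            chunks.append((cond, len(part)))
    return chunks


table = [(1, i) for i in range(1, 4)] + [(2, i) for i in range(1, 11)]
for condition, count in chunk_conditions(table, 4):
    print(count, condition)
```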

Chunk Gluing (ENHANCED strategy only):

The Gluer<T> class merges small chunks to reduce the number of dump files:

  • Accumulates consecutive chunks when row count < max_rows_cnt
  • Flushes when accumulated size > 3 * max_rows_cnt or at table end
  • Prevents fragmentation by combining undersized chunks
  • DummyGluer (for ORIGINAL) disables this optimization
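
One possible reading of these rules, as a sketch only. The real Gluer<T> may differ. Assumptions made here: chunks with fewer than max_rows_cnt rows are collected into a pending group, the group is flushed once its total exceeds 3 * max_rows_cnt, and a chunk that is not small flushes the pending group and is emitted on its own. `glue` is a hypothetical name.

```python
def glue(chunk_rows, max_rows_cnt):
    """Group consecutive small chunks. Returns a list of groups of row counts."""
    glued = []
    pending = []

    def flush():
        if pending:
            glued.append(list(pending))
            pending.clear()

    for rows in chunk_rows:
        if rows < max_rows_cnt:
            # Small chunk: accumulate it (assumed rule).
            pending.append(rows)
            if sum(pending) > 3 * max_rows_cnt:
                flush()
        else:
            # Full-sized chunk: close the pending group, emit it alone.
            flush()
            glued.append([rows])
    flush()  # end of table
    return glued


print(glue([10, 20, 30, 100, 5], 100))  # [[10, 20, 30], [100], [5]]
print(glue([90] * 5, 100))              # [[90, 90, 90, 90], [90]]
```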

Introduced dump configuration options:

adaptiveStepStrategy - strategy used for determining chunk boundaries

  • original - Default. Use the original implementation
  • enhanced - Use the new approach for calculations

maxKeyPrefixLength - limits the number of key parts used for chunking (depth)

  • 0 - Use the whole length of the key (up to the first incompatible column)
  • Default: 1, to keep the original behavior
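
As an illustration, the two options could be assembled like this before being passed to the dump utility. The option names and defaults come from this PR; the validation rules and the helper `build_chunking_options` are assumptions for the sketch, not the shell's actual checks.

```python
def build_chunking_options(strategy="original", max_key_prefix_length=1):
    """Build the options dict for the dump utility (hypothetical helper)."""
    if strategy not in ("original", "enhanced"):
        raise ValueError(f"unknown adaptiveStepStrategy: {strategy!r}")
    if not isinstance(max_key_prefix_length, int) or max_key_prefix_length < 0:
        raise ValueError("maxKeyPrefixLength must be a non-negative integer")
    return {
        "adaptiveStepStrategy": strategy,
        "maxKeyPrefixLength": max_key_prefix_length,  # 0 = whole key
    }


options = build_chunking_options("enhanced", 0)
print(options)
# In mysqlsh Python mode the dict would be passed as the options argument,
# for example: util.dump_instance(output_url, options)
```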

Issue 2 (the small one):

PS-10416: Calculate and send checksum header for uploads to support S3 Object Lock

https://perconadev.atlassian.net/browse/PS-10416

The AWS S3 Object Lock feature requires the Content-MD5 request header
to be present in the PUT request.

Added calculation of this header. It is always calculated, as that
simplifies the logic and does no harm when the header is not needed.
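
For reference, a Content-MD5 value is the base64 encoding of the raw 128-bit MD5 digest of the request body. A minimal sketch of that calculation (the patch itself is C++; `content_md5` is a hypothetical name):

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Base64 of the binary MD5 digest, as used in the Content-MD5 header."""
    digest = hashlib.md5(body).digest()  # 16 raw bytes, not the hex string
    return base64.b64encode(digest).decode("ascii")


headers = {"Content-MD5": content_md5(b"")}
print(headers)  # {'Content-MD5': '1B2M2Y8AsgTpgAmY7PhCfg=='}
```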

# Conflicts:
#	modules/util/dump/dump_options.cc
…eStepStrategy: "enhanced"

PS-10912: Partition table dumpInstance o/p shows incorrect no of rows w/ adaptiveStepStrategy: "enhanced"
PS-10935: dumpInstance: rows written does not match for Unique Indexs w/ adaptiveStepStrategy: "enhanced"

https://perconadev.atlassian.net/browse/PS-10897
https://perconadev.atlassian.net/browse/PS-10912
https://perconadev.atlassian.net/browse/PS-10935

Problem:
When the last PK(0) value is processed by nested chunking, the nested
chunk is the last chunk in the dump. In that case, when we return from
the nested chunking logic, there is nothing left to chunk at the top
level. However, the top-level logic was not aware of this and
attempted to dump the last chunk, which was the whole PK(0) key.
Effectively, PK(0) was dumped twice: first by nested chunking,
then by the top level.

Solution:
The top level now generates an empty last chunk. Generating the last
chunk is required by the protocol.
…tegy: "enhanced"

https://perconadev.atlassian.net/browse/PS-10922

Problem:
When trying to estimate the row count in a given range, parsing of the
EXPLAIN query result fails.
This is because the original implementation of EXPLAIN output JSON
parsing does not cover all possible return values. In such a case an
exception is raised and execution stops with an error.

Solution:
Added handling of the case when the EXPLAIN output says
'zero_rows_aggregated', which means zero rows in the range.
…trategy & chunking nesting depth

https://perconadev.atlassian.net/browse/PS-10933

1. Improved handling of dependencies between config options.
2. The project default language mode moved to C++23 in 9.7.0,
   which changes how 0 -> std::string is resolved, and that trips
   the deleted std::string(nullptr_t) constructor when Decimal is
   constructed from 0.
…g index in adaptive_step_v2

Problem:
When chunking integer columns with adaptiveStepStrategy: "enhanced",
adaptive_step_v2() asks the server for a per-range row-count estimate
via EXPLAIN FORMAT=JSON SELECT COUNT(*) and uses that estimate to drive
a binary chop on the chunking range. This relies on the estimate being
roughly monotonic in the range width.
In practice, on tables with composite keys and additional secondary
indexes covering a leading key part, the optimizer can pick a different
access path for the EXPLAIN'd COUNT(*) than the index the chunker is
iterating on (e.g. a `ref` lookup on a shorter index that ignores the
BETWEEN predicate on a later key part). When that happens, EXPLAIN
returns a constant cardinality (~ rows for the leading key part)
regardless of the BETWEEN range, while for narrower ranges it can flip
to the primary key and report 0. The chop loop sees this 0/N flapping,
its `left` cursor is raised on the first false "expand" probe, and the
loop exits via `left >= right` with rows == 0 and a wide step. Because
new_step != 1, the deep-chunking branch (chunk by next key part) is
never entered and the chunk is emitted as one big slice.

Solution:
Pin the EXPLAIN COUNT(*) probe in adaptive_step_v2() to the same index
the chunker is iterating on by appending a FORCE INDEX (...) clause
after the table reference. With that, EXPLAIN's `rows` is the
records_in_range estimate against the chunking index, monotone in the
range width (modulo small dive noise). The chop converges to range 1
when the range really is too big, which lets the existing
deep-chunking path engage on the next key part as designed.

The change is intentionally narrow:
  * only adaptive_step_v2 (the "enhanced" strategy);
  * only the EXPLAIN probe (not the dump-data SELECT, not the boundary
    SELECTs in chunk_column / chunk_non_integer_column, not the
    original adaptive_step v1);
  * only the integer column path (adaptive_step_v2 is reachable only
    from chunk_integer_column).

To carry the index name into the chunker, Instance_cache::Index gains
a quoted_name() accessor populated via set_name() at build time from
the index map key in the cache. A small helper force_index_clause() in
dumper.cc formats the clause and returns "" when the table has no
usable chunking index, so behavior is unchanged in that case.
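
A rough Python rendering of the helper's behavior as described above. The real force_index_clause() lives in dumper.cc; the exact query text, `explain_count_query`, and its parameters are assumptions made for illustration.

```python
def force_index_clause(quoted_index_name):
    """Return the hint, or "" when there is no usable chunking index."""
    if not quoted_index_name:
        return ""
    return f" FORCE INDEX ({quoted_index_name})"


def explain_count_query(quoted_table, quoted_column, lo, hi, quoted_index_name):
    """Build the row-estimate probe with the hint right after the table reference."""
    return (
        f"EXPLAIN FORMAT=JSON SELECT COUNT(*) FROM {quoted_table}"
        f"{force_index_clause(quoted_index_name)}"
        f" WHERE {quoted_column} BETWEEN {lo} AND {hi}"
    )


print(explain_count_query("`db`.`t`", "`id`", 1, 100, "`PRIMARY`"))
print(explain_count_query("`db`.`t`", "`id`", 1, 100, ""))
```

With an empty index name the query is identical to the unhinted form, which matches the "behavior is unchanged" statement above.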
kamil-holubicki changed the title from "Ps 10413 and ps 10416 9.7" to "Improve chunking strategy for tables with the composite PK" on May 28, 2026
@mysql-oca-bot

Hi, thank you for your contribution. Please confirm this code is submitted under the terms of the OCA (Oracle's Contribution Agreement) you have previously signed by cutting and pasting the following text as a comment:
"I confirm the code being submitted is offered under the terms of the OCA, and that I am authorized to contribute it."
Thanks
