Improve chunking strategy for tables with the composite PK #28
Open
kamil-holubicki wants to merge 9 commits into
…3 Object Lock
https://perconadev.atlassian.net/browse/PS-10416
The Object Lock feature of AWS S3 requires the Content-MD5 request header to be present in the PUT request.
Added calculation of this header. It is always calculated, because that simplifies the logic and does no harm when the header is not needed.
https://perconadev.atlassian.net/browse/PS-10413
Introduced enhanced chunking algorithm.
Problem:
The original algorithm considers only the first column of the primary key when chunking a table for parallel dump. If the table has a composite PK, a single value of that key part may cover a huge number of rows. As a result, chunk sizes are not well balanced.
The dumping process is delegated to parallel workers. Each chunk is dumped by a separate thread. If there is one huge chunk and multiple small chunks, the small chunks are processed quickly in parallel, but the huge one occupies a thread for a long time while the other worker threads are idle.
Solution:
Implemented a chunking algorithm that uses the other key parts to produce table chunks.
Chunking Algorithm Overview
The chunking mechanism divides a large table into manageable chunks to enable parallel data extraction and optimal memory usage.
The algorithm supports two strategies: ORIGINAL and ENHANCED. ORIGINAL is the default behavior of mysqlsh.
Integer Column Chunking:
For numeric primary keys (INTEGER, UNSIGNED INTEGER, DECIMAL):
Phase 1: Range Expansion (Linear)
- Starts with an estimated step size based on: index_range / estimated_chunks
- Expands the search range until enough rows are found (rows >= rows_per_chunk)
- Stops if maximum range is reached or sufficient rows are found
Phase 2: Binary Search (Shrinking)
- Runs if too many rows are found (rows > rows_per_chunk + accuracy)
- Uses binary search to narrow the range to match target row count
- Continues until: delta <= accuracy OR range shrinks to 1
- Falls back to nested chunking if a single value still exceeds the row limit
Nested Chunking (Deep Chunking):
When a single value in the current key part exceeds rows_per_chunk:
- Recursively chunks the next key part with a boundary condition
- Only applies if the next column is optimizable (INT-like type)
Chunk Gluing (ENHANCED strategy only):
The Gluer<T> class merges small chunks to optimize dump file count:
- Accumulates consecutive chunks when row count < max_rows_cnt
- Flushes when accumulated size > 3 * max_rows_cnt or at table end
- Prevents fragmentation by combining undersized chunks
- DummyGluer (for ORIGINAL) disables this optimization
Introduced dump configuration options:
adaptiveStepStrategy - strategy used for determining chunk boundaries
    original - Default. Use the original implementation
    enhanced - Use the new approach for calculations
maxKeyPrefixLength - limits the number of key parts used for chunking (depth)
    0 - Use the whole length of the key (up to the first non-compatible column)
    Default: 1, to keep the original behavior
# Conflicts:
#   modules/util/dump/dump_options.cc
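The chunk gluing rule in this commit message can be illustrated with a small Python sketch. It is only one possible reading of the two bullets (accumulate while a chunk is below max_rows_cnt, flush when the accumulated size exceeds 3 * max_rows_cnt or at table end); the `Chunk` type and `glue_chunks` function are hypothetical names and do not mirror the actual Gluer<T> class.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Chunk:
    begin: int  # first key value covered (inclusive)
    end: int    # last key value covered (inclusive)
    rows: int   # estimated number of rows in [begin, end]


def glue_chunks(chunks: Iterable[Chunk], max_rows_cnt: int) -> Iterator[Chunk]:
    """Merge consecutive undersized chunks (assumed interpretation of the rules)."""
    pending: Optional[Chunk] = None
    for chunk in chunks:
        if chunk.rows >= max_rows_cnt:
            # Full-sized chunk: emit what was accumulated, then pass it through.
            if pending is not None:
                yield pending
                pending = None
            yield chunk
            continue
        # Undersized chunk: extend the accumulated range.
        if pending is None:
            pending = Chunk(chunk.begin, chunk.end, chunk.rows)
        else:
            pending = Chunk(pending.begin, chunk.end, pending.rows + chunk.rows)
        # Flush once the accumulated size exceeds 3 * max_rows_cnt.
        if pending.rows > 3 * max_rows_cnt:
            yield pending
            pending = None
    # Table end: flush the remainder.
    if pending is not None:
        yield pending


sample = [Chunk(0, 9, 10), Chunk(10, 19, 20), Chunk(20, 29, 150),
          Chunk(30, 39, 30), Chunk(40, 49, 40)]
glued = list(glue_chunks(sample, max_rows_cnt=100))
```

With this input the two leading small chunks are merged, the full-sized chunk passes through unchanged, and the two trailing small chunks are merged at table end, so five input chunks become three output files.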
…eStepStrategy: "enhanced"
PS-10912: Partition table dumpInstance o/p shows incorrect no of rows w/ adaptiveStepStrategy: "enhanced"
PS-10935: dumpInstance: rows written does not match for Unique Indexs w/ adaptiveStepStrategy: "enhanced"
https://perconadev.atlassian.net/browse/PS-10897
https://perconadev.atlassian.net/browse/PS-10912
https://perconadev.atlassian.net/browse/PS-10935
Problem:
When the last PK(0) value is processed by nested chunking, the nested chunk is the last chunk in the dump. In that case, once we return from the nested chunking logic, there is nothing left to chunk on the top level. However, the top-level logic was not aware of this and attempted to dump the last chunk, which was the whole PK(0) key. Effectively, PK(0) was dumped twice: first by nested chunking, then by the top level.
Solution:
In that case the top level now generates an empty last chunk. A last chunk must still be generated because the protocol requires it.
…tegy: "enhanced"
https://perconadev.atlassian.net/browse/PS-10922
Problem:
When trying to estimate the row count in a given range, parsing of the EXPLAIN query result fails. This is because the original implementation of the EXPLAIN JSON output parser does not cover all possible return values. In that case an exception is raised and execution stops with an error.
Solution:
Added handling of the case when the EXPLAIN output says 'zero_rows_aggregated', which means zero rows in the range.
…trategy & chunking nesting depth
https://perconadev.atlassian.net/browse/PS-10933
Improved handling of dependencies between config options.
2. The project default language mode moved to C++23 in 9.7.0, which changes how 0 -> std::string is resolved, and that trips the deleted std::string(nullptr_t) constructor when Decimal is constructed from 0.
…g index in adaptive_step_v2
Problem:
When chunking integer columns with adaptiveStepStrategy: "enhanced",
adaptive_step_v2() asks the server for a per-range row-count estimate
via EXPLAIN FORMAT=JSON SELECT COUNT(*) and uses that estimate to drive
a binary chop on the chunking range. This relies on the estimate being
roughly monotonic in the range width.
In practice, on tables with composite keys and additional secondary
indexes covering a leading key part, the optimizer can pick a different
access path for the EXPLAIN'd COUNT(*) than the index the chunker is
iterating on (e.g. a `ref` lookup on a shorter index that ignores the
BETWEEN predicate on a later key part). When that happens, EXPLAIN
returns a constant cardinality (~ rows for the leading key part)
regardless of the BETWEEN range, while for narrower ranges it can flip
to the primary key and report 0. The chop loop sees this 0/N flapping,
its `left` cursor is raised on the first false "expand" probe, and the
loop exits via `left >= right` with rows == 0 and a wide step. Because
new_step != 1, the deep-chunking branch (chunk by next key part) is
never entered and the chunk is emitted as one big slice.
Solution:
Pin the EXPLAIN COUNT(*) probe in adaptive_step_v2() to the same index
the chunker is iterating on by appending a FORCE INDEX (...) clause
after the table reference. With that, EXPLAIN's `rows` is the
records_in_range estimate against the chunking index, monotone in the
range width (modulo small dive noise). The chop converges to range 1
when the range really is too big, which lets the existing
deep-chunking path engage on the next key part as designed.
The change is intentionally narrow:
* only adaptive_step_v2 (the "enhanced" strategy);
* only the EXPLAIN probe (not the dump-data SELECT, not the boundary
SELECTs in chunk_column / chunk_non_integer_column, not the
original adaptive_step v1);
* only the integer column path (adaptive_step_v2 is reachable only
from chunk_integer_column).
To carry the index name into the chunker, Instance_cache::Index gains
a quoted_name() accessor populated via set_name() at build time from
the index map key in the cache. A small helper force_index_clause() in
dumper.cc formats the clause and returns "" when the table has no
usable chunking index, so behavior is unchanged in that case.
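As a rough illustration of the probe described above, the following Python sketch builds the EXPLAIN statement with an optional FORCE INDEX clause. The real helper lives in dumper.cc; the function names `quote_identifier` and `build_row_estimate_probe`, the `?` placeholders, and the exact shape of the WHERE clause are assumptions made for this example only.

```python
from typing import Optional


def quote_identifier(name: str) -> str:
    """Quote a MySQL identifier with backticks, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def force_index_clause(index_name: Optional[str]) -> str:
    """Return ' FORCE INDEX (...)' or '' when there is no usable chunking index."""
    if not index_name:
        return ""
    return " FORCE INDEX (" + quote_identifier(index_name) + ")"


def build_row_estimate_probe(schema: str, table: str, column: str,
                             index_name: Optional[str]) -> str:
    """Build the row-estimate probe; the hint goes right after the table reference."""
    table_ref = quote_identifier(schema) + "." + quote_identifier(table)
    return (
        "EXPLAIN FORMAT=JSON SELECT COUNT(*) FROM "
        + table_ref
        + force_index_clause(index_name)
        + " WHERE " + quote_identifier(column) + " BETWEEN ? AND ?"
    )


pinned = build_row_estimate_probe("db", "orders", "id", "PRIMARY")
unpinned = build_row_estimate_probe("db", "orders", "id", None)
```

Returning an empty string for a missing index keeps the generated statement identical to the unhinted one, which matches the "behavior is unchanged in that case" guarantee stated in the commit message.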
Hi, thank you for your contribution. Please confirm this code is submitted under the terms of the OCA (Oracle's Contribution Agreement) you have previously signed by cutting and pasting the following text as a comment:
This PR addresses two issues:
Issue 1 (the big one):
PS-10413: Improve chunking strategy for tables with the composite PK
https://perconadev.atlassian.net/browse/PS-10413
Introduced an enhanced chunking algorithm.
Problem:
The original algorithm considers only the first column of the primary key when
chunking a table for parallel dump. If the table has a composite PK,
a single value of that key part may cover a huge number of rows.
As a result, chunk sizes are not well balanced.
The dumping process is delegated to parallel workers. Each chunk is dumped
by a separate thread. If there is one huge chunk and multiple small
chunks, the small chunks are processed quickly in parallel, but
the huge one occupies a thread for a long time while the other worker
threads are idle.
Solution:
Implemented a chunking algorithm that uses the other key parts to produce
table chunks.
Chunking Algorithm Overview
The chunking mechanism divides a large table into manageable chunks to
enable parallel data extraction and optimal memory usage.
The algorithm supports two strategies:
ORIGINAL and ENHANCED. ORIGINAL is the default behavior of mysqlsh.
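Integer Column Chunking:
For numeric primary keys (INTEGER, UNSIGNED INTEGER, DECIMAL):
Phase 1: Range Expansion (Linear)
- Starts with an estimated step size based on: index_range / estimated_chunks
- Expands the search range until enough rows are found (rows >= rows_per_chunk)
- Stops if maximum range is reached or sufficient rows are found
Phase 2: Binary Search (Shrinking)
- Runs if too many rows are found (rows > rows_per_chunk + accuracy)
- Uses binary search to narrow the range to match target row count
- Continues until: delta <= accuracy OR range shrinks to 1
- Falls back to nested chunking if a single value still exceeds the row limit
Nested Chunking (Deep Chunking):
When a single value in the current key part exceeds rows_per_chunk:
- Recursively chunks the next key part with a boundary condition
- Only applies if the next column is optimizable (INT-like type)
Chunk Gluing (ENHANCED strategy only):
The Gluer<T> class merges small chunks to optimize dump file count:
- Accumulates consecutive chunks when row count < max_rows_cnt
- Flushes when accumulated size > 3 * max_rows_cnt or at table end
- Prevents fragmentation by combining undersized chunks
- DummyGluer (for ORIGINAL) disables this optimization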
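The two integer-column phases and the nested-chunking fallback described above can be sketched in Python as follows. This is a simplified model, not the code in the PR: `next_chunk`, `make_counter`, and the `count_rows` callback (standing in for the server-side row estimate) are hypothetical, and the sketch assumes the estimate grows monotonically with the range width.

```python
from typing import Callable, Dict, Tuple


def next_chunk(start: int, max_value: int, initial_step: int,
               rows_per_chunk: int, accuracy: int,
               count_rows: Callable[[int, int], int]) -> Tuple[int, int, bool]:
    """Return (end, rows, needs_nested) for the chunk beginning at `start`."""
    # Phase 1: expand linearly until enough rows or the maximum range is reached.
    step = max(1, initial_step)  # e.g. index_range / estimated_chunks
    end = min(start + step - 1, max_value)
    rows = count_rows(start, end)
    while rows < rows_per_chunk and end < max_value:
        end = min(end + step, max_value)
        rows = count_rows(start, end)

    # Phase 2: too many rows, so binary-search a smaller end of the range.
    if rows > rows_per_chunk + accuracy:
        lo, hi = start, end
        while lo < hi:
            mid = (lo + hi) // 2
            mid_rows = count_rows(start, mid)
            if abs(mid_rows - rows_per_chunk) <= accuracy:
                lo = hi = mid  # delta <= accuracy: close enough
                break
            if mid_rows > rows_per_chunk:
                hi = mid
            else:
                lo = mid + 1
        end = lo
        rows = count_rows(start, end)
        # If the last key value alone overshoots, leave it for the next chunk.
        if rows > rows_per_chunk + accuracy and end > start:
            end -= 1
            rows = count_rows(start, end)

    # A single key value that is still too big must be chunked by the next key part.
    needs_nested = end == start and rows > rows_per_chunk + accuracy
    return end, rows, needs_nested


def make_counter(rows_by_key: Dict[int, int]) -> Callable[[int, int], int]:
    """Build an exact in-memory row counter for the example."""
    def count_rows(lo: int, hi: int) -> int:
        return sum(rows_by_key.get(k, 0) for k in range(lo, hi + 1))
    return count_rows


uniform = {k: 10 for k in range(1, 101)}
skewed = dict(uniform)
skewed[1] = 1000  # one first-key value holding most of the rows

balanced = next_chunk(1, 100, 4, 100, 5, make_counter(uniform))
hot_key = next_chunk(1, 100, 4, 100, 5, make_counter(skewed))
```

For the uniform table the search settles on a range of ten key values. For the skewed table the range shrinks to the single value 1, and the `needs_nested` flag marks the point where chunking would continue on the next key part.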
Introduced dump configuration options:
adaptiveStepStrategy - strategy used for determining chunk boundaries
    original - Default. Use the original implementation
    enhanced - Use the new approach for calculations
maxKeyPrefixLength - limits the number of key parts used for chunking (depth)
    0 - Use the whole length of the key (up to the first non-compatible column)
    Default: 1, to keep the original behavior
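For illustration, the options could be passed from MySQL Shell in Python mode roughly like this. The option names and values come from the list above and apply only to a build that contains this PR; the output path and the `threads` value are arbitrary examples.

```python
# Options dictionary for the dump utility (Python mode of mysqlsh).
dump_options = {
    "threads": 8,                        # existing dump option: parallel workers
    "adaptiveStepStrategy": "enhanced",  # "original" is the default
    "maxKeyPrefixLength": 0,             # 0 = use the whole key; default is 1
}

# Inside mysqlsh with an open session, the dump would then be started with:
#   util.dump_instance("/backups/example", dump_options)
```

Leaving both new options out keeps the original chunking behavior, since the defaults are "original" and 1.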
Issue 2 (the small one):
PS-10416: Calculate and send checksum header for uploads to support S3 Object Lock
https://perconadev.atlassian.net/browse/PS-10416
The Object Lock feature of AWS S3 requires the Content-MD5 request header
to be present in the PUT request.
Added calculation of this header. It is always calculated, because that
simplifies the logic and does no harm when the header is not needed.
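The header value itself is the base64 encoding of the raw 16-byte MD5 digest of the request body (not of its hex string). A minimal Python sketch of that calculation, independent of the C++ code in this PR, with `content_md5` as a hypothetical helper name:

```python
import base64
import hashlib


def content_md5(body: bytes) -> str:
    """Return the Content-MD5 value: base64 of the binary MD5 digest of the body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")


payload = b"hello world"
headers = {"Content-MD5": content_md5(payload)}
```

Because the digest is always 16 bytes, the header value is always 24 characters long, regardless of the size of the uploaded part.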