util/collate: support latin1_swedish_ci#67255
util/collate: support latin1_swedish_ci#67255kennytm wants to merge 9 commits intopingcap:masterfrom
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
This comment was marked as resolved.
This comment was marked as resolved.
|
/ok-to-test |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds native support for the Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~30 minutes Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Since TiDB treats
Alternative we can treat all non-Latin1 characters being invalid and map them all to
EDIT: Went with the |
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #67255 +/- ##
================================================
- Coverage 77.8039% 77.6696% -0.1344%
================================================
Files 2023 2025 +2
Lines 556185 556234 +49
================================================
- Hits 432734 432025 -709
- Misses 121706 122428 +722
- Partials 1745 1781 +36
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
@kennytm Are you also adding the collation to TiKV and TiFlash? |
|
Not yet. I'll submit PR if the current design (all non-Latin1 maps to mysql> select count(1) from t where val < concat(_latin1'ä', val);
ERROR 1105 (HY000): other error: [components/tidb_query_datatype/src/codec/error.rs:215]: invalid schema: UnsupportedCollation { code: -8 }
mysql> select /*+ read_from_storage(tiflash[t]) */ count(1) from t where val < concat(_latin1'ä', val);
ERROR 1105 (HY000): other error for mpp stream: Code: 49, e.displayText() = DB::Exception: static TiDBCollatorPtr TiDB::ITiDBCollator::getCollator(int32_t): invalid collation ID: 8, e.what() = DB::Exception,Edit: |
fca80e0 to
501af8e
Compare
|
@kennytm: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
What problem does this PR solve?
Issue Number: close #67198, close #36057
Problem Summary: TiDB did not support the
latin1_swedish_cicollation, which was the default collation on MySQL 5.x and often inherited after upgrading. Lack of such collation complicates migrations to/from TiDB.What changed and how does it work?
A new Collator is added to the
newCollatorMapfor "latin1_swedish_ci" (ID = 8). The Collator embeds a look-up table which maps every Latin-1 character to a specific weight, compatible with MySQL 8.4 (the look-up table can be exchanged to support more SBCS collations e.g.latin1_general_ciin the future, if there is actual demand).Other behavior of
latin1, e.g. the default collation (latin1_bin) and the incompatible HEX() behavior (#18955) are not changed.Check List
Tests
Side effects
Documentation
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.
Summary by CodeRabbit
New Features
Tests
Chores