MDEV-39410: ucs2 data not stored correctly in the context#5101
Conversation
For constant tables, the rows are stored in the context using the charset my_charset_utf8mb4_bin. However, that is not correct, as every field in the row could use its own charset. This leads to a replay failure, as the data that gets read from the sql_script file is different from the original row data. This PR addresses uses field's own charset instead of my_charset_utf8mb4_bin.
There was a problem hiding this comment.
Code Review
This pull request addresses MDEV-39410 by ensuring ucs2 data is correctly handled in the optimizer context, specifically by using the field's actual charset during row formatting. It also adds a regression test for this scenario. Feedback suggests reverting the change to the initial charset of the temporary string buffer in format_and_store_row, as the charset is overwritten later and the modification may be misleading for future maintenance.
| Field **pfield; | ||
| char row_buff_tmp[512]; | ||
| String tmp(row_buff_tmp, sizeof(row_buff_tmp), &my_charset_bin); | ||
| String tmp(row_buff_tmp, sizeof(row_buff_tmp), &my_charset_utf8mb4_bin); |
There was a problem hiding this comment.
This change appears to be unnecessary and could be confusing. The charset of the tmp string, initialized here, is immediately overwritten by the call to field->val_str(&tmp) on line 3128. For most field types, val_str invokes String::set, which updates both the content and the charset of the string object. Consequently, the initial charset specified in the String constructor is never actually used.
Setting it to &my_charset_utf8mb4_bin might mislead future developers into thinking this specific charset is important for the logic that follows. The original &my_charset_bin is a more generic and suitable choice for a temporary buffer. To enhance code clarity, I recommend reverting this change.
String tmp(row_buff_tmp, sizeof(row_buff_tmp), &my_charset_bin);
For constant tables, the rows are stored in the context using the charset my_charset_utf8mb4_bin. However, that is not correct, as every field in the row could use its own charset.
This leads to a replay failure, as the data that gets read from the sql_script file is different from the original row data.
This PR addresses uses field's own charset instead of my_charset_utf8mb4_bin.