Skip to content

MDEV-39410: ucs2 data not stored correctly in the context#5101

Open
bsrikanth-mariadb wants to merge 1 commit into
bb-12.3-MDEV-39368-test-replayfrom
13.1-MDEV-39410-ucs2-data-not-stored-correctly-in-context
Open

MDEV-39410: ucs2 data not stored correctly in the context#5101
bsrikanth-mariadb wants to merge 1 commit into
bb-12.3-MDEV-39368-test-replayfrom
13.1-MDEV-39410-ucs2-data-not-stored-correctly-in-context

Conversation

@bsrikanth-mariadb
Copy link
Copy Markdown
Contributor

For constant tables, the rows are stored in the context using the charset my_charset_utf8mb4_bin. However, that is not correct, as every field in the row could use its own charset.

This leads to a replay failure, as the data that gets read from the sql_script file is different from the original row data.

This PR addresses uses field's own charset instead of my_charset_utf8mb4_bin.

For constant tables, the rows are stored in the context using the
charset my_charset_utf8mb4_bin. However, that is not correct, as
every field in the row could use its own charset.

This leads to a replay failure, as the data that gets read from the
sql_script file is different from the original row data.

This PR addresses uses field's own charset instead of
my_charset_utf8mb4_bin.
Copy link
Copy Markdown

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request addresses MDEV-39410 by ensuring ucs2 data is correctly handled in the optimizer context, specifically by using the field's actual charset during row formatting. It also adds a regression test for this scenario. Feedback suggests reverting the change to the initial charset of the temporary string buffer in format_and_store_row, as the charset is overwritten later and the modification may be misleading for future maintenance.

Comment thread sql/filesort.cc
Field **pfield;
char row_buff_tmp[512];
String tmp(row_buff_tmp, sizeof(row_buff_tmp), &my_charset_bin);
String tmp(row_buff_tmp, sizeof(row_buff_tmp), &my_charset_utf8mb4_bin);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This change appears to be unnecessary and could be confusing. The charset of the tmp string, initialized here, is immediately overwritten by the call to field->val_str(&tmp) on line 3128. For most field types, val_str invokes String::set, which updates both the content and the charset of the string object. Consequently, the initial charset specified in the String constructor is never actually used.

Setting it to &my_charset_utf8mb4_bin might mislead future developers into thinking this specific charset is important for the logic that follows. The original &my_charset_bin is a more generic and suitable choice for a temporary buffer. To enhance code clarity, I recommend reverting this change.

  String tmp(row_buff_tmp, sizeof(row_buff_tmp), &my_charset_bin);

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

1 participant