[fix](test) Make test_analyze_long_string Case 5 stable against sample rows randomness #64408

Open
yujun777 wants to merge 1 commit into apache:master from yujun777:fix-flaky-test-analyze-long-string
Conversation

@yujun777 yujun777 commented Jun 11, 2026

Contributor

related PR: #62686

Case 5 (DUJ1 template) uses sample rows 3 on a table with 5 rows, but
only 1 row had a long string exceeding statistics_max_string_column_length
(1024). With sample rows 3 reading only 3 of 5 rows, there was a ~40%
chance that the long row was missed and the assert_true guard never fired.
When missed, big_str completed normally with an empty skip message,
causing the test assertion to fail:

expected skip reason visible for col big_str, got msg=
==> expected: <true> but was: <false>
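The ~40% figure can be checked with a short calculation. This is a minimal sketch, assuming the sampler picks 3 of the 5 rows uniformly and without replacement (an assumption for illustration; the real sampling path in Doris may differ). The function name `miss_probability` is hypothetical.

```python
from math import comb


def miss_probability(total_rows: int, long_rows: int, sample_rows: int) -> float:
    """Probability that a uniform sample without replacement contains no long row.

    Assumption: every subset of `sample_rows` rows is equally likely.
    """
    if sample_rows > total_rows:
        raise ValueError("sample_rows cannot exceed total_rows")
    short_rows = total_rows - long_rows
    # Count the samples drawn only from short rows, divided by all samples.
    # comb() returns 0 when sample_rows > short_rows, giving probability 0.0.
    return comb(short_rows, sample_rows) / comb(total_rows, sample_rows)


before_fix = miss_probability(total_rows=5, long_rows=1, sample_rows=3)
after_fix = miss_probability(total_rows=5, long_rows=5, sample_rows=3)
print(f"before fix: {before_fix:.0%}, after fix: {after_fix:.0%}")
# prints: before fix: 40%, after fix: 0%
```

Under that assumption, 4 of the 10 possible 3-row samples exclude the single long row, which matches the figure quoted above.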

The fix makes all 5 rows have repeat('z', 2048) for big_str, so the
long-string guard always triggers regardless of which rows are sampled.

No other cases are affected: Case 1/2/6 use full-table analyze, Case 3
uses sample percent 100, and Case 4 explicitly expects the guard NOT
to apply (partition path).
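The reasoning above can also be cross-checked by enumerating every possible sample. The sketch below is a toy model, not Doris code: `guard_fires` is a hypothetical stand-in for the long-string guard, the short row values are made up, and uniform sampling without replacement is assumed.

```python
from itertools import combinations

MAX_LEN = 1024  # statistics_max_string_column_length as quoted in this PR


def guard_fires(sample):
    """Toy stand-in for the guard: true if any sampled value exceeds MAX_LEN."""
    return any(len(value) > MAX_LEN for value in sample)


def samples_that_fire(rows, sample_rows):
    """Return (samples where the guard fires, total samples) over all samples."""
    all_samples = list(combinations(rows, sample_rows))
    fired = sum(1 for sample in all_samples if guard_fires(sample))
    return fired, len(all_samples)


long_value = "z" * 2048  # same length as repeat('z', 2048)
rows_before = [long_value, "a", "b", "c", "d"]  # hypothetical short values
rows_after = [long_value] * 5

print(samples_that_fire(rows_before, 3))  # (6, 10): 4 of 10 samples miss the long row
print(samples_that_fire(rows_after, 3))   # (10, 10): every sample triggers the guard
print(samples_that_fire(rows_before, 5))  # (1, 1): reading all rows cannot miss it
```

The last line illustrates why reading every row, as a full-table analyze or `sample percent 100` would in this model, is not exposed to the same randomness.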

…e rows randomness

Case 5 forces the DUJ1 template via debug point and uses `sample rows 3`
on 5 rows where only 1 had a long big_str value exceeding
statistics_max_string_column_length (1024). With only 1 of the 5 rows exceeding
the limit, the sample had a ~40% chance of missing the long row, in which case
the assert_true guard never fired and big_str completed without a
skip message.

Key changes:
- Change all 5 rows in Case 5 to use repeat('z', 2048) for big_str so
  the long-string guard triggers regardless of which rows are sampled

@hello-stephen

Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@yujun777

Contributor Author

run buildall

github-actions bot added the approved label (Indicates a PR has been approved by one committer.) on Jun 11, 2026

@github-actions

Contributor

PR approved by at least one committer and no changes requested.

@github-actions

Contributor

PR approved by anyone and no changes requested.

Labels

approved (Indicates a PR has been approved by one committer.), dev/3.1.x, dev/4.0.x, dev/4.1.x, reviewed

3 participants