Skip to content

unicode: Use the correct maximum size of Cyrillic#11928

Open
cosmo0920 wants to merge 1 commit into
masterfrom
cosmo0920-correct-max-char-width
Open

unicode: Use the correct maximum size of Cyrillic#11928
cosmo0920 wants to merge 1 commit into
masterfrom
cosmo0920-correct-max-char-width

Conversation

@cosmo0920

@cosmo0920 cosmo0920 commented Jun 9, 2026

Copy link
Copy Markdown
Contributor

CP866 should be handled as the maximum byte is 3.


Enter [N/A] in the box, if an item is not applicable to your change.

Testing
Before we can approve your change; please submit the following in a comment:

  • Example configuration file for the change
  • Debug log output from testing the change
  • Attached Valgrind output that shows no leaks or memory corruption was found

If this is a change to packaging of containers or native binaries then please confirm it works for all targets.

  • Run local packaging test showing all targets (including any new ones) build.
  • Set ok-package-test label to test for all targets (requires maintainer to do).

Documentation

  • Documentation required for this feature

Backporting

  • Backport to latest stable release.

Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.

Summary by CodeRabbit

  • Bug Fixes
    • Improved handling of Windows-encoded text characters to support wider character widths.

Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
@coderabbitai

coderabbitai Bot commented Jun 9, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

The PR updates the Windows-866 converter configuration in the Unicode handler by increasing the max_width field from 2 to 3. This single-line change affects how the converter reports its maximum character width for the Windows-866 encoding scheme.

Changes

Win866 Converter Configuration Update

Layer / File(s) Summary
Win866 converter max_width adjustment
src/unicode/flb_utf8_and_win.c
The win866_converter global variable increases its .max_width setting from 2 to 3.

Estimated Code Review Effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Poem

A rabbit hopped through encoding lands so wide,
Found win866 with max_width of pride,
From two to three, the change so neat,
Characters broader make the codec complete! 🐰✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: updating the maximum width for Cyrillic character handling in the unicode converter from 2 to 3 bytes.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch cosmo0920-correct-max-char-width

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 Infer (1.2.0)
src/unicode/flb_utf8_and_win.c

src/unicode/flb_utf8_and_win.c:20:10: fatal error: 'fluent-bit/flb_log.h' file not found
20 | #include <fluent-bit/flb_log.h>
| ^~~~~~~~~~~~~~~~~~~~~~
1 error generated.
Error: the following clang command did not run successfully:
/opt/infer-linux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/bin/clang-18
@/tmp/coderabbit-infer/7fe153dfd6d3458997130172f4e5b47504197275-e27c65e894c60c01/tmp/clang_command_.tmp.84cae5.txt
++Contents of '/tmp/coderabbit-infer/7fe153dfd6d3458997130172f4e5b47504197275-e27c65e894c60c01/tmp/clang_command_.tmp.84cae5.txt':
"-cc1" "-load"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../../facebook-clang-plugins/libtooling/build/FacebookClangPlugin.dylib"
"-add-plugin" "BiniouASTExporter" "-plugin-arg-BiniouASTExporter" "-"
"-plugin-arg-BiniouASTExporter" "PREPEND_CURRENT_DIR=1"
"-plugin-arg-BiniouASTExporter" "MAX_STRING_SIZE=65535" "-cc1" "-triple"
"x86_64-unknown-linux-gnu" "-emit-obj" "-

... [truncated 744 characters] ...

inux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/lib/clang/18/include"
"-internal-isystem" "/usr/local/include" "-internal-isystem"
"/usr/lib/gcc/x86_64-linux-gnu/12/../../../../x86_64-linux-gnu/include"
"-internal-externc-isystem" "/usr/include/x86_64-linux-gnu"
"-internal-externc-isystem" "/include" "-internal-externc-isystem"
"/usr/include" "-Wno-ignored-optimization-argument" "-Wno-everything"
"-ferror-limit" "19" "-fgnuc-version=4.2.1" "-fskip-odr-check-in-gmf"
"-D__GCC_HAVE_DWARF2_CFI_ASM=1" "-o"
"/tmp/coderabbit-infer/e27c65e894c60c01/file.o" "-x" "c"
"src/unicode/flb_utf8_and_win.c" "-O0" "-fno-builtin" "-include"
"/opt/infer-linux-x86_64-v1.2.0/lib/infer/infer/bin/../lib/clang_wrappers/global_defines.h"
"-Wno-everything"


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/unicode/flb_utf8_and_win.c`:
- Line 121: Update the CP866 converter's max UTF-8 byte width to 3 by setting
the win866_converter's .max_width field to 3 (replacing the previous 2) to
prevent buffer overflows for U+2500–U+257F box-drawing and similar characters;
ensure this matches the other Windows codepage converters (e.g., the converters
defined near the .max_width entries at lines with win1250_converter,
win1251_converter, etc.), run the unicode conversion tests, and adjust any
related buffer allocation logic/comments to reflect the 3-byte UTF-8 maximum.
- Line 121: Add targeted CP866 (Win866) box-drawing coverage: extend
tests/internal/unicode.c to include test vectors covering CP866 bytes 0xB0–0xDF
mapped to U+2500+ (box-drawing) and assert correct UTF-8 conversion and
round-trip via the conversion routines used in src/unicode/flb_utf8_and_win.c
(locate the encoder/decoder functions in that file and any helpers that
reference the .max_width setting). Update
tests/runtime/data/tail/generate_generic_encoder_testing_data.py to emit the
same CP866 sequences so CI-generated test data includes box-drawing chars. After
adding tests, run Valgrind (or equivalent) against the new test suite focusing
on the conversion path to ensure no memory corruption/leaks and fix any issues
found; consider preparing the change as a backport candidate for Windows
codepage conversion safety.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 54ef58ad-51b6-4bc6-bc59-c58f92d0c45c

📥 Commits

Reviewing files that changed from the base of the PR and between 5880717 and 7fe153d.

📒 Files selected for processing (1)
  • src/unicode/flb_utf8_and_win.c

Comment thread src/unicode/flb_utf8_and_win.c
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant