unicode: Use the correct maximum size of Cyrillic #11928
Signed-off-by: Hiroshi Hatake <hiroshi@chronosphere.io>
📝 Walkthrough
The PR updates the Windows-866 converter configuration in the Unicode handler by increasing the [...]
Changes: Win866 Converter Configuration Update
Estimated Code Review Effort: 🎯 1 (Trivial) | ⏱️ ~2 minutes
🚥 Pre-merge checks: ✅ 5 passed
Warning: There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.
🔧 Infer (1.2.0)
src/unicode/flb_utf8_and_win.c
src/unicode/flb_utf8_and_win.c:20:10: fatal error: 'fluent-bit/flb_log.h' file not found ... [truncated 744 characters] ... inux-x86_64-v1.2.0/lib/infer/facebook-clang-plugins/clang/install/lib/clang/18/include"
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/unicode/flb_utf8_and_win.c`:
- Line 121: Update the CP866 converter's max UTF-8 byte width to 3 by setting
the win866_converter's .max_width field to 3 (replacing the previous 2) to
prevent buffer overflows for U+2500–U+257F box-drawing and similar characters;
ensure this matches the other Windows codepage converters (e.g., the converters
defined near the .max_width entries at lines with win1250_converter,
win1251_converter, etc.), run the unicode conversion tests, and adjust any
related buffer allocation logic/comments to reflect the 3-byte UTF-8 maximum.
- Line 121: Add targeted CP866 (Win866) box-drawing coverage: extend
tests/internal/unicode.c to include test vectors covering CP866 bytes 0xB0–0xDF
mapped to U+2500+ (box-drawing) and assert correct UTF-8 conversion and
round-trip via the conversion routines used in src/unicode/flb_utf8_and_win.c
(locate the encoder/decoder functions in that file and any helpers that
reference the .max_width setting). Update
tests/runtime/data/tail/generate_generic_encoder_testing_data.py to emit the
same CP866 sequences so CI-generated test data includes box-drawing chars. After
adding tests, run Valgrind (or equivalent) against the new test suite focusing
on the conversion path to ensure no memory corruption/leaks and fix any issues
found; consider preparing the change as a backport candidate for Windows
codepage conversion safety.
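The test coverage requested above could be sketched as a small table of CP866 bytes paired with their expected UTF-8 output. This is only an illustration: `cp866_to_utf8_fn`, `check_cp866_vectors`, and `toy_cp866_to_utf8` are hypothetical names, not functions from `src/unicode/flb_utf8_and_win.c`, and the toy converter covers just the four bytes in the table. A real test in `tests/internal/unicode.c` would call the project's own conversion routines instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One test vector: a CP866 byte and the UTF-8 bytes it should decode to. */
struct cp866_vector {
    unsigned char cp866;
    const char *utf8;
};

/* Box-drawing and shading characters from the CP866 0xB0-0xDF range. */
static const struct cp866_vector cp866_box_vectors[] = {
    { 0xB0, "\xE2\x96\x91" }, /* U+2591 LIGHT SHADE */
    { 0xB3, "\xE2\x94\x82" }, /* U+2502 BOX DRAWINGS LIGHT VERTICAL */
    { 0xC4, "\xE2\x94\x80" }, /* U+2500 BOX DRAWINGS LIGHT HORIZONTAL */
    { 0xDB, "\xE2\x96\x88" }, /* U+2588 FULL BLOCK */
};

/* Hypothetical converter shape: writes the UTF-8 form of one CP866 byte
 * into out and returns the number of bytes written (0 if unmapped). */
typedef size_t (*cp866_to_utf8_fn)(unsigned char in, unsigned char out[4]);

/* Returns 0 if every vector converts to the expected bytes, -1 otherwise. */
static int check_cp866_vectors(cp866_to_utf8_fn convert)
{
    size_t count = sizeof(cp866_box_vectors) / sizeof(cp866_box_vectors[0]);
    size_t i;

    for (i = 0; i < count; i++) {
        unsigned char out[4] = { 0 };
        size_t expected = strlen(cp866_box_vectors[i].utf8);
        size_t n = convert(cp866_box_vectors[i].cp866, out);

        if (n != expected || memcmp(out, cp866_box_vectors[i].utf8, n) != 0) {
            return -1;
        }
    }
    return 0;
}

/* Stand-in converter for illustration only; it maps just the table bytes. */
static size_t toy_cp866_to_utf8(unsigned char in, unsigned char out[4])
{
    uint32_t cp;

    switch (in) {
    case 0xB0: cp = 0x2591; break;
    case 0xB3: cp = 0x2502; break;
    case 0xC4: cp = 0x2500; break;
    case 0xDB: cp = 0x2588; break;
    default:   return 0;
    }

    /* All four code points lie in U+0800..U+FFFF, so they take 3 bytes. */
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}
```

Every expected string in the table is three bytes long, which is the case a 2-byte maximum width would under-allocate for.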
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 54ef58ad-51b6-4bc6-bc59-c58f92d0c45c
📒 Files selected for processing (1)
src/unicode/flb_utf8_and_win.c
CP866 should be handled with a maximum UTF-8 width of 3 bytes per character.
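To see why 3 is the right bound, here is a minimal sketch that computes how many bytes UTF-8 needs for a given code point. The helper names `utf8_encoded_len` and `worst_case_utf8_size` are hypothetical and are not part of Fluent Bit, and the `+ 1` for a terminating NUL is an assumption about how an output buffer might be sized. Cyrillic letters such as U+0410 fit in 2 bytes, but CP866 also maps bytes such as 0xC4 to box-drawing characters like U+2500, which need 3.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes UTF-8 needs for one code point (0 if not encodable). */
static size_t utf8_encoded_len(uint32_t cp)
{
    if (cp < 0x80) {
        return 1;
    }
    if (cp < 0x800) {
        return 2;                 /* covers Cyrillic, U+0400..U+04FF */
    }
    if (cp >= 0xD800 && cp <= 0xDFFF) {
        return 0;                 /* surrogates are not valid scalar values */
    }
    if (cp < 0x10000) {
        return 3;                 /* covers box drawing, U+2500..U+257F */
    }
    if (cp <= 0x10FFFF) {
        return 4;
    }
    return 0;
}

/* Hypothetical worst-case output size for a single-byte codepage input:
 * each input byte may expand to max_width bytes, plus one byte for NUL. */
static size_t worst_case_utf8_size(size_t input_len, size_t max_width)
{
    return input_len * max_width + 1;
}
```

With a width of 2, a 4-byte CP866 input made entirely of box-drawing characters would get a 9-byte buffer under this sizing rule but needs 13.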
Enter `[N/A]` in the box, if an item is not applicable to your change.
Testing
Before we can approve your change; please submit the following in a comment:
If this is a change to packaging of containers or native binaries then please confirm it works for all targets.
`ok-package-test` label to test for all targets (requires maintainer to do).
Documentation
Backporting
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.