
Fix byte/character offset confusion in formatter for multi-byte UTF-8 #105

Merged
coder3101 merged 4 commits into main from copilot/fix-empty-comment-line-issue
Feb 2, 2026


Copilot AI commented Feb 2, 2026

Formatting proto files containing multi-byte UTF-8 characters (Cyrillic, etc.) was non-idempotent: every format operation added another empty // comment line.

Root Cause

offset_to_position in src/formatter/clang.rs converted clang-format's byte offsets to LSP positions using byte arithmetic:

let character = offset - last_newline;  // treats byte offset as character offset

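This breaks for multi-byte UTF-8. Each Cyrillic letter takes two bytes in UTF-8 but only one UTF-16 code unit, which is the unit LSP positions use by default. Example: byte offset 134 in Cyrillic text was converted to character 119, when the correct value is 77 UTF-16 code units.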
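The arithmetic can be reproduced on the snippet from the issue. In that text, byte 134 falls on the space right after "и", which lines up with the 119 vs 77 figures above. The sketch below additionally assumes that clang-format breaks the comment by replacing that space with a newline plus a fresh "// " prefix, and that the editor clamps a column past the end of a line to the line length (as the LSP specification prescribes). The helpers utf16_col_to_byte and apply_edit are hypothetical illustrations, not protols or Neovim code.

```rust
/// The proto snippet from the issue; line 1 holds the Cyrillic comment.
const TEXT: &str = "message Test {\n  // Обратная совместимость: если true, применяет фильтры enabled_not_false и removed_not_true.\n  int32 x = 1;\n}\n";

/// Old behaviour: column = raw byte distance from the start of the line,
/// i.e. the `offset - last_newline` arithmetic.
fn byte_column(text: &str, offset: usize) -> usize {
    let line_start = text[..offset].rfind('\n').map_or(0, |i| i + 1);
    offset - line_start
}

/// Fixed behaviour: column = UTF-16 code units from the start of the line.
fn utf16_column(text: &str, offset: usize) -> usize {
    let line_start = text[..offset].rfind('\n').map_or(0, |i| i + 1);
    text[line_start..offset].encode_utf16().count()
}

/// Hypothetical client-side conversion of a UTF-16 column back to a byte index
/// in `line`. Columns past the end are clamped to the line length (assumption).
fn utf16_col_to_byte(line: &str, col: usize) -> usize {
    let mut units = 0;
    for (byte_idx, ch) in line.char_indices() {
        if units >= col {
            return byte_idx;
        }
        units += ch.len_utf16();
    }
    line.len()
}

/// Hypothetical helper: replace UTF-16 columns [start_col, end_col) of `line`.
fn apply_edit(line: &str, start_col: usize, end_col: usize, new_text: &str) -> String {
    let start = utf16_col_to_byte(line, start_col);
    let end = utf16_col_to_byte(line, end_col);
    format!("{}{}{}", &line[..start], new_text, &line[end..])
}

fn main() {
    // Assumed clang-format replacement: the single space at byte 134 (right
    // after "и") becomes a line break plus a new "// " comment prefix.
    let (offset, len, new_text) = (134, 1, "\n  // ");
    let line = TEXT.lines().nth(1).unwrap();

    let (b0, b1) = (byte_column(TEXT, offset), byte_column(TEXT, offset + len));
    let (u0, u1) = (utf16_column(TEXT, offset), utf16_column(TEXT, offset + len));
    println!("byte columns {b0}..{b1}, UTF-16 columns {u0}..{u1}, line length {}",
             line.encode_utf16().count());

    // Byte columns overshoot the line, are clamped to its end, and the edit
    // appends an empty "// " line instead of wrapping the comment.
    println!("buggy:\n{}\n", apply_edit(line, b0, b1, new_text));
    // UTF-16 columns hit the intended space and wrap the comment.
    println!("fixed:\n{}", apply_edit(line, u0, u1, new_text));
}
```

If the comment line still exceeds clang-format's column limit after the bad edit, the next run requests the same break again, which would explain why each format adds one more empty comment line.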

Changes

  • Fixed offset calculation: count UTF-16 code units from the last newline instead of using byte arithmetic

    let text_after_newline = &up_to_offset[last_newline..];
    let character = text_after_newline.encode_utf16().count();
  • Added tests: test_offset_to_position_cyrillic (unit) and test_textedit_from_clang_output_cyrillic (integration) with multi-byte UTF-8 input
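Assembled into one function, the fixed conversion might look like the following. This is a sketch, not the code merged in this PR: the Position struct is a stand-in for lsp_types::Position, the (offset, content) signature is assumed, last_newline is assumed to point just past the newline, and the guard against out-of-range or mid-character offsets is an addition of this sketch.

```rust
/// Stand-in for an LSP position (lsp_types::Position has the same two fields).
#[derive(Debug, PartialEq)]
struct Position {
    line: u32,
    character: u32,
}

/// Convert a clang-format byte offset into an LSP position whose `character`
/// is measured in UTF-16 code units (the LSP default position encoding).
/// Returns None if the offset is out of range or splits a UTF-8 character
/// (that guard is an assumption of this sketch).
fn offset_to_position(offset: usize, content: &str) -> Option<Position> {
    if offset > content.len() || !content.is_char_boundary(offset) {
        return None;
    }
    let up_to_offset = &content[..offset];
    let line = up_to_offset.matches('\n').count();
    // Byte index just past the last newline, i.e. where the current line starts.
    let last_newline = up_to_offset.rfind('\n').map_or(0, |i| i + 1);
    let text_after_newline = &up_to_offset[last_newline..];
    let character = text_after_newline.encode_utf16().count();
    Some(Position { line: line as u32, character: character as u32 })
}

/// The proto snippet from the issue; line 1 holds the Cyrillic comment.
const TEXT: &str = "message Test {\n  // Обратная совместимость: если true, применяет фильтры enabled_not_false и removed_not_true.\n  int32 x = 1;\n}\n";

fn main() {
    // Byte 134 is the space right after "и" on the comment line.
    println!("{:?}", offset_to_position(134, TEXT)); // Some(Position { line: 1, character: 77 })
}
```

Counting with encode_utf16 also handles characters outside the Basic Multilingual Plane: an emoji is four UTF-8 bytes but two UTF-16 code units, so it advances the column by two.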

Original prompt

This section details the original issue you should resolve

<issue_title>Formatting inserts a new empty // comment line every time (non-idempotent formatting)</issue_title>
<issue_description>Hi! I happened to stumble upon this tricky formatting bug.

What happened

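I have this example (the comment is Russian for "Backward compatibility: if true, applies the enabled_not_false and removed_not_true filters"):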

message Test {
  // Обратная совместимость: если true, применяет фильтры enabled_not_false и removed_not_true.
  int32 x = 1;
}

When formatting is applied, it seems to try to split the comment in two and spread it across two lines, but in the end it just adds a new line with an empty comment:

message Test {
  // Обратная совместимость: если true, применяет фильтры enabled_not_false и removed_not_true.
  // 
  int32 x = 1;
}

Each further format just adds more empty comment lines:

message Test {
  // Обратная совместимость: если true, применяет фильтры enabled_not_false и removed_not_true.
  // 
  // 
  // 
  int32 x = 1;
}

Environment

  • OS: Linux fedora 42
  • Neovim: NVIM v0.11.1
  • protols: 0.13.2
  • clang-format: clang-format version 20.1.8 (Fedora 20.1.8-4.fc42)
  • Formatting trigger: Neovim LSP (vim.lsp.buf.format()), also happens on :w (format-on-save enabled)

Video Example

https://github.com/user-attachments/assets/f33be03e-78e6-45db-8c83-89ae15a31d0b
</issue_description>

Comments on the Issue (you are @copilot in this section)



Copilot AI and others added 3 commits February 2, 2026 15:34
Co-authored-by: asharkhan3101 <140482588+asharkhan3101@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Fix formatting bug creating empty comment lines" to "Fix byte/character offset confusion in formatter for multi-byte UTF-8" Feb 2, 2026
Copilot AI requested a review from asharkhan3101 February 2, 2026 15:41
asharkhan3101 marked this pull request as ready for review February 2, 2026 15:49
coder3101 merged commit 8bc1877 into main Feb 2, 2026
1 check passed
coder3101 deleted the copilot/fix-empty-comment-line-issue branch February 2, 2026 16:34


Development

Successfully merging this pull request may close these issues.

Formatting inserts a new empty // comment line every time (non-idempotent formatting)
