
fix: detect binary files via NUL bytes #85

Open

leno23 wants to merge 1 commit into dolph:main from leno23:fix/binary-nul-detection-issue-9-v2

leno23 commented May 30, 2026

Fixes #9

Made with Cursor

Scan for NUL bytes and validate UTF-8 while reading so mixed text/binary
files are skipped. Drops the golang.org/x/tools dependency.

chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 10c587177f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you:

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on file_handling.go, lines +99 to +100:

```go
if !utf8.Valid(chunk[:readN]) {
	return "", nil
```

P2: Preserve UTF-8 state across read chunks

When a valid UTF-8 file has a multi-byte rune split across a 32 KiB read boundary after the initial 1 KiB probe, this per-chunk utf8.Valid check returns false because neither half of the rune is valid on its own. Read then returns an empty string, and ReplaceContents silently skips the text file. For example, a file containing 1024 ASCII bytes, then 32767 ASCII bytes, then é followed by the search string will no longer be rewritten, even though it is valid UTF-8. Validate across chunk boundaries, or validate once after accumulating the bytes, instead.

Development

Successfully merging this pull request may close these issues.

Binary detection samples only the first 1024 bytes; mixed files are corrupted