fix: detect binary files via NUL bytes#85
Conversation
Scan for NUL bytes and validate UTF-8 while reading so mixed text/binary files are skipped. Drops the golang.org/x/tools dependency. Fixes dolph#9
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 10c587177f
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if !utf8.Valid(chunk[:readN]) { | ||
| return "", nil |
There was a problem hiding this comment.
Preserve UTF-8 state across read chunks
When a valid UTF-8 file has a multi-byte rune split across a 32 KiB read boundary after the initial 1 KiB probe, this per-chunk utf8.Valid check returns false because neither half is valid alone, causing Read to return empty and ReplaceContents to silently skip the text file. For example, a file with 1024 ASCII bytes, then 32767 ASCII bytes, then é followed by the search string will no longer be rewritten even though it is valid UTF-8; validate across chunk boundaries or after accumulating the bytes instead.
Useful? React with 👍 / 👎.
Fixes #9
Made with Cursor