Improve image reading with buffer validation #240
hert1zm wants to merge 2 commits into PFCCLab:main from
Add buffer check for image reading to handle empty or corrupted files.
Pull request overview
Adds defensive handling in the auto-recognition worker so corrupted/undecodable images don’t crash the batch run, improving robustness when processing large image sets.
Changes:
- Validate the np.fromfile() buffer before calling cv2.imdecode().
- Log warnings and skip files when the buffer is empty or decode returns None.
    if buf.size == 0:
        logger.warning(
            "Failed to read the image's buffer. The file may be corrupted or in an unsupported format : %s",
            img_path,
        )
        self.result_dic = None
        continue
    img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
    if img is None:
        logger.warning(
            "Failed to decode image file %s. The file may be corrupted or in an unsupported format.",
            img_path,
        )
        self.result_dic = None
        continue
continue here skips the common post-processing block ("结果保存", i.e. "save results"), including findex += 1 and progressBarValue.emit(findex). This changes behavior for decode failures and empty buffers: skipped files no longer advance the progress bar, so the progress UI can finish below len_bar and the time-left estimate becomes inaccurate. Consider either letting execution fall through to the existing self.result_dic is None handling (no continue), or incrementing and emitting progress before continuing, so that every input file advances progress consistently.
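The fall-through option can be illustrated with a minimal, self-contained sketch. It is not the project's worker code: process_batch, load, recognize, and on_progress are hypothetical names introduced here, and on_progress merely stands in for progressBarValue.emit. Only findex and the one-step-per-file expectation come from the comment above.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple


def process_batch(
    paths: Iterable[str],
    load: Callable[[str], Optional[Any]],
    recognize: Callable[[Any], Optional[Any]],
    on_progress: Callable[[int], None],
) -> List[Tuple[str, Any]]:
    """Process every path; a failed load skips recognition but not progress."""
    results: List[Tuple[str, Any]] = []
    findex = 0
    for path in paths:
        result = None
        # Hypothetical loader: returns None for empty or undecodable input.
        img = load(path)
        if img is not None:
            result = recognize(img)
        if result is not None:
            results.append((path, result))
        # Shared post-processing. It is reached for every file because there
        # is no early `continue` above, so progress always advances.
        findex += 1
        on_progress(findex)
    return results
```

With three inputs of which one fails to load, on_progress is still called with 1, 2, and 3, so the bar reaches its full length even though only two results are stored.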
-   img = cv2.imdecode(
-       np.fromfile(img_path, dtype=np.uint8), cv2.IMREAD_COLOR
-   )
+   buf = np.fromfile(img_path, dtype=np.uint8)
np.fromfile(img_path, dtype=np.uint8) can raise (e.g., missing file, permission error, I/O error). Since run() catches Exception but then re-raises, these cases will still terminate the worker/app, which undermines the goal of “skip unreadable files”. Consider catching OSError/IOError around the fromfile/imdecode path, logging a warning, and continuing without re-raising.
-   buf = np.fromfile(img_path, dtype=np.uint8)
+   try:
+       buf = np.fromfile(img_path, dtype=np.uint8)
+   except (OSError, IOError) as e:
+       logger.warning(
+           "Failed to read image file %s due to an OS/I/O error: %s",
+           img_path,
+           e,
+       )
+       self.result_dic = None
+       continue
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Currently, if the image set contains even a single corrupted or unreadable file, the entire application crashes during auto-detection/auto-recognition. This patch validates the image buffer before calling cv2.imdecode(). If the buffer is empty or invalid, the application logs a warning and skips the file instead of raising an exception. This keeps the whole run from terminating unexpectedly and avoids losing progress when processing large batches of images.
Since this problem can occur in multiple places, consider building a custom utility that replaces all calls to cv2.imdecode(), validates the buffer, and prevents a crash when the image set contains corrupted or unusable images:
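One possible shape for such a utility, offered only as a sketch under stated assumptions: safe_imread and its decode parameter are hypothetical names that do not appear in the PR. The sketch also swaps np.fromfile for Path.read_bytes plus np.frombuffer, so that read errors surface as OSError and the decoder can be replaced in tests. The default decoder imports OpenCV and NumPy lazily.

```python
import logging
from pathlib import Path
from typing import Any, Callable, Optional

logger = logging.getLogger(__name__)


def _cv2_decode(data: bytes) -> Optional[Any]:
    # Imported lazily so the helper can be exercised without OpenCV installed.
    import cv2
    import numpy as np

    return cv2.imdecode(np.frombuffer(data, dtype=np.uint8), cv2.IMREAD_COLOR)


def safe_imread(
    img_path: str,
    decode: Callable[[bytes], Optional[Any]] = _cv2_decode,
) -> Optional[Any]:
    """Return the decoded image, or None (with a warning) if it is unusable."""
    try:
        data = Path(img_path).read_bytes()
    except OSError as e:  # missing file, permission error, I/O error
        logger.warning("Failed to read image file %s: %s", img_path, e)
        return None
    if not data:
        logger.warning("Image file %s is empty.", img_path)
        return None
    try:
        img = decode(data)
    except Exception as e:  # assumption: treat any decoder error as a bad file
        logger.warning("Decoder raised on image file %s: %s", img_path, e)
        return None
    if img is None:
        logger.warning(
            "Failed to decode image file %s. The file may be corrupted "
            "or in an unsupported format.",
            img_path,
        )
        return None
    return img
```

A caller would then test the return value once, for example `img = safe_imread(img_path)` followed by an `if img is None` branch that records the failure and still advances progress.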