Improve image reading with buffer validation by hert1zm · Pull Request #240 · PFCCLab/PPOCRLabel

hert1zm · 2026-02-18T11:31:43Z

Currently, if the image set contains even a single corrupted or unreadable file, the entire application crashes during auto-detection/auto-recognition. This patch introduces a validation step on the image buffer before calling cv2.imdecode(). If the buffer is empty or invalid, the application logs a warning and skips the file instead of raising an exception. This prevents the full process from terminating unexpectedly and avoids losing progress when processing large batches of images.

Since this problem can occur in several places, consider adding a small utility that replaces every call to cv2.imdecode(), validates the buffer first, and prevents a crash when the image set contains corrupted or unusable images:

import logging

import cv2
import numpy as np

logger = logging.getLogger(__name__)  # assumption: use the project's existing logger if there is one

def imread(path):
    """Read an image from path; return None if the file is empty or cannot be decoded."""
    buf = np.fromfile(path, dtype=np.uint8)
    if buf.size == 0:
        logger.warning(
            "Failed to read the image's buffer. The file may be corrupted or in an unsupported format: %s",
            path,
        )
        return None
    return cv2.imdecode(buf, cv2.IMREAD_COLOR)

Add buffer check for image reading to handle empty or corrupted files.

Copilot

Pull request overview

Adds defensive handling in the auto-recognition worker so corrupted/undecodable images don’t crash the batch run, improving robustness when processing large image sets.

Changes:

Validate the np.fromfile() buffer before calling cv2.imdecode().
Log warnings and skip files when the buffer is empty or decode returns None.


Copilot · 2026-03-22T23:35:00Z

libs/autoDialog.py

+                        if buf.size == 0:
+                            logger.warning(
+                                "Failed to read the image's buffer. The file may be corrupted or in an unsupported format : %s", 
+                                img_path,
+                            )
+                            self.result_dic = None
+                            continue
+                        img = cv2.imdecode(buf, cv2.IMREAD_COLOR)
                        if img is None:
                            logger.warning(
                                "Failed to decode image file %s. The file may be corrupted or in an unsupported format.",
                                img_path,
                            )
                            self.result_dic = None
+                            continue


continue here skips the common post-processing block (marked "结果保存", i.e. "save results"), including findex += 1 and progressBarValue.emit(findex). This changes behavior for decode failures and empty buffers: skipped files no longer advance the progress bar, so the progress UI can finish below len_bar and the time-left estimate becomes inaccurate. Consider letting execution fall through to the existing self.result_dic is None handling (no continue), or incrementing and emitting progress before continue, so that every input file advances progress consistently.
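One way to keep progress consistent is to put the counter update in a `finally` clause, which runs even when the loop body exits through `continue`. The sketch below is an illustration under assumptions, not the worker's actual code: `process_batch`, `load`, and `on_progress` are hypothetical names, with `on_progress` standing in for `progressBarValue.emit`.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def process_batch(
    paths: Iterable[str],
    load: Callable[[str], Optional[object]],
    on_progress: Callable[[int], None],
) -> List[Tuple[str, object]]:
    """Process every path; skipped files still advance the progress counter."""
    results = []
    findex = 0
    for path in paths:
        try:
            img = load(path)
            if img is None:
                # Unreadable file: skip the work, but not the progress update.
                continue
            results.append((path, img))
        finally:
            # Executed on normal completion and on `continue` alike.
            findex += 1
            on_progress(findex)
    return results
```

With this shape the progress callback fires exactly once per input file, so the bar reaches the total count even when some files are skipped.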

Copilot · 2026-03-22T23:35:01Z

libs/autoDialog.py

-                        img = cv2.imdecode(
-                            np.fromfile(img_path, dtype=np.uint8), cv2.IMREAD_COLOR
-                        )
+                        buf = np.fromfile(img_path, dtype=np.uint8)


np.fromfile(img_path, dtype=np.uint8) can raise (e.g., missing file, permission error, I/O error). Since run() catches Exception but then re-raises, these cases will still terminate the worker/app, which undermines the goal of “skip unreadable files”. Consider catching OSError/IOError around the fromfile/imdecode path, logging a warning, and continuing without re-raising.

Suggested change

-                        buf = np.fromfile(img_path, dtype=np.uint8)
+                        try:
+                            buf = np.fromfile(img_path, dtype=np.uint8)
+                        except (OSError, IOError) as e:
+                            logger.warning(
+                                "Failed to read image file %s due to an OS/I/O error: %s",
+                                img_path,
+                                e,
+                            )
+                            self.result_dic = None
+                            continue
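The same idea can be expressed as a small reader function, shown here as a sketch under assumptions: `read_buffer_or_none` and the module-level `logger` are hypothetical names, not part of the PR. In Python 3, `IOError` is an alias of `OSError`, so catching `OSError` alone covers both. The returned buffer would then be passed to `cv2.imdecode()` by the caller.

```python
import logging
from typing import Optional

import numpy as np

logger = logging.getLogger(__name__)


def read_buffer_or_none(img_path: str) -> Optional[np.ndarray]:
    """Read a file into a uint8 buffer; return None if it is unreadable or empty."""
    try:
        buf = np.fromfile(img_path, dtype=np.uint8)
    except OSError as e:
        # Covers missing files, permission errors, and other I/O failures.
        logger.warning("Failed to read image file %s: %s", img_path, e)
        return None
    if buf.size == 0:
        logger.warning("Image file %s is empty.", img_path)
        return None
    return buf
```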


Improve image reading with buffer validation

ec0aeba

Add buffer check for image reading to handle empty or corrupted files.

PFCCLab locked as resolved and limited conversation to collaborators Mar 22, 2026

PFCCLab unlocked this conversation Mar 22, 2026

GreatV marked this pull request as draft March 22, 2026 23:32

GreatV marked this pull request as ready for review March 22, 2026 23:33

Copilot AI review requested due to automatic review settings March 22, 2026 23:33

Copilot started reviewing on behalf of GreatV March 22, 2026 23:33

Copilot AI reviewed Mar 22, 2026


Apply suggestion from @Copilot

35927b8

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

hert1zm wants to merge 2 commits into PFCCLab:main from hert1zm:patch-1
