
@gaurav02081 gaurav02081 commented Jan 11, 2026

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

  • I have never used the project.
  • I have used the project briefly.
  • I have used the project extensively, but have not contributed previously.
  • I am an active contributor to the project.

This PR reduces false regression failures by normalizing non-semantic text differences during comparison.

  • CRLF and LF line endings are treated as equivalent.
  • Trailing whitespace at the end of a line does not cause a diff.
  • Binary files are exempt from normalization.
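
The three rules above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `BINARY_EXTENSIONS`, `is_binary`, and `normalize_line` are hypothetical names, the extension-based binary check is a guess at one possible approach, and the standalone `read_lines` here is a stand-in for `TestResultFile.read_lines`.

```python
import os

# Hypothetical set of extensions treated as binary; the PR's real check may differ.
BINARY_EXTENSIONS = {".jpg", ".png", ".bin"}


def is_binary(filename: str) -> bool:
    """Guess from the extension whether a file should skip normalization."""
    return os.path.splitext(filename)[1].lower() in BINARY_EXTENSIONS


def normalize_line(line: str) -> str:
    """Map CRLF/LF to LF and drop trailing spaces and tabs."""
    body = line.rstrip("\r\n")
    had_newline = body != line
    return body.rstrip(" \t") + ("\n" if had_newline else "")


def read_lines(path: str) -> list:
    """Stand-in for TestResultFile.read_lines: normalize text, leave binary as-is."""
    if is_binary(path):
        with open(path, "rb") as handle:
            return handle.readlines()
    # newline="" keeps the original terminators so the normalization is explicit.
    with open(path, "r", newline="", encoding="utf-8") as handle:
        return [normalize_line(line) for line in handle]
```

In this sketch a final line without a terminator keeps that property, so a missing newline at end of file would still show up as a difference.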

Changes

Updated TestResultFile.read_lines to normalize text output during comparison

Added a focused unit test covering:

  • CRLF vs LF normalization
  • Trailing whitespace handling
  • Binary file exemption
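
A unit test covering these three cases might look something like the sketch below. The `normalized` helper is a hypothetical stand-in working on in-memory bytes, not the real `TestResultFile.read_lines`, and the test names are illustrative rather than taken from `tests/test_normalization.py`.

```python
import unittest


def normalized(data: bytes, binary: bool = False) -> list:
    """Hypothetical stand-in for the comparison input; not the PR's implementation."""
    if binary:
        # Binary content is passed through untouched.
        return [data]
    # splitlines() drops CRLF and LF terminators; rstrip() drops trailing whitespace.
    return [line.rstrip() for line in data.decode("utf-8").splitlines()]


class NormalizationTest(unittest.TestCase):
    def test_crlf_equals_lf(self):
        self.assertEqual(normalized(b"a\r\nb\r\n"), normalized(b"a\nb\n"))

    def test_trailing_whitespace_ignored(self):
        self.assertEqual(normalized(b"a  \nb\t\n"), normalized(b"a\nb\n"))

    def test_binary_untouched(self):
        self.assertNotEqual(
            normalized(b"\xff\r\n", binary=True),
            normalized(b"\xff\n", binary=True),
        )
```

Saved as a module, this can be run with `python -m unittest`.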

These differences are non-semantic and can introduce noise in regression results, making diffs harder to interpret.

Testing

Added unit tests (tests/test_normalization.py)

Tests pass locally

image

@gaurav02081 changed the title from "Normalize line endings and trailing whitespace in regression comparisons" to "[IMPROVEMENT] Normalize line endings and trailing whitespace in regression comparisons" on Jan 11, 2026
@canihavesomecoffee (Member) commented

If we have a platform that wants to make sure there are no changes at all to the generated subtitle files, is it a good idea to start ignoring certain changes, such as the ones you mention?

@gaurav02081 (Author) commented

I went ahead and validated this locally to make sure the normalization doesn't hide any real subtitle changes.

I tested the behavior using a small standalone script (verify_normalization.py, not committed in this PR) that exercises TestResultFile.read_lines directly, in isolation. The script generates temporary subtitle files and compares them across several scenarios:

  • Identical files

  • CRLF vs LF line endings

  • Trailing whitespace differences

  • Actual subtitle text changes

  • Timestamp (timing) changes

  • Binary files (e.g. .jpg) to confirm normalization is skipped

The results show that only non-semantic differences (line endings and trailing whitespace) are normalized, while real content and timing changes are still detected correctly. Binary files are excluded from normalization as intended.
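
For readers without access to the script, the text scenarios could be reproduced with something like the sketch below. It is not verify_normalization.py itself: `normalize`, `BASE`, `SCENARIOS`, and `run_scenarios` are hypothetical names, the SRT-style sample is made up for illustration, and the binary-file case is left out because it depends on how the real code detects binary files.

```python
def normalize(text: str) -> list:
    """Hypothetical normalization: unify line endings, drop trailing spaces and tabs."""
    return [line.rstrip(" \t") for line in text.replace("\r\n", "\n").split("\n")]


# Made-up single-cue subtitle sample used as the reference output.
BASE = "1\n00:00:01,000 --> 00:00:02,000\nHello world\n"

# Scenario name -> (candidate text, whether it should compare equal to BASE).
SCENARIOS = {
    "identical": (BASE, True),
    "crlf": (BASE.replace("\n", "\r\n"), True),
    "trailing whitespace": (BASE.replace("world\n", "world  \n"), True),
    "text change": (BASE.replace("Hello", "Hullo"), False),
    "timing change": (BASE.replace("02,000", "02,500"), False),
}


def run_scenarios() -> dict:
    """Map each scenario name to whether it equals BASE after normalization."""
    return {
        name: normalize(candidate) == normalize(BASE)
        for name, (candidate, _expected) in SCENARIOS.items()
    }
```

Comparing `run_scenarios()` against the expected flags in `SCENARIOS` shows whether only line endings and trailing whitespace are being ignored.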

I am attaching an image of the script's output.

image
