[IMPROVEMENT] Normalize line endings and trailing whitespace in regression comparisons #1006

gaurav02081 · 2026-01-11T10:31:30Z

Please prefix your pull request with one of the following: [FEATURE] [FIX] [IMPROVEMENT].

In raising this pull request, I confirm the following (please check boxes):

I have read and understood the contributors guide.
I have checked that another pull request for this purpose does not exist.
I have considered, and confirmed that this submission will be valuable to others.
I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
I give this submission freely, and claim no ownership to its content.

My familiarity with the project is as follows (check one):

I have never used the project.
I have used the project briefly.
I have used the project extensively, but have not contributed previously.
I am an active contributor to the project.

This PR reduces false regression failures by normalizing non-semantic text differences during comparison.

CRLF and LF line endings are treated equivalently
Trailing whitespace at the end of lines does not cause diffs
Binary files are exempt from normalization
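The three rules above can be sketched as a single helper. This is a minimal illustration, not the code in this PR. The function name `normalize_for_comparison` is hypothetical, and so is the rule that any data containing a NUL byte counts as binary. The PR does not say how `TestResultFile.read_lines` detects binary files.

```python
from __future__ import annotations


def normalize_for_comparison(data: bytes) -> list[str] | bytes:
    """Normalize text output before a regression comparison.

    Hypothetical helper, not the project's actual code. Binary detection
    uses a NUL-byte heuristic, which is an assumption.
    """
    if b"\x00" in data:
        # Binary content: return it unchanged so it is compared byte for byte.
        return data
    # Decode, then turn CRLF line endings into LF.
    text = data.decode("utf-8", errors="replace").replace("\r\n", "\n")
    # Drop trailing whitespace on each line; leading whitespace still counts.
    return [line.rstrip() for line in text.split("\n")]


# CRLF plus trailing spaces compares equal to plain LF output...
assert normalize_for_comparison(b"00:00:01,000 --> 00:00:02,000  \r\nHi\r\n") == \
    normalize_for_comparison(b"00:00:01,000 --> 00:00:02,000\nHi\n")
# ...but a change in the subtitle text itself is still a difference.
assert normalize_for_comparison(b"Hi\n") != normalize_for_comparison(b"Ho\n")
```

Only trailing whitespace is stripped (`rstrip`), so indentation and spacing inside a line remain significant.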

Changes

Updated TestResultFile.read_lines to normalize text output during comparison

Added a focused unit test covering:

CRLF vs LF normalization
Trailing whitespace handling
Binary file exemption

These differences are non-semantic and can introduce noise in regression results, making diffs harder to interpret.
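A focused unit test for the three listed cases might look like the sketch below. It is hypothetical. The PR does not show the signature of `TestResultFile.read_lines`, so the sketch tests a small inline stand-in called `normalize` and assumes the same NUL-byte binary check. It also adds one case showing that leading whitespace is still significant.

```python
import unittest


def normalize(data: bytes):
    """Stand-in for the PR's normalization (hypothetical; NUL-byte binary check assumed)."""
    if b"\x00" in data:
        return data  # binary: left untouched
    text = data.decode("utf-8", errors="replace").replace("\r\n", "\n")
    return [line.rstrip() for line in text.split("\n")]


class NormalizationTest(unittest.TestCase):
    def test_crlf_equals_lf(self):
        self.assertEqual(normalize(b"line one\r\nline two\r\n"),
                         normalize(b"line one\nline two\n"))

    def test_trailing_whitespace_ignored(self):
        self.assertEqual(normalize(b"text   \t\n"), normalize(b"text\n"))

    def test_leading_whitespace_still_significant(self):
        self.assertNotEqual(normalize(b"  text\n"), normalize(b"text\n"))

    def test_binary_files_not_normalized(self):
        crlf = b"\x00\xff\r\n"
        lf = b"\x00\xff\n"
        # Binary data comes back unchanged, so CRLF vs LF is still a difference.
        self.assertEqual(normalize(crlf), crlf)
        self.assertNotEqual(normalize(crlf), normalize(lf))
```

Run it with `python -m unittest` from the directory that contains the file.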

Testing

Added unit tests (tests/test_normalization.py)

Tests pass locally

canihavesomecoffee · 2026-01-19T15:36:32Z

If we have a platform that wants to make sure that there's no changes at all to generated subtitle files, is it a good idea to start ignoring certain changes such as the ones you mention?

sonarqubecloud · 2026-01-21T16:56:04Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

gaurav02081 · 2026-01-21T17:08:03Z

I validated this locally to make sure the normalization does not hide any real subtitle changes.

I tested the behavior with a small standalone script (verify_normalization.py, not committed in this PR) that exercises TestResultFile.read_lines directly and in isolation. The script generates temporary subtitle files and compares them across several scenarios:

Identical files
CRLF vs LF line endings
Trailing whitespace differences
Actual subtitle text changes
Timestamp (timing) changes
Binary files (e.g. .jpg) to confirm normalization is skipped

The results show that only non-semantic differences (line endings and trailing whitespace) are normalized, while real content and timing changes are still detected correctly. Binary files are excluded from normalization as intended.

I am attaching a screenshot of the script's output.
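A check along the lines of the scenarios above could look like the following sketch. verify_normalization.py was not committed, so everything here is hypothetical: `read_normalized`, `files_match`, the sample subtitle data, and the NUL-byte binary heuristic. The sketch only reproduces the expected outcomes described in the comment. For the review question above, the scenarios that matter are the text and timing changes, which must still report a mismatch.

```python
import os
import tempfile


def read_normalized(path):
    """Read a file for comparison (hypothetical; mirrors the rules described in the PR)."""
    with open(path, "rb") as handle:
        data = handle.read()
    if b"\x00" in data:  # assumed binary heuristic: compare raw bytes
        return data
    text = data.decode("utf-8", errors="replace").replace("\r\n", "\n")
    return [line.rstrip() for line in text.split("\n")]


def files_match(expected: bytes, actual: bytes, suffix: str = ".srt") -> bool:
    """Write both contents to temporary files and compare their normalized forms."""
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, content in (("expected", expected), ("actual", actual)):
            path = os.path.join(tmp, name + suffix)
            with open(path, "wb") as handle:
                handle.write(content)
            paths.append(path)
        return read_normalized(paths[0]) == read_normalized(paths[1])


BASE = b"1\n00:00:01,000 --> 00:00:02,500\nHello world\n"

SCENARIOS = {
    # name: (actual output, should it still match BASE?)
    "identical": (BASE, True),
    "crlf_vs_lf": (BASE.replace(b"\n", b"\r\n"), True),
    "trailing_whitespace": (BASE.replace(b"world\n", b"world   \n"), True),
    "text_change": (BASE.replace(b"world", b"there"), False),
    "timing_change": (BASE.replace(b"02,500", b"02,600"), False),
}

for name, (actual, should_match) in SCENARIOS.items():
    result = files_match(BASE, actual)
    status = "OK" if result == should_match else "UNEXPECTED"
    print(f"{name}: match={result} ({status})")

# Binary scenario: a CRLF/LF difference inside binary data must not be normalized away.
jpg_a = b"\xff\xd8\x00\r\n\xff\xd9"
jpg_b = b"\xff\xd8\x00\n\xff\xd9"
print(f"binary_crlf_vs_lf: match={files_match(jpg_a, jpg_b, suffix='.jpg')}")
```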

Normalize line endings and trailing whitespace in regression comparisons

373b62c

gaurav02081 requested review from canihavesomecoffee and thealphadollar as code owners January 11, 2026 10:31

Merge branch 'master' into fix/regression-normalization

dce6845

gaurav02081 changed the title ~~Normalize line endings and trailing whitespace in regression comparisons~~ [IMPROVEMENT] Normalize line endings and trailing whitespace in regression comparisons Jan 11, 2026

Merge branch 'master' into fix/regression-normalization

cd70dbf
