[FEATURE] Allow output \0 terminated frames (for WebSocket streaming support) by pszemus · Pull Request #2105 · CCExtractor/ccextractor

pszemus · 2026-02-10T15:57:06Z

In raising this pull request, I confirm the following (please check boxes):

I have read and understood the contributors guide.
I have checked that another pull request for this purpose does not exist.
I have considered, and confirmed that this submission will be valuable to others.
I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
I give this submission freely, and claim no ownership to its content.
I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

I have never used CCExtractor.
I have used CCExtractor just a couple of times.
I absolutely love CCExtractor, but have not contributed previously.
I am an active contributor to CCExtractor.

When streaming subtitles (particularly DVBSUB) from ccextractor to WebSocket endpoints via tools like websocat, multi-line subtitles cause issues. Each line is sent as a separate message, resulting in only the last line being visible at the receiving end.

For example, using the following pipeline:

ccextractor --udp <src_stream_address> --codec dvbsub --out=txt --stdout --forceflush | websocat ws://<endpoint-uri>

multi-line subtitle frames are sent line-by-line, losing all but the final line.

This PR introduces the --null-terminated option, which appends a null character (\0) as a frame delimiter after each complete subtitle frame (whether single or multi-line). This enables proper frame boundaries for streaming scenarios.

Then, it'll be possible to create the following pipeline:

ccextractor --udp <src_stream_address> --codec dvbsub --out=txt --null-terminated --stdout --forceflush | websocat -0 ws://<endpoint-uri>

With this change, websocat's -0 flag can properly parse complete subtitle frames using the null delimiter (see websocat documentation).
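
For illustration, a consumer of such a stream splits on the NUL byte instead of on newlines. The following is a minimal sketch in C, assuming the received bytes are already in a buffer; `for_each_frame`, `frame_cb`, `frame_stats`, and `count_frame` are hypothetical names for this example and are not part of CCExtractor or websocat.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback type: receives one complete frame (without the '\0'). */
typedef void (*frame_cb)(const char *frame, size_t len, void *user);

/* Hypothetical accumulator used by the example callback below. */
struct frame_stats
{
	size_t count;   /* number of complete frames seen */
	size_t longest; /* length of the longest frame in bytes */
};

/* Example callback: counts frames and tracks the longest one. */
void count_frame(const char *frame, size_t len, void *user)
{
	struct frame_stats *st = user;
	(void)frame;
	st->count++;
	if (len > st->longest)
		st->longest = len;
}

/* Split buf[0..len) on '\0' and call cb once per complete frame.
 * Returns the number of bytes consumed. Trailing bytes that are not
 * yet followed by '\0' form an incomplete frame and stay unconsumed. */
size_t for_each_frame(const char *buf, size_t len, frame_cb cb, void *user)
{
	size_t start = 0;
	assert(cb != NULL);
	while (start < len)
	{
		const char *end = memchr(buf + start, '\0', len - start);
		if (end == NULL)
			break; /* incomplete frame: wait for more data */
		cb(buf + start, (size_t)(end - (buf + start)), user);
		start = (size_t)(end - buf) + 1; /* skip the delimiter */
	}
	return start;
}
```

Line breaks inside a frame are passed through untouched, so a multi-line subtitle reaches the callback as a single message.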

Benefits:

Enables reliable WebSocket streaming of subtitles without data loss
Maintains backward compatibility (opt-in feature)
Follows established patterns for null-terminated stream processing
Simple, focused change that solves a real-world use case

Please compare the two output files below: with --null-terminated enabled, the newlines inside multi-line subtitles are preserved and every frame ends with \0.

--out=webvtt:
ccextractor_webvtt.txt
--out=txt --null-terminated:
ccextractor_txt_null-terminated.txt

cfsmp3

Good feature with a clear real-world use case. The implementation is clean and properly wired through both C and Rust. However, the --null-terminated flag currently only works for DVB bitmap subtitles, not for text-based captions (CEA-608/708). This needs to be fixed before merging.

The problem

In src/lib_ccx/ccx_encoders_transcript.c, you replaced encoded_crlf with encoded_end_frame in only one place — the bitmap subtitle path at line 92:

// write_cc_bitmap_as_transcript() — line 92 — ✅ changed
write_wrapped(context->out->fh, context->encoded_end_frame, context->encoded_end_frame_length);

But the text subtitle path (write_cc_buffer_as_transcript) still uses encoded_crlf in two write() calls that also need updating:

// Line 206 — ❌ not changed (end of each subtitle line)
ret = write(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);

// Line 328 — ❌ not changed (end of each subtitle block)
ret = write(context->out->fh, context->encoded_crlf, context->encoded_crlf_length);

There are also lines 77 and 90, where encoded_crlf is used for parsing/splitting tokens. Those should probably stay as-is, since they detect line breaks within the input rather than write output.

How to verify

I tested with a CEA-608 stream:

./ccextractor input.ts --txt --stdout --null-terminated 2>/dev/null | xxd | head -30

The output contains only 0d 0a (CRLF) — zero null bytes. The flag has no effect for text-based captions.

What to fix

In src/lib_ccx/ccx_encoders_transcript.c, replace encoded_crlf with encoded_end_frame on lines 206 and 328 (the two write() calls in write_cc_buffer_as_transcript). Leave lines 77 and 90 alone — those are input parsing, not output.

Note: you'll also need to update the ret < context->encoded_crlf_length comparisons on lines 207 and 329 to use encoded_end_frame_length accordingly.
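
To illustrate why the length comparison matters: if the terminator is a single \0 byte while the CRLF sequence is two bytes, a fully successful one-byte write would still look "short" when compared against encoded_crlf_length. The sketch below assumes a POSIX environment; `struct frame_end` and `write_end_frame` are hypothetical stand-ins that only borrow the field names mentioned in this review, and they do not reproduce the real encoder context.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in for the two encoder context fields named above. */
struct frame_end
{
	const unsigned char *encoded_end_frame;
	size_t encoded_end_frame_length;
};

/* Write the frame terminator to fd.
 * Returns 0 on success, -1 on error or short write. */
int write_end_frame(int fd, const struct frame_end *ctx)
{
	ssize_t ret;
	assert(ctx != NULL);
	ret = write(fd, ctx->encoded_end_frame, ctx->encoded_end_frame_length);
	/* Compare against the length of the bytes actually requested,
	 * not against the length of some other sequence such as CRLF. */
	if (ret < 0 || (size_t)ret < ctx->encoded_end_frame_length)
		return -1;
	return 0;
}
```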

pszemus · 2026-02-16T13:35:00Z

Thanks @cfsmp3, I've fixed the missing code paths.
With my test file, setting --null-terminated now changes the output from:

00000000: 5745 4c4c 2c20 4920 4755 4553 5320 594f  WELL, I GUESS YO
00000010: 5520 434f 554c 4420 5341 5920 5448 4154  U COULD SAY THAT
00000020: 0d0a 4920 4341 5245 2e2e 2e42 4543 4155  ..I CARE...BECAU
00000030: 5345 2049 2042 524f 5547 4854 2059 4f55  SE I BROUGHT YOU
00000040: 0d0a 494e 544f 2054 4849 5320 574f 524c  ..INTO THIS WORL
00000050: 442e 0d0a

to:

00000000: 5745 4c4c 2c20 4920 4755 4553 5320 594f  WELL, I GUESS YO
00000010: 5520 434f 554c 4420 5341 5920 5448 4154  U COULD SAY THAT
00000020: 0049 2043 4152 452e 2e2e 4245 4341 5553  .I CARE...BECAUS
00000030: 4520 4920 4252 4f55 4748 5420 594f 5500  E I BROUGHT YOU.
00000040: 494e 544f 2054 4849 5320 574f 524c 442e  INTO THIS WORLD.
00000050: 00

cfsmp3

Thanks for addressing the previous feedback — the C paths all work now. However there's still one path that doesn't respect --null-terminated:

CEA-708 via the Rust decoder — src/rust/src/decoder/tv_screen.rs:353 hardcodes \r\n:

writer.write_to_file(b"\r\n")?;

This means --null-terminated has no effect on CEA-708 transcript output. You can verify:

ccextractor input.ts --txt -o /tmp/test.txt --null-terminated -svc 1
xxd /tmp/test.p1.svc01.txt | head -20
# No null bytes — only 0d 0a

The frame_terminator_0 option needs to be plumbed into the Rust Writer struct so that write_transcript can use it instead of the hardcoded \r\n.

SuvidhJ · 2026-03-02T18:46:12Z

Hi @pszemus, I noticed the latest review feedback about plumbing frame_terminator_0 into the Rust Writer struct for CEA-708 support. I'd be happy to help with this if you'd like, just let me know!

pszemus · 2026-03-10T11:41:03Z

@cfsmp3 Thanks! I've made the necessary changes and the project builds well, but the Rust format check fails with a "too many arguments" error from clippy. Could you please review my changes?

pszemus · 2026-03-13T09:43:48Z

> Hi @pszemus, I noticed the latest review feedback about plumbing frame_terminator_0 into the Rust Writer struct for CEA-708 support. I'd be happy to help with this if you'd like, just let me know!

Hi @SuvidhJ, it would be much appreciated if you could review the changes I made in the Rust decoder.

cfsmp3

Thanks for the update — the DVBSUB bitmap path works correctly, and the Rust formatting/clippy issues are resolved. However, --null-terminated only produces correct frame-level \0 delimiters on the DVBSUB (bitmap/OCR) path. On CEA-608 and CEA-708, the \0 is written per line, not per frame, which breaks the websocat -0 use case for those codecs and contradicts the PR description.

How to reproduce

CEA-608:

./ccextractor sample.ts -out=txt --null-terminated -o /tmp/test.txt
xxd /tmp/test.txt | head -20

You'll see \0 after every individual line, not after each complete subtitle frame. A two-line pop-on caption like:

♪ So no one told you
life was gonna be this way ♪

produces line1\0line2\0 instead of the expected line1\nline2\0.

CEA-708:

./ccextractor sample.ts -out=txt --null-terminated -svc 1 -o /tmp/test708.txt
xxd /tmp/test708.p1.svc01.txt | head -20

Same issue — \0 per row instead of per frame.

Root cause

Three code paths need fixing:

write_cc_line_as_transcript2 (CEA-608, ccx_encoders_transcript.c ~line 325): This function is called per line and writes encoded_end_frame at the end of each individual line. The caller write_cc_buffer_as_transcript2 iterates over the 15 rows and calls this function for each used row. Fix: keep using encoded_crlf (or \n) as the separator between lines within a frame, and only write encoded_end_frame once after the last line. Since write_cc_line_as_transcript2 doesn't know whether it's the last line, the frame terminator should be moved to the caller (write_cc_buffer_as_transcript2) after the row loop.
write_cc_subtitle_as_transcript (ccx_encoders_transcript.c ~line 203): The do...while (strtok_r) loop writes encoded_end_frame after each token (line). Fix: use \n (or encoded_crlf) between tokens within the loop, and write encoded_end_frame once after the loop exits.
CEA-708 Rust path (tv_screen.rs ~line 353-354): end_frame is written inside the row iteration loop. Fix: move the write_to_file(&end_frame) call outside the for row_index in ... loop, writing it once after all rows are emitted. Use \n or encoded_crlf between rows within the loop (the current line separator behavior).
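
The structure requested for all three paths can be sketched as follows: a separator between the lines of a frame, and the frame terminator exactly once after the last line. This is a simplified illustration that writes into a memory buffer; `append_bytes` and `write_frame` are hypothetical helpers, not functions from ccx_encoders_transcript.c or tv_screen.rs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append n bytes to out at *pos if they fit within cap.
 * Returns 0 on success, -1 if there is not enough room. */
int append_bytes(char *out, size_t cap, size_t *pos, const char *src, size_t n)
{
	if (n > cap - *pos)
		return -1;
	memcpy(out + *pos, src, n);
	*pos += n;
	return 0;
}

/* Emit one frame: lines joined by line_sep, then end_frame exactly once.
 * Returns the number of bytes written, or -1 if out is too small. */
long write_frame(char *out, size_t cap, const char *const *lines, size_t nlines,
		 const char *line_sep, size_t sep_len,
		 const char *end_frame, size_t end_len)
{
	size_t pos = 0;
	assert(out != NULL && lines != NULL);
	for (size_t i = 0; i < nlines; i++)
	{
		/* The separator goes before every line except the first, so
		 * nothing is left dangling in front of the frame terminator. */
		if (i > 0 && append_bytes(out, cap, &pos, line_sep, sep_len) != 0)
			return -1;
		if (append_bytes(out, cap, &pos, lines[i], strlen(lines[i])) != 0)
			return -1;
	}
	if (append_bytes(out, cap, &pos, end_frame, end_len) != 0)
		return -1;
	return (long)pos;
}
```

Called with "\n" as the separator and "\0" as the terminator, a two-line caption becomes line1\nline2\0. If both the separator and the terminator are CRLF, which is assumed here to be the case when the flag is off, the result is line1 CRLF line2 CRLF with no doubled line ending.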

Why the bitmap path works

In write_cc_bitmap_as_transcript, the entire subtitle text is processed first (internal CRLFs are replaced with spaces), then encoded_end_frame is written once at the end. This is the correct pattern — the other paths should follow a similar structure.

Testing checklist

After fixing, verify with xxd that:

CEA-608 multi-line pop-on: \n between lines, single \0 at frame end
CEA-608 single-line: single \0 at frame end
CEA-708 multi-line: \n between lines, single \0 at frame end
Normal mode (without --null-terminated): output identical to master (no regression)
--lf mode: \n line terminators still work as before
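
As a complement to reading xxd output by eye, the first three checklist items could be checked mechanically on a captured output buffer. The sketch below is only an illustration under the assumption that the whole output fits in memory; `struct frame_summary` and `summarize_frames` are hypothetical names and not part of the CCExtractor test suite.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of a captured --null-terminated output. */
struct frame_summary
{
	size_t frames;         /* number of '\0' delimiters seen */
	size_t inner_newlines; /* '\n' bytes found inside frames */
	size_t trailing_bytes; /* bytes after the last '\0' (expected: 0) */
};

/* Scan buf[0..len) once and count frame delimiters and in-frame newlines. */
struct frame_summary summarize_frames(const unsigned char *buf, size_t len)
{
	struct frame_summary s = {0, 0, 0};
	assert(buf != NULL || len == 0);
	for (size_t i = 0; i < len; i++)
	{
		if (buf[i] == '\0')
		{
			s.frames++;
			s.trailing_bytes = 0; /* a delimiter closes the current frame */
		}
		else
		{
			s.trailing_bytes++;
			if (buf[i] == '\n')
				s.inner_newlines++;
		}
	}
	return s;
}
```

For a file holding one two-line caption, the expected summary is one frame with one inner newline. Two frames with no inner newline would indicate that the terminator is still being written per line.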

pszemus · 2026-03-18T11:37:08Z

Thanks @cfsmp3, I must have missed those paths.
I think I've fixed them now. I moved the line-ending write to the beginning of each loop iteration and write encoded_end_frame once after the loop. This removes the double CRLF when --null-terminated is not set and keeps the original behaviour.

ccextractor-bot · 2026-03-18T12:11:16Z

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 9f250b1...:

Report Name	Tests Passed
Broken	9/13
CEA-708	1/14
DVB	3/7
DVD	3/3
DVR-MS	2/2
General	20/27
Hardsubx	1/1
Hauppage	3/3
MP4	3/3
NoCC	10/10
Options	77/86
Teletext	20/21
WTV	13/13
XDS	31/34

Your PR breaks these cases:

ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
ccextractor --autoprogram --out=srt --latin1 b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
ccextractor --out=spupng c83f765c66...
ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

ccextractor --out=srt --latin1 --autoprogram 73d9313d64..., Last passed: Test 8738
ccextractor --out=ttxt --latin1 001dd8cdf7..., Last passed: Test 8738
ccextractor --out=srt --latin1 4d4e938ef6..., Last passed: Test 8738
ccextractor --service 1 --out=txt --no-bom --no-rollup ea83ff7bcb..., Last passed: Test 8738
ccextractor --service 1 --out=txt f17524b53f..., Last passed: Test 8738
ccextractor --service 1 --out=txt 80848c45f8..., Last passed: Test 8738
ccextractor --service 1 --out=txt --no-bom --no-rollup b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1[EUC-KR] --out=txt --no-rollup b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1 --out=srt da904de35d..., Last passed: Test 8738
ccextractor --service 1 --out=sami da904de35d..., Last passed: Test 8738
ccextractor --service 1 --out=ttxt da904de35d..., Last passed: Test 8926
ccextractor --service 1[EUC-KR] b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1[EUC-KR] --no-rollup b5d6aad89f..., Last passed: Test 8738
ccextractor --service all da904de35d..., Last passed: Test 8738
ccextractor --service all[EUC-KR] b5d6aad89f..., Last passed: Test 8738
ccextractor --service 1,2[UTF-8],3[EUC-KR],54 --out=txt da904de35d..., Last passed: Test 8738
ccextractor --autoprogram --out=srt --latin1 d41b53b504..., Last passed: Test 8738
ccextractor --stdout --quiet --no-fontcolor 79a51f3500..., Last passed: Test 8738
ccextractor --stdout --quiet --no-fontcolor 767b546f96..., Last passed: Test 8738
ccextractor --service 1 c83f765c66..., Last passed: Test 8738
ccextractor --myth c83f765c66..., Last passed: Test 8738
ccextractor --in=raw fb79021542..., Last passed: Test 8738
ccextractor --mp4vidtrack 5df914ce77..., Last passed: Test 8738
ccextractor --xmltv=3 --out=null 96efd279cf..., Last passed: Test 8738
ccextractor --datapid 2310 --autoprogram --out=srt --latin1 e639e54550..., Last passed: Test 8738

Congratulations: Merging this PR would fix the following tests:

ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

ccextractor-bot · 2026-03-18T12:37:07Z

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 9f250b1...:

Report Name	Tests Passed
Broken	9/13
CEA-708	1/14
DVB	4/7
DVD	3/3
DVR-MS	2/2
General	22/27
Hardsubx	1/1
Hauppage	3/3
MP4	3/3
NoCC	10/10
Options	81/86
Teletext	20/21
WTV	13/13
XDS	31/34

Your PR breaks these cases:

ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b...
ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9...
ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc...
ccextractor --autoprogram --out=srt --latin1 b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e...
ccextractor --autoprogram --out=ttxt --latin1 01509e4d27...
ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e...
ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065...
ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7...

NOTE: The following tests have been failing on the master branch as well as the PR:

ccextractor --out=srt --latin1 --autoprogram 73d9313d64..., Last passed: Test 8611
ccextractor --out=ttxt --latin1 001dd8cdf7..., Last passed: Test 8611
ccextractor --out=srt --latin1 4d4e938ef6..., Last passed: Test 8611
ccextractor --service 1 --out=txt --no-bom --no-rollup ea83ff7bcb..., Last passed: Test 8611
ccextractor --service 1 --out=txt f17524b53f..., Last passed: Test 8611
ccextractor --service 1 --out=txt 80848c45f8..., Last passed: Test 8611
ccextractor --service 1 --out=txt --no-bom --no-rollup b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1[EUC-KR] --out=txt --no-rollup b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1 --out=srt da904de35d..., Last passed: Test 8611
ccextractor --service 1 --out=sami da904de35d..., Last passed: Test 8611
ccextractor --service 1 --out=ttxt da904de35d..., Last passed: Test 8943
ccextractor --service 1[EUC-KR] b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1[EUC-KR] --no-rollup b5d6aad89f..., Last passed: Test 8611
ccextractor --service all da904de35d..., Last passed: Test 8611
ccextractor --service all[EUC-KR] b5d6aad89f..., Last passed: Test 8611
ccextractor --service 1,2[UTF-8],3[EUC-KR],54 --out=txt da904de35d..., Last passed: Test 8611
ccextractor --autoprogram --out=srt --latin1 d41b53b504..., Last passed: Test 8611
ccextractor --stdout --quiet --no-fontcolor 79a51f3500..., Last passed: Test 8611
ccextractor --stdout --quiet --no-fontcolor 767b546f96..., Last passed: Test 8611
ccextractor --service 1 c83f765c66..., Last passed: Test 8611
ccextractor --myth c83f765c66..., Last passed: Test 8611
ccextractor --in=raw fb79021542..., Last passed: Test 8611
ccextractor --mp4vidtrack 5df914ce77..., Last passed: Test 8611
ccextractor --xmltv=3 --out=null 96efd279cf..., Last passed: Test 8611
ccextractor --datapid 2310 --autoprogram --out=srt --latin1 e639e54550..., Last passed: Test 8611

Congratulations: Merging this PR would fix the following tests:

ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
ccextractor --out=spupng c83f765c66..., Last passed: Never
ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

[feat] Allow output \0 terminated frames

ff9c160

pszemus force-pushed the null-terminated-frames branch from bdf3aa1 to ff9c160 on February 11, 2026 15:42

Fix rust FromCType

6d7c192

cfsmp3 requested changes Feb 15, 2026

use encoded_end_frame for text-based captions

7d23b42

pszemus and others added 3 commits February 16, 2026 14:41

Merge branch 'CCExtractor:master' into null-terminated-frames

9aa0cb4

add changelog entry

ad1dd83

Merge branch 'master' into null-terminated-frames

fd55cd9

pszemus requested a review from cfsmp3 February 25, 2026 16:08

Merge branch 'master' into null-terminated-frames

ba53c78

cfsmp3 requested changes Feb 28, 2026

pszemus added 3 commits March 10, 2026 12:14

fix CEA-708 Rust decoder

a2d086d

Merge branch 'master' into null-terminated-frames

cca8fe1

fix Rust formating

b5312f8

pszemus force-pushed the null-terminated-frames branch from 224d594 to b5312f8 on March 10, 2026 11:26

remove unused crlf field - satisfy clippy function argument limit

336bd8a

pszemus requested a review from cfsmp3 March 10, 2026 11:41

silence clippy function argument limit in Writer

cb48bee

cfsmp3 requested changes Mar 14, 2026

pszemus added 2 commits March 18, 2026 12:29

Fix writing frame end with multiline captions

e6b85da

Merge branch 'master' into null-terminated-frames

4ef4df6

fix formatting errors

4158ede

pszemus requested a review from cfsmp3 March 18, 2026 12:11

cfsmp3 merged commit 03ad9e8 into CCExtractor:master Mar 19, 2026
25 of 28 checks passed

cfsmp3 merged 15 commits into CCExtractor:master from pszemus:null-terminated-frames
