add PDF stream writer #4968

@milahu

currently pymupdf does not support stream-writing a PDF,
i.e. writing the output file incrementally while the document is being built

so i have to buffer the whole output document in RAM
and only then can i write the complete document to disk

# pseudo code

import pymupdf

input_doc = pymupdf.open("input.pdf")
output_doc = pymupdf.open()  # empty PDF

# buffer to RAM
for page_idx in range(input_doc.page_count):
    # size the output page like the input page
    rect = input_doc[page_idx].rect
    out_page = output_doc.new_page(width=rect.width, height=rect.height)
    # get_page_image is a user-defined helper returning a 2D uint8 np.array (grayscale)
    page_image = get_page_image(page_idx)
    h, w = page_image.shape[:2]
    pix = pymupdf.Pixmap(pymupdf.csGRAY, w, h, page_image.tobytes(), False)
    out_page.insert_image(out_page.rect, pixmap=pix)

# write to disk
output_doc.save("output.pdf")
output_doc.close()

problem:
this fails if the output document is bigger than RAM
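
to get a feeling for the sizes involved, here is a rough back-of-the-envelope estimate. the page size (A4 at 300 dpi) and the page count are hypothetical numbers chosen for illustration, and the result is only the raw size of the uncompressed 8-bit grayscale samples, not a measurement of what pymupdf actually keeps in RAM

```python
def raw_gray_bytes(width_px, height_px, pages=1):
    """Uncompressed size of 8-bit grayscale page images, in bytes (1 byte per pixel)."""
    return width_px * height_px * pages


# hypothetical input: A4 pages scanned at 300 dpi
per_page = raw_gray_bytes(2480, 3508)
whole_doc = raw_gray_bytes(2480, 3508, pages=10_000)

print(f"{per_page / 2**20:.1f} MiB per page")
print(f"{whole_doc / 2**30:.1f} GiB for 10000 pages")
```

compression lowers the real numbers, but the total still grows linearly with the page count as long as every page stays in memory until save.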

solution:
stream-write the output document to disk

i would need to pass the output file path to pymupdf.open
(which currently fails if the output file does not exist)
and maybe explicitly enable output-streaming,
for example with a new write_stream parameter (proposed here, it does not exist yet)

output_doc = pymupdf.open("output.pdf", write_stream=True)

then i could remove the output_doc.save call
and the output file would be finalized in output_doc.close()

proof of concept:
i have implemented a minimal streaming PDF writer in
rahimnathwani/binarize-pdf#2
as expected, this uses significantly less memory
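
to illustrate the idea, here is a minimal sketch of such a streaming writer in plain Python (stdlib only). it is not the code from the linked proof of concept and not a pymupdf API: the class StreamingPdfWriter and its methods are hypothetical names made up for this example. it assumes one full-page 8-bit grayscale image per page and, as a simplification, uses the pixel size as the page size in points. each page is written to the file as soon as it is added, only the byte offsets of the objects are kept in memory, and the page tree, catalog and xref table are written in close()

```python
import io
import zlib


class StreamingPdfWriter:
    """Hypothetical sketch: write full-page grayscale images one page at a time."""

    def __init__(self, fileobj):
        self.f = fileobj
        self.pos = 0          # number of bytes written so far
        self.offsets = {}     # object number -> byte offset, needed for the xref table
        self.page_ids = []    # object numbers of all page objects
        self.next_id = 3      # 1 = catalog, 2 = page tree, both written in close()
        self._write(b"%PDF-1.4\n")

    def _write(self, data):
        self.f.write(data)
        self.pos += len(data)

    def _write_obj(self, obj_id, body):
        self.offsets[obj_id] = self.pos
        self._write(b"%d 0 obj\n" % obj_id + body + b"\nendobj\n")

    def _new_id(self):
        obj_id = self.next_id
        self.next_id += 1
        return obj_id

    def add_gray_page(self, samples, width, height):
        """Write one page immediately; samples is width * height bytes."""
        img_id, content_id, page_id = self._new_id(), self._new_id(), self._new_id()
        data = zlib.compress(samples)
        self._write_obj(
            img_id,
            b"<< /Type /XObject /Subtype /Image /Width %d /Height %d "
            b"/ColorSpace /DeviceGray /BitsPerComponent 8 "
            b"/Filter /FlateDecode /Length %d >>\nstream\n"
            % (width, height, len(data)) + data + b"\nendstream",
        )
        # scale the unit-square image up to the full page
        content = b"q %d 0 0 %d 0 0 cm /Im0 Do Q" % (width, height)
        self._write_obj(
            content_id,
            b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
        )
        self._write_obj(
            page_id,
            b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 %d %d] "
            b"/Resources << /XObject << /Im0 %d 0 R >> >> /Contents %d 0 R >>"
            % (width, height, img_id, content_id),
        )
        self.page_ids.append(page_id)

    def close(self):
        """Finalize the file: page tree, catalog, xref table, trailer."""
        kids = b" ".join(b"%d 0 R" % i for i in self.page_ids)
        self._write_obj(
            2, b"<< /Type /Pages /Count %d /Kids [%s] >>" % (len(self.page_ids), kids)
        )
        self._write_obj(1, b"<< /Type /Catalog /Pages 2 0 R >>")
        xref_pos = self.pos
        size = self.next_id
        self._write(b"xref\n0 %d\n0000000000 65535 f \n" % size)
        for obj_id in range(1, size):
            self._write(b"%010d 00000 n \n" % self.offsets[obj_id])
        self._write(
            b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n"
            % (size, xref_pos)
        )


# usage: an in-memory buffer here, a file opened with open("output.pdf", "wb") works the same
buf = io.BytesIO()
writer = StreamingPdfWriter(buf)
for _ in range(3):
    writer.add_gray_page(bytes(20 * 10), 20, 10)  # 20x10 all-black test page
writer.close()
pdf_bytes = buf.getvalue()
```

the point of the design is that close() only needs the object offsets and the page object numbers, so the memory use stays roughly at one page plus a few integers per page. a writer inside pymupdf would have to handle much more (fonts, annotations, shared resources, later edits to earlier pages), which this sketch ignores.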
