Skip to content
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 50 additions & 10 deletions pages/modules/backpressuring-in-streams.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,16 +67,60 @@ A good example of why the backpressure mechanism implemented through streams is
a great optimization can be demonstrated by comparing the internal system tools
from Node.js' [`Stream`][] implementation.

In one scenario, we will take a large file (approximately ~9 GB) and compress it
using the familiar [`zip(1)`][] tool.
In one scenario, we will read a large file (approximately ~9 GB) using `fs.readFileSync`: and compress it
using the module [`zlib`][], that wraps around another compression tool, [`gzip(1)`][].

```cjs
const fs = require('node:fs');
const zlib = require('node:zlib');

const data = fs.readFileSync('The.Matrix.1080p.mkv');
const compressed = zlib.gzipSync(data);
fs.writeFileSync('The.Matrix.1080p.mkv.gz', compressed);
```

```mjs
import { readFileSync, writeFileSync } from 'node:fs';
import { gzipSync } from 'node:zlib';

const data = readFileSync('The.Matrix.1080p.mkv');
const compressed = gzipSync(data);
writeFileSync('The.Matrix.1080p.mkv.gz', compressed);
```

This fails on two separate limits, whichever you hit first: the buffer size cap or heap exhaustion. Lets rewite it another way, using Node.js' [`Stream`][] but without backpressure:

```cjs
const { createReadStream, createWriteStream } = require('node:fs');
const gzip = require('node:zlib').createGzip();

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
const gzip = require('node:zlib').createGzip();
const { createGzip } = require('node:zlib');
const gzip = createGzip();


const inp = createReadStream('The.Matrix.1080p.mkv');
const out = createWriteStream('The.Matrix.1080p.mkv.gz');

inp.on('data', (chunk) => gzip.write(chunk));
inp.on('end', () => gzip.end());
gzip.on('data', (chunk) => out.write(chunk));
gzip.on('end', () => out.end());
```
zip The.Matrix.1080p.mkv

```mjs
import { createReadStream, createWriteStream } from 'node:fs';
import { createGzip } from 'node:zlib';

const gzip = createGzip();

const inp = createReadStream('The.Matrix.1080p.mkv');
const out = createWriteStream('The.Matrix.1080p.mkv.gz');

inp.on('data', (chunk) => gzip.write(chunk));
inp.on('end', () => gzip.end());
gzip.on('data', (chunk) => out.write(chunk));
gzip.on('end', () => out.end());
```

While that will take a few minutes to complete, in another shell we may run
a script that takes Node.js' module [`zlib`][], that wraps around another
compression tool, [`gzip(1)`][].
Neither of two writes respects backpressure. Stage 1 keeps pushing into gzip even after gzip.write() returns false, and stage 2 keeps pushing into out even after out.write() returns false. Both internal buffers can grow without bound, so this is very prone to running out of memory. On a large compressible file both numbers climb fast and it heads for `JavaScript heap out of memory`.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Neither of two writes respects backpressure. Stage 1 keeps pushing into gzip even after gzip.write() returns false, and stage 2 keeps pushing into out even after out.write() returns false. Both internal buffers can grow without bound, so this is very prone to running out of memory. On a large compressible file both numbers climb fast and it heads for `JavaScript heap out of memory`.
Neither of two writes respects backpressure. Stage 1 keeps pushing into gzip even after `gzip.write()` returns false, and stage 2 keeps pushing into out even after `out.write()` returns false. Both internal buffers can grow without bound, so this is very prone to running out of memory. On a large compressible file both numbers climb fast and it heads for `JavaScript heap out of memory`.


To resolve this, we may use pipe, which pauses the read when write() returns false. When `gzip.write()` returns `false`, `pipe` calls `pause()` on the read stream, halting disk reads. Once gzip works through its backlog and the buffer empties, it emits a `'drain'` event, and `pipe` calls `resume()` to start reading again.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To resolve this, we may use pipe, which pauses the read when write() returns false. When `gzip.write()` returns `false`, `pipe` calls `pause()` on the read stream, halting disk reads. Once gzip works through its backlog and the buffer empties, it emits a `'drain'` event, and `pipe` calls `resume()` to start reading again.
To resolve this, we may use pipe, which pauses the read when `write()` returns false. When `gzip.write()` returns `false`, `pipe` calls `pause()` on the read stream, halting disk reads. Once gzip works through its backlog and the buffer empties, it emits a `'drain'` event, and `pipe` calls `resume()` to start reading again.


```cjs
const fs = require('node:fs');
Expand All @@ -100,10 +144,6 @@ const out = createWriteStream('The.Matrix.1080p.mkv.gz');
inp.pipe(gzip).pipe(out);
```

To test the results, try opening each compressed file. The file compressed by
the [`zip(1)`][] tool will notify you the file is corrupt, whereas the
compression finished by [`Stream`][] will decompress without error.

> In this example, we use `.pipe()` to get the data source from one end
> to the other. However, notice there are no proper error handlers attached. If
> a chunk of data were to fail to be properly received, the `Readable` source or
Expand Down
Loading