[v4] Improve download progress tracking (model cache registry and define which files will be loaded for pipelines) by nico-martin · Pull Request #1511 · huggingface/transformers.js

nico-martin · 2026-02-03T15:24:25Z

Improved Download Progress Tracking

Problem

Transformers.js couldn't reliably track total download progress because:

File lists weren't known before downloads started
File sizes were inconsistent (compressed vs uncompressed)
No cache awareness before initiating downloads

Solution

New Exported Functions

get_files(): Determines required files before downloading
get_model_files() / get_tokenizer_files() / get_processor_files(): Helper functions to identify files for each component
get_file_metadata(): Fetches file metadata using Range requests without downloading full content
- Returns fromCache boolean to identify cached files
- Ensures consistent uncompressed file sizes
is_cached(): Checks if all files from a model are already in cache
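
The Range-request idea behind get_file_metadata() can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from this PR: the helper names parseTotalFromContentRange and fetchFileSize are hypothetical, and the sketch assumes the server answers a `Range: bytes=0-0` request with status 206 and a `Content-Range` header that carries the total size.

```javascript
// Hypothetical helpers for illustration; not the Transformers.js implementation.

// Parse the total size from a header such as "bytes 0-0/1234".
// Returns null when the header is missing or the total is unknown ("*").
function parseTotalFromContentRange(header) {
  if (!header) return null;
  const match = /^bytes\s+(?:\d+-\d+|\*)\/(\d+|\*)$/.exec(header.trim());
  if (!match || match[1] === "*") return null;
  return Number(match[1]);
}

// Request a single byte and read the total size from the response headers.
// `fetchImpl` is injectable so the function can be exercised without a network.
async function fetchFileSize(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, { headers: { Range: "bytes=0-0" } });
  if (response.status === 206) {
    return parseTotalFromContentRange(response.headers.get("content-range"));
  }
  // The server ignored the Range header: fall back to Content-Length, if any,
  // and cancel the body so the full file is not downloaded.
  const length = response.headers.get("content-length");
  await response.body?.cancel();
  return length === null ? null : Number(length);
}
```

Only one byte of the body is transferred when the server honours the Range header, which is why this kind of request is cheap enough to run for every file before the real download starts.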

Enhanced Progress Tracking

readResponse() with expectedSize: Falls back to metadata when content-length header is missing
total_progress callback: Provides aggregate progress across all files
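
The two points above can be illustrated with a small sketch. The names resolveTotalBytes and createTotalProgress, as well as the shape of the object passed to the callback, are assumptions made for this example and are not taken from the PR; the sketch only shows the fallback to a known size when content-length is missing and the summing of per-file progress into one percentage.

```javascript
// Hypothetical helpers for illustration; names are not part of Transformers.js.

// Pick the total byte count for one file: prefer a valid content-length header,
// otherwise fall back to the size known from metadata (0 if neither is known).
function resolveTotalBytes(contentLengthHeader, expectedSize) {
  const parsed = Number.parseInt(contentLengthHeader ?? "", 10);
  if (Number.isFinite(parsed) && parsed > 0) return parsed;
  return expectedSize ?? 0;
}

// Track bytes loaded per file and report one aggregate percentage.
// `expectedSizes` maps file name -> size in bytes, known before downloading.
function createTotalProgress(expectedSizes, onTotalProgress) {
  const loadedByFile = new Map();
  const total = Object.values(expectedSizes).reduce((sum, n) => sum + n, 0);
  return function update(file, loadedBytes) {
    // Clamp to the expected size so one file cannot exceed its share.
    const cap = expectedSizes[file] ?? 0;
    loadedByFile.set(file, Math.min(loadedBytes, cap));
    let loaded = 0;
    for (const n of loadedByFile.values()) loaded += n;
    const progress = total === 0 ? 100 : (loaded / total) * 100;
    onTotalProgress({ loaded, total, progress });
    return progress;
  };
}
```

Because the total is fixed up front from the metadata, the aggregate percentage never jumps backwards when a new file starts downloading, which is the failure mode of summing only the files seen so far.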

Review

One thing I am not fully confident about is the get_model_files function. I tried to test it with different model architectures, but I may have missed some that load files not covered by that function. @xenova, could you smoke-test some models and send me the ones that fail?

Easiest way to do that is:

import {
  get_files,
  pipeline,
} from "@huggingface/transformers";

// Files that get_files() predicts the pipeline will need.
const expectedFiles = await get_files(
  "onnx-community/gemma-3-270m-it-ONNX",
  {
    dtype: "fp32",
    device: "webgpu",
  }
);

// Files that are actually requested while the pipeline loads.
const loadedFiles = new Set();
const pipe = await pipeline(
  "text-generation",
  "onnx-community/gemma-3-270m-it-ONNX",
  {
    dtype: "fp32",
    device: "webgpu",
    progress_callback: (e) => {
      if (e.file) loadedFiles.add(e.file);
    },
  }
);

// Compare both lists in sorted order; false means the prediction and the
// actual downloads differ.
console.log(
  "SAME FILES:",
  expectedFiles.sort().join(",") === Array.from(loadedFiles).sort().join(",")
);
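
When the check above prints false, it helps to see which files differ. A possible helper is sketched below; diffFileLists is a hypothetical name introduced for this example, and it assumes that the file names reported by progress_callback use the same format as the entries returned by get_files().

```javascript
// Hypothetical helper: report which files differ instead of a single boolean.
function diffFileLists(expectedFiles, loadedFiles) {
  const expected = new Set(expectedFiles);
  const loaded = new Set(loadedFiles);
  return {
    // Predicted by get_files() but never requested by the pipeline.
    notLoaded: [...expected].filter((f) => !loaded.has(f)).sort(),
    // Requested by the pipeline but missing from the prediction.
    notPredicted: [...loaded].filter((f) => !expected.has(f)).sort(),
  };
}
```

With the variables from the snippet above, `console.log(diffFileLists(expectedFiles, loadedFiles))` would list the mismatching files for a failing model.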

Closes #1345

HuggingFaceDocBuilderDev · 2026-02-03T15:33:41Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

xenova

Very exciting PR! 🙌 Just a quick review from scanning the PR briefly.

xenova

Solid progress! Thanks 🔥

xenova

Huge PR! 🔥 Thanks so much @nico-martin.

* added progress_total progress callback status info
* added get_file_metadata helper
* some clean up
* improved get_file_metadata and get_files
* added functions to main export
* removed dynamic import
* restructuring
* refactored the pipeline tasks so I can have a get_pipeline_files function that does not check for tokenizer files or processor files if the task does not use them
* updated doc
* Update packages/transformers/src/utils/core.js
  Co-authored-by: Joshua Lochner <admin@xenova.com>
* added is_pipeline_cached and improved return object
* fixes after review
* added ModelRegistry to doc
* added clear_cache and clear_pipeline_cache
* Update packages/transformers/src/utils/cache/clear_cache.js
  Co-authored-by: Joshua Lochner <admin@xenova.com>
* small doc fix
* changed delete logic for cache
* fixed examples in cache utilitiy files
* fixed examples
* renamed type
* refactoring get_file_metadata
* moved src/utils/pipeline-tasks.js to src/pipelines/index.js
* fixed doc builder
* fixed doc builder
* created shared getFetchHeaders function
* added case for DecoderOnlyWithoutHead and DecoderOnly
* improved console.warn
* changed to modelType = MODEL_TYPES.EncoderOnly if not foundInMapping
* removed full download from get_file_metadata
* Remove duplicate module tag (already set in ModelRegistry.js)
* Remove test file
* pnpm format
* Cleanup
* use config from_pretrained logic for ensuring config is of correct type
* Reorder file acquisition
* Add example JSDoc to file header
* Add ModelRegistry tests
* FIXME: skip cache clearing tests
  breaks simultaneous loading
* Formatting
* Unify model-loader.js, get_model_files.js, and session.js
* console.warn to logger.warn
* Only resolve dtype once in session.js
* Use map + Promise.all
* map -> forEach
* Use env.fetch instead of global fetch
* Add comment to clear_cache for clarity
* Add model_file_name support in cache operations
* Update cache tests
* Fix TOCTOU race condition
* Remove dead code
* cleanup
* Cleanup pipeline import/exports
* renamed folder cache to model_registry
* changed doc title
* fix for unit tests on node 20
* standardize module name

---------

Co-authored-by: Joshua Lochner <admin@xenova.com>
Co-authored-by: Joshua Lochner <26504141+xenova@users.noreply.github.com>

nico-martin added 5 commits January 29, 2026 22:56

added progress_total progress callback status info

dfebb4a

added get_file_metadata helper

9216326

some clean up

17f9855

improved get_file_metadata and get_files

4fbe9a1

added functions to main export

ba4a4ba

nico-martin requested a review from xenova February 3, 2026 15:24

xenova reviewed Feb 3, 2026

nico-martin and others added 8 commits February 3, 2026 17:05

removed dynamic import

6249f31

restructuring

be8d7bb

refactored the pipeline tasks so I can have a get_pipeline_files function that does not check for tokenizer files or processor files if the task does not use them

9f3e224

updated doc

7b327a6

Update packages/transformers/src/utils/core.js

32dad76

Co-authored-by: Joshua Lochner <admin@xenova.com>

added is_pipeline_cached and improved return object

46433d6

fixes after review

4eeed39

added ModelRegistry to doc

f19ccfd

nico-martin assigned xenova Feb 13, 2026

xenova changed the base branch from v4 to main February 13, 2026 17:03

added clear_cache and clear_pipeline_cache

3430421

xenova self-requested a review February 18, 2026 17:08

xenova requested changes Feb 19, 2026

xenova changed the title ~~V4 cache handler~~ [v4] Improve download progress tracking (model cache registry and define which files will be loaded for pipelines) Feb 19, 2026

nico-martin and others added 8 commits February 19, 2026 16:04

Update packages/transformers/src/utils/cache/clear_cache.js

be5a6b2

Co-authored-by: Joshua Lochner <admin@xenova.com>

small doc fix

563e872

Merge branch 'v4-cache-handler' of github.com:huggingface/transformers.js into v4-cache-handler

01d5b23

changed delete logic for cache

856d302

fixed examples in cache utilitiy files

4cb293d

fixed examples

83170d1

renamed type

b9d8c33

refactoring get_file_metadata

056843f

xenova added 9 commits February 26, 2026 17:35

pnpm format

5b64758

Cleanup

333d8a7

use config from_pretrained logic for ensuring config is of correct type

842334c

Reorder file acquisition

34d1299

Add example JSDoc to file header

7ef10ad

Add ModelRegistry tests

0248d90

FIXME: skip cache clearing tests

131d1df

breaks simultaneous loading

Formatting

3ba1c30

Unify model-loader.js, get_model_files.js, and session.js

f8679af

nico-martin commented Feb 27, 2026

Comment thread packages/transformers/src/models/session.js Outdated

nico-martin and others added 16 commits February 27, 2026 07:56

console.warn to logger.warn

1c34775

Only resolve dtype once in session.js

8bbe229

Use map + Promise.all

d794990

map -> forEach

499ae00

Use env.fetch instead of global fetch

e7cb64d

Add comment to clear_cache for clarity

bbc0fec

Add model_file_name support in cache operations

47ea158

Update cache tests

14400e8

Fix TOCTOU race condition

e641672

Remove dead code

59b2e64

cleanup

a49c0bf

Cleanup pipeline import/exports

36bde74

renamed folder cache to model_registry

9e10bf7

changed doc title

bafc15c

fix for unit tests on node 20

c004db3

standardize module name

3fa74fd

xenova approved these changes Mar 1, 2026

xenova merged commit 4811a61 into main Mar 1, 2026
4 checks passed

xenova deleted the v4-cache-handler branch March 1, 2026 00:27
