online-dynamic-batching

Online Dynamic Batching

Faster LLM/VLM training with one DataLoader line

DataLoader(...) → odb.ODBDataLoader(...)

Paper

ODB is introduced in "Online Dynamic Batching with Formal Guarantees for LLM Training" by Dian Li, Zekun Wang, Yaoru Wang, and Jiahong Yan.

Links: arXiv · PDF · BibTeX

The Project

Online Dynamic Batching (ODB) forms token-budgeted batches online, at the DataLoader/collate boundary. Short examples get larger batches, long examples get smaller batches, and the model, optimizer, attention kernel, and dataset format stay in place.

dataloader = odb.ODBDataLoader(..., token_budget=16384)  # replaces DataLoader(...)

Most training stacks decide batch shape before the final input length is known. ODB moves that decision to the point where the length is already observable.
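To make the idea of a token budget concrete, here is a minimal pure-Python sketch of greedy token-budgeted grouping over a stream of sample lengths. It is an illustration only, not ODB's actual algorithm: the helper name `token_budget_batches` is hypothetical, and it assumes the budget is charged against the padded size of a batch (longest sample times batch size), which the package may define differently.

```python
from typing import Iterable, Iterator, List


def token_budget_batches(lengths: Iterable[int], token_budget: int) -> Iterator[List[int]]:
    """Greedily group sample lengths so each batch's padded size fits the budget.

    Hypothetical helper for illustration; assumes padded cost = max_len * batch_size.
    """
    batch: List[int] = []
    max_len = 0
    for length in lengths:
        new_max = max(max_len, length)
        # Padded cost if this sample joined the current batch.
        if batch and new_max * (len(batch) + 1) > token_budget:
            yield batch
            batch, new_max = [], length
        batch.append(length)
        max_len = new_max
    if batch:
        # Emit the final, possibly under-filled batch.
        yield batch


# Short samples are grouped into larger batches, long samples into smaller ones.
short_batches = list(token_budget_batches([128] * 8, token_budget=512))
long_batches = list(token_budget_batches([512] * 3, token_budget=512))
print([len(b) for b in short_batches])  # batch sizes for short samples
print([len(b) for b in long_batches])   # batch sizes for long samples
```

In this sketch a sample longer than the budget is still emitted on its own rather than dropped. That is a design choice of the example, not a statement about how ODB handles oversize samples.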

| Start here | What you get |
| --- | --- |
| `online-dynamic-batching` | PyTorch package, trainer adapters, docs, tests, examples, and synthetic benchmarks |
| Quickstart | One-line DataLoader replacement and first PyTorch loop |
| Choose one integration path | Pick the adapter that matches your stack: PyTorch loops, HuggingFace Trainer, LLaMA-Factory/LLaVA-Factory, Accelerate, or Lightning |
| Benchmark notes | Reporting policy, public synthetic benchmark, and representative results |

Choose One Integration Path

Pick the row that matches your training stack. These are alternatives, not a checklist.

| Framework | ODB entry point |
| --- | --- |
| PyTorch loops | `ODBDataLoader(...)` or `odb.apply(dataloader, ...)` |
| HuggingFace Trainer | `odb.integrations.hf.configure_trainer(...)` and `ODBTrainerMixin` |
| LLaMA-Factory / LLaVA-Factory | `odb.integrations.llamafactory.configure_trainer(...)` |
| Accelerate | `odb.integrations.accelerate.configure_accelerator(...)` |
| Lightning | `odb.integrations.lightning.ODBLightningCallback` |

MM-Mix Example Projects

The MM-Mix examples are split by framework so that each repository shows one clean integration path. All of them build on the `online-dynamic-batching` pip package.

| Example | Focus |
| --- | --- |
| `odb-mm-mix-example` | Shared public data recipe and local TMDB utilities |
| `odb-example-llamafactory` | LLaMA-Factory integration example |
| `odb-example-hf-trainer` | Hugging Face `Trainer` native adapter example |
| `odb-mm-mix-accelerate` | Accelerate custom-loop example |
| `odb-mm-mix-lightning` | PyTorch Lightning adapter example |

Install

pip install online-dynamic-batching

For HuggingFace Trainer and LLaMA-Factory-style adapters:

pip install "online-dynamic-batching[hf]"

Minimal Use

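import odb

# dataset, collate_fn, and model come from your existing training script.
# ODBDataLoader replaces DataLoader(...).
dataloader = odb.ODBDataLoader(
    dataset,
    token_budget=16384,  # token budget per emitted batch
    batch_size=1,
    num_workers=4,
    prefetch_factor=64,
    collate_fn=collate_fn,
    loss_scaling="exact",
)

for batch in dataloader:
    # Pop ODB's per-step info from the batch before the forward pass.
    info = odb.pop_step_info(batch, loss_scaling="exact")
    # Scale the loss with the factor ODB reports for this step.
    loss = model(**batch).loss * info.loss_scale
    loss.backward()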
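Why a per-step loss scale matters when batch sizes vary can be seen with a little arithmetic. The sketch below is an assumption-laden illustration, not ODB's definition of `loss_scaling="exact"`: it uses made-up per-sample losses and one plausible scaling rule (batch size divided by the average batch size) to show that averaging unscaled per-batch means over-weights samples that land in small batches.

```python
# Made-up per-sample losses for two variable-size batches (illustration only).
batches = [[1.0, 1.0, 1.0, 1.0], [5.0]]

# Reference value: the mean loss over all samples, ignoring batch boundaries.
all_losses = [loss for batch in batches for loss in batch]
per_sample_mean = sum(all_losses) / len(all_losses)

# Naive approach: average the per-batch mean losses with equal weight.
batch_means = [sum(batch) / len(batch) for batch in batches]
naive_mean = sum(batch_means) / len(batch_means)

# Assumed scaling rule: weight each batch by its size relative to the average size.
average_size = len(all_losses) / len(batches)
scales = [len(batch) / average_size for batch in batches]
scaled_mean = sum(m * s for m, s in zip(batch_means, scales)) / len(batches)

print(f"per-sample mean: {per_sample_mean:.2f}")
print(f"naive mean of batch means: {naive_mean:.2f}")
print(f"scaled mean of batch means: {scaled_mean:.2f}")
```

Under this assumed rule the scaled average matches the per-sample mean, while the naive average is pulled toward the single-sample batch. Whether ODB normalizes by samples, by tokens, or by something else is not stated here, so treat the rule as a stand-in.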

Design Values

- Observe real lengths at training time instead of trusting stale length caches.
- Keep batching at the DataLoader boundary, away from model and kernel code.
- Make variable work in distributed training explicit through aligned grouping and step info.
- Treat loss scaling, emitted-sample accounting, and trainer stopping semantics as first-class integration contracts.
- Keep public validation reproducible before making production-scale claims.
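The point about emitted-sample accounting and stopping semantics can be illustrated with a short sketch. When batch sizes vary, a fixed step count no longer corresponds to a fixed amount of data, so a loop may need to count samples instead of steps. The function below is a hypothetical example of that idea, not ODB's trainer logic; the name `run_until_samples` and its arguments are assumptions made for illustration.

```python
from typing import Iterable, Sequence, Tuple


def run_until_samples(batches: Iterable[Sequence], target_samples: int) -> Tuple[int, int]:
    """Consume variable-size batches until at least target_samples were emitted.

    Hypothetical helper: returns (steps_taken, samples_emitted).
    """
    steps = 0
    emitted = 0
    for batch in batches:
        # A real loop would run the forward and backward pass here.
        emitted += len(batch)
        steps += 1
        if emitted >= target_samples:
            break
    return steps, emitted


# Batches of sizes 4, 4, 1, 1, 2: the stopping point depends on samples, not steps.
example_batches = [[0] * 4, [0] * 4, [0], [0], [0] * 2]
steps, emitted = run_until_samples(example_batches, target_samples=10)
print(steps, emitted)
```

Counting samples keeps "one epoch" tied to the amount of data seen, at the cost of a step count that differs from run to run.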

Ecosystem Shape

The main repository carries the package, documentation, examples, benchmark notes, and agent-assisted integration skill. Supporting repositories will be split out only when they have independent value: a docs site, a public benchmark harness, public-data paper artifacts, or community-maintained recipes.

License: Apache-2.0.
