Conditional Flow Matching vs. Diffusion Models: A Systematic Comparison on CIFAR-10
Implementation of Optimal-Transport Conditional Flow Matching (OT-CFM) from scratch, benchmarked against DDPM and DDIM on CIFAR-10, with a focus on the trade-off between sampling speed (NFE) and generation quality (FID).
Left: OT-CFM (50 NFE, midpoint) — Right: DDPM (1000 steps)
Flow matching on toy 2D data: noise particles transported to target distributions via learned velocity fields
Left: Latent space interpolation (slerp) — Right: Independent vs OT coupling comparison
Flow Matching is a generative modeling framework that learns a continuous velocity field v_theta(x, t) to transport samples from noise (t=0) to data (t=1) along straight-line paths. Compared to diffusion models, which follow curved stochastic trajectories, the optimal-transport formulation in CFM defines direct interpolations:
x_t = (1 - t) * x_0 + t * x_1, velocity target: u = x_1 - x_0
where x_0 ~ N(0, I) is Gaussian noise and x_1 is a data sample. At inference, images are generated by solving the ODE dx/dt = v_theta(x, t) from t=0 to t=1 using standard numerical solvers (Euler, Midpoint, RK4). The straight-path formulation requires significantly fewer integration steps than diffusion models for comparable quality.
All models were trained on CIFAR-10 (32x32) and evaluated on 10,000 generated images. FID and IS are computed against the training set.
| Model | Best FID | NFE | Solver | ms/img |
|---|---|---|---|---|
| DDPM | 13.06 | 1000 | ddpm | 205.0 |
| DDIM | 17.96 | 50 | ddim | 10.5 |
| Unconditional OT-CFM | 10.65 | 100 | rk4 | 20.7 |
| Class-Conditional CFM | 12.19 | 100 | rk4 | 24.3 |
| Method | Solver | NFE | FID | IS (mean) |
|---|---|---|---|---|
| DDPM | ddpm | 1000 | 13.06 | 8.18 |
| DDIM | ddim | 10 | 22.18 | 7.77 |
| DDIM | ddim | 50 | 17.96 | 7.84 |
| DDIM | ddim | 100 | 18.43 | 7.88 |
| Unconditional OT-CFM | midpoint | 10 | 18.25 | 8.07 |
| Unconditional OT-CFM | midpoint | 50 | 12.02 | 8.44 |
| Unconditional OT-CFM | rk4 | 100 | 10.65 | 8.58 |
| Class-Conditional CFM | euler | 100 | 13.19 | 8.27 |
| Class-Conditional CFM | midpoint | 100 | 12.64 | 8.36 |
| Class-Conditional CFM | rk4 | 100 | 12.19 | 8.46 |
Key findings:
- OT-CFM matches DDPM's 1000-step quality (FID=13.06) in just ~50 steps — a 20x speedup
- DDIM saturates around NFE=50 and cannot match CFM quality at any step count
- Midpoint solver offers the best quality-per-NFE ratio; RK4 wins at high NFE (100+)
See RESULTS.md for full analysis.
Watch noise evolve into images in real-time across all three methods:
| DDPM (1000 steps) | DDIM (50 steps) | OT-CFM (50 NFE) |
|---|---|---|
![]() |
![]() |
![]() |
Spherical linear interpolation (slerp) between noise vectors — smooth transitions indicate a well-structured latent space:
| OT-CFM | DDIM |
|---|---|
![]() |
![]() |
Learned velocity fields transporting Gaussian noise into target distributions (moons, circles, spirals):
| Real vs Generated | Animated Trajectories |
|---|---|
![]() |
![]() |
Comparing random pairing (independent coupling) vs optimal-transport pairing (minibatch OT). OT coupling produces straighter marginal trajectories with less path crossing:
| Static Snapshots | Animated |
|---|---|
![]() |
![]() |
Per-class samples from the class-conditional CFM model, and the effect of classifier-free guidance (CFG):
| No guidance (cfg=1.0) | With guidance (cfg=2.0) |
|---|---|
![]() |
![]() |
Since CFM's ODE is deterministic, we can invert it (solve backwards from t=1 to t=0) to recover the noise vector for any real image, then edit by manipulating it.
Reconstruction — encode real images to noise, decode back. Near-perfect reconstruction confirms the ODE is accurately invertible:
Latent perturbations — adding noise to the recovered latent at increasing scales. Small σ preserves identity; large σ destroys it:
Attribute editing — compute class-mean latent directions and apply them to edit images (e.g., horse→car, cat→dog):
Inversion trajectory — real images dissolving into Gaussian noise along the backward ODE path:
Real image interpolation — encode two real images, interpolate their latent vectors, decode smooth transitions:
| Model | Architecture | Conditioning | Epochs | Solver(s) |
|---|---|---|---|---|
| DDPM/DDIM | UNet (128 base, [1,2,2,2], 4 heads, GN32) | None | 500 | DDPM (1000 steps), DDIM (variable) |
| Unconditional OT-CFM | Same UNet as DDPM | None | 500 | Midpoint, RK4 (torchdiffeq) |
| Class-Conditional CFM | SimpleFlowUNet (128 base, [1,2,4], GN8, class emb) | 10 classes + CFG | 210 | Euler, Midpoint, RK4 |
All models use EMA (Exponential Moving Average) weights for evaluation.
.
├── notebooks/
│ ├── ddpm/
│ │ ├── 01_ddpm.ipynb # DDPM/DDIM training
│ │ └── 02_evaluate.ipynb # DDPM/DDIM evaluation (FID, IS, runtime)
│ ├── cfm/
│ │ ├── 01_cfm.ipynb # Unconditional OT-CFM training
│ │ ├── 02_evaluate.ipynb # OT-CFM evaluation
│ │ ├── 03_evaluate_cond.ipynb # Class-conditional CFM evaluation
│ │ └── 04_ode_inversion.ipynb # ODE inversion & image editing
│ ├── 04_visualizations.ipynb # All visualizations (trajectories, interpolations, etc.)
│ └── colab_setup.ipynb # Colab environment setup
├── results/
│ ├── ddpm/ # DDPM/DDIM metrics, samples, checkpoints
│ ├── cfm/ # OT-CFM metrics, samples, checkpoints
│ ├── cfm_elora/ # Class-conditional CFM metrics
│ └── ode inversion/ # ODE inversion & editing results
├── visualizations/ # All generated figures and GIFs
├── report/ # NeurIPS report & Beamer presentation
├── docs/ # Reference papers & project proposals
├── RESULTS.md # Detailed results analysis
└── ARCHITECTURE.md # Evaluation protocol documentation
- Trains a velocity field
v_theta(x_t, t)via MSE regression on the target velocity(x_1 - x_0) - Generates samples by integrating the learned ODE from t=0 to t=1
- Supports Euler, Midpoint, and RK4 solvers with variable step counts
- Denoising Diffusion Probabilistic Model with 1000-step Markov chain
- Standard training with epsilon-prediction objective and cosine noise schedule
- Deterministic sampling from the pretrained DDPM model
- Allows skipping steps (10, 20, 50, 100) without retraining
The unconditional OT-CFM and DDPM/DDIM share the same UNet backbone for fair comparison. The class-conditional CFM uses a slightly different architecture with class embeddings and classifier-free guidance.
All training and evaluation runs on Google Colab with GPU (T4/A100).
- Open
notebooks/colab_setup.ipynbto configure GitHub auth and Drive checkpointing - Train models using the respective notebooks in
notebooks/ddpm/andnotebooks/cfm/ - Evaluate with the evaluation notebooks
- Generate all visualizations with
notebooks/04_visualizations.ipynb
- Python >= 3.9
- PyTorch >= 2.0
- torchvision
- torchdiffeq
- scipy
- matplotlib
- torchmetrics (for FID/IS computation)
- Lipman, Y., Chen, R.T.Q., Ben-Hamu, H., Nickel, M. Flow Matching for Generative Modeling. ICLR, 2023.
- Tong, A., Malkin, N., Huguet, G., Zhang, Y., et al. Improving and Generalizing Flow-Based Generative Models with Minibatch Optimal Transport. TMLR, 2024.
- Ho, J., Jain, A., Abbeel, P. Denoising Diffusion Probabilistic Models. NeurIPS, 2020.
- Song, J., Meng, C., Ermon, S. Denoising Diffusion Implicit Models. ICLR, 2021.
- Liu, X., Gong, C., Liu, Q. Flow Straight and Fast: Learning to Generate and Transfer Data with Rectified Flow. ICLR, 2023.


















