Visual Geometry Group, University of Oxford; Meta AI
Minghao Chen, Jianyuan Wang, Roman Shapovalov, Tom Monnier, Hyunyoung Jung, Dilin Wang, Rakesh Ranjan, Iro Laina, Andrea Vedaldi
AutoPartGen generates compositional 3D objects in an autoregressive manner. It can produce a set of part meshes from an object image, an indexed part mask, an object mesh, or combinations of these inputs.
Important
This repository provides a reimplementation of AutoPartGen based on TripoSG components and released checkpoints, since the original internal model is subject to release constraints. Its results are expected to differ from, and may underperform, the original system reported in the paper.
Download the released weights from facebook/autopartgen and place them in checkpoints/:
| Component | Expected path |
|---|---|
| Part-generation DiT | checkpoints/autopartgen_dit.pth |
| Shape VAE | checkpoints/autopartgen_vae.pth |
hf download facebook/autopartgen \
    autopartgen_dit.pth autopartgen_vae.pth \
    --local-dir checkpoints

If the model repository requires authentication, run hf auth login first.
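Before running inference, it can help to confirm that both files landed where the table above expects them. The following is a minimal sketch using only the standard library; `missing_checkpoints` and `EXPECTED_CHECKPOINTS` are hypothetical names for illustration and are not part of the autopartgen package.

```python
from pathlib import Path

# Expected checkpoint locations, copied from the table above.
EXPECTED_CHECKPOINTS = (
    "checkpoints/autopartgen_dit.pth",  # Part-generation DiT
    "checkpoints/autopartgen_vae.pth",  # Shape VAE
)


def missing_checkpoints(root="."):
    """Return the expected checkpoint paths that do not exist under `root`."""
    base = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (base / rel).is_file()]
```

Calling `missing_checkpoints()` from the repository root returns an empty list when both files are in place, and otherwise names the files that still need to be downloaded.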
The recommended setup creates the conda environment first, installs PyTorch and
torchvision explicitly, then installs the remaining AutoPartGen dependencies.
This keeps PyTorch importable before optional CUDA extensions such as diso are
built.
conda create -n autopartgen python=3.10 pip -y
conda activate autopartgen
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -e .

For CPU-only development, use the same order but install the CPU PyTorch wheels:
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
pip install -e .

The release runtime works without optional CUDA extensions. The default
iso-surface backend is auto: it uses DiffDMC through the optional diso
package when installed, and otherwise falls back to scikit-image marching
cubes.

To add diso, first confirm that PyTorch imports, then build the extension:
python - <<'PY'
import torch
print(torch.__version__, torch.version.cuda)
PY
pip install diso --no-build-isolation

diso imports PyTorch during its build and compiles CUDA kernels, so install it
only after torch is importable. It also needs a CUDA toolkit matching your
PyTorch build (nvcc, cuda_runtime.h, and the thrust/cccl headers) on the
build path. For the pinned cu121 PyTorch build, install CUDA 12.1 toolkit
packages first if nvcc is not already available:
conda install -c nvidia/label/cuda-12.1.1 cuda-nvcc cuda-cudart-dev cuda-cccl
pip install diso --no-build-isolation

Pass --isosurface_backend skimage to force the dependency-free fallback, or
diso to require DiffDMC.
Image background removal is enabled by default and uses BriaRMBG (RMBG-1.4) via
transformers. The first run downloads the model. RMBG-1.4 is released under a
non-commercial license. Pass --no_remove_background if the input already has
the desired background or alpha mask.
Optional scene post-processing backends can be installed with:
pip install ".[postprocess]"

Image-conditioned generation:

python inference.py \
    --image examples/image/apple_character/image.png \
    --output_path outputs/apple_character

Mesh-conditioned generation:
python inference.py \
    --mesh examples/mesh/potted_flower/mesh.glb \
    --output_path outputs/potted_flower_mesh

The --mesh input can be any whole-object mesh that trimesh can load,
including scanned, reconstructed, or third-party generated meshes.
Image, mesh, and indexed-mask conditioning:
python inference.py \
    --image examples/image_mesh_mask/robot/image.png \
    --mesh examples/image_mesh_mask/robot/mesh.glb \
    --mask examples/image_mesh_mask/robot/mask.png \
    --output_path outputs/robot_image_mesh_mask

The output directory contains one GLB per accepted part (mesh_000.glb,
mesh_001.glb, ...) plus the combined part mesh as mesh_combined.glb. For indexed-mask inputs,
AutoPartGen saves mask_colored.png and mask_palette.json so mask labels can
be matched to the exported part colors.
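The naming scheme above makes it easy to collect the per-part files after a run. This is a small sketch under the assumption that part indices are always three zero-padded digits, as in the names quoted above; `list_part_meshes` is a hypothetical helper, not a function exported by autopartgen.

```python
from pathlib import Path


def list_part_meshes(output_dir):
    """Return per-part GLB paths (mesh_000.glb, mesh_001.glb, ...) in index order.

    mesh_combined.glb is skipped because the pattern only matches three digits.
    """
    out = Path(output_dir)
    return sorted(out.glob("mesh_[0-9][0-9][0-9].glb"))
```

Each returned path can then be opened with any GLB-capable loader, for example trimesh.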
from autopartgen import (
    GenerationOptions,
    generate_from_image,
    generate_from_image_and_mask,
    generate_from_mesh,
    load_pipeline,
)

pipeline = load_pipeline()  # uses the packaged autopartgen/configs/default.yaml
options = GenerationOptions(grid_size=256, seed=0, postprocess=True)

image_parts = generate_from_image(
    pipeline,
    "examples/image/apple_character/image.png",
    output_dir="outputs/apple_character",
    options=options,
)

mesh_parts = generate_from_mesh(
    pipeline,
    "examples/mesh/potted_flower/mesh.glb",
    output_dir="outputs/potted_flower_mesh",
    options=options,
)

masked_parts = generate_from_image_and_mask(
    pipeline,
    "examples/image_mesh_mask/robot/image.png",
    "examples/image_mesh_mask/robot/mask.png",
    mesh="examples/image_mesh_mask/robot/mesh.glb",
    output_dir="outputs/robot_image_mesh_mask",
    options=options,
)

The package also provides generate_from_image_and_mesh for image-and-mesh
conditioning without an indexed mask.
Release inference loads autopartgen/configs/default.yaml by default through
load_pipeline(). Runtime options can be passed through CLI flags in
inference.py or through GenerationOptions in Python.
Guidance is mode-specific. The current release defaults are:
| Mode | Image CFG | Geometry CFG |
|---|---|---|
| whole image-to-mesh stage | whole_cfg_scale=7.0 | n/a |
| image | 0.0 | 2.0 |
| mesh | 0.0 | 2.0 |
| image_mesh | 0.0 | 2.0 |
| image_mask | 5.0 | 5.0 |
| image_mesh_mask | 5.0 | 5.0 |
--image_cfg_scale, --geometry_cfg_scale, --mask_image_cfg_scale,
--mask_geometry_cfg_scale, and --whole_cfg_scale only override the config for
that run; the default values themselves come from autopartgen/configs/default.yaml.
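The per-run override behavior can be pictured as a lookup with optional replacements. The sketch below hard-codes the release defaults listed above and is only an illustration: `DEFAULT_CFG` and `resolve_cfg` are hypothetical names, the real values are read from the YAML config, and the separate mask-specific flags are collapsed into one pair of arguments here for brevity.

```python
# Release defaults copied from the table above: mode -> (image CFG, geometry CFG).
DEFAULT_CFG = {
    "image": (0.0, 2.0),
    "mesh": (0.0, 2.0),
    "image_mesh": (0.0, 2.0),
    "image_mask": (5.0, 5.0),
    "image_mesh_mask": (5.0, 5.0),
}


def resolve_cfg(mode, image_cfg=None, geometry_cfg=None):
    """Return (image_cfg, geometry_cfg) for one run.

    An explicit value replaces the default for this call only;
    DEFAULT_CFG itself is never modified.
    """
    default_image, default_geometry = DEFAULT_CFG[mode]
    return (
        default_image if image_cfg is None else image_cfg,
        default_geometry if geometry_cfg is None else geometry_cfg,
    )
```

For example, `resolve_cfg("image_mask", geometry_cfg=7.5)` keeps the image scale at its default and changes only the geometry scale for that call.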
Note
- Higher geometry guidance usually encourages more segments and accepted parts, and can sometimes make parts sharper.
- AutoPartGen is a generative model, so try a different --seed if one sample does not work well.
- To follow the input image more closely, increase the image CFG scale. To follow the input mesh more closely, increase the geometry guidance scale.
--grid_size sets the final iso-surface resolution. It must be a power of two;
the default is 256. Use 512 for higher-resolution output.
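The power-of-two requirement can be checked up front with a standard bit trick. This is a generic sketch, not the validation code used by inference.py; both function names are hypothetical.

```python
def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; False for zero, negatives, and everything else."""
    return n > 0 and (n & (n - 1)) == 0


def check_grid_size(grid_size):
    """Return grid_size unchanged if it is a power of two, else raise ValueError."""
    if not is_power_of_two(grid_size):
        raise ValueError(f"grid_size must be a power of two, got {grid_size}")
    return grid_size
```

Keep in mind that a dense grid at 512 has eight times as many cells as one at 256, so memory use and runtime can grow substantially with the higher setting.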
--no_post disables per-part floater removal, simplification, final scene
cleanup, and final smoothing.
--smooth_iters N enables a final Taubin smoothing pass only when N > 0.
The default --smooth_iters 0 keeps all other post-processing steps enabled
without running Taubin smoothing.
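For readers unfamiliar with Taubin smoothing, the idea is to alternate a shrinking Laplacian step with a slightly stronger inflating step, which damps noise without the overall shrinkage of plain Laplacian smoothing. The pure-Python sketch below illustrates the general technique on a triangle mesh; it is not the implementation used by AutoPartGen, and the `lam` and `mu` values are common textbook choices assumed here for illustration.

```python
def taubin_smooth(vertices, faces, iterations, lam=0.5, mu=-0.53):
    """Generic Taubin smoothing; iterations=0 returns the input positions."""
    # Build vertex adjacency from triangle edges.
    neighbors = [set() for _ in vertices]
    for a, b, c in faces:
        neighbors[a].update((b, c))
        neighbors[b].update((a, c))
        neighbors[c].update((a, b))

    def laplacian_step(points, factor):
        # Move every vertex toward (factor > 0) or away from (factor < 0)
        # the centroid of its neighbors.
        moved = []
        for i, p in enumerate(points):
            if not neighbors[i]:
                moved.append(p)  # isolated vertex: leave untouched
                continue
            k = len(neighbors[i])
            centroid = [sum(points[j][ax] for j in neighbors[i]) / k for ax in range(3)]
            moved.append(tuple(p[ax] + factor * (centroid[ax] - p[ax]) for ax in range(3)))
        return moved

    points = [tuple(float(x) for x in v) for v in vertices]
    for _ in range(iterations):
        points = laplacian_step(points, lam)  # shrink
        points = laplacian_step(points, mu)   # inflate
    return points
```

With `iterations=0` the loop body never runs, which mirrors how the default --smooth_iters 0 skips the smoothing pass entirely.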
--simplify_faces N applies per-part quadric simplification. The default is
50000; set --simplify_faces 0 to disable simplification. AutoPartGen tries
fast-simplification, then pymeshlab, then Open3D. If no simplification backend
is available, the original mesh is kept and a warning is emitted. The
fast-simplification aggression can be overridden with APG_SIMPLIFY_AGG
(default 1.0).
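The fallback order described above can be sketched as a first-available search over importable modules. The snippet is an illustration only: `pick_simplify_backend` and `simplify_aggression` are hypothetical helpers, the module names are the usual import names of the three libraries, and parsing the environment variable as a float is an assumption.

```python
import importlib.util
import os
import warnings

# Import names of the backends, in the order the text lists them.
SIMPLIFY_BACKENDS = ("fast_simplification", "pymeshlab", "open3d")


def _importable(name):
    """True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


def pick_simplify_backend(is_available=_importable):
    """Return the first available backend name, or None after a warning."""
    for name in SIMPLIFY_BACKENDS:
        if is_available(name):
            return name
    warnings.warn("No simplification backend found; keeping the original mesh.")
    return None


def simplify_aggression(environ=None):
    """Read APG_SIMPLIFY_AGG, falling back to the documented default of 1.0."""
    environ = os.environ if environ is None else environ
    return float(environ.get("APG_SIMPLIFY_AGG", "1.0"))
```

Passing a custom `is_available` callable makes the selection logic easy to exercise without installing any of the backends.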
--isosurface_backend {auto,diso,skimage} selects the iso-surface extraction
backend. The default auto uses diso (DiffDMC) when installed and otherwise
falls back to skimage. Use diso to require DiffDMC, which usually gives
cleaner watertight parts but needs the optional CUDA extension. Use skimage to
force the dependency-free marching-cubes fallback.
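The three choices can be summarized as a small resolution function. This is a sketch of the behavior described above rather than the code in the repository: `resolve_isosurface_backend` is a hypothetical name, and raising an error when diso is requested but missing is an assumed interpretation of "require".

```python
import importlib.util


def resolve_isosurface_backend(requested="auto", diso_installed=None):
    """Map a requested backend name to the backend that would actually run."""
    if requested not in ("auto", "diso", "skimage"):
        raise ValueError(f"unknown backend: {requested!r}")
    if diso_installed is None:
        # Probe for the optional extension without importing it.
        diso_installed = importlib.util.find_spec("diso") is not None
    if requested == "auto":
        return "diso" if diso_installed else "skimage"
    if requested == "diso" and not diso_installed:
        raise RuntimeError("diso was requested but is not installed")
    return requested
```

The `diso_installed` argument exists only so the decision table can be checked without touching the local environment.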
--iou_threshold defaults to 0.3. Floater removal and simplification run only
after a candidate passes this duplicate check. The duplicate check uses
--iou_grid_size 256 by default, and the sampled-surface fallback uses 500k
points.
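To make the duplicate check concrete, the sketch below computes intersection-over-union between voxelized parts represented as sets of grid indices. It rests on two assumptions that the text does not spell out: that a candidate counts as a duplicate when its IoU with any already accepted part reaches the threshold, and that occupancy is compared on a shared grid. `voxel_iou` and `is_duplicate` are hypothetical names.

```python
def voxel_iou(a, b):
    """IoU of two occupancy sets, each a set of (i, j, k) voxel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def is_duplicate(candidate, accepted_parts, iou_threshold=0.3):
    """Assumed rule: duplicate if IoU with any accepted part >= threshold."""
    return any(voxel_iou(candidate, part) >= iou_threshold for part in accepted_parts)
```

Under this reading, a lower threshold rejects more candidates as duplicates, while a higher one lets more overlapping parts through.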
--seed N controls diffusion sampling, mesh surface resampling, FPS start
points, IoU fallback sampling, and optional posterior/history noise. Re-running
the same command with the same seed, inputs, checkpoints, backend, and package
versions should produce the same outputs.
--max_parts N is a hard cap for image- or mesh-conditioned generation without
masks. The model can stop earlier when it predicts the end token. When masks are
provided, the number of part attempts follows the number of mask regions, so
progress is shown as x/y only in that mode.
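The interaction between the cap and the end token follows the usual shape of an autoregressive loop. The sketch below is schematic: `sample_next_part` stands in for the model call, `END` stands in for the end token, and treating --max_parts as a limit on attempts rather than on accepted parts is an assumption.

```python
END = object()  # stand-in for the model's end token


def generate_parts(sample_next_part, max_parts):
    """Sample parts one at a time, conditioning each step on earlier parts.

    Stops at the end token or after max_parts attempts, whichever comes first.
    """
    parts = []
    for _ in range(max_parts):
        part = sample_next_part(parts)  # conditioned on the history so far
        if part is END:
            break
        parts.append(part)
    return parts
```

In the mask-conditioned modes the loop bound would instead be the number of mask regions, which is why a fixed x/y progress count is available only there.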
--no_remove_background disables the default image background removal path.
Existing valid alpha masks are preserved even when background removal is enabled.
--no_progress disables stage logs and diffusion progress bars.
--use_coarse_bbox enables the coarse ROI bbox crop during mesh extraction.
--no_mask_visualization disables mask_colored.png and mask_palette.json
for indexed-mask inputs.
The first image-conditioned run downloads DINOv2-L/14 with registers through
torch.hub; subsequent runs use the local PyTorch cache.
The examples/ directory is grouped by inference scenario:
image/<name>/ Image-only examples
mesh/<name>/ Mesh-only examples
image_mesh_mask/<name>/ Image, mesh, and mask examples
Each example directory contains only the input files for that scenario.
The code and the released checkpoints (autopartgen_dit.pth,
autopartgen_vae.pth) are released under the FAIR Noncommercial Research
License (see LICENSE) — noncommercial research use only.
AutoPartGen builds upon several open-source projects and publicly released models. We sincerely thank their authors for their valuable contributions:
| Project | Use |
|---|---|
| TripoSG | VAE, DiT backbone, and image-to-3D pipeline structure |
| HunyuanDiT | Transformer blocks used by the TripoSG implementation |
| DINOv2 | Image features |
| Diffusers | Model and scheduler utilities |
| Transformers | Background-removal model loading |
| RMBG-1.4 | Background removal |
| DiffDMC / diso | Dual marching-cubes surface extraction |
| TRELLIS | Mesh post-processing utilities |
@article{chen2025autopartgen,
title={AutoPartGen: Autoregressive 3D Part Generation and Discovery},
author={Minghao Chen and Jianyuan Wang and Roman Shapovalov and Tom Monnier
and Hyunyoung Jung and Dilin Wang and Rakesh Ranjan and Iro Laina
and Andrea Vedaldi},
journal={arXiv preprint arXiv:2507.13346},
year={2025}
}