
facebookresearch/AutoPartGen

AutoPartGen generates compositional 3D objects in an autoregressive manner. It can produce a set of part meshes from an object image, an indexed part mask, an object mesh, or combinations of these inputs.

Important

This repository provides a reimplementation of AutoPartGen based on TripoSG components and released checkpoints, since the original internal model is subject to release constraints. Its results are expected to differ from, and may underperform, the original system reported in the paper.

Pretrained Models

Download the released weights from facebook/autopartgen and place them in checkpoints/:

Component            Expected path
Part-generation DiT  checkpoints/autopartgen_dit.pth
Shape VAE            checkpoints/autopartgen_vae.pth

hf download facebook/autopartgen \
  autopartgen_dit.pth autopartgen_vae.pth \
  --local-dir checkpoints

If the model repository requires authentication, run hf auth login first.
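
To confirm the download landed where the paths listed above expect it, a small path check can help. This is a minimal sketch: the helper name missing_checkpoints is hypothetical and not part of the autopartgen package; only the two file paths come from this README.

```python
from pathlib import Path

# Expected locations, as listed in this README.
EXPECTED_CHECKPOINTS = {
    "Part-generation DiT": "checkpoints/autopartgen_dit.pth",
    "Shape VAE": "checkpoints/autopartgen_vae.pth",
}


def missing_checkpoints(root: str = ".") -> list[str]:
    """Return the expected checkpoint paths that do not exist under root.

    Hypothetical helper for illustration; not part of the package.
    """
    base = Path(root)
    return [
        rel for rel in EXPECTED_CHECKPOINTS.values()
        if not (base / rel).is_file()
    ]


missing = missing_checkpoints()
if missing:
    print("Missing checkpoints:", ", ".join(missing))
else:
    print("All checkpoints found.")
```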

Quick Start

The recommended setup creates the conda environment first, installs PyTorch and torchvision explicitly, then installs the remaining AutoPartGen dependencies. This keeps PyTorch importable before optional CUDA extensions such as diso are built.

conda create -n autopartgen python=3.10 pip -y
conda activate autopartgen
pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cu121
pip install -r requirements.txt
pip install -e .

For CPU-only development, use the same order but install the CPU PyTorch wheels:

pip install torch==2.5.1 torchvision==0.20.1 --index-url https://download.pytorch.org/whl/cpu
pip install -r requirements.txt
pip install -e .

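The release runtime works without optional CUDA extensions. The default iso-surface backend is auto: it uses DiffDMC through the optional diso package when installed, and otherwise falls back to scikit-image marching cubes. To enable DiffDMC, confirm that PyTorch imports cleanly and reports the expected CUDA version, then build diso: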

python - <<'PY'
import torch
print(torch.__version__, torch.version.cuda)
PY
pip install diso --no-build-isolation

diso imports PyTorch during its build and compiles CUDA kernels, so install it only after torch is importable. It also needs a CUDA toolkit matching your PyTorch build (nvcc, cuda_runtime.h, and the thrust/cccl headers) on the build path. For the pinned cu121 PyTorch build, install CUDA 12.1 toolkit packages first if nvcc is not already available:

conda install -c nvidia/label/cuda-12.1.1 cuda-nvcc cuda-cudart-dev cuda-cccl
pip install diso --no-build-isolation

Pass --isosurface_backend skimage to force the dependency-free fallback, or diso to require DiffDMC.

Image background removal is enabled by default and uses BriaRMBG (RMBG-1.4) via transformers. The first run downloads the model. RMBG-1.4 is released under a non-commercial license. Pass --no_remove_background if the input already has the desired background or alpha mask.

Optional scene post-processing backends can be installed with:

pip install ".[postprocess]"

Inference

Image-conditioned generation:

python inference.py \
  --image examples/image/apple_character/image.png \
  --output_path outputs/apple_character

Mesh-conditioned generation:

python inference.py \
  --mesh examples/mesh/potted_flower/mesh.glb \
  --output_path outputs/potted_flower_mesh

The --mesh input can be any whole-object mesh that trimesh can load, including scanned, reconstructed, or third-party generated meshes.

Image, mesh, and indexed-mask conditioning:

python inference.py \
  --image examples/image_mesh_mask/robot/image.png \
  --mesh examples/image_mesh_mask/robot/mesh.glb \
  --mask examples/image_mesh_mask/robot/mask.png \
  --output_path outputs/robot_image_mesh_mask

The output directory contains one GLB per accepted part (mesh_000.glb, mesh_001.glb, ...) plus the combined part mesh as mesh_combined.glb. For indexed-mask inputs, AutoPartGen saves mask_colored.png and mask_palette.json so mask labels can be matched to the exported part colors.
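
For downstream scripts, the per-part files can be collected by name. The sketch below relies only on the file naming described above (mesh_000.glb, mesh_001.glb, ..., mesh_combined.glb); the helper list_part_files is hypothetical and not part of the package.

```python
import re
from pathlib import Path

# Matches mesh_000.glb, mesh_001.glb, ... but not mesh_combined.glb.
PART_PATTERN = re.compile(r"^mesh_(\d+)\.glb$")


def list_part_files(output_dir: str) -> list[Path]:
    """Return per-part GLB files sorted by part index.

    Hypothetical helper; it ignores mesh_combined.glb and any mask files.
    """
    indexed = []
    for path in Path(output_dir).iterdir():
        match = PART_PATTERN.match(path.name)
        if match:
            indexed.append((int(match.group(1)), path))
    indexed.sort(key=lambda item: item[0])
    return [path for _, path in indexed]
```

Calling list_part_files("outputs/apple_character") after the image-conditioned example would return the accepted parts in index order.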

Python API

from autopartgen import (
    GenerationOptions,
    generate_from_image,
    generate_from_image_and_mask,
    generate_from_mesh,
    load_pipeline,
)

pipeline = load_pipeline()  # uses the packaged autopartgen/configs/default.yaml
options = GenerationOptions(grid_size=256, seed=0, postprocess=True)

image_parts = generate_from_image(
    pipeline,
    "examples/image/apple_character/image.png",
    output_dir="outputs/apple_character",
    options=options,
)

mesh_parts = generate_from_mesh(
    pipeline,
    "examples/mesh/potted_flower/mesh.glb",
    output_dir="outputs/potted_flower_mesh",
    options=options,
)

masked_parts = generate_from_image_and_mask(
    pipeline,
    "examples/image_mesh_mask/robot/image.png",
    "examples/image_mesh_mask/robot/mask.png",
    mesh="examples/image_mesh_mask/robot/mesh.glb",
    output_dir="outputs/robot_image_mesh_mask",
    options=options,
)

The package also provides generate_from_image_and_mesh for image-and-mesh conditioning without an indexed mask.

Runtime and Post-processing

Release inference loads autopartgen/configs/default.yaml by default through load_pipeline(). Runtime options can be passed through CLI flags in inference.py or through GenerationOptions in Python.

Guidance is mode-specific. The current release defaults are:

Mode                       Image CFG            Geometry CFG
whole image-to-mesh stage  whole_cfg_scale=7.0  n/a
image                      0.0                  2.0
mesh                       0.0                  2.0
image_mesh                 0.0                  2.0
image_mask                 5.0                  5.0
image_mesh_mask            5.0                  5.0

--image_cfg_scale, --geometry_cfg_scale, --mask_image_cfg_scale, --mask_geometry_cfg_scale, and --whole_cfg_scale override the config only for that run; the defaults themselves come from autopartgen/configs/default.yaml.
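
The per-run override behaviour can be pictured as a lookup with optional replacements. This is a sketch, not the package's implementation: resolve_cfg and DEFAULT_CFG are hypothetical names, the numbers are the release defaults listed above, and the mapping of each CLI flag to a mode is left to the caller.

```python
from typing import Optional

# Release defaults from this README: mode -> (image CFG, geometry CFG).
DEFAULT_CFG = {
    "image": (0.0, 2.0),
    "mesh": (0.0, 2.0),
    "image_mesh": (0.0, 2.0),
    "image_mask": (5.0, 5.0),
    "image_mesh_mask": (5.0, 5.0),
}


def resolve_cfg(
    mode: str,
    image_override: Optional[float] = None,
    geometry_override: Optional[float] = None,
) -> tuple[float, float]:
    """Return (image CFG, geometry CFG) for one run.

    Hypothetical helper: an override replaces the default for this call
    only and never mutates DEFAULT_CFG.
    """
    image_cfg, geometry_cfg = DEFAULT_CFG[mode]
    if image_override is not None:
        image_cfg = image_override
    if geometry_override is not None:
        geometry_cfg = geometry_override
    return image_cfg, geometry_cfg
```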

Note

  • Higher geometry guidance usually encourages more segments and accepted parts, and can sometimes make parts sharper.
  • AutoPartGen is a generative model, so try a different --seed if one sample does not work well.
  • To follow the input image more closely, increase the image CFG scale. To follow the input mesh more closely, increase the geometry guidance scale.

--grid_size sets the final iso-surface resolution. It must be a power of two; the default is 256. Use 512 for higher-resolution output.
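
The power-of-two requirement is easy to check before launching a long run. A minimal sketch; is_power_of_two is a hypothetical helper, not a function exported by the package.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is a positive power of two (1, 2, 4, ..., 256, 512, ...).

    A power of two has exactly one set bit, so n & (n - 1) clears it to zero.
    """
    return n > 0 and (n & (n - 1)) == 0


for grid_size in (256, 384, 512):
    print(grid_size, is_power_of_two(grid_size))
```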

--no_post disables per-part floater removal, simplification, final scene cleanup, and final smoothing.

--smooth_iters N enables a final Taubin smoothing pass only when N > 0. The default --smooth_iters 0 keeps all other post-processing steps enabled without running Taubin smoothing.
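
For readers unfamiliar with Taubin smoothing, the idea is to alternate a shrinking Laplacian step with a slightly stronger inflating step, so that high-frequency noise is damped without the overall shape collapsing. The sketch below is a generic pure-Python illustration, not AutoPartGen's code: the function name, the neighbor-list input, and the lam and mu values are all assumptions chosen for illustration.

```python
def taubin_smooth(vertices, neighbors, iters, lam=0.5, mu=-0.53):
    """Generic Taubin smoothing sketch.

    vertices:  list of coordinate tuples, one per vertex.
    neighbors: list of index lists; neighbors[i] are the vertices adjacent to i.
    iters:     number of shrink/inflate passes; 0 returns the input unchanged.
    lam, mu:   illustrative step sizes (lam > 0, mu < 0, |mu| > lam).
    """
    points = [tuple(float(c) for c in v) for v in vertices]

    def laplacian_step(pts, factor):
        # Move each vertex toward (factor > 0) or away from (factor < 0)
        # the average of its neighbors.
        moved = []
        for i, p in enumerate(pts):
            nbrs = neighbors[i]
            if not nbrs:
                moved.append(p)
                continue
            mean = [sum(pts[j][d] for j in nbrs) / len(nbrs)
                    for d in range(len(p))]
            moved.append(tuple(c + factor * (m - c) for c, m in zip(p, mean)))
        return moved

    for _ in range(max(iters, 0)):
        points = laplacian_step(points, lam)  # shrink step
        points = laplacian_step(points, mu)   # inflate step
    return points
```

With iters=0 the loop never runs, which mirrors the default behaviour of leaving the mesh unsmoothed.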

--simplify_faces N applies per-part quadric simplification. The default is 50000; set --simplify_faces 0 to disable simplification. AutoPartGen tries fast-simplification, then pymeshlab, then Open3D. If no simplification backend is available, the original mesh is kept and a warning is emitted. The fast-simplification aggression can be overridden with APG_SIMPLIFY_AGG (default 1.0).
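
The fallback order described above can be sketched as a first-available search. This is an illustration under assumptions: the helper names are hypothetical, the import names (fast_simplification, pymeshlab, open3d) are the usual module names for those packages, and the way AutoPartGen parses APG_SIMPLIFY_AGG is assumed to be a plain float conversion.

```python
import importlib.util
import os

# Import names for the backends, in the order this README lists them.
SIMPLIFY_BACKENDS = ("fast_simplification", "pymeshlab", "open3d")


def pick_simplify_backend(is_available=None):
    """Return the first importable backend name, or None if none is installed.

    Hypothetical helper. is_available can be injected for testing; by
    default it asks importlib whether the module can be found.
    """
    if is_available is None:
        def is_available(name):
            return importlib.util.find_spec(name) is not None
    for name in SIMPLIFY_BACKENDS:
        if is_available(name):
            return name
    return None


def simplify_aggression(environ=os.environ) -> float:
    """Read APG_SIMPLIFY_AGG, defaulting to 1.0 (assumed float parsing)."""
    return float(environ.get("APG_SIMPLIFY_AGG", "1.0"))


backend = pick_simplify_backend()
if backend is None:
    print("No simplification backend found; meshes would be kept as-is.")
else:
    print("Would simplify with:", backend)
```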

--isosurface_backend {auto,diso,skimage} selects the iso-surface extraction backend. The default auto uses diso (DiffDMC) when installed and otherwise falls back to skimage. Use diso to require DiffDMC, which usually gives cleaner watertight parts but needs the optional CUDA extension. Use skimage to force the dependency-free marching-cubes fallback.
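
The three choices differ mainly in what happens when diso is missing. The following sketch captures that decision logic as described above; resolve_isosurface_backend is a hypothetical function, and the exact error the real CLI raises is not specified in this README.

```python
import importlib.util


def resolve_isosurface_backend(requested: str = "auto", diso_installed=None) -> str:
    """Map a requested backend to the one that would actually run.

    Hypothetical helper. diso_installed can be passed explicitly for
    testing; otherwise importlib is asked whether diso can be found.
    """
    if diso_installed is None:
        diso_installed = importlib.util.find_spec("diso") is not None
    if requested == "auto":
        # Prefer DiffDMC when available, else fall back silently.
        return "diso" if diso_installed else "skimage"
    if requested == "diso":
        # Explicit request: fail instead of falling back.
        if not diso_installed:
            raise RuntimeError("diso was requested but is not installed")
        return "diso"
    if requested == "skimage":
        return "skimage"
    raise ValueError(f"unknown iso-surface backend: {requested!r}")
```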

--iou_threshold defaults to 0.3. Floater removal and simplification run only after a candidate passes this duplicate check. The duplicate check uses --iou_grid_size 256 by default, and the sampled-surface fallback uses 500k points.
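
As a rough picture of an IoU-based duplicate check, consider parts voxelized into sets of occupied grid cells. The sketch below is an assumption-laden illustration, not the package's implementation: the set-of-cells representation, the function names, and the rule that a candidate is rejected when its IoU with any accepted part exceeds the threshold are all assumed.

```python
def voxel_iou(a: set, b: set) -> float:
    """Intersection over union of two sets of occupied voxel indices."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def is_duplicate(candidate: set, accepted: list, iou_threshold: float = 0.3) -> bool:
    """Assumed rule: duplicate if IoU with any accepted part exceeds the threshold."""
    return any(voxel_iou(candidate, part) > iou_threshold for part in accepted)


part_a = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
part_b = {(1, 0, 0), (2, 0, 0), (3, 0, 0)}  # overlaps part_a in 2 of 4 cells
print(voxel_iou(part_a, part_b))            # 2 shared / 4 total = 0.5
```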

--seed N controls diffusion sampling, mesh surface resampling, FPS start points, IoU fallback sampling, and optional posterior/history noise. Re-running the same command with the same seed, inputs, checkpoints, backend, and package versions should produce the same outputs.

--max_parts N is a hard cap for image- or mesh-conditioned generation without masks. The model can stop earlier when it predicts the end token. When masks are provided, the number of part attempts follows the number of mask regions, so progress is shown as x/y only in that mode.
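
The interaction between the hard cap and the end token can be sketched as a bounded loop. This is illustrative only: generate_parts and predict_next are hypothetical names, None stands in for the end token, and whether the real cap counts attempts or accepted parts is not stated in this README.

```python
from typing import Callable, Optional


def generate_parts(
    predict_next: Callable[[list], Optional[object]],
    max_parts: int,
) -> list:
    """Autoregressive loop sketch: stop at max_parts or at the end token.

    predict_next receives the parts generated so far and returns the next
    part, or None as a stand-in for the model's end token.
    """
    parts = []
    for _ in range(max_parts):
        part = predict_next(parts)
        if part is None:
            break  # model signalled that the object is complete
        parts.append(part)
    return parts


# Toy predictor: emits three parts, then the end token.
def toy_predictor(history):
    return len(history) if len(history) < 3 else None


print(generate_parts(toy_predictor, max_parts=8))  # stops early at 3 parts
print(generate_parts(toy_predictor, max_parts=2))  # capped at 2 parts
```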

--no_remove_background disables the default image background removal path. Existing valid alpha masks are preserved even when background removal is enabled.

--no_progress disables stage logs and diffusion progress bars.

--use_coarse_bbox enables the coarse ROI bbox crop during mesh extraction.

--no_mask_visualization disables mask_colored.png and mask_palette.json for indexed-mask inputs.

The first image-conditioned run downloads DINOv2-L/14 with registers through torch.hub; subsequent runs use the local PyTorch cache.

Examples

The examples/ directory is grouped by inference scenario:

image/<name>/            Image-only examples
mesh/<name>/             Mesh-only examples
image_mesh_mask/<name>/  Image, mesh, and mask examples

Each example directory contains only the input files for that scenario.

License

The code and the released checkpoints (autopartgen_dit.pth, autopartgen_vae.pth) are released under the FAIR Noncommercial Research License (see LICENSE), for noncommercial research use only.

Acknowledgements

AutoPartGen builds upon several open-source projects and publicly released models. We sincerely thank their authors for their valuable contributions:

Project         Use
TripoSG         VAE, DiT backbone, and image-to-3D pipeline structure
HunyuanDiT      Transformer blocks used by the TripoSG implementation
DINOv2          Image features
Diffusers       Model and scheduler utilities
Transformers    Background-removal model loading
RMBG-1.4        Background removal
DiffDMC / diso  Dual marching-cubes surface extraction
TRELLIS         Mesh post-processing utilities

Citation

@article{chen2025autopartgen,
  title={AutoPartGen: Autoregressive 3D Part Generation and Discovery},
  author={Minghao Chen and Jianyuan Wang and Roman Shapovalov and Tom Monnier
          and Hyunyoung Jung and Dilin Wang and Rakesh Ranjan and Iro Laina
          and Andrea Vedaldi},
  journal={arXiv preprint arXiv:2507.13346},
  year={2025}
}

About

[NeurIPS 2025] AutoPartGen: Autoregressive 3D Part Generation and Discovery. This is a reimplementation of the original model.
