hub / github.com/microsoft/TRELLIS.2

github.com/microsoft/TRELLIS.2 @main sqlite

868 symbols 2,857 edges 131 files 304 documented · 35%

README

Native and Compact Structured Latents for 3D Generation

https://github.com/user-attachments/assets/63b43a7e-acc7-4c81-a900-6da450527d8f

(Compressed version due to GitHub size limits. See the full-quality video on our project page!)

TRELLIS.2 is a state-of-the-art large 3D generative model (4B parameters) designed for high-fidelity image-to-3D generation. It leverages a novel "field-free" sparse voxel structure termed O-Voxel to reconstruct and generate arbitrary 3D assets with complex topologies, sharp features, and full PBR materials.

✨ Features

1. High Quality, Resolution & Efficiency

Our 4B-parameter model generates high-resolution fully textured assets with exceptional fidelity and efficiency using vanilla DiTs. It utilizes a Sparse 3D VAE with 16× spatial downsampling to encode assets into a compact latent space.

Resolution	Total Time*	Breakdown (Shape + Mat)
512³	~3s	2s + 1s
1024³	~17s	10s + 7s
1536³	~60s	35s + 25s

*Tested on NVIDIA H100 GPU.

2. Arbitrary Topology Handling

The O-Voxel representation breaks the limits of iso-surface fields. It robustly handles complex structures without lossy conversion: * ✅ Open Surfaces (e.g., clothing, leaves) * ✅ Non-manifold Geometry * ✅ Internal Enclosed Structures

3. Rich Texture Modeling

Beyond basic colors, TRELLIS.2 models arbitrary surface attributes including Base Color, Roughness, Metallic, and Opacity, enabling photorealistic rendering and transparency support.

4. Minimalist Processing

Data processing is streamlined for instant conversions that are fully rendering-free and optimization-free. * < 10s (Single CPU): Textured Mesh → O-Voxel * < 100ms (CUDA): O-Voxel → Textured Mesh

🗺️ Roadmap

[x] Paper release
[x] Release image-to-3D inference code
[x] Release pretrained checkpoints (4B)
[x] Hugging Face Spaces demo
[x] Release shape-conditioned texture generation inference code
[x] Release training code

🛠️ Installation

Prerequisites

System: The code is currently tested only on Linux.
Hardware: An NVIDIA GPU with at least 24GB of memory is necessary. The code has been verified on NVIDIA A100 and H100 GPUs.
Software:
The CUDA Toolkit is needed to compile certain packages. Recommended version is 12.4.
Conda is recommended for managing dependencies.
Python version 3.8 or higher is required.

Installation Steps

Clone the repo: sh git clone -b main https://github.com/microsoft/TRELLIS.2.git --recursive cd TRELLIS.2
Install the dependencies:

Before running the following command there are somethings to note: - By adding --new-env, a new conda environment named trellis2 will be created. If you want to use an existing conda environment, please remove this flag. - By default the trellis2 environment will use pytorch 2.6.0 with CUDA 12.4. If you want to use a different version of CUDA, you can remove the --new-env flag and manually install the required dependencies. Refer to PyTorch for the installation command. - If you have multiple CUDA Toolkit versions installed, CUDA_HOME should be set to the correct version before running the command. For example, if you have CUDA Toolkit 12.4 and 13.0 installed, you can run export CUDA_HOME=/usr/local/cuda-12.4 before running the command. - By default, the code uses the flash-attn backend for attention. For GPUs do not support flash-attn (e.g., NVIDIA V100), you can install xformers manually and set the ATTN_BACKEND environment variable to xformers before running the code. See the Minimal Example for more details. - The installation may take a while due to the large number of dependencies. Please be patient. If you encounter any issues, you can try to install the dependencies one by one, specifying one flag at a time. - If you encounter any issues during the installation, feel free to open an issue or contact us.

Create a new conda environment named trellis2 and install the dependencies: sh . ./setup.sh --new-env --basic --flash-attn --nvdiffrast --nvdiffrec --cumesh --o-voxel --flexgemm The detailed usage of setup.sh can be found by running . ./setup.sh --help. sh Usage: setup.sh [OPTIONS] Options: -h, --help Display this help message --new-env Create a new conda environment --basic Install basic dependencies --flash-attn Install flash-attention --cumesh Install cumesh --o-voxel Install o-voxel --flexgemm Install flexgemm --nvdiffrast Install nvdiffrast --nvdiffrec Install nvdiffrec

📦 Pretrained Weights

The pretrained model TRELLIS.2-4B is available on Hugging Face. Please refer to the model card there for more details.

Model	Parameters	Resolution	Link
TRELLIS.2-4B	4 Billion	512³ - 1536³	Hugging Face

🚀 Usage

1. Image to 3D Generation

Minimal Example

Here is an example of how to use the pretrained models for 3D asset generation.

import os
os.environ['OPENCV_IO_ENABLE_OPENEXR'] = '1'
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # Can save GPU memory
import cv2
import imageio
from PIL import Image
import torch
from trellis2.pipelines import Trellis2ImageTo3DPipeline
from trellis2.utils import render_utils
from trellis2.renderers import EnvMap
import o_voxel

# 1. Setup Environment Map
envmap = EnvMap(torch.tensor(
    cv2.cvtColor(cv2.imread('assets/hdri/forest.exr', cv2.IMREAD_UNCHANGED), cv2.COLOR_BGR2RGB),
    dtype=torch.float32, device='cuda'
))

# 2. Load Pipeline
pipeline = Trellis2ImageTo3DPipeline.from_pretrained("microsoft/TRELLIS.2-4B")
pipeline.cuda()

# 3. Load Image & Run
image = Image.open("assets/example_image/T.png")
mesh = pipeline.run(image)[0]
mesh.simplify(16777216) # nvdiffrast limit

# 4. Render Video
video = render_utils.make_pbr_vis_frames(render_utils.render_video(mesh, envmap=envmap))
imageio.mimsave("sample.mp4", video, fps=15)

# 5. Export to GLB
glb = o_voxel.postprocess.to_glb(
    vertices            =   mesh.vertices,
    faces               =   mesh.faces,
    attr_volume         =   mesh.attrs,
    coords              =   mesh.coords,
    attr_layout         =   mesh.layout,
    voxel_size          =   mesh.voxel_size,
    aabb                =   [[-0.5, -0.5, -0.5], [0.5, 0.5, 0.5]],
    decimation_target   =   1000000,
    texture_size        =   4096,
    remesh              =   True,
    remesh_band         =   1,
    remesh_project      =   0,
    verbose             =   True
)
glb.export("sample.glb", extension_webp=True)

Upon execution, the script generates the following files: - sample.mp4: A video visualizing the generated 3D asset with PBR materials and environmental lighting. - sample.glb: The extracted PBR-ready 3D asset in GLB format.

Note: The .glb file is exported in OPAQUE mode by default. Although the alpha channel is preserved within the texture map, it is not active initially. To enable transparency, import the asset into your 3D software and manually connect the texture's alpha channel to the material's opacity or alpha input.

Web Demo

app.py provides a simple web demo for image to 3D asset generation. you can run the demo with the following command:

python app.py

Then, you can access the demo at the address shown in the terminal.

2. PBR Texture Generation

Please refer to the example_texturing.py for an example of how to generate PBR textures for a given 3D shape. Also, you can use the app_texturing.py to run a web demo for PBR texture generation.

🏋️ Training

We provide the full training codebase, enabling users to train TRELLIS.2 from scratch or fine-tune it on custom datasets.

1. Data Preparation

Before training, raw 3D assets must be converted into the O-Voxel representation. This process includes mesh conversion, compact structured latent generation, and metadata preparation.

📂 Please refer to data_toolkit/README.md for detailed instructions on data preprocessing and dataset organization.

2. Running Training

Training is managed through the train.py script, which accepts multiple command-line arguments to configure experiments:

--config: Path to the experiment configuration file.
--output_dir: Directory for training outputs.
--load_dir: Directory to load checkpoints from (defaults to output_dir).
--ckpt: Checkpoint step to resume from (defaults to the latest).
--data_dir: Dataset path or a JSON string specifying dataset locations.
--auto_retry: Number of automatic retries upon failure.
--tryrun: Perform a dry run without actual training.
--profile: Enable training profiling.
--num_nodes: Number of nodes for distributed training.
--node_rank: Rank of the current node.
--num_gpus: Number of GPUs per node (defaults to all available GPUs).
--master_addr: Master node address for distributed training.
--master_port: Port for distributed training communication.

SC-VAE Training

To train the shape SC-VAE, run:

python train.py \
  --config configs/scvae/shape_vae_next_dc_f16c32_fp16.json \
  --output_dir results/shape_vae_next_dc_f16c32_fp16 \
  --data_dir "{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"mesh_dump\": \"datasets/ObjaverseXL_sketchfab/mesh_dumps\", \"dual_grid\": \"datasets/ObjaverseXL_sketchfab/dual_grid_256\", \"asset_stats\": \"datasets/ObjaverseXL_sketchfab/asset_stats\"}}"

This command trains the shape SC-VAE on the Objaverse-XL dataset using the shape_vae_next_dc_f16c32_fp16.json configuration. Training outputs will be saved to results/shape_vae_next_dc_f16c32_fp16.

The dataset is specified as a JSON string, where each dataset entry includes:

base: Root directory of the dataset.
mesh_dump: Directory containing preprocessed mesh dumps.
dual_grid: Directory with precomputed dual-grid representations.
asset_stats: Directory containing precomputed asset statistics.

To fine-tune the model at a higher resolution, use the shape_vae_next_dc_f16c32_fp16_ft_512.json configuration. Remember to update the finetune_ckpt field and adjust the dataset paths accordingly.

To train the texture SC-VAE, run:

python train.py \
  --config configs/scvae/tex_vae_next_dc_f16c32_fp16.json \
  --output_dir results/tex_vae_next_dc_f16c32_fp16 \
  --data_dir "{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"pbr_dump\": \"datasets/ObjaverseXL_sketchfab/pbr_dumps\", \"pbr_voxel\": \"datasets/ObjaverseXL_sketchfab/pbr_voxels_256\", \"asset_stats\": \"datasets/ObjaverseXL_sketchfab/asset_stats\"}}"

Flow Model Training

To train the sparse structure flow model, run:

python train.py \
  --config configs/gen/ss_flow_img_dit_1_3B_64_bf16.json \
  --output_dir results/ss_flow_img_dit_1_3B_64_bf16 \
  --data_dir "{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"ss_latent\": \"datasets/ObjaverseXL_sketchfab/ss_latents/ss_enc_conv3d_16l8_fp16_64\", \"render_cond\": \"datasets/ObjaverseXL_sketchfab/renders_cond\"}}"

This command trains the sparse-structure flow model on the Objaverse-XL dataset using the specified configuration file. Outputs are saved to results/ss_flow_img_dit_1_3B_64_bf16.

The dataset configuration includes:

base: Root dataset directory.
ss_latent: Directory containing precomputed sparse-structure latents.
render_cond: Directory containing conditional rendering images.

The second- and third-stage flow models for shape and texture generation can be trained using the following configurations:

Shape flow: slat_flow_img2shape_dit_1_3B_512_bf16.json
Texture flow: slat_flow_imgshape2tex_dit_1_3B_512_bf16.json

Example commands:

```sh

Shape flow model

python train.py \ --config configs/gen/slat_flow_img2shape_dit_1_3B_512_bf16.json \ --output_dir results/slat_flow_img2shape_dit_1_3B_512_bf16 \ --data_dir "{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"shape_latent\": \"datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c32_fp16_512\", \"render_cond\": \"datasets/ObjaverseXL_sketchfab/renders_cond\"}}"

Texture flow model

python train.py \ --config configs/gen/slat_flow_imgshape2tex_dit_1_3B_512_bf16.json \ --output_dir results/slat_flow_imgshape2tex_dit_1_3B_512_bf16 \ --data_dir "{\"ObjaverseXL_sketchfab\": {\"base\": \"datasets/ObjaverseXL_sketchfab\", \"shape_latent\": \"datasets/ObjaverseXL_sketchfab/shape_latents/shape_enc_next_dc_f16c3

Core symbols most depended-on inside this repo

reshape

called by 99

trellis2/modules/sparse/basic.py

replace

called by 90

trellis2/modules/sparse/basic.py

cuda

called by 90

trellis2/pipelines/base.py

float

called by 89

trellis2/modules/sparse/basic.py

cpu

called by 60

trellis2/pipelines/base.py

load

called by 46

trellis2/trainers/basic.py

sum

called by 29

trellis2/modules/sparse/basic.py

dim

called by 26

trellis2/modules/sparse/basic.py

Shape

Method 510

Function 229

Class 129

Languages

Python100%

Modules by API surface

trellis2/modules/sparse/basic.py83 symbols

trellis2/models/sc_vaes/sparse_unet_vae.py42 symbols

trellis2/utils/elastic_utils.py29 symbols

trellis2/trainers/flow_matching/mixins/image_conditioned.py27 symbols

trellis2/representations/mesh/base.py26 symbols

trellis2/trainers/basic.py24 symbols

trellis2/models/sparse_structure_vae.py22 symbols

trellis2/utils/general_utils.py17 symbols

trellis2/trainers/flow_matching/flow_matching.py15 symbols

trellis2/pipelines/samplers/flow_euler.py15 symbols

trellis2/modules/transformer/blocks.py15 symbols

trellis2/datasets/components.py15 symbols

Dependencies from manifests, versioned

easydict1×

numpy1×

plyfile1×

torch1×

tqdm1×

trimesh1×

zstandard1×

For agents

$ claude mcp add TRELLIS.2 \
  -- python -m otcore.mcp_server <graph>

⬇ download graph artifact