github.com/ByteDance-Seed/Depth-Anything-3 @main sqlite

815 symbols 2,602 edges 98 files 489 documented · 60%

README

Depth Anything 3: Recovering the Visual Space from Any Views

Haotong Lin* · Sili Chen* · Jun Hao Liew* · Donny Y. Chen* · Zhenyu Li · Guang Shi · Jiashi Feng

Bingyi Kang*†

†project lead *Equal Contribution

This work presents Depth Anything 3 (DA3), a model that predicts spatially consistent geometry from arbitrary visual inputs, with or without known camera poses. In pursuit of minimal modeling, DA3 yields two key insights:
- 💎 A single plain transformer (e.g., vanilla DINO encoder) is sufficient as a backbone without architectural specialization,
- ✨ A singular depth-ray representation obviates the need for complex multi-task learning.

🏆 DA3 significantly outperforms DA2 for monocular depth estimation, and VGGT for multi-view depth estimation and pose estimation. All models are trained exclusively on public academic datasets.

📰 News

11-12-2025: 🚀 New models and DA3-Streaming released! Handle ultra-long video sequence inference with less than 12GB GPU memory via sliding-window streaming inference. Special thanks to Kai Deng for his contribution to DA3-Streaming!
08-12-2025: 📊 Benchmark evaluation pipeline released! Evaluate pose estimation & 3D reconstruction on 5 datasets.
30-11-2025: Add use_ray_pose and ref_view_strategy (reference view selection for multi-view inputs).
25-11-2025: Add Awesome DA3 Projects, a community-driven section featuring DA3-based applications.
14-11-2025: Paper, project page, code and models are all released.

✨ Highlights

🏆 Model Zoo

We release three series of models, each tailored for specific use cases in visual geometry.

🌟 DA3 Main Series (DA3-Giant, DA3-Large, DA3-Base, DA3-Small): These are our flagship foundation models, trained with a unified depth-ray representation. By varying the input configuration, a single model can perform a wide range of tasks:
  🌊 Monocular Depth Estimation: Predicts a depth map from a single RGB image.
  🌊 Multi-View Depth Estimation: Generates consistent depth maps from multiple images for high-quality fusion.
  🎯 Pose-Conditioned Depth Estimation: Achieves superior depth consistency when camera poses are provided as input.
  📷 Camera Pose Estimation: Estimates camera extrinsics and intrinsics from one or more images.
  🟡 3D Gaussian Estimation: Directly predicts 3D Gaussians, enabling high-fidelity novel view synthesis.
📐 DA3 Metric Series (DA3Metric-Large): A specialized model fine-tuned for metric depth estimation in monocular settings, ideal for applications requiring real-world scale.
🔍 DA3 Monocular Series (DA3Mono-Large): A dedicated model for high-quality relative monocular depth estimation. Unlike disparity-based models (e.g., Depth Anything 2), it directly predicts depth, resulting in superior geometric accuracy.

🔗 Leveraging these available models, we developed a nested series (DA3Nested-Giant-Large). This series combines an any-view giant model with a metric model to reconstruct visual geometry at a real-world metric scale.

🛠️ Codebase Features

Our repository is designed to be a powerful and user-friendly toolkit for both practical application and future research.
- 🎨 Interactive Web UI & Gallery: Visualize model outputs and compare results with an easy-to-use Gradio-based web interface.
- ⚡ Flexible Command-Line Interface (CLI): Powerful and scriptable CLI for batch processing and integration into custom workflows.
- 💾 Multiple Export Formats: Save your results in various formats, including glb, npz, depth images, ply, 3DGS videos, etc., to seamlessly connect with other tools.
- 🔧 Extensible and Modular Design: The codebase is structured to facilitate future research and the integration of new models or functionalities.

🚀 Quick Start

📦 Installation

pip install xformers torch\>=2 torchvision
pip install -e . # Basic
pip install --no-build-isolation git+https://github.com/nerfstudio-project/gsplat.git@0b4dddf04cb687367602c01196913cde6a743d70 # for gaussian head
pip install -e ".[app]" # Gradio, python>=3.10
pip install -e ".[all]" # ALL

For detailed model information, please refer to the Model Cards section below.

💻 Basic Usage

import glob, os, torch
from depth_anything_3.api import DepthAnything3
device = torch.device("cuda")
model = DepthAnything3.from_pretrained("depth-anything/DA3NESTED-GIANT-LARGE")
model = model.to(device=device)
example_path = "assets/examples/SOH"
images = sorted(glob.glob(os.path.join(example_path, "*.png")))
prediction = model.inference(
    images,
)
# prediction.processed_images : [N, H, W, 3] uint8   array
print(prediction.processed_images.shape)
# prediction.depth            : [N, H, W]    float32 array
print(prediction.depth.shape)  
# prediction.conf             : [N, H, W]    float32 array
print(prediction.conf.shape)  
# prediction.extrinsics       : [N, 3, 4]    float32 array # opencv w2c or colmap format
print(prediction.extrinsics.shape)
# prediction.intrinsics       : [N, 3, 3]    float32 array
print(prediction.intrinsics.shape)
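
The depth, intrinsics, and extrinsics arrays above are enough to lift each view into a shared 3D point cloud. The following is a minimal sketch of that step using the standard pinhole model, not code from this repository. `unproject_depth` is a hypothetical helper, and it assumes that depth is measured along the camera z-axis, that pixel centers sit at integer coordinates, and that the intrinsics match the resolution of `prediction.processed_images`.

```python
import numpy as np


def unproject_depth(depth, K, w2c):
    """Lift one depth map to world-space points (hypothetical helper).

    depth: [H, W] depth along the camera z-axis (assumed)
    K:     [3, 3] pinhole intrinsics
    w2c:   [3, 4] world-to-camera extrinsics [R | t] (OpenCV convention)
    Returns an [H, W, 3] array of points in world coordinates.
    """
    H, W = depth.shape
    # Homogeneous pixel grid (u, v, 1); integer pixel centers are an assumption.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).astype(np.float64)
    # Back-project to camera space: x_cam = depth * K^-1 @ pix
    rays = pix @ np.linalg.inv(K).T
    cam_pts = rays * depth[..., None]
    # Invert x_cam = R @ x_world + t, i.e. x_world = R^T @ (x_cam - t)
    R, t = w2c[:, :3], w2c[:, 3]
    return (cam_pts - t) @ R


# Synthetic stand-ins for one view of prediction.depth / intrinsics / extrinsics.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 1.0], [0.0, 0.0, 1.0]])
depth = np.full((3, 5), 2.0)
w2c = np.hstack([np.eye(3), np.zeros((3, 1))])
points = unproject_depth(depth, K, w2c)
print(points.shape)
```

With a real prediction, the same helper could be applied per view, for example by zipping `prediction.depth`, `prediction.intrinsics`, and `prediction.extrinsics`, and the results masked with `prediction.conf` before fusion.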


export MODEL_DIR=depth-anything/DA3NESTED-GIANT-LARGE
# This can be a Hugging Face repository or a local directory
# If you encounter network issues, consider using the following mirror: export HF_ENDPOINT=https://hf-mirror.com
# Alternatively, you can download the model directly from Hugging Face
export GALLERY_DIR=workspace/gallery
mkdir -p $GALLERY_DIR

# CLI auto mode with backend reuse
da3 backend --model-dir ${MODEL_DIR} --gallery-dir ${GALLERY_DIR} # Cache model to gpu
da3 auto assets/examples/SOH \
    --export-format glb \
    --export-dir ${GALLERY_DIR}/TEST_BACKEND/SOH \
    --use-backend

# CLI video processing with feature visualization
da3 video assets/examples/robot_unitree.mp4 \
    --fps 15 \
    --use-backend \
    --export-dir ${GALLERY_DIR}/TEST_BACKEND/robo \
    --export-format glb-feat_vis \
    --feat-vis-fps 15 \
    --process-res-method lower_bound_resize \
    --export-feat "11,21,31"

# CLI auto mode without backend reuse
da3 auto assets/examples/SOH \
    --export-format glb \
    --export-dir ${GALLERY_DIR}/TEST_CLI/SOH \
    --model-dir ${MODEL_DIR}
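
For batch jobs, the non-backend `da3 auto` call above can be driven from Python. This is a rough sketch that only reuses the flags shown in the examples (`--export-format`, `--export-dir`, `--model-dir`); the helper names `build_da3_auto_cmd` and `run_batch` are hypothetical, and the per-scene export layout is an assumption modeled on the paths above.

```python
import shlex
import subprocess
from pathlib import Path


def build_da3_auto_cmd(scene_dir, export_root, model_dir, export_format="glb"):
    """Build the argument list for one `da3 auto` call (hypothetical helper)."""
    scene_dir = Path(scene_dir)
    return [
        "da3", "auto", scene_dir.as_posix(),
        "--export-format", export_format,
        # One export sub-directory per scene, named after the scene folder.
        "--export-dir", (Path(export_root) / scene_dir.name).as_posix(),
        "--model-dir", model_dir,
    ]


def run_batch(scene_dirs, export_root, model_dir, dry_run=True):
    """Print each command (dry_run=True) or execute it with subprocess."""
    cmds = [build_da3_auto_cmd(s, export_root, model_dir) for s in scene_dirs]
    for cmd in cmds:
        if dry_run:
            print(shlex.join(cmd))
        else:
            subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
    return cmds


if __name__ == "__main__":
    run_batch(
        ["assets/examples/SOH"],
        "workspace/gallery/TEST_CLI",
        "depth-anything/DA3NESTED-GIANT-LARGE",
    )
```

The dry run is the default so the commands can be inspected before any model is loaded. Note that without the backend, each call loads the model again.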

The model architecture is defined in DepthAnything3Net and specified by a YAML config file located in src/depth_anything_3/configs. Input and output processing is handled by DepthAnything3. To customize the model architecture, create a new config file (e.g., path/to/new/config) such as:

__object__:
  path: depth_anything_3.model.da3
  name: DepthAnything3Net
  args: as_params

net:
  __object__:
    path: depth_anything_3.model.dinov2.dinov2
    name: DinoV2
    args: as_params

  name: vitb
  out_layers: [5, 7, 9, 11]
  alt_start: 4
  qknorm_start: 4
  rope_start: 4
  cat_token: True

head:
  __object__:
    path: depth_anything_3.model.dualdpt
    name: DualDPT
    args: as_params

  dim_in: &head_dim_in 1536
  output_dim: 2
  features: &head_features 128
  out_channels: &head_out_channels [96, 192, 384, 768]

Then, the model can be created with the following code snippet.

from depth_anything_3.cfg import create_object, load_config

# Load the YAML config and instantiate the object it describes
model = create_object(load_config("path/to/new/config"))
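
To make the `__object__` convention in the config above more concrete, here is an illustrative sketch of how such a nested config could be turned into objects. It is not the repository's `create_object`; `build_from_config` is a hypothetical stand-in, and it assumes that `args: as_params` means the sibling keys are passed as keyword arguments. Standard-library classes stand in for the model classes so the example runs on its own.

```python
import importlib


def build_from_config(cfg):
    """Recursively instantiate dicts that carry an `__object__` entry.

    Hypothetical illustration only; assumes `args: as_params` means
    "pass the sibling keys as keyword arguments to the class".
    """
    if isinstance(cfg, list):
        return [build_from_config(v) for v in cfg]
    if not isinstance(cfg, dict):
        return cfg
    # Build children first so nested objects exist before their parent.
    built = {k: build_from_config(v) for k, v in cfg.items() if k != "__object__"}
    spec = cfg.get("__object__")
    if spec is None:
        return built
    if spec.get("args") != "as_params":
        raise ValueError(f"unsupported args mode: {spec.get('args')!r}")
    cls = getattr(importlib.import_module(spec["path"]), spec["name"])
    return cls(**built)


# Stand-in config: same shape as the YAML above, but with stdlib classes.
demo_cfg = {
    "__object__": {"path": "types", "name": "SimpleNamespace", "args": "as_params"},
    "net": {
        "__object__": {"path": "datetime", "name": "timedelta", "args": "as_params"},
        "hours": 1,
        "minutes": 30,
    },
    "out_layers": [5, 7, 9, 11],
}
obj = build_from_config(demo_cfg)
print(type(obj).__name__, type(obj.net).__name__, obj.out_layers)
```

The point of the pattern is that swapping a backbone or head only requires changing `path` and `name` in the config, with no change to the construction code.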

📚 Useful Documentation

🗂️ Model Cards

In general, DA3-LARGE achieves results comparable to VGGT.

The Nested series uses an Any-view model to estimate pose and depth, and a monocular metric depth estimator for scaling.

⚠️ Models with the -1.1 suffix are retrained after fixing a training bug; prefer these refreshed checkpoints. The original DA3NESTED-GIANT-LARGE, DA3-GIANT, and DA3-LARGE remain available but are deprecated. You can expect much better performance on street scenes with the -1.1 models.

🗃️ Model Name	📏 Params	📊 Rel. Depth	📷 Pose Est.	🧭 Pose Cond.	🎨 GS	📐 Met. Depth	☁️ Sky Seg	📄 License
Nested
DA3NESTED-GIANT-LARGE-1.1	1.40B	✅	✅	✅	✅	✅	✅	CC BY-NC 4.0
DA3NESTED-GIANT-LARGE	1.40B	✅	✅	✅	✅	✅	✅	CC BY-NC 4.0
Any-view Model
DA3-GIANT-1.1	1.15B	✅	✅	✅	✅			CC BY-NC 4.0
DA3-GIANT	1.15B	✅	✅	✅	✅			CC BY-NC 4.0
DA3-LARGE-1.1	0.35B	✅	✅	✅				CC BY-NC 4.0
DA3-LARGE	0.35B	✅	✅	✅				CC BY-NC 4.0
DA3-BASE	0.12B	✅	✅	✅				Apache 2.0
DA3-SMALL	0.08B	✅	✅	✅				Apache 2.0

Monocular Metric Depth
DA3METRIC-LARGE	0.35B	✅				✅	✅	Apache 2.0

Monocular Depth
DA3MONO-LARGE	0.35B	✅					✅	Apache 2.0
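
When choosing a checkpoint programmatically, the table above can be encoded as data and filtered by capability and license. The sketch below is one possible way to do that; the capability keys (`rel`, `pose`, `pose_cond`, `gs`, `metric`, `sky`) and the `pick_models` helper are made-up names for illustration, the values are transcribed from the table, and the deprecated non-1.1 checkpoints are left out on purpose.

```python
# Capabilities transcribed from the model table; deprecated checkpoints omitted.
MODELS = [
    {"name": "DA3NESTED-GIANT-LARGE-1.1", "params_b": 1.40, "license": "CC BY-NC 4.0",
     "caps": {"rel", "pose", "pose_cond", "gs", "metric", "sky"}},
    {"name": "DA3-GIANT-1.1", "params_b": 1.15, "license": "CC BY-NC 4.0",
     "caps": {"rel", "pose", "pose_cond", "gs"}},
    {"name": "DA3-LARGE-1.1", "params_b": 0.35, "license": "CC BY-NC 4.0",
     "caps": {"rel", "pose", "pose_cond"}},
    {"name": "DA3-BASE", "params_b": 0.12, "license": "Apache 2.0",
     "caps": {"rel", "pose", "pose_cond"}},
    {"name": "DA3-SMALL", "params_b": 0.08, "license": "Apache 2.0",
     "caps": {"rel", "pose", "pose_cond"}},
    {"name": "DA3METRIC-LARGE", "params_b": 0.35, "license": "Apache 2.0",
     "caps": {"rel", "metric", "sky"}},
    {"name": "DA3MONO-LARGE", "params_b": 0.35, "license": "Apache 2.0",
     "caps": {"rel", "sky"}},
]


def pick_models(required, apache_only=False):
    """Return names of models covering all `required` capabilities, smallest first."""
    required = set(required)
    hits = [
        m for m in MODELS
        if required <= m["caps"] and (not apache_only or m["license"] == "Apache 2.0")
    ]
    return [m["name"] for m in sorted(hits, key=lambda m: m["params_b"])]


print(pick_models({"pose"}, apache_only=True))
print(pick_models({"gs"}))
```

Sorting by parameter count puts the cheapest suitable model first; whether that is the right trade-off depends on the accuracy the application needs.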

❓ FAQ

Monocular Metric Depth: To obtain metric depth in meters from DA3METRIC-LARGE, use `metric_d

Core symbols most depended-on inside this repo

get

called by 44

src/depth_anything_3/utils/registry.py

write_next_bytes

called by 19

src/depth_anything_3/utils/read_write_model.py

info

called by 16

src/depth_anything_3/utils/logger.py

read_next_bytes

called by 13

src/depth_anything_3/utils/read_write_model.py

align_poses_umeyama

called by 13

src/depth_anything_3/utils/pose_align.py

_make_fusion_block

called by 12

src/depth_anything_3/model/dpt.py

log

called by 11

src/depth_anything_3/utils/logger.py

cleanup_cuda_memory

called by 10

src/depth_anything_3/utils/memory.py

Shape

Method 366

Function 365

Class 70

Route 14

Languages

Python 100%

Modules by API surface

src/depth_anything_3/services/backend.py · 48 symbols

da3_streaming/loop_utils/sim3utils.py · 35 symbols

src/depth_anything_3/utils/geometry.py · 25 symbols

src/depth_anything_3/utils/io/input_processor.py · 24 symbols

src/depth_anything_3/utils/read_write_model.py · 22 symbols

src/depth_anything_3/bench/datasets/dtu.py · 20 symbols

src/depth_anything_3/bench/evaluator.py · 19 symbols

src/depth_anything_3/model/dinov2/vision_transformer.py · 18 symbols

src/depth_anything_3/utils/camera_trj_helpers.py · 17 symbols

src/depth_anything_3/bench/print_metrics.py · 17 symbols

da3_streaming/da3_streaming.py · 17 symbols

src/depth_anything_3/model/dpt.py · 16 symbols

Dependencies from manifests, versioned

e3nn · 1×

einops · 1×

evo · 1×

fastapi · 1×

huggingface_hub · 1×

imageio · 1×

moviepy 1.0.3 · 1×

omegaconf · 1×

open3d · 1×

opencv-python · 1×

pillow · 1×

pillow_heif · 1×

For agents

$ claude mcp add Depth-Anything-3 \
  -- python -m otcore.mcp_server <graph>
