
Depth Anything: Unleashing the Power of Large-Scale Unlabeled Data

Lihe Yang1 · Bingyi Kang2† · Zilong Huang2 · Xiaogang Xu3,4 · Jiashi Feng2 · Hengshuang Zhao1*

1HKU    2TikTok    3CUHK    4ZJU

†project lead *corresponding author

CVPR 2024

This work presents Depth Anything, a highly practical solution for robust monocular depth estimation by training on a combination of 1.5M labeled images and 62M+ unlabeled images.

Try our latest Depth Anything V2 models: https://github.com/DepthAnything/Depth-Anything-V2

Features of Depth Anything

If you need other features, please first check the existing community support.

  • Relative depth estimation:

    Our foundation models listed here can provide relative depth estimation for any given image robustly. Please refer here for details.

  • Metric depth estimation

    We fine-tune our Depth Anything model with metric depth information from NYUv2 or KITTI. The fine-tuned models perform strongly in both in-domain and zero-shot metric depth estimation. Please refer here for details.

  • Better depth-conditioned ControlNet

    We re-train a better depth-conditioned ControlNet based on Depth Anything. It offers more precise synthesis than the previous MiDaS-based ControlNet. Please refer here for details. You can also use our new ControlNet based on Depth Anything in ControlNet WebUI or ComfyUI's ControlNet.

  • Downstream high-level scene understanding

    The Depth Anything encoder can be fine-tuned for downstream high-level perception tasks. For example, on semantic segmentation it reaches 86.2 mIoU on Cityscapes and 59.4 mIoU on ADE20K. Please refer here for details.

Performance

Here we compare our Depth Anything with the previously best MiDaS v3.1 BEiTL-512 model.

Please note that the latest MiDaS is also trained on KITTI and NYUv2, while our models are not.

| Method | Params | KITTI AbsRel | KITTI $\delta_1$ | NYUv2 AbsRel | NYUv2 $\delta_1$ | Sintel AbsRel | Sintel $\delta_1$ | DDAD AbsRel | DDAD $\delta_1$ | ETH3D AbsRel | ETH3D $\delta_1$ | DIODE AbsRel | DIODE $\delta_1$ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MiDaS | 345.0M | 0.127 | 0.850 | 0.048 | 0.980 | 0.587 | 0.699 | 0.251 | 0.766 | 0.139 | 0.867 | 0.075 | 0.942 |
| Ours-S | 24.8M | 0.080 | 0.936 | 0.053 | 0.972 | 0.464 | 0.739 | 0.247 | 0.768 | 0.127 | 0.885 | 0.076 | 0.939 |
| Ours-B | 97.5M | 0.080 | 0.939 | 0.046 | 0.979 | 0.432 | 0.756 | 0.232 | 0.786 | 0.126 | 0.884 | 0.069 | 0.946 |
| Ours-L | 335.3M | 0.076 | 0.947 | 0.043 | 0.981 | 0.458 | 0.760 | 0.230 | 0.789 | 0.127 | 0.882 | 0.066 | 0.952 |

We highlight the best and second-best results in bold and italic, respectively (lower AbsRel and higher $\delta_1$ are better).
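For readers unfamiliar with the two metrics, the sketch below computes them from their standard definitions: AbsRel is the mean of |d - d*| / d*, and $\delta_1$ is the fraction of pixels with max(d / d*, d* / d) < 1.25, where d is the prediction and d* the ground truth. The function name `depth_metrics`, the `gt > 0` validity mask, and the toy values are illustrative assumptions, not code from this repository. Relative-depth predictions are usually aligned to the ground truth (scale and shift) before this step, and that alignment is omitted here.

```python
import numpy as np

def depth_metrics(pred, gt):
    """Return (AbsRel, delta_1) over pixels with valid ground truth.

    Assumes gt > 0 marks valid pixels and that pred is strictly
    positive wherever gt is valid.
    """
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    valid = gt > 0
    pred, gt = pred[valid], gt[valid]
    # Mean absolute error relative to the true depth (lower is better).
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    # Fraction of pixels within a factor of 1.25 of the truth (higher is better).
    ratio = np.maximum(pred / gt, gt / pred)
    delta_1 = np.mean(ratio < 1.25)
    return float(abs_rel), float(delta_1)

# Hypothetical toy depth maps; 0 in gt marks a missing measurement.
gt = np.array([[1.0, 2.0], [4.0, 0.0]])
pred = np.array([[1.1, 2.0], [6.0, 3.0]])
abs_rel, delta_1 = depth_metrics(pred, gt)
print(abs_rel, delta_1)
```

On the toy input, three pixels are valid, with relative errors 0.1, 0.0 and 0.5, and two of the three fall within the 1.25 threshold.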

Pre-trained models

We provide three models of varying scales for robust relative depth estimation:

| Model | Params | V100 (ms) | A100 (ms) | RTX4090 with TensorRT (ms) |
|---|---|---|---|---|
| Depth-Anything-Small | 24.8M | 12 | 8 | 3 |
| Depth-Anything-Base | 97.5M | 13 | 9 | 6 |
| Depth-Anything-Large | 335.3M | 20 | 13 | 12 |

Note that the V100 and A100 inference times (without TensorRT) exclude the pre-processing and post-processing stages, whereas the RTX4090 time in the last column (with TensorRT) includes both stages (please refer to Depth-Anything-TensorRT). The two sets of numbers are therefore not directly comparable.
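As a rough illustration of what "excluding pre-processing and post-processing" means in practice, the sketch below times only the forward call on an already prepared input. It is not the benchmarking script behind the table: the helper name `mean_forward_ms` and its parameters are hypothetical, and a trivial stand-in replaces the model so that the example runs without a GPU.

```python
import time

def mean_forward_ms(forward, batch, warmup=3, iters=10, sync=None):
    """Average wall-clock time of forward(batch) in milliseconds.

    `sync` is an optional callable that blocks until queued device work
    has finished; on a CUDA device one would pass torch.cuda.synchronize.
    """
    for _ in range(warmup):          # warm-up runs are not timed
        forward(batch)
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        forward(batch)
    if sync is not None:
        sync()                       # wait for the device before stopping the clock
    return (time.perf_counter() - start) * 1000.0 / iters

# Stand-in "model": the input is prepared once, outside the timed region.
prepared_input = list(range(1000))
elapsed_ms = mean_forward_ms(lambda x: sum(x), prepared_input)
print(f"{elapsed_ms:.4f} ms per call")
```

Image loading, resizing, normalization, and any colorization of the output would sit outside `forward`, so they do not contribute to the measured time.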

You can easily load our pre-trained models by:

from depth_anything.dpt import DepthAnything

encoder = 'vits' # can also be 'vitb' or 'vitl'
depth_anything = DepthAnything.from_pretrained('LiheYoung/depth_anything_{:}14'.format(encoder))

Depth Anything is also supported in transformers. You can use it for depth prediction within 3 lines of code (credit to @niels).

No network connection, so you cannot load these models? Load the weights from a local checkpoint instead (the snippet expects them under ./checkpoints):

import torch

from depth_anything.dpt import DepthAnything

model_configs = {
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]}
}

encoder = 'vitl' # or 'vitb', 'vits'
depth_anything = DepthAnything(model_configs[encoder])
depth_anything.load_state_dict(torch.load(f'./checkpoints/depth_anything_{encoder}14.pth'))

Note that when you load the model locally in this way, you do not need to install the huggingface_hub package. In that case, feel free to delete the huggingface_hub import and the PyTorchModelHubMixin reference from the model definition.
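The two loading routes above share one naming scheme, which the small sketch below makes explicit. The helper `resolve_model` is hypothetical (it is not part of the repository), and the default checkpoint directory simply mirrors the ./checkpoints path used in the snippet above.

```python
import os

# Same per-encoder settings as in the local-loading snippet above.
MODEL_CONFIGS = {
    'vitl': {'encoder': 'vitl', 'features': 256, 'out_channels': [256, 512, 1024, 1024]},
    'vitb': {'encoder': 'vitb', 'features': 128, 'out_channels': [96, 192, 384, 768]},
    'vits': {'encoder': 'vits', 'features': 64, 'out_channels': [48, 96, 192, 384]},
}

def resolve_model(encoder, checkpoint_dir='./checkpoints'):
    """Return (config, hub_id, local_path) for one encoder name."""
    if encoder not in MODEL_CONFIGS:
        raise ValueError(f"unknown encoder {encoder!r}; expected one of {sorted(MODEL_CONFIGS)}")
    hub_id = 'LiheYoung/depth_anything_{:}14'.format(encoder)
    local_path = os.path.join(checkpoint_dir, f'depth_anything_{encoder}14.pth')
    return MODEL_CONFIGS[encoder], hub_id, local_path

config, hub_id, local_path = resolve_model('vitb')
print(hub_id)
print(local_path)
```

With the result in hand, `DepthAnything.from_pretrained(hub_id)` covers the online case, while `DepthAnything(config)` followed by `load_state_dict(torch.load(local_path))` covers the offline one. Failing early on an unknown encoder name avoids a less obvious error at download or file-open time.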

Usage

Installation

git clone https://github.com/LiheYoung/Depth-Anything
cd Depth-Anything
pip install -r requirements.txt

Running

python run.py --encoder <vits | vitb | vitl> --img-path <img-directory | single-img | txt-file> --outdir <outdir> [--pred-only] [--grayscale]

Arguments:

  • --img-path: you can either 1) point it to a directory containing all the images of interest, 2) point it to a single image, or 3) point it to a text file listing all image paths.

  • --pred-only: saves the predicted depth map only. Without it, the image and its depth map are visualized side by side by default.

  • --grayscale: saves the depth map in grayscale. Without it, a color palette is applied to the depth map by default.
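The three accepted forms of --img-path can be resolved into a flat list of files along the following lines. This is a minimal sketch of the idea rather than the exact logic in run.py: the function name `collect_image_paths`, the ".txt" suffix check, and the skipping of hidden files are assumptions.

```python
import os
import tempfile

def collect_image_paths(img_path):
    """Expand a directory, a single image, or a .txt listing into file paths."""
    if os.path.isfile(img_path):
        if img_path.endswith('.txt'):
            # Text file: one image path per line, blank lines ignored.
            with open(img_path) as f:
                return [line.strip() for line in f if line.strip()]
        return [img_path]            # single image
    # Directory: every non-hidden entry, in sorted order.
    names = sorted(n for n in os.listdir(img_path) if not n.startswith('.'))
    return [os.path.join(img_path, n) for n in names]

# Small self-contained demonstration using temporary files.
with tempfile.TemporaryDirectory() as tmp:
    img_dir = os.path.join(tmp, 'images')
    os.mkdir(img_dir)
    for name in ('b.png', 'a.png'):
        open(os.path.join(img_dir, name), 'wb').close()
    from_dir = collect_image_paths(img_dir)
    list_file = os.path.join(tmp, 'list.txt')
    with open(list_file, 'w') as f:
        f.write('\n'.join(from_dir) + '\n')
    from_txt = collect_image_paths(list_file)
    from_single = collect_image_paths(from_dir[0])
```

All three calls return a list, so the rest of a script can iterate over the images without caring which form the user passed.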

For example:

python run.py --encoder vitl --img-path assets/examples --outdir depth_vis
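Saving a depth map as an image, as the --grayscale option does, requires mapping the raw floating-point prediction to 8-bit values first. The sketch below uses min-max normalization, which is an assumption about the general approach and not a copy of run.py; the name `depth_to_uint8` is hypothetical.

```python
import numpy as np

def depth_to_uint8(depth):
    """Min-max normalize a depth map to the range 0..255 as uint8."""
    depth = np.asarray(depth, dtype=np.float32)
    span = float(depth.max() - depth.min())
    if span == 0.0:
        # A constant map has no contrast; return all zeros instead of dividing by 0.
        return np.zeros(depth.shape, dtype=np.uint8)
    scaled = (depth - depth.min()) / span * 255.0
    return scaled.astype(np.uint8)   # truncates the fractional part

gray = depth_to_uint8([[0.5, 1.0], [1.5, 2.5]])
print(gray)
```

The resulting single-channel array can be written out directly as a grayscale image, or passed to a colormap function such as OpenCV's `cv2.applyColorMap` to obtain a colored visualization.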

If you want to use Depth Anything on videos:

python run_video.py --encoder vitl --video-path assets/examples_video --outdir video_depth_vis

Gradio demo

To use our gradio demo locally:

python app.py

You can also try our online demo.

Import Depth Anything to your project

If you want to use Depth Anything in your own project, you can simply follow run.py to load our models and define data pre-processing.

Code snippet (note the difference between our data pre-processing and that of MiDaS)

from depth_anything.dpt import DepthAnything
from depth_anything.util.transform import Resize, NormalizeImage, PrepareForNet

import cv2
import torch
from torchvision.transforms import Compose

encoder = 'vits' # can also be 'vitb' or 'vitl'
depth_anything = DepthAnything.from_pretrained('LiheYoung/depth_anything_{:}14'.format(encoder)).eval()

transform = Compose([
    Resize(
        width=518,
        height=518,
        resize_target=False,
        keep_aspect_ratio=True,
        ensure_multiple_of=14,
        resize_method='lower_bound',
        image_interpolation_method=cv2.INTER_CUBIC,
    ),
    NormalizeImage(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    PrepareForNet(),
])

image = cv2.cvtColor(cv2.imread('your image path'), cv2.COLOR_BGR2RGB) / 255.0
image = transform({'image': image})['image']
image = torch.from_numpy(image).unsqueeze(0)

# depth shape: 1xHxW
depth = depth_anything(image)
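The Resize settings in the snippet above (518 target, keep_aspect_ratio=True, ensure_multiple_of=14, resize_method='lower_bound') determine the spatial size that reaches the network. The sketch below estimates that size under the assumption that the transform behaves like a MiDaS-style resize: the shorter side is scaled up or down to at least the target, and each side is snapped to a multiple of 14. The function `resized_shape` is hypothetical and only approximates the real transform.

```python
import math

def resized_shape(width, height, target=518, multiple=14):
    """Estimate (new_width, new_height) after a 'lower_bound' resize."""
    # lower_bound: use the larger scale so that both sides are >= target.
    scale = max(target / width, target / height)

    def snap(x):
        # Round to the nearest multiple, but never fall below the target.
        y = round(x / multiple) * multiple
        if y < target:
            y = math.ceil(x / multiple) * multiple
        return int(y)

    return snap(scale * width), snap(scale * height)

print(resized_shape(640, 480))
print(resized_shape(518, 518))
```

For a 640x480 input this gives 686x518, so the aspect ratio is roughly preserved and both sides are divisible by the patch size of 14.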

Do not want to define image pre-processing or download model definition files?

Easily use Depth Anything through transformers within 3 lines of code! Please refer to these instructions (credit to @niels).

Note: If you encounter KeyError: 'depth_anything', please install the latest transformers from source:

pip install git+https://github.com/huggingface/transformers.git

A brief demo:

from transformers import pipeline
from PIL import Image

image = Image.open('Your-image-path')
pipe = pipeline(task="depth-estimation", model="LiheYoung/depth-anything-small-hf")
depth = pipe(image)["depth"]

Community Support

We sincerely appreciate all the extensions built on our Depth Anything from the community. Thank you a lot!

Here we list the extensions we have found:

  • Depth Anything TensorRT:
      • https://github.com/spacewalk01/depth-anything-tensorrt
      • https://github.com/thinvy/DepthAnythingTensorrtDeploy
      • https://github.com/daniel89710/trt-depth-anything
  • Depth Anything ONNX: https://github.com/fabio-sim/Depth-Anything-ONNX
  • Depth Anything in Transformers.js (3D visualization): https://huggingface.co/spaces/Xenova/depth-anything-web
  • Depth Anything for video (online demo): https://huggingface.co/spaces/JohanDL/Depth-Anything-Video
  • Depth Anything in ControlNet WebUI: https://github.com/Mikubill/sd-webui-controlnet
  • Depth Anything in ComfyUI's ControlNet: https://github.com/Fannovel16/comfyui_controlnet_aux
  • Depth Anything in X-AnyLabeling: https://github.com/CVHub520/X-AnyLabeling
  • Depth Anything in OpenXLab: https://openxlab.org.cn/apps/detail/yyfan/depth_anything
  • Depth Anything in OpenVINO: https://github.com/openvinotoolkit/openvino_notebooks/tree/main/notebooks/280-depth-anything
  • Depth Anything ROS:
      • https://github.com/scepter914/DepthAnything-ROS
      • https://github.com/polatztrk/depth_anything_ros
  • Depth Anything Android:
      • https://github.com/FeiGeChuanShu/ncnn-android-depth_anything
      • https://github.com/shubham0204/Depth-Anything-Android
  • Depth Anything in TouchDesigner: https://github.com/olegchomp/TDDepthAnything
  • LearnOpenCV research article on Depth Anything: https://learnopencv.com/depth-anything
  • Learn more about the DPT architecture we used: https://github.com/heyoeyo/muggled_dpt
  • Depth Anything in NVIDIA Jetson Orin: https://github.com/ZhuYaoHui1998/jetson-examples/blob/main/reComputer/scripts/depth-anything

If you have a project that supports or improves Depth Anything (e.g., by making it faster), please feel free to open an issue. We will add it here.

Acknowledgement

We would like to express our deepest gratitude to AK(@_akhaliq) and the awesome Hugging

Dependencies from manifests, versioned

  • black 22.6.0
  • flake8 5.0.4
  • gradio 4.14.0
  • pylint 2.15.0
  • torch 2.0.0
  • torchmetrics 0.10.3
  • torchvision 0.15.0
  • xformers 0.0.18
