
MapAnything: Universal Feed-Forward Metric 3D Reconstruction


Nikhil Keetha1,2    Norman Müller1    Johannes Schönberger1    Lorenzo Porzi1    Yuchen Zhang2

Tobias Fischer1    Arno Knapitsch1    Duncan Zauss1    Ethan Weber1    Nelson Antunes1

Jonathon Luiten1    Manuel Lopez-Antequera1    Samuel Rota Bulò1    Christian Richardt1

Deva Ramanan2    Sebastian Scherer2    Peter Kontschieder1

1 Meta    2 Carnegie Mellon University

Overview

MapAnything is an open-source research framework for universal metric 3D reconstruction. At its core is a simple, end-to-end trained transformer model that directly regresses the factored metric 3D geometry of a scene from various types of inputs (images, calibration, poses, or depth). A single feed-forward model supports over 12 different 3D reconstruction tasks, including multi-image SfM, multi-view stereo, monocular metric depth estimation, registration, depth completion, and more.

The framework provides the complete stack, covering data processing, training, inference, and profiling, with a modular design that lets different 3D reconstruction models (VGGT, DUSt3R, MASt3R, MUSt3R, Pi3-X, and more) be used interchangeably through a unified interface.


Quick Start

Installation

git clone https://github.com/facebookresearch/map-anything.git
cd map-anything

# Create and activate conda environment
conda create -n mapanything python=3.12 -y
conda activate mapanything

# Optional: Install torch, torchvision & torchaudio specific to your system
# Install MapAnything
pip install -e .

# For all optional dependencies
# This includes external model support (VGGT, VGGT-Omega, DUSt3R, MASt3R, MUSt3R, Pi3-X, DA3, etc.)
# See "Running External Models" section for more details
# See pyproject.toml for more details on installed packages
pip install -e ".[all]"
pre-commit install

Note that we don't pin a specific version of PyTorch or CUDA in our requirements. Please feel free to install PyTorch based on your specific system.

Image-Only Inference

For metric 3D reconstruction from images without additional geometric inputs:

# Optional config for better memory efficiency
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Required imports
import torch
from mapanything.models import MapAnything
from mapanything.utils.image import load_images

# Get inference device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Init model - requires internet access or a pre-populated Hugging Face Hub cache
# For Apache 2.0 license model, use "facebook/map-anything-apache"
model = MapAnything.from_pretrained("facebook/map-anything").to(device)

# Load and preprocess images from a folder or list of paths
images = "path/to/your/images/"  # or ["path/to/img1.jpg", "path/to/img2.jpg", ...]
views = load_images(images)

# Run inference
predictions = model.infer(
    views,                            # Input views
    memory_efficient_inference=True,  # Trades some speed for support of more views (up to 2000 views on 140 GB of GPU memory); the slowdown is negligible, see the profiling section
    minibatch_size=None,              # Minibatch size for memory-efficient inference (use 1 for the smallest GPU memory footprint). The default (None) computes it dynamically from available GPU memory.
    use_amp=True,                     # Use mixed precision inference (recommended)
    amp_dtype="bf16",                 # bf16 inference (recommended; falls back to fp16 if bf16 not supported)
    apply_mask=True,                  # Apply masking to dense geometry outputs
    mask_edges=True,                  # Remove edge artifacts by using normals and depth
    apply_confidence_mask=False,      # Filter low-confidence regions
    confidence_percentile=10,         # Remove bottom 10 percentile confidence pixels
    use_multiview_confidence=False,   # Use multi-view depth-consistency confidence instead of the learned confidence
)

# Access results for each view - Complete list of metric outputs
for i, pred in enumerate(predictions):
    # Geometry outputs
    pts3d = pred["pts3d"]                     # 3D points in world coordinates (B, H, W, 3)
    pts3d_cam = pred["pts3d_cam"]             # 3D points in camera coordinates (B, H, W, 3)
    depth_z = pred["depth_z"]                 # Z-depth in camera frame (B, H, W, 1)
    depth_along_ray = pred["depth_along_ray"] # Depth along ray in camera frame (B, H, W, 1)

    # Camera outputs
    ray_directions = pred["ray_directions"]   # Ray directions in camera frame (B, H, W, 3)
    intrinsics = pred["intrinsics"]           # Recovered pinhole camera intrinsics (B, 3, 3)
    camera_poses = pred["camera_poses"]       # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world poses in world frame (B, 4, 4)
    cam_trans = pred["cam_trans"]             # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world translation in world frame (B, 3)
    cam_quats = pred["cam_quats"]             # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world quaternion in world frame (B, 4)

    # Quality and masking
    confidence = pred["conf"]                 # Per-pixel confidence scores (B, H, W)
    mask = pred["mask"]                       # Combined validity mask (B, H, W, 1)
    non_ambiguous_mask = pred["non_ambiguous_mask"]                # Non-ambiguous regions (B, H, W)
    non_ambiguous_mask_logits = pred["non_ambiguous_mask_logits"]  # Mask logits (B, H, W)

    # Scaling
    metric_scaling_factor = pred["metric_scaling_factor"]  # Applied metric scaling (B,)

    # Original input
    img_no_norm = pred["img_no_norm"]         # Denormalized input images for visualization (B, H, W, 3)
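If you want a single point cloud from all views, a minimal NumPy sketch like the one below can merge the masked world-frame points and serialize them as an ASCII PLY file. The helpers `merge_view_points` and `to_ply_ascii` are hypothetical and not part of MapAnything. The sketch assumes each prediction has already been converted to NumPy arrays with the shapes listed above, and its optional confidence filter is only a post-hoc approximation of `apply_confidence_mask`/`confidence_percentile`; whether the library computes the percentile per view or over all views is an assumption here (this sketch uses per view).

```python
import numpy as np


def merge_view_points(preds, conf_percentile=None):
    """Stack the valid world-frame points of every view into one (N, 3) array.

    Assumes each dict in `preds` holds NumPy arrays shaped like the outputs
    above: "pts3d" (B, H, W, 3), "mask" (B, H, W, 1) and "conf" (B, H, W).
    """
    merged = []
    for pred in preds:
        pts = pred["pts3d"].reshape(-1, 3)
        valid = pred["mask"].reshape(-1).astype(bool)
        if conf_percentile is not None:
            conf = pred["conf"].reshape(-1)
            # Post-hoc filter (assumption): drop pixels below this view's
            # `conf_percentile`-th confidence percentile.
            valid &= conf >= np.percentile(conf, conf_percentile)
        merged.append(pts[valid])
    if not merged:
        return np.zeros((0, 3))
    return np.concatenate(merged, axis=0)


def to_ply_ascii(points):
    """Serialize an (N, 3) point array as an ASCII PLY string."""
    header = [
        "ply",
        "format ascii 1.0",
        f"element vertex {len(points)}",
        "property float x",
        "property float y",
        "property float z",
        "end_header",
    ]
    rows = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in points]
    return "\n".join(header + rows) + "\n"
```

To feed it the outputs above, convert each view first, e.g. `{k: pred[k].float().cpu().numpy() for k in ("pts3d", "mask", "conf")}`; the `.float()` call matters if outputs come back in bf16, since NumPy has no bfloat16 type. The returned string can then be written to any `.ply` file.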

Multi-Modal Inference

MapAnything supports flexible combinations of geometric inputs for enhanced metric reconstruction. Steps to try it out:

Initialize the model:

# Optional config for better memory efficiency
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Required imports
import torch
from mapanything.models import MapAnything

# Get inference device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Init model - requires internet access or a pre-populated Hugging Face Hub cache
# For Apache 2.0 license model, use "facebook/map-anything-apache"
model = MapAnything.from_pretrained("facebook/map-anything").to(device)

Initialize the inputs:

# MapAnything is extremely flexible and supports any combination of inputs.
views_example = [
    {
        # View 0: Images + Calibration
        "img": image, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
    },
    {
        # View 1: Images + Calibration + Depth
        "img": image, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
        "depth_z": depth_z, # (H, W)
        "is_metric_scale": torch.tensor([True], device=device), # (1,)
    },
    {
        # View 2: Images + Calibration + Depth + Pose
        "img": image, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
        "depth_z": depth_z, # (H, W)
        "camera_poses": camera_poses, # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
        "is_metric_scale": torch.tensor([True], device=device), # (1,)
    },
    ...
]
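Because every key except the image is optional, it can help to sanity-check each view dict against the shapes in the comments above before calling `model.infer`. The `validate_view` helper below is a hypothetical sketch, not a MapAnything API. It treats `img` as required (every example includes it), checks the per-key shapes noted in the comments, checks that dense inputs share the image's height and width, and accepts `camera_poses` either as a (4, 4) array or as a (quats, trans) tuple without inspecting the tuple's contents. `np.shape` reads `.shape` directly, so it works on torch tensors as well as NumPy arrays.

```python
import numpy as np

# Expected shapes, taken from the comments in `views_example`; None = any size.
EXPECTED_SHAPES = {
    "img": (None, None, 3),             # (H, W, 3)
    "intrinsics": (3, 3),               # (3, 3)
    "ray_directions": (None, None, 3),  # (H, W, 3)
    "depth_z": (None, None),            # (H, W)
    "is_metric_scale": (1,),            # (1,)
}


def validate_view(view):
    """Return a list of problems found in one input view (empty list = none found)."""
    problems = []
    if "img" not in view:
        problems.append("missing 'img'")
    for key, expected in EXPECTED_SHAPES.items():
        if key not in view:
            continue
        shape = tuple(np.shape(view[key]))
        matches = len(shape) == len(expected) and all(
            e is None or s == e for s, e in zip(shape, expected)
        )
        if not matches:
            problems.append(f"{key}: shape {shape}, expected {expected}")
    # Dense inputs must share the image's height and width.
    if "img" in view:
        hw = tuple(np.shape(view["img"]))[:2]
        for key in ("ray_directions", "depth_z"):
            if key in view and tuple(np.shape(view[key]))[:2] != hw:
                size = tuple(np.shape(view[key]))[:2]
                problems.append(f"{key}: size {size} != img size {hw}")
    # camera_poses may be a (4, 4) matrix or a (quats, trans) tuple.
    pose = view.get("camera_poses")
    if pose is not None and not isinstance(pose, tuple):
        if tuple(np.shape(pose)) != (4, 4):
            problems.append(f"camera_poses: shape {tuple(np.shape(pose))}, expected (4, 4)")
    return problems
```

A loop such as `for i, v in enumerate(views_example): print(i, validate_view(v))` surfaces shape mistakes before they reach the model.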

Note that MapAnything expects the input camera poses to follow the OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world convention.
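If your poses come from an OpenGL/Blender-style camera convention (+X - Right, +Y - Up, +Z - Backward), they need an axis flip before being passed in. The minimal NumPy sketch below assumes 4x4 cam2world matrices in that convention; the function name is illustrative, not part of MapAnything. Right-multiplying by diag(1, -1, -1, 1) negates the camera's Y and Z axes and leaves the camera center unchanged. Poses stored as world2cam extrinsics should be inverted first (for example with `np.linalg.inv`), and other source conventions need a different flip.

```python
import numpy as np

# diag(1, -1, -1, 1) flips the camera's Y and Z axes:
# OpenGL-style (+X right, +Y up, +Z backward) -> OpenCV-style (+X right, +Y down, +Z forward).
GL_TO_CV = np.diag([1.0, -1.0, -1.0, 1.0])


def opengl_to_opencv_cam2world(pose_gl):
    """Convert a 4x4 OpenGL-convention cam2world pose to the OpenCV convention."""
    pose_gl = np.asarray(pose_gl, dtype=np.float64)
    if pose_gl.shape != (4, 4):
        raise ValueError(f"expected a (4, 4) pose, got {pose_gl.shape}")
    # Right-multiplication re-expresses the camera axes; the last column
    # (camera center in world coordinates) is left unchanged.
    return pose_gl @ GL_TO_CV
```

As a quick check, with an identity OpenGL pose a point 1 m in front of the camera, (0, 0, 1) in OpenCV camera coordinates, lands at (0, 0, -1) in world coordinates, which is where an OpenGL camera looking down -Z sees it.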

More examples of input combinations:

# Example 1: Images + Camera Intrinsics
views_example = [
    {
        "img": image_tensor,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor,  # (3, 3)
    },
    ...
]

# Example 2: Images + Intrinsics + Depth
views_example = [
    {
        "img": image_tensor,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor,  # (3, 3)
        "depth_z": depth_tensor,  # (H, W)
        "is_metric_scale": torch.tensor([True]),  # (1,)
    },
    ...
]

# Example 3: Images + Intrinsics + Camera Poses
views_example = [
    {
        "img": image_tensor,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor,  # (3, 3)
        "camera_poses": pose_matrices,  # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
        "is_metric_scale": torch.tensor([True]),  # (1,)
    },
    ...
]

# Example 4: Images + Ray Directions + Depth (alternative to intrinsics)
views_example = [
    {
        "img": image_tensor,  # (H, W, 3) - [0, 255]
        "ray_directions": ray_dirs_tensor,  # (H, W, 3)
        "depth_z": depth_tensor,  # (H, W)
    },
    ...
]

# Example 5: Full Multi-Modal (Images + Intrinsics + Depth + Poses)
views_example = [
    {
        "img": image_tensor,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor,  # (3, 3)
        "depth_z": depth_tensor,  # (H, W)
        "camera_poses": pose_matrices,  # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
        "is_metric_scale": torch.tensor([True]),  # (1,)
    },
    ...
]

# Example 6: Adaptive Mixed Inputs
views_example = [
    {
        # View 0: Images + Pose
        "img": images,  # (H, W, 3) - [0, 255]
        "camera_poses": camera_poses,  # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
    },
    {
        # View 1: Images + Calibration
        "img": images,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics,  # (3, 3)
    },
    {
        # View 2: Images + Calibration + Depth
        "img": images,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics,  # (3, 3)
        "depth_z": depth_z,  # (H, W)
        "is_metric_scale": torch.tensor([True], device=device),  # (1,)
    },
    {
        # View 3: Images + Calibration + Depth + Pose
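Example 4 passes per-pixel `ray_directions` instead of intrinsics. The sketch below shows one way to build such a ray map from pinhole intrinsics, and how the geometry outputs listed earlier (`ray_directions`, `depth_along_ray`, `pts3d_cam`, `depth_z`, `camera_poses`, `pts3d`) relate to each other. It assumes unit-norm ray directions, so that `pts3d_cam = ray_directions * depth_along_ray` and `depth_z` is the Z component of `pts3d_cam`, and it samples pixels at integer coordinates (OpenCV style). Both are assumptions rather than a description of MapAnything's internal code, and all function names are hypothetical.

```python
import numpy as np


def rays_from_intrinsics(K, height, width):
    """Unit-norm per-pixel ray directions (H, W, 3) in the OpenCV camera frame."""
    # Assumption: pixel (u, v) is sampled at integer coordinates.
    u, v = np.meshgrid(np.arange(width, dtype=np.float64),
                       np.arange(height, dtype=np.float64))
    pixels = np.stack([u, v, np.ones_like(u)], axis=-1)  # homogeneous pixels (H, W, 3)
    rays = pixels @ np.linalg.inv(K).T                    # K^-1 applied to every pixel
    return rays / np.linalg.norm(rays, axis=-1, keepdims=True)


def camera_points(ray_directions, depth_along_ray):
    """pts3d_cam = unit ray * distance along the ray; depth_z is its Z component."""
    pts3d_cam = ray_directions * depth_along_ray  # (H, W, 3) * (H, W, 1)
    depth_z = pts3d_cam[..., 2:3]                 # (H, W, 1)
    return pts3d_cam, depth_z


def depth_z_to_along_ray(depth_z, ray_directions):
    """Invert the relation above: (H, W) Z-depth -> (H, W, 1) distance along the ray."""
    return depth_z[..., None] / ray_directions[..., 2:3]


def to_world(pts3d_cam, cam2world):
    """Apply a 4x4 OpenCV cam2world pose to (H, W, 3) camera-frame points."""
    R, t = cam2world[:3, :3], cam2world[:3, 3]
    return pts3d_cam @ R.T + t
```

Projecting any `pts3d_cam` entry back through `K` and dividing by its Z coordinate recovers the pixel it came from, which makes a handy consistency check when preparing Example 4 inputs.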
