github.com/facebookresearch/map-anything @v1.1.2 sqlite

1,448 symbols · 5,366 edges · 203 files · 914 documented (63%)

README

MapAnything: Universal Feed-Forward Metric 3D Reconstruction

Nikhil Keetha¹,²    Norman Müller¹    Johannes Schönberger¹    Lorenzo Porzi¹    Yuchen Zhang²

Tobias Fischer¹    Arno Knapitsch¹    Duncan Zauss¹    Ethan Weber¹    Nelson Antunes¹

Jonathon Luiten¹    Manuel Lopez-Antequera¹    Samuel Rota Bulò¹    Christian Richardt¹

Deva Ramanan²    Sebastian Scherer²    Peter Kontschieder¹

¹ Meta    ² Carnegie Mellon University

Overview

MapAnything is an open-source research framework for universal metric 3D reconstruction. At its core is a simple, end-to-end trained transformer model that directly regresses the factored metric 3D geometry of a scene given various types of inputs (images, calibration, poses, or depth). A single feed-forward model supports over 12 different 3D reconstruction tasks, including multi-image SfM, multi-view stereo, monocular metric depth estimation, registration, depth completion, and more.

The framework provides the complete stack—data processing, training, inference, and profiling—with a modular design that allows different 3D reconstruction models (VGGT, DUSt3R, MASt3R, MUSt3R, Pi3-X, and more) to be used interchangeably through a unified interface.

Table of Contents

Overview
Quick Start
Installation
Image-Only Inference
Multi-Modal Inference
Running External Models
- Available Models
- Installation
- Quick Start Example
- Running Inference
- Unified Output Format
- Notes on Input Requirements
Interactive Demos
Online Demo
Local Gradio Demo
Rerun Demo
Demo Inference on COLMAP outputs
Profiling
Profiling Results
Basic Profiling
Comparing with External Models
Command-Line Arguments
Output Files
COLMAP & GSplat Support
Exporting to COLMAP Format
Visualizing COLMAP Reconstruction in Rerun
Integration with Gaussian Splatting
Data Processing for Training & Benchmarking
Training
Benchmarking
Available Benchmarks
Code License
Models
Hugging Face Hub Models
Hugging Face Hub Models (V1 Release)
Model Selection Guide
Optional Checkpoint Conversion
Building Blocks for MapAnything
Related Research
Acknowledgments
Citation

Quick Start

Installation

git clone https://github.com/facebookresearch/map-anything.git
cd map-anything

# Create and activate conda environment
conda create -n mapanything python=3.12 -y
conda activate mapanything

# Optional: Install torch, torchvision & torchaudio specific to your system
# Install MapAnything
pip install -e .

# For all optional dependencies
# This includes external model support (VGGT, VGGT-Omega, DUSt3R, MASt3R, MUSt3R, Pi3-X, DA3, etc.)
# See "Running External Models" section for more details
# See pyproject.toml for more details on installed packages
pip install -e ".[all]"
pre-commit install

Note that we don't pin a specific version of PyTorch or CUDA in our requirements. Please feel free to install PyTorch based on your specific system.
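Because PyTorch is not pinned, it can help to confirm which build ended up in the environment before running inference. The following is a minimal sketch; the helper name `describe_torch_install` is hypothetical and not part of MapAnything. It only relies on the standard library plus `torch.__version__` and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_install():
    """Return a one-line summary of the local PyTorch install, if any."""
    # find_spec returns None when the package cannot be located
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported only once we know the package is present

    return f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}"


summary = describe_torch_install()
print(summary)
```

If the summary reports that CUDA is unavailable, the inference examples below fall back to the CPU through their `device` selection.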

Image-Only Inference

For metric 3D reconstruction from images without additional geometric inputs:

# Optional config for better memory efficiency
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Required imports
import torch
from mapanything.models import MapAnything
from mapanything.utils.image import load_images

# Get inference device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Init model - This requires internet access or the huggingface hub cache to be pre-downloaded
# For Apache 2.0 license model, use "facebook/map-anything-apache"
model = MapAnything.from_pretrained("facebook/map-anything").to(device)

# Load and preprocess images from a folder or list of paths
images = "path/to/your/images/"  # or ["path/to/img1.jpg", "path/to/img2.jpg", ...]
views = load_images(images)

# Run inference
predictions = model.infer(
    views,                            # Input views
    memory_efficient_inference=True,  # Trades off speed for more views (up to 2000 views on 140 GB). Trade off is negligible - see profiling section
    minibatch_size=None,              # Minibatch size for memory-efficient inference (use 1 for smallest GPU memory consumption). Default is dynamic computation based on available GPU memory.
    use_amp=True,                     # Use mixed precision inference (recommended)
    amp_dtype="bf16",                 # bf16 inference (recommended; falls back to fp16 if bf16 not supported)
    apply_mask=True,                  # Apply masking to dense geometry outputs
    mask_edges=True,                  # Remove edge artifacts by using normals and depth
    apply_confidence_mask=False,      # Filter low-confidence regions
    confidence_percentile=10,         # Remove bottom 10 percentile confidence pixels
    use_multiview_confidence=False,   # Enable multi-view depth consistency based confidence in place of learning-based one
)

# Access results for each view - Complete list of metric outputs
for i, pred in enumerate(predictions):
    # Geometry outputs
    pts3d = pred["pts3d"]                     # 3D points in world coordinates (B, H, W, 3)
    pts3d_cam = pred["pts3d_cam"]             # 3D points in camera coordinates (B, H, W, 3)
    depth_z = pred["depth_z"]                 # Z-depth in camera frame (B, H, W, 1)
    depth_along_ray = pred["depth_along_ray"] # Depth along ray in camera frame (B, H, W, 1)

    # Camera outputs
    ray_directions = pred["ray_directions"]   # Ray directions in camera frame (B, H, W, 3)
    intrinsics = pred["intrinsics"]           # Recovered pinhole camera intrinsics (B, 3, 3)
    camera_poses = pred["camera_poses"]       # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world poses in world frame (B, 4, 4)
    cam_trans = pred["cam_trans"]             # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world translation in world frame (B, 3)
    cam_quats = pred["cam_quats"]             # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world quaternion in world frame (B, 4)

    # Quality and masking
    confidence = pred["conf"]                 # Per-pixel confidence scores (B, H, W)
    mask = pred["mask"]                       # Combined validity mask (B, H, W, 1)
    non_ambiguous_mask = pred["non_ambiguous_mask"]                # Non-ambiguous regions (B, H, W)
    non_ambiguous_mask_logits = pred["non_ambiguous_mask_logits"]  # Mask logits (B, H, W)

    # Scaling
    metric_scaling_factor = pred["metric_scaling_factor"]  # Applied metric scaling (B,)

    # Original input
    img_no_norm = pred["img_no_norm"]         # Denormalized input images for visualization (B, H, W, 3)
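
The geometry outputs above are related to one another. As a minimal sketch, assuming an ideal pinhole camera and unit-length ray directions (an assumption about the convention, not something the listing states), the dependency-free Python below shows how z-depth, depth along the ray, and the camera-frame point relate for a single pixel. The helper `unproject_pixel` and the numeric intrinsics are hypothetical.

```python
import math


def unproject_pixel(u, v, fx, fy, cx, cy, depth_z):
    """Back-project one pixel with known positive z-depth under a pinhole model.

    Returns (point_cam, ray_direction, depth_along_ray).
    """
    # Camera-frame point in the OpenCV convention (+X right, +Y down, +Z forward)
    x = (u - cx) / fx * depth_z
    y = (v - cy) / fy * depth_z
    point_cam = (x, y, depth_z)
    # Depth along the ray is the Euclidean distance from the camera center
    depth_along_ray = math.sqrt(x * x + y * y + depth_z * depth_z)
    # Unit ray direction, so that point_cam == ray_direction * depth_along_ray
    ray_direction = tuple(c / depth_along_ray for c in point_cam)
    return point_cam, ray_direction, depth_along_ray


# Hypothetical intrinsics for a 640x480 image
fx = fy = 500.0
cx, cy = 320.0, 240.0

# At the principal point the two depths coincide
center = unproject_pixel(320.0, 240.0, fx, fy, cx, cy, 2.0)
# Off-axis, depth along the ray exceeds z-depth
offaxis = unproject_pixel(820.0, 240.0, fx, fy, cx, cy, 2.0)
```

Under these assumptions the two depth maps agree only along the optical axis, which is why the model exposes both `depth_z` and `depth_along_ray`.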

Multi-Modal Inference

MapAnything supports flexible combinations of geometric inputs for enhanced metric reconstruction. Steps to try it out:

Initialize the model:

# Optional config for better memory efficiency
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

# Required imports
import torch
from mapanything.models import MapAnything

# Get inference device
device = "cuda" if torch.cuda.is_available() else "cpu"

# Init model - This requires internet access or the huggingface hub cache to be pre-downloaded
# For Apache 2.0 license model, use "facebook/map-anything-apache"
model = MapAnything.from_pretrained("facebook/map-anything").to(device)

Initialize the inputs:

# MapAnything is extremely flexible and supports any combination of inputs.
views_example = [
    {
        # View 0: Images + Calibration
        "img": image, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
    },
    {
        # View 1: Images + Calibration + Depth
        "img": image, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
        "depth_z": depth_z, # (H, W)
        "is_metric_scale": torch.tensor([True], device=device), # (1,)
    },
    {
        # View 2: Images + Calibration + Depth + Pose
        "img": image, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
        "depth_z": depth_z, # (H, W)
        "camera_poses": camera_poses, # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
        "is_metric_scale": torch.tensor([True], device=device), # (1,)
    },
    ...
]

Note that MapAnything expects the input camera poses to follow the OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world convention.
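
To make the cam2world convention concrete, here is a minimal, dependency-free sketch that builds a 4x4 cam2world matrix from a quaternion and a translation, then maps a camera-frame point into the world frame. The function names are hypothetical, and the scalar-last quaternion order (x, y, z, w) is an assumption that should be checked against the library before use.

```python
import math


def quat_xyzw_to_rotmat(q):
    """Convert a quaternion to a 3x3 rotation matrix (ASSUMES x, y, z, w order)."""
    x, y, z, w = q
    n = math.sqrt(x * x + y * y + z * z + w * w)
    x, y, z, w = x / n, y / n, z / n, w / n  # normalize defensively
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - z * w), 2 * (x * z + y * w)],
        [2 * (x * y + z * w), 1 - 2 * (x * x + z * z), 2 * (y * z - x * w)],
        [2 * (x * z - y * w), 2 * (y * z + x * w), 1 - 2 * (x * x + y * y)],
    ]


def make_cam2world(q, t):
    """Assemble a 4x4 cam2world pose from a rotation quaternion and a translation."""
    R = quat_xyzw_to_rotmat(q)
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]


def cam_to_world(T, p):
    """Map a camera-frame point p into the world frame: p_world = R @ p + t."""
    return [sum(T[i][j] * p[j] for j in range(3)) + T[i][3] for i in range(3)]


# Camera rotated 90 degrees about its +Y (down) axis and placed at (1, 2, 3)
half = math.pi / 4
T = make_cam2world((0.0, math.sin(half), 0.0, math.cos(half)), (1.0, 2.0, 3.0))

# A point one unit in front of the camera (+Z forward in OpenCV)
p_world = cam_to_world(T, (0.0, 0.0, 1.0))
```

A cam2world pose places the camera in the world, so its translation column is the camera center in world coordinates. A world2cam matrix is the inverse and would need inverting before being passed in under this convention.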

More input examples:

```python
# Example 1: Images + Camera Intrinsics
views_example = [
    {
        "img": image_tensor,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor,  # (3, 3)
    },
    ...
]

# Example 2: Images + Intrinsics + Depth
views_example = [
    {
        "img": image_tensor,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor,  # (3, 3)
        "depth_z": depth_tensor,  # (H, W)
        "is_metric_scale": torch.tensor([True]),  # (1,)
    },
    ...
]

# Example 3: Images + Intrinsics + Camera Poses
views_example = [
    {
        "img": image_tensor,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor,  # (3, 3)
        "camera_poses": pose_matrices,  # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
        "is_metric_scale": torch.tensor([True]),  # (1,)
    },
    ...
]

# Example 4: Images + Ray Directions + Depth (alternative to intrinsics)
views_example = [
    {
        "img": image_tensor,  # (H, W, 3) - [0, 255]
        "ray_directions": ray_dirs_tensor,  # (H, W, 3)
        "depth_z": depth_tensor,  # (H, W)
    },
    ...
]

# Example 5: Full Multi-Modal (Images + Intrinsics + Depth + Poses)
views_example = [
    {
        "img": image_tensor,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor,  # (3, 3)
        "depth_z": depth_tensor,  # (H, W)
        "camera_poses": pose_matrices,  # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
        "is_metric_scale": torch.tensor([True]),  # (1,)
    },
    ...
]

# Example 6: Adaptive Mixed Inputs
views_example = [
    {
        # View 0: Images + Pose
        "img": images,  # (H, W, 3) - [0, 255]
        "camera_poses": camera_poses,  # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
    },
    {
        # View 1: Images + Calibration
        "img": images,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics,  # (3, 3)
    },
    {
        # View 2: Images + Calibration + Depth
        "img": images,  # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics,  # (3, 3)
        "depth_z": depth_z,  # (H, W)
        "is_metric_scale": torch.tensor([True], device=device),  # (1,)
    },
    {
        # View 3: Images + Calibration + Depth + Pose
```

Core symbols most depended-on inside this repo

called by 643 · mapanything/utils/train_tools.py
apply_log_to_norm · called by 60 · mapanything/utils/geometry.py
store_data · called by 57 · mapanything/utils/wai/core.py
load_data · called by 52 · mapanything/utils/wai/core.py
update · called by 47 · mapanything/utils/train_tools.py
view_name · called by 30 · mapanything/datasets/base/base_dataset.py
resize · called by 25 · mapanything/utils/cropping.py
load_state_dict · called by 25 · mapanything/models/external/pow3r/__init__.py

Shape

Function 680

Method 586

Class 182

Languages

Python 100%

Modules by API surface

mapanything/train/losses.py · 90 symbols
mapanything/utils/geometry.py · 50 symbols
data_processing/aggregate_scene_names.py · 50 symbols
mapanything/utils/train_tools.py · 41 symbols
mapanything/utils/timing.py · 36 symbols
mapanything/utils/wai/io.py · 32 symbols
mapanything/datasets/base/easy_dataset.py · 29 symbols
mapanything/models/external/pow3r/__init__.py · 26 symbols
mapanything/models/external/pi3/layers/block.py · 26 symbols
scripts/gradio_app.py · 25 symbols
mapanything/models/external/moge/models/utils.py · 24 symbols
mapanything/utils/colmap.py · 22 symbols

Dependencies from manifests, versioned

argconf · 1×
cython · 1×
einops · 1×
huggingface_hub · 1×
hydra-core · 1×
imageio · 1×
natsort · 1×
ninja · 1×
numpy · 1×
omegaconf 2.4.0.dev3 · 1×
open3d · 1×
opencv-python-headless 4.10.0.84 · 1×

For agents

$ claude mcp add map-anything \
  -- python -m otcore.mcp_server <graph>
