Nikhil Keetha¹,², Norman Müller¹, Johannes Schönberger¹, Lorenzo Porzi¹, Yuchen Zhang²,
Tobias Fischer¹, Arno Knapitsch¹, Duncan Zauss¹, Ethan Weber¹, Nelson Antunes¹,
Jonathon Luiten¹, Manuel Lopez-Antequera¹, Samuel Rota Bulò¹, Christian Richardt¹,
Deva Ramanan², Sebastian Scherer², Peter Kontschieder¹
¹ Meta, ² Carnegie Mellon University
MapAnything is an open-source research framework for universal metric 3D reconstruction. At its core is a simple, end-to-end trained transformer model that directly regresses the factored metric 3D geometry of a scene given various types of inputs (images, calibration, poses, or depth). A single feed-forward model supports over 12 different 3D reconstruction tasks, including multi-image structure-from-motion (SfM), multi-view stereo, monocular metric depth estimation, registration, depth completion, and more.
The framework provides the complete stack—data processing, training, inference, and profiling—with a modular design that allows different 3D reconstruction models (VGGT, DUSt3R, MASt3R, MUSt3R, Pi3-X, and more) to be used interchangeably through a unified interface.

git clone https://github.com/facebookresearch/map-anything.git
cd map-anything
# Create and activate conda environment
conda create -n mapanything python=3.12 -y
conda activate mapanything
# Optional: Install torch, torchvision & torchaudio specific to your system
# Install MapAnything
pip install -e .
# For all optional dependencies
# This includes external model support (VGGT, VGGT-Omega, DUSt3R, MASt3R, MUSt3R, Pi3-X, DA3, etc.)
# See "Running External Models" section for more details
# See pyproject.toml for more details on installed packages
pip install -e ".[all]"
pre-commit install
Note that we don't pin a specific version of PyTorch or CUDA in our requirements. Please feel free to install PyTorch based on your specific system.
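Because the PyTorch build is left to you, it can help to confirm what is actually installed before running inference. The following is a minimal sketch, not part of MapAnything: `describe_torch_install` is a hypothetical helper that reports the PyTorch version, the CUDA build, and whether bf16 is supported on the visible GPU. It imports `torch` only if it is present, so it also runs on a machine where PyTorch is not installed yet.

```python
import importlib.util


def describe_torch_install():
    """Summarize the local PyTorch install without assuming it is present.

    Hypothetical helper for illustration; not part of the mapanything package.
    """
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}

    import torch

    cuda_ok = torch.cuda.is_available()
    return {
        "installed": True,
        "torch_version": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": cuda_ok,
        # bf16 support is only queried when a CUDA device is visible
        "bf16_supported": cuda_ok and torch.cuda.is_bf16_supported(),
    }


if __name__ == "__main__":
    for key, value in describe_torch_install().items():
        print(f"{key}: {value}")
```

If `cuda_available` is false, the quick-start code further down still runs, but on the CPU.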
For metric 3D reconstruction from images without additional geometric inputs:
# Optional config for better memory efficiency
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
# Required imports
import torch
from mapanything.models import MapAnything
from mapanything.utils.image import load_images
# Get inference device
device = "cuda" if torch.cuda.is_available() else "cpu"
# Init model - This requires internet access or the huggingface hub cache to be pre-downloaded
# For Apache 2.0 license model, use "facebook/map-anything-apache"
model = MapAnything.from_pretrained("facebook/map-anything").to(device)
# Load and preprocess images from a folder or list of paths
images = "path/to/your/images/" # or ["path/to/img1.jpg", "path/to/img2.jpg", ...]
views = load_images(images)
# Run inference
predictions = model.infer(
    views, # Input views
    memory_efficient_inference=True, # Trades off speed for more views (up to 2000 views on 140 GB). Trade off is negligible - see profiling section
    minibatch_size=None, # Minibatch size for memory-efficient inference (use 1 for smallest GPU memory consumption). Default is dynamic computation based on available GPU memory.
    use_amp=True, # Use mixed precision inference (recommended)
    amp_dtype="bf16", # bf16 inference (recommended; falls back to fp16 if bf16 not supported)
    apply_mask=True, # Apply masking to dense geometry outputs
    mask_edges=True, # Remove edge artifacts by using normals and depth
    apply_confidence_mask=False, # Filter low-confidence regions
    confidence_percentile=10, # Remove bottom 10 percentile confidence pixels
    use_multiview_confidence=False, # Enable multi-view depth consistency based confidence in place of learning-based one
)
# Access results for each view - Complete list of metric outputs
for i, pred in enumerate(predictions):
    # Geometry outputs
    pts3d = pred["pts3d"] # 3D points in world coordinates (B, H, W, 3)
    pts3d_cam = pred["pts3d_cam"] # 3D points in camera coordinates (B, H, W, 3)
    depth_z = pred["depth_z"] # Z-depth in camera frame (B, H, W, 1)
    depth_along_ray = pred["depth_along_ray"] # Depth along ray in camera frame (B, H, W, 1)
    # Camera outputs
    ray_directions = pred["ray_directions"] # Ray directions in camera frame (B, H, W, 3)
    intrinsics = pred["intrinsics"] # Recovered pinhole camera intrinsics (B, 3, 3)
    camera_poses = pred["camera_poses"] # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world poses in world frame (B, 4, 4)
    cam_trans = pred["cam_trans"] # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world translation in world frame (B, 3)
    cam_quats = pred["cam_quats"] # OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world quaternion in world frame (B, 4)
    # Quality and masking
    confidence = pred["conf"] # Per-pixel confidence scores (B, H, W)
    mask = pred["mask"] # Combined validity mask (B, H, W, 1)
    non_ambiguous_mask = pred["non_ambiguous_mask"] # Non-ambiguous regions (B, H, W)
    non_ambiguous_mask_logits = pred["non_ambiguous_mask_logits"] # Mask logits (B, H, W)
    # Scaling
    metric_scaling_factor = pred["metric_scaling_factor"] # Applied metric scaling (B,)
    # Original input
    img_no_norm = pred["img_no_norm"] # Denormalized input images for visualization (B, H, W, 3)
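The geometry outputs listed above are related by standard pinhole-camera geometry. The sketch below illustrates those relationships with NumPy on small synthetic arrays rather than real model outputs. It assumes unit-norm ray directions, integer pixel coordinates at pixel centers, and a 4x4 cam2world pose. `backproject_depth` and `to_world` are hypothetical helpers written for this illustration, not MapAnything functions, and the model's exact conventions may differ.

```python
import numpy as np


def backproject_depth(depth_z, intrinsics):
    """Lift a z-depth map (H, W) to camera-frame points (H, W, 3), pinhole model."""
    h, w = depth_z.shape
    fx, fy = intrinsics[0, 0], intrinsics[1, 1]
    cx, cy = intrinsics[0, 2], intrinsics[1, 2]
    # Pixel grid: u indexes columns (x), v indexes rows (y).
    # Assumption: integer coordinates sit at pixel centers.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) / fx * depth_z
    y = (v - cy) / fy * depth_z
    return np.stack([x, y, depth_z], axis=-1)


def to_world(pts_cam, cam2world):
    """Apply a 4x4 cam2world pose to camera-frame points (H, W, 3)."""
    rotation, translation = cam2world[:3, :3], cam2world[:3, 3]
    return pts_cam @ rotation.T + translation


# Synthetic inputs (illustrative values only)
intrinsics = np.array([[100.0, 0.0, 2.0],
                       [0.0, 100.0, 1.5],
                       [0.0, 0.0, 1.0]])
depth_z = np.full((3, 4), 2.0)   # constant 2 m plane in front of the camera
cam2world = np.eye(4)
cam2world[:3, 3] = [1.0, 0.0, 0.0]  # camera shifted 1 m along world +X

pts_cam = backproject_depth(depth_z, intrinsics)        # analogous to pts3d_cam
depth_along_ray = np.linalg.norm(pts_cam, axis=-1)      # distance along each ray
ray_directions = pts_cam / depth_along_ray[..., None]   # unit-norm rays
pts_world = to_world(pts_cam, cam2world)                # analogous to pts3d
```

Under these assumptions, multiplying `ray_directions` by `depth_along_ray` recovers the camera-frame points, and `depth_along_ray` is never smaller than `depth_z`.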
MapAnything supports flexible combinations of geometric inputs for enhanced metric reconstruction. Steps to try it out:
Initialize the model:
# Optional config for better memory efficiency
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
# Required imports
import torch
from mapanything.models import MapAnything
# Get inference device
device = "cuda" if torch.cuda.is_available() else "cpu"
# Init model - This requires internet access or the huggingface hub cache to be pre-downloaded
# For Apache 2.0 license model, use "facebook/map-anything-apache"
model = MapAnything.from_pretrained("facebook/map-anything").to(device)
Initialize the inputs:
# MapAnything is extremely flexible and supports any combination of inputs.
views_example = [
    {
        # View 0: Images + Calibration
        "img": image, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
    },
    {
        # View 1: Images + Calibration + Depth
        "img": image, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
        "depth_z": depth_z, # (H, W)
        "is_metric_scale": torch.tensor([True], device=device), # (1,)
    },
    {
        # View 2: Images + Calibration + Depth + Pose
        "img": image, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
        "depth_z": depth_z, # (H, W)
        "camera_poses": camera_poses, # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
        "is_metric_scale": torch.tensor([True], device=device), # (1,)
    },
    ...
]
Note that MapAnything expects the input camera poses to follow the OpenCV (+X - Right, +Y - Down, +Z - Forward) cam2world convention.
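Poses exported by other tools often use a different convention, so they may need converting before being passed in as `camera_poses`. The sketch below shows two common conversions with NumPy: from OpenGL-style camera axes (+X right, +Y up, +Z backward) to OpenCV axes, and from a world2cam extrinsic matrix to cam2world. Both helper names are hypothetical and not part of MapAnything, and which conversion applies depends on how your source data was produced.

```python
import numpy as np

# Flips the camera Y and Z axes:
# OpenGL (+X right, +Y up, +Z backward) <-> OpenCV (+X right, +Y down, +Z forward)
_FLIP_YZ = np.diag([1.0, -1.0, -1.0, 1.0])


def opengl_to_opencv_cam2world(cam2world_gl):
    """Re-express a 4x4 cam2world pose with OpenGL camera axes in OpenCV camera axes."""
    # A point in OpenCV camera coordinates is first mapped to OpenGL camera
    # coordinates (the flip), then to the world by the original pose.
    return cam2world_gl @ _FLIP_YZ


def world2cam_to_cam2world(world2cam):
    """Invert a rigid 4x4 world2cam extrinsic matrix to obtain cam2world."""
    rotation, translation = world2cam[:3, :3], world2cam[:3, 3]
    cam2world = np.eye(4)
    cam2world[:3, :3] = rotation.T
    cam2world[:3, 3] = -rotation.T @ translation
    return cam2world


# An OpenGL camera at the origin looks down world -Z.
pose_cv = opengl_to_opencv_cam2world(np.eye(4))
forward_world = pose_cv[:3, :3] @ np.array([0.0, 0.0, 1.0])  # OpenCV +Z is forward

# A world2cam matrix with a pure translation (illustrative values only).
world2cam = np.eye(4)
world2cam[:3, 3] = [1.0, 2.0, 3.0]
pose_from_extrinsics = world2cam_to_cam2world(world2cam)
```

The closed-form inverse avoids a general matrix inversion and is valid only for rigid transforms, that is, an orthonormal rotation with no scale.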
More examples of input combinations:
```python
# Images + Calibration
views_example = [
    {
        "img": image_tensor, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor, # (3, 3)
    },
    ...
]

# Images + Calibration + Depth
views_example = [
    {
        "img": image_tensor, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor, # (3, 3)
        "depth_z": depth_tensor, # (H, W)
        "is_metric_scale": torch.tensor([True]), # (1,)
    },
    ...
]

# Images + Calibration + Pose
views_example = [
    {
        "img": image_tensor, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor, # (3, 3)
        "camera_poses": pose_matrices, # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
        "is_metric_scale": torch.tensor([True]), # (1,)
    },
    ...
]

# Images + Ray directions + Depth
views_example = [
    {
        "img": image_tensor, # (H, W, 3) - [0, 255]
        "ray_directions": ray_dirs_tensor, # (H, W, 3)
        "depth_z": depth_tensor, # (H, W)
    },
    ...
]

# Images + Calibration + Depth + Pose
views_example = [
    {
        "img": image_tensor, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics_tensor, # (3, 3)
        "depth_z": depth_tensor, # (H, W)
        "camera_poses": pose_matrices, # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
        "is_metric_scale": torch.tensor([True]), # (1,)
    },
    ...
]

# Different inputs per view
views_example = [
    {
        # View 0: Images + Pose
        "img": images, # (H, W, 3) - [0, 255]
        "camera_poses": camera_poses, # (4, 4) or tuple of (quats, trans) in OpenCV cam2world convention
    },
    {
        # View 1: Images + Calibration
        "img": images, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
    },
    {
        # View 2: Images + Calibration + Depth
        "img": images, # (H, W, 3) - [0, 255]
        "intrinsics": intrinsics, # (3, 3)
        "depth_z": depth_z, # (H, W)
        "is_metric_scale": torch.tensor([True], device=device), # (1,)
    },
    {
        # View 3: Images + Calibration + Depth + Pose
$ claude mcp add map-anything \
-- python -m otcore.mcp_server <graph>