github.com/Robbyant/lingbot-map @main sqlite

1,157 symbols 3,845 edges 125 files 743 documented · 64%

README

LingBot-Map: Geometric Context Transformer for Streaming 3D Reconstruction

Robbyant Team

https://github.com/user-attachments/assets/fe39e095-af2c-4ec9-b68d-a8ba97e505ab

🗺️ Meet LingBot-Map! We've built a feed-forward 3D foundation model for streaming 3D reconstruction! 🏗️🌍

LingBot-Map focuses on:

Geometric Context Transformer: Architecturally unifies coordinate grounding, dense geometric cues, and long-range drift correction within a single streaming framework through anchor context, pose-reference window, and trajectory memory.
High-Efficiency Streaming Inference: A feed-forward architecture with paged KV cache attention, enabling stable inference at ~20 FPS on 518×378 resolution over long sequences exceeding 10,000 frames.
State-of-the-Art Reconstruction: Superior performance on diverse benchmarks compared to both existing streaming and iterative optimization-based approaches.

📑 Table of Contents


📰 News
📋 TODO
⚙️ Installation
📦 Model Download
🚀 Quick Start
🎬 Interactive Demo (demo.py)
Try the Example Scenes
Streaming with Keyframe Interval
Windowed Inference (for long sequences, >3000 frames)
Sky Masking
Visualization Options
Performance & Memory
🎥 Offline Rendering Pipeline (demo_render/batch_demo.py)
📜 License
📖 Citation
✨ Acknowledgments

📰 News

2026-06-28 — Fixed an SDPA KV cache bug. The SDPA backend now performs better on long sequences. We still recommend the FlashInfer backend for the best performance.
2026-05-25 — 📊 Evaluation benchmark released. We released the evaluation scripts for KITTI and Oxford Spires — see benchmark/ for the pipeline, and run preprocess/oxford.py to prepare Oxford Spires data before evaluation.
2026-04-29 — 📹 Long-video demo released. We released a very-long-video example (~25 000 frames, 13-minute indoor walkthrough) rendered with the offline pipeline — see Worked Example for the command, flag rationale, and rendered output.
2026-04-27 — 🚀 LingBot-Map accelerated. Pull the latest main and run python demo.py --compile ... or python gct_profile.py --backend flashinfer --dtype bf16 --compile to verify on your hardware.
2026-04-24 — Fixed a FlashInfer KV cache bug where --keyframe_interval > 1 silently cached non-keyframes. You should now see better pose and reconstruction quality when running with more than 320 frames.

📋 TODO

✅ Release evaluation benchmark
✅ Oxford Spires dataset
✅ KITTI dataset
✅ VBR dataset
✅ Droid-W dataset
✅ TUM-D dataset
✅ 7-scenes dataset
✅ ETH3D dataset
✅ Tanks and Temples dataset
✅ NRGBD dataset
✅ Release demo scripts
✅ Indoor long-video demo (Featured indoor walkthrough)
✅ Outdoor long-video demo
✅ LingBot-World demo (Worked example)
✅ Aerial long-video demo

⚙️ Installation

1. Create conda environment

conda create -n lingbot-map python=3.10 -y
conda activate lingbot-map

2. Install PyTorch (CUDA 12.8)

pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu128

PyTorch 2.8.0 is the recommended version because NVIDIA Kaolin (required by the batch rendering pipeline) has prebuilt wheels for torch-2.8.0_cu128. If you only need demo.py you may use a newer PyTorch, but the batch renderer then requires building Kaolin from source. For other CUDA versions, see PyTorch Get Started.

3. Install lingbot-map

pip install -e .

4. Install FlashInfer (recommended)

FlashInfer provides paged KV cache attention for efficient streaming inference. It is a pure-Python package that JIT-compiles CUDA kernels on first use, so a single wheel works across CUDA/PyTorch versions:

pip install --index-url https://pypi.org/simple flashinfer-python

--index-url https://pypi.org/simple is only needed if your default pip index is an internal mirror that doesn't have flashinfer-python. (Optional) For faster first-use, you can additionally install a CUDA-specific JIT cache: pip install flashinfer-jit-cache -f https://flashinfer.ai/whl/cu128/flashinfer-jit-cache/. See FlashInfer installation for details. If FlashInfer is not installed, the model falls back to SDPA (PyTorch native attention) via --use_sdpa.
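
When launching demo.py from a script, the fallback described above can be chosen automatically. This is a minimal sketch, not code from the repository: `attention_flags` is a hypothetical helper, and it assumes the flashinfer-python package is importable under the module name `flashinfer`.

```python
import importlib.util


def attention_flags(module_name: str = "flashinfer") -> list[str]:
    """Return extra demo.py flags depending on whether FlashInfer is importable.

    Hypothetical helper: `module_name` is assumed to be the import name of
    the flashinfer-python package.
    """
    try:
        found = importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        found = False
    # No extra flag when FlashInfer is present; otherwise request the SDPA fallback.
    return [] if found else ["--use_sdpa"]
```

The returned list can be appended to the argument list passed to demo.py.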

5. Visualization dependencies (optional)

pip install -e ".[vis]"

📦 Model Download

Model Name	Huggingface Repository	ModelScope Repository	Description
lingbot-map-long	robbyant/lingbot-map	Robbyant/lingbot-map	Better suited for long sequences and large-scale scenes (recommended).
lingbot-map	robbyant/lingbot-map	Robbyant/lingbot-map	Balanced checkpoint that trades off all-around performance across short and long sequences.
lingbot-map-stage1	robbyant/lingbot-map	Robbyant/lingbot-map	Stage-1 training checkpoint of lingbot-map — can be loaded into the VGGT model for bidirectional inference (c2w).

🚧 Coming soon: we're training a stronger model that supports longer sequences. Stay tuned.

🚀 Quick Start

After installation, run your first scene with one command:

python demo.py --model_path /path/to/lingbot-map-long.pt \
    --image_folder example/courthouse --mask_sky

This launches an interactive viser viewer at http://localhost:8080. See Interactive Demo below for the full set of scenes and flags, or jump to Offline Rendering Pipeline for long-sequence batch rendering.

🎬 Interactive Demo (`demo.py`)

Run demo.py for interactive 3D visualization via a browser-based viser viewer (default http://localhost:8080).

Try the Example Scenes

We provide four example scenes in example/ that you can run out of the box:

# Courthouse scene
python demo.py --model_path /path/to/lingbot-map-long.pt \
    --image_folder example/courthouse --mask_sky

https://github.com/user-attachments/assets/aa10f7ab-8024-43c7-92f8-d56159ec85c8

# University scene
python demo.py --model_path /path/to/lingbot-map-long.pt \
    --image_folder example/university --mask_sky

https://github.com/user-attachments/assets/212a1744-6ff5-4ccf-9bd4-728608248b57

# Loop scene (loop closure trajectory)
python demo.py --model_path /path/to/lingbot-map-long.pt \
    --image_folder example/loop

https://github.com/user-attachments/assets/5ae0a292-b081-40c6-838c-b7c1a0538d75

# Oxford scene with sky masking (outdoor, large scale scene)
python demo.py --model_path /path/to/lingbot-map-long.pt \
    --image_folder example/oxford --mask_sky

https://github.com/user-attachments/assets/6b8daa95-9ed4-40b2-9902-7435779b886d

🎯 Featured: indoor walkthrough (~25 000 frames, 13 minutes)

This sequence is too long for the interactive viser viewer, so the clip was rendered with the Offline Rendering Pipeline. See that section for the full command.

More examples will follow in a later update.

Streaming with Keyframe Interval

Use --keyframe_interval to reduce KV cache memory by keeping only every N-th frame as a keyframe. Non-keyframes still produce predictions but are not stored in the cache. This is useful for sequences longer than 320 frames: we train with video RoPE on 320 views, so performance degrades when the KV cache stores more than 320 views, and a keyframe strategy allows inference over longer sequences.
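
The arithmetic behind choosing an interval can be sketched as follows. This is only an illustration: it assumes frames 0, N, 2N, ... are the keyframes and ignores any additional frames the model may keep in the cache (for example the initial scale frames). Both function names are hypothetical.

```python
import math

TRAINED_VIEWS = 320  # number of views used in training, per the text above


def cached_views(num_frames: int, keyframe_interval: int) -> int:
    """Views kept in the KV cache, assuming frames 0, k, 2k, ... are keyframes."""
    if num_frames < 0 or keyframe_interval < 1:
        raise ValueError("num_frames must be >= 0 and keyframe_interval >= 1")
    return math.ceil(num_frames / keyframe_interval)


def smallest_interval(num_frames: int, budget: int = TRAINED_VIEWS) -> int:
    """Smallest keyframe interval that keeps the cache within `budget` views."""
    if budget < 1:
        raise ValueError("budget must be >= 1")
    return max(1, math.ceil(num_frames / budget))
```

Under these assumptions, a 1,000-frame sequence needs an interval of at least 4 to stay within 320 cached views, since an interval of 3 would still cache 334.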

Dataset: Download the demo sequences from robbyant/lingbot-map-demo on Hugging Face.

Example run on the travel sequence from the dataset above (sky masking on, 4 camera optimization iterations, keyframe every 2 frames):

python demo.py \
    --image_folder /path/to/lingbot-map-demo/travel/ \
    --model_path /path/to/lingbot-map-long.pt \
    --mask_sky \
    --camera_num_iterations 4 \
    --keyframe_interval 2

https://github.com/user-attachments/assets/d350b590-d036-4363-af8c-7af3918338ef

Note on inference range. Our method does not reset state by default, so the maximum inference range is bounded by the longest distance seen in the training data. Beyond that distance, state resetting becomes necessary. If you observe pose collapse, switch to windowed mode (--mode windowed). In most cases tuning --keyframe_interval alone is enough, and the remaining windowed parameters can stay at their defaults.

Windowed Inference (for long sequences, >3000 frames)

python demo.py --model_path /path/to/lingbot-map-long.pt \
    --video_path video.mp4 --fps 10 \
    --mode windowed --window_size 128 --overlap_keyframes 16 --keyframe_interval 2
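
The command above sets --window_size 128 and --overlap_keyframes 16. One plausible reading, sketched below, is that the keyframe stream is cut into windows of 128 keyframes where each window shares its first 16 keyframes with the end of the previous one. This interpretation is an assumption; the actual windowing logic in demo.py may differ, and `window_ranges` is a hypothetical helper.

```python
from collections.abc import Iterator


def window_ranges(
    num_keyframes: int, window_size: int = 128, overlap: int = 16
) -> Iterator[tuple[int, int]]:
    """Yield half-open (start, end) keyframe ranges.

    Assumed behaviour: consecutive windows share `overlap` keyframes, so each
    new window starts `window_size - overlap` keyframes after the previous one.
    """
    if overlap < 0 or window_size <= overlap:
        raise ValueError("need 0 <= overlap < window_size")
    if num_keyframes <= 0:
        return
    stride = window_size - overlap
    start = 0
    while True:
        end = min(start + window_size, num_keyframes)
        yield (start, end)
        if end >= num_keyframes:
            break
        start += stride
```

With the defaults, 300 keyframes would be covered by three windows starting at keyframes 0, 112, and 224.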

Sky Masking

Sky masking uses an ONNX sky segmentation model to filter out sky points from the reconstructed point cloud, which improves visualization quality for outdoor scenes.

Setup:

# Install onnxruntime (required)
pip install onnxruntime        # CPU
# or
pip install onnxruntime-gpu    # GPU (faster for large image sets)

The sky segmentation model (skyseg.onnx) will be automatically downloaded from HuggingFace on first use.

Usage:

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky

Sky masks are cached in <image_folder>_sky_masks/ so subsequent runs skip regeneration. You can also specify a custom cache directory with --sky_mask_dir, or save side-by-side mask visualizations with --sky_mask_visualization_dir:

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --mask_sky \
    --sky_mask_dir /path/to/cached_masks/ \
    --sky_mask_visualization_dir /path/to/mask_viz/
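
To locate or clear cached masks from a script, the documented default location can be derived from the image folder. A minimal sketch, assuming the default is formed by appending `_sky_masks` to the folder name as described above; the exact resolution logic inside demo.py is not confirmed, and `sky_mask_cache_dir` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional


def sky_mask_cache_dir(image_folder: str, sky_mask_dir: Optional[str] = None) -> Path:
    """Resolve where sky masks would be cached.

    Mirrors the documented default `<image_folder>_sky_masks/`; an explicit
    `sky_mask_dir` (the value given to --sky_mask_dir) takes precedence.
    """
    if sky_mask_dir is not None:
        return Path(sky_mask_dir)
    folder = Path(image_folder)  # Path drops any trailing slash
    return folder.with_name(folder.name + "_sky_masks")
```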

Visualization Options

Argument	Default	Description
`--port`	`8080`	Viser viewer port
`--conf_threshold`	`1.5`	Visibility threshold for filtering low-confidence points
`--point_size`	`0.00001`	Point cloud point size
`--downsample_factor`	`10`	Spatial downsampling for point cloud display
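
How --conf_threshold and --downsample_factor might interact can be illustrated with a toy filter. This is a sketch under stated assumptions, not the viewer's code: it assumes a per-pixel confidence map, treats "spatial downsampling" as taking every N-th pixel along each image axis, and keeps samples whose confidence exceeds the threshold. `select_display_pixels` is a hypothetical name.

```python
def select_display_pixels(conf, conf_threshold=1.5, downsample_factor=10):
    """Return (row, col) indices of pixels to display.

    `conf` is a 2-D list of per-pixel confidence values. Assumed behaviour:
    sample every `downsample_factor`-th pixel along each axis, then keep
    the samples whose confidence exceeds `conf_threshold`.
    """
    if downsample_factor < 1:
        raise ValueError("downsample_factor must be >= 1")
    kept = []
    for r in range(0, len(conf), downsample_factor):
        row = conf[r]
        for c in range(0, len(row), downsample_factor):
            if row[c] > conf_threshold:
                kept.append((r, c))
    return kept
```

Under this reading, raising the threshold removes noisy points, while raising the downsample factor thins the cloud uniformly regardless of confidence.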

Performance & Memory

Without FlashInfer (SDPA fallback)

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --use_sdpa

Running on Limited GPU Memory

If you run into out-of-memory issues, try one (or both) of the following:

--offload_to_cpu — offload per-frame predictions to CPU during inference (on by default; use --no-offload_to_cpu only if you have memory to spare).
--num_scale_frames 2 — reduce the number of bidirectional scale frames from the default 8 down to 2, which shrinks the activation peak of the initial scale phase.

Faster Inference

Lower the number of iterative refinement steps in the camera head to trade a small amount of pose accuracy for wall-clock speed:

python demo.py --model_path /path/to/checkpoint.pt \
    --image_folder /path/to/images/ --camera_num_iterations 1

--camera_num_iterations defaults to 4; setting it to 1 skips three refinement passes in the camera head (and shrinks its KV cache by 4×).
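
The memory and speed options in this section can be combined in one invocation. The sketch below assembles the argument list using only flags documented above; `build_demo_command` and its keyword names are hypothetical conveniences, not part of the repository.

```python
import shlex


def build_demo_command(model_path, image_folder, *, low_memory=False, fast=False, use_sdpa=False):
    """Assemble a demo.py argument list from flags documented in this README."""
    cmd = ["python", "demo.py", "--model_path", model_path, "--image_folder", image_folder]
    if use_sdpa:
        # PyTorch-native attention instead of FlashInfer.
        cmd.append("--use_sdpa")
    if low_memory:
        # Fewer bidirectional scale frames lowers the initial activation peak.
        cmd += ["--num_scale_frames", "2"]
    if fast:
        # A single camera-head pass trades some pose accuracy for speed.
        cmd += ["--camera_num_iterations", "1"]
    return cmd


if __name__ == "__main__":
    cmd = build_demo_command(
        "/path/to/checkpoint.pt", "/path/to/images/", low_memory=True, fast=True
    )
    print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root.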

🎥 Offline Rendering Pipeline (`demo_render/batch_demo.py`)

Use this pipeline when your sequence is too long for the interactive viser viewer — for example, the indoor walkthrough featured above. demo_render/batch_demo.py is the all-in-one offline entry point: feed it a video or a folder of images and it will run model inference and produce a headless point-cloud flythrough MP4 in a single command. It shares the same PyTorch / FlashInfer / checkpoint stack a

Core symbols most depended-on inside this repo

Symbol	Called by	File
exists	103	benchmark/benchmark/core/storage.py
add	25	demo_render/interactive_viewer/camera.py
copy	22	benchmark/benchmark/core/loader.py
escapeAttr	19	benchmark/benchmark/report/templates/app.js
clear	19	demo_render/interactive_viewer/camera.py
copy	18	demo_render/interactive_viewer/camera.py
forward	16	lingbot_map/models/gct_base.py
load	14	benchmark/viewer.py

Shape

Method 625

Function 433

Class 99

Languages

Python 94%

TypeScript 6%

Modules by API surface

Module	Symbols
benchmark/benchmark/report/templates/app.js	65
benchmark/viewer.py	61
demo_render/batch_demo.py	33
lingbot_map/vis/point_cloud_viewer.py	31
demo_render/rgbd_render/camera.py	31
demo_render/interactive_viewer/camera.py	31
benchmark/benchmark/core/storage.py	29
preprocess/oxford.py	26
lingbot_map/models/gct_stream_window_v2.py	25
lingbot_map/layers/block.py	25
lingbot_map/utils/geometry.py	24
lingbot_map/models/gct_stream_window.py	23

Dependencies from manifests, versioned

Pillow · 1×
einops · 1×
huggingface_hub · 1×
numpy 1.26.4 · 1×
onnxruntime-gpu 1.23.2 · 1×
open3d 0.19.0 · 1×
opencv-python 4.11.0.86 · 1×
pyyaml 6.0.2 · 1×
safetensors · 1×
scipy · 1×
tqdm 4.67.1 · 1×

For agents

$ claude mcp add lingbot-map \
  -- python -m otcore.mcp_server <graph>

