github.com/NVIDIA-AI-IOT/Lidar_AI_Solution @main sqlite

683 symbols · 2,048 edges · 64 files · 30 documented (4%)
README

Lidar AI Solution

This repository provides highly optimized solutions for self-driving 3D lidar workloads. It accelerates sparse convolution, CenterPoint, BEVFusion, OSD, and conversion.

Pipeline overview

[Figure: pipeline overview]

Getting Started

$ git clone --recursive https://github.com/NVIDIA-AI-IOT/Lidar_AI_Solution
$ cd Lidar_AI_Solution
  • For each specific task, refer to the README in the corresponding sub-folder.

3D Sparse Convolution

A tiny inference engine for 3D sparse convolutional networks using INT8/FP16.

  • Tiny Engine: Tiny Lidar-Backbone inference engine independent of TensorRT.
  • Flexible: Build execution graph from ONNX.
  • Easy To Use: Simple interface and ONNX export solution.
  • High Fidelity: Low accuracy drop on nuScenes validation.
  • Low Memory: 422MB@SCN FP16, 426MB@SCN INT8.
  • Compact: Based on the CUDA kernels and independent of cutlass.
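To make the idea of a sparse convolution concrete, here is a minimal pure-Python sketch of one common variant (submanifold convolution), where outputs are computed only at active voxels and empty neighbours are skipped. The function name `submanifold_conv3d` and the dict-based storage are illustrative assumptions, not the engine's actual interface or data layout.

```python
from itertools import product

def submanifold_conv3d(features, weights, bias=0.0):
    """Hypothetical helper for illustration only.

    features: dict mapping (x, y, z) -> float, active voxels only.
    weights:  dict mapping (dx, dy, dz) offsets -> float.
    Outputs are produced only at active input sites.
    """
    out = {}
    for (x, y, z) in features:
        acc = bias
        for (dx, dy, dz), w in weights.items():
            neighbor = (x + dx, y + dy, z + dz)
            if neighbor in features:  # empty voxels cost no work
                acc += w * features[neighbor]
        out[(x, y, z)] = acc
    return out

# All-ones 3x3x3 kernel: sums the features of active neighbours.
kernel = {offset: 1.0 for offset in product((-1, 0, 1), repeat=3)}
active = {(0, 0, 0): 1.0, (1, 0, 0): 2.0, (5, 5, 5): 4.0}
result = submanifold_conv3d(active, kernel)
```

The isolated voxel at (5, 5, 5) only sees itself, while the two adjacent voxels see each other, which is why the work scales with the number of active voxels rather than the grid volume.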

CUDA BEVFusion

CUDA & TensorRT solution for BEVFusion inference, including:

  • Camera Encoder: ResNet50 and finetuned BEV pooling with TensorRT and ONNX export solution.
  • Lidar Encoder: Tiny Lidar-Backbone inference independent of TensorRT and ONNX export solution.
  • Feature Fusion: Camera & Lidar feature fuser with TensorRT and ONNX export solution.
  • Pre/Postprocess: Interval precomputing, lidar voxelization, feature decoder with CUDA kernels.
  • Easy To Use: Preparation, inference, evaluation all in one to reproduce torch Impl accuracy.
  • PTQ: Quantization solutions for mmdet3d/spconv, easy to understand.
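As a rough illustration of what post-training quantization (PTQ) involves, the sketch below calibrates a symmetric INT8 scale from sample activations using the maximum absolute value. This is a generic max-calibration scheme chosen for brevity; the repository's own calibration method is not described here, and the helper names are hypothetical.

```python
def calibrate_scale(samples, num_bits=8):
    """Max calibration (assumed scheme): map the largest magnitude to qmax."""
    qmax = 2 ** (num_bits - 1) - 1  # 127 for INT8
    amax = max(abs(v) for v in samples)
    return amax / qmax if amax > 0 else 1.0

def quantize(values, scale, num_bits=8):
    """Round to the nearest integer step and clamp to the symmetric range."""
    qmax = 2 ** (num_bits - 1) - 1
    return [max(-qmax, min(qmax, round(v / scale))) for v in values]

def dequantize(qvalues, scale):
    """Map integers back to approximate real values."""
    return [q * scale for q in qvalues]

calibration = [-1.27, 0.5, 0.02, 1.0]
scale = calibrate_scale(calibration)
q = quantize([0.5, -1.27, 2.0], scale)   # 2.0 is out of range and saturates
restored = dequantize(q, scale)
```

Values outside the calibrated range saturate at the clamp limit, which is one reason calibration data should be representative of inference inputs.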

CUDA CenterPoint

CUDA & TensorRT solution for CenterPoint inference, including:

  • Preprocess: Voxelization with CUDA kernel.
  • Encoder: 3D backbone with NV spconv-scn and ONNX export solution.
  • Neck & Header: RPN & CenterHead with TensorRT and ONNX export solution.
  • Postprocess: Decode & NMS with CUDA kernel.
  • Easy To Use: Preparation, inference, evaluation all in one to reproduce torch Impl accuracy.
  • QAT: Quantization solutions for traveller59/spconv, easy to understand.
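The NMS step in the postprocess stage can be sketched as greedy suppression by overlap. The version below is a simplification that assumes axis-aligned bird's-eye-view boxes given as (x1, y1, x2, y2); 3D detectors typically use rotated boxes, and this is not the repository's CUDA kernel.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: visit boxes by descending score, drop heavy overlaps."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(0.0, 0.0, 2.0, 2.0), (0.1, 0.0, 2.1, 2.0), (5.0, 5.0, 7.0, 7.0)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)  # indices of surviving boxes
```

The second box overlaps the first almost entirely and has a lower score, so only the first and third boxes survive.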

CUDA PointPillars

CUDA & TensorRT solution for PointPillars inference, including:

  • Preprocess: Voxelization & Feature Extending with CUDA kernel.
  • Detector: 2.5D backbone with TensorRT and ONNX export solution.
  • Postprocess: Parse bounding box, class type and direction.
  • Easy To Use: Preparation, inference, evaluation all in one to reproduce torch Impl accuracy.
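The voxelization step in the preprocess stage groups raw points into grid cells. A minimal sketch, assuming pillars are simply voxels whose z extent spans the whole detection range; the function name, the grid parameters, and the per-voxel point cap are hypothetical values for illustration.

```python
import math
from collections import defaultdict

def voxelize(points, voxel_size, range_min, range_max, max_points_per_voxel=32):
    """Group (x, y, z) points by voxel index, dropping out-of-range points."""
    voxels = defaultdict(list)
    for p in points:
        in_range = all(lo <= c < hi for c, lo, hi in zip(p, range_min, range_max))
        if not in_range:
            continue
        idx = tuple(
            int(math.floor((c - lo) / s))
            for c, lo, s in zip(p, range_min, voxel_size)
        )
        if len(voxels[idx]) < max_points_per_voxel:  # cap points per voxel
            voxels[idx].append(p)
    return dict(voxels)

points = [(0.1, 0.2, 0.0), (0.4, 0.3, 1.0), (1.2, 0.1, -1.0), (9.0, 9.0, 0.0)]
pillars = voxelize(
    points,
    voxel_size=(0.5, 0.5, 8.0),    # one cell covers the full z range: a pillar
    range_min=(0.0, 0.0, -4.0),
    range_max=(4.0, 4.0, 4.0),
)
```

The first two points fall into the same pillar, the third into another, and the last is discarded because it lies outside the configured range.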

CUDA-V2XFusion

Training and inference solutions for V2XFusion.

  • Easy To Use: Provides easily reproducible solutions for training, quantization, and ONNX export.
  • Quantization friendly: PointPillars-based backbone with pre-normalization, which can reduce quantization error.
  • Feature Fusion: Camera & Lidar feature fuser and ONNX export solution.
  • PTQ: Quantization solutions for V2XFusion, easy to understand.
  • Sparsity: 4:2 structural sparsity support.
  • Deepstream sample: Sample inference using CUDA, TensorRT/Triton in NVIDIA DeepStream SDK 7.0.
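The sparsity bullet can be illustrated with a small magnitude-pruning sketch. It assumes the pattern meant is the common one in which 2 of every 4 consecutive weights are kept and the other 2 are zeroed; that reading, and the helper name `prune_2_of_4`, are assumptions rather than something the README spells out.

```python
def prune_2_of_4(weights):
    """Zero the 2 smallest-magnitude values in each consecutive group of 4."""
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes within this group.
        keep = sorted(range(4), key=lambda i: abs(group[i]), reverse=True)[:2]
        pruned.extend(v if i in keep else 0.0 for i, v in enumerate(group))
    return pruned

row = [0.1, -0.9, 0.3, 0.05, 1.0, 0.2, -0.4, 0.0]
sparse_row = prune_2_of_4(row)
```

Because every group of four has exactly two zeros, hardware that supports this pattern can skip the zeroed positions in a regular, predictable way.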

cuOSD (CUDA On-Screen Display Library)

Draw all elements using a single CUDA kernel.

  • Line: Plotting lines by interpolation (Nearest or Linear).
  • RotateBox: Can be drawn with different border colors and fill colors.
  • Circle: Can be drawn with different border colors and fill colors.
  • Rectangle: Can be drawn with different border colors and fill colors.
  • Text: Supports stb_truetype and pango-cairo backends, allowing fonts to be read via TTF or using font-family.
  • Arrow: Arrows composed of 3 lines.
  • Point: Plotting points by interpolation (Nearest or Linear).
  • Clock: Time plotting based on text support.

cuPCL (CUDA Point Cloud Library)

Provides several GPU-accelerated point cloud operations with both high accuracy and high performance: cuICP, cuFilter, cuSegmentation, cuOctree, cuCluster, cuNDT, Voxelization (incoming).

  • cuICP: CUDA-accelerated iterative closest point (point-to-point) point cloud registration implementation.
  • cuFilter: Supports CUDA-accelerated features: PassThrough and VoxelGrid.
  • cuSegmentation: Supports CUDA-accelerated features: RandomSampleConsensus with a plane model.
  • cuOctree: Supports CUDA-accelerated features: Approximate Nearest Search and Radius Search.
  • cuCluster: Supports CUDA-accelerated features: Cluster based on the distance among points.
  • cuNDT: CUDA-accelerated 3D Normal Distribution Transform registration implementation for point cloud data.
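To show what the two cuFilter operations do conceptually, here is a CPU-side sketch of a pass-through filter (keep points whose coordinate on one axis lies in a range) followed by a voxel-grid downsample (replace all points in a cell by their centroid). The function names and signatures are illustrative assumptions and do not mirror the cuPCL API.

```python
import math
from collections import defaultdict

def pass_through(points, axis, lower, upper):
    """Keep points whose coordinate on `axis` lies in [lower, upper]."""
    return [p for p in points if lower <= p[axis] <= upper]

def voxel_grid(points, leaf_size):
    """Replace the points in each cubic cell by their centroid."""
    buckets = defaultdict(list)
    for p in points:
        key = tuple(math.floor(c / leaf_size) for c in p)
        buckets[key].append(p)
    return [
        tuple(sum(axis_vals) / len(pts) for axis_vals in zip(*pts))
        for pts in buckets.values()
    ]

cloud = [(0.1, 0.1, 0.1), (0.3, 0.3, 0.3), (1.5, 0.2, 0.2), (0.2, 0.2, 9.0)]
filtered = pass_through(cloud, axis=2, lower=0.0, upper=5.0)  # drops z = 9.0
downsampled = voxel_grid(filtered, leaf_size=1.0)
```

Running the filters in this order first discards the outlier in z, then merges the two nearby points into one centroid.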

YUVToRGB (CUDA Conversion)

YUV to RGB conversion. Combine Resize/Padding/Conversion/Normalization into a single kernel function.

  • Most of the time, it can be bit-aligned with OpenCV.
      • It will give an exact result when the scaling factor is a rational number.
      • Better performance is usually achieved when the stride is divisible by 4.
  • Supported Input Format:
      • NV12BlockLinear
      • NV12PitchLinear
      • YUV422Packed_YUYV
  • Supported Interpolation methods:
      • Nearest
      • Bilinear
  • Supported Output Data Type:
      • Uint8
      • Float32
      • Float16
  • Supported Output Layout:
      • CHW_RGB/BGR
      • HWC_RGB/BGR
      • CHW16/32/4/RGB/BGR for DLA input
  • Supported Features:
      • Resize
      • Padding
      • Conversion
      • Normalization
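For readers unfamiliar with NV12, the sketch below converts a single pixel from a pitch-linear NV12 buffer to RGB. It assumes the row pitch equals the width, an even width, and full-range BT.601 coefficients; the color matrix and range the library actually uses are not stated here, so treat those constants as assumptions.

```python
def clamp_u8(v):
    """Round and clamp a float to the 0..255 range."""
    return max(0, min(255, int(round(v))))

def nv12_pixel_to_rgb(nv12, width, height, x, y):
    """nv12: Y plane (width*height bytes), then interleaved U,V pairs
    at half resolution in both directions."""
    luma = nv12[y * width + x]
    uv_base = width * height + (y // 2) * width + (x // 2) * 2
    u = nv12[uv_base] - 128
    v = nv12[uv_base + 1] - 128
    # Full-range BT.601 coefficients (assumed for this sketch).
    r = luma + 1.402 * v
    g = luma - 0.344136 * u - 0.714136 * v
    b = luma + 1.772 * u
    return clamp_u8(r), clamp_u8(g), clamp_u8(b)

# 2x2 frames: four luma samples followed by one shared (U, V) pair.
grey_frame = bytes([16, 128, 235, 255, 128, 128])
red_frame = bytes([128, 128, 128, 128, 128, 255])
grey = nv12_pixel_to_rgb(grey_frame, 2, 2, 1, 0)
reddish = nv12_pixel_to_rgb(red_frame, 2, 2, 0, 1)
```

With neutral chroma (U = V = 128) the output is a pure grey equal to the luma value, which is a quick sanity check for any YUV conversion.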

ROI Conversion (ROIs To Continuous Tensor Conversion)

Combine Resize/Padding/Conversion/Normalization into a single kernel function.

  • Most of the time, it can be bit-aligned with OpenCV.
      • It will give an exact result when the scaling factor is a rational number.
      • Better performance is usually achieved when the stride is divisible by 4.
  • Supported Input Format:
      • NV12BlockLinear
      • NV12PitchLinear
      • YUV422Packed_YUYV
  • Supported Interpolation methods:
      • Nearest
      • Bilinear
  • Supported Output Data Type:
      • Uint8
      • Float32
      • Float16
  • Supported Output Layout:
      • CHW_RGB/BGR
      • HWC_RGB/BGR
      • CHW16/32/4/RGB/BGR for DLA input
      • Gray
  • Supported Features:
      • Resize
      • Padding
      • Conversion
      • Normalization
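The idea of fusing resize, padding, and normalization into one pass can be sketched on a grayscale image: each output element is computed directly from the source ROI with no intermediate buffers. This sketch assumes nearest-neighbour sampling, aspect-ratio-preserving scaling, and padding on the right and bottom; those choices and the function name `roi_to_tensor` are assumptions for illustration.

```python
def roi_to_tensor(image, roi, out_w, out_h, mean=0.0, std=1.0, pad_value=0.0):
    """image: list of rows of grayscale values; roi: (x, y, w, h).
    Crops, resizes, pads, and normalizes in a single loop."""
    rx, ry, rw, rh = roi
    scale = min(out_w / rw, out_h / rh)  # keep aspect ratio (assumption)
    new_w, new_h = int(rw * scale), int(rh * scale)
    out = [[pad_value] * out_w for _ in range(out_h)]
    for oy in range(new_h):
        for ox in range(new_w):
            # Nearest-neighbour lookup back into the source ROI.
            sx = rx + min(rw - 1, int(ox / scale))
            sy = ry + min(rh - 1, int(oy / scale))
            out[oy][ox] = (image[sy][sx] - mean) / std
    return out

image = [
    [0, 0, 0, 0],
    [0, 10, 20, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
tensor = roi_to_tensor(image, roi=(1, 1, 2, 1), out_w=4, out_h=4, mean=10.0, std=10.0)
```

The 2x1 ROI is scaled by 2 to fill the top two rows of the 4x4 output, and the remaining rows keep the padding value.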

Thanks

This project makes use of a number of awesome open source libraries, including:

  • stb_image for PNG and JPEG support
  • pybind11 for seamless C++ / Python interop
  • and others! See the dependencies folder.

Many thanks to the authors of these brilliant projects!

Core symbols most depended-on inside this repo

get_tensor_id · called by 37 · CUDA-CenterPoint/qat/onnx_export/exptool.py
get_tensor_id · called by 37 · CUDA-BEVFusion/qat/lean/exptool.py
get_tensor_id · called by 27 · libraries/3DSparseConvolution/tool/centerpoint-export/exptool.py
get_tensor_id · called by 27 · libraries/3DSparseConvolution/tool/bevfusion-export/exptool.py
build_dataset · called by 20 · CUDA-V2XFusion/mmdet3d/datasets/builder.py
init_quantizer · called by 16 · CUDA-V2XFusion/scripts/tinyq.py
log · called by 12 · CUDA-V2XFusion/scripts/tinyq.py
apply · called by 11 · CUDA-CenterPoint/qat/tools/sparseconv_quantization.py

Shape

Function 317
Method 261
Class 103
Route 2

Languages

Python 100%

Modules by API surface

CUDA-V2XFusion/scripts/tinyq.py · 159 symbols
CUDA-BEVFusion/qat/lean/quantize.py · 47 symbols
CUDA-V2XFusion/scripts/quantize.py · 40 symbols
CUDA-CenterPoint/qat/tools/sparseconv_quantization.py · 31 symbols
CUDA-V2XFusion/mmdet3d/datasets/v2x_dataset.py · 30 symbols
libraries/cuOSD/test/cuosd/__init__.py · 24 symbols
CUDA-V2XFusion/mmdet3d/models/vtransforms/base.py · 24 symbols
CUDA-BEVFusion/qat/lean/exptool.py · 22 symbols
libraries/3DSparseConvolution/tool/bevfusion-export/exptool.py · 20 symbols
CUDA-V2XFusion/scripts/export_v2xfusion.py · 19 symbols
CUDA-V2XFusion/mmdet3d/models/backbones/pillar_encoder.py · 17 symbols
CUDA-CenterPoint/qat/onnx_export/exptool.py · 17 symbols

Dependencies from manifests, versioned

addict 2.4.0 · 1×
motmetrics 1.4.0 · 1×
numpy 1.22.3 · 1×
onnx 1.12.0 · 1×
onnx_simplifier 0.4.8 · 1×
onnxruntime 1.13.1 · 1×
onnxsim 0.4.10 · 1×
pyquaternion 0.9.9 · 1×
shapely 1.8.0 · 1×
spconv_cu114 2.2.6 · 1×
terminaltables 3.1.10 · 1×
torch 1.11.0 · 1×

For agents

$ claude mcp add Lidar_AI_Solution \
  -- python -m otcore.mcp_server <graph>
