hub / github.com/NVIDIA/TensorRT-LLM / quantize

Function quantize

tensorrt_llm/quantization/functional.py:755–766
quantize(input: Tensor,
         scale_factor: Tensor,
         dtype: str,
         axis: int = -1) -> Tensor

Source from the content-addressed store, hash-verified

def quantize(input: Tensor,
             scale_factor: Tensor,
             dtype: str,
             axis: int = -1) -> Tensor:
    layer = default_trtnet().add_quantize(input.trt_tensor,
                                          scale_factor.trt_tensor,
                                          str_dtype_to_trt(dtype))
    layer.axis = axis

    output = _create_tensor(layer.get_output(0), layer)

    return output
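The function above only wires a quantize layer into the TensorRT network; the arithmetic happens inside TensorRT. As a rough illustration of what such a layer is expected to compute, here is a minimal NumPy sketch. It assumes an int8 target and the commonly documented behaviour of dividing by the scale, rounding to nearest with ties to even, and saturating to [-128, 127]. The helper name `quantize_reference` and its `qmin`/`qmax` parameters are hypothetical and not part of TensorRT-LLM.

```python
import numpy as np


def quantize_reference(x, scale, axis=-1, qmin=-128, qmax=127):
    """NumPy stand-in for an int8 quantize layer (hypothetical helper)."""
    x = np.asarray(x, dtype=np.float32)
    scale = np.asarray(scale, dtype=np.float32)
    if scale.ndim == 1 and scale.size > 1:
        # Per-channel case: one scale per slice along `axis`,
        # reshaped so it broadcasts against `x`.
        shape = [1] * x.ndim
        shape[axis] = scale.size
        scale = scale.reshape(shape)
    # np.rint rounds halves to the nearest even integer.
    q = np.rint(x / scale)
    # Saturate to the assumed int8 range before casting.
    return np.clip(q, qmin, qmax).astype(np.int8)


x = np.array([[0.5, -1.0], [2.0, 300.0]], dtype=np.float32)
per_tensor = quantize_reference(x, 0.5)                  # single scale
per_channel = quantize_reference(x, [0.5, 2.0], axis=1)  # one scale per column
```

With a scalar scale the same factor applies to every element. With a 1-D scale, `axis` selects the dimension the scales run along, which mirrors the role of `layer.axis = axis` in the source above.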

Calls 4

default_trtnet · Function · 0.85
str_dtype_to_trt · Function · 0.85
_create_tensor · Function · 0.85
get_output · Method · 0.45

Tested by 2