Function quantize

tensorrt_llm/quantization/functional.py:755–766

quantize(input: Tensor,
         scale_factor: Tensor,
         dtype: str,
         axis: int = -1) -> Tensor


def quantize(input: Tensor,
             scale_factor: Tensor,
             dtype: str,
             axis: int = -1) -> Tensor:
    layer = default_trtnet().add_quantize(input.trt_tensor,
                                          scale_factor.trt_tensor,
                                          str_dtype_to_trt(dtype))
    layer.axis = axis

    output = _create_tensor(layer.get_output(0), layer)

    return output
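The wrapper above only adds a quantize layer to the network and sets its axis; the arithmetic happens inside TensorRT. As a rough illustration of what such a layer is generally understood to compute for INT8 (divide by the scale, round to nearest, clamp to the integer range), here is a minimal NumPy sketch. The name `quantize_reference` is hypothetical, the zero point is assumed to be 0, and only INT8 is modeled; none of this is TensorRT-LLM code.

```python
import numpy as np


def quantize_reference(x, scale, axis=-1, qmin=-128, qmax=127):
    """Illustrative NumPy stand-in for an INT8 quantize layer (assumed semantics)."""
    x = np.asarray(x, dtype=np.float32)
    scale = np.asarray(scale, dtype=np.float32)
    if scale.ndim == 1 and scale.size > 1:
        # Per-channel case: reshape the 1-D scale so it broadcasts along `axis`.
        shape = [1] * x.ndim
        shape[axis] = scale.size
        scale = scale.reshape(shape)
    # Divide by the scale, round to nearest (ties to even), clamp to the INT8 range.
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q.astype(np.int8)


x = np.array([[0.5, -1.0], [2.0, 300.0]], dtype=np.float32)

# Per-tensor: one scalar scale for the whole tensor.
per_tensor = quantize_reference(x, 0.5)

# Per-channel: one scale per slice along axis 1.
per_channel = quantize_reference(x, [0.5, 2.0], axis=1)
```

In this sketch a scalar `scale` plays the role of a per-tensor `scale_factor`, while a 1-D `scale` combined with `axis` mirrors the per-channel case that `layer.axis` selects. Values that fall outside the INT8 range after scaling, such as 300.0 here, saturate at 127.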

test_quantize_per_tensor · Method · 0.90

test_quantize_per_channel · Method · 0.90

weight_only_quant_matmul · Function · 0.70

weight_only_groupwise_quant_matmul · Function · 0.70

default_trtnet · Function · 0.85

str_dtype_to_trt · Function · 0.85

_create_tensor · Function · 0.85

get_output · Method · 0.45

test_quantize_per_tensor · Method · 0.72

test_quantize_per_channel · Method · 0.72