hub / github.com/NVIDIA/TensorRT-LLM / dynamic_quantize

Function dynamic_quantize

tensorrt_llm/quantization/functional.py:1396–1427


(
        x: Tensor,
        double_scale: Tensor,
        axis: int = -1,
        block_size: int = 16,
        data_qtype: trt.DataType = trt.fp4,
        scale_qtype: trt.DataType = trt.fp8)

Source from the content-addressed store, hash-verified

def dynamic_quantize(
        x: Tensor,
        double_scale: Tensor,
        axis: int = -1,
        block_size: int = 16,
        data_qtype: trt.DataType = trt.fp4,
        scale_qtype: trt.DataType = trt.fp8) -> Tuple[Tensor, Tensor]:
    '''
    Parameters:
        x : Tensor (On GPU)
            The input tensor.
        double_scale : Tensor (On GPU)
            The global per-tensor scaling factor. It should contain only 1 element.
        axis : int
            The axis to quantize. Default is -1 (the last axis).
        block_size : int
            The block size for quantization. Default is 16.
        data_qtype : trt.DataType
            The data type for quantized data. Default is FP4.
        scale_qtype : trt.DataType
            The data type for block scale. Default is FP8.
    Returns:
        A tuple of two tensors: quantized tensor and block scale tensor.
    '''
    if axis < 0:
        axis = len(x.shape) + axis
    dynq = default_trtnet().add_dynamic_quantize(x.trt_tensor, axis, block_size,
                                                 data_qtype, scale_qtype)
    dynq.set_input(1, double_scale.trt_tensor)
    quantized = _create_tensor(dynq.get_output(0), dynq)
    scale = _create_tensor(dynq.get_output(1), dynq)
    return quantized, scale
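
The docstring describes block quantization with two levels of scaling: one scale per block of block_size elements, plus a single global per-tensor factor (double_scale). The source above only wires up the TensorRT layer and does not show the arithmetic, so the following is a minimal pure-Python sketch of one common way such a scheme works, not the layer's confirmed behavior. The helper names round_to_fp4 and block_quantize_reference are hypothetical, the input is a flat list quantized along its only axis, ties are rounded toward the smaller magnitude rather than to even, and rounding of the block scales to FP8 is omitted.

```python
from typing import List, Tuple

# Magnitudes representable in FP4 (E2M1); the largest is 6.0.
FP4_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
FP4_MAX = 6.0


def round_to_fp4(v: float) -> float:
    """Round v to the nearest FP4 magnitude, saturating at +/-6 (hypothetical helper)."""
    mag = min(FP4_VALUES, key=lambda q: abs(abs(v) - q))
    return -mag if v < 0 else mag


def block_quantize_reference(
        x: List[float],
        global_scale: float,
        block_size: int = 16) -> Tuple[List[float], List[float]]:
    """Illustrative two-level block quantization (assumed scheme, not TensorRT's code).

    Returns the FP4-rounded values and one scale per block, where each block
    scale is stored relative to global_scale.
    """
    if len(x) % block_size != 0:
        raise ValueError("len(x) must be a multiple of block_size")
    quantized: List[float] = []
    scales: List[float] = []
    for start in range(0, len(x), block_size):
        block = x[start:start + block_size]
        amax = max(abs(v) for v in block)
        # Choose the scale so the block's largest magnitude maps to FP4_MAX.
        block_scale = amax / FP4_MAX
        # Store the block scale relative to the global per-tensor factor.
        scales.append(block_scale / global_scale)
        for v in block:
            # An all-zero block has scale 0; emit zeros instead of dividing.
            quantized.append(round_to_fp4(v / block_scale) if block_scale > 0 else 0.0)
    return quantized, scales


q, s = block_quantize_reference([6.0, 3.0, -1.5, 0.0, 12.0, -6.0, 3.0, 1.0],
                                global_scale=1.0, block_size=4)
print(q)  # [6.0, 3.0, -1.5, 0.0, 6.0, -3.0, 1.5, 0.5]
print(s)  # [1.0, 2.0]
```

Under these assumptions, an approximate value is recovered as q[i] * s[i // block_size] * global_scale, which is why the function returns both the quantized tensor and the block scale tensor.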

Callers 2

forward (Method) · 0.85
forward (Method) · 0.85

Calls 3

default_trtnet (Function) · 0.85
_create_tensor (Function) · 0.85
get_output (Method) · 0.45

Tested by

no test coverage detected