hub / github.com/NVIDIA/TensorRT-LLM / block_double_dequantize

Function block_double_dequantize

tensorrt_llm/quantization/functional.py:1430–1458

Parameters:
    x : Tensor (On GPU)
        The input tensor.
    scale : Tensor (On GPU)
        The block scale tensor.
    double_scale : Tensor (On GPU)
        The global per-tensor scaling factor. It should contain only 1 element.
    dtype : trt.DataType | str
        The data type for dequantized data. Default is 'float16'.

(x: Tensor,
 scale: Tensor,
 double_scale: Tensor,
 dtype: trt.DataType | str = 'float16')

Source from the content-addressed store, hash-verified

def block_double_dequantize(x: Tensor,
                            scale: Tensor,
                            double_scale: Tensor,
                            dtype: trt.DataType | str = 'float16') -> Tensor:
    '''
    Parameters:
        x : Tensor (On GPU)
            The input tensor.
        scale : Tensor (On GPU)
            The block scale tensor.
        double_scale : Tensor (On GPU)
            The global per-tensor scaling factor. It should contain only 1 element.
        dtype : trt.DataType | str
            The data type for dequantized data. Default is float16.
    Returns:
        The dequantized tensor.
    '''
    if isinstance(dtype, str):
        dtype = str_dtype_to_trt(dtype)
    # Stage 1: dequantize the block scales using the global scale.
    dequantize_scale_layer = default_trtnet().add_dequantize(
        scale.trt_tensor, double_scale.trt_tensor, dtype)
    scale = _create_tensor(dequantize_scale_layer.get_output(0),
                           dequantize_scale_layer)

    # Stage 2: dequantize the data using the dequantized block scales.
    dequantize_data_layer = default_trtnet().add_dequantize(
        x.trt_tensor, scale.trt_tensor, dtype)
    dequantize_data = _create_tensor(dequantize_data_layer.get_output(0),
                                     dequantize_data_layer)
    return dequantize_data
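
The two chained dequantize layers above can be read as a two-stage rescaling: the block scales are first dequantized with the single global scale, and the data is then dequantized with those recovered block scales. The following is a minimal pure-Python sketch of that arithmetic, not the TensorRT implementation. It assumes each dequantize step is a plain multiplication by the scale with no zero point, that the input is one-dimensional, and that blocks are contiguous runs of `block_size` elements. The name `block_double_dequantize_reference` and the `block_size` parameter are hypothetical and do not exist in TensorRT-LLM.

```python
from typing import List


def block_double_dequantize_reference(x_q: List[float],
                                      scale_q: List[float],
                                      double_scale: float,
                                      block_size: int) -> List[float]:
    """Hypothetical 1-D reference for two-stage block dequantization.

    Assumes dequantization is multiplication by the scale (no zero point)
    and that each scale covers `block_size` consecutive elements.
    """
    if len(x_q) != len(scale_q) * block_size:
        raise ValueError("x_q must hold exactly block_size values per scale")
    # Stage 1: dequantize the block scales with the single global scale.
    scale = [s * double_scale for s in scale_q]
    # Stage 2: dequantize each element with the scale of its block.
    return [v * scale[i // block_size] for i, v in enumerate(x_q)]


x_q = [1.0, -2.0, 3.0, 4.0]  # quantized values: two blocks of two elements
scale_q = [2.0, 4.0]         # quantized per-block scales
result = block_double_dequantize_reference(x_q, scale_q, 0.5, block_size=2)
print(result)  # [1.0, -2.0, 6.0, 8.0]
```

In this example the recovered block scales are 1.0 and 2.0, so the first block is unchanged and the second is doubled. The real function builds TensorRT network layers and does not take a block size argument.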

Callers 2

forward · Method · 0.85
forward · Method · 0.85

Calls 4

str_dtype_to_trt · Function · 0.85
default_trtnet · Function · 0.85
_create_tensor · Function · 0.85
get_output · Method · 0.45

Tested by

no test coverage detected