hub / github.com/NVIDIA/TensorRT-LLM / block_double_dequantize

Function block_double_dequantize

tensorrt_llm/quantization/functional.py:1430–1458

Parameters:
    x : Tensor (On GPU)
        The input tensor.
    scale : Tensor (On GPU)
        The block scale tensor.
    double_scale : Tensor (On GPU)
        The global per-tensor scaling factor. It should contain only 1 element.
    dtype : trt.DataType | str
        The data type for dequantized data. Default is 'float16'.
Returns:
    The dequantized tensor.

block_double_dequantize(x: Tensor,
                        scale: Tensor,
                        double_scale: Tensor,
                        dtype: trt.DataType | str = 'float16') -> Tensor

Source

def block_double_dequantize(x: Tensor,
                            scale: Tensor,
                            double_scale: Tensor,
                            dtype: trt.DataType | str = 'float16') -> Tensor:
    '''
    Parameters:
        x : Tensor (On GPU)
            The input tensor.
        scale : Tensor (On GPU)
            The block scale tensor.
        double_scale : Tensor (On GPU)
            The global per-tensor scaling factor. It should contain only 1 element.
        dtype : trt.DataType | str
            The data type for dequantized data. Default is 'float16'.
    Returns:
        The dequantized tensor.
    '''
    if isinstance(dtype, str):
        dtype = str_dtype_to_trt(dtype)

    # Stage 1: dequantize the block scales with the global per-tensor scale.
    dequantize_scale_layer = default_trtnet().add_dequantize(
        scale.trt_tensor, double_scale.trt_tensor, dtype)
    scale = _create_tensor(dequantize_scale_layer.get_output(0),
                           dequantize_scale_layer)

    # Stage 2: dequantize the input with the dequantized block scales.
    dequantize_data_layer = default_trtnet().add_dequantize(
        x.trt_tensor, scale.trt_tensor, dtype)
    dequantize_data = _create_tensor(dequantize_data_layer.get_output(0),
                                     dequantize_data_layer)
    return dequantize_data
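
The function chains two dequantize layers: the block scales are first dequantized with the single-element global scale, and the result is then used to dequantize the data. The arithmetic can be sketched in plain Python, outside of TensorRT. This is an illustrative reference only, not the library's implementation: the helper name `block_double_dequantize_reference`, the `block_size` parameter, and the flat layout (consecutive runs of `block_size` elements sharing one scale) are assumptions made for this example, as is treating each dequantize step as a plain multiplication with no zero point.

```python
from typing import List


def block_double_dequantize_reference(x: List[float],
                                      scale: List[float],
                                      double_scale: float,
                                      block_size: int) -> List[float]:
    """Pure-Python sketch of two-stage block dequantization.

    Hypothetical helper: assumes a flat layout in which each consecutive
    run of `block_size` elements of `x` shares one entry of `scale`.
    """
    if len(x) != len(scale) * block_size:
        raise ValueError("len(x) must equal len(scale) * block_size")

    # Stage 1: dequantize the block scales with the global per-tensor scale.
    dequantized_scale = [s * double_scale for s in scale]

    # Stage 2: dequantize each element with the scale of its block.
    return [v * dequantized_scale[i // block_size] for i, v in enumerate(x)]


quantized = [1.0, -2.0, 3.0, 4.0]  # stand-in for quantized values
block_scale = [2.0, 4.0]           # one scale per block of 2 elements
result = block_double_dequantize_reference(quantized, block_scale, 0.5,
                                           block_size=2)
print(result)  # [1.0, -2.0, 6.0, 8.0]
```

Here the effective per-block scales are `2.0 * 0.5 = 1.0` and `4.0 * 0.5 = 2.0`, so the first block is unchanged and the second block is doubled.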

Callers 2

forward · Method · 0.85

Calls 4

str_dtype_to_trt · Function · 0.85

default_trtnet · Function · 0.85

_create_tensor · Function · 0.85

get_output · Method · 0.45

Tested by

no test coverage detected