github.com/NVIDIA/TensorRT-LLM
Method: forward(self, x)
tensorrt_llm/quantization/layers.py:72–74
Source
from the content-addressed store, hash-verified
    70          self.axis = axis
    71
    72      def forward(self, x):
    73          return quantize(x, self.scaling_factor.value, self.output_dtype,
    74                          self.axis)
    75
    76
    77  class QuantizePerToken(Module):
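The `forward` method in the listing hands `x`, a stored scaling factor, an output dtype, and an axis to `quantize`. The page does not show what `quantize` does internally, so the following is only a rough NumPy sketch of one common convention (divide by the scale, round, clamp to the INT8 range). The name `quantize_sketch`, the INT8 target, and the rule "scalar scale means per-tensor, 1-D scale means per-channel along `axis`" are all assumptions made for illustration, not the TensorRT-LLM implementation.

```python
import numpy as np


def quantize_sketch(x, scaling_factor, axis=-1):
    """Hypothetical stand-in for quantize(); NOT the TensorRT-LLM API.

    Assumes an INT8 target and the divide-round-clamp convention.
    """
    x = np.asarray(x, dtype=np.float32)
    scale = np.asarray(scaling_factor, dtype=np.float32)

    if scale.ndim == 1:
        # Assumed per-channel case: reshape the scale vector so that it
        # broadcasts along `axis` of x.
        shape = [1] * x.ndim
        shape[axis] = scale.shape[0]
        scale = scale.reshape(shape)
    # A 0-d scale (assumed per-tensor case) broadcasts without reshaping.

    q = np.rint(x / scale)  # round to nearest integer
    return np.clip(q, -128, 127).astype(np.int8)  # saturate to the INT8 range


x = np.array([[0.6, -1.2], [300.0, 2.0]])

# Per-tensor: one scalar scale for the whole tensor.
per_tensor = quantize_sketch(x, 1.0)

# Per-channel: one scale per column (axis=1).
per_channel = quantize_sketch(x, [0.5, 2.0], axis=1)
```

In this sketch the value `300.0` saturates at `127` in both calls, which is the behaviour to watch for when the chosen scale is too small for the input range.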
Callers (2)
test_quantize_per_tensor · Method · 0.95
test_quantize_per_channel · Method · 0.95
Calls (1)
quantize · Function · 0.70
Tested by (2)
test_quantize_per_tensor · Method · 0.76
test_quantize_per_channel · Method · 0.76