hub / github.com/NVIDIA/TensorRT-LLM / forward

Method forward

tensorrt_llm/quantization/layers.py:2854–2870
(self, x)

Source from the content-addressed store, hash-verified

        self.register_parameter('scale_to_int', None)

    def forward(self, x):
        weight = None if self.weight is None else self.weight.value
        bias = None if self.bias is None else self.bias.value
        scale = None if self.scale_to_int is None else self.scale_to_int.value
        clamp_val = None if self.clamp_val is None else self.clamp_val.value
        return smooth_quant_rms_norm(
            x,
            self.normalized_shape,
            weight,
            bias,
            scale,
            clamp_val,
            self.eps,
            dynamic_act_scaling=True,
            scale_dtype='float16',
            sum_per_token=not self.quant_mode.has_per_group_scaling(),
            sum_dtype='float16')


# TODO: Mostly duplicates SmoothQuantMLP.
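
For orientation, the following is a conceptual NumPy sketch of what an RMS norm with dynamic activation scaling might compute. It is not the TensorRT-LLM kernel behind smooth_quant_rms_norm. It assumes that dynamic scaling means symmetric per-token int8 quantization and that clamp_val is a (min, max) pair; the function name rms_norm_dynamic_int8 and its return layout are hypothetical, and the static scale_to_int path and the per-token sums are left out.

```python
import numpy as np


def rms_norm_dynamic_int8(x, weight=None, bias=None, eps=1e-6, clamp_val=None):
    """Hypothetical reference: RMS-normalize, then quantize each token to int8."""
    x = np.asarray(x, dtype=np.float32)

    # RMS normalization over the last (hidden) dimension.
    rms = np.sqrt(np.mean(x * x, axis=-1, keepdims=True) + eps)
    y = x / rms

    # Optional affine parameters, mirroring the None checks in forward().
    if weight is not None:
        y = y * weight
    if bias is not None:
        y = y + bias

    # Assumption: clamp_val holds (min, max) bounds applied before quantization.
    if clamp_val is not None:
        y = np.clip(y, clamp_val[0], clamp_val[1])

    # Dynamic scaling: one scale per token, taken from that token's max magnitude.
    amax = np.max(np.abs(y), axis=-1, keepdims=True)
    scale = np.maximum(amax, 1e-8) / 127.0
    q = np.clip(np.rint(y / scale), -127, 127).astype(np.int8)

    # Scales are returned in float16 to mirror scale_dtype='float16' above.
    return q, scale.astype(np.float16)
```

Multiplying q by its per-token scale recovers the normalized activations up to quantization error, which is the property a downstream int8 matrix multiply would rely on under these assumptions.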

Callers

nothing calls this directly

Calls 2

smooth_quant_rms_norm · Function · 0.85
has_per_group_scaling · Method · 0.80

Tested by

no test coverage detected