Repository: github.com/NVIDIA/TensorRT-LLM

Method forward

tensorrt_llm/quantization/layers.py:1697–1707
(self, hidden_states, lora_layer_params=None)


        self.register_parameter('clamp_val', None)

    def forward(self, hidden_states, lora_layer_params=None):
        assert lora_layer_params is None, f"lora is not supported on {self.__class__.__name__} now"
        inter = self.fc(hidden_states)
        inter = ACT2FN[self.hidden_act](inter)
        if self.quant_mode.has_fp8_rowwise():
            # Quantize per token outputs tuple:
            # quantized tensor and scaling factors per token
            clamp_val = None if self.clamp_val is None else self.clamp_val.value
            inter = quantize_fp8_per_token(inter, clamp_val)
        output = self.proj(inter)
        return output


class Fp8RowwiseGatedMLP(Fp8RowwiseMLP):
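The control flow of forward can be mirrored in plain Python to make the data path explicit: a linear `fc`, an activation looked up by name, an optional per-token FP8 quantization step that runs only when FP8 rowwise mode is enabled, and a final `proj`. This is a toy sketch, not TensorRT-LLM code. `MiniRowwiseMLP`, `ACTS`, `fake_quantize_per_token`, the boolean `fp8_rowwise` flag and the toy `fc`/`proj` functions are all hypothetical stand-ins for the library's layers, `ACT2FN`, `quantize_fp8_per_token` and `self.quant_mode.has_fp8_rowwise()`. Treating `clamp_val` as a `(min, max)` pair is also an assumption.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]

# Hypothetical activation table standing in for ACT2FN (name -> elementwise function).
ACTS: Dict[str, Callable[[float], float]] = {
    "relu": lambda v: max(v, 0.0),
    "identity": lambda v: v,
}


def fake_quantize_per_token(
    x: Vector, clamp_val: Optional[Tuple[float, float]] = None
) -> Tuple[Vector, float]:
    """Hypothetical stand-in for quantize_fp8_per_token on one token.

    Scales the token so its largest magnitude maps to 448 (the float8 e4m3 max)
    and returns (scaled values, scale). FP8 rounding is deliberately omitted.
    """
    if clamp_val is not None:
        lo, hi = clamp_val  # assumption: clamp_val is a (min, max) pair
        x = [min(max(v, lo), hi) for v in x]
    amax = max((abs(v) for v in x), default=0.0)
    scale = amax / 448.0 if amax > 0.0 else 1.0
    return [v / scale for v in x], scale


class MiniRowwiseMLP:
    """Toy module mirroring the control flow of the forward excerpt."""

    def __init__(self, fc, proj, hidden_act="relu", fp8_rowwise=False, clamp_val=None):
        self.fc = fc
        self.proj = proj
        self.hidden_act = hidden_act
        self.fp8_rowwise = fp8_rowwise  # stands in for quant_mode.has_fp8_rowwise()
        self.clamp_val = clamp_val

    def forward(self, hidden_states: Vector, lora_layer_params=None):
        assert lora_layer_params is None, f"lora is not supported on {type(self).__name__} now"
        inter = self.fc(hidden_states)
        inter = [ACTS[self.hidden_act](v) for v in inter]
        if self.fp8_rowwise:
            # proj receives a (values, scale) tuple, like the tuple the excerpt
            # passes from quantize_fp8_per_token to self.proj.
            inter = fake_quantize_per_token(inter, self.clamp_val)
        return self.proj(inter)


def fc(x: Vector) -> Vector:
    return [2.0 * v for v in x]  # toy "linear" layer: scale every element by 2


def proj_plain(x: Vector) -> float:
    return sum(x)  # toy projection: reduce to a scalar


def proj_fp8(packed: Tuple[Vector, float]) -> float:
    values, scale = packed
    return sum(v * scale for v in values)  # dequantize, then reduce


plain = MiniRowwiseMLP(fc, proj_plain)
fp8 = MiniRowwiseMLP(fc, proj_fp8, fp8_rowwise=True)
print(plain.forward([1.0, -3.0, 2.0]))  # relu([2, -6, 4]) sums to 6.0
print(fp8.forward([1.0, -3.0, 2.0]))    # approximately 6.0 (no FP8 rounding here)
```

The point of the sketch is the branch: with the flag off, `proj` sees plain activations. With it on, `proj` must accept the quantized values together with their per-token scale, which is why the quantized path needs a projection layer that understands that tuple.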

Callers

nothing calls this directly

Calls (2)

quantize_fp8_per_token · Function · 0.85
has_fp8_rowwise · Method · 0.45
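The comment in forward says quantize_fp8_per_token returns a tuple of the quantized tensor and per-token scaling factors. A common way to compute such a row-wise (per-token) FP8 quantization is sketched below in plain Python. Each row gets `scale = amax / 448`, where 448 is the largest finite float8 e4m3 value, and each element is divided by that scale and rounded to the nearest e4m3 value. This is an illustration under assumptions, not the library's kernel. `round_to_e4m3`, `quantize_rows_fp8` and `dequantize_rows` are hypothetical helpers, and the list-of-rows layout is an assumption. Applying `clamp_val` as a `(min, max)` pair before the amax is taken is also only a guess at its semantics.

```python
import math
from typing import List, Optional, Sequence, Tuple

FP8_E4M3_MAX = 448.0  # largest finite value of float8 e4m3 (the "fn" variant)


def round_to_e4m3(x: float) -> float:
    """Round x to the nearest float8 e4m3 value (ties to even), saturating at +-448.

    Simulated in float64: 3 mantissa bits, smallest normal exponent -6.
    """
    a = abs(x)
    if a == 0.0:
        return 0.0
    if a >= FP8_E4M3_MAX:
        return math.copysign(FP8_E4M3_MAX, x)
    _, e = math.frexp(a)        # a = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, -6)        # unbiased exponent; below 2**-6 values are subnormal
    step = 2.0 ** (exp - 3)     # spacing of representable values in this binade
    q = round(a / step) * step  # Python's round() ties to even, like IEEE rounding
    return math.copysign(min(q, FP8_E4M3_MAX), x)


def quantize_rows_fp8(
    rows: Sequence[Sequence[float]],
    clamp_val: Optional[Tuple[float, float]] = None,
) -> Tuple[List[List[float]], List[float]]:
    """Per-token (row-wise) FP8 quantization: returns (quantized rows, one scale per row)."""
    q_rows: List[List[float]] = []
    scales: List[float] = []
    for row in rows:
        vals = list(row)
        if clamp_val is not None:
            lo, hi = clamp_val  # assumption: (min, max) applied before the amax
            vals = [min(max(v, lo), hi) for v in vals]
        amax = max((abs(v) for v in vals), default=0.0)
        # Map the row's largest magnitude onto the FP8 max; all-zero rows keep scale 1.
        scale = amax / FP8_E4M3_MAX if amax > 0.0 else 1.0
        q_rows.append([round_to_e4m3(v / scale) for v in vals])
        scales.append(scale)
    return q_rows, scales


def dequantize_rows(q_rows: List[List[float]], scales: List[float]) -> List[List[float]]:
    """Undo the scaling: each element is approximately q * scale."""
    return [[q * s for q in row] for row, s in zip(q_rows, scales)]


q, s = quantize_rows_fp8([[1.0, -2.0, 0.5], [0.0, 0.0, 0.0]])
print(q)     # [[224.0, -448.0, 112.0], [0.0, 0.0, 0.0]]
print(s[1])  # 1.0
```

One scale per row (token) rather than per tensor keeps an outlier in one token from crushing the precision of every other token, at the cost of storing one extra float per row.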

Tested by

no test coverage detected