hub / github.com/NVIDIA/TensorRT-LLM / fp8_quantize

Function fp8_quantize

tensorrt_llm/quantization/quantize.py:232–246
(model, quant_config: QuantConfig)


def fp8_quantize(model, quant_config: QuantConfig):
    assert quant_config.quant_mode.has_fp8_qdq()

    quant_map = {
        ColumnLinear: FP8Linear,
        RowLinear: FP8RowLinear,
        MixtureOfExperts: MixtureOfExperts,
    }

    model = quantize_layers(
        model,
        quant_config,
        quant_map,
    )
    return model


def fp8_rowwise_quantize(model, quant_config: QuantConfig):
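
The quant_map in fp8_quantize pairs each source layer class with the class that should replace it, and quantize_layers receives that map. The body of quantize_layers is not shown here, so the following is only a minimal sketch of how a type-to-type map could drive layer replacement. Every name in it (Module, MLP, Block, replace_layers, and the stand-in ColumnLinear, RowLinear, FP8Linear, FP8RowLinear classes) is a hypothetical stand-in, not the TensorRT-LLM implementation. The MixtureOfExperts entry, which maps a class to itself, is not modeled because its handling is not visible in this listing.

```python
# Sketch only: hypothetical stand-in classes, NOT the TensorRT-LLM ones.
class Module:
    def children(self):
        # Snapshot (attribute name, child) pairs so callers may reassign attributes.
        return [(k, v) for k, v in vars(self).items() if isinstance(v, Module)]


class ColumnLinear(Module):
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


class RowLinear(Module):
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


class FP8Linear(ColumnLinear):
    pass


class FP8RowLinear(RowLinear):
    pass


class MLP(Module):
    def __init__(self):
        self.fc = ColumnLinear(4, 8)
        self.proj = RowLinear(8, 4)


class Block(Module):
    def __init__(self):
        self.mlp = MLP()


def replace_layers(module, quant_map):
    """Recursively swap children whose exact type is a key in quant_map."""
    for name, child in module.children():
        target = quant_map.get(type(child))
        if target is not None:
            # Rebuild the layer with the mapped class, keeping its shape.
            setattr(module, name, target(child.in_features, child.out_features))
        else:
            # Not a mapped type: descend into its own children.
            replace_layers(child, quant_map)
    return module


quant_map = {ColumnLinear: FP8Linear, RowLinear: FP8RowLinear}
model = replace_layers(Block(), quant_map)
print(type(model.mlp.fc).__name__, type(model.mlp.proj).__name__)
```

The lookup uses the exact type of each child rather than isinstance, so a layer that has already been replaced by a subclass is not matched a second time.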

Callers 2

quantize · Function · 0.85
fuse_gate_mlp · Function · 0.85

Calls 2

quantize_layers · Function · 0.85
has_fp8_qdq · Method · 0.45

Tested by

no test coverage detected