Function fp8_quantize

tensorrt_llm/quantization/quantize.py:232–246

(model, quant_config: QuantConfig)


def fp8_quantize(model, quant_config: QuantConfig):
    assert quant_config.quant_mode.has_fp8_qdq()

    quant_map = {
        ColumnLinear: FP8Linear,
        RowLinear: FP8RowLinear,
        MixtureOfExperts: MixtureOfExperts,
    }

    model = quantize_layers(
        model,
        quant_config,
        quant_map,
    )
    return model
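
The function above builds a type-to-type map and hands it to quantize_layers, whose body is not shown here. As a rough illustration of what such a map can drive, the sketch below swaps layers in a toy module tree by looking up each layer's class. Everything in it is a hypothetical stand-in: the `Layer` class, the empty `ColumnLinear`/`FP8Linear` placeholders, and the `swap_layers` helper are not the tensorrt_llm implementations, and the real quantize_layers may work differently.

```python
# Illustrative stand-ins only; these are NOT the real tensorrt_llm classes.
class Layer:
    def __init__(self, **children):
        # Named sub-layers, kept in a plain dict for simplicity.
        self.children = dict(children)


class ColumnLinear(Layer): ...
class RowLinear(Layer): ...
class FP8Linear(Layer): ...
class FP8RowLinear(Layer): ...


def swap_layers(layer, quant_map):
    """Hypothetical helper: rebuild the tree, replacing mapped layer types."""
    new_children = {
        name: swap_layers(child, quant_map)
        for name, child in layer.children.items()
    }
    # Types absent from the map are rebuilt as their own class.
    target_cls = quant_map.get(type(layer), type(layer))
    return target_cls(**new_children)


quant_map = {
    ColumnLinear: FP8Linear,
    RowLinear: FP8RowLinear,
}

model = Layer(qkv=ColumnLinear(), proj=RowLinear(), norm=Layer())
quantized = swap_layers(model, quant_map)
```

In this toy version, a type that maps to itself (as MixtureOfExperts does in the source) would simply be rebuilt as the same class. Why the real map includes that entry is not stated in the listing.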

quantize · Function · 0.85

fuse_gate_mlp · Function · 0.85

quantize_layers · Function · 0.85

has_fp8_qdq · Method · 0.45

no test coverage detected