hub / github.com/NVIDIA/TensorRT-LLM / fp4_quantize

Function fp4_quantize

tensorrt_llm/quantization/quantize.py:539–552
(model, quant_config: QuantConfig)

Source from the content-addressed store, hash-verified

def fp4_quantize(model, quant_config: QuantConfig):
    assert quant_config.quant_mode.has_nvfp4()
    quant_map = {
        ColumnLinear: FP4Linear,
        RowLinear: FP4RowLinear,
        MixtureOfExperts: MixtureOfExperts,
    }

    model = quantize_layers(
        model,
        quant_config,
        quant_map,
    )
    return model


# Now consider the kv cache is enabled for all layers
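
The listing hands a class-to-class map to quantize_layers, whose body is not shown here. As a rough illustration of the general map-driven layer replacement pattern, here is a minimal toy sketch. Every class and helper in it (Module, ColumnLinear, FP4Linear, MLP, replace_layers) is a hypothetical stand-in defined inside the example, not the TensorRT-LLM implementation. The sketch assumes that an identity entry in the map leaves the layer untouched; the real quantize_layers may treat such entries differently.

```python
# Toy sketch of map-driven layer replacement.
# All classes below are hypothetical stand-ins, NOT TensorRT-LLM's classes.

class Module:
    """Minimal layer container (assumption: children are plain attributes)."""

    def named_children(self):
        # Yield (attribute name, child) for each attribute that is a Module.
        for name, value in vars(self).items():
            if isinstance(value, Module):
                yield name, value


class ColumnLinear(Module):
    def __init__(self, in_features, out_features):
        self.in_features = in_features
        self.out_features = out_features


class FP4Linear(ColumnLinear):
    """Hypothetical quantized counterpart of ColumnLinear."""


class MLP(Module):
    def __init__(self):
        self.fc = ColumnLinear(8, 32)
        self.proj = ColumnLinear(32, 8)


def replace_layers(module, quant_map):
    """Recursively swap each child whose exact type is a key in quant_map."""
    # Snapshot the children first, because setattr mutates the attribute dict.
    for name, child in list(module.named_children()):
        replace_layers(child, quant_map)
        target_cls = quant_map.get(type(child))
        # Assumption of this sketch: an identity mapping keeps the layer as-is.
        if target_cls is not None and target_cls is not type(child):
            setattr(module, name,
                    target_cls(child.in_features, child.out_features))
    return module


model = replace_layers(MLP(), {ColumnLinear: FP4Linear})
print(type(model.fc).__name__, type(model.proj).__name__)
```

Keying the map on the exact layer type keeps the replacement rule declarative: supporting another layer kind means adding one entry to the map rather than changing the traversal.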

Callers 2

MLP (Method) · 0.90
quantize (Function) · 0.85

Calls 2

quantize_layers (Function) · 0.85
has_nvfp4 (Method) · 0.45

Tested by 1

MLP (Method) · 0.72