hub / github.com/NVIDIA/TensorRT-LLM / smooth_quantize

Function smooth_quantize

tensorrt_llm/quantization/quantize.py:224–229
(model, quant_config: QuantConfig)

Source from the content-addressed store, hash-verified

def smooth_quantize(model, quant_config: QuantConfig):
    assert quant_config.quant_mode.has_act_and_weight_quant()
    if quant_config.quant_algo in W8A8_SQ_PLUGIN_LIST:
        return smooth_quantize_plugin(model, quant_config.quant_mode)
    else:
        return smooth_quantize_ootb(model, quant_config)


def fp8_quantize(model, quant_config: QuantConfig):
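The dispatch in smooth_quantize can be tried in isolation with stand-ins. The sketch below is not the library's code: the QuantAlgo members, the contents of W8A8_SQ_PLUGIN_LIST, the QuantMode and QuantConfig classes, and the two callee bodies are all hypothetical placeholders, since the source shows only the six lines of smooth_quantize itself. It illustrates the two things visible in that body: the function asserts that both activation and weight quantization are enabled, and the plugin path receives only quant_config.quant_mode while the other path receives the whole quant_config.

```python
from dataclasses import dataclass
from enum import Enum


class QuantAlgo(Enum):
    # Hypothetical algorithm names, for illustration only.
    SQ_PLUGIN = "sq_plugin"
    SQ_OOTB = "sq_ootb"


# Stand-in list; the real contents of W8A8_SQ_PLUGIN_LIST are not shown in the source.
W8A8_SQ_PLUGIN_LIST = [QuantAlgo.SQ_PLUGIN]


class QuantMode:
    """Stand-in exposing only the predicate that smooth_quantize asserts on."""

    def __init__(self, act_and_weight: bool):
        self._act_and_weight = act_and_weight

    def has_act_and_weight_quant(self) -> bool:
        return self._act_and_weight


@dataclass
class QuantConfig:
    # Stand-in config holding just the two attributes the function reads.
    quant_algo: QuantAlgo
    quant_mode: QuantMode


def smooth_quantize_plugin(model, quant_mode):
    # Placeholder body: records which path ran and what it was given.
    return ("plugin", model, quant_mode)


def smooth_quantize_ootb(model, quant_config):
    # Placeholder body: records which path ran and what it was given.
    return ("ootb", model, quant_config)


def smooth_quantize(model, quant_config: QuantConfig):
    # Same control flow as the listing above.
    assert quant_config.quant_mode.has_act_and_weight_quant()
    if quant_config.quant_algo in W8A8_SQ_PLUGIN_LIST:
        return smooth_quantize_plugin(model, quant_config.quant_mode)
    else:
        return smooth_quantize_ootb(model, quant_config)


mode = QuantMode(act_and_weight=True)
plugin_cfg = QuantConfig(QuantAlgo.SQ_PLUGIN, mode)
ootb_cfg = QuantConfig(QuantAlgo.SQ_OOTB, mode)

plugin_result = smooth_quantize("model", plugin_cfg)
ootb_result = smooth_quantize("model", ootb_cfg)
```

Because the precondition is an assert, it is skipped when Python runs with -O; a caller that needs the check in optimized runs would have to validate the quantization mode itself.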

Callers 1

quantize · Function · 0.85

Calls 3

smooth_quantize_plugin · Function · 0.85
smooth_quantize_ootb · Function · 0.85

Tested by

no test coverage detected