hub / github.com/NVIDIA/TensorRT-LLM / __post_init__

Method __post_init__

tensorrt_llm/models/modeling_utils.py:701–708
(self)

Source from the content-addressed store, hash-verified

699          self.config = config
700
701      def __post_init__(self):
702          from ..quantization.quantize import quantize
703          quantize(self, self.config.quantization)
704
705          # Currently, use_parallel_embedding must be enabled before weight loading;
706          # otherwise, the model will be inconsistent with the weights loaded from checkpoint.
707          optimize_model(
708              self, use_parallel_embedding=self.config.use_parallel_embedding)
709
710      def release(self):
711          release_gc()

Callers 1

__call__ · Method · 0.45
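
The only detected caller is a `__call__` method, with a low confidence score (0.45). One common way such a hook gets invoked is a metaclass whose `__call__` runs `__post_init__` right after `__init__` returns. The sketch below shows that pattern under that assumption; `PostInitCaller` and `ToyModel` are illustrative names chosen here, and the two appended strings merely stand in for the real `quantize(...)` and `optimize_model(...)` calls.

```python
class PostInitCaller(type):
    """Metaclass that calls __post_init__ once __init__ has returned."""

    def __call__(cls, *args, **kwargs):
        # type.__call__ runs __new__ and then __init__ (including subclass __init__).
        obj = super().__call__(*args, **kwargs)
        # The hook runs only after the object is fully constructed.
        obj.__post_init__()
        return obj


class ToyModel(metaclass=PostInitCaller):
    def __init__(self, config):
        self.events = ["init"]
        self.config = config

    def __post_init__(self):
        # Stand-ins for quantize(self, ...) and optimize_model(self, ...).
        self.events.append("quantize")
        self.events.append("optimize_model")


model = ToyModel(config={"use_parallel_embedding": True})
print(model.events)
```

With this arrangement a subclass can override `__init__` freely, and the post-init step still runs exactly once, after the most-derived `__init__` has finished.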

Calls 2

quantize · Function · 0.90
optimize_model · Function · 0.85
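
The source comment says `use_parallel_embedding` must be enabled before weight loading. A toy sketch of why that order can matter, assuming the option changes the shape of the embedding parameter by splitting the vocabulary across ranks: every name below (`ToyEmbedding`, `shard_vocab`, `load`) is hypothetical and not part of the TensorRT-LLM API, and the explicit shape check is an assumption used to make the inconsistency visible.

```python
class ToyEmbedding:
    """Hypothetical embedding table that tracks only its expected weight shape."""

    def __init__(self, vocab_size, hidden_size):
        self.shape = (vocab_size, hidden_size)
        self.weight = None

    def shard_vocab(self, tp_size):
        # Assumed effect of a parallel embedding: each rank keeps vocab / tp_size rows.
        vocab, hidden = self.shape
        self.shape = (vocab // tp_size, hidden)

    def load(self, rows):
        # Reject weights whose shape differs from what the module expects.
        got = (len(rows), len(rows[0]))
        if got != self.shape:
            raise ValueError(f"expected {self.shape}, got {got}")
        self.weight = rows


# A per-rank checkpoint shard: 4 of 8 vocabulary rows, hidden size 2.
shard = [[0.0, 0.0] for _ in range(4)]

# Restructure first, then load: the shapes agree.
ok = ToyEmbedding(vocab_size=8, hidden_size=2)
ok.shard_vocab(tp_size=2)
ok.load(shard)

# Load before restructuring: the module still expects the full table.
late = ToyEmbedding(vocab_size=8, hidden_size=2)
try:
    late.load(shard)
    mismatch = False
except ValueError:
    mismatch = True

print(ok.shape, mismatch)
```

Running the structural rewrite inside `__post_init__` means it has already happened by the time any loader sees the model, so the ordering does not depend on the caller.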

Tested by

no test coverage detected