ParameterBakingPass is a useful tool for accelerating quantization simulation. By default, the quantizer bakes network parameters once all quantization procedures are finished. For a typical Convolution or Gemm layer that has a non-empty bias tensor, ParameterBakingPass speeds up layer execution by 30%-50%.
```python
class ParameterBakingPass(QuantizationOptimizationPass):
    """ParameterBakingPass is a useful tool for accelerating quantization
    simulation. By default, the quantizer bakes network parameters once all
    quantization procedures are finished. For a typical Convolution or Gemm
    layer that has a non-empty bias tensor, ParameterBakingPass speeds up
    layer execution by 30%-50%.

    ParameterBakingPass rewrites layer parameters with their quantized
    versions; the quantization procedure strictly follows each layer's
    quantization configuration. Once quantization has finished, this pass
    sets the state of every parameter quantization configuration to
    QuantizationStates.BAKED.

    QuantizationStates.BAKED indicates that the corresponding tensor has been
    pre-quantized and that its value can be used without further quantization;
    the executor uses a baked value directly during execution.

    ATTENTION: Values are baked in place, that is, this pass overwrites all
        network parameters.
    ATTENTION: For platforms using an int32 accumulator, a float32 bias tensor
        might lose precision during simulation. If you want the PPQ simulator
        to produce results consistent with the hardware, it is highly
        recommended to call ParameterBakingPass before deployment. The baking
        procedure limits bias precision to 23 bits (float32 has only 23
        fraction bits).

    The constructor takes no arguments; all parameters are quantized with
    PPQuantFunction.
    """
    def __init__(self) -> None:
        super().__init__(name='PPQ Parameter Baking Pass')
        self._quantize_function = PPQuantFunction

    @empty_ppq_cache
    def optimize(
        self,
        graph: BaseGraph,
        **kwargs
    ) -> None:
        # Bake the parameters of every quantable operation in the graph;
        # operations that cannot be quantized are skipped.
        for operation in graph.operations.values():
            if not isinstance(operation, QuantableOperation):
                continue
            operation.baking_parameters(self._quantize_function)
```
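The baking behavior described in the docstring can be illustrated with a small, self-contained sketch. It does not use PPQ: `ToyState`, `ToyParam`, `fake_quantize`, `bake`, and `read` are hypothetical names introduced only for this example, and the symmetric 8-bit round-and-clamp scheme is an assumption rather than PPQ's actual quantization function.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class ToyState(Enum):
    """Hypothetical stand-in for a parameter's quantization state."""
    ACTIVATED = auto()  # value must still be fake-quantized on every read
    BAKED = auto()      # value is already quantized and can be used as-is


@dataclass
class ToyParam:
    """Hypothetical parameter: raw values plus a quantization scale."""
    values: List[float]
    scale: float
    state: ToyState = ToyState.ACTIVATED


def fake_quantize(values, scale, qmin=-128, qmax=127):
    """Round to the integer grid, clamp to [qmin, qmax], map back to float."""
    return [max(qmin, min(qmax, round(v / scale))) * scale for v in values]


def bake(param: ToyParam) -> None:
    """Overwrite the values in place with their quantized version."""
    if param.state is not ToyState.ACTIVATED:
        return  # already baked, nothing to do
    param.values[:] = fake_quantize(param.values, param.scale)
    param.state = ToyState.BAKED


def read(param: ToyParam) -> List[float]:
    """What a toy executor would do: skip quantization for baked values."""
    if param.state is ToyState.BAKED:
        return param.values
    return fake_quantize(param.values, param.scale)


weight = ToyParam(values=[0.26, -0.49, 100.0], scale=0.5)
bake(weight)
print(weight.values, weight.state.name)  # [0.5, -0.5, 63.5] BAKED
```

After baking, `read` returns the stored list directly instead of recomputing the quantized values, which is where the saving in this toy model comes from. The original values are gone once `bake` has run, mirroring the in-place warning in the docstring.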
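The bias precision warning can be checked with plain Python. The sketch below is independent of PPQ; `to_float32` is a hypothetical helper that round-trips a value through a 4-byte IEEE 754 float using the standard `struct` module. It shows that an integer just above 2**24, which an int32 accumulator holds exactly, is not representable in float32.

```python
import struct


def to_float32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack('<f', struct.pack('<f', x))[0]


# float32 stores 23 fraction bits plus one implicit leading bit, so every
# integer up to 2**24 is exact, but not every integer beyond it.
exact_limit = 2 ** 24          # 16777216
bias_int32 = exact_limit + 1   # fits easily in an int32 accumulator

print(to_float32(float(exact_limit)) == exact_limit)  # True
print(to_float32(float(bias_int32)) == bias_int32)    # False
```

A bias of this magnitude would therefore differ between a float32 simulation and an int32 hardware accumulator, which is the kind of mismatch the docstring's second ATTENTION note describes.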