## PPQ Quantize Simplify Pass (General Quantization Simplification Pass)

    class QuantizeSimplifyPass(QuantizationOptimizationPass):
        """
        ## PPQ Quantize Simplify Pass (General Quantization Simplification Pass)

        PPQ uses Tensor Quantization Configuration (TQC, a data structure defined in ppq.core) to
        control quantization. Each quantable op holds a list of TQCs as its quantization config;
        each TQC carries the quantization parameters (scale, offset) needed to quantize one of
        the op's inputs or outputs.

        While TQC is a powerful tool for describing quantization, it has an undesirable side effect.
        For a subgraph like:

            Relu1 - Relu2

        PPQ will create at least 4 TQCs here: the input and output TQCs of Relu1, and the input
        and output TQCs of Relu2. The problem is that the input TQC of Relu2 duplicates the output
        TQC of Relu1: both quantize the same variable (the output of Relu1), which should not be
        quantized twice.

        This pass detects all duplicated TQCs in the network, disables them, and links each one to
        its dominating TQC. A disabled TQC is put into an inactive state (QuantizationState.OVERLAPPED),
        so the PPQ executor simply ignores it during execution.

        A TQC is considered duplicated if it is an input TQC (A) whose bound variable has already
        been quantized by an output TQC (B) of another op, and A has the same bit-width as B.

        ### Parameters:

        * None

        ### Usage
        This pass is part of the PPQ quantization setting; you can enable it with:

            setting = QuantizationSettingFactory.default_setting()

            setting.fusion = True
            setting.fusion_setting.remove_useless_quantization = True

            # call ppq.api.quantize_torch_model with this setting.
            ir = quantize_torch_model(
                model=model, calib_dataloader=load_calibration_dataset(), setting=setting,
                platform=TargetPlatform.PPL_CUDA_INT8, calib_steps=8, input_shape=INPUT_SHAPE,
                collate_fn=collate_fn)
        """
        def __init__(self) -> None:
            super().__init__(name='PPQ Quantize Simplify Pass')

        def optimize(
            self,
            graph: BaseGraph,
            dataloader: Iterable,
            executor: BaseGraphExecutor,
            **kwargs
        ) -> None:
            for _, variable in graph.variables.items():
                assert isinstance(variable, Variable)
                source_op = variable.source_op

                if source_op is None: continue # network inputs have no source op
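The duplicate-TQC rule described in the docstring can be illustrated with a small toy model. This is a minimal sketch under stated assumptions, not PPQ's implementation: the `TQC`, `Op`, and `Variable` dataclasses, the two-member `State` enum, and the `simplify` helper are all hypothetical stand-ins for the ppq.core structures, and the sketch models only the bit-width condition stated above. For every variable that has a producing op, it disables each consumer's input TQC that matches the producer's output TQC in bit-width, and links it to that dominating TQC.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Optional


class State(Enum):
    # Hypothetical stand-in for PPQ's QuantizationState (only two states modeled).
    ACTIVE = auto()
    OVERLAPPED = auto()


@dataclass(eq=False)  # eq=False: objects compare by identity
class TQC:
    # Toy tensor quantization config for a single tensor.
    num_of_bits: int
    scale: float = 1.0
    offset: int = 0
    state: State = State.ACTIVE
    dominated_by: Optional["TQC"] = None  # link to the TQC that made this one redundant


@dataclass(eq=False)
class Op:
    name: str
    inputs: List["Variable"] = field(default_factory=list)
    outputs: List["Variable"] = field(default_factory=list)
    input_tqcs: List[TQC] = field(default_factory=list)   # one TQC per input, same order
    output_tqcs: List[TQC] = field(default_factory=list)  # one TQC per output, same order


@dataclass(eq=False)
class Variable:
    name: str
    # repr=False on back-references avoids infinite recursion in repr().
    source_op: Optional[Op] = field(default=None, repr=False)
    dest_ops: List[Op] = field(default_factory=list, repr=False)


def simplify(variables: List[Variable]) -> int:
    """Disable input TQCs that duplicate an upstream output TQC; return how many."""
    disabled = 0
    for var in variables:
        src = var.source_op
        if src is None:  # graph inputs have no producing op, so nothing dominates them
            continue
        dominant = src.output_tqcs[src.outputs.index(var)]
        for dst in var.dest_ops:
            # Scan every input slot, so an op that consumes `var` twice is fully handled.
            for idx, inp in enumerate(dst.inputs):
                if inp is not var:
                    continue
                tqc = dst.input_tqcs[idx]
                if tqc.state is State.ACTIVE and tqc.num_of_bits == dominant.num_of_bits:
                    tqc.state = State.OVERLAPPED
                    tqc.dominated_by = dominant
                    disabled += 1
    return disabled


# The Relu1 - Relu2 example: x -> Relu1 -> y -> Relu2 -> z, all TQCs 8-bit.
x, y, z = Variable("x"), Variable("y"), Variable("z")
relu1 = Op("Relu1", inputs=[x], outputs=[y], input_tqcs=[TQC(8)], output_tqcs=[TQC(8)])
relu2 = Op("Relu2", inputs=[y], outputs=[z], input_tqcs=[TQC(8)], output_tqcs=[TQC(8)])
x.dest_ops = [relu1]
y.source_op, y.dest_ops = relu1, [relu2]
z.source_op = relu2

print(simplify([x, y, z]))        # 1
print(relu2.input_tqcs[0].state)  # State.OVERLAPPED
print(relu1.input_tqcs[0].state)  # State.ACTIVE
```

On the Relu1 - Relu2 subgraph, only Relu2's input TQC is disabled; Relu1's input TQC stays active because the graph input `x` has no producer. In this sketch the duplicate is marked and linked rather than deleted, so each op's TQC list stays index-aligned with its inputs, and a second run disables nothing because the duplicate is no longer active.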