## PPQ Quantize Fusion Pass (general quantization graph fusion pass)
| 89 | |
| 90 | |
| 91 | class QuantizeFusionPass(QuantizationOptimizationPass): |
| 92 | """ |
| 93 | ## PPQ Quantize Fusion Pass (general quantization graph fusion pass) |
| 94 | |
| 95 | Operation fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art execution frameworks. |
| 96 | |
| 97 | Graph fusion combines several operations into a single op to obtain higher accuracy and performance. |
| 98 | For example, the pattern Conv + Relu can be reduced to ConvRelu. This fusion reduces memory accesses, |
| 99 | and the quantization point after Conv can also be removed. |
| 100 | |
| 101 | Technically, we could fuse those layers before quantization, but fused layers are not supported by the ONNX standard. |
| 102 | That is, ConvRelu is not a valid ONNX operation, so no execution framework can parse it. |
| 103 | |
| 104 | Therefore, PPQ simulates graph fusion by adjusting the quantization config: if PPQ finds a |
| 105 | pattern like Conv + Relu, the output quantization of Conv is disabled, pretending that the Conv + Relu |
| 106 | fusion has happened. |
| 107 | |
| 108 | This pass is designed for 2 types of graph fusion: |
| 109 | 1. activation fusion |
| 110 | 2. passive operation fusion |
| 111 | |
| 112 | For activation fusion, PPQ identifies the pattern Computing Op + Activation Op in your network. The output |
| 113 | quantization of the computing op is disabled, with its state set to QuantizationState.OVERLAPPED. |
| 114 | |
| 115 | Activation fusion here supports only simple activation patterns. |
| 116 | Complex activation functions like mish and swish |
| 117 | are represented in ONNX as mish = tanh + mul + softplus and swish = sigmoid + mul, |
| 118 | because ONNX does not have an op definition for them. |
| 119 | Identifying those complex patterns requires pattern matching, which is implemented in ppq.IR.search.py |
| 120 | |
| 121 | Complex quantization fusions must be invoked manually. PPQ implements mish & swish fusion in |
| 122 | ppq.quantization.optim.refine.MishFusionPass |
| 123 | ppq.quantization.optim.refine.SwishFusionPass |
| 124 | |
| 125 | For passive operation fusion, PPQ makes the input and output variables of a passive operation share the same scale. |
| 126 | An operation is identified as a passive op only if its attribute "is_active_quant_op" = False; this |
| 127 | attribute is initialized by the quantizer. |
| 128 | |
| 129 | If a passive operation has multiple inputs and outputs, the fusion procedure makes its |
| 130 | FIRST input variable and ALL output variables share the same scale (the scale of its first input). |
| 131 | The quantization states of all output variables will be set to QuantizationState.OVERLAPPED. |
| 132 | |
| 133 | ### Parameters: |
| 134 | |
| 135 | * activation_type(Set[str]): |
| 136 | |
| 137 | A collection containing all activation types. |
| 138 | |
| 139 | The pattern is recognized as [Computing Op -> Activation Op]. |
| 140 | |
| 141 | After graph fusion, the output quantization of the Computing Op and |
| 142 | the input quantization of the Activation Op will be disabled. |
| 143 | |
| 144 | * fuse_activation(bool) |
| 145 | |
| 146 | Whether to fuse activation op with computing op. |
| 147 | |
| 148 | * fuse_passive_op(bool) |
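
The activation fusion described in the docstring can be illustrated with a small, self-contained sketch. It does not use PPQ's API: `ToyOp`, `ToyState` and `fuse_activations` are hypothetical names invented for this example, and the rule that the computing op must have exactly one consumer is an assumption of the sketch, not something the docstring states. It only shows the idea of marking a computing op's output quantization as overlapped when an activation op follows it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set


class ToyState(Enum):
    """Hypothetical stand-in for a quantization state."""
    ACTIVATED = "activated"
    OVERLAPPED = "overlapped"


@dataclass
class ToyOp:
    """Hypothetical graph node: one quantization state per output variable."""
    name: str
    op_type: str
    inputs: List[str]
    outputs: List[str]
    output_states: Dict[str, ToyState] = field(default_factory=dict)

    def __post_init__(self) -> None:
        for var in self.outputs:
            self.output_states.setdefault(var, ToyState.ACTIVATED)


def fuse_activations(ops: List[ToyOp],
                     computing_types: Set[str],
                     activation_types: Set[str]) -> List[str]:
    """Disable output quantization of computing ops followed by an activation."""
    # Map each variable name to the ops that consume it.
    consumers: Dict[str, List[ToyOp]] = {}
    for op in ops:
        for var in op.inputs:
            consumers.setdefault(var, []).append(op)

    fused = []
    for op in ops:
        if op.op_type not in computing_types or len(op.outputs) != 1:
            continue
        out = op.outputs[0]
        users = consumers.get(out, [])
        # Assumption of this sketch: fuse only when the sole consumer is an activation.
        if len(users) == 1 and users[0].op_type in activation_types:
            op.output_states[out] = ToyState.OVERLAPPED
            fused.append(op.name)
    return fused


ops = [
    ToyOp("conv1", "Conv", ["x", "w1"], ["c1"]),
    ToyOp("relu1", "Relu", ["c1"], ["r1"]),
    ToyOp("conv2", "Conv", ["r1", "w2"], ["c2"]),
    ToyOp("add1", "Add", ["c2", "r1"], ["y"]),
]
fused = fuse_activations(ops, {"Conv", "Gemm"}, {"Relu", "Clip"})
print(fused)
```

Here only `conv1` is affected, because `conv2` feeds an `Add` rather than an activation. The graph itself is never rewritten; only the quantization state changes, which matches the "simulate the fusion through the config" idea in the docstring.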
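
Passive operation fusion, as the docstring describes it, can be sketched in the same toy style. Again, nothing here is PPQ's API: `ToyQuantConfig`, `ToyPassiveOp` and `share_scale_for_passive_op` are hypothetical names, and states are plain strings for brevity. The sketch copies the scale of the FIRST input onto ALL outputs and marks those outputs as overlapped.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyQuantConfig:
    """Hypothetical per-variable quantization config."""
    scale: float
    state: str = "ACTIVATED"


@dataclass
class ToyPassiveOp:
    """Hypothetical op; is_active_quant_op mirrors the attribute named in the docstring."""
    name: str
    inputs: List[str]
    outputs: List[str]
    is_active_quant_op: bool = False


def share_scale_for_passive_op(op: ToyPassiveOp,
                               configs: Dict[str, ToyQuantConfig]) -> None:
    """Make all outputs of a passive op reuse the scale of its first input."""
    if op.is_active_quant_op or not op.inputs:
        return  # active ops keep their own output scales
    source = configs[op.inputs[0]]
    for var in op.outputs:
        configs[var].scale = source.scale
        configs[var].state = "OVERLAPPED"


configs = {
    "x": ToyQuantConfig(scale=0.05),
    "a": ToyQuantConfig(scale=0.02),
    "b": ToyQuantConfig(scale=0.07),
}
split = ToyPassiveOp("split1", inputs=["x"], outputs=["a", "b"])
share_scale_for_passive_op(split, configs)
print({name: (cfg.scale, cfg.state) for name, cfg in configs.items()})
```

After the call, `a` and `b` both carry the scale of `x` and are marked overlapped, while `x` keeps its own state.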