hub / github.com/OpenPPL/ppq / QuantizeFusionPass

Class QuantizeFusionPass

ppq/quantization/optim/refine.py:91–289

## PPQ Quantize Fusion Pass (general quantized graph fusion pass)

Operation fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art execution frameworks. Graph fusion can combine operations into a single op to obtain higher accuracy and performance. A pattern like Conv + Relu can be reduced to ConvRelu.

Source from the content-addressed store, hash-verified

class QuantizeFusionPass(QuantizationOptimizationPass):
    """
    ## PPQ Quantize Fusion Pass (general quantized graph fusion pass)

    Operation fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art execution frameworks.

    Graph fusion can combine operations into a single op to obtain higher accuracy and performance.
    A pattern like Conv + Relu can be reduced to ConvRelu. This fusion reduces memory accesses,
    and the quantization point after Conv can also be removed.

    Technically we could fuse those layers before quantization, but fused layers are not supported by the ONNX standard.
    That is, ConvRelu is not a valid ONNX operation, so no execution framework can parse it.

    Therefore, PPQ simulates graph fusion by adjusting the quantization config: if PPQ finds there is a
    pattern like Conv + Relu, the output quantization of Conv will be disabled, pretending that the Conv + Relu
    fusion has happened.

    This pass is designed for 2 types of graph fusion:
        1. activation fusion
        2. passive operation fusion

    For activation fusion, PPQ will identify the pattern Computing Op + Activation Op in your network. The output
    quantization of the computing op will be disabled, with its state set to QuantizationState.OVERLAPPED.

    Activation fusion here supports only simple activation patterns.
    Complex activation functions like mish and swish
    are represented in ONNX as mish = tanh + mul + softplus and swish = sigmoid + mul,
    because ONNX does not have an op definition for them.
    Identifying those complex patterns requires pattern matching, which is implemented in ppq.IR.search.py

    Complex quantization fusions must be invoked manually. PPQ implements mish & swish fusion in
        ppq.quantization.optim.refine.MishFusionPass
        ppq.quantization.optim.refine.SwishFusionPass

    For passive operation fusion, PPQ keeps the input and output variables of a passive operation on the same scale.
    An operation is identified as a passive op only if its attribute "is_active_quant_op" = False; this
    attribute is initialized by the quantizer.

    If a passive operation has multiple inputs and outputs, the fusion procedure will make its
    FIRST input variable and ALL output variables share the same scale (the scale of its first input).
    The quantization states of all output variables will be set to QuantizationState.OVERLAPPED.

    ### Parameters:

    * activation_type(Set[str]):

            A collection containing all activation types.

            The pattern will be recognized as [Computing Op -> Activation Op].

            By graph fusion, the output quantization of the Computing Op and
            the input quantization of the Activation Op will be disabled.

    * fuse_activation(bool)

            Whether to fuse the activation op with the computing op.

    * fuse_passive_op(bool)
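
The activation fusion described in the docstring above can be illustrated with a small, self-contained sketch. It does not use PPQ itself: `State`, `ToyOp`, and `fuse_activations` are hypothetical names invented for this example, and the rule that a computing op must feed exactly one consumer is an assumption of the sketch, not something the docstring states. The only behavior taken from the docstring is that a Computing Op followed by an Activation Op has its output quantization state set to OVERLAPPED.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import List, Set


class State(Enum):
    """Hypothetical stand-in for PPQ's QuantizationState."""
    ACTIVATED = auto()
    OVERLAPPED = auto()


@dataclass
class ToyOp:
    """Hypothetical op record: name, ONNX-style type, output quantization state."""
    name: str
    op_type: str
    output_state: State = State.ACTIVATED
    consumers: List["ToyOp"] = field(default_factory=list)


def fuse_activations(
    ops: List[ToyOp], computing_types: Set[str], activation_types: Set[str]
) -> List[str]:
    """Mark the output of each [Computing Op -> Activation Op] pair as OVERLAPPED.

    Returns the names of the computing ops whose output quantization was disabled.
    """
    fused = []
    for op in ops:
        if op.op_type not in computing_types:
            continue
        # Assumption of this sketch: fuse only when the single consumer is an activation.
        if len(op.consumers) == 1 and op.consumers[0].op_type in activation_types:
            op.output_state = State.OVERLAPPED
            fused.append(op.name)
    return fused


relu = ToyOp("relu_0", "Relu")
conv = ToyOp("conv_0", "Conv", consumers=[relu])
pool = ToyOp("pool_0", "MaxPool")
conv2 = ToyOp("conv_1", "Conv", consumers=[pool])

fused = fuse_activations([conv, relu, conv2, pool], {"Conv", "Gemm"}, {"Relu", "Clip"})
print(fused)  # ['conv_0']
```

Only `conv_0` is affected: `conv_1` feeds a MaxPool, which is not in the activation set, so its output quantization stays active.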

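Passive operation fusion, as the docstring describes it, can be sketched in the same toy style. Again, nothing here is PPQ's API: `ToyConfig`, `ToyPassiveOp`, and `fuse_passive_op` are hypothetical, and the scale values are made up for illustration. The sketch copies the scale of the FIRST input to ALL outputs and sets every output state to OVERLAPPED, skipping ops whose `is_active_quant_op` flag is true.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List


class State(Enum):
    """Hypothetical stand-in for PPQ's QuantizationState."""
    ACTIVATED = auto()
    OVERLAPPED = auto()


@dataclass
class ToyConfig:
    """Hypothetical per-variable quantization config."""
    scale: float
    state: State = State.ACTIVATED


@dataclass
class ToyPassiveOp:
    """Hypothetical op with one quantization config per input and output."""
    name: str
    is_active_quant_op: bool
    inputs: List[ToyConfig]
    outputs: List[ToyConfig]


def fuse_passive_op(op: ToyPassiveOp) -> bool:
    """Share the first input's scale with every output; return True if fused."""
    if op.is_active_quant_op or not op.inputs:
        return False
    source = op.inputs[0]  # only the FIRST input decides the shared scale
    for out in op.outputs:
        out.scale = source.scale
        out.state = State.OVERLAPPED
    return True


split = ToyPassiveOp(
    "split_0",
    is_active_quant_op=False,
    inputs=[ToyConfig(0.05), ToyConfig(0.2)],
    outputs=[ToyConfig(0.07), ToyConfig(0.09)],
)
fuse_passive_op(split)
print([(o.scale, o.state.name) for o in split.outputs])
# [(0.05, 'OVERLAPPED'), (0.05, 'OVERLAPPED')]
```

The second input keeps its own scale (0.2) and state, which mirrors the docstring's statement that only the first input participates in the shared scale.
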
Callers 8

yolo6_sample.py · File · 0.90
bert_sample.py · File · 0.90
myquantizer.py · File · 0.85
imagenet.py · File · 0.85
yolo_5.py · File · 0.85
Example_PTQ.py · File · 0.85
build_quant_pipeline · Method · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected