Class QuantizeFusionPass

Defined in ppq/quantization/optim/refine.py, lines 91–289 (github.com/OpenPPL/ppq).

class QuantizeFusionPass(QuantizationOptimizationPass):
    """
    ## PPQ Quantize Fusion Pass (General Quantization Graph Fusion Procedure)

    Operation fusion (or kernel/layer fusion) is a key optimization in many state-of-the-art execution frameworks.

    Graph fusion combines several operations into a single op to obtain higher accuracy and performance.
    For example, the pattern Conv + Relu can be reduced to ConvRelu. This fusion reduces memory accesses,
    and the quantization point after the Conv can also be removed.

    Technically we could fuse those layers before quantization, but fused layers are not supported by the ONNX standard:
    ConvRelu is not a valid ONNX operation, so execution frameworks that consume ONNX models cannot parse it.

    Therefore, PPQ simulates graph fusion by adjusting the quantization config: if PPQ finds a
    pattern like Conv + Relu, the output quantization of the Conv is disabled, pretending that the Conv + Relu
    fusion has happened.

    This pass is designed for 2 types of graph fusion:
    1. activation fusion
    2. passive operation fusion

    For activation fusion, PPQ identifies the pattern Computing Op + Activation Op in your network and disables
    the output quantization of the computing op by setting its state to QuantizationState.OVERLAPPED.

    Activation fusion here supports only simple activation patterns.
    Complex activation functions such as Mish and Swish are represented in ONNX as chains of basic ops,
    mish(x) = x * tanh(softplus(x)) (Softplus + Tanh + Mul) and swish(x) = x * sigmoid(x) (Sigmoid + Mul),
    because ONNX has historically lacked op definitions for them.
    Identifying those complex patterns requires pattern matching, which is implemented in ppq/IR/search.py.

    Complex quantization fusions must be invoked manually; PPQ implements the Mish and Swish fusion passes in
    ppq.quantization.optim.refine.MishFusionPass
    ppq.quantization.optim.refine.SwishFusionPass

    For passive operation fusion, PPQ makes the input and output variables of a passive operation share the same scale.
    An operation is identified as a passive op only if its attribute "is_active_quant_op" is False; this
    attribute is initialized by the quantizer.

    If a passive operation has multiple inputs and outputs, the fusion procedure makes its
    FIRST input variable and ALL output variables share the same scale (the scale of its first input).
    The quantization states of all output variables are set to QuantizationState.OVERLAPPED.

    ### Parameters:

    * activation_type(Set[str]):

            A set containing all activation op types.

            The pattern will be recognized as [Computing Op -> Activation Op].

            By graph fusion, the output quantization of the Computing Op and
            the input quantization of the Activation Op will be disabled.

    * fuse_activation(bool)

            Whether to fuse activation ops with computing ops.

    * fuse_passive_op(bool)
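The activation-fusion rule in the docstring can be sketched outside PPQ as below. Everything in it is hypothetical: the `Op` record, the two-member `QuantizationState` enum, `fuse_activations`, and the rule that the activation must be the only reader of the computing op's output are illustration-only assumptions, not PPQ's classes or its actual matching logic. Following the `activation_type` description, it marks both the computing op's output and the activation's input as OVERLAPPED.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Dict, List, Set


class QuantizationState(Enum):
    # Hypothetical two-state subset; PPQ's real QuantizationState has more members.
    ACTIVE = auto()
    OVERLAPPED = auto()


@dataclass
class Op:
    # Hypothetical op record: variable names plus one quantization state per variable.
    name: str
    type: str
    inputs: List[str]
    outputs: List[str]
    input_states: List[QuantizationState] = field(init=False)
    output_states: List[QuantizationState] = field(init=False)

    def __post_init__(self) -> None:
        # Every variable starts out actively quantized.
        self.input_states = [QuantizationState.ACTIVE] * len(self.inputs)
        self.output_states = [QuantizationState.ACTIVE] * len(self.outputs)


def fuse_activations(ops: List[Op], computing_types: Set[str],
                     activation_types: Set[str]) -> List[str]:
    """Simulate [Computing Op -> Activation Op] fusion by disabling the
    quantization point between the two ops; returns the fused pairs."""
    # Index every variable by the ops that read it.
    consumers: Dict[str, List[Op]] = {}
    for op in ops:
        for var in op.inputs:
            consumers.setdefault(var, []).append(op)

    fused = []
    for op in ops:
        if op.type not in computing_types or len(op.outputs) != 1:
            continue
        var = op.outputs[0]
        users = consumers.get(var, [])
        # Sketch assumption: fuse only when the activation is the sole reader,
        # otherwise another consumer would still need the quantized value.
        if len(users) != 1 or users[0].type not in activation_types:
            continue
        act = users[0]
        op.output_states[0] = QuantizationState.OVERLAPPED
        act.input_states[act.inputs.index(var)] = QuantizationState.OVERLAPPED
        fused.append(f"{op.name}->{act.name}")
    return fused


ops = [
    Op("conv1", "Conv", ["x"], ["y"]),
    Op("relu1", "Relu", ["y"], ["z"]),
    Op("conv2", "Conv", ["z"], ["w"]),
    Op("add1", "Add", ["w", "z"], ["out"]),
]
fused = fuse_activations(ops, {"Conv"}, {"Relu"})
print(fused)  # ['conv1->relu1']
```

Only `conv1` is fused here: `conv2` feeds an Add, which is not in the activation set, so its output keeps its own quantization point.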
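A single [Computing Op -> Activation Op] check cannot see Mish or Swish, because each one spans several ops and its input `x` branches into both ends of the chain. The brute-force matcher below illustrates this for the two decompositions named in the docstring. `Node`, `find_swish`, and `find_mish` are hypothetical and are not the API of ppq/IR/search.py. The nested loops are only meant for small illustrative graphs.

```python
from typing import List, NamedTuple, Tuple


class Node(NamedTuple):
    # Hypothetical minimal graph node; not PPQ's Operation class.
    name: str
    type: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def find_swish(nodes: List[Node]) -> List[Tuple[str, str]]:
    """Match swish(x) = x * sigmoid(x): Sigmoid(x) -> s, then Mul(x, s)."""
    matches = []
    for sig in nodes:
        if sig.type != "Sigmoid":
            continue
        x, s = sig.inputs[0], sig.outputs[0]
        for mul in nodes:
            # The Mul must join the branch output s with the original input x.
            if mul.type == "Mul" and set(mul.inputs) == {x, s}:
                matches.append((sig.name, mul.name))
    return matches


def find_mish(nodes: List[Node]) -> List[Tuple[str, str, str]]:
    """Match mish(x) = x * tanh(softplus(x)): Softplus -> Tanh, then Mul(x, t)."""
    matches = []
    for sp in nodes:
        if sp.type != "Softplus":
            continue
        x, a = sp.inputs[0], sp.outputs[0]
        for th in nodes:
            if th.type != "Tanh" or th.inputs[0] != a:
                continue
            t = th.outputs[0]
            for mul in nodes:
                # Again the chain must close back onto the original input x.
                if mul.type == "Mul" and set(mul.inputs) == {x, t}:
                    matches.append((sp.name, th.name, mul.name))
    return matches


graph = [
    Node("sig0", "Sigmoid", ("x",), ("s",)),
    Node("mul0", "Mul", ("x", "s"), ("y",)),
    Node("sp1", "Softplus", ("y",), ("a",)),
    Node("tanh1", "Tanh", ("a",), ("t",)),
    Node("mul1", "Mul", ("t", "y"), ("z",)),
]
print(find_swish(graph))  # [('sig0', 'mul0')]
print(find_mish(graph))   # [('sp1', 'tanh1', 'mul1')]
```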
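Passive operation fusion, as described in the docstring, can be illustrated with a tiny config model. ALL outputs of a passive op are linked to the config of its FIRST input and are marked OVERLAPPED, and any further inputs are left alone. The `TensorQuantConfig` and `QuantOp` types, the `shares_scale_with` link, and `fuse_passive_op` are hypothetical stand-ins rather than PPQ's real classes. Treating Split as passive is also an assumption made for the usage lines.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import List, Optional


class QuantizationState(Enum):
    # Hypothetical two-state subset of PPQ's QuantizationState.
    ACTIVE = auto()
    OVERLAPPED = auto()


@dataclass
class TensorQuantConfig:
    scale: float
    state: QuantizationState = QuantizationState.ACTIVE
    # Hypothetical link: when set, this config borrows its scale from another one.
    shares_scale_with: Optional["TensorQuantConfig"] = None

    @property
    def effective_scale(self) -> float:
        # Follow the sharing chain to the config that owns the scale.
        cfg = self
        while cfg.shares_scale_with is not None:
            cfg = cfg.shares_scale_with
        return cfg.scale


@dataclass
class QuantOp:
    type: str
    input_configs: List[TensorQuantConfig]
    output_configs: List[TensorQuantConfig]
    is_active_quant_op: bool


def fuse_passive_op(op: QuantOp) -> bool:
    """Make ALL outputs of a passive op share the scale of its FIRST input.
    Inputs after the first are left untouched. Returns True if fused."""
    if op.is_active_quant_op or not op.input_configs:
        return False
    first_input = op.input_configs[0]
    for out in op.output_configs:
        out.shares_scale_with = first_input
        out.state = QuantizationState.OVERLAPPED
    return True


x = TensorQuantConfig(scale=0.05)
y1, y2 = TensorQuantConfig(scale=0.10), TensorQuantConfig(scale=0.20)
split = QuantOp("Split", [x], [y1, y2], is_active_quant_op=False)
fuse_passive_op(split)
print(y1.effective_scale, y2.effective_scale)  # 0.05 0.05
print(y1.state)  # QuantizationState.OVERLAPPED
```

Storing a link instead of copying the scale value means that a later calibration update to the first input's scale is picked up by every fused output automatically.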

Callers (8): yolo6_sample.py, bert_sample.py, ProgramEntrance_2.py, myquantizer.py, imagenet.py, yolo_5.py, Example_PTQ.py (files); build_quant_pipeline (method)

Calls: no outgoing calls

Tested by: no test coverage detected