hub / github.com/OpenPPL/ppq / QuantizeSimplifyPass

Class QuantizeSimplifyPass

ppq/quantization/optim/refine.py:17–88



class QuantizeSimplifyPass(QuantizationOptimizationPass):
    """
    ## PPQ Quantize Simplify Pass (general quantization simplification pass)

    PPQ uses Tensor Quantization Configuration (TQC, a data structure defined in ppq.core) to
    control quantization. Each quantizable op has a list of TQCs as its quantization config,
    which contains the necessary quantization parameters (scale, offset) to quantize its input(s) and output(s).

    While TQC is a powerful tool for describing quantization, it introduces some undesirable features:

    For a subgraph like:

        Relu1 - Relu2

    PPQ will create at least 4 TQCs here, namely the input TQCs of Relu1 and Relu2, and the output TQCs of Relu1 and Relu2.
    The problem is that the output TQC of Relu1 and the input TQC of Relu2 are duplicates: the output variable
    should not be quantized twice.

    This Simplify Pass detects all the duplicated TQCs in your network, disables them, and links each one to its
    dominating TQC. A disabled TQC has an inactive state (QuantizationState.OVERLAPPED), so the PPQ executor
    simply ignores it when executing.

    A duplicated TQC is an input TQC (A) whose binding variable has already been quantized by another output TQC (B),
    where the input TQC (A) has the same bit-width as the output TQC (B).

    ### Parameters:

    * No Parameter

    ### Usage
    This pass is included in the PPQ Quantization Setting; you can enable this optimization by:

        setting = QuantizationSettingFactory.default_setting()

        setting.fusion = True
        setting.fusion_setting.remove_useless_quantization = True

        # Pass this setting to a ppq.api quantization function, here quantize_torch_model.
        ir = quantize_torch_model(
            model=model, calib_dataloader=load_calibration_dataset(), setting=setting,
            platform=TargetPlatform.PPL_CUDA_INT8, calib_steps=8, input_shape=INPUT_SHAPE,
            collate_fn=collate_fn)
    """
    def __init__(self) -> None:
        super().__init__(name='PPQ Quantize Simplify Pass')

    def optimize(
        self,
        graph: BaseGraph,
        dataloader: Iterable,
        executor: BaseGraphExecutor,
        **kwargs
    ) -> None:
        for _, variable in graph.variables.items():
            assert isinstance(variable, Variable)
            source_op = variable.source_op

            # Network input variables have no source op.
            if source_op is None:
                continue
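The duplicate-TQC rule described in the docstring can be illustrated with a small, self-contained sketch. This is not PPQ's implementation: `State`, `ToyTQC`, `ToyVariable` and `simplify` are hypothetical stand-ins invented for this example, and the only condition checked is the one the docstring states (the variable already has an upstream output TQC with the same bit-width). The real pass may apply further checks that are not visible in the listing above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class State(Enum):
    # Hypothetical stand-in for PPQ's quantization states.
    ACTIVE = 1
    OVERLAPPED = 2


@dataclass
class ToyTQC:
    # Hypothetical, minimal stand-in for a Tensor Quantization Configuration.
    name: str
    num_of_bits: int
    state: State = State.ACTIVE
    dominated_by: Optional["ToyTQC"] = None


@dataclass
class ToyVariable:
    # A tensor written by at most one output TQC and read by zero or more input TQCs.
    name: str
    source_tqc: Optional[ToyTQC]  # None for network inputs
    dest_tqcs: List[ToyTQC]


def simplify(variables: List[ToyVariable]) -> int:
    """Disable input TQCs that duplicate the upstream output TQC; return how many."""
    disabled = 0
    for var in variables:
        if var.source_tqc is None:
            continue  # network input: nothing upstream has quantized it
        for tqc in var.dest_tqcs:
            same_bits = tqc.num_of_bits == var.source_tqc.num_of_bits
            if tqc.state is State.ACTIVE and same_bits:
                tqc.state = State.OVERLAPPED        # the executor would skip this TQC
                tqc.dominated_by = var.source_tqc   # link to the dominating TQC
                disabled += 1
    return disabled


# The Relu1 - Relu2 subgraph: variable "y" sits between the two ops.
relu1_in = ToyTQC("Relu1.in", 8)
relu1_out = ToyTQC("Relu1.out", 8)
relu2_in = ToyTQC("Relu2.in", 8)
relu2_out = ToyTQC("Relu2.out", 8)

graph = [
    ToyVariable("x", None, [relu1_in]),
    ToyVariable("y", relu1_out, [relu2_in]),
    ToyVariable("z", relu2_out, []),
]
num_disabled = simplify(graph)
```

In this toy graph only the input TQC of Relu2 is disabled and linked to the output TQC of Relu1. The input TQC of Relu1 stays active because its variable is a network input with no upstream quantization.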

Callers 9

yolo6_sample.py · File · 0.90

bert_sample.py · File · 0.90

build_quant_pipeline · Method · 0.90

ProgramEntrance_2.py · File · 0.85

myquantizer.py · File · 0.85

imagenet.py · File · 0.85

yolo_5.py · File · 0.85

Example_PTQ.py · File · 0.85

build_quant_pipeline · Method · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected