hub / github.com/OpenPPL/ppq / LearnedStepSizePass

Class LearnedStepSizePass

ppq/quantization/optim/training.py:569–863



class LearnedStepSizePass(TrainingBasedPass):
    """
    ## Learned Step Size Pass (network fine-tuning, LSQ)

    Learned Step Size optimization is a training-based optimization pass that tunes weights and scales for high-precision quantization.

    [This method was proposed by Steven K. Esser et al.](https://arxiv.org/pdf/1902.08153.pdf)

    This is an alternative version of LSQ: the pass splits your graph into multiple trainable blocks, and each block is trained separately.
    Warning: PPQ Learned Step Size minimizes only the output loss of each block, so after training the internal results of a block may drift far from the original values.

    PPQ Learned Step Size optimization requires 256 ~ 2048 samples to finetune your network; labels are not needed. All training data is cached in GPU or CPU memory for acceleration.

    The training loss is computed as:

        let: Y = WX + b

        Quant(Y, scale_Y) = Quant(W, scale_W) Quant(X, scale_X) + b

        loss = loss_func(Y, Quant(Y, scale_Y)) # loss between the fp output and the int8 output, which is why no labeled data is needed.

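The label-free loss described above can be sketched in plain Python. This is a minimal illustration, not PPQ code: `fake_quant`, `linear`, and `mse` are hypothetical helpers, the scales and tensor values are made up, and mean squared error merely stands in for `loss_func`.

```python
def fake_quant(value, scale, qmin=-128, qmax=127):
    # Snap to the int8 grid, clip to [qmin, qmax], then map back to float.
    level = max(qmin, min(qmax, round(value / scale)))
    return level * scale

def linear(weights, inputs, bias):
    # y_i = sum_j W_ij * x_j + b_i
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, bias)]

def mse(a, b):
    # Stand-in for loss_func: mean squared error between two vectors.
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

# Hypothetical layer parameters and one calibration sample (no label needed).
W = [[0.503, -0.251], [1.004, 0.748]]
x = [0.2013, -0.4021]
b = [0.1, 0.0]
scale_W, scale_X, scale_Y = 0.01, 0.005, 0.01

# Floating-point reference output: Y = WX + b
fp_out = linear(W, x, b)

# Quantized path: Quant(W) Quant(X) + b, followed by output quantization.
W_q = [[fake_quant(w, scale_W) for w in row] for row in W]
x_q = [fake_quant(v, scale_X) for v in x]
q_out = [fake_quant(y, scale_Y) for y in linear(W_q, x_q, b)]

# The training signal is the gap between the two outputs.
loss = mse(fp_out, q_out)
print(loss)
```

The floating-point output plays the role of the target, so any unlabeled calibration sample yields a usable loss value.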
    The derivatives of the quantized output with respect to y and scale_Y are:

        if y > scale_Y * -128 and y < scale_Y * 127:

            dQuant(y, scale_Y)/dy       = 1    # straight-through estimator: the upstream gradient passes through unchanged

            dQuant(y, scale_Y)/dscale_Y = (Quant(y, scale_Y) - y) / scale_Y

        if y < scale_Y * -128:

            dQuant(y, scale_Y)/dy       = 0

            dQuant(y, scale_Y)/dscale_Y = -128

        if y > scale_Y * 127:

            dQuant(y, scale_Y)/dy       = 0

            dQuant(y, scale_Y)/dscale_Y = 127

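The piecewise gradients can be written out as a small function. This is a sketch under stated assumptions rather than PPQ's implementation: `fake_quant` and `fake_quant_grads` are hypothetical names, and the in-range scale gradient follows the straight-through estimate from the LSQ paper, round(y / scale_Y) - y / scale_Y, which equals (Quant(y, scale_Y) - y) / scale_Y when Quant returns the dequantized value.

```python
QMIN, QMAX = -128, 127  # int8 range used in the formulas above

def fake_quant(y, scale):
    # Quantize to the int8 grid, clip, and map back to floating point.
    level = max(QMIN, min(QMAX, round(y / scale)))
    return level * scale

def fake_quant_grads(y, scale):
    # Returns (dQuant/dy, dQuant/dscale) for a single value.
    if y < scale * QMIN:
        # Clipped at the lower bound: output is QMIN * scale.
        return 0.0, float(QMIN)
    if y > scale * QMAX:
        # Clipped at the upper bound: output is QMAX * scale.
        return 0.0, float(QMAX)
    # In range: gradient passes straight through to y,
    # and the scale sees the normalized rounding error.
    return 1.0, (fake_quant(y, scale) - y) / scale

for y in (0.0123, -5.0, 5.0):
    print(y, fake_quant_grads(y, 0.01))
```

Clipped values contribute no gradient to `y` but still push on the scale, which is how training can widen or narrow the quantization range.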
    ### Parameters:

    * interested_layers(List[str]):

            A list of operation names; only the layers listed in this parameter will be trained.

            If interested_layers is None, all layers (conv and gemm) will be trained.

    * steps(int)

            Number of training steps for finetuning your network; the default is 500.

    * block_size(int)

            PPQ Learned Step Size optimization first splits your graph into blocks;
            each block is finetuned separately.

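To illustrate how `block_size` and `interested_layers` could interact, here is a deliberately simplified sketch. PPQ's actual partitioning operates on the computation graph and is not shown in this excerpt; the code below assumes a flat, topologically ordered list of operation names, and `split_into_blocks` and `trainable_ops` are hypothetical helpers, not PPQ functions.

```python
from typing import List, Optional

def split_into_blocks(op_names: List[str], block_size: int) -> List[List[str]]:
    # Group consecutive operations into chunks of at most block_size.
    if block_size < 1:
        raise ValueError("block_size must be a positive integer")
    return [op_names[i:i + block_size]
            for i in range(0, len(op_names), block_size)]

def trainable_ops(block: List[str],
                  interested_layers: Optional[List[str]]) -> List[str]:
    # None means every layer in the block is trained.
    if interested_layers is None:
        return list(block)
    wanted = set(interested_layers)
    return [name for name in block if name in wanted]

ops = ["conv1", "conv2", "gemm1", "conv3", "gemm2"]  # made-up names
for block in split_into_blocks(ops, block_size=2):
    # Each block would be finetuned on its own for the configured steps.
    print(block, "->", trainable_ops(block, ["conv2", "gemm2"]))
```

Smaller blocks keep each training problem local, at the cost of the per-block drift mentioned in the warning above.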

Callers 4

yolo6_sample.py · File · 0.90

fp8_sample.py · File · 0.90

build_quant_pipeline · Method · 0.85

Calls

no outgoing calls

Tested by

no test coverage detected