
Class LearnedStepSizePass

ppq/quantization/optim/training.py:569–863

## Learned Step Size Pass (Network Fine-tuning Procedure, LSQ)

Learned Step Size optimization is a training-based optimization pass that tunes weights and scales for high-precision quantization. [This method was proposed by Steven K. Esser et al.](https://arxiv.org/pdf/1902.08153.pdf)

class LearnedStepSizePass(TrainingBasedPass):
    """
    ## Learned Step Size Pass (Network Fine-tuning Procedure, LSQ)

    Learned Step Size optimization is a training-based optimization pass that tunes weights and scales for high-precision quantization.

    [This method was proposed by Steven K. Esser et al.](https://arxiv.org/pdf/1902.08153.pdf)

    This is an alternative version of LSQ: the pass splits your graph into multiple trainable blocks, and each block is trained separately.
    Warning: PPQ Learned Step Size minimizes only the output loss of each block, so after training the intermediate results inside a block may drift far from their original values.

    PPQ Learned Step Size optimization requires 256 to 2048 samples to fine-tune your network; labels are not needed. All training data is cached in GPU or CPU memory for acceleration.

    The training loss is computed as:

        let: Y = WX + b

        Quant(Y, scale_Y) = Quant(W, scale_W) Quant(X, scale_X) + b

        loss = loss_func(Y, Quant(Y, scale_Y)) # loss between the fp output and the int8 output; this is why labeled data is not needed.

    The derivatives of Quant(y, scale_Y) with respect to y and scale_Y are:

        if y > scale_Y * -128 and y < scale_Y * 127:

            dQuant(y, scale_Y)/dy = 1 # straight-through estimator: the incoming gradient passes through unchanged

            dQuant(y, scale_Y)/dscale_Y = (Quant(y, scale_Y) - y) / scale_Y # equals round(y / scale_Y) - y / scale_Y, as in the LSQ paper

        if y < scale_Y * -128:

            dQuant(y, scale_Y)/dy = 0

            dQuant(y, scale_Y)/dscale_Y = -128

        if y > scale_Y * 127:

            dQuant(y, scale_Y)/dy = 0

            dQuant(y, scale_Y)/dscale_Y = 127

    ### Parameters:

    * interested_layers(List[str]):

            A list of operation names; only the layers listed in this parameter will be trained.

            If interested_layers is None, all layers (conv and gemm) will be trained.

    * steps(int):

            Number of training steps for fine-tuning your network; the default is 500.

    * block_size(int):

            PPQ Learned Step Size optimization first splits your graph into blocks;
            each block is then fine-tuned separately.

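The loss and derivative rules in the docstring can be tried out on scalars. The following is a minimal pure-Python sketch, not PPQ's implementation: the function names `quant`, `lsq_grads` and `block_loss_and_scale_grad` are hypothetical, the quantizer is assumed to be symmetric int8 with no offset, the loss is assumed to be mean squared error, and the in-range scale derivative uses the LSQ paper's form `round(y / scale) - y / scale`.

```python
from typing import List, Tuple


def quant(y: float, scale: float, qmin: int = -128, qmax: int = 127) -> float:
    """Fake-quantize y: snap to the integer grid, clip to [qmin, qmax], rescale.

    Assumption: symmetric quantization without an offset. Python's round()
    rounds ties to even, which may differ from the rounding mode PPQ uses.
    """
    q = round(y / scale)
    q = max(qmin, min(qmax, q))
    return q * scale


def lsq_grads(y: float, scale: float, qmin: int = -128, qmax: int = 127) -> Tuple[float, float]:
    """Return (dQuant/dy, dQuant/dscale) under the straight-through estimator."""
    ratio = y / scale
    if ratio <= qmin:
        # Clipped at the lower bound: no gradient flows to y.
        return 0.0, float(qmin)
    if ratio >= qmax:
        # Clipped at the upper bound: no gradient flows to y.
        return 0.0, float(qmax)
    # In range: the gradient passes straight through to y.
    return 1.0, round(ratio) - ratio


def block_loss_and_scale_grad(ys: List[float], scale: float) -> Tuple[float, float]:
    """MSE between float outputs and their quantized versions, plus dLoss/dscale.

    No labels are involved: the float outputs themselves are the targets.
    """
    n = len(ys)
    loss = sum((quant(y, scale) - y) ** 2 for y in ys) / n
    # Chain rule: dLoss/dscale = mean(2 * (Quant(y) - y) * dQuant/dscale).
    grad = sum(2.0 * (quant(y, scale) - y) * lsq_grads(y, scale)[1] for y in ys) / n
    return loss, grad
```

Calling `block_loss_and_scale_grad(outputs, scale)` on a list of float outputs gives the quantity a gradient step on the scale would use. A real training loop would apply the same idea to tensors and let an autograd framework carry the gradients.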

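To make the roles of `interested_layers` and `block_size` concrete, here is a toy sketch. It is an illustration under strong assumptions, not PPQ's algorithm: PPQ partitions an actual computation graph, and that logic is not part of this excerpt. Here the network is treated as a flat, ordered list of conv/gemm operation names, `block_size` is assumed to be the number of operations per block, and the helpers `split_into_blocks` and `trainable_ops` are hypothetical names.

```python
from typing import List, Optional


def split_into_blocks(op_names: List[str], block_size: int) -> List[List[str]]:
    """Cut an ordered list of operation names into consecutive blocks.

    Assumption: block_size counts operations per block. The last block may
    be shorter than the others.
    """
    if block_size < 1:
        raise ValueError("block_size must be at least 1")
    return [op_names[i:i + block_size] for i in range(0, len(op_names), block_size)]


def trainable_ops(block: List[str], interested_layers: Optional[List[str]] = None) -> List[str]:
    """Pick the operations in a block that would be trained.

    With interested_layers=None, every operation in the block is trained
    (the toy list is assumed to hold only conv and gemm operations).
    """
    if interested_layers is None:
        return list(block)
    wanted = set(interested_layers)
    return [name for name in block if name in wanted]
```

Each block returned by `split_into_blocks` would then be fine-tuned on its own for `steps` iterations, updating only the operations selected by `trainable_ops`.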