class LayerwiseEqualizationPass(QuantizationOptimizationPass):
    """
## Layer-wise Equalization Pass (inter-layer weight equalization process)

Weight distributions can differ strongly between output channels.
Per-tensor quantization uses only one quantization scale for the whole tensor, so it has trouble representing the values of every channel accurately.

For example, if one channel has weights in the range [−128, 128] and another channel has weights in the range (−0.5, 0.5),
all the weights in the latter channel will be quantized to 0 when quantizing to 8 bits.
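
The zeroing effect described above can be reproduced with a small sketch. It assumes symmetric per-tensor int8 quantization with scale = max(abs(W)) / 127 and round-to-nearest; `quantize_per_tensor` is a hypothetical helper for illustration, not a PPQ function.

```python
import numpy as np


def quantize_per_tensor(w: np.ndarray, num_bits: int = 8) -> np.ndarray:
    # Symmetric per-tensor quantization: one scale shared by every channel.
    qmax = 2 ** (num_bits - 1) - 1            # 127 for 8 bits
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale                           # de-quantized weights


rng = np.random.default_rng(0)
# Output channel 0 spans [-128, 128]; output channel 1 stays inside (-0.5, 0.5).
wide = np.concatenate([[-128.0, 128.0], rng.uniform(-128.0, 128.0, size=6)])
narrow = rng.uniform(-0.5, 0.5, size=8)
weights = np.stack([wide, narrow])             # shape (2 channels, 8 weights)

dq = quantize_per_tensor(weights)
print(np.count_nonzero(dq[1]))                 # 0: channel 1 collapsed to zero
```

With a scale of 128 / 127, every value in (−0.5, 0.5) maps to less than 0.5 quantization steps and rounds to 0. Giving each channel its own scale (per-channel quantization) avoids this; equalization instead rescales the channels so that a single scale fits them better.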

Performance may be improved by rescaling the weights of each output channel so that the channel ranges become more similar.
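
Formula:

Take two convolution layers as an example, where

Y = W_2 * (W_1 * X + b_1) + b_2

Adjusting W_1 and W_2 by a scale factor s leaves Y unchanged:

Y = W_2 / s * (W_1 * s * X + b_1 * s) + b_2

Here s is a vector with one entry per output channel of W_1: W_1 and b_1 are multiplied by s along their output-channel dimension, and W_2 is divided by s along its input-channel dimension.
The identity also holds with a ReLU between the two layers as long as every entry of s is positive, because ReLU(s * x) = s * ReLU(x) for s > 0.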
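
The identity above can be checked numerically with a small sketch. It assumes fully connected layers (plain matrices) as a stand-in for the two convolutions, so `*` in the formula becomes a matrix product. It picks s = sqrt(r_2 / r_1), where r_1 holds the per-output-channel ranges max(abs(W_1)) and r_2 the matching per-input-channel ranges of W_2; this choice makes r_1 * s equal to r_2 / s, so both layers end up with the range sqrt(r_1 * r_2) in every channel. The helper names (`forward`, `identity`, `relu`) are illustrative, not PPQ's API.

```python
import numpy as np

rng = np.random.default_rng(42)
c_in, c_mid, c_out, n = 4, 6, 3, 5

# Stand-in for two conv layers: dense weights, with very different
# ranges across the output channels (rows) of W1.
W1 = rng.normal(size=(c_mid, c_in)) * rng.uniform(0.01, 10.0, size=(c_mid, 1))
b1 = rng.normal(size=c_mid)
W2 = rng.normal(size=(c_out, c_mid))
b2 = rng.normal(size=c_out)
X = rng.normal(size=(c_in, n))  # n input vectors stored as columns


def identity(z):
    return z


def relu(z):
    return np.maximum(z, 0.0)


def forward(W1, b1, W2, b2, X, act=identity):
    # Y = W_2 * act(W_1 * X + b_1) + b_2
    return W2 @ act(W1 @ X + b1[:, None]) + b2[:, None]


# Per-channel ranges: rows of W1 (its output channels) and the
# matching columns of W2 (its input channels).
r1 = np.abs(W1).max(axis=1)
r2 = np.abs(W2).max(axis=0)
s = np.sqrt(r2 / r1)  # chosen so that r1 * s == r2 / s

W1_eq = W1 * s[:, None]   # W_1 * s
b1_eq = b1 * s            # b_1 * s
W2_eq = W2 / s[None, :]   # W_2 / s

for act in (identity, relu):
    same = np.allclose(forward(W1, b1, W2, b2, X, act),
                       forward(W1_eq, b1_eq, W2_eq, b2, X, act))
    print(act.__name__, same)  # identity True, then relu True

print(np.allclose(np.abs(W1_eq).max(axis=1), np.abs(W2_eq).max(axis=0)))  # True
```

The outputs match for both the linear and the ReLU case because every entry of s is positive, and after rescaling each row of W_1 has the same range as the corresponding column of W_2.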

This method is called Layer-wise Equalization; it was proposed by Markus Nagel et al. in "Data-Free Quantization Through Weight Equalization and Bias Correction" (ICCV 2019):

https://openaccess.thecvf.com/content_ICCV_2019/papers/Nagel_Data-Free_Quantization_Through_Weight_Equalization_and_Bias_Correction_ICCV_2019_paper.pdf

Constructor signature:

    __init__(self, iterations: int, weight_threshold: float = 0.5,
             including_bias: bool = False, including_activation: bool = False,
             bias_multiplier: float = 0.5, activation_mutiplier: float = 0.5,
             interested_layers: List[str] = None, optimize_level: int = 2,
             verbose: bool = False)

### Parameters:

* iterations(int):

    Number of algorithm iterations.

    More iterations make the weight distribution flatter;
    around 100 iterations can flatten all the parameters in the network to the same level.

    Iterating until the values converge is not recommended:
    in some cases, stopping earlier gives better performance.

* weight_threshold(float):

    A threshold that excludes values that are too small from processing.

    By default, the scale factor of the equalization method is computed per channel as sqrt(max(abs(W_1)) / max(abs(W_2))),
    the factor by which W_1 shrinks and W_2 grows (that is, 1 / s in the formula above).
    The maximum value of W_2 can be very small (e.g. 1e-14) while the maximum value of W_1 is 0.5.

    In this case the computed scale factor is about 7e6 (on the order of 1e7): the optimization loses its numerical stability and may even give an unreasonable result.

    To keep the scale factor from becoming too large, PPQ clips all values smaller than this threshold before iterating.

    This parameter significantly affects the optimization result.
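
A rough sketch of how `iterations` and `weight_threshold` interact, under several assumptions: the network is a chain of bias-free dense layers; one iteration equalizes every adjacent pair once with the rule s = sqrt(r_2 / r_1), which makes r_1 * s equal to r_2 / s (r_1: per-output-channel ranges of the first layer, r_2: per-input-channel ranges of the second); and "clipping" is read as raising per-channel ranges below `weight_threshold` to the threshold before s is computed. The helpers `equalize_chain`, `spread`, and `run` are hypothetical, not PPQ functions, and PPQ's own handling of biases, activations, and thresholds may differ.

```python
import numpy as np


def equalize_chain(layers, iterations, weight_threshold=0.5):
    # Hypothetical sketch of iterative layer-wise equalization for a chain
    # of bias-free dense layers: y = W_n ... W_2 W_1 x.
    layers = [w.copy() for w in layers]
    for _ in range(iterations):
        for k in range(len(layers) - 1):
            r1 = np.abs(layers[k]).max(axis=1)      # output channels of layer k
            r2 = np.abs(layers[k + 1]).max(axis=0)  # input channels of layer k + 1
            # Assumed threshold handling: lift tiny ranges to the threshold so
            # that r2 / r1 cannot explode (e.g. 1e-14 / 0.5).
            r1 = np.maximum(r1, weight_threshold)
            r2 = np.maximum(r2, weight_threshold)
            s = np.sqrt(r2 / r1)
            layers[k] = layers[k] * s[:, None]          # W_1 * s
            layers[k + 1] = layers[k + 1] / s[None, :]  # W_2 / s
    return layers


def spread(w):
    # Largest / smallest per-output-channel range of one weight matrix.
    r = np.abs(w).max(axis=1)
    return r.max() / r.min()


def run(layers, x):
    # Apply the chain of layers to the columns of x.
    for w in layers:
        x = w @ x
    return x


rng = np.random.default_rng(0)
dims = [8, 16, 16, 4]
layers = [rng.normal(size=(dims[i + 1], dims[i])) for i in range(len(dims) - 1)]
layers[0] *= np.logspace(-2, 2, dims[1])[:, None]  # channel scales span 1e4
x = rng.normal(size=(dims[0], 3))

for it in (0, 1, 10, 100):
    eq = equalize_chain(layers, iterations=it, weight_threshold=1e-3)
    print(it, spread(eq[0]), np.allclose(run(layers, x), run(eq, x)))

# The 0.5 versus 1e-14 case from the text, without and with the threshold.
r1, r2 = np.array([0.5]), np.array([1e-14])
print(np.sqrt(r2 / r1))  # about 1.4e-07, i.e. W_2 would grow by about 7e6
print(np.sqrt(np.maximum(r2, 0.5) / np.maximum(r1, 0.5)))  # [1.]
```

Because every scale is positive, the equalized chain computes the same outputs at any iteration count; only the per-channel ranges change. Under the assumed clipping, the default threshold of 0.5 turns the factor of about 7e6 from the text into no rescaling at all for that channel.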