
SensiMix: Sensitivity-Aware 8-bit index & 1-bit value mixed precision quantization for BERT compression

Fig 3

The overall process of the 8-bit index quantization.

In the forward propagation, the original FP32 weights are first quantized to 8-bit indices and then de-quantized back to FP32 weights. In the backward propagation, the 8-bit clip function replaces the round function, since rounding has zero gradient almost everywhere, allowing gradients to flow and the model to be trained.
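The forward path described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the min–max affine mapping, the helper names `quantize_8bit` / `dequantize_8bit`, and the choice of an unsigned 0–255 index range are all assumptions made for the example.

```python
import numpy as np

def quantize_8bit(w):
    # Hypothetical helper: map FP32 weights to 8-bit integer indices
    # (0..255) over the weight range [min, max]. Assumes w is not constant.
    lo, hi = float(w.min()), float(w.max())
    scale = (hi - lo) / 255.0
    idx = np.clip(np.round((w - lo) / scale), 0, 255).astype(np.uint8)
    return idx, scale, lo

def dequantize_8bit(idx, scale, lo):
    # Recover approximate FP32 weights from the 8-bit indices.
    return idx.astype(np.float32) * scale + lo

w = np.array([-1.0, -0.5, 0.0, 0.5, 1.0], dtype=np.float32)
idx, scale, lo = quantize_8bit(w)
w_hat = dequantize_8bit(idx, scale, lo)
# The round-trip error is bounded by one quantization step (scale).
```

In training, the non-differentiable `np.round` is the step that the backward pass would bypass: gradients are taken through a clip function instead, so the de-quantized weights still receive a useful gradient signal.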
