SensiMix: Sensitivity-Aware 8-bit index & 1-bit value mixed precision quantization for BERT compression

Fig 2

Overview of SensiMix.

SensiMix applies 1-bit value quantization to the insensitive feed-forward networks (FFNs) near the output layer, and 8-bit index quantization to the remaining, sensitive parts.
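The layer-wise assignment described in the caption can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the layer counts, the mean-absolute-value scale used for the 1-bit case, and the uniform min-max scheme used for the 8-bit case are all hypothetical choices made for this example.

```python
# Hypothetical sketch of a sensitivity-aware precision plan.
# All names and the concrete quantizers below are assumptions for illustration.

def assign_precision(num_layers, num_binary_ffn_layers):
    """Map (layer, part) to a precision label.

    FFNs in the last `num_binary_ffn_layers` layers (nearest the output)
    get 1-bit values; every other part gets 8-bit indices.
    """
    plan = {}
    for layer in range(num_layers):
        near_output = layer >= num_layers - num_binary_ffn_layers
        plan[(layer, "attention")] = "8-bit index"
        plan[(layer, "ffn")] = "1-bit value" if near_output else "8-bit index"
    return plan


def quantize_1bit(weights):
    """Assumed binarization: keep the sign, scale by the mean absolute value."""
    scale = sum(abs(w) for w in weights) / len(weights)
    return [scale if w >= 0 else -scale for w in weights]


def quantize_8bit_index(weights):
    """Assumed uniform min-max quantization into 256 levels.

    Returns integer indices in [0, 255] plus the offset and step
    needed to map an index back to an approximate weight.
    """
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / 255 or 1.0  # avoid a zero step for constant weights
    indices = [round((w - lo) / step) for w in weights]
    return indices, lo, step
```

For example, `assign_precision(6, 2)` would mark the FFNs of the last two of six encoder layers as 1-bit and everything else as 8-bit; how many layers count as insensitive is a tunable choice in this sketch.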
