BERTtoCNN: Similarity-preserving enhanced knowledge distillation for stance detection
Fig 5
F-score curves as the parameters α and T vary.
Here, α balances the soft-label and hard-label terms of the knowledge distillation loss, and T is the temperature, which controls how smooth the predicted distribution is during distillation.
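To make the roles of α and T concrete, the following is a minimal sketch of a generic Hinton-style distillation loss. It is not the exact BERTtoCNN objective. It assumes that α weights the soft (teacher) term and 1 − α weights the hard (ground-truth) term, and that the soft term is scaled by T², which is a common convention. The paper's own formulation may assign α differently or add further terms, such as its similarity-preserving component. All function names here are hypothetical.

```python
import math
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], T: float = 1.0) -> List[float]:
    """Softmax of logits / T; a larger T yields a smoother (higher-entropy) distribution."""
    if T <= 0:
        raise ValueError("temperature T must be positive")
    scaled = [z / T for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy (in nats), used here only to show the smoothing effect of T."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def kd_loss(student_logits: Sequence[float],
            teacher_logits: Sequence[float],
            label: int,
            alpha: float = 0.5,
            T: float = 2.0) -> float:
    """Hypothetical distillation loss: alpha * soft term + (1 - alpha) * hard term.

    Assumption: alpha weights the soft-label term; the source may differ.
    """
    # Soft term: match the teacher's temperature-smoothed distribution.
    p_teacher = softmax_with_temperature(teacher_logits, T)
    p_student_T = softmax_with_temperature(student_logits, T)
    soft = kl_divergence(p_teacher, p_student_T) * T * T  # T^2 keeps gradient scale comparable

    # Hard term: ordinary cross-entropy against the gold stance label at T = 1.
    p_student = softmax_with_temperature(student_logits, 1.0)
    hard = -math.log(p_student[label])

    return alpha * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    logits = [2.0, 1.0, 0.1]
    for T in (1.0, 4.0):
        print(T, entropy(softmax_with_temperature(logits, T)))
    print(kd_loss([1.0, 2.0, 0.5], [3.0, 0.0, 0.0], label=1, alpha=0.5, T=2.0))
```

Under these assumptions, raising T increases the entropy of the softened distribution, so the student sees more of the teacher's relative preferences among non-target classes. Setting α = 0 reduces the loss to plain cross-entropy on the hard labels, which is one way to read the endpoints of the α curve.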