Fig 1.
The transformer structure used in pre-trained language models (PLMs).
Fig 2.
Classic knowledge distillation approach.
Fig 3.
The structure of BERTtoCNN model.
Fig 4.
Text-CNN as the student model.
Table 1.
The EDA result for the target "Climate Change is a Real Concern".
Table 2.
The SemEval-2016 stance dataset.
Table 3.
The NLPCC-ICCPOL-2016 stance dataset.
Table 4.
The performance of BERTtoCNN compared with the baseline methods.
Fig 5.
F-score curves as the parameters α and T vary.
Here, α balances the soft and hard labels in the knowledge distillation loss, and T is the temperature, which adjusts the smoothness of the prediction distribution in knowledge distillation.
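To make the roles of α and T concrete, the following is a minimal sketch of a classic distillation loss, not necessarily the exact loss used for BERTtoCNN. It assumes that α weights the soft-label term and (1 − α) the hard-label term, and that the soft term is scaled by T², as is common in the classic formulation. The function names `softmax` and `distillation_loss` are illustrative only.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a larger temperature gives a smoother distribution."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, hard_label, alpha, temperature):
    """Weighted sum of a soft-label term and a hard-label term.

    Assumption: alpha weights the soft term and (1 - alpha) the hard term;
    the paper's own convention may differ.
    """
    # Soft term: cross-entropy between the softened teacher and student distributions.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student_soft))

    # Hard term: ordinary cross-entropy against the ground-truth class index (T = 1).
    p_student = softmax(student_logits, 1.0)
    hard = -math.log(p_student[hard_label])

    # Assumption: T^2 scaling keeps the soft-term gradients comparable across temperatures.
    return alpha * temperature ** 2 * soft + (1.0 - alpha) * hard


if __name__ == "__main__":
    teacher = [1.0, 2.0, 3.0]  # hypothetical teacher logits for three stance classes
    student = [0.5, 1.5, 2.0]  # hypothetical student logits
    for t in (1.0, 2.0, 4.0):
        print(t, softmax(teacher, t), distillation_loss(student, teacher, 2, 0.5, t))
```

With α = 0 the loss reduces to the ordinary cross-entropy on the hard label, and raising T flattens the teacher distribution so that the non-maximal classes carry more of the signal.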
Table 5.
F-score as the hyperparameter γ varies.
Table 6.
Comparison of model performance on EDA-augmented data and on the original data.
Here, n_aug = 1 denotes the result on the original data without EDA.