BERTtoCNN: Similarity-preserving enhanced knowledge distillation for stance detection

doi:10.1371/journal.pone.0257130

Fig 1.

The transformer structure used in pre-trained language models (PLMs).

Fig 2.

Classic knowledge distillation approach.

Fig 3.

The structure of the BERTtoCNN model.

Fig 4.

Text-CNN as the student model.

Table 1.

EDA results for the target "Climate Change is a Real Concern".

Table 2.

The SemEval-2016 stance dataset.

Table 3.

The NLPCC-ICCPOL-2016 stance dataset.

Table 4.

The performance of BERTtoCNN compared with the baseline methods.

Fig 5.

F-score curves as the parameters α and T vary.

Here, α balances the soft and hard labels in the knowledge distillation loss, and T is the temperature, which adjusts the smoothness of the predicted distribution in knowledge distillation.
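The roles of α and T can be illustrated with a minimal sketch of a classic distillation loss. It assumes the common formulation in which α weights the soft-label term, (1 − α) weights the hard-label term, and the soft term is scaled by T²; the exact weighting used in BERTtoCNN may differ, and the function names here are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Softmax over logits divided by a temperature; larger T gives a smoother distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, hard_label, alpha, temperature):
    """Weighted sum of a soft-label and a hard-label cross-entropy (assumed form)."""
    # Soft term: cross-entropy between the temperature-softened teacher and student.
    p_teacher = softmax(teacher_logits, temperature)
    p_student_soft = softmax(student_logits, temperature)
    soft = -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student_soft))
    # Hard term: ordinary cross-entropy against the ground-truth class index (T = 1).
    hard = -math.log(softmax(student_logits)[hard_label])
    # Assumption: alpha weights the soft term, and T^2 rescales its gradient magnitude.
    return alpha * temperature ** 2 * soft + (1 - alpha) * hard
```

With `alpha = 0` the sketch reduces to plain cross-entropy on the hard label, while raising `temperature` flattens both softened distributions so that the student sees more of the teacher's relative class preferences.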

Table 5.

F-score as the hyperparameter γ varies.

Table 6.

Comparison of model performance on EDA-augmented data and on the original data.

Here, n_aug = 1 denotes the result of the model on the original data without EDA.
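To make the meaning of n_aug concrete, the following is a rough sketch of EDA-style augmentation. It assumes that n_aug counts the total number of sentences kept per original (the original plus n_aug − 1 augmented variants), which matches n_aug = 1 meaning "no EDA". Only two of the four standard EDA operations (random swap and random deletion) are shown; synonym replacement and random insertion need a thesaurus and are omitted. All function names are hypothetical.

```python
import random

def random_swap(words, rng):
    """Swap two randomly chosen word positions."""
    if len(words) < 2:
        return list(words)
    out = list(words)
    i, j = rng.sample(range(len(out)), 2)
    out[i], out[j] = out[j], out[i]
    return out

def random_deletion(words, rng, p=0.1):
    """Drop each word with probability p, keeping at least one word."""
    if not words:
        return []
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

def augment(sentence, n_aug, seed=0):
    """Return the original sentence plus n_aug - 1 augmented variants (assumed convention)."""
    rng = random.Random(seed)
    words = sentence.split()
    results = [sentence]  # n_aug = 1 yields only the original sentence
    ops = [random_swap, random_deletion]
    for _ in range(n_aug - 1):
        op = rng.choice(ops)
        results.append(" ".join(op(words, rng)))
    return results
```

Under this convention, `augment(text, 1)` returns just the original sentence, and larger values of `n_aug` enlarge the training set proportionally.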
