Explainable detection of adverse drug reaction with imbalanced data distribution
Fig 2
Comparative results of the proposed weighted BERT-CRF model with different weight assignment strategies and loss functions.
(A) Different weight assignments. The performance of the proposed weighted CRF depends mainly on the weight assignment strategy. The green bar shows the proposed weight assignment (Weighted Loss), as described in Eq 12. For comparison, we introduce two other strategies: the blue bar shows weights set to the inverse of the sample numbers (Strategy-1), and the red bar shows weights set to the inverse ratio of the sample numbers (Strategy-2). (B) Different loss functions. Recent studies recommend either focal loss or dice loss for multi-label classification with imbalanced data distribution. The green bar shows the performance of the proposed weighted loss function (Weighted Loss). The blue bar shows the performance of focal loss, which reduces the weights of samples from the majority classes and forces the model to focus on samples that are difficult to detect during training. The red bar shows the performance of dice loss, which optimizes an approximation of the F1-score so that samples of the minority classes receive similar importance.
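The baselines named in the caption can be illustrated with a minimal Python sketch. It rests on several assumptions: Strategy-1 is read as a per-class weight of 1/n_c and Strategy-2 as N/n_c (the inverse of the class frequency n_c/N); focal loss and dice loss are written in their commonly used textbook forms, which may differ from the exact variants evaluated here; the label names and counts are hypothetical; and the proposed weighting of Eq 12 is not reproduced.

```python
import math
from collections import Counter


def inverse_count_weights(labels):
    """Assumed reading of Strategy-1: weight of class c is 1 / n_c."""
    counts = Counter(labels)
    return {c: 1.0 / n for c, n in counts.items()}


def inverse_ratio_weights(labels):
    """Assumed reading of Strategy-2: weight of class c is N / n_c."""
    counts = Counter(labels)
    total = len(labels)
    return {c: total / n for c, n in counts.items()}


def focal_loss(p_true, gamma=2.0):
    """Common form of focal loss for one sample.

    p_true is the predicted probability of the correct class. The factor
    (1 - p_true) ** gamma shrinks the loss of confidently correct samples.
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


def soft_dice_loss(probs, targets, smooth=1.0):
    """Common smoothed soft dice loss for one class.

    probs are predicted probabilities, targets are 0/1 gold labels.
    """
    intersection = sum(p * t for p, t in zip(probs, targets))
    denominator = sum(probs) + sum(targets) + smooth
    return 1.0 - (2.0 * intersection + smooth) / denominator


# Hypothetical imbalanced tag distribution (illustrative counts only).
labels = ["O"] * 90 + ["B-ADR"] * 6 + ["I-ADR"] * 4
w1 = inverse_count_weights(labels)
w2 = inverse_ratio_weights(labels)
```

Under these assumptions the two weightings differ only by the constant factor N, so they rank the classes identically; any difference in model performance would then come from the overall scale of the loss rather than from the relative emphasis on minority classes.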