Table 1.
Configuration of the LastBERT model.
Fig 1.
Student model architecture.
Fig 2.
Top-level overview of the teacher-student knowledge distillation process.
Table 2.
Sample posts from the ADHD dataset.
Fig 3.
Top-level overview for ADHD classification study.
Fig 4.
Training and validation metrics over epochs during the knowledge distillation process.
Table 3.
Performance of LastBERT on GLUE benchmark datasets.
Fig 5.
Accuracy, validation loss, and training loss over epochs for MRPC.
Fig 6.
Accuracy, validation loss, and training loss over epochs for SST-2.
Fig 7.
Validation loss, training loss, Matthews correlation, and accuracy over epochs for CoLA.
Fig 8.
Accuracy, validation loss, and training loss over epochs for QQP.
Fig 9.
Accuracy, validation loss, and training loss over epochs for MNLI.
Fig 10.
Validation loss, training loss, Pearson and Spearman correlations over epochs for STS-B.
Fig 11.
Precision, recall, and F1 score over epochs for LastBERT model.
Fig 12.
Accuracy, training loss, and validation loss over epochs for LastBERT model.
Fig 13.
Confusion matrix for LastBERT model.
Fig 14.
ROC curve for LastBERT model.
Fig 15.
Precision, recall, and F1 score over epochs for DistilBERT model.
Fig 16.
Accuracy, training loss, and validation loss over epochs for DistilBERT model.
Fig 17.
Confusion matrix for DistilBERT model.
Fig 18.
Precision, recall, and F1 score over epochs for ClinicalBERT model.
Fig 19.
Accuracy, training loss, and validation loss over epochs for ClinicalBERT model.
Fig 20.
Confusion matrix for ClinicalBERT model.
Table 4.
Performance comparison of LastBERT, DistilBERT, and ClinicalBERT on ADHD dataset.
Table 5.
Weighted average comparison of LastBERT, DistilBERT, and ClinicalBERT on ADHD dataset.
Table 6.
Comparison of related works and our study on ADHD text classification.
Table 7.
Performance comparison of LastBERT with other BERT and knowledge distillation models on GLUE datasets.