Fig 1.

Training pipeline of LKD.

The LKD training procedure follows a bi-level optimization scheme, with an inner loop for student training and an outer loop for loss network updates. In the inner loop, we train the student model using the current LKD loss, guided by the pre-trained teacher model, and record the student model parameters at selected iterations. In the outer loop, we evaluate the student model on a validation set using the cross-entropy (CE) loss, then update the LKD network parameters based on the validation gradients.
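
As a rough illustration of the bi-level scheme described in this caption, the sketch below alternates an inner student update with an outer loss-parameter update on a toy problem. It is not the authors' implementation. The scalar student `w`, the fixed teacher weight `w_teacher`, the two learned loss weights `phi`, the one-step lookahead used for the outer gradient, and all hyperparameters are assumptions made for illustration only.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_grads(w, w_teacher, xs):
    """Gradients w.r.t. w of two toy distillation terms:
    mean squared logit gap and mean squared probability gap."""
    g_logit = g_prob = 0.0
    for x in xs:
        z_s, z_t = w * x, w_teacher * x
        p_s, p_t = sigmoid(z_s), sigmoid(z_t)
        g_logit += 2.0 * (z_s - z_t) * x
        g_prob += 2.0 * (p_s - p_t) * p_s * (1.0 - p_s) * x
    n = len(xs)
    return g_logit / n, g_prob / n


def ce_and_grad(w, xs, ys):
    """Binary cross-entropy on labeled data and its gradient w.r.t. w."""
    loss = grad = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(w * x), 1e-12), 1.0 - 1e-12)
        loss -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
        grad += (p - y) * x
    n = len(xs)
    return loss / n, grad / n


def train(steps=200, inner_lr=0.1, outer_lr=0.05, seed=0):
    rng = random.Random(seed)
    w_teacher = 2.0                      # hypothetical pre-trained teacher
    xs_train = [rng.uniform(-2, 2) for _ in range(64)]
    xs_val = [rng.uniform(-2, 2) for _ in range(64)]
    ys_val = [1.0 if x > 0 else 0.0 for x in xs_val]

    w = 0.0                              # student parameter
    phi = [0.5, 0.5]                     # learned loss weights (assumed form)
    history = []
    for _ in range(steps):
        # Inner step: update the student with the current learned loss
        # L_phi = phi[0] * logit_gap + phi[1] * prob_gap.
        g_logit, g_prob = loss_grads(w, w_teacher, xs_train)
        w_new = w - inner_lr * (phi[0] * g_logit + phi[1] * g_prob)

        # Outer step: validation CE at the updated student, then the chain
        # rule through the inner step: dCE/dphi_k = dCE/dw_new * (-inner_lr * g_k).
        val_loss, val_grad = ce_and_grad(w_new, xs_val, ys_val)
        phi[0] -= outer_lr * val_grad * (-inner_lr * g_logit)
        phi[1] -= outer_lr * val_grad * (-inner_lr * g_prob)
        phi = [max(p, 0.0) for p in phi]  # keep loss weights non-negative

        w = w_new
        history.append(val_loss)
    return w, phi, history


if __name__ == "__main__":
    w, phi, history = train()
    print("student w:", w)
    print("loss weights phi:", phi)
    print("validation CE, first and last:", history[0], history[-1])
```

In this toy setting the outer gradient is computed through a single inner step. The caption instead describes recording student parameters at several inner-loop iterations, so a closer sketch would differentiate through those recorded states.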

Fig 2.

Architectures of LKD losses.

Table 1.

Performance on the ImageNet dataset. We train the models following the standard training strategy, using the pre-trained ResNet-34 and ResNet-50 teacher networks provided by Torchvision [42].

Table 2.

Results on the CIFAR-100 dataset with homogeneous teacher and student architectures. The top and bottom model names represent the teacher and student, respectively.

Table 3.

Results on the CIFAR-100 dataset with heterogeneous teacher and student architectures.

Fig 3.

Performance of different distillation loss types.

Table 4.

Impact of parameter sampling strategies on KD from ResNet-34 to ResNet-18.

Table 5.

Comparison of Gaussian sampling and fixed-step sampling in LKD validation.

Table 6.

Impact of batch data consistency in LKD training on ImageNet.

Table 7.

Impact of training approaches on LKD loss and student model accuracy.
