Fig 1.
Visualization of a possible application of our approach to fabric classification.
Table 1.
Overview of related work for fabric classification.
Fig 2.
Schematic visualization of the hybrid architecture.
Fig 3.
Visualization of the DenseNet121 architecture, in which an AFPN and a DConv layer are integrated between DenseNet blocks three and four.
Fig 4.
Visualization of deformable convolution, using a 3×3 convolution filter as an example.
(1) Standard convolution; (2) deformable convolution.
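The sampling idea behind Fig 4 can be sketched without a deep learning framework. This is a minimal illustration under stated assumptions, not the layer used in the architecture: a single input channel, stride 1, padding 1, and offsets supplied by hand. In the actual DConv layer, the offsets are learned, typically predicted by an additional convolution. The function names `bilinear` and `deform_conv2d` are hypothetical. Each of the nine kernel taps samples the input at its regular grid position plus a fractional offset (dy, dx), and fractional positions are resolved by bilinear interpolation.

```python
import math


def bilinear(img, y, x):
    """Sample img at fractional (y, x) with bilinear interpolation; zero outside the image."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                # Weight falls off linearly with distance to the neighboring pixel
                val += (1.0 - abs(y - yy)) * (1.0 - abs(x - xx)) * img[yy][xx]
    return val


def deform_conv2d(img, kernel, offsets=None):
    """3x3 deformable convolution (stride 1, padding 1, single channel).

    offsets[i][j] holds nine (dy, dx) pairs, one per kernel tap, for output pixel (i, j).
    offsets=None uses the regular grid, i.e. a standard convolution.
    """
    h, w = len(img), len(img[0])
    taps = [(ky, kx) for ky in (-1, 0, 1) for kx in (-1, 0, 1)]
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for t, (ky, kx) in enumerate(taps):
                dy, dx = (0.0, 0.0) if offsets is None else offsets[i][j][t]
                # Regular grid position plus the (possibly fractional) offset
                acc += kernel[ky + 1][kx + 1] * bilinear(img, i + ky + dy, j + kx + dx)
            out[i][j] = acc
    return out
```

With an identity kernel and no offsets, the input is reproduced unchanged. Shifting the center tap by (0, 1) moves every sample one pixel to the right. A half-pixel offset such as (0, 0.5) averages two neighbors. In PyTorch, `torchvision.ops.DeformConv2d` provides a learnable layer of this kind.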
Fig 5.
Illustration of an adaptive feature pyramid network (AFPN).
Fig 6.
Visualization of the Swin Transformer architecture.
Fig 7.
Training and evaluation approach: the dataset is split into a training set and a test set.
Data augmentation is applied to the training set, followed by model training with transfer learning and fine-tuning. Finally, the resulting model is evaluated on the test set.
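The workflow in Fig 7, repeated over the five cross-validation folds mentioned in Tables 3 to 6, can be sketched as follows. Several details here are assumptions and are not stated in the captions: the folds are stratified by class, augmentation is applied once per training sample (frameworks usually augment on the fly), and `augment`, `train`, and `evaluate` are hypothetical callables standing in for the real augmentation, transfer-learning/fine-tuning, and evaluation steps. Changing `seed` produces a different partition, which is one way to obtain independent cross-validation runs.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs with every class spread evenly over k folds."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    pos = 0  # running counter keeps fold sizes balanced across classes
    for y in sorted(by_class):
        idxs = by_class[y][:]
        rng.shuffle(idxs)
        for idx in idxs:
            folds[pos % k].append(idx)
            pos += 1
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


def cross_validate(images, labels, augment, train, evaluate, k=5, seed=0):
    """Hypothetical driver: augment only the training split, train, then score on the test split."""
    scores = []
    for train_idx, test_idx in stratified_kfold(labels, k, seed):
        train_set = [(augment(images[i]), labels[i]) for i in train_idx]
        test_set = [(images[i], labels[i]) for i in test_idx]  # test images stay unaugmented
        model = train(train_set)  # e.g. transfer learning followed by fine-tuning
        scores.append(evaluate(model, test_set))
    return scores
```

Keeping the test split free of augmentation means every reported score reflects images the model has not seen in any form.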
Table 2.
Overview of the hyperparameters considered during hyperparameter tuning. TL = transfer learning; FT = fine-tuning.
Fig 8.
Example images from the dataset for each class.
Shown are two images per cotton content class (13 classes in total) [4].
Table 3.
Overview of the performance indicators used to evaluate the hybrid model across the five cross-validation folds.
Table 4.
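Overview of the results for global RMSE (root mean squared error), MAE (mean absolute error), TPR (true positive rate), PPV (positive predictive value), TNR (true negative rate), NPV (negative predictive value), F1-score, and the 95% Wilson confidence interval (CI) for the TPR of each class.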
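The per-class indicators in Table 4 follow from a confusion matrix in the usual one-vs-rest way. The sketch below assumes that rows index the true class and columns the predicted class, uses z = 1.96 for the 95% Wilson interval, and assumes that RMSE and MAE are computed on the cotton-content percentages of the true and predicted classes. That last point is an assumption and is not stated in the caption. The function names are hypothetical.

```python
import math


def wilson_ci(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (95% for z = 1.96)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - half, centre + half)


def per_class_metrics(cm):
    """cm[t][p] = number of images of true class t predicted as class p."""
    k = len(cm)
    n_total = sum(map(sum, cm))
    results = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # class c predicted as something else
        fp = sum(cm[r][c] for r in range(k)) - tp     # other classes predicted as c
        tn = n_total - tp - fn - fp
        tpr = tp / (tp + fn) if tp + fn else 0.0
        ppv = tp / (tp + fp) if tp + fp else 0.0
        tnr = tn / (tn + fp) if tn + fp else 0.0
        npv = tn / (tn + fn) if tn + fn else 0.0
        f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
        results.append({"TPR": tpr, "PPV": ppv, "TNR": tnr, "NPV": npv,
                        "F1": f1, "CI": wilson_ci(tp, tp + fn)})
    return results


def rmse_mae(true_pct, pred_pct):
    """Global RMSE and MAE between true and predicted cotton content (in %)."""
    errs = [p - t for t, p in zip(true_pct, pred_pct)]
    rmse = math.sqrt(sum(e * e for e in errs) / len(errs))
    mae = sum(abs(e) for e in errs) / len(errs)
    return rmse, mae
```

For example, a class with 8 of 10 images correctly recognized has TPR = 0.8 and a Wilson interval of roughly (0.490, 0.943). With only about 50 images per class, the Wilson interval is preferred over the normal approximation because it stays inside [0, 1].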
Table 5.
Statistical comparison of the hybrid architecture against DenseNet121 and Swin Transformer V2 based on classification accuracy. Paired tests were conducted across N = 10 measurements (two independent runs of 5-fold cross-validation). Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001.
Table 6.
Ablation study of the proposed architecture. DN = DenseNet121, Swin = Swin Transformer V2, DConv = deformable convolution layer, AFPN = adaptive feature pyramid network, 2nd FC = second fully connected layer. Paired tests were conducted across N = 10 measurements (two independent runs of 5-fold cross-validation). Gain denotes the accuracy difference in percentage points (pp) relative to the full architecture shown in the first row. The p-values result from paired t-tests comparing each configuration to the first-row architecture. Significance levels: * p < 0.05, ** p < 0.01, *** p < 0.001.
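The paired t-test behind the p-values in Tables 5 and 6 compares two configurations on the same N = 10 folds. The sketch below computes the t statistic from the per-fold accuracy differences and maps it to the significance stars using standard two-sided critical values of Student's t with df = N − 1 = 9. Exact p-values need the t distribution, for example via `scipy.stats.ttest_rel`. The helper names are hypothetical, and the input is assumed to be two equally long lists of accuracies whose differences are not all identical (otherwise the standard deviation is zero).

```python
import math
import statistics

# Two-sided critical values of Student's t for df = 9 (N = 10 paired measurements)
T_CRIT_DF9 = [(0.001, 4.781, "***"), (0.01, 3.250, "**"), (0.05, 2.262, "*")]


def paired_t(acc_a, acc_b):
    """Return (t statistic, mean difference) for paired accuracy measurements."""
    diffs = [a - b for a, b in zip(acc_a, acc_b)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_d / (sd_d / math.sqrt(n)), mean_d


def stars(t, table=T_CRIT_DF9):
    """Significance label for a t statistic, checked from the strictest level down."""
    for _alpha, crit, label in table:
        if abs(t) >= crit:
            return label
    return "n.s."
```

The mean difference corresponds to the "Gain" column in percentage points when the accuracies are given in percent.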
Fig 9.
Mean confusion matrix across all five folds.
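A mean confusion matrix like the one in Fig 9 can be obtained by averaging the per-fold matrices element-wise. Whether the figure additionally normalizes each row to per-class proportions is not stated. The sketch below therefore makes row normalization optional (an assumption), and `mean_confusion_matrix` is a hypothetical helper.

```python
def mean_confusion_matrix(fold_cms, row_normalize=False):
    """Element-wise mean of k x k confusion matrices (rows = true class, columns = predicted)."""
    n_folds = len(fold_cms)
    k = len(fold_cms[0])
    mean = [[sum(cm[r][c] for cm in fold_cms) / n_folds for c in range(k)] for r in range(k)]
    if row_normalize:
        normalized = []
        for row in mean:
            total = sum(row)
            # Each row then sums to 1, i.e. the distribution of predictions for one true class
            normalized.append([v / total if total else 0.0 for v in row])
        mean = normalized
    return mean
```
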
Fig 10.
Training and validation loss curves from the second cross-validation run.
Early stopping ended training at epoch 20; the lowest validation loss occurred at epoch 10.
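The early-stopping rule implied by Fig 10 can be sketched as a check on the validation-loss history. The patience value is not given in the caption. A patience of 10 epochs is assumed here because it is consistent with the best epoch being 10 and the stop occurring at epoch 20. `early_stopping` and `min_delta` are hypothetical names.

```python
def early_stopping(val_losses, patience=10, min_delta=0.0):
    """Return (stop_epoch, best_epoch), both 1-based, for a sequence of validation losses.

    Training stops once `patience` epochs have passed without the loss improving
    by more than `min_delta` on the best value seen so far.
    """
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss - min_delta:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            return epoch, best_epoch
    return len(val_losses), best_epoch  # history ended before the patience ran out
```

A loss that keeps decreasing until epoch 10 and then plateaus yields a stop at epoch 20 with the best model from epoch 10. Many frameworks then restore the weights from the best epoch.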
Fig 11.
Scatter plot showing the distribution of the probabilities with which test images were correctly assigned to their class.
The test images (n = 653) from all five cross-validation folds are shown. The colored dots indicate the mean of the individual probabilities for each class. The mean probabilities per class are: class 30%: 86.47%; class 40%: 88.93%; class 50%: 92.21%; class 53%: 69.59%; class 58%: 69.76%; class 60%: 69.93%; class 63%: 56.12%; class 65%: 70.00%; class 66%: 81.71%; class 80%: 84.15%; class 95%: 78.83%; class 98%: 73.95%; class 99%: 74.53%.
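The per-class averages listed for Fig 11 can be reproduced from the model outputs roughly as follows. Several details are assumptions: probabilities come from a softmax over the class logits, the plotted value is the probability of the true class, and only correctly classified images are included (set `correct_only=False` to keep all images). Both the caption's wording and the n = 653 total leave this choice open. The function names are hypothetical.

```python
import math
from collections import defaultdict


def softmax(logits):
    """Convert raw class scores to probabilities."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_true_class_probability(samples, class_names, correct_only=True):
    """samples: iterable of (true_index, logits). Returns {class_name: mean true-class probability}."""
    sums, counts = defaultdict(float), defaultdict(int)
    for true_idx, logits in samples:
        probs = softmax(logits)
        predicted = max(range(len(probs)), key=probs.__getitem__)
        if correct_only and predicted != true_idx:
            continue
        name = class_names[true_idx]
        sums[name] += probs[true_idx]
        counts[name] += 1
    return {name: sums[name] / counts[name] for name in sums}
```
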