Fig 1.
Sample images from the toxic herbal medicine dataset.
The dataset was organized by the natural source of the medicinal materials, which were grouped into botanical, animal, and mineral medicines. Botanical medicines included seeds (e.g., Purging Croton Seed, Ginkgo Seed), barks (e.g., Golden Larch Root, Chinese Silkvine Root), roots (e.g., Ternate Pinellia, Antifebrile Dichroa Root), and flowers (e.g., Hindu Datura Flower, Lilac Daphne Flower Bud). Animal medicines included Centipede and Dried Toad Venom, and mineral medicines included Red Orpiment and Cinnabar.
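The two-level grouping above (source category, then plant part) can be held in a small label hierarchy, which is convenient when reporting results per category. The sketch below is illustrative only: `TAXONOMY`, `build_lookup`, `count_per_category`, and the subgroup key `"all"` for the undivided animal and mineral groups are hypothetical names, and only the example classes named in the caption are listed, not the full class set.

```python
# Hypothetical label hierarchy mirroring the caption's grouping.
# Only the example classes named in the caption are included.
TAXONOMY = {
    "botanical": {
        "seed": ["Purging Croton Seed", "Ginkgo Seed"],
        "bark": ["Golden Larch Root", "Chinese Silkvine Root"],
        "root": ["Ternate Pinellia", "Antifebrile Dichroa Root"],
        "flower": ["Hindu Datura Flower", "Lilac Daphne Flower Bud"],
    },
    # Animal and mineral medicines are not subdivided; "all" is a placeholder key.
    "animal": {"all": ["Centipede", "Dried Toad Venom"]},
    "mineral": {"all": ["Red Orpiment", "Cinnabar"]},
}


def build_lookup(taxonomy):
    """Map each class name to its (category, subgroup) pair."""
    lookup = {}
    for category, subgroups in taxonomy.items():
        for subgroup, names in subgroups.items():
            for name in names:
                lookup[name] = (category, subgroup)
    return lookup


def count_per_category(taxonomy):
    """Count the listed classes in each top-level category."""
    return {cat: sum(len(names) for names in subs.values())
            for cat, subs in taxonomy.items()}


lookup = build_lookup(TAXONOMY)
print(lookup["Ginkgo Seed"])
print(count_per_category(TAXONOMY))
```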
Fig 2.
Sample images from a test subset of toxic herbal medicine data.
The images exhibit diverse backgrounds, lighting conditions, and small object regions, posing challenges for robust classification and detection.
Fig 3.
Illustration of the MSFF module.
Features from Stage2, Stage4, and Stage6 are first processed by convolution, to unify their channel dimensions, and by downsampling, to bring them to a common spatial resolution. The resulting representations (C2, C4, C6) are then concatenated along the channel axis to form the fused feature map F.
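The fusion step described above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: "convolution" is modeled as a 1x1 pointwise projection and "downsampling" as non-overlapping average pooling to the coarsest map (the actual module may use strided convolutions instead). The input shapes assume an EfficientNetV2-S backbone at 224x224 resolution, and the common width `d = 64` is a hypothetical choice.

```python
import numpy as np


def conv1x1(x, weight):
    """Pointwise (1x1) convolution: x is (C_in, H, W), weight is (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x)


def avg_pool(x, factor):
    """Non-overlapping average pooling by an integer factor along H and W."""
    c, h, w = x.shape
    assert h % factor == 0 and w % factor == 0
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def msff(features, weights):
    """Project each stage to a common channel width, downsample to the
    coarsest resolution, and concatenate along the channel axis."""
    target_h = min(f.shape[1] for f in features)
    aligned = []
    for f, w in zip(features, weights):
        projected = conv1x1(f, w)                      # unify channel count
        factor = projected.shape[1] // target_h
        aligned.append(avg_pool(projected, factor) if factor > 1 else projected)
    return np.concatenate(aligned, axis=0)             # fused map F


rng = np.random.default_rng(0)
# Assumed EfficientNetV2-S stage outputs for a 224x224 input.
stage2 = rng.standard_normal((48, 56, 56))
stage4 = rng.standard_normal((128, 14, 14))
stage6 = rng.standard_normal((256, 7, 7))
d = 64  # hypothetical per-branch channel width (C2, C4, C6 each have d channels)
weights = [rng.standard_normal((d, f.shape[0])) * 0.01 for f in (stage2, stage4, stage6)]
F = msff([stage2, stage4, stage6], weights)
print(F.shape)
```

Pooling to the coarsest map keeps F small enough for the attention stage that follows; upsampling everything to Stage2 resolution would be the alternative, at a much higher memory cost.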
Fig 4.
The fused feature map F is first refined by the channel attention mechanism and then by the spatial attention mechanism, yielding the enhanced feature representation.
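The sequential channel-then-spatial refinement described above follows the standard CBAM formulation (Woo et al., ECCV 2018). Below is a NumPy sketch of that formulation, assuming a reduction ratio of 16 and a 7x7 spatial kernel (the defaults from the CBAM paper) and omitting biases for brevity. The weights are random placeholders, not trained parameters, and the function names are illustrative.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(F, w1, w2):
    """Shared two-layer MLP over average- and max-pooled channel descriptors.
    F is (C, H, W); w1 is (C // r, C); w2 is (C, C // r)."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)           # ReLU hidden layer
    Mc = sigmoid(mlp(F.mean(axis=(1, 2))) + mlp(F.max(axis=(1, 2))))  # (C,)
    return F * Mc[:, None, None]


def spatial_attention(F, kernel):
    """Convolve the stacked channel-wise mean and max maps with a (2, k, k)
    kernel (odd k, zero padding keeps H and W), then apply a sigmoid."""
    desc = np.stack([F.mean(axis=0), F.max(axis=0)])  # (2, H, W)
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(desc, ((0, 0), (p, p), (p, p)))
    H, W = F.shape[1:]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(padded[:, i:i + k, j:j + k] * kernel)
    return F * sigmoid(out)[None, :, :]


def cbam(F, w1, w2, kernel):
    """Channel attention first, then spatial attention."""
    return spatial_attention(channel_attention(F, w1, w2), kernel)


rng = np.random.default_rng(1)
F = rng.standard_normal((192, 7, 7))   # e.g. the fused MSFF output
r = 16
w1 = rng.standard_normal((192 // r, 192)) * 0.1
w2 = rng.standard_normal((192, 192 // r)) * 0.1
kernel = rng.standard_normal((2, 7, 7)) * 0.1
out = cbam(F, w1, w2, kernel)
print(out.shape)
```

Both attention maps are multiplicative gates in (0, 1), so the module reweights F without changing its shape, which is what allows it to be inserted between the fusion step and the classifier.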
Fig 5.
The overall architecture of the improved EfficientNetV2-based classification model.
The backbone comprises Stage0 to Stage6. Features from multiple stages are fused by an MSFF module, and a Convolutional Block Attention Module (CBAM) then enhances salient regions. The final refined feature map is fed to the classifier for prediction. This design aims to improve recognition performance under complex backgrounds and varying object scales.
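The data flow above can be checked by tracing tensor shapes through the pipeline. This sketch assumes the EfficientNetV2-S stage configuration (strides and output channels for Stage0 to Stage6 as listed in the EfficientNetV2 paper), MSFF taps at Stage2, Stage4, and Stage6, a hypothetical per-branch width of 64, and global average pooling before a linear classifier. `num_classes=10` is only a placeholder, not the dataset's actual class count.

```python
# Assumed EfficientNetV2-S configuration: (stage, stride, output channels).
STAGES = [
    ("Stage0", 2, 24),   # Conv3x3 stem
    ("Stage1", 1, 24),   # Fused-MBConv1
    ("Stage2", 2, 48),   # Fused-MBConv4
    ("Stage3", 2, 64),   # Fused-MBConv4
    ("Stage4", 2, 128),  # MBConv4 with SE
    ("Stage5", 1, 160),  # MBConv6 with SE
    ("Stage6", 2, 256),  # MBConv6 with SE
]


def stage_shapes(input_size):
    """Map each stage name to its (channels, H, W) output for a square input."""
    shapes, size = {}, input_size
    for name, stride, channels in STAGES:
        size = (size + stride - 1) // stride  # ceil division, as with 'same' padding
        shapes[name] = (channels, size, size)
    return shapes


def pipeline_shapes(input_size, num_classes,
                    taps=("Stage2", "Stage4", "Stage6"), width=64):
    """Trace shapes through MSFF -> CBAM -> pooling -> classifier.
    `width` (channels per branch after projection) is hypothetical."""
    shapes = stage_shapes(input_size)
    h = min(shapes[t][1] for t in taps)   # align to the coarsest tapped map
    fused = (width * len(taps), h, h)     # concatenated feature map F
    return {
        "F": fused,
        "CBAM": fused,                    # attention preserves shape
        "pooled": (fused[0],),            # global average pooling (assumed)
        "logits": (num_classes,),
    }


print(stage_shapes(224))
print(pipeline_shapes(224, num_classes=10))
```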
Table 1.
Performance comparison of various CNN architectures on toxic herbal medicine image classification. EfficientNetV2 achieves the highest performance across all metrics.
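The caption does not name the metrics in Table 1. Accuracy together with macro-averaged precision, recall, and F1 is a common choice for multi-class image classification, and the sketch below computes them from predicted and true labels under that assumption. The function name is illustrative. Classes with no predictions or no true samples contribute 0 to the macro average here, which is one of several possible conventions.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p, but wrong
            fn[t] += 1   # missed true class t
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


print(classification_metrics([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 0]))
```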
Fig 6.
Validation loss comparison across models.
This figure compares the validation loss curves of our proposed model with those of several state-of-the-art convolutional neural networks.
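Each point on a validation loss curve is typically the mean loss over the validation set after one training epoch. The sketch below assumes cross-entropy loss, which the caption does not state. It computes that quantity with a numerically stable log-softmax and picks the epoch with the lowest loss from a curve. Function names are illustrative.

```python
import numpy as np


def log_softmax(logits):
    """Row-wise log-softmax, shifted by the row max for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def validation_loss(logits, labels):
    """Mean cross-entropy over the validation set.
    logits: (N, num_classes) array; labels: (N,) integer class indices."""
    logp = log_softmax(logits)
    return float(-logp[np.arange(len(labels)), labels].mean())


def best_epoch(loss_curve):
    """Index of the epoch with the lowest validation loss."""
    return int(np.argmin(loss_curve))


rng = np.random.default_rng(2)
logits = rng.standard_normal((8, 5))
labels = rng.integers(0, 5, size=8)
print(validation_loss(logits, labels))
print(best_epoch([0.92, 0.61, 0.48, 0.51]))
```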
Table 2.
Ablation study of MSFF and CBAM modules on EfficientNetV2. The combination of both modules achieves the best performance across all evaluation metrics.
Table 3.
Ablation study on a challenging subset with complex backgrounds and small herbal targets. The MSFF and CBAM modules each improve classification performance substantially, especially when combined.
Fig 7.
Visualization of feature maps before and after the MSFF and CBAM modules.
Top row: feature maps after MSFF and CBAM show focused, high-response activations aligned with the herbal target regions, even against complex backgrounds. Bottom row: feature maps extracted before the enhancement modules exhibit diffuse attention with weak localization.
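One common way to render such visualizations collapses a (C, H, W) feature map into a single heatmap. The sketch below averages absolute activations over channels, min-max normalizes the result to [0, 1], and upsamples it to image size by nearest-neighbour repetition. The caption does not specify the visualization method, so this procedure is an assumption. `mass_inside` is a hypothetical helper, not from the source, that measures what fraction of the heatmap falls inside a target box; this gives one way to compare "focused" and "diffuse" responses numerically.

```python
import numpy as np


def activation_map(feature, out_size):
    """Collapse a (C, H, W) feature map to a [0, 1] heatmap and upsample it
    by nearest-neighbour repetition to (out_size, out_size).
    Assumes a square map whose side divides out_size."""
    heat = np.abs(feature).mean(axis=0)            # channel-averaged magnitude
    lo, hi = heat.min(), heat.max()
    heat = (heat - lo) / (hi - lo) if hi > lo else np.zeros_like(heat)
    factor = out_size // heat.shape[0]
    return heat.repeat(factor, axis=0).repeat(factor, axis=1)


def mass_inside(heat, box):
    """Hypothetical focus score: share of total heat inside box =
    (top, left, bottom, right), with bottom/right exclusive."""
    top, left, bottom, right = box
    total = heat.sum()
    return float(heat[top:bottom, left:right].sum() / total) if total > 0 else 0.0


feature = np.zeros((3, 7, 7))
feature[:, 2, 3] = 1.0                 # a single strongly activated cell
heat = activation_map(feature, 224)    # each cell becomes a 32x32 block
print(heat.shape, mass_inside(heat, (64, 96, 96, 128)))
```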