Toxic Chinese herbal medicine recognition in real-world images via multi-scale and attention-enhanced EfficientNetV2

doi:10.1371/journal.pone.0344262

Fig 1.

Sample images from the toxic herbal medicine dataset.

The dataset was categorized by the natural origin of the medicinal materials, which were grouped into botanical, animal, and mineral medicines. Botanical medicines included seeds (e.g., Purging Croton Seed, Ginkgo Seed), barks (e.g., Golden Larch Root, Chinese Silkvine Root), roots (e.g., Ternate Pinellia, Antifebrile Dichroa Root), and flowers (e.g., Hindu Datura Flower, Lilac Daphne Flower Bud). Animal medicines included Centipede and Dried Toad Venom, while mineral medicines included Red Orpiment and Cinnabar.

Fig 2.

Sample images from a test subset of the toxic herbal medicine dataset.

The images exhibit diverse backgrounds, lighting conditions, and small object regions, posing challenges for robust classification and detection.

Fig 3.

Illustration of the multi-scale feature fusion (MSFF) module.

Features from Stage2, Stage4, and Stage6 are first processed by convolution and downsampling operations, which produce representations (C2, C4, C6) with a consistent number of channels and a matching spatial size; these are then concatenated to form the fused feature map F.
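The fusion step above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the caption does not give kernel sizes, the downsampling operator, or the branch width, so the sketch assumes a 1x1 convolution for channel alignment, non-overlapping average pooling for downsampling, and 64 channels per branch. The helper names (`conv1x1`, `avg_pool`, `msff`) and the stage shapes are hypothetical, and random weights stand in for learned ones.

```python
import numpy as np


def conv1x1(x, weight):
    """1x1 convolution: map a (C_in, H, W) tensor to (C_out, H, W) using a (C_out, C_in) weight."""
    return np.einsum("oc,chw->ohw", weight, x)


def avg_pool(x, factor):
    """Non-overlapping average pooling by an integer factor (H and W must be divisible by it)."""
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def msff(stage_feats, out_channels=64, seed=0):
    """Hypothetical MSFF sketch for square, channels-first stage outputs.

    Each stage is projected to `out_channels` with a 1x1 conv (random weights stand in
    for learned ones), average-pooled to the coarsest resolution, and the aligned maps
    are concatenated along the channel axis to form F.
    """
    rng = np.random.default_rng(seed)
    target = min(f.shape[1] for f in stage_feats)  # coarsest spatial size among stages
    aligned = []
    for f in stage_feats:
        w = rng.standard_normal((out_channels, f.shape[0])) / np.sqrt(f.shape[0])
        c_i = conv1x1(f, w)                          # consistent channel count (C2, C4, C6)
        c_i = avg_pool(c_i, c_i.shape[1] // target)  # matching spatial size
        aligned.append(c_i)
    return np.concatenate(aligned, axis=0)           # fused feature map F


# Hypothetical stage shapes (channels, height, width), not taken from the paper.
stage2 = np.random.rand(48, 56, 56)
stage4 = np.random.rand(128, 14, 14)
stage6 = np.random.rand(256, 7, 7)
F = msff([stage2, stage4, stage6])
print(F.shape)  # (192, 7, 7)
```

Pooling to the coarsest map keeps the fused tensor small; upsampling the deeper maps instead would preserve more spatial detail at a higher cost, and the caption does not say which direction the paper uses beyond "downsampling".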

Fig 4.

Structure of the Convolutional Block Attention Module (CBAM).

The fused feature map F is first refined by the channel attention mechanism and then by the spatial attention mechanism, yielding the enhanced feature representation.
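The channel-then-spatial refinement can be sketched in NumPy using the standard CBAM formulation: channel weights come from a shared MLP applied to globally average- and max-pooled descriptors, and spatial weights come from a convolution over channel-wise mean and max maps. The reduction ratio r = 16 and the 7x7 spatial kernel are the defaults proposed for CBAM and are assumed here, not confirmed for this paper; the function names are illustrative and random weights stand in for trained ones.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(F, w1, w2):
    """Mc = sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F))) with one shared two-layer MLP."""
    def mlp(v):
        return w2 @ np.maximum(w1 @ v, 0.0)        # C -> C/r -> C with a ReLU in between
    avg = F.mean(axis=(1, 2))                      # (C,) global average descriptor
    mx = F.max(axis=(1, 2))                        # (C,) global max descriptor
    return sigmoid(mlp(avg) + mlp(mx))             # (C,) per-channel weights


def spatial_attention(F, kernel):
    """Ms = sigmoid(conv_kxk([mean_c(F); max_c(F)])), zero-padded to keep H x W."""
    desc = np.stack([F.mean(axis=0), F.max(axis=0)])   # (2, H, W)
    k = kernel.shape[-1]
    pad = k // 2
    padded = np.pad(desc, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = desc.shape
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            # Cross-correlation, as computed by deep-learning "convolution" layers.
            out += np.einsum("c,chw->hw", kernel[:, i, j], padded[:, i:i + h, j:j + w])
    return sigmoid(out)                             # (H, W) per-position weights


def cbam(F, w1, w2, kernel):
    """Channel attention first, then spatial attention, each applied multiplicatively."""
    F1 = channel_attention(F, w1, w2)[:, None, None] * F
    return spatial_attention(F1, kernel)[None, :, :] * F1


rng = np.random.default_rng(0)
C, r = 192, 16                                      # reduction ratio r = 16 is assumed
F = rng.random((C, 7, 7))
w1 = 0.1 * rng.standard_normal((C // r, C))
w2 = 0.1 * rng.standard_normal((C, C // r))
kernel = 0.1 * rng.standard_normal((2, 7, 7))       # 7x7 spatial kernel is assumed
refined = cbam(F, w1, w2, kernel)
print(refined.shape)  # (192, 7, 7)
```

Because both attention maps pass through a sigmoid, every output element is a down-weighted copy of the input, which is what lets the module suppress background responses without changing the tensor shape.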

Fig 5.

The overall architecture of the improved EfficientNetV2-based classification model.

Features from multiple stages (Stage0 to Stage6) are fused by an MSFF module, followed by a CBAM that enhances salient regions. The final refined feature map is fed into the classifier for prediction. This design aims to improve recognition performance under complex backgrounds and varying object scales.
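The caption does not describe the classifier itself. A common choice for EfficientNet-style models is global average pooling followed by a linear layer and softmax, and the sketch below assumes exactly that; the function name `classify`, the 192-channel input, and `num_classes = 10` are hypothetical placeholders rather than values from the paper.

```python
import numpy as np


def classify(refined, weight, bias):
    """Hypothetical head: global average pooling -> linear layer -> softmax.

    refined: (C, H, W) feature map after MSFF and CBAM
    weight:  (num_classes, C); bias: (num_classes,)
    """
    pooled = refined.mean(axis=(1, 2))       # (C,) one value per channel
    logits = weight @ pooled + bias          # (num_classes,)
    logits = logits - logits.max()           # subtract max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()                   # class probabilities


rng = np.random.default_rng(0)
num_classes = 10                             # hypothetical; the real class count is not given here
refined = rng.random((192, 7, 7))
probs = classify(refined, rng.standard_normal((num_classes, 192)), np.zeros(num_classes))
print(int(probs.argmax()), round(float(probs.sum()), 6))
```

Subtracting the maximum logit before exponentiating leaves the softmax unchanged but avoids overflow for large activations.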

Table 1.

Performance comparison of various CNN architectures on toxic herbal image classification. EfficientNetV2 achieves the highest performance across all metrics.
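Table 1 itself is not reproduced in this section, so its exact metric set is unknown. Assuming it reports accuracy together with macro-averaged precision, recall, and F1 (a common set for multi-class image classification), the sketch below shows how those quantities are computed from predicted and true labels. The function name is hypothetical, and the herb labels and predictions in the usage lines are a toy example, not results from the paper.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1 over all observed labels."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted p, but it was wrong
            fn[t] += 1   # true class t was missed

    def ratio(a, b):
        return a / b if b else 0.0

    precision = [ratio(tp[c], tp[c] + fp[c]) for c in labels]
    recall = [ratio(tp[c], tp[c] + fn[c]) for c in labels]
    f1 = [ratio(2 * p * r, p + r) for p, r in zip(precision, recall)]
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precision) / n,
        "recall": sum(recall) / n,
        "f1": sum(f1) / n,
    }


# Toy labels only: two Cinnabar images, one of them misclassified as Centipede.
y_true = ["Cinnabar", "Cinnabar", "Centipede", "Centipede"]
y_pred = ["Cinnabar", "Centipede", "Centipede", "Centipede"]
print(classification_metrics(y_true, y_pred))
```

Macro averaging weights every herb class equally, so a model cannot score well on it by performing well only on the most frequent classes.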

Fig 6.

Validation loss comparison across models.

This figure compares the validation loss curves of our proposed model with those of several state-of-the-art convolutional neural networks.

Table 2.

Ablation study of MSFF and CBAM modules on EfficientNetV2. The combination of both modules achieves the best performance across all evaluation metrics.

Table 3.

Ablation study on a challenging subset with complex backgrounds and small herbal targets. The MSFF and CBAM modules substantially improve classification performance, especially when used together.

Fig 7.

Visualization of feature maps before and after the MSFF and CBAM modules.

Top row: feature maps after MSFF and CBAM show focused, high-response activations aligned with the herbal target regions, even in complex backgrounds. Bottom row: feature maps extracted prior to the enhancement modules exhibit diffuse attention with weak localization.
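The caption does not state how the feature maps were rendered. One common approach, assumed here, is to average a (C, H, W) feature map over channels, min-max scale the result to [0, 1], and upsample it to the input resolution so it can be overlaid on the image. The function name `activation_heatmap` and the 224-pixel output size are hypothetical.

```python
import numpy as np


def activation_heatmap(feature_map, out_size):
    """Channel-averaged activation map, min-max scaled to [0, 1] and upsampled.

    feature_map: (C, H, W) array; out_size must be a multiple of H (square maps assumed).
    """
    heat = feature_map.mean(axis=0)                  # (H, W) mean response per position
    heat = heat - heat.min()
    span = heat.max()
    if span > 0:
        heat = heat / span
    scale = out_size // heat.shape[0]
    # Nearest-neighbour upsampling so the map can be overlaid on the input image.
    return heat.repeat(scale, axis=0).repeat(scale, axis=1)


fm = np.zeros((4, 7, 7))
fm[:, 3, 3] = 5.0                                    # one strongly activated position
hm = activation_heatmap(fm, 224)
print(hm.shape, hm.max(), hm[0, 0])  # (224, 224) 1.0 0.0
```

In a map like the top row of the figure, high values concentrate in a compact block over the herb; a diffuse map like the bottom row spreads intermediate values across the background.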
