
Lychee13-3634: A new lychee image dataset and classification methodological evaluation

  • Shaoye Luo,

    Roles Conceptualization, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Supervision

    Affiliations College of Computer and Data Science, Putian University, Putian, Fujian, China, Engineering Research Center for Big Data Application in Private Health Medicine of Fujian Universities, Putian University, Putian, Fujian, China

  • Hanling Zheng,

    Roles Conceptualization, Data curation, Methodology, Writing – original draft, Writing – review & editing

    Affiliation College of Computer and Data Science, Putian University, Putian, Fujian, China

  • Ziyang Lin,

    Roles Data curation, Methodology, Validation, Writing – original draft

    Affiliation College of Computer and Data Science, Putian University, Putian, Fujian, China

  • Tingting Zeng,

    Roles Data curation, Writing – original draft

    Affiliation College of Computer and Data Science, Putian University, Putian, Fujian, China

  • Miaomiao Huang,

    Roles Conceptualization, Writing – original draft, Writing – review & editing

    Affiliation College of Computer and Data Science, Putian University, Putian, Fujian, China

  • Yongyi Xiao,

    Roles Funding acquisition, Supervision, Validation, Writing – review & editing

    Affiliations Artificial Intelligence College, Putian University, Putian, Fujian, China, Faculty of Data Science, City University of Macau, Macau, China

  • Antoni Grau, Senior Member, IEEE,

    Roles Formal analysis, Supervision, Writing – review & editing

    Affiliation Department of Automatic Control, Robotics and Computer Vision, Polytechnic University of Catalonia, Barcelona, Spain

  • Jiayan Huang

    Roles Conceptualization, Funding acquisition, Investigation, Methodology, Visualization, Supervision, Writing – review & editing

    jyan_huang@163.com

    Affiliations College of Computer and Data Science, Putian University, Putian, Fujian, China, Department of Automatic Control, Robotics and Computer Vision, Polytechnic University of Catalonia, Barcelona, Spain, Fujian Ruituo Information Technology Co., Ltd., Putian, Fujian, China

Abstract

The rapid and accurate classification of lychee varieties is crucial for improving production efficiency and optimizing market supply, and is especially urgent in the main lychee-producing areas. However, no comprehensive and diverse lychee benchmark dataset is currently publicly available for training precise classification models. To fill this gap, this work constructs a comprehensive lychee image dataset (Lychee13-3634), which covers 13 varieties and 3634 images. Unlike general fruit datasets, whose fruit images show significant inter-class feature differences, Lychee13-3634 highlights the minor inter-class differences among lychee varieties. Based on this dataset, we applied 21 advanced deep learning-based classification models to validate its usability and effectiveness, and comprehensively evaluated all models to provide meaningful insights. Experimental results show that EfficientNetv2 achieves the best classification performance, with an accuracy of up to 99.90%. Besides, we further analyzed the balance of Lychee13-3634, and the corresponding experiments demonstrate that a more balanced dataset usually leads to better classification performance. In summary, Lychee13-3634 provides benchmark training data for the lychee image classification task, demonstrates the effective application of existing deep learning classification models, and offers reference and inspiration for other agricultural product image recognition research. Lychee13-3634 and all evaluation models are available at https://github.com/jyanhuang/Lychee13-3634.

Introduction

As one of the most popular fruits in tropical and subtropical regions, lychee is not only loved by consumers for its sweet taste, but also valued for its rich nutrition. Precise lychee classification can accurately identify appearance, color, and shape, effectively avoiding the subjectivity and errors of traditional methods, while high speed and efficiency enable faster quality inspection in production. Therefore, high accuracy and efficiency play an indispensable role in production. Recently, with the rapid development of artificial intelligence (AI), technologies such as machine learning and deep learning [1,2] have been widely applied in image classification and identification tasks to improve accuracy and efficiency [3-5]. For the task of lychee image classification, such advanced technology can achieve more accurate and faster classification, laying the foundation for the intelligent development of the lychee industry [6]. However, despite significant progress in image recognition technology [7-11] for classifying other agricultural products, research on lychee image classification still faces many challenges. Compared to other fruits, lychee has many varieties with subtle differences in appearance characteristics, such as color, texture, and shape. Traditional classification methods often rely on human experience, which is not only inefficient but also susceptible to subjective factors. Therefore, building a dataset that comprehensively covers different lychee varieties and appearance characteristics is crucial to advancing lychee image classification technology.

Existing datasets containing lychee images usually treat lychee as only a coarse category among many fruit classes, which is difficult to meet the needs of multi-variety lychee classification in the market. In other words, current lychee datasets suffer from limited data and incomplete variety coverage [12,13], which limits the generalization ability of machine learning and deep learning models in lychee variety recognition [14]. Besides, data imbalance also reduces the recognition accuracy of models for minority lychee varieties [15]. To address these problems, this study proposes a comprehensive lychee dataset (Lychee13-3634) containing 13 typical varieties and 3634 images. Compared to other datasets, Lychee13-3634 is more comprehensive and unified. For example, Fruits-360 does not explicitly disclose a uniform image resolution, while Lychee13-3634 uniformly uses a high-definition resolution of 400 × 400 pixels, which better matches model input requirements and helps improve the training performance of classification models. Moreover, in terms of sample balance, the number of samples differs significantly across fruit categories in Fruits-360, whereas Lychee13-3634 achieves a class imbalance rate (IR) of only 1.6 through careful construction, demonstrating a relatively balanced class distribution. This is crucial for training a comprehensive and accurate lychee classification or recognition model. In summary, through a detailed data acquisition and screening process, we ensured the accuracy and diversity of Lychee13-3634 and provide a new benchmark and test platform for lychee image classification research.

In addition, to verify the effectiveness and reliability of Lychee13-3634, we trained 21 state-of-the-art classification models, including 6 typical ResNet-series, 10 other deep learning-based, and 5 YOLO-series models, on the lychee image classification task. During training, we used a series of performance indicators to comprehensively evaluate the classification performance and efficiency of each model. These indicators directly reflect the performance of each model in lychee image classification. Through in-depth analysis and interpretation of these indicators, we not only identify the advantages and disadvantages of each model in the classification task, but also explore the reasons for the performance differences between models. This detailed analysis helps us better understand the characteristics of each model and provides a useful reference for future research, further promoting the development of lychee image classification technology.

Related existing fruit datasets

In this section, we survey and summarize existing commonly used fruit image datasets (with or without lychee images). Table 1 provides detailed information about each dataset, including publication year, classes, class number, total image number, image size, format, method type (classification, segmentation, or recognition), and whether lychee images are included.

Table 1. The detailed information of different existing fruit datasets.

https://doi.org/10.1371/journal.pone.0334900.t001

The Fruits-360 dataset [12] is a public baseline dataset for classification tasks. It contains 94,110 images of 141 different fruit, vegetable, and nut classes, with 70,491 for training and 23,619 for testing. All images were taken from different angles and under different lighting conditions, providing rich visual diversity. Fruits-360 has a pure white image background, each image contains only one complete fruit, and it includes only 490 lychee images without finer category labels. This collection method guarantees the representation of color, texture, and shape, making the dataset well suited for training and evaluating fruit classification models. Its size and diversity support the effective performance of deep learning models in fruit recognition tasks, and it is commonly used for benchmarking and research development.

The Fruits-262 dataset [13] contains 262 different categories of fruits and is designed for computer vision tasks such as image classification and object recognition. It consists of natural-background images, each containing multiple fruits, with an average of 861 images per class, a median of 1007, and a standard deviation of 276. Building the dataset involved crawling images from the Internet, automatic and manual filtering, and image resizing. The dataset can be used to train convolutional neural network (CNN) models to recognize and classify different fruits, and serves as a valuable resource for studying computer vision and machine learning algorithms.

The FruitNet: Indian-Fruits-Dataset-With-Quality dataset [16] addresses the need for high-quality fruit images. It contains 14,700 high-resolution images of 6 popular Indian fruits. The dataset is divided into 3 subfolders based on fruit quality: 1) good quality, 2) poor quality, and 3) uneven quality. Each subfolder contains images of 6 fruits: apple, banana, guava, lime, orange, and pomegranate. All images were taken with a high-resolution camera under various backgrounds and lighting conditions and are valuable for training, testing, and validating fruit classification or recognition models.

The Fruits-Images dataset [17] is widely used in computer vision and deep learning research. It includes 360 high-quality images of common fruit categories, such as apples, bananas, oranges, kiwifruit, and strawberries, with each image containing multiple fruits of the same variety. Despite the relatively small number of images, the varied shooting conditions and angles provide valuable training data for developing and evaluating classification models. Fruits-Images has accurate class labels, supports supervised learning tasks, and contributes to the progress of image classification and fruit recognition techniques.

The Fruit-Classifiation dataset [18] is designed for image classification and contains 22,495 images from 33 categories, such as watermelon, dragon fruit, and corn. Most images have a size of 100 × 100 pixels, and, following Fruits-360, the images often include rotations to aid training and improve usability. The main application of this dataset is to improve the accuracy of fruit image recognition. In practice, preprocessing steps such as normalization and augmentation are commonly performed to optimize model performance.

The Comprehensive-Fruit-Image dataset [19] is a valuable resource that supports image classification and recognition models. It contains 20 different fruit categories; images were carefully picked and downloaded from the DuckDuckGo search engine using specific fruit names as search criteria, ensuring a diverse and high-quality dataset. In particular, each fruit category contains a large number of pulp images for efficient model training and testing. The images are stored in high-resolution format and organized by fruit type for use during model training and evaluation. The dataset can be used for a variety of machine learning tasks, such as image classification and recognition, and for enhancing computer vision models; it is also a practical resource for students and educators in machine learning and computer vision. By providing a diverse and curated collection of fruit images, it aims to facilitate progress in computer vision and machine learning, enabling researchers and developers to build robust, accurate, and efficient image recognition systems.

To sum up, existing fruit and vegetable datasets contain few specific lychee varieties, or the varieties are not labeled in sufficient detail; lychee merely exists as one fruit category. For example, the widely used Fruits-360 dataset contains a large number of fruit images with distinct visual features, whereas lychee varieties exhibit high similarity and only subtle inter-class differences in color, texture, and shape, posing unique challenges for lychee classification. This highlights the need for a specialized lychee dataset. Besides, with the advancement of agricultural intelligence, fast and accurate classification of lychee varieties is of great significance for optimizing market supply, reducing labor consumption, and improving production efficiency. Effective recognition of different lychee varieties can meet the practical needs of agricultural production, quality control, market classification, and consumer choice. Therefore, a more refined lychee dataset is needed.

The proposed dataset: Lychee13-3634

In this section, we first introduce lychee image acquisition and then the lychee image background preprocessing. Finally, we present an overview of Lychee13-3634 and analyze its balance. The overall flowchart of this work is shown in Fig 1.

Fig 1. The overall flowchart for Lychee13-3634 construction and classification application.

It illustrates the complete process of this work, which mainly includes four stages, i.e., lychee image acquisition, preprocessing, training-test set segmentation, and classification model training.

https://doi.org/10.1371/journal.pone.0334900.g001

Lychee image acquisition

To solve the problem of non-fine-grained lychee classification, we construct a new lychee image dataset (Lychee13-3634), which aims to provide high-quality training and testing data for deep learning models for lychee image classification or recognition. Notably, the construction of Lychee13-3634 is based on three aspects: market prevalence, sales volume, and availability. Specifically, we selected 13 varieties widely distributed in southern China to ensure universality and sufficient data samples for collection. Moreover, their extensive market availability makes sample collection convenient. Meanwhile, existing related research provides references for data annotation, analysis, and model training, which greatly improves data availability and helps to build a high-quality lychee dataset.

All lychee images are of freshly picked fruit and were taken with a high-resolution camera under indoor lighting conditions. Specifically, the capture camera has a 48-megapixel main sensor with a 24 mm focal length and an f/1.78 aperture. This camera configuration ensures high-quality image acquisition and provides a solid foundation for the reliability of the experimental results. During shooting, special attention was paid to factors such as illumination uniformity and background simplicity to ensure clear image quality and true color, reflecting the real appearance characteristics of lychee.

The proposed Lychee13-3634 dataset was collected by the authors themselves. All lychee image data were purchased from publicly available markets in strict compliance with relevant laws, regulations, and ethical guidelines. The purpose of the Lychee13-3634 collection is limited to scientific research, aiming to improve the accuracy and efficiency of lychee image classification and recognition.

Lychee image preprocessing

To facilitate the training of deep learning models, we preprocess the lychee images, including background preprocessing and resizing the images to a uniform resolution suitable for model input (400 × 400 pixels). In addition, we randomly divided Lychee13-3634 into a training set (80%) and a test set (20%) to meet the needs of different research stages. For the background preprocessing, we provide a pseudocode implementation in Algorithm 1.

Algorithm 1 Background preprocessing of lychee image.

Require: original lychee image I with size of W × H, color difference threshold K.

Ensure: processed lychee image J.

1: initialize a two-dimensional array Imap with a size of W × H and set all elements to 2 (representing foreground), while setting the upper left corner element (Imap[0][0]) to 0 (representing background).
2: J = I
3: for x = 0 to W−1 do
4:  for y = 0 to H−1 do
5:   P1 = I[x][y]
6:   if x < W−1 then
7:    P2 = I[x+1][y]
8:    D = EuclideanDistance(P1, P2)
9:    if D < K and Imap[x+1][y] == 2 and Imap[x][y] == 0 then
10:     Imap[x+1][y] = 0, J[x+1][y] = 255
11:    end if
12:   end if
13:   if y < H−1 then
14:    P3 = I[x][y+1]
15:    D = EuclideanDistance(P1, P3)
16:    if D < K and Imap[x][y+1] == 2 and Imap[x][y] == 0 then
17:     Imap[x][y+1] = 0, J[x][y+1] = 255
18:    end if
19:   end if
20:  end for
21: end for
22: return J

function EuclideanDistance(X, Y): return the Euclidean distance between the RGB values of pixels X and Y, i.e., sqrt((X_R − Y_R)² + (X_G − Y_G)² + (X_B − Y_B)²).

Note: this background preprocessing approach assumes a uniform background; alternative methods may be needed for complex field conditions, such as varying lighting or occlusions.
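For concreteness, the scan in Algorithm 1 can be sketched in Python. This is a minimal illustration assuming `uint8` RGB NumPy arrays; the function names and the default threshold `k` are ours, not part of the original implementation.

```python
import numpy as np

def euclidean_distance(p, q):
    """RGB color distance between two pixels."""
    return float(np.sqrt(np.sum((p.astype(float) - q.astype(float)) ** 2)))

def whiten_background(image, k=30.0):
    """Single forward scan that propagates the background label from the
    top-left corner to color-similar right/lower neighbors and paints
    background pixels white, in the spirit of Algorithm 1."""
    h, w = image.shape[:2]
    imap = np.full((w, h), 2, dtype=np.uint8)  # 2 = foreground
    imap[0, 0] = 0                             # 0 = background seed
    out = image.copy()
    for x in range(w):
        for y in range(h):
            if imap[x, y] != 0:                # only grow from background
                continue
            if x < w - 1 and imap[x + 1, y] == 2:
                if euclidean_distance(image[y, x], image[y, x + 1]) < k:
                    imap[x + 1, y] = 0
                    out[y, x + 1] = 255
            if y < h - 1 and imap[x, y + 1] == 2:
                if euclidean_distance(image[y, x], image[y + 1, x]) < k:
                    imap[x, y + 1] = 0
                    out[y + 1, x] = 255
    return out
```

Because the scan only moves right and down, strongly concave background regions may require a second pass or a full flood fill in practice.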

Fig 2 provides a comparison of lychee samples before and after background preprocessing. After background preprocessing, noise such as shadows caused by unstable lighting during image capture has been removed. This makes the dataset more standardized and unified, thereby reducing the bias introduced into model training.

Fig 2. Comparison of lychee samples before and after background preprocessing.

It can be seen that before processing, there are interferences such as reflections, uneven lighting, and blurring in the image background, while after processing, the background is clear and uniform, making the images easy to fine-tune in other research.

https://doi.org/10.1371/journal.pone.0334900.g002

Lychee13-3634 overview

Lychee13-3634 is a comprehensive and standardized lychee image dataset containing 3634 images and covering 13 representative varieties. These varieties have high popularity and economic value in the market, covering the main types and variations of lychee. Table 2 and Fig 3 provide a detailed summary of Lychee13-3634, and Fig 4 shows single-lychee and multiple-lychee samples of the 13 varieties.

Fig 3. The distribution bar chart of the proposed Lychee13-3634.

Each category includes different numbers of images with a single lychee and images containing multiple lychees of the same variety, and the average number of single lychee images is around 220, while the average number of multiple lychee images is around 60.

https://doi.org/10.1371/journal.pone.0334900.g003

Fig 4. Examples of 13 different categories of single-lychee and multiple-lychees images from Lychee13-3634.

(a) Blackleaf. (b) Chicken Mouth. (c) Crystalball. (d) Feizixiao. (e) Guiwei. (f) HangingGreen. (g) LycheeKing. (h) Nuomiei. (i) Seedless. (j) TopScoreRed. (k) WhiteSuggerPoppy. (l) Whitewax. (m) Xianjinfeng. The first row displays images of a single lychee of each category, while the second row displays images of multiple-lychees of the same variety. It can be seen that the differences between most categories are small and difficult to distinguish directly.

https://doi.org/10.1371/journal.pone.0334900.g004

Table 2. The specific details of our proposed Lychee13-3634 Dataset.

https://doi.org/10.1371/journal.pone.0334900.t002

To the best of our knowledge, the proposed Lychee13-3634 dataset is the first benchmark dataset for fine-grained classification of lychee to date. However, there are still certain limitations. On one hand, it does not include information on the maturity of the lychees, which may affect the model’s generalizability. On the other hand, the sizes and production areas of the different lychee varieties were not recorded during data collection, which may limit the dataset’s applicability in certain research scenarios. Additionally, while efforts were made to ensure data quality, potential biases may still exist due to factors such as uneven sample distribution or variations in shooting conditions.

Lychee13-3634 balance analysis

For further analysis, we use the imbalance rate (IR) [20], an important indicator of the imbalance degree of a dataset. IR is defined as the ratio of the number of samples in the majority class to that in the minority class. A higher IR value typically leads to poorer classification performance, especially for minority classes. It is expressed as

IR = N_max / N_min, (1)

where N_max and N_min denote the maximum and the minimum sample size in a set, respectively. The closer the IR value is to 1, the more balanced the dataset. The specific IR ratios of Lychee13-3634 are shown in Table 3. The IR between the ’single’ category (images containing a single lychee) and the ’multiple’ category (images containing multiple lychees of the same variety) is 3.6 (the majority category is ’single’, the minority is ’multiple’). The IR of the entire dataset (both ’single’ and ’multiple’ categories) is 1.6 (the majority category is ’HangingGreen lychee’, the minority is ’Guiwei lychee’). The IR within the ’single’ category is 1.7 (majority ’HangingGreen lychee’, minority ’Guiwei lychee’), and within the ’multiple’ category it is 1.9 (majority ’TopScoreRed lychee’, minority ’Whitewax lychee’). In summary, Lychee13-3634 is relatively balanced compared to other fruit datasets. Data balance is crucial for ensuring that models perform well across all categories, especially for datasets with fewer categories.

Experimental results on Lychee13-3634

To create a high-quality and highly representative experimental setup, we adopt stratified random sampling to divide Lychee13-3634 into a fixed 80% training split (2907 images) and 20% test split (727 images), maintaining a variety distribution similar to that of the original dataset.
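The stratified split can be reproduced with a short per-class partition. The following is a sketch using only the standard library (function name and seed are ours); in practice, `sklearn.model_selection.train_test_split` with its `stratify=` parameter achieves the same effect.

```python
import random
from collections import defaultdict

def stratified_split(samples, labels, test_ratio=0.2, seed=42):
    """Split per class so the test set mirrors the variety distribution."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for s, y in zip(samples, labels):
        by_class[y].append(s)
    train, test = [], []
    for y, items in by_class.items():
        rng.shuffle(items)
        n_test = round(len(items) * test_ratio)
        test.extend((s, y) for s in items[:n_test])
        train.extend((s, y) for s in items[n_test:])
    return train, test
```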

Image classification models

To validate the applicability and effectiveness of Lychee13-3634, we selected 16 representative or recent deep learning-based classification models [21-23], including ResNet152, ResNet101, ResNet50, ResNet34, ResNet18 [24], EfficientNetv2-s, EfficientNetv2-m, EfficientNetv2-l [25], SENet [26], Vision Transformer [27], Res-Att-Net [28], MobileViT [29], SqueezeNet [30], ShuffleNetv2 [31], MobileNetV2 [32], and MobileNetV4 [33], as well as 5 YOLO (-v5 [34], -v7 [35], -v8 [36], -v9 [37], and -v10 [38]) algorithms for experiments. These classifiers were chosen for their superior performance, lower computational complexity, and good generalization ability, particularly in classification tasks. Specifically, the ResNet series solves the gradient vanishing problem of traditional neural networks through residual connections, enabling deeper training on lychee images and better feature extraction; moreover, its different depth versions can be flexibly adapted to the dataset size and computing resources, an advantage over many other architectures. The other deep learning-based models offer the unique advantages of attention mechanisms and lightweight design, meeting the varied analysis requirements of the lychee dataset. The YOLO series has excellent real-time object detection performance and can provide fast and accurate lychee classification results, which is difficult for most other architectures to achieve. All compared methods were coded with the PyTorch framework. To ensure optimal performance of each model, we retained their original hyper-parameter settings, as detailed in Table 4. Notably, in the last column, entries marked ’Fixed (8:2)’ indicate that the method uses a fixed 8:2 training/testing split.

Table 4. The detailed hyper-parameter settings of different models.

https://doi.org/10.1371/journal.pone.0334900.t004

Experimental evaluation indicators

Classification effectiveness evaluation indicators. In this paper, we use common evaluation indicators [39,40], including Precision (Pre.), Recall [41] (Rec.), F1-Score [42] (F1.), Accuracy [43] (Acc.), top-1 and top-5 [44], to comprehensively evaluate and analyze the classification performance of different models on both the Fruits-360 and our Lychee13-3634. In particular, top-1 accuracy is the proportion of samples for which the model’s highest predicted probability class matches the actual class, while top-5 accuracy is that for which the actual class is among the top five predicted probability classes. The two indicators are particularly suitable for multi-classification tasks. They reflect the model’s performance in single predictions and candidate predictions, respectively. All the above evaluation indicators can respectively be defined as,

Pre. = TP / (TP + FP), (2)

Rec. = TP / (TP + FN), (3)

F1. = 2 × Pre. × Rec. / (Pre. + Rec.), (4)

Acc. = (TP + TN) / (TP + TN + FP + FN), (5)

where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
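These indicators follow directly from the confusion counts and the class score ranking; a minimal sketch (function names are ours):

```python
def classification_metrics(tp, fp, fn, tn):
    """Precision, recall, F1, and accuracy from confusion counts."""
    pre = tp / (tp + fp)
    rec = tp / (tp + fn)
    f1 = 2 * pre * rec / (pre + rec)
    acc = (tp + tn) / (tp + fp + fn + tn)
    return pre, rec, f1, acc

def topk_correct(scores, true_idx, k=5):
    """True if the true class is among the k highest-scoring classes
    (k=1 and k=5 give the top-1 and top-5 indicators)."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return true_idx in ranked[:k]
```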

Model performance evaluation indicators. The number of model parameters (Params) and the model prediction time (Times) are used for model performance evaluation. Params refers to the total number of trainable parameters in the model; it directly affects the model’s complexity and computational cost. In the lychee classification task, choosing a model with a moderate parameter count can reduce computational resource consumption while maintaining performance. Times refers to the duration required for the model to make a prediction, including forward propagation and any necessary post-processing steps. In real-time application scenarios, prediction time is a critical metric: shorter prediction times mean the model can respond to queries more quickly, enhancing the user experience.
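In PyTorch, the framework used for all compared methods, both indicators can be measured as follows (a sketch; function names are ours):

```python
import time
import torch
import torch.nn as nn

def count_params(model: nn.Module) -> int:
    """Total trainable parameters: the 'Params' indicator."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

@torch.no_grad()
def avg_predict_time(model: nn.Module, input_shape, runs=10):
    """Average forward-pass duration in seconds: the 'Times' indicator."""
    model.eval()
    x = torch.randn(*input_shape)
    model(x)  # warm-up run excluded from timing
    start = time.perf_counter()
    for _ in range(runs):
        model(x)
    return (time.perf_counter() - start) / runs
```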

Lychee13-3634 evaluation

In this section, we apply 16 typical state-of-the-art deep learning-based models and 5 YOLO-series models for multi-class analysis on both our Lychee13-3634 and the existing Fruits-360.

Classification results on Lychee13-3634.

To comprehensively evaluate the effectiveness and reliability of the constructed Lychee13-3634, we applied a variety of ResNet-based, other deep learning-based, and YOLO-series models to lychee image classification experiments. Note that for the YOLO-series models, we first annotated Lychee13-3634 and then used five common YOLO models for lychee classification. The input image size is set to 256 × 256. The classification results are shown in Table 5. SqueezeNet shows the worst performance, with an accuracy of 71.53%, which may be due to its simple architecture, limited model capacity, and lack of targeted optimization for multi-class tasks. In contrast, EfficientNetv2-s shows excellent performance with an accuracy of 99.99%. Although EfficientNetv2-m achieves the same performance as EfficientNetv2-s, its parameter count is significantly larger (54.14M vs. 21.46M), which directly increases its computational cost and average prediction time. For the YOLO series, all models achieve good classification performance on Lychee13-3634; in particular, YOLOv7 and YOLOv10 stand out, with both precision and mAP50 at 98%, showing strong classification ability.

Table 5. The classification results of different models on Lychee13-3634.

https://doi.org/10.1371/journal.pone.0334900.t005

To sum up, considering both classification accuracy and model efficiency, the EfficientNetv2-s model has the most prominent advantages for lychee image classification. To observe its classification performance more intuitively, we provide its confusion matrix in Fig 5. The model’s most frequent error is misclassifying ChickenMouth as Xianjinfeng (8%), followed by misclassifying Guiwei as Blackleaf (2%). Fig 6 gives visual examples of these commonly misclassified lychee varieties. Such misclassification is likely caused by the similarity of the lychees themselves (e.g., Guiwei and Blackleaf), as well as uncontrollable factors such as lighting, shooting distance, and freshness during data acquisition. This is also an important direction for future improvement of the dataset.

Fig 5. The confusion matrix of the best performing model EfficientNetv2-s.

It can be found that except for the misclassification probability of 0.08 for ’ChickenMouth_lychee’ as ’Xianjinfeng_lychee’ and 0.02 for ’Guiwei_lychee’ as ’Blackleaf_lychee’, all other diagonal values are 1.0, which demonstrates the strength and reliability of EfficientNetv2-s in lychee classification tasks.

https://doi.org/10.1371/journal.pone.0334900.g005

Fig 6. Two examples of commonly misclassified lychee varieties.

(a) shows the easily confused Guiwei and Blackleaf varieties, which have similar textures and sizes. (b) shows the easily confused ChickenMouth and Xianjinfeng varieties, which have similar peel colors. This shows that the misclassification of lychees may not only be caused by the natural texture itself, but also by external factors such as freshness, angle, distance, and lighting when shooting.

https://doi.org/10.1371/journal.pone.0334900.g006

In addition, to verify whether the performance differences between models are significant, we use K-fold cross validation (K=5 in this paper) for statistical testing; through multiple iterations, each image participates in both training and testing, avoiding the random deviation of a single split. Note that, considering the uniformity of the various indicators across models and the fact that the other indicators can be derived indirectly from Acc. and Pre., we use only Acc. and Pre. as the result indicators for the statistical test of model differences. As shown in Table 6, the analysis of variance (ANOVA) based on the K-fold results shows that the performance differences between models are highly statistically significant. The average accuracy of high-performance models such as EfficientNetv2-l and SENet exceeds 99%, which is 2.46% higher than that of medium-performance models such as ResNet-152 (96.56%), and the difference is significant, while the accuracy of lightweight models (such as SqueezeNet) is below 75%, more than 15% lower than the other models. Among models of the same type, there is no significant difference between YOLOv5 and YOLOv9, though the former is more stable. Vision Transformer has a slight advantage over traditional CNN-based methods but is inferior to high-performance CNN-based methods. This indicates a real gap in the ability of different architectures to extract features from lychee images. In practical applications, models can be selected based on accuracy requirements and resource constraints.
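The K-fold protocol used for this statistical test can be sketched with the standard library (index generation only; the function name and seed are ours):

```python
import random

def kfold_indices(n_samples, k=5, seed=42):
    """Shuffled K-fold splits: each sample appears in exactly one test
    fold and in the training split of the other K-1 folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [(sorted(set(idx) - set(f)), sorted(f)) for f in folds]
```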

Table 6. The average accuracy and precision of different models based on K-fold (K = 5) cross-validation.

https://doi.org/10.1371/journal.pone.0334900.t006

Recognition results on Lychee13-3634.

YOLO-series models are renowned for their superior object recognition capabilities. To further validate the broad applicability of Lychee13-3634, we perform the lychee image recognition task by training five different YOLO models on it. The recognition results are shown in Table 7: YOLOv10 achieves the best overall recognition results. To some extent, the data in Table 7 also reflect the applicability of Lychee13-3634 to image recognition.

Table 7. The recognition results of YOLO-series models on Lychee13-3634.

https://doi.org/10.1371/journal.pone.0334900.t007

Classification results on Fruits-360.

To verify the effectiveness of Lychee13-3634 and explore the generalization ability of each model across datasets, we conducted an extended experiment on Fruits-360 [12], a dataset widely recognized in the field of fruit classification. Fruits-360 covers a wide range of fruit types with abundant images, providing diverse test scenarios for the classification models.

The classification results on Fruits-360 are shown in Table 8. Notably, EfficientNetv2-s once again shows excellent performance, with an accuracy as high as 99.9%; for the classification of both diverse fruits and lychee varieties, EfficientNetv2 achieves the best results and the highest applicability. In contrast, ResNet34 performs relatively worse, with an accuracy of 91.2%. Although SqueezeNet performs poorly in lychee classification, it adapts well to multi-fruit classification, which may be related to differences in dataset characteristics. In summary, the classification results of each model on Lychee13-3634 are consistent with those on Fruits-360, reflecting the similarities and differences between the two datasets. This consistency strongly supports the effectiveness of the proposed lychee dataset in capturing lychee characteristics, with generalization comparable to other widely recognized fruit datasets such as Fruits-360. Notably, owing to the large size of Fruits-360 and the lack of label files directly applicable to YOLO-series models, we did not test the YOLO-series models on it in this study.

Table 8. The classification results of different models on Fruits-360.

https://doi.org/10.1371/journal.pone.0334900.t008

Based on the experimental results on Lychee13-3634 and Fruits-360, significant performance differences between models can be observed. In terms of classification performance, ResNet-152 successfully alleviates the vanishing-gradient problem thanks to its sufficient depth and effective residual structure, while Res-Att-Net is better matched to the data characteristics; this demonstrates that the feature distribution and complexity of a dataset directly affect model performance. The EfficientNetv2 series performs well on both datasets, demonstrating the broad adaptability of its architecture. VisionTransformer has a low recall on Fruits-360, probably because of its weak ability to capture local detailed features. In terms of parameter optimization, the general strategy is to adjust the learning rate dynamically, for example using a large learning rate at the initial stage of training for fast convergence and reducing it at later stages to avoid overshooting the optimum. Weight initialization can be chosen according to the model structure: residual networks can adjust skip connections or add batch normalization, and the EfficientNetv2 series can optimize its data augmentation strategy to further improve performance.
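As an illustration of the "large early, small late" learning-rate strategy, a minimal step-decay schedule can be sketched as follows; the base rate, decay factor, and drop interval here are hypothetical values, not the settings used in our experiments:

```python
def step_decay_lr(base_lr, epoch, drop=0.1, epochs_per_drop=30):
    """Step decay: multiply the base learning rate by `drop`
    every `epochs_per_drop` epochs, so training starts fast
    and refines slowly near the end."""
    return base_lr * (drop ** (epoch // epochs_per_drop))

# Schedule over a hypothetical 90-epoch run with base_lr = 0.1
lrs = [step_decay_lr(0.1, e) for e in (0, 29, 30, 60)]
print(lrs)  # large at the start, then reduced in steps
```

Deep learning frameworks provide equivalent built-in schedulers (step, cosine, warm-up variants); the point of the sketch is only the convergence logic described above.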

Model efficiency evaluation

In addition, we comprehensively compare the parameter number and running time of each model on both Lychee13-3634 and Fruits-360, as shown in Table 9. On Lychee13-3634, Res-Att-Net has the shortest average classification time (0.0011 s), followed by MobileNet (0.0014 s); although EfficientNetv2-l has excellent classification performance, its average time is the longest (1.0764 s). On Fruits-360, SqueezeNet has the shortest average time (0.0001 s), while the EfficientNet series again takes the longest (0.9758 s). In terms of parameter number, SqueezeNet has the fewest parameters (0.73M) and EfficientNetv2-l has the most (118.52M). In summary, EfficientNetv2 offers the best classification performance, but also the longest running time and the largest parameter count.
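The average-time figures in Table 9 are wall-clock measurements per image. A minimal timing helper of this kind reproduces the procedure for any callable model; the warm-up and repeat counts below are arbitrary choices, and the `sum` callable is a toy stand-in for a model's forward pass:

```python
import time

def average_inference_time(model_fn, x, warmup=3, repeats=20):
    """Average wall-clock seconds per call of `model_fn`,
    after warm-up runs that exclude one-off setup costs."""
    for _ in range(warmup):
        model_fn(x)
    start = time.perf_counter()     # high-resolution monotonic timer
    for _ in range(repeats):
        model_fn(x)
    return (time.perf_counter() - start) / repeats

# Toy stand-in for a classifier's forward pass
avg_s = average_inference_time(lambda v: sum(v), list(range(10000)))
print(f"average time: {avg_s:.6f} s")
```

For GPU models, a real benchmark would additionally synchronize the device before reading the timer, since kernel launches are asynchronous.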

Table 9. Comparisons of parameter number and time consumption of different models on both Lychee13-3634 and Fruits-360.

https://doi.org/10.1371/journal.pone.0334900.t009

Ablation studies

In this section, we conduct ablation studies on background preprocessing and dataset balance for Lychee13-3634.

Impact of dataset preprocessing on model.

To evaluate the impact of different background preprocessing techniques on classification performance, we compared the accuracy of three models (SqueezeNet, EfficientNetV2-S, and YOLOv8) under no background preprocessing, black-background preprocessing, and white-background preprocessing, respectively. The three selected models cover classifiers of different performance levels, allowing a more comprehensive analysis of how background processing and model architecture affect classification performance.
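The black- and white-background variants can be produced by masking out non-fruit pixels and filling them with a uniform value. The grayscale sketch below is a minimal illustration with a toy 2x2 image and hand-made mask; real preprocessing would first segment the fruit to obtain the mask:

```python
def replace_background(image, mask, fill):
    """Replace every pixel where the mask is 0 (background)
    with a uniform `fill` value (0 = black, 255 = white)."""
    return [[px if m else fill for px, m in zip(row, mrow)]
            for row, mrow in zip(image, mask)]

img  = [[10, 20],
        [30, 40]]      # toy grayscale image
mask = [[1, 0],
        [0, 1]]        # 1 = fruit pixel, 0 = background pixel

white_bg = replace_background(img, mask, 255)  # white-background variant
black_bg = replace_background(img, mask, 0)    # black-background variant
```

The same per-pixel rule extends directly to RGB arrays, where the fill becomes (0, 0, 0) or (255, 255, 255).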

As shown in Table 10, background processing has a significant impact on model performance. Specifically, white-background preprocessing improves classification accuracy, while black-background preprocessing degrades it. Moreover, models differ in their adaptability to background changes: EfficientNet shows the most stable performance across background treatments, SqueezeNet is sensitive to background changes, and YOLOv8 performs well on unprocessed backgrounds but adapts poorly to black backgrounds.

Table 10. The accuracy comparison of three typical models based on different background preprocessing for Lychee13-3634.

https://doi.org/10.1371/journal.pone.0334900.t010

Impact of dataset IR on model.

To evaluate the impact of dataset balance on model classification performance, we downsampled every category of Lychee13-3634 to the size of its smallest category, constructing a balanced dataset. As examples, we chose one algorithm each from the ResNet series (ResNet50), the deep learning series (MobileViT), and the YOLO series (YOLOv8) to conduct the ablation study.
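Constructing the balanced variant amounts to random undersampling of each class to the minimum class size (driving the imbalance ratio toward 1). A minimal sketch, with image file paths replaced by toy integer IDs and a fixed seed for reproducibility:

```python
import random

def undersample_to_min(samples_by_class, seed=0):
    """Randomly downsample every class to the size of the smallest
    class, yielding a balanced dataset (IR close to 1)."""
    rng = random.Random(seed)                       # fixed seed: reproducible split
    n_min = min(len(v) for v in samples_by_class.values())
    return {c: rng.sample(v, n_min) for c, v in samples_by_class.items()}

# Toy class inventory: sample counts 5, 3, and 7
dataset = {"Guiwei": list(range(5)),
           "Blackleaf": list(range(3)),
           "Nomici": list(range(7))}
balanced = undersample_to_min(dataset)
```

Oversampling the minority classes (e.g. via augmentation) is the complementary strategy mentioned as future work; the undersampling form is shown here because it matches the alignment-to-smallest-category procedure described above.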

As shown in Table 11, after dataset balancing, the accuracy, precision, and F1-score of the three models improve to varying degrees. This indicates that dataset balance is crucial for improving model generalization and classification performance. Future research will further explore optimizing dataset balance through data augmentation and sampling techniques, thereby improving model robustness and classification accuracy.
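For reference, the accuracy, precision, and F1-score indicators reported in the tables can be computed directly from label lists. The sketch below assumes macro averaging over classes (a common choice for multi-class evaluation; the labels are toy values, not our predictions):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, macro-averaged precision, and macro-averaged F1
    computed from parallel lists of true and predicted labels."""
    labels = sorted(set(y_true) | set(y_pred))
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precisions, f1s = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec  = tp / (tp + fn) if tp + fn else 0.0
        precisions.append(prec)
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return acc, sum(precisions) / len(labels), sum(f1s) / len(labels)

# Toy 2-class example
acc, macro_prec, macro_f1 = classification_metrics([0, 0, 1, 1], [0, 1, 1, 1])
```

Macro averaging weights all 13 lychee varieties equally, which is exactly why class imbalance depresses these indicators and why balancing the dataset improves them.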

Table 11. The accuracy comparison of 3 typical models under different IR of Lychee13-3634.

https://doi.org/10.1371/journal.pone.0334900.t011

Conclusion

In this paper, we construct a comprehensive and diverse benchmark lychee image dataset (Lychee13-3634), which consists of 13 varieties and 3634 images in a uniform format. To our knowledge, Lychee13-3634 is currently the only standard, publicly available dataset for lychee image classification. To demonstrate its availability and effectiveness, we conducted classification experiments with 21 typical and recent classification models on it and on an existing fruit dataset (Fruits-360). Experimental results show that the models achieve similar classification performance on the two datasets, which reflects the usability of Lychee13-3634. We also conduct ablation studies on different preprocessing technologies and dataset IR, providing meaningful insights that lay a foundation for future lychee classification algorithms. Although Lychee13-3634 shows good universality in classification tasks, it still has certain limitations, such as the lack of information on lychee maturity, size, and harvesting location, which may limit its applicability in certain research scenarios. Moreover, since the experiments were conducted under controlled laboratory conditions, they may not fully represent the challenges of real-world agricultural deployment. Future work will therefore focus on investigating model robustness under varying field conditions to bridge the gap between laboratory and real-world applications.

References

  1. Wang C-H, Huang K-Y, Yao Y, Chen J-C, Shuai H-H, Cheng W-H. Lightweight deep learning: an overview. IEEE Consumer Electron Mag. 2024;13(4):51–64.
  2. Kotsiantis SB, Zaharakis I, Pintelas P. Supervised machine learning: a review of classification techniques. Emerging Artificial Intelligence Applications in Computer Engineering. 2007;160(1):3–24.
  3. Bhargava A, Bansal A. Fruits and vegetables quality evaluation using computer vision: a review. Journal of King Saud University - Computer and Information Sciences. 2021;33(3):243–57.
  4. Kumari RSS, Gomathy V. Fruit shop tool: fruit classification and recognition using deep learning. Agricultural Engineering International: CIGR Journal. 2023;25(2):312–21.
  5. Taşcı B, Acharya MR, Datta Barua P, Metehan Yildiz A, Veysel Gun M, Keles T, et al. A new lateral geniculate nucleus pattern-based environmental sound classification using a new large sound dataset. Applied Acoustics. 2022;196:108897.
  6. Guo Q, Chen Y, Tang Y, Zhuang J, He Y, Hou C, et al. Lychee fruit detection based on monocular machine vision in orchard environment. Sensors (Basel). 2019;19(19):4091. pmid:31546669
  7. Ordóñez FJ, Roggen D. Deep convolutional and LSTM recurrent neural networks for multimodal wearable activity recognition. Sensors (Basel). 2016;16(1):115. pmid:26797612
  8. Jafri R, Arabnia HR. A survey of face recognition techniques. Journal of Information Processing Systems. 2009;5(2):41–68.
  9. Maggiori E, Tarabalka Y, Charpiat G, Alliez P. Convolutional neural networks for large-scale remote-sensing image classification. IEEE Trans Geosci Remote Sensing. 2017;55(2):645–57.
  10. Driscoll JR, Healy DM. Computing Fourier transforms and convolutions on the 2-sphere. Advances in Applied Mathematics. 1994;15(2):202–50.
  11. Shafiq M, Gu Z. Deep residual learning for image recognition: a survey. Applied Sciences. 2022;12(18):8972.
  12. Rathnayake N, Rathnayake U, Dang TL, Hoshino Y. An efficient automatic Fruit-360 image identification and recognition using a novel modified cascaded-ANFIS algorithm. Sensors (Basel). 2022;22(12):4401. pmid:35746183
  13. Elhamouly S, Meselhy E, Gamal F. A comparative study of state-of-the-art algorithms for plant recognition and classification on a large dataset. IJCI International Journal of Computers and Information. 2023;10(3):164–74.
  14. Xiong J, Zou X, Chen L, Guo A. Recognition of mature litchi in natural environment based on machine vision. Transactions of the Chinese Society for Agricultural Machinery. 2011;42(9):162–6.
  15. Barbedo JGA. Impact of dataset size and variety on the effectiveness of deep learning and transfer learning for plant disease classification. Computers and Electronics in Agriculture. 2018;153:46–53.
  16. Meshram V, Patil K. FruitNet: Indian fruits image dataset with quality for machine learning applications. Data Brief. 2021;40:107686. pmid:34917715
  17. Kausar A, Sharif M, Park J, Shin DR. Pure-CNN: a framework for fruit images classification. In: 2018 International Conference on Computational Science and Computational Intelligence (CSCI). 2018. p. 404–8.
  18. Gulzar Y. Fruit image classification model based on MobileNetV2 with deep transfer learning technique. Sustainability. 2023;15(3):1906.
  19. Wijaya YF, Hindarto D. Advancing fruit image classification with state-of-the-art deep learning techniques. SinkrOn. 2024;8(2):1125–34.
  20. Zhu R, Guo Y, Xue J-H. Adjusting the imbalance ratio by the dimensionality of imbalanced data. Pattern Recognition Letters. 2020;133:217–23.
  21. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521(7553):436–44. pmid:26017442
  22. Sarker IH. Deep learning: a comprehensive overview on techniques, taxonomy, applications and research directions. SN Comput Sci. 2021;2(6):420. pmid:34426802
  23. Espinoza S, Aguilera C, Rojas L, Campos PG. Analysis of fruit images with deep learning: a systematic literature review and future directions. IEEE Access. 2024;12:3837–59.
  24. He K, Zhang X, Ren S, Sun J. Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2016. p. 770–8. https://doi.org/10.1109/cvpr.2016.90
  25. Tan M, Le Q. EfficientNetV2: smaller models and faster training. In: Proceedings of the International Conference on Machine Learning. PMLR; 2021. p. 10096–106.
  26. Hu J, Shen L, Sun G. Squeeze-and-excitation networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. p. 7132–41.
  27. Dosovitskiy A, Beyer L, Kolesnikov A, Weissenborn D, Zhai X, Unterthiner T, et al. An image is worth 16x16 words: Transformers for image recognition at scale. 2020. https://arxiv.org/abs/2010.11929
  28. Wang F, Jiang M, Qian C, Yang S, Li C, Zhang H. Residual attention network for image classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2017. p. 3156–64.
  29. Mehta S, Rastegari M. MobileViT: light-weight, general-purpose, and mobile-friendly vision transformer. 2021. https://arxiv.org/abs/2110.02178
  30. Iandola FN, Han S, Moskewicz MW, Ashraf K, Dally WJ, Keutzer K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. 2016. https://arxiv.org/abs/1602.07360
  31. Ma N, Zhang X, Zheng H-T, Sun J. ShuffleNet V2: practical guidelines for efficient CNN architecture design. In: Proceedings of the European Conference on Computer Vision. 2018. p. 116–31.
  32. Sandler M, Howard A, Zhu M, Zhmoginov A, Chen L-C. MobileNetV2: inverted residuals and linear bottlenecks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2018. p. 4510–20.
  33. Qin D, Leichner C, Delakis M, Fornoni M, Luo S, Yang F. MobileNetV4: universal models for the mobile ecosystem. In: Proceedings of the European Conference on Computer Vision. 2024. p. 78–96.
  34. Redmon J, Divvala S, Girshick R, Farhadi A. You only look once: unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016. p. 779–88.
  35. Zou Z, Chen K, Shi Z, Guo Y, Ye J. Object detection in 20 years: a survey. Proc IEEE. 2023;111(3):257–76.
  36. Lawal OM, Zhao H, Zhu S, Chuanli L, Cheng K. Lightweight fruit detection algorithms for low-power computing devices. IET Image Processing. 2024;18(9):2318–28.
  37. Chen Y, Xu H, Zhang X, Gao P, Xu Z, Huang X. An object detection method for bayberry trees based on an improved YOLO algorithm. International Journal of Digital Earth. 2023;16(1):781–805.
  38. Lee J, Hwang K. YOLO with adaptive frame control for real-time object detection applications. Multimed Tools Appl. 2021;81(25):36375–96.
  39. Cawley GC, Talbot NLC. On over-fitting in model selection and subsequent selection bias in performance evaluation. The Journal of Machine Learning Research. 2010;11:2079–107.
  40. Opitz J. A closer look at classification evaluation metrics and a critical reflection of common evaluation practice. Transactions of the Association for Computational Linguistics. 2024;12:820–36.
  41. Sajjadi MSM, Bachem O, Lucic M, Bousquet O, Gelly S. Assessing generative models via precision and recall. Advances in Neural Information Processing Systems. 2018;31.
  42. Takahashi K, Yamamoto K, Kuchiba A, Koyama T. Confidence interval for micro-averaged F1 and macro-averaged F1 scores. Appl Intell (Dordr). 2022;52(5):4961–72. pmid:35317080
  43. Gunawardana A, Shani G. A survey of accuracy evaluation metrics of recommendation tasks. Journal of Machine Learning Research. 2009;10(12):2935–62.
  44. Zhang Y, Chen Z, Zhong Z. arXiv:2107.03815. 2021.