Abstract
Estimation of fruit quality parameters is usually based on destructive techniques, which are tedious, costly, and impractical when large numbers of fruits must be assessed. Non-destructive techniques such as image processing and spectral reflectance offer an alternative for rapid detection of fruit quality parameters. This study assessed the potential of image processing, spectral reflectance indices (SRIs), and machine learning models, namely decision tree (DT) and random forest (RF), to quantitatively estimate characteristics of mandarin and tomato fruits at different ripening stages. Quality parameters including chlorophyll a (Chl a), chlorophyll b (Chl b), total soluble solids (TSS), titratable acidity (TA), TSS/TA, carotenoids (car), lycopene, and firmness were measured. The results showed that Red-Green-Blue (RGB) indices and newly developed SRIs were highly effective for quantifying different fruit properties. For example, the R2 of the relationships between all RGB indices (RGBI) and the measured parameters ranged from 0.62 to 0.96 for mandarin and from 0.29 to 0.90 for tomato. Among the RGBI, the visible atmospherically resistant index (VARI) and normalized red (Rn) showed the highest R2 (0.96) with car of mandarin fruits, whereas the excess red vegetation index (ExR) showed the highest R2 (0.84) with car of tomato fruits. Among the SRIs, RSI 710,600 and R730,650 showed the greatest R2 with Chl a (R2 = 0.80) for mandarin fruits, while GI had the greatest R2 with Chl a (R2 = 0.68) for tomato fruits. Combining RGBI and SRIs with DT and RF models proved a robust strategy for estimating the eight measured variables with reasonable accuracy. For mandarin fruits, the DT-2HV model delivered exceptional results in predicting Chl a, with an R2 of 0.993 and an RMSE of 0.149 for the training set, and an R2 of 0.991 and an RMSE of 0.114 for the validation set.
Similarly, for tomato fruits, the DT-5HV model performed best in predicting Chl a, achieving an R2 of 0.905 and an RMSE of 0.077 for the training dataset, and an R2 of 0.785 with an RMSE of 0.077 for the validation dataset. Overall, RGBI, the newly developed SRIs, and DT and RF models based on RGBI and SRIs could be used to evaluate the measured parameters of mandarin and tomato fruits.
Citation: Elmetwalli AH, Derbala A, Alsudays IM, Al-Shahari EA, Elhosary M, Elsayed S, et al. (2024) Machine learning-driven assessment of biochemical qualities in tomato and mandarin using RGB and hyperspectral sensors as nondestructive technologies. PLoS ONE 19(8): e0308826. https://doi.org/10.1371/journal.pone.0308826
Editor: Sajid Ali, Bahauddin Zakariya University, PAKISTAN
Received: November 11, 2023; Accepted: July 31, 2024; Published: August 26, 2024
Copyright: © 2024 Elmetwalli et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper and its Supporting information files.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through the Large Groups Project under grant number R.G.P.2/60/45. The authors also thank Princess Nourah bint Abdulrahman University Researchers for funding this work through project number PNURSP2024R365, Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Competing interests: The authors have declared that no competing interests exist.
1. Introduction
One of the fastest-growing agribusiness industries in Egypt is the production of fruits and vegetables. Citrus and vegetable crops are well adapted to Egypt’s temperate environment. Citrus is the main fruit crop grown in Egypt, followed by mangoes, grapes, olives, and bananas. Citrus fruits such as oranges (Citrus aurantium) and mandarins (Citrus reticulata) are among Egypt’s top exports; in 2020, Egypt was the world’s top exporter of citrus, exporting nearly 2 million Mg [1]. In Egypt, mandarins account for roughly 25% of total citrus production, with a planted area of about 46,036 ha, a total production of 1,038,753 Mg, and an average productivity of 22.6 Mg/ha. Citrus juice extracts contain important antioxidants because they are a rich source of phenolic compounds [2]. Ahmed et al. [3] also reported that citrus juice is a rich source of ascorbic acid, vitamins, and antioxidants that are important for human health.
Tomato (Solanum lycopersicum) is also one of the most important vegetable crops in Egypt, with a cultivated area of 168,000 ha and a production of 8 million Mg in 2021. According to FAO [1], Egypt ranks sixth among tomato exporters, with average exports of 57.9 thousand Mg worth 32.2 million dollars. Tomato fruits are rich in lycopene, which plays an important role in reducing the incidence of various diseases such as cardiovascular disease, osteoporosis, and cancer. In addition, tomato fruits are rich in vitamins A, C, and K and in potassium [4–6].
Various biochemical properties, including total chlorophyll, chlorophyll a and b, carotenoids, and soluble solids content (SSC), can be used as indicators of fruit quality or maturity/ripening. The determination of fruit quality parameters relies mainly on destructive methods, which are difficult to apply when large numbers of observations are needed and cannot track rapid changes in fruit quality. According to Wanitchang et al. [7], common destructive methods for measuring fruit ripeness, such as pH, total soluble solids, TSS/TA, and chlorophyll content, destroy the fruit, are time-consuming and expensive, and delay export.
Image processing and spectral reflectance measurements can be robust alternatives to conventional methods of assessing fruit quality attributes [8]. Image processing captures fruit images and their spatial information, whereas spectral reflectance provides information about the chemical and physical properties of fruits. The two techniques can be used together to collect fruit images and spectral data simultaneously [9].
Previous studies have investigated the feasibility of using spectral reflectance measurements to estimate biochemical properties of fruit as diagnostic indicators of fruit quality. Passive remote sensing sensors rely mainly on sunlight as the illumination source and collect hyperspectral data in the visible (VIS) and near-infrared (NIR) regions of the electromagnetic spectrum [9–11]. Elsayed et al. [9] found that the newly developed (NDVI − VARI)/(NDVI + VARI) index correlated strongly with total chlorophyll, chlorophyll a, and chlorophyll b of mango, with R2 values of 0.71, 0.71, and 0.78, respectively. Salah et al. [12] used a spectroscopic technique to predict the chemical properties of orange fruits at different growth stages and showed that the NDIs and R672/R550 were strongly correlated with chlorophyll b and chlorophyll a (R2 = 0.84 and 0.92, respectively), while PSI and R672/R550 had the highest correlation with TSS. Borba et al. [13] quantified the quality properties of tomato fruits non-destructively using spectral reflectance and concluded that total soluble solids (TSS), titratable acidity (TA), and citric acid can be estimated rapidly and efficiently with spectroscopic measurements.
The assessment of fruit quality using SRIs often yields inconsistent results across different geographical and environmental conditions. There is therefore an ongoing need to refine SRIs and RGBI so that they become rapid, straightforward, and accurate tools for estimating fruit quality parameters. Identifying the optimal index formulations for computing different fruit quality attributes is essential to make better use of remotely acquired data in fruit quality evaluation. Prior studies have mostly relied on published SRIs to assess fruit quality attributes, and, to the authors’ knowledge, only a few have investigated the simultaneous application of different techniques for assessing fruit properties. The distinctive feature of the present study is the methodology used to select the most suitable SRIs for evaluating fruit quality parameters: correlogram maps are used to identify and employ the most effective SRIs.
Although SRIs offer a straightforward approach for estimating biochemical parameters, and could enable portable, lightweight instruments for rapid, cost-effective, large-scale assessment and management of these parameters, each SRI is limited to a small set of band combinations. The challenge lies in formulating robust SRIs for assessing fruit quality attributes under diverse and potentially confounding conditions, including substantial variation in the size of fruit components and its effect on the saturation of the quality parameters under study. In this context, machine learning (ML), a subset of artificial intelligence (AI), has grown rapidly. ML can extract important information from large amounts of spectral data for accurate classification and prediction [14]. Model-based feature selection can identify a subset of features with high discriminative and predictive power, as demonstrated by Beltrán et al. [15]. This strategy can improve model performance by reducing overfitting and removing irrelevant features, and retaining the original feature representation can improve interpretability, as highlighted by Guyon and Elisseeff [16]. The importance of feature selection algorithms for modeling and prediction has been steadily increasing, as noted by Schuize et al. [17]. Numerous approaches have been investigated for reducing data dimensionality, including decision trees (DT) and random forest (RF). In the RF model, variable importance is assessed following the methodology of Strobl et al. [18]. Glorfeld et al. (2019) introduced a back-propagation neural network metric for identifying the most important variables [19]. Furthermore, hyper-parameter selection has a substantial influence on the performance of ML models: it can improve the efficacy of ML algorithms [20], promote fairness and reproducibility in scientific studies [21], and directly affect the training dynamics of algorithms, thereby playing a key role in improving predictive models [22].
The overall objective of this study was to assess the effectiveness of RGBI and SRIs as non-destructive techniques for estimating the characteristics of mandarin and tomato fruits and for detecting their quality parameters at various stages of maturity. The specific goals were to: (i) quantify the quality parameters of mandarin and tomato fruits at different ripening stages; (ii) evaluate the suitability of both conventional and newly developed SRIs for quantifying these parameters; and (iii) assess the performance of DT and RF models based on RGBI and SRIs in predicting the quality parameters of mandarin and tomato fruits.
2. Materials and methods
2.1. Plant material
The experiments were conducted on mandarin and tomato fruits in the laboratory of the Faculty of Agriculture, Tanta University, Gharbia Governorate, Egypt (30° 47’ 18.00" N, 30° 59’ 54.61" E). Fruit samples were collected at different ripening stages from a private farm in Gharbia Governorate; the fruits were randomly selected and harvested manually. The experiments were conducted during 2021 to predict the quality attributes of mandarin and tomato fruits using RGB indices and SRIs and to link them to the biochemical properties of the fruits. Fruits of Balady mandarin (Citrus reticulata Blanco) were collected from seven-year-old trees at three ripening stages: the mature stage, with predominantly green coloration; the semi-ripe stage, with a mix of green and yellow hues; and the ripe stage, with a vivid orange color (Fig 1). Tomato fruits (Solanum lycopersicum, Alissa F1) were collected at four ripening stages, dark green, yellowish green, light red, and dark red (Fig 2), and were used for laboratory analysis.
2.2. Chemical parameters
2.2.1. Chlorophyll a, Chlorophyll b, carotenoids and lycopene.
A spectrophotometer was used to measure the absorbance of crude extracts of mandarin and tomato fruits at 663, 645, 480, and 503 nm to determine the contents of Chl a, Chl b, and car [23] and of lycopene [24], using Eqs (1)–(4), where A is the absorbance at the specified wavelength, W is the fresh weight, and V is the final volume of the chlorophyll extract.
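As an illustration of how absorbance readings are converted into pigment contents, the sketch below assumes the widely used Arnon-type chlorophyll coefficients and a commonly paired carotenoid equation at 480 nm, which use the same wavelengths and the same V/(1000 × W) scaling. These coefficients are an assumption and may differ from those in [23]; lycopene (503 nm, [24]) is omitted because its coefficient is not given here. The function name and readings are hypothetical.

```python
def pigment_content(a663, a645, a480, volume_ml, fresh_weight_g):
    """Return (chl_a, chl_b, car) in mg/g fresh weight.

    ASSUMED Arnon-type coefficients (not confirmed for this study).
    The linear combinations give mg/L of extract; multiplying by
    V / (1000 * W) converts them to mg per g of fresh tissue.
    """
    scale = volume_ml / (1000.0 * fresh_weight_g)
    chl_a = (12.7 * a663 - 2.69 * a645) * scale
    chl_b = (22.9 * a645 - 4.68 * a663) * scale
    car = (a480 + 0.114 * a663 - 0.638 * a645) * scale
    return chl_a, chl_b, car


# Hypothetical readings: 1 g of tissue extracted into 10 mL of solvent
chl_a, chl_b, car = pigment_content(a663=0.5, a645=0.2, a480=0.3,
                                    volume_ml=10.0, fresh_weight_g=1.0)
print(chl_a, chl_b, car)
```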
2.2.2. Total soluble solids (TSS).
A handheld refractometer (Milwaukee, model MA871, Brookfield, WI, USA) was used to measure the TSS of juice extracted from tomato and mandarin fruits, and the data were expressed as Brix (%) according to Cheour et al. [25].
2.2.3. Titratable acidity (TA).
The titratable acidity of tomato and mandarin juice extracts, expressed as a percentage of anhydrous citric acid, was determined by titrating a known volume of juice against standard 0.1 N NaOH with 1% phenolphthalein as the indicator, according to A.O.A.C. [26].
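The calculation from titre to percentage is not spelled out in the text. A common form, assumed here, is TA (%) = V_NaOH × N × 0.064 × 100 / V_juice, where 0.064 g/meq is the milliequivalent weight of anhydrous citric acid (192.12 g/mol divided by its three acidic protons). The sketch also derives the TSS/TA ratio used later; the function name and readings are hypothetical.

```python
# ASSUMED conversion factor: grams of anhydrous citric acid per milliequivalent
CITRIC_ACID_G_PER_MEQ = 0.064


def titratable_acidity(naoh_ml, naoh_normality, juice_ml):
    """Titratable acidity as % anhydrous citric acid (w/v)."""
    meq = naoh_ml * naoh_normality  # milliequivalents of NaOH used to reach the end point
    return meq * CITRIC_ACID_G_PER_MEQ * 100.0 / juice_ml


ta = titratable_acidity(naoh_ml=10.0, naoh_normality=0.1, juice_ml=10.0)
tss = 9.6  # hypothetical refractometer reading, Brix (%)
ratio = tss / ta
print(f"TA = {ta:.2f} %, TSS/TA = {ratio:.1f}")
```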
2.3. Physical parameter
The firmness of each mandarin and tomato fruit was measured using a digital fruit hardness tester (IC-FR5120, China) with a 6 mm probe.
2.4. Image analysis
ENVI 4.6 (Environment for Visualizing Images; ITT Visual Information Solutions, Boulder, CO, USA), software for visualizing and analyzing digital images of all types, was used for image analysis. ENVI provides a large number of wizards covering almost all functions of the interactive software, and each processing routine is an IDL procedure or function that can be used like any other IDL routine. The image analysis output is the mean value of each of the three RGB bands. Mandarin and tomato fruits were photographed with a Nikon D5300 camera (Nikon Corporation, Tokyo, Japan), a 24.2-megapixel digital single-lens reflex (DSLR) camera with an 18–55 mm lens. The camera was hand-held and pointed vertically downward at the fruits from a distance of 30 cm. Images were taken under cloudy conditions to obtain high-quality images, and the camera flash was always kept off. The digital photographs were converted from JPEG to TIF format using IrfanView 4.37. Statistical analysis was done using the SPSS 22 package to calculate selected RGBI from the red (R), green (G), and blue (B) pixel values (Table 1).
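Table 1 is not reproduced in this section, and the authors used ENVI and SPSS rather than code. The sketch below is a Python illustration that averages each band of an RGB image and then computes a subset of the indices named in the results (INT, Rn, Gn, VARI, ExG, ExR, GRRI, IKAW) using common literature definitions, which are assumptions here. NDVI1 and the remaining Table 1 indices are omitted because their formulas are not given. The function name and the uniform test patch are hypothetical.

```python
import numpy as np


def rgb_indices(image):
    """Compute selected RGB indices from an H x W x 3 RGB array.

    Formulas are ASSUMED common literature definitions; Table 1 of the
    study may define some of them differently.
    """
    r, g, b = image.reshape(-1, 3).astype(float).mean(axis=0)  # mean value per band
    total = r + g + b
    rn, gn, bn = r / total, g / total, b / total  # normalized (chromatic) coordinates
    return {
        "INT": total / 3.0,             # mean intensity
        "Rn": rn,                       # normalized red
        "Gn": gn,                       # normalized green
        "VARI": (g - r) / (g + r - b),  # visible atmospherically resistant index
        "ExG": 2 * gn - rn - bn,        # excess green
        "ExR": 1.4 * rn - gn,           # excess red
        "GRRI": g / r,                  # green-red ratio index
        "IKAW": (r - b) / (r + b),      # Kawashima index
    }


# Hypothetical uniform "orange" patch
img = np.empty((4, 4, 3))
img[:] = (200, 120, 40)
idx = rgb_indices(img)
print({name: round(value, 3) for name, value in idx.items()})
```

In practice the image would first be masked to fruit pixels only, so that the background does not bias the band means.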
2.5. Spectral reflectance measurements
After samples of mandarin and tomato representing the different ripening stages had been collected, spectral data for each specimen were acquired with a HandySpec Field® passive reflectance sensor (tec5, Oberursel, Germany). The spectral range spanned 302 to 1148 nm, with an optical bandwidth of 2 nm and a field-of-view angle of 12°. Each sample was scanned three times. To avoid exposure differences, the spectroradiometric measurements of the different samples were performed in full sunlight within short periods of time. Reflectance spectra of the different samples were made comparable using calibration factors derived from a white reference standard. A black sheet was placed beneath each fruit during the measurements to suppress reflections from the surrounding background, so that the recorded spectra represented mainly the reflectance of the fruit itself. The reflectance of the fruit was calculated from the spectrometer readings and corrected using calibration elements taken from the reference gray standard. Measurements were taken on clear days from a vertical position approximately 30 cm above the fruit.
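The reflectance correction described above can be sketched as follows, assuming relative reflectance is the mean of the repeated fruit scans divided by the mean reference-panel spectrum (with an optional dark-offset subtraction). The HandySpec software's own calibration factors are not modeled, and the 2 nm sampling grid, function names, and raw counts are hypothetical. The last lines show how a ratio index such as R710/R600 is read from the corrected spectrum.

```python
import numpy as np


def reflectance(target_scans, reference_scans, dark=None):
    """Relative reflectance from repeated scans (rows = scans, columns = wavelengths).

    ASSUMED model: reflectance = fruit signal / reference-panel signal,
    after optional subtraction of a dark (sensor offset) spectrum.
    """
    target = np.mean(target_scans, axis=0)        # average the repeated scans
    reference = np.mean(reference_scans, axis=0)
    if dark is not None:
        target, reference = target - dark, reference - dark
    return target / reference


def band(refl, wavelengths, nm):
    """Reflectance at the sampled wavelength closest to `nm`."""
    return refl[np.argmin(np.abs(wavelengths - nm))]


wavelengths = np.arange(302, 1150, 2)  # hypothetical 2 nm grid, 302-1148 nm
true_refl = np.linspace(0.1, 0.6, wavelengths.size)
reference_scans = np.full((3, wavelengths.size), 2000.0)  # hypothetical raw counts
target_scans = reference_scans * true_refl                 # three fruit scans
refl = reflectance(target_scans, reference_scans)
rsi_710_600 = band(refl, wavelengths, 710) / band(refl, wavelengths, 600)
print(round(rsi_710_600, 3))
```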
2.5.1. Selection of SRIs of tomato and mandarin fruits.
Table 2 lists the selected SRIs, including both published and newly developed indices. Contour maps of correlation matrices show the coefficients of determination (R2) between the measured parameters of mandarin and tomato fruits and ratio spectral indices (RSI) (Figs 3 and 4). Each RSI was calculated as the ratio of a pair of wavelengths, with all pairs in the 302–1148 nm range tested. Spectral contour maps, developed by Elsayed et al. [35], were used to determine the most efficient spectral region and wavelengths and to identify significant SRIs. SRIs were computed using the following wavelengths (nm): 522, 534, 546, 550, 566, 584, 600, 608, 610, 616, 618, 620, 622, 632, 640, 646, 648, 650, 654, 660, 664, 666, 670, 672, 674, 676, 678, 710, 720, 730, 750, 760, 780, 810, 822, 824, 878, 894, 1120, 1132, and 1140.
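The correlogram behind Figs 3 and 4 can be sketched as a matrix whose cell (i, j) holds the R2 of a simple linear fit between the ratio R_i/R_j and a measured parameter. For a one-predictor least-squares fit, R2 equals the squared Pearson correlation, which is what the code computes. This is an illustrative reconstruction, not the authors' code; the function name and synthetic data are hypothetical.

```python
import numpy as np


def ratio_index_r2(refl, y):
    """R^2 between every ratio index R_i / R_j and a measured parameter y.

    refl: (n_samples, n_bands) reflectance; y: (n_samples,) laboratory values.
    Returns an (n_bands, n_bands) matrix; cell (i, j) is for R_i / R_j.
    """
    n_bands = refl.shape[1]
    r2 = np.zeros((n_bands, n_bands))
    yc = y - y.mean()
    for i in range(n_bands):
        ratio = refl[:, [i]] / refl  # column j holds R_i / R_j for every sample
        rc = ratio - ratio.mean(axis=0)
        denom = np.sqrt((rc ** 2).sum(axis=0) * (yc ** 2).sum())
        with np.errstate(invalid="ignore", divide="ignore"):
            r = (rc * yc[:, None]).sum(axis=0) / denom
        r2[i] = np.nan_to_num(r ** 2)  # diagonal (R_i / R_i = 1) has no variance -> 0
    return r2


# Synthetic check: y is an exact linear function of R_3 / R_1
rng = np.random.RandomState(0)
refl = rng.uniform(0.1, 0.9, size=(30, 5))
y = 2.0 * refl[:, 3] / refl[:, 1] + 1.0
r2 = ratio_index_r2(refl, y)
best_pair = np.unravel_index(np.argmax(r2), r2.shape)
print(best_pair, round(r2[best_pair], 4))
```

Plotting `r2` against the two wavelength axes (for example with a filled contour plot) gives maps of the kind shown in Figs 3 and 4.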
Contour maps (correlation matrices) of the R2 between ratio indices formed from all pairs of wavelengths in the 302–1148 nm range and (a) chlorophyll a (Chl a), (b) chlorophyll b (Chl b), (c) total soluble solids (TSS), (d) titratable acidity (TA), (e) TSS/TA, and (f) carotenoids of mandarin fruits at different ripening degrees.
Contour maps (correlation matrices) of the R2 between ratio indices formed from all pairs of wavelengths in the 302–1148 nm range and (a) chlorophyll a (Chl a), (b) chlorophyll b (Chl b), (c) total soluble solids (TSS), (d) titratable acidity (TA), (e) TSS/TA, (f) carotenoids (car), (g) lycopene, and (h) firmness of tomato fruits at different ripening degrees.
2.6. Machine learning models
2.6.1. Random Forest (RF).
Random forest (RF) is a versatile ensemble technique based on multiple classification or regression trees that can model the relationships between a number of independent and dependent variables. It recursively partitions the dataset into increasingly homogeneous nodes to grow a set of regression trees (ntree) and then aggregates the outputs of all trees. Each tree is grown to its maximum extent on a bootstrap sample of the training data, and randomness is introduced at every split by considering only a random subset of variables (mtry) when choosing the node split [18]. RF training therefore depends on two key parameters: ntree, the number of trees, ranging from 1 to 20, and mtry, the number of randomly selected features considered for node splitting. To optimize the model and minimize the root mean squared error of validation (RMSEV), the leave-one-out cross-validation (LOOCV) method was used to tune mtry and ntree: ntree was examined in the range of 1 to 25, and the optimal mtry was determined by varying the number of features used. Once the model had been trained with the optimal parameters (Fig 5), all features were ranked and the most valuable ones were selected based on variable importance statistics [39]. Throughout this iterative process, the outputs were collected and multiple feature combinations were assessed to identify the optimal feature set yielding the lowest RMSEV.
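The tuning and feature-selection loop can be sketched with scikit-learn, where ntree corresponds to `n_estimators` and mtry to `max_features`. This is a minimal sketch on synthetic data, not the authors' script. The grids are shortened to keep it fast, and `feature_importances_` is scikit-learn's impurity-based importance, which is a swapped-in stand-in for the Strobl et al. procedure cited above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, LeaveOneOut, cross_val_score

rng = np.random.RandomState(0)
# Hypothetical data: 20 fruits x 6 candidate predictors (e.g. RGBI and SRIs);
# the target depends only on predictors 0 and 1.
X = rng.rand(20, 6)
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.05 * rng.randn(20)

# 1) Tune ntree (n_estimators) and mtry (max_features) by LOOCV,
#    keeping the setting with the lowest validation MSE (hence RMSE).
grid = {"n_estimators": [5, 10, 20], "max_features": [1, 2, 3]}
search = GridSearchCV(RandomForestRegressor(random_state=0), grid,
                      cv=LeaveOneOut(), scoring="neg_mean_squared_error")
search.fit(X, y)
rmse_v = np.sqrt(-search.best_score_)  # pooled LOOCV RMSE of the best setting

# 2) Rank predictors by importance and test nested top-k subsets.
order = np.argsort(search.best_estimator_.feature_importances_)[::-1]
subset_rmse = {}
for k in range(1, X.shape[1] + 1):
    model = RandomForestRegressor(
        n_estimators=search.best_params_["n_estimators"],
        max_features=min(search.best_params_["max_features"], k),  # mtry <= k
        random_state=0)
    mse = -cross_val_score(model, X[:, order[:k]], y, cv=LeaveOneOut(),
                           scoring="neg_mean_squared_error").mean()
    subset_rmse[k] = np.sqrt(mse)
best_k = min(subset_rmse, key=subset_rmse.get)
print(search.best_params_, round(rmse_v, 3), "best subset size:", best_k)
```

Because each LOOCV fold holds out a single sample, the mean negative MSE across folds is exactly the validation MSE over all samples, so its square root is the RMSEV.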
2.6.2. Decision Tree (DT).
Decision tree induction is the process of generating decision trees from training data annotated with class labels (or, for regression, target values). Decision trees are flowchart-like structures consisting of a root node, decision nodes, and leaf nodes. The root node is the starting point of the tree, the decision nodes make choices that guide the progression from one node to the next, and the leaf nodes hold the final outcomes or predictions. Not all decision tree algorithms produce the same types of trees: some, such as CART (Classification and Regression Trees), produce only binary trees, in which each internal node has exactly two child nodes, while others can generate non-binary trees [40]. Three key hyper-parameters were considered during training: the maximum depth of the tree (Md), the minimum number of samples per leaf (Ms), and the maximum number of leaf nodes (Mln). The values tested were Md = 1, 3, 5, 7, and 9; Ms = 2, 4, 6, 8, and 10; and Mln = None (unlimited), 10, 20, 30, 40, and 50. These parameters strongly affect the structure and complexity of the resulting tree and its ability to make predictions and generalize. Hyper-parameter optimization (Fig 5) was performed during the training phase, and the final model was built with the best-performing parameter settings. Decision tree regressors are the regression counterpart of decision trees, with leaves that predict continuous values.
Because they require little domain expertise or parameter configuration, decision tree regressors are well suited to exploratory knowledge discovery. Decision trees can also handle high-dimensional datasets.
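The DT hyper-parameter grid described above maps naturally onto scikit-learn's `DecisionTreeRegressor`, with Md as `max_depth`, Ms as `min_samples_leaf`, and Mln as `max_leaf_nodes` (`None` meaning no limit). The sketch below searches the full grid with LOOCV on hypothetical synthetic data. It illustrates the procedure and is not the authors' script.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, LeaveOneOut
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(1)
X = rng.rand(20, 4)  # hypothetical predictors (e.g. RGBI/SRIs)
y = np.where(X[:, 0] > 0.5, 2.0, 0.5) + 0.3 * X[:, 1]

# Md -> max_depth, Ms -> min_samples_leaf, Mln -> max_leaf_nodes (None = unlimited)
grid = {
    "max_depth": [1, 3, 5, 7, 9],
    "min_samples_leaf": [2, 4, 6, 8, 10],
    "max_leaf_nodes": [None, 10, 20, 30, 40, 50],
}
search = GridSearchCV(DecisionTreeRegressor(random_state=0), grid,
                      cv=LeaveOneOut(), scoring="neg_mean_squared_error")
search.fit(X, y)  # 150 settings x 20 folds, then a refit on all data
tree = search.best_estimator_
loocv_rmse = np.sqrt(-search.best_score_)
print(search.best_params_, "LOOCV RMSE:", round(loocv_rmse, 3))
print("depth:", tree.get_depth(), "leaves:", tree.get_n_leaves())
```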
2.6.3. Data analysis tools and datasets.
A total of 68 samples each of tomato and mandarin were used for training and validation. Leave-one-out cross-validation (LOOCV) was employed, in which one sample is excluded from the training dataset in each trial and used for validation. This approach was used to limit overfitting and improve the models’ predictive capability [41]. The entire workflow, from data preparation and analysis to model development, was implemented in Python 3.7.3, using the RF and DT regression modules of the scikit-learn package (version 0.20.2). Computations were performed on a machine with an Intel Core i7-3630QM processor (2.4 GHz) and 8 GB of RAM.
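The LOOCV mechanics, and the difference between training-set and validation-set statistics reported in the results, can be made explicit with a short loop. This is a minimal sketch with hypothetical synthetic data of the same size as the study (68 samples) and an arbitrarily fixed tree depth. Validation predictions come from models that never saw the predicted sample, whereas training predictions come from a model fitted on all samples.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.tree import DecisionTreeRegressor


def loocv_predictions(model, X, y):
    """Hold out each sample once, fit on the remaining samples, predict the held-out one."""
    preds = np.empty(len(y), dtype=float)
    for train_idx, test_idx in LeaveOneOut().split(X):
        model.fit(X[train_idx], y[train_idx])
        preds[test_idx] = model.predict(X[test_idx])
    return preds


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


rng = np.random.RandomState(2)
X = rng.rand(68, 3)  # 68 samples as in the study; features are hypothetical
y = 4.0 * X[:, 0] + rng.normal(0.0, 0.1, 68)

model = DecisionTreeRegressor(max_depth=3, random_state=0)
y_val = loocv_predictions(model, X, y)  # out-of-sample (validation) predictions
y_fit = model.fit(X, y).predict(X)      # in-sample (training) predictions
print("training RMSE:", round(rmse(y, y_fit), 3),
      "LOOCV RMSE:", round(rmse(y, y_val), 3))
```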
2.6.4. Model assessment.
Two statistical metrics were used to evaluate the regression models: the coefficient of determination (R2) and the root mean square error (RMSE), according to Eqs 5 and 6 [42, 43], where Yact is the measured (laboratory) value, Yp is the predicted value, Yave is the mean of the measured values, and T is the total number of data points.
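Eqs 5 and 6 are not reproduced here. The sketch below assumes the standard definitions that match the symbols listed: R2 = 1 − Σ(Yact − Yp)² / Σ(Yact − Yave)² and RMSE = sqrt(Σ(Yact − Yp)² / T). Some studies instead report R2 as the squared Pearson correlation between measured and predicted values. The two definitions coincide for a least-squares fit with an intercept evaluated on its own training data, but generally differ on validation predictions. Function names and values are hypothetical.

```python
import math


def r2_eq5(y_act, y_p):
    """ASSUMED Eq 5: R^2 = 1 - sum((Yact - Yp)^2) / sum((Yact - Yave)^2)."""
    y_ave = sum(y_act) / len(y_act)
    ss_res = sum((a - p) ** 2 for a, p in zip(y_act, y_p))
    ss_tot = sum((a - y_ave) ** 2 for a in y_act)
    return 1.0 - ss_res / ss_tot


def rmse_eq6(y_act, y_p):
    """ASSUMED Eq 6: RMSE = sqrt(sum((Yact - Yp)^2) / T), T = number of data points."""
    t = len(y_act)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y_act, y_p)) / t)


y_act = [1.0, 2.0, 3.0, 4.0]  # hypothetical laboratory values
y_p = [1.1, 1.9, 3.2, 3.8]    # hypothetical model predictions
r2 = r2_eq5(y_act, y_p)
err = rmse_eq6(y_act, y_p)
print(round(r2, 3), round(err, 4))
```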
3. Results and discussion
3.1. Variation of different biochemical parameters, RGB and spectral reflectance indices of mandarin and tomato fruits
Ripening stage had a significant effect on the biochemical characteristics of mandarin and tomato. Chl a, Chl b, TA, and firmness decreased during ripening, whereas TSS, TSS/TA, car, and lycopene increased as the fruit ripened. The mean values of each biochemical parameter differed significantly among ripening stages, and the RGB indices and SRIs also changed with ripening stage (Tables 3 and 4). For mandarin, Chl a ranged from 0.03 to 4.24 mg/g tissue and car from 0.07 to 0.23 mg/g FW, while RGB indices such as VARI ranged from -0.65 to 0.34 and IKAW from 0.00 to 0.97 (Table 3). Among the SRIs, R710/R600 ranged from 1.26 to 3.91 and PRMI from 0.28 to 5.78.
For tomato, TSS ranged from 4.00 to 6.80 Brix (%) and firmness from 350.00 to 690.00 kgf/cm2 (Table 4), while RGB indices such as INT ranged from 68.66 to 176.31 and ExR from 0.04 to 1.26. Among the SRIs, R546/R1132 ranged from 0.75 to 1.02 and NAI from -0.02 to 0.04. Shravan et al. [44] reported that during fruit maturation the chlorophyll concentration decreased significantly, while the concentrations of carotenoids and TSS increased. The color shift from green to red during ripening is thus due to the unmasking of pre-existing pigments as chlorophyll breaks down [45]. Brandt et al. [46] explained that lycopene content decreases at the beginning of the growth stage and then increases during ripening. Thimann [47] found that chromoplasts also contain yellow or red carotenoids such as lycopene; hence the breakdown of chlorophyll is accompanied by the accumulation of lycopene (a red pigment), which turns the tomato fruit red.
3.2 Correlation analysis between all biochemical parameters of mandarin and tomato fruits
Correlation analysis revealed significant associations among Chl a, Chl b, TSS, TA, TSS/TA, carotenoids (car), lycopene, and fruit firmness (Tables 5 and 6). For mandarin fruits, the highest correlation coefficient (r) was found between Chl a and Chl b (r = 1.00) and the lowest between TSS and TA (r = -0.75). All biochemical parameters were significantly correlated, with r ranging from -0.75 to 1.00 (Table 5).
For tomato fruits, the highest r among the biochemical parameters was between firmness and TA (r = 0.94), while the lowest (r = -0.45) was between Chl a and TSS/TA. All biochemical parameters of tomato were significantly correlated, with r ranging from -0.45 to 0.94. These results agree with Elsayed et al. [9], who found that TSS, car, Chl a, Chl b, total Chl, and TA of mango were substantially correlated. Galal et al. [14] found considerable negative relationships between TSS and Chl content of banana and orange fruits at various maturation stages.
3.3. Relationships between RGB indices and biochemical parameters of mandarin and tomato fruits
Twelve RGB indices obtained through digital image processing were related to the biochemical parameters of mandarin and tomato fruits (Tables 7 and 8). The results showed moderate to strong relationships between all biochemical parameters and the tested RGB indices. For mandarin fruits, IKAW and NDVI1 showed the highest R2 (0.91) with Chl a, VARI and Rn showed the highest R2 (0.96) with car, and Rn also showed the highest R2 with TSS (R2 = 0.81).
For tomato fruits, GRRI, IKAW, and NDVI1 showed the highest R2 with Chl a (0.66), and GRRI showed the highest R2 with Chl b (0.59). NDVI1 produced the highest coefficient of determination with TA (0.90). Rn had the highest R2 with TSS (0.75), whereas ExG and Gn had the lowest (R2 = 0.55). Rn and GRRI gave the highest R2 for estimating firmness (R2 = 0.85 for both).
In this study, RGB indices proved useful for assessing mandarin fruits at different ripening stages. Other studies have likewise used fruit surface color as one of the main factors for determining the ripeness of mandarin fruits. For instance, Fouda et al. [31] showed that the VARI index gave the highest R2 with car, Chl a, and Chl b (0.85, 0.66, and 0.75, respectively). Kaur et al. [48] showed that RGBI of fruit images were related to biochemical properties and reported a strong correlation between the average intensity of green color and fruit acidity (R2 = 0.99). Psiroukis et al. [49] developed a multi-scale prediction model for tomato fruit analysis and found substantial associations between various VIs and tomato plant attributes; NDVI showed a particularly strong correlation (R2 = 0.89). In addition, Rasool et al. [50] found that VARI and GRVI were positively associated with the Chl a content of tomato fruits (R2 of 0.85 and 0.86, respectively).
3.4. Relationships between spectral indices and biochemical parameters of mandarin and tomato fruits
All tested SRIs were significantly associated with the biochemical parameters, with R2 ranging from 0.16 to 0.87 across all investigated parameters of mandarin (Table 9) and tomato (Table 10). These associations held for SRIs derived from both the visible (VIS) and near-infrared (NIR) regions and for both fruit types. For mandarin, RSI710,600 and R730,650 showed the highest R2 with Chl a and Chl b (R2 = 0.80 for both) and with TA (R2 = 0.75), while RSI810,822 had the highest R2 with TSS (R2 = 0.61).
SRIs were also substantially related to the tested biochemical characteristics of tomato fruits, with R2 ranging from 0.50 to 0.68 for Chl a, 0.44 to 0.63 for Chl b, 0.46 to 0.70 for TSS, and 0.58 to 0.83 for firmness. RSI 1140/R674 was a good indicator for estimating Chl a and Chl b, with R2 of 0.67 and 0.61, respectively. RSI 584/R65 was closely related to TSS and TA, with the highest R2 values of 0.70 and 0.82, respectively. In agreement with these results, Pires et al. [51] assessed citrus maturity non-destructively by predicting internal quality properties and obtained good predictive performance for TSS (R2 = 0.79), pH (R2 = 0.80), titratable acidity (R2 = 0.73), and maturity index (R2 = 0.80). Galal et al. [14] showed that NAI and R800/R640 had the highest coefficient of determination (0.89) with chlorophyll content, and the R570/R540 index had the greatest R2 (0.79) with TSS. Huang et al. [52] used spectroscopy to assess tomato quality and obtained accurate predictions of pH (R2 = 0.81) and TSS (R2 = 0.80). In addition, Wati et al. [53] found that the best calibration model used the 527–799 nm range to measure the pH of intact tomatoes, with an R2 of 0.90.
3.5. Assessing the efficacy of decision tree and random forest models in predicting quantitative attributes of mandarin and tomato produce
Table 11 presents the Decision Tree (DT) and Random Forest (RF) models built on high-level variables drawn from a combination of 2D spectral vegetation indices (2D-VIs) and RGB color indices (RGBI). These models were used to detect mandarin maturation characteristics, namely Chl a, Chl b, TSS, TA, TSS/TA, and car. The DT model was trained on the 2D-VIs and RGBI to predict the examined parameters, and its predicted values were compared with the measured values. The multivariate analysis and model comparison indicate that this approach substantially increases predictability. External validation was the most dependable way to evaluate the accuracy of the regression models, because the validation data were not used in model development. This assessment showed that the DT-9 SRIs model predicted Chl a with exceptional precision, with nine parameters derived from RGB data playing pivotal roles in the prediction. Its R2 was 1.00 (RMSE = 0.119) for the training dataset and 0.99 (RMSE = 0.080) for the validation dataset. For Chl b, the DT-7RGBI model was the best predictor, with R2 of 0.99 for both the training and validation datasets and RMSE values of 0.193 and 0.134, respectively. For TSS, the RF-7RGBI model was the most precise, with R2 values of 0.96 and 0.83 for the training and validation datasets, respectively. The DT-12RGBI model was the most reliable predictor of the TSS/TA ratio, with R2 values of 0.99 for the training dataset and 0.98 for the validation dataset.
The corresponding RMSE values were 0.408 for the training dataset and 0.508 for the validation dataset. These findings underscore the effectiveness of these models in predicting various biochemical parameters and highlight their potential utility for non-destructive fruit quality assessment.
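The RGBI inputs can be derived from the mean red, green, and blue values of segmented fruit pixels and then merged with spectral indices into one set of inputs for the DT and RF models. This sketch uses commonly cited definitions (normalized red Rn = R/(R+G+B), VARI = (G - R)/(G + R - B), and ExR = 1.4r - g on chromatic coordinates); these formulas are assumptions here because this section does not state them, and the function and key names are hypothetical.

```python
def rgb_indices(mean_r, mean_g, mean_b):
    """Compute a few RGB colour indices from mean channel values of fruit pixels.

    Assumption: the definitions below are common literature forms, not
    necessarily the exact ones used in the study.
    """
    total = mean_r + mean_g + mean_b
    if total <= 0:
        raise ValueError("channel sum must be positive")
    r = mean_r / total  # red chromatic coordinate (= normalized red, Rn)
    g = mean_g / total  # green chromatic coordinate
    vari_den = mean_g + mean_r - mean_b
    return {
        "Rn": r,
        "ExR": 1.4 * r - g,  # excess red
        "VARI": (mean_g - mean_r) / vari_den if vari_den != 0 else float("nan"),
    }


def hybrid_features(rgb_feats, sri_feats):
    """Merge RGB and spectral indices into one feature dict, refusing name clashes."""
    overlap = rgb_feats.keys() & sri_feats.keys()
    if overlap:
        raise ValueError(f"duplicate feature names: {sorted(overlap)}")
    return {**rgb_feats, **sri_feats}
```

For instance, `hybrid_features(rgb_indices(120, 60, 20), {"RSI710_600": 2.2})` yields one combined record per fruit; stacking these records gives the input table for the tree models.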
Table 12 shows the key variables of tomato fruits selected with both the DT and RF models. These features were instrumental in predicting Chl a, Chl b, TSS, TA, TSS/TA, hardness, carotene, and lycopene, and Table 12 also reports the accuracy (R2) and RMSE of the DT and RF models for each parameter. The DT-3 SRIs model showed the strongest relationship with Chl a, making it the optimal model for this parameter; it relies on three critical attributes for predicting Chl a. The resulting R2 values were 0.87 (RMSE = 0.091) and 0.740 (RMSE = 0.087) for the training and validation sets, respectively. For Chl b, the RF-22 SRIs model gave the best results, with R2 of 0.84 (RMSE = 0.099) and 0.58 (RMSE = 0.125) for the training and validation sets, respectively. The DT-2RGBI model, the most precise for TSS, had RMSE values of 0.248 and 0.224 and R2 values of 0.83 and 0.76 for the training and validation sets, respectively. The DT-7VIs model was the best predictor of the TSS/TA ratio, although its accuracy was modest, with R2 values of 0.62 and 0.39 (RMSE = 0.231 and 0.222) for the training and validation sets, respectively. The RF-9RGBI model performed best for hardness, with R2 values of 0.96 and 0.87 (RMSE = 21.039 and 25.194) for training and validation, respectively. The DT-2RGBI model accurately predicted carotene, with R2 values of 0.89 and 0.86 (RMSE = 0.014 and 0.011) for the training and validation sets, respectively. As reported in [54, 55], accurate regression forecasts require strategies such as tuning the selected high-level features and the model settings.
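The DT models above, and the gap between training and validation R2 that appears for some parameters, can be illustrated with a small CART-style regression tree written from scratch. This is a sketch under stated assumptions: splits minimize the summed squared error of the two child nodes, `max_depth` and `min_leaf` stand in for the "model settings" mentioned above, and R2 is computed as 1 - SS_res/SS_tot, which can fall below zero on validation data. The study's actual software and hyperparameters are not described in this section.

```python
from statistics import fmean


def _sse(values):
    """Sum of squared deviations from the mean (the CART regression impurity)."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def build_tree(X, y, max_depth=3, min_leaf=2, depth=0):
    """Grow a regression tree; a node is (feature, threshold, left, right), a leaf is a float."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return fmean(y)  # leaf: predict the mean of the training targets
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # no admissible split
        return fmean(y)
    _, j, t, left, right = best
    left_tree = build_tree([X[i] for i in left], [y[i] for i in left], max_depth, min_leaf, depth + 1)
    right_tree = build_tree([X[i] for i in right], [y[i] for i in right], max_depth, min_leaf, depth + 1)
    return (j, t, left_tree, right_tree)


def predict(tree, row):
    """Follow the splits from the root to a leaf and return the leaf value."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def r2_rmse(y_true, y_pred):
    """R2 = 1 - SS_res/SS_tot and RMSE = sqrt(mean squared error)."""
    mean = fmean(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean) ** 2 for a in y_true)
    return 1 - ss_res / ss_tot, (ss_res / len(y_true)) ** 0.5
```

Calling `r2_rmse` separately on the training rows and on held-out validation rows gives the kind of paired statistics reported above; a deep tree can reach an R2 near 1.00 on training data while scoring noticeably lower on validation.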
The SRIs and RGBIs were processed with two ML models, DT and RF, with the aim of selecting optimal features in each iteration. The most effective hybrid variables (HV) were then identified to estimate the quality of mandarins (Table 13) and tomatoes (Table 14). First, for mandarin fruits, the DT-2HV model delivered exceptional results in predicting Chl a, with an R2 of 0.993 and an RMSE of 0.149 for the training set and an R2 of 0.991 and an RMSE of 0.114 for the validation set. For Chl b, the RF-23HV model was highly accurate, with R2 values of 0.996 and 0.989 and RMSE values of 0.146 and 0.166 for the training and validation sets, respectively. The RF-14HV model gave the best TSS predictions, with R2 values of 0.819 for the training set and 0.688 for the validation set and corresponding RMSEs of 0.348 and 0.337. For TA, the RF-16HV model stood out, with an R2 of 0.994 and an RMSE of 0.023 for the training set and an R2 of 0.984 and an RMSE of 0.026 for the validation set. The RF-12HV model predicted TSS/TA with high precision, with R2 values of 0.993 (RMSE = 0.325) and 0.979 (RMSE = 0.448) for the training and validation sets, respectively. For car, the DT-2HV model also performed remarkably well, with an R2 of 0.988 and an RMSE of 0.005 for training and an R2 of 0.977 and an RMSE of 0.006 for validation.
In summary, the mandarin characteristics predicted with the HV models were chlorophyll b (Chl b), chlorophyll a (Chl a), total soluble solids (TSS), titratable acidity (TA), the TSS/TA ratio, and carotenoids (car).
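The selection of optimal features "in each iteration" is not spelled out step by step in this section. One plausible reading, sketched here purely as an assumption, is greedy forward selection: each iteration adds the candidate index that most improves a validation score, and the search stops when no candidate helps. The `score` callable is a hypothetical placeholder for, e.g., the validation R2 of a DT or RF fitted on that subset of hybrid variables.

```python
def forward_select(candidates, score, max_features=None, min_gain=1e-6):
    """Greedy forward selection of feature names.

    `score(subset)` must return a metric where higher is better
    (assumption: e.g. validation R2 of a model fitted on `subset`).
    Returns (selected_features, best_score, history).
    """
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    limit = len(remaining) if max_features is None else max_features
    history = []
    while remaining and len(selected) < limit:
        # Try adding each remaining feature to the current subset.
        trial = {name: score(selected + [name]) for name in remaining}
        name, value = max(trial.items(), key=lambda item: item[1])
        if value - best_score < min_gain:
            break  # no candidate improves the score enough
        selected.append(name)
        remaining.remove(name)
        best_score = value
        history.append((list(selected), value))
    return selected, best_score, history
```

The length of the returned subset would correspond to the number in a model label such as DT-2HV, although whether the authors counted features this way is not stated here.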
For tomato, the properties predicted with the HV models were chlorophyll b (Chl b), chlorophyll a (Chl a), total soluble solids (TSS), titratable acidity (TA), the TSS/TA ratio, carotenoids (Car), lycopene, and firmness.
Second, for tomato fruits, the DT-5HV model performed very well in predicting Chl a, with an R2 of 0.905 and an RMSE of 0.077 for the training dataset and an R2 of 0.785 and an RMSE of 0.077 for the validation dataset. For Chl b, the DT-4HV model was the most precise, with R2 values of 0.779 and 0.574 and RMSE values of 0.119 and 0.116 for the training and validation datasets, respectively. The DT-27HV model predicted TSS precisely, with an R2 of 0.960 (RMSE = 0.118) for training and 0.793 (RMSE = 0.207) for validation. The DT-20HV model was notably strong in predicting TA, with an R2 of 0.982 and an RMSE of 0.007 during training and an R2 of 0.896 and an RMSE of 0.010 during validation. The RF-7HV model attained only moderate accuracy for TSS/TA, with an R2 of 0.5515 and an RMSE of 0.176 during training and an R2 of 0.364 and an RMSE of 0.218 during validation. The DT-2HV model's prediction of Car was robust, with R2 values of 0.891 and 0.873 and RMSEs of 0.013 and 0.011 for the training and validation phases, respectively. The DT-4HV model performed best for lycopene, with an R2 of 0.978 and an RMSE of 2.969 in training and an R2 of 0.919 and an RMSE of 3.618 in validation. For firmness, the DT-12HV model performed exceptionally well, with an R2 of 0.973 and an RMSE of 16.251 during training and an R2 of 0.898 and an RMSE of 23.199 during validation. These findings are consistent with previous studies [8, 53], which indicate that selecting high-level variables plays a crucial role in optimizing models and improving the accuracy of the predicted outcomes.
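The RF variants (for example RF-7HV) average the outputs of many regression trees. The sketch below follows the standard random-forest recipe of bootstrap resampling plus a random subset of features at each split; the tree-growing code, the default `max_features` (one third of the inputs), and the other settings are illustrative assumptions rather than the configuration used in this study.

```python
import random
from statistics import fmean


def _sse(values):
    """Sum of squared deviations from the mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def _grow(X, y, rng, max_features, max_depth, min_leaf, depth=0):
    """Regression tree that considers only a random subset of features at each split."""
    if depth >= max_depth or len(y) < 2 * min_leaf or len(set(y)) == 1:
        return fmean(y)
    best = None
    for j in rng.sample(range(len(X[0])), max_features):
        for t in sorted({row[j] for row in X}):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return fmean(y)
    _, j, t, left, right = best
    children = [
        _grow([X[i] for i in idx], [y[i] for i in idx], rng, max_features, max_depth, min_leaf, depth + 1)
        for idx in (left, right)
    ]
    return (j, t, children[0], children[1])


def _predict_tree(tree, row):
    """Route a sample from the root to a leaf."""
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def fit_forest(X, y, n_trees=50, max_features=None, max_depth=4, min_leaf=2, seed=0):
    """Fit n_trees trees, each on a bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    m = max_features if max_features is not None else max(1, len(X[0]) // 3)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(_grow([X[i] for i in idx], [y[i] for i in idx], rng, m, max_depth, min_leaf))
    return forest


def predict_forest(forest, row):
    """Random-forest regression output: the mean of the individual tree predictions."""
    return fmean(_predict_tree(tree, row) for tree in forest)
```

Averaging over bootstrapped, feature-randomized trees usually narrows the training/validation gap relative to a single deep tree, which is one reason RF and DT can rank differently from parameter to parameter.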
In the future, leveraging advanced ML techniques with optimally combined features holds significant promise for further enhancing the accuracy and efficiency of fruit quality assessment. Continued refinement of these combined features and the exploration of novel algorithms can lead to more robust predictive models, applicable to a broader range of fruits and agricultural products. Additionally, integrating larger datasets and incorporating real-time data processing capabilities will facilitate more dynamic and precise quality monitoring systems, ultimately benefiting producers and consumers alike through improved product consistency and reduced waste.
4. Conclusions
A cost-effective approach to appraising the quality parameters of mandarin and tomato fruits at different ripening stages was developed using RGB and SRI indices in combination with DT and RF models. In terms of R2, the RGBI and newly developed SRIs performed well in assessing the fruit quality parameters of mandarin and tomato. All tested RGBI and SRIs were significantly associated with the biochemical parameters of both fruits, including all assessed SRIs derived from the VIS and NIR regions. Combining RGBI and SRIs with DT and RF models can be a robust strategy for estimating the eight observed variables with reasonable accuracy. The findings of this study can serve as a reference for estimating several quality parameters and offer technological support for monitoring and assessing the quality of mandarin and tomato fruits during ripening and storage. In conclusion, the most reliable RGB, SRI, DT, and RF calibration models developed in the current study could be used to build active measurement systems that track fruit quality and ripeness in the field or factory.
Acknowledgments
The authors express their gratitude to King Khalid University and Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia, for their support.
References
- 1. FAO. FAOSTAT [Internet]. 2021. http://www.fao.org/faostat/en/#data/QC.
- 2. Ramful D, Bahorun T, Bourdon E, Tarnus E, Aruoma OI. Bioactive phenolics and antioxidant propensity of flavedo extracts of Mauritian citrus fruits: Potential prophylactic ingredients for functional foods application. Toxicology. 2010;278:75–87. pmid:20100535
- 3. Ahmed EA, Omar HM, Ragb SM, Nasser AY. The antioxidant activity of vitamin C, DPPD and L-cysteine against cisplatin-induced testicular oxidative damage in rats. Food Chem Toxicol. 2011;49:1115–21. pmid:21310208
- 4. Saad AM, Ibrahim A, El-Bialee N. Internal quality assessment of tomato fruits using image color analysis. Agric Eng Int CIGR J. 2016;18:339–52.
- 5. Viuda-Martos M, Sanchez-Zapata E, Sayas-Barberá E, Sendra E, Pérez-Álvarez JA, Fernández-López J. Tomato and tomato byproducts. Human health benefits of lycopene and its application to meat products: a review. Crit Rev Food Sci Nutr. 2014;54:1032–49. pmid:24499120
- 6. Wang C, Li M, Duan X, Abu-Izneid T, Rauf A, Khan Z, et al. Phytochemical and nutritional profiling of tomatoes; impact of processing on bioavailability-a comprehensive review. Food Rev Int. 2022;39:5986–6010.
- 7. Wanitchang J, Terdwongworakul A, Wanitchang P, Noypitak S. Maturity sorting index of dragon fruit: Hylocereus polyrhizus. J Food Eng. 2010;100:409–16.
- 8. Galal H, Elsayed S, Elsherbiny O, Allam A, Farouk M. Using RGB imaging, optimized three-band spectral indices, and a decision tree model to assess Orange fruit quality. Agriculture. 2022;12:1558.
- 9. Elsayed S, Galal H, Allam A, Schmidhalter U. Passive reflectance sensing and digital image analysis for assessing quality parameters of mango fruits. Sci Hortic. 2016;212:136–47.
- 10. Elsayed S, El-Hendawy S, Dewir YH, Schmidhalter U, Ibrahim HH, Ibrahim MM, et al. Estimating the leaf water status and grain yield of wheat under different irrigation regimes using optimized two-and three-band hyperspectral indices and multivariate regression models. Water. 2021;13:2666.
- 11. Nagy A, Riczu P, Tamás J. Spectral evaluation of apple fruit ripening and pigment content alteration. Sci Hortic. 2016;201:256–64.
- 12. Salah S, Elmetwalli AH, Ghoname MS. Hyperspectral Reflectance as a Tool to Measure Ripeness of Orange Fruits. J Soil Sci Agri Eng. 2022;13:241–51.
- 13. Borba KR, Aykas DP, Milani MI, Colnago LA, Ferreira MD, Rodriguez-Saona LE. Portable near infrared spectroscopy as a tool for fresh tomato quality control analysis in the field. Appl Sci. 2021;11:3209.
- 14. Galal H, Elsayed S, Allam A, Farouk M. Indirect Quantitative Analysis of Biochemical Parameters in Banana Using Spectral Reflectance Indices Combined with Machine Learning Modeling. Horticulturae. 2022;8:438.
- 15. Beltrán NH, Duarte-Mermoud MA, Salah SA, Bustos MA, Peña-Neira AI, Loyola EA, et al. Feature selection algorithms using Chilean wine chromatograms as examples. J Food Eng. 2005;67:483–90.
- 16. Guyon I, Elisseeff A. An Introduction to Variable and Feature Selection. J Mach Learn Res. 2003;3:1157–82.
- 17. Schuize FH, Wolf H, Jansen H, Vander VP. Applications of artificial neural networks in integrated water management: Fiction or future? Water Sci Technol. 2005;52:21–31.
- 18. Strobl C, Boulesteix AL, Kneib T, Augustin T, Zeileis A. Conditional variable importance for random forests. BMC Bioinform. 2008;9:307. pmid:18620558
- 19. Glorfeld LW. A methodology for simplification and interpretation of backpropagation-based neural network models. Expert Syst Appl. 1996;10:37–54.
- 20. Melis G, Dyer C, Blunsom P. On the state of the art of evaluation in neural language models. arXiv:1707.05589 [Preprint]. 2017.
- 21. Bergstra J, Yamins D, Cox D. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In: Proceedings of the 30th International Conference on Machine Learning (ICML); 2013. p. 115–23.
- 22. Wu J, Chen XY, Zhang H, Xiong LD, Lei H, Deng SH. Hyperparameter optimization for machine learning models based on Bayesian optimization. J Electron Sci Technol. 2019;17:26–40.
- 23. Arnon DI. Copper enzymes in isolated chloroplasts. Polyphenoloxidase in Beta vulgaris. Plant Physiol. 1949;24:1. pmid:16654194
- 24. Ranganna S. Manufacture and quality control of tomato products. Indian Food Packer. 1976;30:44–82.
- 25. Cheour F, Willemot C, Arul J, Makhlouf J, Desjardins Y. Postharvest response of two strawberry cultivars to foliar application of CaCl2. HortScience. 1991;26:1186–8.
- 26. AOAC. Official Methods of Analysis. 13th ed. Washington, DC: Association of Official Analytical Chemists; 1990.
- 27. Kumaseh MR, Latumakulita L, Nainggolan N. Segmentasi citra digital ikan menggunakan metode thresholding. Jurnal Ilmiah Sains. 2013;13:74–9.
- 28. Mao W, Wang Y, Wang Y. Real-time detection of between-row weeds using machine vision. In: 2003 ASAE Annual Meeting; 2003. p. 1.
- 29. Hague T, Tillett ND, Wheeler H. Automated crop and weed monitoring in widely spaced cereals. Precis Agric. 2006;7:21–32.
- 30. Aynalem HM, Righetti TL, Reed BM. Non-destructive evaluation of in vitro-stored plants: a comparison of visual and image analysis. In Vitro Cell Dev Biol Plant. 2006;42:562–7.
- 31. Fouda T, Derbala A, Elmetwalli A, SaLah S. Detection of orange color using imaging analysis. AgroLife Sci J. 2013;2:181–4.
- 32. Kawashima S, Nakatani M. An algorithm for estimating chlorophyll content in leaves using a video camera. Ann Bot. 1998;81:49–54.
- 33. Verrelst J, Schaepman ME, Koetz B, Kneubühler M. Angular sensitivity analysis of vegetation indices derived from CHRIS/PROBA data. Remote Sens Environ. 2008;112:2341–53.
- 34. Elmetwalli A, Salah S. Imaging analysis technique for assessing orange maturity. Misr J Agric Eng. 2015;32:243–56.
- 35. Elsayed S, El-Gozayer K, Allam A, Schmidhalter U. Passive reflectance sensing using regression and multivariate analysis to estimate biochemical parameters of different fruits kinds. Sci Hortic. 2019;243:21–33.
- 36. Sims DA, Gamon JA. Relationships between leaf pigment content and spectral reflectance across a wide range of species, leaf structures and developmental stages. Remote Sens Environ. 2002;81:337–54.
- 37. Rouse JW Jr, Haas RH, Deering DW, Schell JA, Harlan JC. Monitoring the vernal advancement and retrogradation (green wave effect) of natural vegetation. Report No. E75-10354; 1974.
- 38. Acharya UK, Subedi PP, Walsh KB, McGlasson WB. Estimation of fruit maturation and ripening using spectral indices. Acta Hortic. 2016;1119:265–72.
- 39. Breiman L. Random forests. Mach Learn. 2001;45:5–32.
- 40. Han J, Kamber M. Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann; 2001.
- 41. Zhu J, Huang ZH, Sun H, Wang GX. Mapping forest ecosystem biomass density for Xiangjiang river basin by combining plot and remote sensing data and comparing spatial extrapolation methods. Remote Sens. 2017;9:241.
- 42. Malone BP, Styc Q, Minasny B, McBratney AB. Digital soil mapping of soil carbon at the farm scale: A spatial downscaling approach in consideration of measured and uncertain data. Geoderma. 2017;290:91–9.
- 43. Saggi MK, Jain S. Reference evapotranspiration estimation and modeling of the Punjab Northern India using deep learning. Comput Electron Agric. 2019;156:387–98.
- 44. Shravan R, Shere DM. Study of physico-chemical characteristics of sweet orange (Citrus sinensis) fruit. J Pharmacogn Phytochem. 2018;7:1687–9.
- 45. Lizada C. Mango. In: Seymour GB, Taylor JE, Tucker GA, editors. Biochemistry of fruit ripening. Dordrecht: Springer Netherlands; 1993. p. 255–71.
- 46. Brandt S, Pék Z, Barna É, Lugasi A, Helyes L. Lycopene content and colour of ripening tomatoes as affected by environmental conditions. J Sci Food Agric. 2006;86:568–72.
- 47. Thimann KV. The senescence of leaves. In: Thimann KV, editor. Senescence in plants. Boca Raton, FL: CRC Press; 1980. p. 85–115.
- 48. Kaur H, Sawhney BK, Jawandha SK. Evaluation of plum fruit maturity by image processing techniques. J Food Sci Technol. 2018;55:3008–15. pmid:30065410
- 49. Psiroukis V, Darra N, Kasimati A, Trojacek P, Hasanli G, Fountas S. Development of a multi-scale tomato yield prediction model in Azerbaijan using spectral indices from sentinel-2 imagery. Remote Sens. 2022;14:4202.
- 50. Rasool S, Amin K, Sadiq M. Development of colour sensor based low-cost hand-held device for crop nitrogen management. Skuast J Res. 2022;24:215–20.
- 51. Pires RR, Guerra SP, Cruz MD, Antunes A, Brázio AMA, fonso AM, Daniel M, et al. Ripening assessment ‘Ortanique’ (Citrus reticulata Blanco x Citrus sinensis (L) Osbeck) on tree by SW-NIR reflectance spectroscopy-based calibration models. Postharvest Biol Technol. 2022;183:111750.
- 52. Huang Y, Lu R, Chen K. Assessment of tomato soluble solids content and pH by spatially-resolved and conventional Vis/NIR spectroscopy. J Food Eng. 2018;236:19–28.
- 53. Wati RK, Pahlawan MFR, Masithoh RE. Development of calibration model for pH content of intact tomatoes using a low-cost Vis/NIR spectroscopy. IOP Conf Ser Earth Environ Sci. 2021;686(1):012049.
- 54. Elsherbiny O, Fan Y, Zhou L, Qiu Z. Fusion of feature selection methods and regression algorithms for predicting the canopy water content of rice based on hyperspectral data. Agriculture. 2021;11:51.
- 55. Elsherbiny O, Zhou L, Feng L, Qiu Z. Integration of visible and thermal imagery with an artificial neural network approach for robust forecasting of canopy water content in rice. Remote Sens. 2021;13:1785.