Color Measurement of Tea Leaves at Different Drying Periods Using Hyperspectral Imaging Technique

This study investigated the feasibility of using hyperspectral imaging for the nondestructive measurement of color components (ΔL*, Δa* and Δb*) and the classification of tea leaves at different drying periods. Hyperspectral images of tea leaves at five drying periods were acquired in the spectral region of 380–1030 nm, and the three color components were measured with a colorimeter. Different preprocessing algorithms were compared, and the best one was selected according to the prediction results of partial least squares regression (PLSR) models. Competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA) were each used to identify effective wavelengths. Four models (least squares-support vector machine [LS-SVM], PLSR, principal components regression [PCR] and multiple linear regression [MLR]) were established to predict the three color components. The SPA-LS-SVM model performed well, with correlation coefficients of prediction (rp) of 0.929 for ΔL*, 0.849 for Δa* and 0.917 for Δb*. An LS-SVM model was also built to classify tea leaves by drying period. The correct classification rates (CCRs) ranged from 89.29% to 100% in the calibration set and from 71.43% to 100% in the prediction set, with total CCRs of 96.43% and 85.71%, respectively. The results showed that hyperspectral imaging could be used as an objective and nondestructive method to determine color features and classify tea leaves at different drying periods.


Introduction
Tea is popular with many people because of its health benefits. For example, it has been reported to help prevent cancer and cardiovascular disease and to treat chronic gastritis [1], [2]. Tea processing, which involves a series of physical and chemical reactions, directly affects tea quality [3]. In particular, changes in the color values (ΔL*, Δa* and Δb*) of tea leaves play a significant role during processing. Studying the color parameters of tea leaves during drying can therefore help improve tea quality.
Hyperspectral imaging, which integrates spectroscopy and imaging, has been widely applied in many fields [4], [5], [6]. When a sample is scanned by a hyperspectral imaging system, a three-dimensional hyperspectral image (hypercube) is generated. The hypercube consists of a series of images at different wavelengths, so each pixel carries both spectral and spatial information. A schematic of the hyperspectral imaging system is shown in Fig. 1. Previous studies have shown that hyperspectral imaging is effective for monitoring samples as they change over time [7], [8].
Hyperspectral imaging offers many advantages: it is nondestructive, rapid, simple to operate, accurate and low in cost, and it can be applied to on-line detection. It has already been used to determine color parameters in many studies [9], [10], [11], [12], [13]. Changes in the color parameters of tea leaves are very important in tea processing. However, no study has yet been reported that uses hyperspectral imaging to measure the color parameters of tea leaves at different drying periods.
The goals of this work were: (1) to find the quantitative relationships between the spectral reflectance information and color parameters of tea leaves at different drying periods; (2) to obtain effective wavelengths which are useful for the determination of color values; (3) to compare the predictive ability of different calibration models; (4) to develop an algorithm for the determination of color values of tea leaves.

Hyperspectral imaging system
A visible and near-infrared (VIS-NIR) hyperspectral imaging system covering the spectral range of 380–1030 nm was used in this study (Fig. 2). The system consists of the following components:

- a lens (OLE-23);
- an imaging spectrograph (V10E-QE, Specim, Finland);
- a CCD camera (C8484-05, Hamamatsu City, Japan);
- two light sources (Oriel Instruments, Irvine, USA), each with a 150 W quartz tungsten halogen lamp;
- a conveyer belt driven by a stepper motor (IRCP0076, Isuzu Optics Corp., Taiwan, China);
- a computer running the spectral image system V10E software (Isuzu Optics Corp., Taiwan, China).

The area CCD array detector of the camera has 672 × 512 (spatial × spectral) pixels, and the spectral resolution is 2.8 nm. The system scans samples line by line. The reflected light is dispersed by the spectrograph and captured by the area CCD array.
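Line-by-line (pushbroom) acquisition can be pictured as stacking one 672 × 512 (spatial × spectral) detector frame per conveyer step into a hypercube. The sketch below assumes this. The frame reader `read_frame` is hypothetical and returns random values instead of real camera counts. The scan length of 10 lines is arbitrary.

```python
import numpy as np

# Detector geometry stated in the text: 672 spatial pixels x 512 spectral bands per frame.
N_SPATIAL, N_BANDS = 672, 512


def read_frame(rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for one CCD frame of shape (spatial, spectral).

    A real system would return detector counts from the camera; random values are used here.
    """
    return rng.random((N_SPATIAL, N_BANDS), dtype=np.float32)


def pushbroom_scan(n_lines: int, seed: int = 0) -> np.ndarray:
    """Stack one frame per conveyer step into a hypercube of shape (lines, spatial, bands)."""
    rng = np.random.default_rng(seed)
    return np.stack([read_frame(rng) for _ in range(n_lines)], axis=0)


cube = pushbroom_scan(n_lines=10)
print(cube.shape)                    # (10, 672, 512)
pixel_spectrum = cube[5, 300, :]     # spectrum of a single pixel: 512 values
band_image = cube[:, :, 200]         # grey-level image at one band: 10 x 672
```

Indexing a pixel gives its full spectrum, and indexing a band gives a single-wavelength image. This is the dual spectral and spatial structure of the hypercube.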

Samples preparation and flow chart
Three cultivars of tea leaves were used in this study: Biyun, Longjing-43 and Zhongcha-302, all well-known tea cultivars in China. Fourteen leaves were used per cultivar. All leaves were picked in a greenhouse at Zhejiang University, Hangzhou (120.2°E, 30.3°N), China. First, hyperspectral images of the forty-two fresh tea leaves were acquired, and their three color parameters (ΔL*, Δa* and Δb*) were measured with a colorimeter (Konica Minolta, SC-80C, Japan). The leaves were then dried in a drying oven (GHD-9070A, JingHong, Shanghai, China) at 80 °C for four minutes. After cooling in a glass desiccator, they were imaged and color-measured again. The same procedure was repeated three more times at the same drying temperature: the third run for six minutes, the fourth for eight minutes and the fifth for ten minutes.
The main steps of this study are illustrated in Fig. 3. Raw hyperspectral images were acquired with the hyperspectral imaging system over 380–1030 nm, and the color values were measured with the colorimeter. The raw images were then corrected using dark and white reference images. The spectral reflectance values of all pixels in the region of interest (ROI, 30 × 30 pixels) of each sample were extracted and averaged into one mean spectrum. In total, 210 spectra were obtained and used as the spectral data (X) of all samples. The samples were then divided into a calibration set and a prediction set at a ratio of 2:1 (2/3 for calibration and 1/3 for prediction). Nine preprocessing methods were compared to improve predictive ability. To optimize the calibration models, two effective wavelength selection methods were used to select key wavelengths: competitive adaptive reweighted sampling (CARS) and the successive projections algorithm (SPA). The optimal calibration model was determined according to the values of rc, rp, RMSEC and RMSEP.
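The ROI averaging step turns each sample's hypercube into a single row of the X matrix. The sketch below assumes a square 30 × 30 ROI whose corner is supplied by the caller. The text does not say how the ROI was positioned, so `top`/`left` are hypothetical inputs, and the cubes are small synthetic arrays.

```python
import numpy as np


def roi_mean_spectrum(cube: np.ndarray, top: int, left: int, size: int = 30) -> np.ndarray:
    """Average the spectra of all pixels in a size x size square ROI.

    cube: reflectance hypercube of shape (rows, cols, bands).
    (top, left): hypothetical ROI corner; the ROI placement rule is not given in the text.
    """
    roi = cube[top:top + size, left:left + size, :]
    if roi.shape[:2] != (size, size):
        raise ValueError("ROI extends beyond the image")
    # Flatten the size*size pixels into individual spectra and average them band by band.
    return roi.reshape(-1, cube.shape[2]).mean(axis=0)


def build_x_matrix(cubes, corners, size: int = 30) -> np.ndarray:
    """One mean spectrum per sample, stacked into X with shape (n_samples, n_bands)."""
    return np.vstack(
        [roi_mean_spectrum(c, top, left, size) for c, (top, left) in zip(cubes, corners)]
    )


rng = np.random.default_rng(0)
cubes = [rng.random((60, 60, 16)) for _ in range(3)]   # three small synthetic samples
X = build_x_matrix(cubes, corners=[(10, 10)] * 3)
print(X.shape)  # (3, 16)
```

With the 210 samples described here, X would have 210 rows, one mean spectrum per sample.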

Image acquisition and correction
The exposure time was 0.07 s, the conveyer speed was 2.6 mm/s, and the vertical distance between the lens and the sample was 36.0 cm. Each leaf was placed on the conveyer belt and scanned by the hyperspectral imaging system. This yielded one hyperspectral image (hypercube) per sample covering 380–1030 nm. Each hypercube had 512 bands in the λ dimension and 672 pixels in the y dimension. The raw hyperspectral images were then corrected with dark and white reference images according to equation (1). The dark reference image, with a reflectance of about 0%, was obtained by covering the lens with its cap and turning off the light. The white reference image, with a reflectance of about 99%, was obtained from a white Teflon board (CAL-tile200, 200 mm × 25 mm × 10 mm).
$$R_{cal} = \frac{I_{raw} - I_{dark}}{I_{white} - I_{dark}} \qquad (1)$$

where R_cal is the corrected hyperspectral image, I_raw is the raw hyperspectral image, I_dark is the dark reference image, and I_white is the white reference image.
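The dark/white correction, R_cal = (I_raw − I_dark) / (I_white − I_dark), is an element-wise operation on the hypercube. The NumPy sketch below assumes that the reference images are either full cubes or single reference lines of shape (spatial, bands) that broadcast over the scan direction. The `eps` guard against zero denominators (e.g. dead pixels) is an addition of this sketch, not something stated in the text.

```python
import numpy as np


def calibrate_reflectance(raw, dark, white, eps: float = 1e-9) -> np.ndarray:
    """Relative reflectance R_cal = (I_raw - I_dark) / (I_white - I_dark).

    dark and white can be full images or single reference lines of shape
    (spatial, bands) that broadcast over the scan direction of raw.
    eps replaces near-zero denominators to avoid division by zero.
    """
    raw = np.asarray(raw, dtype=np.float64)
    dark = np.asarray(dark, dtype=np.float64)
    white = np.asarray(white, dtype=np.float64)
    denom = white - dark
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom


# Toy example: raw has shape (lines, spatial, bands) = (2, 3, 4); references broadcast over lines.
dark = np.full((3, 4), 100.0)
white = np.full((3, 4), 3100.0)
raw = np.full((2, 3, 4), 1600.0)
R = calibrate_reflectance(raw, dark, white)
print(R[0, 0, 0])  # 0.5
```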

Models and evaluation index
Partial least squares (PLS) is an effective and widely used method for building calibration models in spectral studies [14], [15], [16], [17], [18]. It is particularly efficient when the predictor variables are numerous and highly collinear. The spectral information is projected onto a small number of latent variables (LVs), which compresses the original spectral data [19]. Predictions are then made from a set of orthogonal factors that carry most of the information useful for prediction [20]. Principal components regression (PCR) has also been widely applied [21], [22]. It compresses the high-dimensional raw variables and speeds up computation by ignoring minor components. PCR effectively overcomes the multicollinearity problem, which can otherwise make predictions unstable. Multiple linear regression (MLR) is a common modeling method because it is simple and easy to interpret. Although it has been used in many studies [23], [24], it fails when the number of samples is smaller than the number of input variables [25]. In this study, the calibration set contained fewer samples than there were full-spectrum variables. MLR models were therefore established only on the wavelengths selected by CARS and SPA. PLS, PCR and MLR models for the three color parameters were built with Unscrambler V9.7 software. Least squares-support vector machine (LS-SVM) is a simplification of the traditional support vector machine (SVM) that can handle both linear and nonlinear multivariate problems quickly [26]. Its advantage is that the support vectors (SVs) are obtained by solving a set of linear equations instead of a quadratic programming (QP) problem. This method has also been widely used in many fields [27], [28].
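The contrast between PLS, PCR and MLR can be illustrated on synthetic, strongly collinear "spectra" that have fewer samples than variables. The sketch below is plain NumPy, not the Unscrambler implementation. It uses NIPALS for single-response PLS (PLS1) and SVD for PCR. All data, the number of latent variables (3) and the "selected" band indices are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_lv):
    """PLS1 regression via NIPALS; returns (x_mean, y_mean, coefficient vector b)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xa, ya = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = Xa.T @ ya
        w /= np.linalg.norm(w)            # weight vector of this latent variable
        t = Xa @ w                        # scores
        tt = t @ t
        p = Xa.T @ t / tt                 # X loadings
        c = ya @ t / tt                   # y loading
        Xa = Xa - np.outer(t, p)          # deflate X and y before the next LV
        ya = ya - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return x_mean, y_mean, W @ np.linalg.solve(P.T @ W, q)


def pcr_fit(X, y, n_pc):
    """PCR: regress y on the first n_pc principal-component scores of centred X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    _, _, Vt = np.linalg.svd(X - x_mean, full_matrices=False)
    V = Vt[:n_pc].T                       # loadings (p x n_pc)
    T = (X - x_mean) @ V                  # scores (n x n_pc)
    g, *_ = np.linalg.lstsq(T, y - y_mean, rcond=None)
    return x_mean, y_mean, V @ g


def predict(model, X):
    x_mean, y_mean, b = model
    return (X - x_mean) @ b + y_mean


def simulate(n, rng, n_bands=300):
    """Synthetic collinear spectra: three overlapping Gaussian bands plus noise."""
    grid = np.linspace(0.0, 1.0, n_bands)
    basis = np.stack([np.exp(-((grid - c) / 0.1) ** 2) for c in (0.3, 0.5, 0.7)])
    conc = rng.random((n, 3))
    X = conc @ basis + 0.01 * rng.standard_normal((n, n_bands))
    y = 2.0 * conc[:, 0] - conc[:, 2] + 0.01 * rng.standard_normal(n)
    return X, y


rng = np.random.default_rng(1)
X_cal, y_cal = simulate(140, rng)
X_val, y_val = simulate(70, rng)

# rank(X'X) = rank(X) <= 140 < 300 bands, so full-spectrum MLR has no unique solution.
rank = np.linalg.matrix_rank(X_cal)
print(rank, "<", X_cal.shape[1])

scores = {}
for name, model in [("PLS", pls1_fit(X_cal, y_cal, 3)), ("PCR", pcr_fit(X_cal, y_cal, 3))]:
    scores[name] = float(np.corrcoef(y_val, predict(model, X_val))[0, 1])

# MLR becomes well posed on a few "selected" bands (hypothetical indices near the peaks).
idx = [90, 150, 209]
A = np.column_stack([np.ones(len(y_cal)), X_cal[:, idx]])
coef, *_ = np.linalg.lstsq(A, y_cal, rcond=None)
scores["MLR"] = float(np.corrcoef(y_val, coef[0] + X_val[:, idx] @ coef[1:])[0, 1])
print({k: round(v, 3) for k, v in scores.items()})
```

This is why MLR is restricted to selected wavelengths. PLS and PCR avoid the singular full-spectrum problem by regressing on a few latent variables. MLR only works once the variables are reduced to a small, non-collinear subset.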
In this study, LS-SVM was used to build models for predicting the color parameters and for classifying tea leaves at different drying periods. The calculations were carried out with the free LS-SVM toolbox (LS-SVM v1.5, Suykens, Leuven, Belgium) in Matlab R2009a.
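The key property of LS-SVM is that training reduces to one linear system, [[0, 1ᵀ], [1, K + I/γ]] [b; α] = [0; y], rather than a QP. The sketch below shows this for regression with an RBF kernel K(x, z) = exp(−‖x − z‖² / σ²). It is a minimal NumPy version, not the LS-SVM toolbox used in the study. The hyperparameter values (γ = 100, σ² = 0.5) and the sine test function are illustrative only.

```python
import numpy as np


def rbf_kernel(A, B, sig2):
    """K[i, j] = exp(-||A_i - B_j||^2 / sig2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / sig2)


class LSSVMRegressor:
    """Least squares SVM regression: training is a single dense linear solve."""

    def __init__(self, gam=10.0, sig2=1.0):
        self.gam, self.sig2 = gam, sig2

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y, float)
        n = len(y)
        # System [[0, 1^T], [1, K + I/gam]] @ [b; alpha] = [0; y]
        A = np.empty((n + 1, n + 1))
        A[0, 0] = 0.0
        A[0, 1:] = 1.0
        A[1:, 0] = 1.0
        A[1:, 1:] = rbf_kernel(X, X, self.sig2) + np.eye(n) / self.gam
        sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
        self.b_, self.alpha_, self.X_ = sol[0], sol[1:], X
        return self

    def predict(self, X_new):
        # y(x) = sum_i alpha_i * K(x, x_i) + b
        K = rbf_kernel(np.asarray(X_new, float), self.X_, self.sig2)
        return K @ self.alpha_ + self.b_


X = np.linspace(0.0, 2.0 * np.pi, 50)[:, None]
model = LSSVMRegressor(gam=100.0, sig2=0.5).fit(X, np.sin(X).ravel())
X_test = np.linspace(0.1, 6.0, 30)[:, None]
rmse = float(np.sqrt(np.mean((model.predict(X_test) - np.sin(X_test).ravel()) ** 2)))
print(round(rmse, 4))
```

In practice, γ and σ² would be tuned, for example by cross-validation. For the drying-period classification, one simple option is a one-vs-rest scheme: fit one such model per class on ±1 targets and take the class with the largest output. This is an assumption of the sketch; the toolbox's own classifier formulation and multiclass coding differ in detail.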
Model performance was evaluated using the correlation coefficients of calibration and prediction (rc and rp), the root mean square error of calibration (RMSEC) and the root mean square error of prediction (RMSEP) [29]. A good model should have high rc and rp values, low RMSEC and RMSEP values, and a small difference between RMSEC and RMSEP. A large difference between RMSEC and RMSEP indicates that the model is overfitting [30].
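The four evaluation indices can be computed as follows. This sketch uses Pearson's r and the plain RMSE, sqrt(Σ(ŷ − y)² / n). Some references instead use a degrees-of-freedom correction for RMSEC, so values may differ slightly from software output. The input numbers are toy values.

```python
import numpy as np


def r_and_rmse(y_true, y_hat):
    """Pearson correlation coefficient and root mean square error for one data set."""
    y_true = np.asarray(y_true, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    r = float(np.corrcoef(y_true, y_hat)[0, 1])
    rmse = float(np.sqrt(np.mean((y_hat - y_true) ** 2)))
    return r, rmse


def evaluate(y_cal, yhat_cal, y_pred, yhat_pred):
    """rc/RMSEC on the calibration set, rp/RMSEP on the prediction set."""
    rc, rmsec = r_and_rmse(y_cal, yhat_cal)
    rp, rmsep = r_and_rmse(y_pred, yhat_pred)
    return {"rc": rc, "RMSEC": rmsec, "rp": rp, "RMSEP": rmsep,
            "RMSEP - RMSEC": rmsep - rmsec}   # a large gap suggests overfitting


# Toy numbers, for illustration only.
metrics = evaluate([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8],
                   [1.5, 2.5, 3.5], [1.0, 3.0, 4.0])
print({k: round(v, 3) for k, v in metrics.items()})
```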

Key wavelengths selection
Selecting effective wavelengths is an important step in spectral studies. It improves predictive ability and reduces the influence of redundant information between contiguous wavelengths in the full spectrum [31]. Previous studies have shown that selected wavelengths can be as effective as, or even more effective than, the full spectrum [32]. In this study, CARS was first used to select effective wavelengths from the full spectrum. CARS selects key wavelengths according to the principle of "survival of the fittest" [33]. It uses an exponentially decreasing function (EDF) to discard wavelengths with small absolute regression coefficients. Each sampling run consists of the following steps: (a) model sampling based on the Monte Carlo (MC) principle; (b) wavelength elimination using the EDF; (c) competitive wavelength selection based on adaptive reweighted sampling (ARS); and (d) evaluation of the subset by cross-validation. In the end, wavelengths that contain little or no useful information are eliminated, and effective wavelengths are retained [34], [35]. SPA, another robust method for selecting key wavelengths, was then applied. It addresses collinearity by using projection operations in a vector space to select wavelengths with minimal redundancy [36], [37]. Both wavelength selection algorithms were run in Matlab R2009a. As a result, the raw spectral data were reduced to an m × n matrix, where m is the number of samples and n is the number of selected wavelengths.
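Two core pieces of the selection methods are sketched below. It is not the Matlab code used in the study.

The first piece is the CARS exponentially decreasing function. It uses the parameterization commonly given for CARS (assumed here): r_i = a·e^(−k·i), with a and k chosen so that run 1 keeps all p variables and run N keeps two. The full CARS procedure also includes Monte Carlo sampling, adaptive reweighted sampling and cross-validation, which are omitted here.

The second piece is the SPA projection step, which builds a chain of minimally collinear wavelengths from a starting column. Full SPA repeats this for every starting wavelength and chain length. It then keeps the subset with the lowest validation error of an MLR model, which is also omitted. The run count (50), variable count (500) and random data are hypothetical.

```python
import numpy as np


def cars_edf_ratios(n_vars: int, n_runs: int) -> np.ndarray:
    """Fraction of wavelengths kept at each CARS run, r_i = a * exp(-k * i).

    a and k are chosen so that r_1 = 1 (all variables) and r_N = 2 / n_vars (two variables).
    """
    a = (n_vars / 2) ** (1 / (n_runs - 1))
    k = np.log(n_vars / 2) / (n_runs - 1)
    i = np.arange(1, n_runs + 1)
    return a * np.exp(-k * i)


def spa_chain(X: np.ndarray, k0: int, n_select: int) -> list:
    """SPA projection step on X of shape (n_samples, n_wavelengths).

    Starting from column k0, repeatedly pick the column with the largest norm after
    projecting all columns onto the orthogonal complement of the last selected column.
    n_select should not exceed n_samples.
    """
    Xp = np.array(X, dtype=float, copy=True)
    selected = [k0]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]].copy()
        # Remove the component along xk from every column (Gram-Schmidt style).
        Xp -= np.outer(xk, xk @ Xp) / (xk @ xk)
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0            # never pick a column twice
        selected.append(int(np.argmax(norms)))
    return selected


ratios = cars_edf_ratios(n_vars=500, n_runs=50)
print(int(round(ratios[0] * 500)), int(round(ratios[-1] * 500)))   # 500 2

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 100))
chain = spa_chain(X, k0=0, n_select=6)
print(chain)
```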

Measurement of color values (ΔL*, Δa* and Δb*)
In tea processing, the color parameters of tea leaves play a vital role because they both directly determine tea quality and reflect it. It is therefore important to obtain the color parameters of tea leaves at different drying periods. In this study, three color parameters (ΔL*, Δa* and Δb*) were measured with the colorimeter, which was first calibrated with a standard white calibration plate. ΔL* is the lightness (luminance) component. Δa* and Δb*, which range from −120 to 120, are the two chromatic components. ΔL*, Δa* and Δb* represent color changes from dark to bright, from green to red and from blue to yellow, respectively.

Statistics of color parameters
A total of 210 samples were studied: 42 tea leaves, each measured at five drying periods (fresh, and dried for 4, 6, 8 and 10 min). The samples were divided into a calibration set and a prediction set at a ratio of 2:1. One sample out of every three consecutive samples was assigned to the prediction set. This gave 140 samples for the calibration set (Biyun: 46, Longjing-43: 47, Zhongcha-302: 47) and 70 samples for the prediction set (Biyun: 24, Longjing-43: 23, Zhongcha-302: 23). Detailed statistics for each set are shown in Table 1.
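The every-third split can be sketched as below. Two things are assumptions because the text does not state them: which member of each consecutive triple goes to the prediction set (`offset`), and how samples were ordered before splitting (`order`). Sorting by a reference value is one common variant. The per-cultivar counts in the text depend on that ordering, so the sketch checks only the totals.

```python
import numpy as np

N_LEAVES, N_PERIODS = 42, 5          # 3 cultivars x 14 leaves, each measured at 5 periods
n_samples = N_LEAVES * N_PERIODS      # 210


def every_third_split(n, order=None, offset=2):
    """Assign one sample of every three consecutive samples to the prediction set.

    order: optional permutation of sample indices (hypothetical, e.g. np.argsort of y).
    offset: position within each triple that goes to prediction (an assumption).
    """
    order = np.arange(n) if order is None else np.asarray(order)
    pos = np.arange(n)
    pred = order[pos % 3 == offset]
    cal = order[pos % 3 != offset]
    return np.sort(cal), np.sort(pred)


cal_idx, pred_idx = every_third_split(n_samples)
print(n_samples, len(cal_idx), len(pred_idx))   # 210 140 70
```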

Preprocessing results
To retain useful spectral information and improve predictive ability, the noisy bands at the beginning of the spectrum were removed, leaving the 400–1030 nm range for analysis. Nine preprocessing algorithms were then compared according to the rc, rp, RMSEC and RMSEP values of the PLS models. The results are shown in Table 2. By these criteria, the raw spectra performed best, with the highest rp values for ΔL* (0.925) and Δb* (0.930). For Δa*, the PLS model based on multiplicative scatter correction (MSC) gave the highest rc (0.949) and rp (0.799), the lowest RMSEC (0.611) and the second-lowest RMSEP (1.276). However, it did not perform well, because of the large gap between rc and rp. The raw spectra were therefore used for further analysis.
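The sketch below covers two steps from this paragraph: trimming the noisy start of the spectrum, and MSC, the one preprocessing method named here. It assumes the 512 band centres are evenly spaced over 380–1030 nm. Real centres come from the spectrograph calibration, so the exact count may differ. Under that assumption, 496 bands remain from 400 nm. This count is consistent with the percentages reported for the selected wavelengths (e.g. 48/496 ≈ 9.68%). The MSC example data are synthetic.

```python
import numpy as np

# Assumption: 512 band centres evenly spaced over 380-1030 nm.
wavelengths = np.linspace(380.0, 1030.0, 512)
keep = wavelengths >= 400.0
print(int(keep.sum()))   # 496


def msc(X, reference=None):
    """Multiplicative scatter correction.

    Each spectrum x is fitted as x ~= intercept + slope * reference (the mean spectrum by
    default) and corrected as (x - intercept) / slope.
    """
    X = np.asarray(X, dtype=float)
    ref = X.mean(axis=0) if reference is None else np.asarray(reference, dtype=float)
    corrected = np.empty_like(X)
    for i, x in enumerate(X):
        slope, intercept = np.polyfit(ref, x, 1)
        corrected[i] = (x - intercept) / slope
    return corrected, ref


# Two synthetic spectra that differ only by additive and multiplicative scatter effects.
base = np.sin(np.linspace(0.0, 3.0, 50)) + 1.0
X_demo = np.vstack([2.0 * base + 0.3, 0.5 * base - 0.1])
X_msc, ref = msc(X_demo[:, keep[:50]] if False else X_demo)
```

After MSC, both synthetic spectra collapse onto the reference. This is the intended removal of scatter-induced offset and scaling.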

Effective wavelengths
To obtain an optimal model with robust predictive ability and few input variables, two wavelength selection methods (CARS and SPA) were used to determine the most effective wavelengths. CARS identified forty-eight (ΔL*), thirty-four (Δa*) and twenty-six (Δb*) wavelengths. SPA selected seven (ΔL*), six (Δa*) and eleven (Δb*) wavelengths. The wavelengths selected by CARS account for only 9.68%, 6.85% and 5.24% of the full spectral bands, respectively. Those selected by SPA account for only 1.41%, 1.21% and 2.22%, respectively. These selected wavelengths (Table 3) were then used in place of the full spectrum to determine the color values. Four calibration models (LS-SVM, PLSR, PCR and MLR) were established on them. The selected wavelengths simplify the calibration models and speed up computation, and they can also improve the accuracy and robustness of prediction.
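The reported percentages can be reproduced from the band counts. The full-band count of 496 is not stated in the text. It is inferred here as the value consistent with all six reported percentages.

```python
# Full-band count in 400-1030 nm: inferred (not stated in the text) as 496.
N_FULL = 496
selected_counts = {
    "CARS": {"ΔL*": 48, "Δa*": 34, "Δb*": 26},
    "SPA": {"ΔL*": 7, "Δa*": 6, "Δb*": 11},
}
# Share of the full spectrum kept by each method, in percent (rounded to 2 decimals).
shares = {
    (method, param): round(100 * k / N_FULL, 2)
    for method, counts in selected_counts.items()
    for param, k in counts.items()
}
for (method, param), pct in shares.items():
    print(f"{method} {param}: {pct}%")
```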

Predicted results
The prediction results are shown in Table 4. The LS-SVM models based on the selected wavelengths performed better than the other models. Compared with the CARS-based models, the SPA-based models had slightly lower rc and rp values and slightly higher RMSEC and RMSEP values, but used fewer input variables. Models with far fewer input variables still predicted the color parameters well. This shows that CARS and SPA can improve the efficiency of color parameter determination, so the selected wavelengths were more efficient than the full spectrum. This may be because the full spectrum contains more redundant information, which degrades prediction performance. The prediction results of the CARS-LS-SVM and SPA-LS-SVM models are shown in Fig. 4 (a–f). The points for both the calibration and prediction sets lie close to the ideal lines, indicating that these models performed well. This demonstrates that hyperspectral imaging can be used to determine the color parameters of tea leaves. It also shows that both CARS and SPA can remove uninformative wavelengths and improve the predictive ability of the models.

Classification of different samples
The correct classification rates (CCRs) of samples at the five drying periods obtained with the LS-SVM model are shown in Table 5. The CCRs ranged from 89.29% to 100% in the calibration set and from 71.43% to 100% in the prediction set. The total CCRs were 96.43% in the calibration set and 85.71% in the prediction set. The 6-min and 8-min classes were poorly identified, at 71.43% each in the prediction set. The other three classes were classified with high CCRs (92.86% to 100%). Three samples in the 6-min class were misclassified as 8-min, and four samples in the 8-min class were misclassified as 6-min. This may be because the short interval between these drying times produced only small changes in the color parameters of the tea leaves. Nevertheless, the total result was acceptable. It demonstrates that hyperspectral imaging combined with an LS-SVM model can also be used to classify tea leaves at different drying periods.
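Per-class and total CCRs are simple ratios of correctly classified samples. The sketch below computes them from true and predicted labels; the labels are hypothetical and are not the study's data. The last lines show an inference rather than a reported fact. The reported rates are consistent with 28 calibration and 14 prediction samples per drying period: for example, 10/14 = 71.43% and 60/70 = 85.71%.

```python
import numpy as np


def correct_classification_rates(y_true, y_pred):
    """Per-class CCR (%) and total CCR (%) from true and predicted labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    per_class = {
        str(c): 100.0 * float(np.mean(y_pred[y_true == c] == c)) for c in np.unique(y_true)
    }
    total = 100.0 * float(np.mean(y_pred == y_true))
    return per_class, total


# Hypothetical labels for illustration only (not the study's data).
y_true = ["6 min"] * 4 + ["8 min"] * 4
y_pred = ["6 min", "6 min", "6 min", "8 min"] + ["8 min"] * 4
per_class, total = correct_classification_rates(y_true, y_pred)
print(per_class, total)   # {'6 min': 75.0, '8 min': 100.0} 87.5

# Reported rates are consistent with 14 prediction / 28 calibration samples per period.
print(round(100 * 10 / 14, 2), round(100 * 60 / 70, 2), round(100 * 135 / 140, 2))  # 71.43 85.71 96.43
```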

Conclusions
This study evaluated the feasibility of using visible and near-infrared hyperspectral imaging (380–1030 nm) to determine the color parameters of tea leaves at different drying periods. Two wavelength selection methods, CARS and SPA, were used to select effective wavelengths. Four models (LS-SVM, PLS, PCR and MLR) were used to predict the color values. The wavelength selection methods and models generally gave good results. Across all models, rp ranged from 0.902 to 0.931 for ΔL*, from 0.618 to 0.973 for Δa* and from 0.904 to 0.944 for Δb*. Based on the selected wavelengths, a multispectral imaging system could be designed for nondestructive quality inspection in the tea processing industry. Moreover, the CCRs of tea leaves at five drying periods with the LS-SVM model ranged from 89.29% to 100% in the calibration set and from 71.43% to 100% in the prediction set. These results demonstrate that this technique could be used as an objective and nondestructive method to determine the color parameters of tea leaves and to classify samples at different drying periods. To our knowledge, this is the first application of visible and near-infrared hyperspectral imaging to the color determination of tea leaves at different drying periods. The technique could also be considered for determining other chemical components that are important for tea quality. However, this study was preliminary. In further studies, more samples and more drying times should be included to build more accurate and robust models. Wavelength selection that achieves higher accuracy with fewer variables should also be explored.