Characterization of the volatile components in green tea by IRAE-HS-SPME/GC-MS combined with multivariate analysis

In the present work, a novel method combining infrared-assisted extraction with headspace solid-phase microextraction (IRAE-HS-SPME), followed by gas chromatography-mass spectrometry (GC-MS), was developed for the rapid determination of the volatile components in green tea. The extraction parameters, namely fiber type, sample amount, infrared power, extraction time, and infrared lamp distance, were optimized by orthogonal experimental design. Under the optimum conditions, a total of 82 volatile compounds were identified in 21 green tea samples from different geographical origins. Compared with classical water-bath heating, the proposed technique considerably reduces the analysis time and offers high extraction efficiency. In addition, green teas were effectively classified by their volatile profiles using partial least squares-discriminant analysis (PLS-DA) and hierarchical clustering analysis (HCA). Furthermore, a dual criterion based on the variable importance in the projection (VIP) values of the PLS-DA models and on the significance level from one-way analysis of variance (ANOVA) identified 12 potential volatile markers, which were considered to contribute most to the discrimination of the samples. The results suggest that the IRAE-HS-SPME/GC-MS technique combined with multivariate analysis offers a valuable tool for assessing the geographical traceability of different tea varieties.


Introduction
Tea is one of the most popular beverages in the world owing to its attractive aroma, taste, and health benefits [1,2]. Based on processing, and specifically on the degree of fermentation, teas are usually classified into three main groups: non-fermented (green and white), semi-fermented (oolong), and fully fermented (black tea, including pu-erh tea) [3]. Of these, green tea is particularly popular for its pleasant flavor, mainly in Asian countries and especially in Japan and China [4]. The manufacturing process of green tea involves four basic steps: withering, pan firing, rolling, and drying. According to the heating process used to inactivate the endogenous enzymes in the leaves, green tea is mainly divided into two types, i.e., … knowledge, no studies have been reported so far on the volatile profile of green tea obtained by coupling infrared-assisted extraction with the HS-SPME technique.
In the present work, infrared-assisted extraction coupled to headspace solid-phase microextraction (IRAE-HS-SPME) was developed for the rapid GC-MS analysis of the volatile constituents in green tea. To the best of our knowledge, this is the first report applying IRAE-HS-SPME to the aroma composition of green tea. The extraction parameters, namely fiber type, sample amount, IR power, extraction time, and IR lamp distance, were investigated by orthogonal experimental design to obtain the optimal analysis conditions. IR radiation was compared with conventional water-bath heating. Multivariate statistical techniques, namely PLS-DA and HCA, were used to characterize the samples according to their geographical origins. A dual criterion based on variable importance in the projection (VIP) scores and the results of one-way analysis of variance (ANOVA) allowed the identification of potential volatile markers of different geographical origins.

Materials and reagents
In total, 21 commercially available roasted green tea samples were purchased from a local market. Their names and sources are listed in S1 Table. All tea samples were stored in a freezer below -20 °C until analysis.
headspace above the samples. After an extraction time of 20 min at an equilibrium temperature of 90 °C, the needle was withdrawn from the headspace vial and inserted directly into the GC-MS injection port, where the analytes were thermally desorbed in splitless mode at 250 °C for 5 min.

Gas chromatography-mass spectrometry
GC-MS was performed with an Agilent Technologies 7890A GC system coupled to an Agilent Technologies 7000C Triple Quadrupole mass spectrometer. Samples were analyzed on an HP-5ms capillary column (30 m × 0.25 mm i.d. × 0.25 μm film thickness). The oven temperature was held at 40 °C for 0 min, raised to 190 °C at 3 °C/min and held for 2 min, then increased to 290 °C at 10 °C/min and held for 3 min. Injection was in splitless mode, and helium (99.999% purity) was used as the carrier gas at a flow rate of 1 mL/min.
The mass spectrometer was operated in electron-impact (EI) mode with a scan range of m/z 50-450 and a scan rate of 0.2 s/scan. The ion source and interface temperatures were 230 and 280 °C, respectively.
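As a quick arithmetic check on the oven program, the sketch below adds up ramp and hold times. It assumes the program reads as a ramp at 3 °C/min to 190 °C followed by a 2 min hold, then a ramp at 10 °C/min to 290 °C with a 3 min hold; the function name and the tuple layout are illustrative only, and cool-down and re-equilibration are not counted.

```python
def oven_run_time(initial_temp_c, initial_hold_min, segments):
    """Total GC oven program time in minutes.

    segments: list of (target_temp_c, ramp_rate_c_per_min, hold_min) tuples
    (a hypothetical representation, not an instrument method format).
    Cool-down and re-equilibration time are not included.
    """
    time_min = initial_hold_min
    temp = initial_temp_c
    for target, rate, hold in segments:
        if rate <= 0:
            raise ValueError("ramp rate must be positive")
        time_min += abs(target - temp) / rate + hold  # ramp time plus hold time
        temp = target
    return time_min


# Program as read from the text: 40 °C (0 min) -> 190 °C at 3 °C/min, hold 2 min
# -> 290 °C at 10 °C/min, hold 3 min.
program = [(190.0, 3.0, 2.0), (290.0, 10.0, 3.0)]
print(oven_run_time(40.0, 0.0, program))  # 65.0
```

Under this reading the two ramps take 50 min and 10 min, so each run occupies about 65 min of oven time.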

Qualitative analysis
The volatile components were identified with Agilent MassHunter Workstation Software (Unknowns Analysis), which performs peak picking, peak deconvolution, and mass spectral comparison. Compounds were identified by comparing the acquired spectra with reference spectra in the National Institute of Standards and Technology library (NIST14) and by Kováts retention indices calculated for each peak relative to an n-alkane standard series (C7-C40) run under the same conditions. Peaks were assigned when the spectral similarity exceeded 80%. Known artifact peaks were excluded from the data set. The relative percentage of each component was obtained by peak area normalization.
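Both calculations named above can be sketched in a few lines. Under temperature programming, the retention index is usually computed with the linear (van den Dool and Kratz) form of the Kováts index, interpolating between the n-alkanes that bracket each peak; the text does not say which form was used, so that choice, the function names, and the example retention times below are assumptions for illustration only.

```python
from bisect import bisect_right


def linear_retention_index(t_x, alkane_rt):
    """Linear (van den Dool-Kratz) retention index of a peak at retention time t_x.

    alkane_rt maps n-alkane carbon number -> retention time (same units as t_x);
    retention times are assumed to increase with carbon number.
    """
    carbons = sorted(alkane_rt)
    times = [alkane_rt[c] for c in carbons]
    if not times[0] <= t_x <= times[-1]:
        raise ValueError("peak elutes outside the n-alkane ladder")
    i = bisect_right(times, t_x) - 1           # last alkane eluting at or before t_x
    if i == len(times) - 1:                    # peak coincides with the last alkane
        return 100.0 * carbons[i]
    n, N = carbons[i], carbons[i + 1]
    frac = (t_x - times[i]) / (times[i + 1] - times[i])
    return 100.0 * (n + (N - n) * frac)        # (N - n) handles gaps in the ladder


def area_normalize(areas):
    """Relative percentage of each peak: its area over the total area, times 100."""
    total = sum(areas.values())
    return {name: 100.0 * area / total for name, area in areas.items()}


# Hypothetical alkane retention times (min), not values from this study.
ladder = {10: 20.0, 11: 25.0, 12: 29.5}
print(linear_retention_index(22.5, ladder))  # 1050.0
```

A full C7-C40 ladder would simply supply more entries to `alkane_rt`.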

Statistical methods
Significant differences among green tea samples were determined by one-way ANOVA using SPSS 18.0 (SPSS Inc., USA), and differences were considered statistically significant at p<0.05. HCA was also performed using SPSS 18.0. PLS-DA was conducted with SIMCA-P 11 (Umetrics, Sweden). All variables were scaled to unit variance (UV) prior to PLS-DA.
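The scaling and testing steps in this paragraph can be sketched with NumPy and SciPy. This is an illustration under assumptions, not a reproduction of the SPSS or SIMCA-P computations: unit-variance scaling is taken to use the sample standard deviation (n - 1), the one-way ANOVA is run separately for each volatile compound with the regions as groups, and the toy data are hypothetical.

```python
import numpy as np
from scipy.stats import f_oneway


def uv_scale(X):
    """Unit-variance (auto)scaling: centre each column, divide by its sample SD."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # a constant column stays all-zero after centring
    return (X - X.mean(axis=0)) / sd


def anova_pvalues(X, groups):
    """One-way ANOVA p-value for each column (compound) of X across sample groups."""
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    levels = np.unique(groups)
    return np.array([
        f_oneway(*(X[groups == g, j] for g in levels)).pvalue
        for j in range(X.shape[1])
    ])


# Toy data (hypothetical): 3 + 3 samples, 2 compounds.
X = np.array([[1, 1], [2, 2], [3, 3], [10, 1], [11, 2], [12, 3]], dtype=float)
groups = ["HZ", "HZ", "HZ", "YA", "YA", "YA"]
p = anova_pvalues(X, groups)
significant = p < 0.05  # the significance level stated in the text
```

Here the first compound separates the two groups and the second does not, so only the first is flagged as significant.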
Partial least squares-discriminant analysis. PLS-DA models were built to find a two-dimensional plane (the discriminating plane) in which the tea samples, projected onto the PLS components, were well separated according to their volatile compounds. In the corresponding PLS weight plot, the positions of the variables reveal which specific volatile compounds contribute to the separation.
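To make the PLS-DA step concrete, the sketch below fits a two-class model with a plain NumPy implementation of the NIPALS PLS1 algorithm on unit-variance-scaled data and reports R2Y and a cross-validated Q2 of the kind quoted in the Results. It is not SIMCA-P's implementation: the single 0/1 dummy response (SIMCA builds one dummy column per class, which for two classes should give the same components), the choice of two components, the simple interleaved 7-fold cross-validation, and all function names are assumptions.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """NIPALS PLS1 on pre-scaled X (n x p) and centred y (n,).

    Returns weights W (p x A), scores T (n x A), X-loadings P (p x A), y-loadings q (A,).
    """
    X, y = X.copy(), y.copy()
    n, p = X.shape
    W, P = np.zeros((p, n_components)), np.zeros((p, n_components))
    T, q = np.zeros((n, n_components)), np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = X @ w                       # sample scores on component a
        tt = t @ t
        P[:, a] = X.T @ t / tt          # X loadings
        q[a] = (y @ t) / tt             # y loading
        X -= np.outer(t, P[:, a])       # deflate X
        y -= q[a] * t                   # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, P, q


def fit_plsda(X, labels, positive_class, n_components=2):
    """Two-class PLS-DA: y = 1 for positive_class, 0 otherwise; X autoscaled."""
    X = np.asarray(X, dtype=float)
    y = (np.asarray(labels) == positive_class).astype(float)
    mean, sd = X.mean(axis=0), X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    Xs = (X - mean) / sd
    W, T, P, q = pls1_nipals(Xs, y - y.mean(), n_components)
    B = W @ np.linalg.solve(P.T @ W, q)  # regression coefficients for scaled X
    return {"mean": mean, "sd": sd, "y_mean": y.mean(), "B": B, "W": W, "T": T, "y": y}


def predict(model, X):
    Xs = (np.asarray(X, dtype=float) - model["mean"]) / model["sd"]
    return Xs @ model["B"] + model["y_mean"]


def r2y(model, X):
    """Fraction of the (centred) class-dummy variance explained by the fitted model."""
    y = model["y"]
    resid = y - predict(model, X)
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def q2(X, labels, positive_class, n_components=2, n_folds=7):
    """Simplified cross-validated Q2 = 1 - PRESS / SS, with interleaved folds."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    y = (labels == positive_class).astype(float)
    folds = np.arange(len(y)) % n_folds
    press = 0.0
    for f in range(n_folds):
        train, test = folds != f, folds == f
        model = fit_plsda(X[train], labels[train], positive_class, n_components)
        resid = y[test] - predict(model, X[test])
        press += resid @ resid
    return 1.0 - press / np.sum((y - y.mean()) ** 2)
```

In this sketch the class coded 1 always receives positive mean scores on the first component; in general software the sign of a PLS component is arbitrary, so "positive" and "negative" regions in a score plot only matter relative to each other.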
Hierarchical clustering analysis. HCA is an unsupervised chemometric technique that reveals the natural groupings among samples characterized by a set of measured variables. The results are displayed as a dendrogram, i.e., a tree structure. In the present work, the similarities between samples were calculated as squared Euclidean distances, and Ward's method was used as the linkage procedure to establish the clusters.
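A minimal sketch of Ward-linkage HCA with SciPy follows. The analysis in this work was done in SPSS; SciPy's Ward method takes Euclidean distances between raw observations and, as far as the clustering is concerned, should follow the same minimum-variance merge criterion as SPSS's Ward method on squared Euclidean distances, but the dendrogram heights are on a different scale from SPSS's rescaled distances. Whether the data were scaled before HCA is not stated, so `X` is used as given; the function name and toy data are hypothetical.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_hca(X, n_clusters=2):
    """Ward-linkage HCA of sample profiles (rows of X).

    Returns the linkage matrix Z ((n - 1) x 4) and flat cluster labels
    obtained by cutting the tree into n_clusters groups.
    """
    Z = linkage(np.asarray(X, dtype=float), method="ward", metric="euclidean")
    labels = fcluster(Z, t=n_clusters, criterion="maxclust")
    return Z, labels
```

`scipy.cluster.hierarchy.dendrogram(Z)` can then draw the tree (it needs Matplotlib for plotting).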

Optimization of the infrared-assisted extraction coupled to headspace solid-phase microextraction parameters
The extraction efficiency of IRAE-HS-SPME is affected by multiple parameters. To maximize it, a series of experiments under different conditions was performed. The fiber type was first selected, and the effects of sample amount, infrared power, extraction time, and IR lamp distance on the extraction efficiency were then investigated by orthogonal experimental design.
Headspace solid-phase microextraction fiber. The fiber coating is the "heart" of the extraction. To evaluate the selectivity of different coatings, three fibers (CAR-PDMS, PDMS-DVB, and PA) differing in polarity and structure were tested for the extraction of the volatile compounds in green tea. The total peak areas obtained with the PA and PDMS-DVB fibers were 76.66% and 89.72%, respectively, of that obtained with CAR-PDMS (see Fig 2), indicating that the CAR-PDMS fiber was the most efficient and had the strongest retention for the volatile compounds in green tea. The CAR-PDMS fiber also showed the best repeatability. It was therefore selected as the optimal fiber and used in the subsequent experiments.
Orthogonal experimental design. In this work, a three-level orthogonal array design with an L9(3⁴) matrix was used to optimize the sample amount (factor A, 0.1 to 0.5 g), extraction time (factor B, 10 to 20 min), IR lamp power (factor C, 100 to 275 W), and IR lamp distance (factor D, 6 to 10 cm) (see Table 1). Compared with the traditional approach of varying one parameter at a time, orthogonal experimental design is a widely used modern approach for characterizing and optimizing system performance in many research areas. It selects representative points from the full factorial experiment that are distributed uniformly within the test range and can therefore adequately represent the overall situation [29,30].
In this study, the orthogonal design was used to evaluate the relationships among the factors that might significantly affect extraction efficiency and their order of influence. For each factor, Ki is the sum of the total peak areas of the runs performed at level i, and ki is their mean, i.e., Ki divided by the number of runs at that level (three in an L9 design). The range R of each factor is the difference between its maximum and minimum ki values; the larger R is, the stronger the influence of that factor on the result. The results give the order RC > RD > RB > RA. In other words, IR lamp power has the greatest impact on the detected amounts of volatile compounds in green tea, followed by IR lamp distance, extraction time, and sample amount.
The trend of ki across levels indicates the optimal level of each factor. From the results in Table 1, k3 > k2 > k1 for factors A and B, k2 > k3 > k1 for factor C, and k1 > k2 > k3 for factor D. The optimal combination of factor levels is therefore A3B3C2D1, i.e., the highest extraction efficiency is expected with a sample amount of 0.5 g, an extraction time of 20 min, an IR power of 175 W, and an IR lamp distance of 6 cm. ANOVA was used to estimate the relative significance of each parameter in terms of its percentage contribution to the overall response. Ranked by F value, the factors follow the order C > D > B > A (see Table 2), the same order as obtained from the R values, which confirms the range analysis.
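The range analysis described above can be sketched in plain Python. The L9(3⁴) array below is the standard Taguchi layout; whether Table 1 assigned the factors to its columns in exactly this way is an assumption, and the responses in the example are made-up numbers chosen so that only factor C matters, not the measured peak areas.

```python
from itertools import combinations, product

# Standard L9(3^4) orthogonal array: 9 runs x 4 factors (A, B, C, D), levels 1-3.
L9 = [
    (1, 1, 1, 1), (1, 2, 2, 2), (1, 3, 3, 3),
    (2, 1, 2, 3), (2, 2, 3, 1), (2, 3, 1, 2),
    (3, 1, 3, 2), (3, 2, 1, 3), (3, 3, 2, 1),
]
FACTORS = ("A", "B", "C", "D")
LEVELS = (1, 2, 3)


def is_orthogonal(array):
    """Every pair of columns contains each level combination exactly once."""
    full = sorted(product(LEVELS, repeat=2))
    return all(
        sorted((row[i], row[j]) for row in array) == full
        for i, j in combinations(range(len(array[0])), 2)
    )


def range_analysis(array, responses):
    """K_i = sum of responses at level i, k_i = mean at level i, R = max(k) - min(k)."""
    result = {}
    for col, name in enumerate(FACTORS):
        K, k = [], []
        for level in LEVELS:
            ys = [y for row, y in zip(array, responses) if row[col] == level]
            K.append(sum(ys))
            k.append(sum(ys) / len(ys))
        result[name] = {"K": K, "k": k, "R": max(k) - min(k),
                        "best_level": LEVELS[k.index(max(k))]}
    return result


# Hypothetical responses that depend only on the level of factor C.
responses = [float(row[2]) for row in L9]
res = range_analysis(L9, responses)
ranking = sorted(FACTORS, key=lambda f: res[f]["R"], reverse=True)
```

With the measured Table 1 peak areas in place of the toy responses, the same calls would return the R ranking (reported as C > D > B > A) and the best level of each factor (reported as A3B3C2D1).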
Overall, the optimized extraction conditions were as follows: CAR-PDMS fiber, sample amount of 0.5 g, extraction time of 20 min, IR power of 175 W and IR lamp distance of 6 cm.

Analysis of the volatile compounds in green teas from different geographical origins
On the basis of the optimized conditions, IRAE-HS-SPME was applied to characterize the volatile profiles of green tea samples from the Hangzhou and Ya'an districts. The volatile components were identified by comparing the obtained mass spectra with those in the NIST mass spectral library and by Kováts retention indices calculated for each peak relative to the C7-C40 n-alkane series. Relative amounts were calculated as the individual peak area relative to the total peak area. The GC-MS chromatograms of green teas from different geographical origins are shown in Fig 3. A total of 82 volatile compounds were identified in the 21 green tea samples, including 27 hydrocarbons, 2 furans, 9 alcohols, 11 ketones, 7 esters, 17 aldehydes, and 9 nitrogen compounds (see Table 3). As seen from S1 Fig, the relative area percentages of the different categories of volatile compounds differed significantly between the Hangzhou and Ya'an samples.

A total of 14 saturated and 13 unsaturated hydrocarbons were identified in the 21 batches of green tea, accounting for 26.81% and 24.07% of the volatiles in the Hangzhou and Ya'an samples, respectively. The contents of hydrocarbons such as α-terpinene, undecane, pentadecane, calamenene, δ-cadinene, hexadecane, and cadalene differed significantly (P<0.05) between the two regions. Saturated hydrocarbons are considered to contribute little to tea flavor, whereas unsaturated hydrocarbons play an important role [9]. For example, α-terpinene, which generally has a sweet, floral aroma, is important for the quality of green tea.

Table 1. The results and analysis of orthogonal design L9(3⁴). Columns: No.; Factor A, sample amount (g); Factor B, extraction time (min); Factor C, IR lamp power (W); Factor D, IR lamp distance (cm); sum of peak areas.
Aldehydes accounted for a high proportion of the volatiles in green tea, representing 31.29% and 28.98% in the Hangzhou and Ya'an samples, respectively. Significant differences (P<0.05) between the two regions were observed for (E)-2-heptenal, benzaldehyde, (E,E)-2,4-heptadienal, decanal, β-cyclocitral, and (E)-2-nonenal. Aldehydes, which originate from the thermal Strecker degradation of amino acids and the oxidative degradation of fatty acids, play an important role in the overall odor because of their relatively low odor thresholds [31]. For example, benzaldehyde is described as having a fragrant, sweet, almond-like aroma, whereas benzeneacetaldehyde is described as honey-like and sweet.
Among the alcohols, 1-octen-3-ol, benzyl alcohol, and 1-octanol were the most abundant. This differs from a previous report [32], possibly because of differences in the extraction method and the samples analyzed. Benzyl alcohol, which imparts a mild sweet and roasted odor, has been reported in various types of tea. No significant differences were found in the contents of α-terpineol, 1-octen-3-ol, and linalool oxide between the Hangzhou and Ya'an samples. α-Terpineol is an important odorant that provides a floral, sweet scent. 1-Octen-3-ol, which stems from the oxidative degradation of linoleic acid, has an intense, persistent, mushroom-like, earthy odor [31].
A total of 9 nitrogen compounds were identified. Caffeine, which is related to the taste of tea but contributes little to its aroma, was the most abundant in all green teas, comprising 14.54% and 15.44% of the volatiles in the Hangzhou and Ya'an samples, respectively. Indole, with a typical floral fragrance, is thought to contribute to the overall odor of green tea. Pyrazines such as 2,5-dimethylpyrazine and 3-ethyl-2,5-dimethylpyrazine, which are reported to form via the Maillard reaction through Strecker degradation, contribute desirable roasty, sweet, and nutty odors [32].
Among the remaining compounds, methyl salicylate, which has a wintergreen-like herbal fragrance, was found to be an important aroma compound. In addition, 2-pentylfuran and 2-(2-propenyl)furan, with burnt and sweet odors, were identified in the present study [33]. No significant differences (P>0.05) were found in the contents of the two furans between the Hangzhou and Ya'an samples.

Analysis of the volatile components in green tea and comparison with conventional methods
To evaluate extraction efficiency, IRAE was compared with the conventional water-bath heating method. Fifty-three components were identified in green tea by water-bath heating (see S2 Table). The volatiles obtained by water-bath heating consisted mainly of aldehydes (32.04%), hydrocarbons (27.43%), and nitrogenous compounds (14.05%), with smaller amounts of esters (11.77%), alcohols (5.09%), ketones (5.09%), and furans (3.59%).
Notably, IRAE recovered most of the volatile components identified by water-bath heating. Moreover, IRAE-HS-SPME gave a much higher extraction efficiency: under the conditions investigated, the total peak area obtained by IRAE-HS-SPME was 15.01 × 10⁹, compared with 6.23 × 10⁹ for water-bath heating (see Fig 4), roughly 2.4 times higher. This superiority can be attributed to the heating mechanism of IRAE. An IR source emits a continuous spectrum that includes the 2.5-25 μm wavelength range (wavenumbers of 4000-400 cm⁻¹), which can excite molecular vibrations in the stretching, bending, rocking, and twisting modes [34]. C-H and O-H bonds (as in the hydrocarbons and alcohols of green tea) absorb infrared radiation strongly in their stretching regions, roughly 2850-3100 cm⁻¹ for C-H and 3200-3600 cm⁻¹ for O-H. In addition, most absorption peaks of the volatile components fall within the range of the IR radiation; that is, the wavelengths of the infrared radiation match the absorption characteristics of the active compounds in green tea. Furthermore, IR radiation has a high penetration capability and heats the green tea sample volumetrically without heating the surrounding air, whereas conventional heating requires a finite time to heat the vessel before heat is transferred to the sample [35,36]. Therefore, IRAE provides an alternative, powerful, and effective method for extracting the volatiles in green tea.
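The correspondence between the quoted wavelength and wavenumber ranges follows from wavenumber = 1/wavelength with a unit conversion (1 cm = 10⁴ μm). A minimal sketch, with illustrative function names:

```python
def wavelength_um_to_wavenumber_cm1(wavelength_um):
    """Wavenumber (cm^-1) = 10^4 / wavelength (um), since 1 cm = 10^4 um."""
    return 1.0e4 / wavelength_um


def wavenumber_cm1_to_wavelength_um(wavenumber_cm1):
    """Inverse conversion: wavelength (um) = 10^4 / wavenumber (cm^-1)."""
    return 1.0e4 / wavenumber_cm1


# The 2.5-25 um band quoted in the text maps to 4000-400 cm^-1.
print(wavelength_um_to_wavenumber_cm1(2.5), wavelength_um_to_wavenumber_cm1(25.0))  # 4000.0 400.0
```

For example, 3600 cm⁻¹, the upper end of the O-H stretching region, corresponds to about 2.78 μm, inside the 2.5-25 μm range.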

Method precision
The repeatability of the IRAE-HS-SPME method was determined by six replicate analyses of the HZ-10 green tea sample under the optimum conditions. Ten volatile components commonly found in green tea were selected, and the relative standard deviations (RSDs) of their retention times and peak areas were calculated. As shown in S3 Table, the RSDs were below 1% for the retention times and below 8% for the peak areas of all compounds. These RSD values indicate good repeatability, showing that the developed IRAE-HS-SPME method is reliable for determining the volatile components in green tea samples.
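The RSD used above is the standard deviation divided by the mean, in percent. A short sketch with Python's standard library follows; whether the sample (n - 1) or population (n) standard deviation was used is not stated, so the n - 1 form is an assumption, and the helper names are hypothetical. The limits in `passes_precision` are the ones cited in the text.

```python
from statistics import mean, stdev


def rsd_percent(values):
    """Relative standard deviation in percent, using the sample SD (n - 1)."""
    if len(values) < 2:
        raise ValueError("need at least two replicate values")
    return 100.0 * stdev(values) / mean(values)


def passes_precision(retention_times, peak_areas, rt_limit=1.0, area_limit=8.0):
    """Check one compound's replicate data against RSD limits (%) for RT and area."""
    return rsd_percent(retention_times) < rt_limit and rsd_percent(peak_areas) < area_limit
```

Each of the ten selected compounds would be checked with its six replicate retention times and peak areas.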
Overall, IRAE-HS-SPME is a rapid, highly efficient, solvent-free method for analyzing the volatile components in green tea. It is therefore promising as an alternative to traditional techniques.

Multivariate analysis
Multivariate analysis was applied to the whole data set, a 21 × 82 matrix in which the rows represent the 21 green tea samples analyzed and the columns represent the relative contents of the 82 volatile compounds determined by IRAE-HS-SPME. In particular, two classification models were built to characterize the differences related to the geographical origins of the samples.
Partial least squares-discriminant analysis. PLS-DA is a multivariate technique used to classify groups of samples. It relates two data matrices: X (the explanatory variables, here the volatile contents) and Y (the response, here class membership). In this study, the 21 samples from the two regions were analyzed by PLS-DA. As shown in Fig 5A, the PLS-DA model clearly discriminated the Hangzhou and Ya'an green teas on the basis of their volatile compounds, with the Hangzhou samples in the positive region and the Ya'an samples in the negative region. The high explained variance (R2Y = 0.978) and cross-validated predictive ability (Q2 = 0.838) indicate that the model is reliable.
The PLS-DA weight plot was further used to identify the volatile components that best explain the differences between the two regions. These results agree closely with the comparison of compound contents. For example, volatile compounds such as (Z)-3-hexenyl hexanoate (63), n-valeric acid cis-3-hexenyl ester (50), 2,5-dimethylpyrazine (5), α-terpinene (19), (E,E)-2,4-heptadienal (15), and (Z)-3-hexenyl butanoate (43), which had higher contents in Hangzhou samples than in Ya'an samples (see Table 3), are strongly correlated with Hangzhou green teas and located on the positive side of the w*c axis (Fig 5B). Similarly, compounds located on the negative side of the w*c axis, such as cis-jasmone (65), benzyl alcohol (22), calamenene (73), indole (53), and hexadecane (78), may characterize the Ya'an samples; these compounds also had higher contents in Ya'an green teas than in Hangzhou green teas.
Subsequently, the variable importance in the projection (VIP) was used to select the metabolites that contributed significantly to discriminating the two groups in the PLS-DA model. The VIP score summarizes the importance of each variable in the model: the higher the VIP score, the more relevant the variable. Moreover, the VIP scores are normalized so that the average of their squares in a given model equals 1; a "greater than 1" criterion can therefore be used to assess the significance of the contributions of individual predictors [37]. In the present study, 31 compounds with VIP values greater than 1 were screened out (see Fig 5C), indicating that they have above-average influence on differentiating green teas from the Hangzhou and Ya'an regions.
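The VIP computation can be sketched for a single-response PLS model with NumPy, using the commonly cited formula VIP_j = sqrt(p · Σ_a SS_a (w_ja/‖w_a‖)² / Σ_a SS_a), where p is the number of variables and SS_a is the y sum of squares explained by component a. SIMCA-P may compute VIP from its own class-dummy PLS2 model, so its values need not match this sketch exactly; the NIPALS helper is repeated so the block runs on its own, and all function names are hypothetical. By construction the squared VIP scores average 1.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """NIPALS PLS1 on autoscaled X and centred y; returns weights W, scores T, y-loadings q."""
    X, y = X.copy(), y.copy()
    n, p = X.shape
    W, T, q = np.zeros((p, n_components)), np.zeros((n, n_components)), np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = X @ w                       # scores on component a
        tt = t @ t
        p_a = X.T @ t / tt              # X loadings for deflation
        q[a] = (y @ t) / tt             # y loading
        X -= np.outer(t, p_a)           # deflate X
        y -= q[a] * t                   # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """VIP for each X variable from PLS weights W, scores T and y-loadings q."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)            # y sum of squares explained per component
    w_norm2 = (W / np.linalg.norm(W, axis=0)) ** 2  # squared normalised weights
    return np.sqrt(p * (w_norm2 @ ss) / ss.sum())


def plsda_vip(X, labels, positive_class, n_components=2):
    """Autoscale X, fit a two-class PLS-DA (0/1 dummy y), and return VIP scores."""
    X = np.asarray(X, dtype=float)
    sd = X.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    Xs = (X - X.mean(axis=0)) / sd                  # unit-variance scaling
    y = (np.asarray(labels) == positive_class).astype(float)
    W, T, q = pls1_nipals(Xs, y - y.mean(), n_components)
    return vip_scores(W, T, q)
```

Applying `vip > 1.0` to the returned array gives the "greater than 1" screen described above.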
Hierarchical clustering analysis. HCA is well suited to an initial classification of tea samples based on the contents of volatile components because it does not require prior information about the samples [38]. HCA of the 21 samples was performed using Ward's method with squared Euclidean distance to visualize the differences and similarities among samples. At a distance level of 20, the samples were clustered into two groups (see Fig 6). Cluster I consists of the 15 samples from the Hangzhou district (HZ-1 to HZ-15), and cluster II consists of the six samples from the Ya'an district (YA-1 to YA-6). This separation probably reflects differences in cultivation region, soil, climatic conditions, harvesting season, processing methods, and other factors. The HCA results were consistent with those of PLS-DA and provide sufficient information to discriminate green tea samples from different regions.

Potential volatile markers
The application of a dual criterion, based on univariate and multivariate analysis, allows the identification of potential volatile markers for different regions [39]. In our model, the …

Fig 5 (partial caption): … Table 3. $M1.DA1 and $M1.DA2 represent Hangzhou and Ya'an, respectively. (C) Variable importance in the projection (VIP) scores; numbers refer to the compound numbers in Table 3.
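The dual criterion amounts to intersecting two screens. The sketch below uses the thresholds stated in the Conclusions (VIP > 1.50 and one-way ANOVA P < 0.01); the function name and the example numbers are hypothetical, and in practice the per-compound VIP and p values would come from the PLS-DA model and the ANOVA, respectively.

```python
def dual_criterion_markers(names, vip, p_values, vip_min=1.5, p_max=0.01):
    """Keep compounds that pass both screens: VIP > vip_min and ANOVA p < p_max."""
    if not (len(names) == len(vip) == len(p_values)):
        raise ValueError("names, vip and p_values must have the same length")
    return [name for name, v, p in zip(names, vip, p_values) if v > vip_min and p < p_max]


# Hypothetical values for three compounds (not results from this study).
markers = dual_criterion_markers(
    ["compound_1", "compound_2", "compound_3"],
    vip=[1.8, 1.2, 2.1],
    p_values=[0.003, 0.001, 0.04],
)
print(markers)  # ['compound_1']
```

Here compound_2 fails the VIP screen and compound_3 fails the ANOVA screen, so only compound_1 is retained.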

Conclusions
In the present work, a novel IRAE-HS-SPME method followed by GC-MS was developed for the rapid determination of the volatile components in green teas. Optimal IRAE-HS-SPME performance was obtained under the following operating conditions: CAR-PDMS fiber, sample amount of 0.5 g, extraction time of 20 min, IR power of 175 W, and IR lamp distance of 6 cm. A total of 82 volatile components were extracted and identified in 21 green tea samples by the proposed technique. Multivariate techniques, including PLS-DA and HCA, were successfully employed to provide a visual comparison and highlight the differences in the volatile profiles of green teas from different geographical origins. Furthermore, 12 potential volatile markers were identified as the most important variables for distinguishing the samples, based on the variable importance in the projection values (VIP > 1.50) and one-way ANOVA (P<0.01). The results show that combining the IRAE-HS-SPME/GC-MS technique with multivariate analysis offers a valuable tool for assessing the geographical traceability of different green tea samples.