Determination of hemicellulose, cellulose, holocellulose and lignin content using FTIR in Calycophyllum spruceanum (Benth.) K. Schum. and Guazuma crinita Lam.

Capirona (Calycophyllum spruceanum (Benth.) K. Schum.) and Bolaina (Guazuma crinita Lam.) are fast-growing Amazonian trees with increasing demand in timber industry. Therefore, it is necessary to determine the content of cellulose, hemicellulose, holocellulose and lignin in juvenile trees to accelerate forest breeding programs. The aim of this study was to identify chemical differences between apical and basal stem of Capirona and Bolaina to develop models for estimating the chemical composition using Fourier transform infrared (FTIR) spectra. FTIR-ATR spectra were obtained from 150 samples for each species that were 1.8 year-old. The results showed significant differences between the apical and basal stem for each species in terms of cellulose, hemicellulose, holocellulose and lignin content. This variability was useful to build partial least squares (PLS) models from the FTIR spectra and they were evaluated by root mean squared error of predictions (RMSEP) and ratio of performance to deviation (RPD). Lignin content was efficiently predicted in Capirona (RMSEP = 0.48, RPD > 2) and Bolaina (RMSEP = 0.81, RPD > 2). In Capirona, the predictive power of cellulose, hemicellulose and holocellulose models (0.68 < RMSEP < 2.06, 1.60 < RPD < 1.96) were high enough to predict wood chemical composition. In Bolaina, model for cellulose attained an excellent predictive power (RMSEP = 1.82, RPD = 6.14) while models for hemicellulose and holocellulose attained a good predictive power (RPD > 2.0). This study showed that FTIR-ATR together with PLS is a reliable method to determine the wood chemical composition in juvenile trees of Capirona and Bolaina.


Introduction
The Amazonian Forest plays an important role in human being and environmental welfare; however, the growing deforestation could reach a summit point thus this forest will turn to non-forest ecosystem [1,2]. Just in the last 10  growing population [2] have contributed to the disappearance of 26 million Ha of Amazonian Forest [3], an area close to New Zealand surface. Thus, Amazonian deforestation has increased the intensity of global warming, loss of biodiversity and affects indigenous communities adversely (migration, prostitution, etc) [2]. Deforestation also increases risk of human contact with zoonotic pathogens [4]. In this context, tree breeding is a competitive and environmentally friendly opportunity to recover and increase woody biomass [5]. Tree breading programs increase wood quality and forestry production; such is the case of Scots pine, Norway spruce, Stika spruce, Eucalyptus pellita and Acacia mangium [5,6]. In the amazon forest there is interest in tree breading programs of Capirona and Bolaina [7]. These fast-growing trees deliver wood products at an early stage [8] and can be coppiced for successive harvests [9]. Both species have a high wood market demand and are widely used for reforestation [10][11][12]. Furthermore, the production of both species is a sustainable activity for small farmers across Amazon and provides environmental services like carbon capture, conservation of biodiversity and soil fertility [8].
Selection of superior trees (plus-tree) is normally based on visual properties rather than quality parameters such as cellulose, hemicellulose and lignin content [5,6]. The chemical composition is a key feature that determines wood quality and the end-use of wood in early stage [13,14]. For instance, lignin is particularly appreciated in timber industry but not in pulp production, while cellulose is quite valued in pulp industry [13]. The information of chemical composition can help breeders in the selection process of superior trees [6]. However, determination of chemical composition using conventional methods is expensive (USD 1000 per tree), time consuming, destructive and requires chemical reagents [6,15].
On the contrary to conventional techniques, FTIR-ATR spectroscopy is a practical and economical analytic method to determine wood chemical composition [15]. This method has been successfully used to identify cellulose, hemicellulose, lignin, monosaccharide, extractive compounds and proteins in forest species [15][16][17][18]. The complex data generated by FTIR-ATR (vibration of chemical bonds and functional groups) is processed by multivariate analysis [19] such as partial least squared (PLS). This multivariate method is one of the most used ones and can predict sample properties by a number of latent variables which compress spectral information [20]. Despite the successful application and numerous advantages of these methods, they are barely applied in native Amazonian trees.
To the best of our knowledge, there is no FTIR-ATR spectroscopy coupled to PLS in fast growing trees in juvenile stages. Therefore, the aim of this work was to develop efficient models to predict cellulose, hemicellulose, holocellulose and lignin content in juvenile trees of Capirona and Bolaina.

Plant material
100 Capirona trees and 150 Bolaina trees acquired from Ayahuasca Ecolodge E.I.R.L. and Est. Exp. Agraria Pucallpa-Ucayali-INIA, were kept in a net house in La Molina (12˚05' S, 76˚57' W and 243.7 masl). When trees reached 1.8 year-old, the samples were collected from apical and basal stem to expand range of wood chemical variability. We grouped 4 and 6 plants as a sample for Capirona and Bolaina, respectively, to reach enough dry matter (5 g) for chemical analysis. In both species, the third and fourth internode from the upper stem were taken as apical samples while the third and fourth internodes from the lower stem as basal samples.

Wood chemical analysis
The harvested stems were cut into small pieces, dried at 60ºC and milled to quantify cellulose, hemicellulose, and lignin content according to Van Soest and Robertson [21]. This method determines neutral detergent fiber (NDF), acid detergent fiber (ADF) and acid detergent lignin (ADL) after digesting samples with chemical reagents. NDF is first obtained after digestion in neutral detergent solution, and it mainly consists of cellulose, hemicellulose and lignin. After digesting NDF with acid detergent, ADF is obtained and contains cellulose and lignin predominantly. To obtain ADL, cellulose was removed from ADF with H 2 SO 4 . Cellulose content is calculated from the difference between ADF and ADL, hemicellulose from the difference between NDF and ADF. Lignin content is expressed as ADL and holocellulose was determined by addition of cellulose with hemicellulose [22]. We analysed 25 samples per apical and basal stem and all experiments were run twice for each sample. Analysis of variance (ANOVA) followed by Tukey comparison test (α = 0.05) between the chemical composition of apical and basal stem was performed using R studio.

FTIR-ATR spectroscopy
Spectra from dried, milled and sieved (60 Mesh) samples of both species were collected. All samples were kept in sealed containers at air-dry moisture content [27]. Triplicated FTIR spectra of each sample were acquired by attenuated total reflection (ATR) in Spectrum 100 Perkin Elmer spectrometer equipped with a diamond ATR accessory, a Tantalato de Litio detector and a KBr beam splitter. Thirty-two scans were run per sample in the spectra range between 4000 cm -1 and 400 cm -1 at 4 cm -1 spectral resolution. Before each measurement, background spectra with the same settings were collected.

Chemometrics
Data acquisition and pre-processing. Spectral data were acquired through Spectrum 10 software in SP format. Prior to pre-procesing, spectral data were mean-centered. To remove the effect of interference factors (light scattering, noise, and pathlength variations) from spectral data, the following pre-processing methods (Savitzky-Golay first derivative, Savitzky-Golay second derivative, Smoothing, standard normal variation, multiplicative scatter correction, straight line correction and min-max normalization) were evaluated. Pre-processing was applied in the full spectra region (3700-850 cm -1 ) and the fingerprint region (1800-850 cm -1 ).
Partial Least Squares (PLS) analysis. PLS regression was used to build models for predicting cellulose, hemicellulose, holocellulose and lignin content from FTIR spectra. A cross validation was performed in the Peak spectroscopy software (https://www.essentialftir.com/) to select the best pre-treatment for each compound in the full spectra and the fingerprint region. Once identified the best pre-treatment for each compound, the data were split into calibration set (100 spectra) and validation set (50 spectra). Models were built using calibration set and evaluated using validation set.
Determination of number of latent variables. Two methods were performed to determine the number of latent variables: Haaland and Thomas [23] and Gowen et al. [24]. This last method uses the appearance of noise in regression coefficients as a sign of over-fitting. The noise and sharp peaks in regression coefficients were quantified (Jaggedness) and paired to RMSE to estimate the optimal numbers of latent variables [24].
Model evaluation. Performance of models were evaluated using root mean squared error of calibration (RMSEC) and predictions (RMSEP), coefficient of determination of calibration (R 2 c ) and prediction (R 2 p ), and ratio of performance to deviation (RPD). To assess the predictive ability of models (RPD), Santos et al. [25] and Li et al. [26] formula were used. According to Karlinasari et al [27], models with RPD values higher than 1.5 are satisfactory and useful for initial screenings and preliminary predictions, RPD values between 2.0 and 2.5 have very good prediction and RPD values higher than 3 are efficient prediction. In wood samples an RPD of 1.5-2.5 is enough to predict wood chemical composition [28]. Finally, the best model performance was selected based on R 2 c and R 2 p values close to 100%, RMSEC and RMSEP values close to zero and a high RPD value [17].

Analysis of chemical composition
The chemical composition of the apical and basal stem for both species (S1 Table) was determined by Van Soest and Robertson [21] method and it is summarized in Table 1. In both species, the percentage of cellulose and holocellulose was significantly higher (p < 0.001) in the basal stem than apical stem. In contrast, a significant decrease in hemicellulose was observed in the basal stem with respect to apical stem in Capirona (p <0.05) and Bolaina (p <0.001). Like carbohydrates, lignin content increased in Capirona basal stem while in Bolaina the lignin content decreased in the apical stem. Therefore, the higher content of carbohydrates and lignin in the base of trees may be related to denser wood as an adaptative response to alleviate any bending stress related to wind and other forces [9,29]. The mechanical benefit of denser wood is the increasing strength in supporting tissues at the base of tree stalk where cellulose provides strength and lignin, compression and shear resistance [30].

FTIR spectra
In the FTIR spectra of Capirona and Bolaina wood, two regions of analysis were distinguished (3800-2800 cm-1 and 1800-850 cm-1), see Fig 1. The peak numbers and their assignment to chemical compounds are presented in Table 2. All samples show peaks 1, 2 and 3, which were assigned to O-H stretching, symmetric and asymmetric C-H stretching, respectively (Fig 1 and Table 2) and they are presented in all components of wood [31]. In the fingerprint (1800-850 cm-1), there were many peaks assigned to cellulose (peaks 11, 13 and 16), hemicellulose (peak 4), lignin (peaks 6 and 7), holocellulose (peaks 10, 12, 14 and 15) and common peaks to carbohydrates and lignin (peaks 8 and 9). Peak 4 corresponds to vibration bonds of C = O in hemicellulose and extractive compounds such as esterified resin acids, waxes and fats [17,33] and its absorbance was higher in the basal stem than apical stem in both the species, which contradicts the chemical analyses (Table 1). These differences could be explained because of the presence of extractive compounds which can be masking hemicellulose vibration [37]. In relation to lignin, peaks 6 and 7 indicate the vibration of the aromatic skeletons [17,33] and there were more intense in the basal stem than apical stem in both the species. However, peak 6 was not well defined in Capirona due to the presence of a prominent peak 5, which is attributed to calcium oxalate and flavones [33]. While in Bolaina, peak 5 is poorly defined in apical stem, which could be a signal of a little presence of extractives [34] and it is close to peak 6. Peaks 5, 6 and 7 are in the region from 1750 to 1510 cm -1 , which was used by Funda et al. [18] to report chemical differences between juvenile and mature wood in Scots pine because of vibrations in lignin and some carbohydrates. Similarly, Ozgenc et al. [38] were able to determine lignin content in alder and cedar based on skeleton of aromatics rings (1508-1510 cm -1 and 1603-1609 cm -1 ). Moreover, peaks 8, 9, 10 and 11 were assigned to stretching and bending bonds of carbohydrates and lignin [16-18, 32, 34, 35]; the absorbance of these peaks in Capirona was higher at basal than apical stem and it was in agreement with chemical analysis (Table 1). For Bolaina, the absorbance of peaks 9, 10 and 11 (Fig 1B) was higher in the apical section versus basal stem but peak 8 was not distinguished. The spectral region (1510-1265 cm -1 ) for peaks 8, 9, 10 and 11 was also used in Pinus koraiensis to detect high lignin and carbohydrates concentrations in artificial inclined stem [39]. In the carbohydrate region (1200 to 890 cm -1 ), peaks 12, 13, 14, 15 and 16 (Fig 1A and 1B) evidenced a higher concentration of carbohydrates in the basal stem than in the apical stem in both the species. Furthermore, within the fingerprint region, Acquah et al. [36] were able to report polysaccharides by the presence of peaks: 897 cm-1, 1030 cm -1 (C-H deformation in cellulose and C-O stretch in polysaccharides), 1157 cm -1 (C-O-C vibrations) and 1239 cm -1 (C-O stretch and O-H stretch in plane in polysaccharides) in forest logging residue. Thus, the FTIR spectra evidenced the compositional difference in different forest samples.

PLS modeling
PLS regression was applied to develop predictive models that correlate FTIR spectra with chemical wood composition. First, cross validation was performed to select the best pre-treatment for each compound in the full spectra and fingerprint region (S2 Table). The number of latent variables (LV) were determined by Haaland & Thomas [23] method, however higherorder latent variables were observed (S2 Table). Therefore, Gowen et al. [24,40] method was used to prevent overfitting. Then, external validation was performed using the best predictive model obtained by cross validation (Table 3). RMSEC, RMSEP, R 2 c , R 2 p and RPD were used to evaluate the model performance. The lignin models of Capirona and Bolaina (Table 3) were highly accurate because of low RMSEP values (0.48 and 0.81, respectively), with a good prediction power (RPD > 2.0) and a good data fitting (R 2 c of 90.2% and 82.6%). Our results are in agreement with Acquah et al. [36], Zhou et al. [17], Zhou et al. [37], who reported values of R 2 � 90-80% and RMSEP � 0.47-0.80. Despite these authors reported models with lower RPD (1.4 and 1.72) values than our models´(see Table 3), the author models were considered to have a good predictive power.
In Capirona, the predictive power of cellulose (Fig 2A), hemicellulose ( Fig 2B) and holocellulose ( Fig 2C) models were satisfactory to quantify these compounds. The RPD values of these three compounds were within 1.5 to 2.0 range (Table 3), they are suitable for initial screenings and preliminary predictions [26,27]. However, these models would be proper for estimating wood properties according to Hein et al. [28]. Moreover, the R 2 c values for cellulose and holocellulose content were 85.1% and 83.0%with a RMSEP of 2.06 and 2.07, respectively. Meanwhile, the R 2 c value for hemicellulose was 77.3% with a RMSEP of 0.68. A good hemicellulose predictive model with similar values to our model was reported by Funda et al. [18] in juvenile wood of Scots pine. Furthermore, our cellulose model is consistent with Zhou et al. [17], who reported R 2 and RPD values of 85.62% and 1.72 in hardwoods so this model was strong enough for screening and cellulose characterization. Finally, our holocellulose model was better than the one reported by Acquah et al. [36] in forest logging residue. On the other hand, Funda et al. [18], Zhou et al. [17] and Acquah et al. [36] reported more efficient lignin models based on a high RPD and a low RMSEP than carbohydrate models. We found this trend in Capirona models too. A possible explanation to this trend may be the presence of unique chemical structures in lignin unlike carbohydrates [36].
In Bolaina, the RMSEP values of cellulose, hemicellulose and holocellulose (1.82, 2.12 and 3.16, respectively) were higher compared to that of the other authors. For instance, Funda et al. [18], who reported RMSEP values from 0.69 to 0.71 in cellulose and 0.71 in hemicellulose, while Zhou et al. [17] found RMSEP values from 0.8 to 1.9 in cellulose and hemicellulose. Since R 2 value depends on the distribution of the data and not only on the model performance [41], it is possible to use it as a secondary measure of model performance [18]. Thus, the performance models should achieve RMSEC and RMSEP values close to zero, a high RPD with R 2 c and R 2 p values close to 100%. In Table 3, models showed a high predictive power (RPD > 2) with RMSEP slightly high, so R2 was considered to evaluated model performance. Thus hemicellulose ( Fig 3B) and holocellulose (Fig 3C) models had a high data correlation (R 2 c of 85.1 and 89.3, respectively) and RPD > 2.0, so they were considered as very good prediction models [26,27]. Cellulose model (Fig 3A) showed exceptionally high values of R 2 c ¼ 97:3 and RPD = 6.14, so it was classified as an excellent predictive model [26,27]. In contrast, the cellulose, hemicellulose and holocellulose models reported by Acquah et al. [36] in forest logging residue and Zhou et al. [17] in hardwoods were considered satisfactory.
For Capirona and Bolaina, the accuracy of the models (RMSEP) was higher in lignin than the carbohydrate models (Table 2). According to Hein et al. [25], an RPD > 1.5 allows to quantify the chemical composition of the wood. However, the RPD scale established by Karlinasari et al. [27] allows a better classification of the models. Finally for each model, the values of R 2 and RMSE were close between the set of calibration and validation, therefore models have good predictive capacity according to Li et al. [26].

Conclusions
A rapid method for wood composition determination in an integrated production system based on Capirona (international market) and Bolaina (national market) is required for genetic improvement and hence improve a forest-based bioeconomy in Amazon. This study confirmed significant differences in cellulose, hemicellulose, holocellulose and lignin content between apical and basal stem in Capirona and Bolaina. The variability found was used to build PLS models to estimate cellulose, hemicellulose, holocellulose and lignin content in juvenile wood from FTIR-ATR spectra. Thus, lignin models of both species achieved high accuracy (low RMSEP) and were considered as very good prediction models (RPD > 2). In Bolaina, models of hemicellulose and holocellulose showed a very good prediction power (RPD > 2.0), meanwhile cellulose model was excellent (RPD > 3.0). In Capirona, cellulose, hemicellulose and holocellulose models attained sufficient predictive power (RPD > 1.5) to estimate wood composition. Thus FTIR-ATR spectroscopy combined with PLS models offers a technologic tool that can be applied to early tree selection based on chemical wood composition.
Supporting information S1   Table. Leave-one-out cross validation method and number of latent variables determination. (XLSX)