Linear Mixed-Effects Models to Describe Individual Tree Crown Width for China-Fir in Fujian Province, Southeast China

A multiple linear model was developed for individual tree crown width of Cunninghamia lanceolata (Lamb.) Hook in Fujian province, southeast China. Data were obtained from 55 sample plots of pure China-fir plantation stands. An Ordinary Linear Least Squares (OLS) regression was used to establish the crown width model. To adjust for correlations between observations from the same sample plots, we developed one level linear mixed-effects (LME) models based on the multiple linear model, which take into account the random effects of plots. The best random effects combinations for the LME models were determined by the Akaike’s information criterion, the Bayesian information criterion and the -2logarithm likelihood. Heteroscedasticity was reduced by three residual variance functions: the power function, the exponential function and the constant plus power function. The spatial correlation was modeled by three correlation structures: the first-order autoregressive structure [AR(1)], a combination of first-order autoregressive and moving average structures [ARMA(1,1)], and the compound symmetry structure (CS). Then, the LME model was compared to the multiple linear model using the absolute mean residual (AMR), the root mean square error (RMSE), and the adjusted coefficient of determination (adj-R 2). For individual tree crown width models, the one level LME model showed the best performance. An independent dataset was used to test the performance of the models and to demonstrate the advantage of calibrating LME models.


Introduction
China-fir (Cunninghamia lanceolata (Lamb.) Hook) is the most commonly grown afforestation species in southeast China because of its fast growth and good wood qualities. It is widely used for buildings, furniture, bridge construction and many other purposes. According to the National Continuous Forest Inventory, approximately 11.26 million hectares and 734.09 million cubic meters of China-fir were distributed over 10 provinces in China in 2010. approximately 1699 mm, the annual mean frost-free season is 287 days, and the annual mean temperature is 18.7°C.
Data from four thousand one hundred ninety-nine trees were obtained from 55 single-species plots of plantation-grown China-fir on the Jiangle state-owned forest farm in Fujian Province, southeast China (Fig 1). The Jiangle state-owned forest farm issued permission for each location, and the field studies did not involve endangered or protected species. The sample plots were square and varied in size from 400 to 600 m 2 . All standing live trees (height > 1.3 m) on the plots were measured for DBH (outside bark), tree height, height to crown base (height above ground to crown base) and crown width. Three to five dominant trees on each plot were chosen to calculate plot dominant height and diameter. Crown width was taken as the arithmetic mean of two crown widths, obtained from measurements of four crown radii in four directions (from the east, west, south and north to the center of tree, respectively) representing two perpendicular azimuths [8]. The crown width data were randomly divided into two groups; 75% of the points were used for model fitting, and 25% were used for model validation, which can be claimed as independent. The fitting and validation data consisted of 2587 trees from 39 plots and 1613 trees from 16 plots, respectively. Summary statistics for both the fitting and validation data are shown in Table 1

Methods
The covariate selection. DBH is an important tree characteristic and the variable that has the greatest correlation with crown width [27]. In addition to DBH, CW is explained by other tree and stand attributes, [9; 28] such as a reduction in growth from increases in stand density, SD (tree ha -1 ) and basal area, BA (m 2 Áha -1 ) [22; 29]. In addition, CW is also influence by tree size variables, such as sample tree height (H) and height to crown base (HCB), and stand variables, such as stand age (A), plot dominant height, DH (m), plot dominant diameter at breast height, DD (cm), plot quadratic mean diameter, QMD (cm) [28; 30; 31], plot mean height, MH (m), and site index, SI (m at 20 yr).
Crown width multiple linear model. Independent variables were identified and a backward stepwise linear regression routine that started with all candidate variables, tested the deletion of each variable using a chosen model comparison criterion, deleted the variable (if any) whose removal improved the model the most, and repeated this process until no further improvement was possible, was applied to reduce the number of chosen variables to avoid overfitting. Variance inflation factors (VIF<10), which provide an index that measures how much the variance of an estimated regression coefficient is increased because of collinearity, were also computed to reduce the number of chosen variables to avoid multicollinearity, which could result in numerically unstable estimates of the regression coefficients. Stepwise regression fits an observed dependent dataset using a linear combination of independent variables. The statistical methods were implemented in R, which is a free software environment for statistical computing and graphics [32]. The dependent variable is determined from a linear equation combining the values of the independent dataset with coefficients established by the regression. The statistical results were assessed in terms of the absolute mean residual (AMR), root mean square error (RMSE), and the adjusted coefficient of determination (adj-R 2 ), which accounts for the number of predictors. The calculation formulas of these statistics are listed as follows: jy ij Àŷ ij j n i 1 where M is the number of plots, n i is the number of observations in plot i, r is the number of parameters in the model, y ij is the crown width of the jth tree taken from the ith plot, ŷ ij is the crown width prediction, and y is the average of observations. The accuracy of the models was tested against the fitting data and against independent validation data from the same plot [23]. LME model method. Available data were from measurements of trees located in sample plots. Because of this nested structure, there is high correlation among observations taken from the same plot. To alleviate this issue, a linear mixed-effects model approach has been proposed by other authors [10; 33]. For a single level of grouping, a general expression for a LME model can be defined as [17; 20; 34]: where CW ij is the crown width of the jth tree taken from the ith plot, β is the p-dimensional vector of fixed effects (where p is the number of fixed-effects parameters in the model), b ij is the q-dimensional vector of random effects associated with plot i that is assumed to follow a normal distribution with mean zero and a variance-covariance matrix D (where q is the number of random-effects parameters in the model), X ij (of size n i ×p) and Z i (of size n i ×q) are known fixed-effects and random-effects regressor matrices, and ij is the n i -dimensional within-group error vector with a spherical Gaussian distribution [35], which is assumed to be normally distributed with zero expectation and a positive-definite variance-covariance structure R ij , generally is a n i × 1 vector for the residual items [e i1 , e i2 , e i3 ,. . ., e ij ,. . ., e in i ] T [36]. Both the random-effects b ij and the within-group errors ij are assumed to be independent for different groups and to be independent of each other for the same group. Ascertainment of mixed parameters. To fit the mixed-effects models, the key question is which parameters in the model should be considered as random effects and which ones could be treated as purely fixed effects. Generally, an alternative model-building approach is to start with a model with random effects for all parameters and then examine the fitted object to decide which, if any, of the random effects can be eliminated from the model [18]. Therefore, different combinations of model parameters were tested to ascertain their importance with respect to crown width, and the best model was selected by Akaike's information criterion (AIC) [37], Bayesian information criterion (BIC) [38] and -2 logarithm likelihood (-2 LL) [31]. The less criteria a model has, the better it performs. An appropriate variance function structure for LME models were determined by a likelihood ratio test (LRT) [18; 39]. All LME models presented in this paper were fitted using the LME function in the R statistical software environment.
Determining the structure of R. This special matrix structure R (which is allowed to depend on both random and fixed effects, as well as on a set of common but unknown parameters) can include both correlation effects and weighting factors to account for within-group heteroscedasticity and spatial correlation [35; 36; 40]. A general expression for the matrix is given by [40; 41]: where (in this case) for tree j in plot i, with n i increment, R is the n i ×n i intraindividual variancecovariance matrix which defines within-group variability, G is a n i ×n i diagonal matrix of the within-group error variance structure (heteroscedasticity), I is a n i ×n i matrix showing the within-group autocorrelation structure of error, and σ 2 is a scaling factor for the error dispersion [10]. To remove variance heterogeneity, we used the power function, exponential function and constant plus power function as the variance functions to fit crown width models [18].
Correlation structures were used to address the within-tree spatial correlations observed in the data [42; 43]. A method was selected from among three commonly used approaches: the first-order autoregressive structure [AR(1)], a combination of first-order autoregressive and moving average structures [ARMA(1,1)], and the compound symmetry structure (CS) [18]. where ρ is the autoregressive parameter, γ is a moving average component, and σ 1 is the residual covariance [44; 45]. Parameter estimation. The parameters in the equations were estimated by maximum likelihood (ML) using the Lindstrom and Bates (LB) algorithm implemented in the R LME function [17; 18]. The LB algorithm and LME function are detailed in several articles; see, for example, [17; 18].
A key question in fitting the LME models is to estimate the random effects parameters. In this study, they can be calculated with the information from measured trees, such as the measurements of CW and DBH, by the Empirical Best Linear Unbiased Predictors (EBLUPs) [34].
whereD is the estimated variance-covariance matrix for the random-effectsb ijk ,R is the estimated variance-covariance matrix for the error term, andẐ is the estimated partial derivatives matrix with respect to random effects parameters.

Selection of the basic crown width model
The following formula is the composition of individual tree size variables and stand variables for predicting crown width using OLS: where β 0 -β 11 are the formal parameters.
To avoid overfitting and multicollinearity between independent variables, the backward stepwise linear regression routine and the variance inflation factor were used to reduce the number of chosen variables. In addition, we took into account the biologically reasonable and the factors that exhibited significance (Pr value<0.05) between independent variables. The variable selection process involves a series of steps beginning with the stepwise regression method together with VIF control to identify those variables that may be useful in the model. DH, A and MH were removed from Eq 13 because their VIF>10 (VIF DH = 27.48, VIF A = 13.61, VIF MH = 17.72). As a result, the final diameter growth model for fir plantations can be expressed as: The statistics used for the selection of the basic model are shown with equations in Table 2.

Construction of LME models
There would be ninety different combinations of no more than four random-effects parameters for Eq 14 while simultaneously considering plots effects. The LME models with more than four random-effects parameters could not reach convergence.
LRT, AIC, BIC and -2 LL statistics were compared between the LME models with the best different combinations of random-effects parameters and are shown in Table 3. The model of Eq 14, incorporating plots effects on β 0 , β 1 , β 2 and β 3 (Eq 14.4), yielded the smallest AIC, BIC and -2 LL and had significant differences when compared to the other LME models (Eqs 14.1-14.3) (Pr<0.0001).

LME model with heteroscedasticity and spatial correlation
We used the power function, the exponential function or the constant plus power function as the variance functions and AR, ARMA(1,1) or CS as the correlation structure to update Eq 14.4 to reduce heteroscedasticity and spatial correlation. The LME models with variance functions and correlation structures are shown in Table 4. In this study, Equation 14.4.1 is the same as Eq 14. The best models were chosen with the smallest AIC, BIC and -2 LL. Thus, the

Model prediction
The predictive ability of Eq 14 was evaluated using prediction procedures and Eq 1-3 on both fitting and validation data. The performance of the LME models, with and without modeling the error structure, was evaluated using cross-validation procedures for both fitting and validation data; the random effects were predicted with the EBLUPs (Eq 12), using the measurement data. Table 5 lists the three prediction statistics of Eq 14, Eq 15 and Eq 15 without random effects for both fitting and validation data. Compared with Eq 14, Eq 15 had a higher adj-R 2 , 0.7226 compared to 0.4733, and lower RMSE, 0.4854 compared to 0.6688, and AMR, 0.3688 compared 0.4954, for the validation data. In Fig 3,  Based on the above analysis, we can conclude that Eq 15, incorporating the random effects plots, was better than Eq 14. The LME model provides a model for predicting the expected values of crown width for individual trees of China-fir in the single-species plantations of the study area.

Discussion
In this study, a backward stepwise linear regression was used to establish a multiple linear individual tree crown width model for China-fir. The relative importance of explanatory variables used to predict the crown width were assessed. Generally, DBH is the tree size variable most related to crown width [27]. In addition to DBH, the tree size variables (such as H and HCB in this study) and stand variables (such as DH in this study) are also obvious factors affecting the crown width [20; 22; 46; 47]. Both stand and tree development are linked to the DH because it is a measureable stand characteristic that indicates site quality in terms of the stand growth and yield capacity [20]. The variables H and HCB showed a significant effect on the crown width because they are closely related to tree size and have an important role in crown fire initiation and spread [47; 48]. Therefore, we selected the diameter at breast height, tree height, height to crown base, plot dominant height, plot dominant diameter at breast height, stand age, site index, stand density, plot quadratic mean diameter, plot mean height and basal area as the independent variables to establish an individual tree crown width model. However, variance inflation factors were used to avoid potential overfitting and multicollinearity.

Conclusions
Eleven variables were selected in this study to describe crown width (Eq 13) of China-fir in pure plantation stands in Fujian province, southeast China. Then, the backward stepwise linear regression routine and the variance inflation factor were used to reduce the number of chosen variables (Eq 14). The one-level (plot) LME model using the variance function structure and correlation structure approach were used to estimate the relationship of the chosen variables with crown width for individual trees. The results showed that the one-level LME models with mixed effects, considering variance function structure and correlation structure (Eq 15), provided better model fitting and more precise estimations than the LME models without mixed effects (Eq 14) ( Table 5 and Figs 3 and 4). Therefore, we recommend using a linear mixed effects modeling approach to build an individual tree crown width model.