Hyperspectral Imaging for Mapping of Total Nitrogen Spatial Distribution in Pepper Plant

Visible/near-infrared (Vis/NIR) hyperspectral imaging was employed to determine the spatial distribution of total nitrogen in pepper plants. Hyperspectral images of samples (leaves, stems, and roots of pepper plants) were acquired, and their total nitrogen contents (TNCs) were measured using the Dumas combustion method. Mean spectra of all samples were extracted from regions of interest (ROIs) in the hyperspectral images. The Random frog (RF) algorithm was used to select important wavelengths that carried effective information for predicting the TNCs in leaf, stem, root, and whole-plant (leaf-stem-root) samples. Based on the full spectra and the selected important wavelengths, quantitative relationships between the spectral data and the corresponding TNCs in organs (leaf, stem, and root) and in the whole plant (leaf-stem-root) were developed separately using partial least-squares regression (PLSR). The PLSR model built on the important wavelengths for predicting TNCs in the whole plant gave a promising result, with a correlation coefficient for prediction (RP) of 0.876 and a root mean square error for prediction (RMSEP) of 0.426%. Finally, the TNC of each pixel within the ROI of each sample was estimated to generate the spatial distribution map of TNC in the pepper plant. The results indicate that hyperspectral imaging is a promising tool for determining the spatial distribution of nitrogen content in pepper plants.

Introduction
variables [32], determination of nitrogen (N), phosphorus (P), and potassium (K) content and visualization of chemical distribution in oilseed rape leaves [33], estimation of leaf nitrogen accumulation in wheat [34], diagnosis of N deficiency by means of chlorophyll content in cucumber leaves [19], assessment of leaf N content in wheat [35], and estimation of canopy N concentration in temperate forest [36]. Most of these studies investigated the chemical components of a single organ (e.g., leaves) of the crop. However, few studies have used hyperspectral imaging to map the spatial distribution of TNCs in plants.
In the current study, hyperspectral imaging technique was employed to map the spatial distribution of total nitrogen in pepper plant. The specific objectives were as follows: (1) to acquire hyperspectral images and measure TNCs of samples (leaves, stems, and roots) using Dumas combustion method; (2) to extract the spectral data and employ Random frog (RF) to select important wavelengths; (3) to build multivariate calibration models for predicting TNCs in organs (leaf, stem, and root) and whole-plant (leaf-stem-root) by partial least-squares regression (PLSR) based on full spectra and the selected important wavelengths; (4) to apply the optimal PLSR model to predict TNC of each pixel in samples and generate spatial distribution of TNCs in whole pepper plant.

Sample preparation
A total of 40 pepper plants (Capsicum frutescens L. conoides (Mill.) Bailey) were used in this research. Following the crop management guidelines of the National Farmers Information Service (NAFIS) [37], the pepper plants were grown under the same environmental conditions and field management in greenhouses at Zhejiang University, Hangzhou (120°09′E, 30°14′N), China.
About 6 weeks after transplanting, all pepper plants began to form flower buds, marking the early blooming stage. According to the growth status of the plants, several leaves (a few yellow leaves in the lower position and some small fresh leaves in the upper position) were excluded, and the remaining 11 true leaves of similar size were numbered from top to bottom (Fig. 1). In each plant, 2 leaves (with petioles removed) were randomly sampled from each of the upper (1st-4th), middle (5th-8th), and lower (9th-11th) positions [38]. The 2 leaves from each position were treated as one sample for subsequent data analysis, because the dry matter of a single leaf was insufficient for total nitrogen measurement. Meanwhile, the whole stem was divided into 3 parts (upper, middle, and lower) according to the positions of the petiole nodes. Lastly, the root of each plant was dug out of the soil and cleaned of impurities. The positions of the sampled leaves, stems, and roots are marked in Fig. 1. In this way, a total of 280 samples (120 leaf samples, 120 stem samples, and 40 root samples) were collected from the 40 pepper plants. These samples were then used for hyperspectral data acquisition and TNC measurement.

Acquisition and calibration of hyperspectral images
Three-dimensional hyperspectral data of all samples were acquired in reflectance mode with a laboratory hyperspectral imaging device. The device consists of: a mobile platform driven by a stepper motor (IRCP0076, Isuzu Optics Corp, Taiwan, China); an illumination unit with two 150-W quartz tungsten halogen lamps (Fiber-Lite DC950 Illuminator, Dolan Jenner Industries Inc., USA); an imaging spectrograph (ImSpector V10E, Spectral Imaging Ltd., Finland) covering the spectral range of 380-1,030 nm with 512 wavebands; a CCD camera (C8484-05, Hamamatsu City, Japan) coupled with a zoom lens (OLES23, Specim, Spectral Imaging Ltd., Oulu, Finland); and a computer running the spectral imaging system V10E software (Isuzu Optics Corp, Taiwan, China), which is used to set and adjust the device parameters, including exposure time, motor speed, image acquisition, wavelength range, and image correction. The whole system (except the computer) was assembled in a dark chamber to minimize the effects of ambient light during sample scanning. The device was calibrated before the samples were scanned, as described by Yu et al. [39]. The image acquisition parameters are summarized in Table 1. A hyperspectral image, called a "hypercube", with dimensions (x, y, λ) was then generated by line scanning [39]. Each hypercube contained n pixels in the y-direction (depending on the size of the sample), 672 pixels in the x-direction, and 512 wavebands in the λ-direction. The raw hyperspectral images were then calibrated with white and dark references using Eq. (1); this step was carried out in the hyperspectral imaging analyzer software.
$$I_{\mathrm{calibrated}} = \frac{I_{\mathrm{raw}} - I_{\mathrm{dark}}}{I_{\mathrm{white}} - I_{\mathrm{dark}}} \qquad (1)$$
where $I_{\mathrm{calibrated}}$ is the calibrated hyperspectral image of the sample; $I_{\mathrm{raw}}$ is the raw hyperspectral image of the sample; $I_{\mathrm{white}}$ is the white reference image with 99% reflectance, acquired from a white reference ceramic tile; and $I_{\mathrm{dark}}$ is the dark reference image with 0% reflectance, obtained with the camera lens completely covered by its opaque cap.
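The white/dark reference correction described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions: it uses the common flat-field form (raw minus dark, divided by white minus dark), the function name `calibrate_hypercube` and the array layout (scan lines, pixels, bands) are hypothetical, and the numbers are synthetic. The study itself performed this step in the instrument's analyzer software.

```python
import numpy as np

def calibrate_hypercube(raw, white, dark, eps=1e-9):
    """Convert raw intensities to relative reflectance with white/dark references.

    raw:   array of shape (lines, pixels, bands)
    white: white-reference array broadcastable to raw, e.g. (pixels, bands)
    dark:  dark-reference array with the same shape as white
    """
    raw = np.asarray(raw, dtype=float)
    white = np.asarray(white, dtype=float)
    dark = np.asarray(dark, dtype=float)
    denom = white - dark
    # Guard against division by zero where the two references coincide.
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (raw - dark) / denom

# Synthetic example (assumed values): 2 scan lines, 3 pixels, 4 bands.
dark = np.full((3, 4), 100.0)
white = np.full((3, 4), 4100.0)
raw = np.full((2, 3, 4), 2100.0)
reflectance = calibrate_hypercube(raw, white, dark)
```

Because the references are broadcast over the scan-line axis, one white frame and one dark frame can correct every line of the hypercube.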

Measurement of TNCs using Dumas combustion method
The reference method for TNC measurement was Dumas combustion using a Rapid N cube nitrogen analyzer (Elementar Analysensysteme GmbH, Hanau, Germany). After hyperspectral image acquisition, all samples were dried to constant weight [40]. The samples were then ground with a Tissuelyser-48 (Jingxin Experimental Technology, Shanghai, China) operating at 70 Hz for 150 s. For each sample, 50 ± 1 mg of dry matter was used to determine the TNC. After a series of processing steps (combustion, reduction, purification, detection, etc.), the TNCs (%) of the samples were obtained through the Rapid N software. All measurements were carried out in a room at an approximately constant temperature of 25°C and a relative humidity of 20-30%.

Hyperspectral data processing and analyzing

Hyperspectral data extraction
Using the mask function in the ENVI software, the target region of each sample was separated from the hyperspectral image and defined as the region of interest (ROI) of that sample. The spectra of all pixels in the ROI were then averaged, and the average spectrum was taken as the spectrum of the sample. The spectra of all other samples were obtained in the same way. Because the reflectance in two regions (380-420 nm and 1,000-1,030 nm) was rather low and the signal-to-noise ratio (SNR) there was poor, these regions were discarded and the hyperspectral data were trimmed to the spectral range of 420-1,000 nm with 460 wavebands.
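As a rough illustration of the extraction step, the sketch below averages the spectra of all ROI pixels and then trims the noisy ends of the spectral range. The helper names (`mean_roi_spectrum`, `trim_bands`), the rectangular mask, and the evenly spaced wavelength axis are assumptions made for the example; the real band centers come from the instrument, so the number of retained bands here need not equal the 460 reported in the text.

```python
import numpy as np

def mean_roi_spectrum(cube, mask):
    """Average the spectra of all pixels flagged True in a 2-D ROI mask."""
    cube = np.asarray(cube, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        raise ValueError("ROI mask selects no pixels")
    # cube[mask] has shape (n_roi_pixels, bands); average over the pixels.
    return cube[mask].mean(axis=0)

def trim_bands(spectrum, wavelengths, lo=420.0, hi=1000.0):
    """Keep only the bands whose wavelength lies inside [lo, hi] nm."""
    keep = (wavelengths >= lo) & (wavelengths <= hi)
    return spectrum[..., keep], wavelengths[keep]

# Synthetic hypercube: 5 x 6 pixels, 512 bands on an assumed linear axis.
wavelengths = np.linspace(380.0, 1030.0, 512)
cube = np.random.default_rng(0).random((5, 6, 512))
mask = np.zeros((5, 6), dtype=bool)
mask[1:4, 2:5] = True                      # hypothetical sample region

spectrum = mean_roi_spectrum(cube, mask)
trimmed, kept_wavelengths = trim_bands(spectrum, wavelengths)
```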

Variable selection
Selection of important wavelengths is critical for removing redundant information from high-dimensional data and for optimizing calibration models so that they produce reliable results [41,42]. Identifying the variables that carry the most valuable and authentic information therefore remains a challenging task in hyperspectral data analysis [43].
In the present study, the Random frog (RF) method was used to select important wavelengths. RF is an efficient variable selection technique that borrows the framework of reversible jump Markov chain Monte Carlo (RJMCMC) methods [44,45]. It searches for a set of variables that describe the relationship between the predictor variables and the response variable [46]. Within the RF algorithm, partial least-squares regression (PLSR) serves as the modeling method. X (n × p) denotes the spectral matrix consisting of n samples in rows and p variables in columns, and Y (n × 1) denotes the property of interest.
Before running the RF algorithm, five parameters (T, Q, θ, ω, and η; details are listed in Table 2) must be assigned proper values. As shown in Fig. 2, Random frog works in three steps [47]: (1) a variable subset V0 consisting of Q variables is initialized randomly; (2) a candidate variable subset V* containing Q* variables is proposed, V* is accepted with a certain probability as V1, and V0 is replaced by V1; (3) after N iterations, the selection probability of each variable is computed. Details of the RF algorithm can be found in the literature [44,45].
Because a Monte Carlo strategy is embedded in the RF algorithm, the selection probabilities of variables cannot be reproduced exactly [44,45]. RF is therefore generally run several times (depending on the data) to minimize the influence of this random factor [44,45].
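The three steps above can be illustrated with a simplified sketch. It is not the implementation used in the study: ordinary least squares (via `numpy.linalg.lstsq`) with 5-fold cross-validation stands in for PLSR, the default parameter values and the names `n_iter`, `theta`, `omega`, and `eta` are assumptions for illustration, and the data are synthetic. The acceptance rule follows the usual Random frog description, in which a worse candidate subset is still accepted with a small probability scaled by `eta`.

```python
import numpy as np

def cv_rmse(X, y, folds=5):
    """K-fold cross-validated RMSE of a least-squares fit with intercept."""
    n = len(y)
    idx = np.arange(n)
    sq_err = 0.0
    for k in range(folds):
        test = idx[k::folds]
        train = np.setdiff1d(idx, test)
        A = np.column_stack([np.ones(len(train)), X[train]])
        coef, *_ = np.linalg.lstsq(A, y[train], rcond=None)
        pred = np.column_stack([np.ones(len(test)), X[test]]) @ coef
        sq_err += np.sum((y[test] - pred) ** 2)
    return np.sqrt(sq_err / n)

def top_by_coefficient(X, y, subset, k):
    """Keep the k variables of `subset` with the largest absolute coefficients."""
    A = np.column_stack([np.ones(len(y)), X[:, subset]])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    order = np.argsort(-np.abs(coef[1:]))
    return subset[order[:k]]

def random_frog(X, y, n_iter=200, Q=5, theta=0.3, omega=3, eta=0.1, seed=0):
    """Return the selection probability of each column of X."""
    rng = np.random.default_rng(seed)
    X = (X - X.mean(axis=0)) / X.std(axis=0)        # comparable coefficient scale
    p = X.shape[1]
    current = rng.choice(p, size=Q, replace=False)  # step 1: random subset V0
    counts = np.zeros(p)
    for _ in range(n_iter):
        q_now = len(current)
        # Step 2: propose a new subset size Q* near the current size.
        q_new = int(np.clip(np.rint(rng.normal(q_now, theta * q_now)), 1, p))
        if q_new < q_now:        # shrink: drop the weakest variables
            candidate = top_by_coefficient(X, y, current, q_new)
        elif q_new > q_now:      # grow: add random extras, keep the strongest
            pool = np.setdiff1d(np.arange(p), current)
            n_extra = min(len(pool), omega * (q_new - q_now))
            extra = rng.choice(pool, size=n_extra, replace=False)
            candidate = top_by_coefficient(
                X, y, np.concatenate([current, extra]), q_new)
        else:
            candidate = current
        err_now = cv_rmse(X[:, current], y)
        err_new = cv_rmse(X[:, candidate], y)
        accept = 1.0 if err_new <= err_now else eta * err_now / err_new
        if rng.random() < accept:
            current = candidate
        counts[current] += 1
    return counts / n_iter       # step 3: selection probabilities

# Synthetic data: only columns 3 and 17 carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(60, 30))
y = 2.0 * X[:, 3] - 1.5 * X[:, 17] + 0.1 * rng.normal(size=60)
prob = random_frog(X, y)
ranked = np.argsort(-prob)       # variables ordered by selection probability
```

Changing `seed` changes the random proposals, which is why the selection probabilities differ between runs and are typically averaged over repeated runs.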

Multivariate data analyses
Partial least-squares regression (PLSR), one of the most reliable analytical tools for modeling [48-51], is widely applied in multivariate data analysis and regression. It is well suited to situations in which the number of variables exceeds the number of samples and in which the variables are collinear [52]. The PLSR model uses a small number of latent variables (LVs) in place of the original variables, and the calibration model is developed with leave-one-out cross-validation (LOOCV) [53,54]. LOOCV is applied to validate the performance of the calibration models and to check for over-fitting. In this study,

Evaluation of models
The performance of the models was evaluated by the correlation coefficient (R) and root mean square error (RMSE) for the calibration (RC, RMSEC), cross-validation (RCV, RMSECV), and prediction (RP, RMSEP) sets of samples.
Generally, an optimal model should provide high R values, low RMSE values, and a small difference between calibration and prediction [40].
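To make the modeling and evaluation workflow concrete, the following sketch fits a single-response PLS model with the NIPALS algorithm, runs leave-one-out cross-validation, and reports R and RMSE for calibration, cross-validation, and prediction. Everything here is illustrative: the function names are hypothetical, the data are synthetic, the calibration/prediction split is a plain 30/10 cut rather than SPXY, and the number of latent variables is fixed at 3 instead of being tuned.

```python
import numpy as np

def pls1_fit(X, y, n_lv):
    """Fit a single-response PLS model (NIPALS); return (coefficients, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean            # centered predictors and response
    W, P, q = [], [], []
    for _ in range(n_lv):
        w = E.T @ f
        w = w / np.linalg.norm(w)            # weight vector
        t = E @ w                            # scores of this latent variable
        tt = t @ t
        p = E.T @ t / tt                     # X loadings
        c = (f @ t) / tt                     # y loading
        E = E - np.outer(t, p)               # deflate X
        f = f - c * t                        # deflate y
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    beta = W @ np.linalg.solve(P.T @ W, q)   # regression vector in X space
    return beta, y_mean - x_mean @ beta

def loocv_predictions(X, y, n_lv):
    """Predict each sample from a model fitted on all the other samples."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, b0 = pls1_fit(X[keep], y[keep], n_lv)
        preds[i] = X[i] @ beta + b0
    return preds

def r_and_rmse(y_true, y_pred):
    """Correlation coefficient and root mean square error."""
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))
    return r, rmse

# Synthetic collinear "spectra": 40 samples, 50 variables, 3 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(40, 3))
X = latent @ rng.normal(size=(3, 50)) + 0.01 * rng.normal(size=(40, 50))
y = latent @ np.array([1.0, -0.5, 0.3]) + 2.0 + 0.05 * rng.normal(size=40)
X_cal, y_cal, X_pre, y_pre = X[:30], y[:30], X[30:], y[30:]

beta, b0 = pls1_fit(X_cal, y_cal, n_lv=3)
metrics = {
    "calibration": r_and_rmse(y_cal, X_cal @ beta + b0),
    "cross-validation": r_and_rmse(y_cal, loocv_predictions(X_cal, y_cal, 3)),
    "prediction": r_and_rmse(y_pre, X_pre @ beta + b0),
}
```

Comparing the three (R, RMSE) pairs in `metrics` mirrors the criterion stated above: high R, low RMSE, and a small gap between calibration and prediction.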

Chemical imaging
Chemical imaging is a technique for building color images that display the spatial distribution of chemical components in heterogeneous samples [55]. The chemical value of each pixel can be predicted by feeding its spectral reflectance values into the established quantitative model [19], and combining these predictions with image processing yields the spatial distribution of the component. To observe the variation of TNCs across the whole pepper plant, distribution maps of TNCs were required. Specifically, the optimal model built on the mean spectra at the important wavelengths was applied to predict the TNC of each pixel. The spatial position of each pixel, together with its predicted TNC, was then used to form the spatial distribution maps [52].
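The pixel-wise prediction step can be sketched as follows. The band indices, coefficients, and intercept are placeholders chosen for the example, not the fitted values of the study, and `predict_map` is a hypothetical helper; the point is only that a linear model calibrated on mean spectra can be applied to every ROI pixel to obtain a 2-D map.

```python
import numpy as np

def predict_map(cube, mask, band_idx, coefs, intercept):
    """Apply a linear model to every ROI pixel; background pixels become NaN."""
    cube = np.asarray(cube, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    tnc = np.full(mask.shape, np.nan)
    # cube[mask] -> (n_roi_pixels, bands); keep only the model's wavelengths.
    tnc[mask] = cube[mask][:, band_idx] @ coefs + intercept
    return tnc

# Synthetic hypercube and placeholder model (all values are assumptions).
rng = np.random.default_rng(0)
cube = rng.random((4, 5, 10))
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 1:4] = True
band_idx = np.array([2, 7])
coefs = np.array([1.0, -0.5])
intercept = 2.0

tnc_map = predict_map(cube, mask, band_idx, coefs, intercept)
```

The resulting array can be passed to any image display routine with a color scale to obtain the chemical image; NaN background pixels are then simply left uncolored.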

Software tools
The

Results and Discussion
Statistics of the TNCs measured by Dumas combustion (TNCs-DC)
The statistical results of TNCs in pepper plant samples measured by Dumas combustion are presented in Table 3. The results were similar to those of previous studies by Zhang et al. [33], Nelson et al. [56], and Stevenson [57]. The mean TNCs-DC decreased in the order leaves (3.295%), roots (1.186%), and stems (0.992%). Within the leaf and stem groups, the TNCs-DC increased from the lower to the middle to the upper position: from 2.703% to 3.129% to 4.053% in leaves, and from 0.736% to 0.849% to 1.390% in stems. Also in Table 3, analysis of variance (ANOVA) of the TNCs-DC revealed clear differences among the leaf/stem/root groups, as well as among the upper/middle/lower positions within the leaf and stem groups.
In this research, the SPXY (sample set partitioning based on joint x-y distances) method proposed by Galvao et al. [58] was used to divide all samples into a calibration set of 210 samples (90 leaves, 90 stems, and 30 roots) and a prediction set of 70 samples (30 leaves, 30 stems, and 10 roots) for subsequent hyperspectral TNC analyses. Table 4 summarizes the statistics of the TNCs-DC of all samples in the calibration and prediction sets. The TNCs-DC in the calibration set varied widely, ranging from 2.264% to 4.871% for leaf, 0.847% to 1.470% for root, 0.556% to 2.135% for stem, and 0.556% to 4.871% for whole-plant samples. Collecting samples with such a wide compositional distribution of TNCs-DC was important for building stable calibration models.
Fig. 3 shows the mean spectral reflectance curves of the samples over the range of 420-1,000 nm. In Fig. 3(a), the curves of leaves and stems in the upper/middle/lower positions had similar profiles; the two groups differed only in their spectral reflectance values.
Moreover, the spectral reflectance of leaves and stems in the lower/middle/upper positions displayed an ascending gradient. In detail, the molecular vibration around 550 nm is caused by chlorophyll, which is related to N concentrations [20,32]. The small valley in the curves around 960 nm is attributed to the O-H second overtone, which is related to water in plants [19,39,41,55,59]. The region of 680-750 nm with high reflectance values is generally referred to as the "red edge" [60].

Table 3. Statistical results of the TNCs-DC of all samples.
Samples; Positions; N. [1]; Mean (%) ± S.D. [2]
Note: Different letters (a, b, c) in the same column indicate statistical significance at the 5% level by Tamhane's T2 test. [1] N.: Number of samples; [2] S.D.: Standard deviation of the group. One-way analysis of variance (ANOVA) was used to generate these results, which were obtained using IBM SPSS Statistics (Version 20.0, IBM Corporation, Armonk, New York, USA). The TNCs-DC exhibited significant differences between groups (leaf, stem, and root) and within groups (leaves/stems in different positions).
The mean spectral reflectance and standard deviation (SD) of the leaf/stem/root groups are illustrated in Fig. 3(b). The mean reflectance and SD of the stem were consistently higher than those of the leaf across the whole tested range. There were several intersections among the mean curves of the three groups (roots, stems, and leaves), which was attributed to their different tissue structures. The spectra of the organs (leaf, stem, and root) resulted from a complex combination of scattering processes and overlapping absorptions arising from water and biochemical components [61]. In addition, the root curve was relatively flat and different from those of leaf and stem, which was similar to the results reported by San Juan Martínez [21]. Some soil particles adhered to the root surface, resulting in a spectrum similar to that of the soil in which the plant grew [21,62].

Table 4. Summary of statistical analyses of TNCs-DC in calibration and prediction sets.
Samples; Calibration set: N. [1], Max. [3] (%), Min. [4] (%), Mean (%) ± S.D. [2]; Prediction set: N. [1], Max. [3] (%), Min. [4] (%), Mean (%) ± S.D. [2]
Note: [1] N.: Number of samples; [2] S.D.: Standard deviation of the group; [3] Max.: Maximum; [4] Min.: Minimum. A calibration set with a wide range of TNCs-DC benefits the building of robust models. The cross-validation set had the same results as the calibration set and is therefore not listed in this table. In this study, leave-one-out cross-validation (one sample from the calibration set was held out at a time and the remaining calibration samples were used to build the model) was used to verify the reproducibility and robustness of the models.

Selection of important wavelengths using Random frog algorithm
In this research, RF was executed 50 times, and the average selection probability over these 50 runs was taken as the criterion for estimating the importance of each variable. The selection probability (SP) of the wavelengths is shown in Fig. 4. The SP curves show that a small number of wavelengths had an extremely high SP, whereas the SP of most wavelengths was relatively low. This indicated that the majority of wavelengths were only weakly relevant to the TNCs predicted by hyperspectral imaging (TNCs-HSI). All variables were ranked in descending order of SP. To investigate the influence of the number of wavelengths in the model and to find an optimal number of variables [45], the wavelengths selected for predicting the TNCs-HSI in organs (leaves, stems, and roots) and in the whole plant (leaf-stem-root) are listed in Table 5.
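Averaging the selection probabilities over repeated runs and ranking the wavelengths can be written compactly. The sketch below uses a tiny made-up table of three runs and five wavelengths (the study used 50 runs); `rank_wavelengths` is a hypothetical helper name.

```python
import numpy as np

def rank_wavelengths(sp_runs, wavelengths, n_keep):
    """Average selection probabilities over runs and return the top wavelengths."""
    mean_sp = np.asarray(sp_runs, dtype=float).mean(axis=0)
    order = np.argsort(-mean_sp, kind="stable")   # descending mean SP
    return wavelengths[order[:n_keep]], mean_sp[order[:n_keep]]

# Made-up selection probabilities: rows are runs, columns are wavelengths.
wavelengths = np.array([450.0, 550.0, 650.0, 750.0, 850.0])
sp_runs = [
    [0.1, 0.8, 0.2, 0.9, 0.1],
    [0.2, 0.6, 0.1, 0.7, 0.3],
    [0.0, 0.7, 0.3, 0.8, 0.2],
]
top_wl, top_sp = rank_wavelengths(sp_runs, wavelengths, n_keep=2)
```

Varying `n_keep` and refitting the regression model on the retained wavelengths is one simple way to look for a suitable number of variables.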
The important wavelengths selected for leaves were mainly located in three regions (around 650, 790, and 970 nm), in agreement with previous research [22,23,31,33]. The important wavelengths for the TNCs-HSI in stems were mainly clustered around 790 and 980 nm. The important wavelengths for roots, however, were largely distributed around 445 and 580 nm, regions that are also characteristic of soil [21,63]. For the whole plant (leaf-stem-root), the selected important wavelengths at 756, 749, and 759 nm were assigned to the third overtone of N-H stretching around 785 nm [31,64,65]. The selected important wavelength at 992 nm was close to the second overtone of O-H stretching near 970 nm [39], and the remaining selected important wavelengths at 918, 909, 921, and 912 nm were ascribed to the third overtone of free-monomer C-H stretching (934 nm) in carboxylic acid [31].

Establishment of the PLSR models
The PLSR models for predicting the TNCs-HSI in organs (leaf, stem, and root) and in the whole plant (leaf-stem-root) were established separately, based on the full spectra and on the selected important wavelengths. The results of these models are listed in Table 6. The organ-specific models provided more accurate TNC estimates than the whole-plant model. In addition, the PLSR models based on both the full spectra and the selected important wavelengths showed only minute differences between the calibration and the respective cross-validation values. However, the organ-specific approach required three models to obtain the TNCs-HSI of the different organs, which increased the complexity of determining the TNCs-HSI of the whole plant. Furthermore, the pepper plant should be treated as a single experimental subject. Hence, whole-plant models built on more samples (the sum of roots, stems, and leaves) were introduced to predict the TNCs-HSI in the pepper plant, and they performed well.
From the results of the whole-plant models (Table 6), the PLSR model based on the full spectra (F-PLSR) with 6 latent variables (LVs) gave somewhat more robust predictions of the TNCs-HSI than the PLSR model built on the important wavelengths selected by RF (RF-PLSR). Switching from the F-PLSR model (460 variables) to the RF-PLSR model (8 variables) eliminated 98.26% of the variables. Meanwhile, the correlation coefficients of calibration, cross-validation, and prediction (RC, RCV, and RP) decreased only slightly, by 0.026, 0.019, and 0.032, while the respective errors RMSEC, RMSECV, and RMSEP changed by only 0.063, 0.059, and 0.075. This meant that the predictive power of the RF-PLSR model would drop only slightly for the whole-plant TNCs-HSI.

Table 5. Important wavelengths for predicting TNCs in leaves, stems, roots, and whole-plant (leaf-stem-root) based on Random frog (RF).
Samples; Important wavelength (nm)

where λ_i nm is the spectral reflectance value at the wavelength of i nm; Y_TNC is the TNCs-HSI estimate of the whole pepper plant.

Spatial distribution maps of TNCs in pepper plant
The linear function (2) obtained from the RF-PLSR model was used to predict the TNC of each pixel within the leaf, stem, and root images. Pixels with similar spectral patterns in the raw hyperspectral images produce similar predicted values of the TNCs-HSI and therefore appear in similar colors in the resulting chemical images [51,55]. In this way, the spatial distribution maps of the TNCs-HSI in 10 samples of pepper plant (6 leaves and 3 stems from the upper, middle, and lower positions, and a root) were generated (Fig. 5). The TNCs-DC covered a broad range from 0.75% (lower stem) to 4.76% (upper leaf). Some samples with relatively low TNCs-DC (middle/lower stems and root) might therefore show similar colors, making pixels of high and low TNC hard to distinguish. Hence, in the predictive map, three color scales were used to describe the ranges of the TNCs-HSI in the leaf, stem, and root samples, respectively. Compared with the images of the original samples, the differences in TNCs-HSI coloring within a sample could easily be identified by the naked eye.

Note (Table 6): [1] N.: Number of wavelengths used for analysis; [5] F-PLSR models are the PLSR models established using the full spectra; [6] RF-PLSR models are the PLSR models built on the important wavelengths selected by the RF algorithm. LVs, RC, RMSEC, RCV, RMSECV, RP, and RMSEP are defined in the text.
As shown in Fig. 5, three linear color scales in different colors, from red to blue, represented the TNCs-HSI from high to low, respectively. The 10 resulting images revealed the spatial trend of the TNCs-HSI. Nitrogen status decreased from the upper to the lower positions in leaves and stems, in line with the measured nitrogen variation shown in Table 3 and with the respective TNC-DC values of those samples. This phenomenon can be explained as follows: N is a mobile and active element that exists in organic form in the living plant, and it can be remobilized from older leaves to young leaves [19]. Meanwhile, N and other nutrients are transported from the roots to the growth center (fresh leaves) with the xylem sap in the stems under the transpiration pull [66,67], resulting in high TNCs in fresh leaves. In addition, TNC distribution is related to season, plant age, and the photon flux density received by leaves in different positions [68].
According to plant physiology [69-72], N is absorbed at the root surface in the form of ammonium (NH4+) or nitrate (NO3−), carried by the movement of water in the soil [73]. The N ions then enter the xylem conduits in the root tissues through the apoplast pathway. The sap ascending in the xylem of nitrogen-fixing legumes carries nitrogen compounds originating from inorganic soil nitrogen (mainly NO3− absorbed by the roots). Early growth of the leaf depends largely on the phloem for the intake of nitrogen and other nutrients that are transported with the xylem. Finally, nutrients are transported from the roots to the various organs and tissues through the xylem sap as a result of these physiological activities.
It is worth noting that the maps of the upper and middle leaves displayed a relatively even color within the mesophyll part of individual images. In the images of the lower leaves (Fig. 5), light blue pixels in the midrib indicated relatively high TNCs-HSI. For stems, orange petiole nodes indicated relatively low TNCs-HSI. In the map of the root, the root hairs, colored blue, revealed extremely low TNCs-HSI, as well as in stems and leaves.
Furthermore, the colors along the contour of the stem samples were distinctly different from those in other regions, which might be caused by the three-dimensional structure of the stem. Samples with a planar or a three-dimensional structure might have different spectral characteristics [74]. This issue will be addressed in future work.

Conclusions
In this research, the RF method was employed to select eight important wavelengths (992, 756, 749, 918, 909, 921, 758, and 912 nm) for predicting whole-plant (leaf-stem-root) total nitrogen contents. PLSR was then used to build the quantitative relationship between the spectral reflectance and the TNCs-DC of the samples, based on the full spectra and on the important wavelengths. The whole-plant RF-PLSR model, with RP = 0.876 and RMSEP = 0.426%, was considered the optimal model for TNCs-HSI prediction in pepper plants. Lastly, the TNCs-HSI of all pixels in the samples were calculated by applying the optimal PLSR model, and spatial distribution maps of the TNCs-HSI in pepper plant samples were constructed with a purpose-built image processing procedure. The differences in TNCs within the pepper plant could provide important information for monitoring nutrient distribution.
In further research, more varieties of pepper plant from different geographical locations, of different ages and sampling times, and with a broader range of chemical values should be taken into account to develop more adequate and accurate TNC prediction models, which could provide theoretical guidance for crop nutrient diagnosis and field management.