Quantitative Profiling of Polar Metabolites in Herbal Medicine Injections for Multivariate Statistical Evaluation Based on Independence Principal Component Analysis

Botanical primary metabolites are abundant in herbal medicine injections (HMIs), but they are often overlooked in quality control. Because the routinely applied reversed-phase chromatographic fingerprint technology is biased against hydrophilic substances, strongly polar primary metabolites such as saccharides, amino acids and organic acids are usually difficult to detect with it. In this study, a proton nuclear magnetic resonance (1H NMR) profiling method was developed for efficient identification and quantification of small polar molecules, mostly primary metabolites, in HMIs. A commonly used medicine, Danhong injection (DHI), was employed as a model. With the developed method, 23 primary metabolites together with 7 polyphenolic acids were simultaneously identified, of which 13 metabolites with fully separated proton signals were quantified and used for a further multivariate quality control assay. The quantitative 1H NMR method was validated with good linearity, precision, repeatability, stability and accuracy. Based on independence principal component analysis (IPCA), the contents of the 13 metabolites were characterized and dimensionally reduced into the first two independence principal components (IPCs). IPC1 and IPC2 were then used to calculate the upper control limits (with 99% confidence ellipsoids) of χ2 and Hotelling T2 control charts. Using the constructed upper control limits, the proposed method was successfully applied to 36 batches of DHI and revealed an out-of-control sample with perturbed levels of succinate, malonate, glucose, fructose, salvianic acid and protocatechuic aldehyde. The integrated strategy provides a reliable approach to identify and quantify multiple polar metabolites of DHI in one fingerprinting spectrum, and it also supports the establishment of IPCA models for the multivariate statistical evaluation of HMIs.


Introduction
Metabolic profiling is essential to ensure the quality, consistency, safety and efficacy of herbal medicine products, especially for the injection dosage form. Water extraction or decoction is the most favored method for preparing herbal medicine injections (HMIs). Saccharides, amino acids, organic acids, and other primary metabolites are unavoidably extracted along with the targeted secondary metabolites during the production of HMIs such as Qingkailing injection [1], Danshen injection [2], Guanxinning injection [3], and Shuxuetong injection [4]. In general, amino acids are considered to provide tonic activities and act as key regulators of nutrient metabolism, and polysaccharides are believed to be among the most important constituents of some herbal materials with respect to pharmacological activity. Some monosaccharides have a suppressive effect on cell-mediated immune reactions [5]. However, these primary metabolites in HMIs are often neither measured nor covered by quality criteria in the Chinese Pharmacopoeia and national standards. Because of the strong polarity and hydrophilicity of these compounds, conventional reversed-phase high-performance liquid chromatography (HPLC) fingerprinting [6,7] and other analytical methodologies, including anion exchange chromatography [8], gas chromatography [9,10], and capillary electrophoresis [11,12], can separate and detect primary metabolites only with complicated pretreatment, derivatization reagents and laborious preparation procedures. Delineating the various classes of metabolites in HMIs for a holistic quality evaluation would therefore require several different types of fingerprint, which is difficult to realize in practical industrial processing. A simple and fast approach is thus needed that can detect saccharides, amino acids, and organic acids together with the main bioactive secondary metabolites simultaneously in one fingerprinting spectrum.
Proton nuclear magnetic resonance (1H NMR) spectroscopy can detect all proton-bearing compounds, almost irrespective of chemical class. Because the signal intensity is directly proportional to the number of nuclei contributing to a specific resonance [13], 1H NMR allows metabolites to be identified and quantified in a single acquisition. With simple sample preparation, 1H NMR facilitates high-throughput analysis for metabolic studies and quality control of various foods [14][15][16][17] and herbal materials [18][19][20]. As a universal technique that is simple and rapid to implement, 1H NMR has broad prospects for profiling the polar metabolites of HMIs.
Because of the complexity of their chemical composition, HMIs cannot be completely represented by a limited number of bioactive compounds [21]. To extract feature variables, principal component analysis (PCA) [22] and independent component analysis (ICA) [23] are classical tools for reducing the dimension of multivariate data. The principal components (PCs) in PCA are mutually orthogonal, whereas ICA requires the components to be statistically independent [24]. ICA has been found to be a successful alternative to PCA in eliminating overlapping information between components [25]. However, ICA has limitations arising from its instability [26], the choice of the number of components to extract, and high dimensionality [27]. Consequently, independent principal component analysis (IPCA) was proposed by Yao et al. in 2012 [28]; it uses PCA as a preprocessing step to reduce the dimension of the data and then uses ICA to denoise the PCA result and separate the relevant information. In simulation studies and on real data sets, IPCA offered a better visualization of the data than ICA, with a smaller number of components than PCA. Because IPCA generates denoised loading vectors, we employed it to construct χ2 and Hotelling T2 control charts [29] for multivariate statistical analysis.
Danhong injection (DHI) is a patent injection made from the extracts of Radix Salviae Miltiorrhizae and Flos Carthami [30]. It has been widely used in the clinic for the prevention and treatment of cardiovascular and cerebrovascular diseases [31][32][33]. In our previous work, ultra-performance liquid chromatography (UPLC) coupled with UV detection was adopted to identify 11 polyphenolic acids in DHI [34]. However, the total weight of the identified constituents accounted for only a low proportion (about 10%) of the solid content of DHI.
In this study, we describe a strategy to detect more hydrophilic primary metabolites in DHI based on quantitative 1H NMR spectroscopy. The absolute concentrations of the identified metabolites were calculated by the internal standard method. Linearity, precision, repeatability, stability and accuracy were assessed to validate the method. The contents of the polar metabolites were further evaluated with the multivariate analysis tool IPCA to establish χ2 and Hotelling T2 control charts.

Materials and Chemicals
Thirty-six batches of DHI manufactured in 2011, 2012 and 2013 were provided by Heze Buchang Pharmaceutical Co. Ltd (Heze, China). The standards of valine, threonine, alanine, pyroglutamate, protocatechuic aldehyde and asparagine were purchased from the National Institute for Food and Drug Control.

Ethics
No specific permission was required for the described field studies. The field locations are neither privately owned nor protected, and neither endangered nor protected species were involved.

Sample preparation
All the injection samples were subjected to freeze-drying. The dried powders (18 mg) were accurately weighed and dissolved in 600 μL of D2O containing 0.58 mM TSP. Exactly 500 μL of the sample solution was transferred into a standard 5 mm NMR tube (Vineland, NJ, USA). No buffer was used because the pH of the injection is stable.

NMR measurements
1H NMR spectra were acquired at 298 K on a Bruker AV III 600 MHz NMR spectrometer (600.23 MHz proton frequency) with a 5 mm broadband BBFO probehead. All pulse sequences were from the Bruker pulse program library. A standard one-dimensional composite pulse sequence (zgcppr) was employed to suppress the residual water signal. The 90° pulse width was adjusted to about 13 μs for each sample. Sixty-four scans were collected into 32k data points using a spectral width of 12335 Hz, a relaxation delay of 1.0 s and an acquisition time of 2.66 s. A 0.3 Hz line-broadening function was applied to all spectra before Fourier transformation (FT), followed by phasing and baseline correction.
Spin-lattice relaxation time (T1) values of the quantified protons of each constituent and of TSP were measured using a classical inversion-recovery pulse sequence with 10 relaxation delays (τ) ranging from 0.01 to 20 s.
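As a rough illustration of how a T1 value can be extracted from such an inversion-recovery series, the sketch below assumes ideal inversion, M(τ) = M0·(1 − 2·exp(−τ/T1)), and approximates M0 by the intensity at the longest delay. The text does not state which fitting routine was used; the function name, the delay list and the intensities are hypothetical and synthetic, not the measured DHI data.

```python
import math

def estimate_t1(delays, intensities):
    """Estimate T1 (s) by a linearized least-squares fit of inversion-recovery data.

    Assumes M(tau) = M0 * (1 - 2*exp(-tau/T1)) and approximates M0 by the
    intensity at the longest delay, so that ln((M0 - M)/(2*M0)) = -tau/T1.
    """
    m0 = intensities[-1]
    xs, ys = [], []
    for tau, m in zip(delays[:-1], intensities[:-1]):
        ratio = (m0 - m) / (2.0 * m0)
        if ratio > 0:  # skip fully recovered points, where the log is undefined
            xs.append(tau)
            ys.append(math.log(ratio))
    # Least-squares slope of a line through the origin: sum(x*y) / sum(x*x)
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return -1.0 / slope

# Synthetic example: 10 delays from 0.01 to 20 s, true T1 = 1.5 s (hypothetical)
delays = [0.01, 0.05, 0.1, 0.25, 0.5, 1.0, 2.0, 5.0, 10.0, 20.0]
true_t1 = 1.5
signal = [1.0 - 2.0 * math.exp(-t / true_t1) for t in delays]
t1_hat = estimate_t1(delays, signal)
```

A nonlinear three-parameter fit would be more robust to imperfect inversion; the linearized form is used here only because it keeps the example short.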

Quantification of the metabolites
Because the intensity of a given 1H NMR signal is directly proportional to the number of protons contributing to it, the amount of each metabolite in DHI can be measured from the signal areas of the metabolite and of an internal reference of known concentration. Thirteen metabolites in DHI were selected for quantification. The important parameters for data acquisition and processing of 1H NMR spectra must be set appropriately to obtain accurate and precise measurements [36]. Most importantly, the relaxation delay τ should be long enough to ensure complete relaxation of all the signals of interest. Fully relaxed 1H NMR measurements require a long acquisition time, with τ ≥ 5 × the longest T1. To shorten the acquisition time, 1H NMR spectra can be acquired in an incompletely relaxed condition, and the absolute concentrations are then calculated taking the T1 values into consideration [37]. With an effective magnetization reading pulse of 90°, the chemical constituents in this study can be quantified using the following equation:

$$P_X = P_{TSP} \cdot \frac{A_X}{A_{TSP}} \cdot \frac{N_{TSP}}{N_X} \cdot \frac{M_X}{M_{TSP}} \cdot \frac{1 - e^{-\tau / T_1^{TSP}}}{1 - e^{-\tau / T_1^{X}}}$$

where P_X and P_TSP are the mass concentrations of the metabolite and TSP, A_X and A_TSP are the integral areas of the targeted signal of the metabolite and of the methyl groups of TSP, N_X and N_TSP are the proton numbers of the metabolite signal and of the methyl groups of TSP, M_X and M_TSP are the molar masses of the metabolite and TSP, T_1^X and T_1^TSP are the spin-lattice relaxation times of proton X and of the methyl protons of TSP, respectively, and τ is the total relaxation time (relaxation delay plus acquisition time).
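The internal-standard calculation described above can be sketched as follows. This is a minimal illustration that assumes the usual saturation factor 1 − exp(−τ/T1) for a 90° pulse applied to each integral; the function name, the integral areas, the T1 values and the TSP concentration are hypothetical, and the molar mass of TSP assumes the deuterated sodium salt. Only τ (1.0 s relaxation delay plus 2.66 s acquisition time) is taken from the text.

```python
import math

def mass_concentration(p_tsp, a_x, a_tsp, n_x, n_tsp, m_x, m_tsp, t1_x, t1_tsp, tau):
    """Mass concentration of metabolite X from internal-standard 1H NMR integrals.

    Each integral is corrected by its saturation factor (1 - exp(-tau/T1)),
    which accounts for incomplete relaxation after a 90-degree pulse.
    """
    sat_x = 1.0 - math.exp(-tau / t1_x)
    sat_tsp = 1.0 - math.exp(-tau / t1_tsp)
    return p_tsp * (a_x / a_tsp) * (n_tsp / n_x) * (m_x / m_tsp) * (sat_tsp / sat_x)

TAU = 1.0 + 2.66  # relaxation delay + acquisition time from the text, in s

# Hypothetical inputs: alanine methyl doublet (3 H) against the TSP methyl signal (9 H)
c = mass_concentration(
    p_tsp=0.10,            # mg/mL, illustrative only
    a_x=2.0, a_tsp=1.0,    # integral areas, illustrative only
    n_x=3, n_tsp=9,
    m_x=89.09,             # alanine, g/mol
    m_tsp=172.27,          # TSP-d4 sodium salt, g/mol (assumed form of TSP)
    t1_x=1.2, t1_tsp=3.0,  # s, illustrative only
    tau=TAU,
)

# With equal T1 values the relaxation correction cancels out
uncorrected = mass_concentration(0.10, 2.0, 1.0, 3, 9, 89.09, 172.27, 2.0, 2.0, TAU)
```

In this example the TSP protons relax more slowly than the alanine protons, so the TSP integral is the more attenuated of the two and the corrected concentration comes out lower than the uncorrected one.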
The quantitative 1H NMR method was checked for linearity, precision, repeatability, stability, and accuracy. Precision, repeatability and stability were expressed as the relative standard deviation (RSD). A recovery test was employed to determine the accuracy, and four typical compounds, alanine, glucose, salvianic acid and protocatechuic aldehyde, were chosen to evaluate the average recovery.
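For reference, the RSD used for precision, repeatability and stability can be computed as below. This is a minimal sketch assuming the sample (n − 1) standard deviation; the six replicate values are hypothetical and are not taken from the DHI measurements.

```python
import statistics

def rsd_percent(values):
    """Relative standard deviation (%): sample standard deviation / mean * 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0

# Six hypothetical replicate determinations of one metabolite (arbitrary units)
replicates = [10.02, 9.98, 10.05, 9.97, 10.01, 10.03]
rsd = rsd_percent(replicates)
```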

Assay for multivariate quality control
The quantitative data for the 13 metabolites were imported into R 3.0.2 software (www.r-project.org) loaded with the packages MVA, MSQC and mixOmics for multivariate statistical analysis. To simplify the multivariate problem, principal component analysis (PCA) and independence principal component analysis (IPCA) were performed to reduce the dimensionality of the data. The scores of the principal components that characterized the whole data set were then used in χ2 and Hotelling T2 control charts to calculate the upper control limits (with 99% confidence ellipsoids). Moreover, because independence of the data is one of the requisites of a control chart, the independence of the selected components in the PCA and IPCA models was validated by the autocorrelation function (ACF). The out-of-control samples in Phase II were examined using the upper control limits obtained from Phase I.
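The ACF check mentioned above can be illustrated with a short sketch. It assumes the common approximate 95% band of ±1.96/√n around zero (the confidence level of the bands is not stated in the text), and the helper names and the example series are hypothetical.

```python
import math

def acf(series, max_lag):
    """Sample autocorrelation coefficients r_k for lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    return [sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
            for k in range(1, max_lag + 1)]

def autocorrelated_lags(series, max_lag, z=1.96):
    """Lags whose |r_k| exceeds the approximate 95% band z / sqrt(n)."""
    band = z / math.sqrt(len(series))
    return [k for k, r in enumerate(acf(series, max_lag), start=1) if abs(r) > band]

# Hypothetical component scores with a steady drift across 20 batches
drifting_scores = [float(i) for i in range(20)]
flagged = autocorrelated_lags(drifting_scores, max_lag=5)
```

A component whose scores drift from batch to batch, as in this example, is flagged at lag 1 and would not satisfy the independence requirement of a control chart.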

Proton signal assignments and chemical identification
A representative 1H NMR spectrum of DHI is shown in Fig. 1. The resonance signals were assigned to 30 metabolites based on extensive 2D NMR experiments (1H-1H COSY, 1H-1H TOCSY, 1H J-resolved, HSQC), the literature data in our former work [35], and an in-house database. In the range of δ 3.2-5.8, the spectrum is dominated by 5 monosaccharides and 2 disaccharides: glucose, galactose, arabinose, fructose, rhamnose, rutinose, and rutinulose. In the high-field region (δ 0.5-3.2), 8 amino acids (isoleucine, leucine, valine, threonine, alanine, proline, pyroglutamate, asparagine) and 3 organic acids (acetate, succinate, malonate) were observed. In the low-field region (δ 5.8-10.0), 7 polyphenolic acids, including salvianic acid, salvianolic acid B, salvianolic acid A, rosmarinic acid, lithospermic acid, protocatechuic acid and protocatechuic aldehyde, together with uridine and 5-(hydroxymethyl)-2-furaldehyde (5-HMF), were identified. Moreover, 3 further organic acids (4-hydroxybenzoic acid, 4-hydroxycinnamic acid, and formate) were observed in the low-field region. The chemical shifts of the 30 metabolites identified by 1H NMR are listed in Table 1. To our knowledge, 7 saccharides and 6 organic acids are reported here for the first time in DHI. Without any sample pretreatment or precolumn derivatization, the established 1H NMR method provided an approach to determine 7 saccharides, 6 organic acids, 8 amino acids, 1 nucleoside, 1 carbohydrate derivative (5-HMF) and 7 polyphenolic acids in DHI simultaneously.

Quantitative 1H NMR analysis and method validation
Because of the narrow chemical shift range of 1H NMR and frequent signal overlap, it is a challenge to quantify all the constituents in a mixture. In our study, 13 metabolites with fully separated signals were selected for quantification. To improve the efficiency of the 1H NMR method, the spectra were acquired in an incompletely relaxed state. As a consequence, the spin-lattice relaxation time (T1) values must be accurately measured and taken into account in the quantitative analysis. The T1 values were determined by inversion-recovery experiments (Table 2). Accordingly, the absolute concentrations of the 13 metabolites were calculated from three parallel samples of each batch (Table S1 in File S1).
Linearity. 1H NMR is inherently linear as a method, and no calibration is necessary for the determination of molar ratios in mixtures. The linearity of the NMR response was therefore confirmed with the 13 metabolites at five different molar ratios. Good linearity was achieved, as indicated by the regression equations and satisfactory correlation coefficients (r2).
Precision. The intraday and interday precision was determined by analyzing six replicates on the same day and on three consecutive days respectively. The intraday precision for the contents of 13 metabolites ranged from 0.20% to 0.89%, and the interday precision ranged from 0.29% to 1.49%. The RSD values were adequate and indicated the suitability of the method.
Repeatability. Six samples prepared from the same batch showed RSD values ranging from 1.46% to 2.75%, indicating a high repeatability.
Stability. One sample was analyzed to determine stability on three consecutive days. The RSD values of the analytes were in the range of 0.37% to 2.21%.
Accuracy. Considering the limited volume of the NMR tube and the cost of the deuterated reagents needed for repeated dilution of samples, four metabolites of different types were used for the recovery test: alanine, glucose, salvianic acid and protocatechuic aldehyde. The recovery was calculated as the ratio of the response of the four selected compounds in the spiked DHI samples to that of the standards at the same levels. The average recoveries were 106.6% (±1.2), 106.6% (±2.5), 98.3% (±2.1) and 91.3% (±4.0) for alanine, glucose, salvianic acid and protocatechuic aldehyde, respectively, indicating acceptable recovery.
According to the results, the concentration of glucose in DHI was extremely high, as shown in Fig. 2A and 2B, and saccharides, amino acids and organic acids together represented about 60% of the total solid content of DHI (Table S1 in File S1).

Control charts based on PCA and IPCA
To strengthen the quality control of DHI, the analysis should be performed with a multivariate approach; that is, the 13 metabolites above must be analyzed together, not independently. The concentrations of the 13 metabolites were mean centered and scaled to unit variance before being analyzed by PCA and IPCA (Fig. 2C and 2D). To avoid the loss of significant information, the cumulative proportion of variance explained by the retained principal components (PCs) is normally fixed at 80% in a PCA model [29]. Thus, the first three PCs in Phase I (Fig. 3A) were selected to construct the χ2 (Fig. 4A) and Hotelling T2 (Fig. 4B) control charts. Since independence of the data is one of the requisites of a control chart, we assessed the marginal independence of each retained PC to validate the model [38]. The correlograms (Fig. 5, PC1-PC3) showed that PC1 fell outside the confidence bands, which indicated autocorrelation or dependence in PC1, so the PCA model built on the Phase I data could not be validated.
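The two preprocessing decisions in this paragraph, autoscaling and the 80% cumulative-variance rule, can be sketched as follows. The eigenvalues are hypothetical and were chosen only so that the rule selects three components; they are not the values behind Fig. 3A, and both function names are illustrative.

```python
import statistics

def autoscale(column):
    """Mean-centre one variable and scale it to unit (sample) variance."""
    mu = statistics.mean(column)
    sd = statistics.stdev(column)
    return [(x - mu) / sd for x in column]

def n_components_for(variances, threshold=0.80):
    """Smallest number of leading PCs whose cumulative explained variance
    reaches the threshold; `variances` must be sorted in decreasing order."""
    total = sum(variances)
    cumulative = 0.0
    for k, v in enumerate(variances, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return k
    return len(variances)

# Hypothetical eigenvalues for 13 autoscaled variables (they sum to 13)
eigenvalues = [5.2, 3.3, 2.1, 0.9, 0.6, 0.4, 0.2, 0.1, 0.08, 0.06, 0.03, 0.02, 0.01]
k = n_components_for(eigenvalues)

scaled = autoscale([1.0, 2.0, 3.0])
```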
To remove the autocorrelation effects found with PCA, we employed IPCA to generate denoised and independent loading vectors [28]. The kurtosis of the loading vectors was used to decide the number of independent principal components (IPCs). The kurtosis of all extracted IPCs is plotted in Fig. 3B, in which the kurtosis of IPC7 was close to zero. By using the first 7 components of IPCA, the exact number of IPCs to retain was obtained (Fig. 3C). Since the kurtosis of IPC3 was close to zero, the first 2 components were sufficient with IPCA. The presence of autocorrelation was assessed as shown in Fig. 5 (IPC1 and IPC2), which indicated no evidence of a relation between adjacent observations. Therefore, the original 13-dimensional data were reduced to a two-dimensional problem.
[Fig. 6 caption (partially recovered): ... Table S2 in File S1, and the seventh sample fell outside the tolerance region. doi:10.1371/journal.pone.0105412.g006]
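To make the kurtosis criterion concrete, the sketch below computes the excess kurtosis of a loading vector and counts the leading components whose kurtosis is not close to zero. The cut-off `tol` is a hypothetical choice (the text only says "close to zero"), the loading vector and kurtosis values are invented for illustration, and the helper names are not taken from any package.

```python
def excess_kurtosis(values):
    """Excess kurtosis (fourth standardized moment minus 3); 0 for a Gaussian."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0

def n_informative(kurtoses, tol=0.5):
    """Number of leading components kept before the first near-zero kurtosis.

    `tol` is a hypothetical cut-off, not a value given in the text.
    """
    for i, k in enumerate(kurtoses):
        if abs(k) < tol:
            return i
    return len(kurtoses)

# A sparse (few large entries) loading vector has clearly positive kurtosis
sparse_loading = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, -1.0]
kurt = excess_kurtosis(sparse_loading)

# Hypothetical kurtosis values for IPC1..IPC4
kept = n_informative([3.1, 2.4, 0.1, 0.05])
```

The design intuition is that a loading vector dominated by a few variables is strongly non-Gaussian, whereas a noise-like loading vector has kurtosis near zero and adds little information.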
The first two IPCs were then taken to establish the in-control state (Phase I). According to the χ2 (Fig. 4C) and Hotelling T2 (Fig. 4D) control charts with the 99% confidence region, the upper control limits (UCLs) were determined as 9.21 and 11.56, respectively. The first two IPCs were consequently controlled through 2D ellipsoids (Fig. 6). The χ2 control ellipse (UCL = 9.21) could be used as the process region, and the less restrictive T2 control ellipse (UCL = 11.56) was used as the tolerance region. In both cases, all the points in Phase I fell inside the confidence ellipsoids (Fig. 6A). The samples in Phase II (Table S2 in File S1) were monitored using the UCLs of both the χ2 and T2 charts obtained from Phase I (Fig. 4E and 4F), and the points of Phase II were added to the 2D ellipsoids (Fig. 6B). The seventh sample fell outside the 99% confidence ellipsoids of both the process and tolerance regions, indicating the presence of an out-of-control sample (batch 110402). The decomposition of the T2 value showed that the out-of-control variability was associated with IPC1, with a p-value of 0.0032 (Table 3). The same result was obtained from the IPCA loading plots shown in Fig. 7. Although the loading plot could not determine exactly which metabolites were responsible for the variation, it showed that the contents of succinate, malonate, glucose, fructose, salvianic acid and protocatechuic aldehyde contributed most to the independent loading vector of IPC1.
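The χ2 limit of 9.21 can be checked by hand, because for two degrees of freedom the χ2 quantile has a closed form. The sketch below does this and tests two points against the limit; it assumes uncorrelated components with known variances (so the statistic reduces to a sum of squared z-scores), and the scores are hypothetical rather than the IPC scores of the DHI batches. The T2 limit of 11.56 additionally depends on the Phase I sample size, which is not given here, so it is not reproduced.

```python
import math

def chi2_ucl_2df(alpha=0.01):
    """Upper control limit of a chi-square chart with 2 degrees of freedom.

    For 2 d.f. the chi-square CDF is 1 - exp(-x/2), so the (1 - alpha)
    quantile has the closed form -2*ln(alpha).
    """
    return -2.0 * math.log(alpha)

def distance_statistic(point, mean, variances):
    """Squared statistical distance for uncorrelated components:
    the sum of squared z-scores (assumes a diagonal covariance matrix)."""
    return sum((x - m) ** 2 / v for x, m, v in zip(point, mean, variances))

ucl = chi2_ucl_2df(0.01)  # approximately 9.21

# Hypothetical (IPC1, IPC2) scores with zero mean and unit variance
inside = distance_statistic((1.0, -0.5), (0.0, 0.0), (1.0, 1.0))
outside = distance_statistic((3.2, 0.4), (0.0, 0.0), (1.0, 1.0))
is_out_of_control = outside > ucl
```

Geometrically, the set of points whose statistic equals the UCL is the control ellipse, so comparing the statistic with the UCL is equivalent to asking whether a point lies inside the ellipse.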

Conclusion
Based on quantitative 1H NMR analysis, a reliable approach for the simultaneous determination of amino acids, organic acids, saccharides, and botanical secondary metabolites of HMIs in one fingerprinting spectrum has been developed and validated using Danhong injection as a model. The method takes T1 values into account when calculating the contents of the feature metabolites, which gives the assay good linearity, precision, repeatability, stability and accuracy. Compared with HPLC fingerprinting methods, the 1H NMR approach has the significant advantages of a shorter analysis time (about 5 min), no chromatographic separation, and no requirement for reference standards of each analyte in quantitative analysis. In combination with IPCA, two kinds of multivariate control charts (χ2 and Hotelling T2) were also successfully constructed from the independent principal components to detect out-of-control HMI samples. The decomposition of the T2 value and the IPCA loading vectors reflect significant variations in the overall metabolite profile, although they cannot determine exactly which metabolites have changed. The established multivariate models show promise for monitoring a larger number of feature metabolites in HMIs.

Supporting Information
File S1. Quantification of 13 chemical markers selected in the 1H NMR spectra of DHI samples in Phase I (Table S1) and Phase II (Table S2). (DOC)