Discriminatory Components Retracing Strategy for Monitoring the Preparation Procedure of Chinese Patent Medicines by Fingerprint and Chemometric Analysis

Chinese patent medicines (CPMs), generally prepared from several traditional Chinese medicines (TCMs) according to a specific process, are the typical delivery form of TCMs in Asia. To date, quality control of CPMs has typically focused on evaluating the final products by fingerprinting and multi-component quantification, but rarely on monitoring the whole preparation process, which is considered more important for ensuring the quality of CPMs. In this study, a novel and effective strategy, termed “retracing” and based on HPLC fingerprinting and chemometric analysis, was proposed to achieve quality control of the whole preparation process, with Shenkang injection (SKI) serving as an example. The chemical fingerprints were first established and then analyzed by similarity, principal component analysis (PCA), and partial least squares-discriminant analysis (PLS-DA) to evaluate quality and to explore discriminatory components. As a result, the holistic inconsistencies among ninety-three batches of SKI were identified, and five discriminatory components (emodic acid, gallic acid, caffeic acid, chrysophanol-O-glucoside, and p-coumaroyl-O-galloyl-glucose) were selected as representative targets to illustrate the retracing strategy. By analyzing the variation of these targets successively in the corresponding semi-products (ninety-three batches), intermediates (thirty-three batches), and raw materials, the origins of the discriminatory components were determined and several crucial influencing factors were proposed, including the raw materials, the coextraction temperature, and the sterilizing conditions. Meanwhile, a reference fingerprint was established and subsequently applied to the guidance of manufacturing. It is suggested that the production process be standardized by taking the concentrations of the discriminatory components as diagnostic markers to ensure stable and consistent quality across batches. This effective and practical strategy is expected to play a critical role in guiding manufacturing and to help improve the safety of the final products.


Introduction
Chinese patent medicines (Zhong-Cheng-Yao in Chinese, CPMs) are the typical form of TCMs in clinical practice; they combine several TCMs to improve therapeutic efficacy and reduce side effects [1]. The chemical compositions of CPMs are numerous and extremely complex, and they are affected not only by the botanical origins of the TCMs but also by the production process, including coextraction, mixing, and sterilization [2,3]. Chemical and physical reactions among thousands of compounds are believed to occur during these procedures; they remain poorly understood but may significantly affect the therapeutic efficacy of CPMs. Consequently, the chemical compositions of CPMs tend to be far more complicated, and their quality control is a crucial challenge [4][5][6].
The compendial strategies adopted by the latest Chinese Pharmacopoeia generally follow the approaches applicable to the analysis of a single TCM: microscopic examination and thin-layer chromatography (TLC) comparison to verify the presence or absence of a target TCM or some major components; determination of the contents of one to three or more marker compounds by high performance liquid chromatography (HPLC); and fingerprint assays based on HPLC and GC to identify as many components as possible [7]. Among these approaches, fingerprinting is considered a preferable method for the quality control of TCMs and CPMs because it provides a holistic profile that reveals comprehensive chemical information, especially given that most of the chemical constituents remain unclear [8,9]. The Chinese State Food and Drug Administration (SFDA) regulations require domestic injection manufacturers to standardize their products by HPLC-based fingerprint assay. Products are qualified when the similarity value between the measured and the reference fingerprints is greater than 0.9 [10]. Recently, fingerprinting combined with chemometrics, which enables the segregation of sample groups and the rapid identification of discriminatory components, has proven to be a powerful strategy for characterizing botanical drugs of different origins and has been widely applied to the quality assessment of TCMs and CPMs [11][12][13][14][15].
Nevertheless, as far as we know, the currently available methods for the quality control of CPMs typically focus on the evaluation of the final products. The results generally indicate significant differences among products from batch to batch or from different manufacturers. For example, the classification of Danhong injection based on fingerprinting and principal component analysis demonstrated potential instability in the practical manufacturing process [16]. Few literature reports are available on the analysis of the raw materials and the preparation process. Therefore, in the current situation, unqualified products have been found and the discriminatory components determined, but both the causes of the problem and the approaches to address them remain unclear, let alone their implementation in the guidance of production. One reason is that it is difficult to collect the raw materials and the corresponding intermediates from manufacturers; another is that no rational and effective approach to assess the process has been reported in the literature. Fortunately, in our study, Xi'an Shiji Shengkang Pharmaceutical Industry Co., Ltd. (Xi'an, China) kindly provided a total of ninety-three batches of Shenkang injection (SKI), together with the corresponding semi-products, intermediates, and raw materials, which allowed the study to be conducted smoothly. The detailed sample information is listed in S1 Table. SKI contains four crude drugs: Radix et Rhizoma Rhei (Dahuang in Chinese, RRR), Radix et Rhizoma Salviae Miltiorrhizae (Danshen in Chinese, SMRR), Radix Astragali (Huangqi in Chinese, AR), and Flos Carthami (Honghua in Chinese, CF). It is used for the treatment of chronic renal failure [17]. As described in Fig. 1, the preparation procedure is as follows [17]: first, RRR and SMRR are coextracted to produce intermediate A, and AR and CF are coextracted to yield intermediate B; second, intermediates A and B are mixed to form the semi-products, which are then sterilized to produce the final products. As for the bioactive constituents, it has been reported that RRR mainly contains sennosides, anthraquinones, stilbenes, and glucose gallates [18], that SMRR and CF mainly contain phenolic acids and flavonoids, respectively [19,20], and that AR mainly contains different kinds of isoflavonoids and triterpenoid saponins [21].
In this work, a novel and effective strategy, termed "retracing", was proposed on the basis of integrating fingerprinting and chemometrics to address this problem. The workflow is outlined in Fig. 1. The strategy follows five consecutive steps to achieve quality control of the entire preparation process: (1) establishment of the chemical fingerprint of the final products; (2) data analysis by similarity and chemometrics to assess product quality and to explore the discriminatory components; (3) retracing the distribution patterns of those components sequentially in the corresponding semi-products, intermediates, and TCM raw materials, and determining the possible reasons for the unqualified products; (4) establishment of a reference fingerprint based upon qualified products as the standard for the quality assessment of new products; (5) development of standardized procedures using the concentrations of the discriminatory components as monitoring markers. SKI was used as an example to validate the strategy.

Materials and Reagents
Acetonitrile (Honeywell, UV, NJ, USA) and glacial acetic acid (Tedia, USA) used in the mobile phase were both of HPLC grade. High-purity deionised water was obtained with a Milli-Q purification system (Millipore, Bedford, MA, USA). Six reference compounds, namely gallic acid, propanoid acid, protocatechualdehyde, hydroxysafflor yellow A, salvianolic acid D, and salvianolic acid B, were purchased from Shanghai Oriental Pharmaceutical Science and Technology Co., Ltd (Shanghai, China), and their structures are shown in S1 Fig. The ninety-three batches of SKI, ninety-three batches of semi-products, thirty-three batches of intermediates, eleven batches of raw materials of RRR, twelve batches of SMRR, twenty batches of AR, and fifteen batches of CF were kindly provided by Xi'an Shiji Shengkang Pharmaceutical Industry Co., Ltd. (Xi'an, China). S1 Table gives the detailed information on all tested samples. Voucher specimens were deposited at the authors' laboratory in Shanghai Institute of Materia Medica, Chinese Academy of Sciences.

Sample preparation
Preparation of reference standard solution: An appropriate amount of each reference standard was dissolved in methanol, and the solutions were then mixed to produce the mixed standard solution containing gallic acid, propanoid acid, protocatechualdehyde, hydroxysafflor yellow A, salvianolic acid D, and salvianolic acid B. The solution was filtered through a 0.20-μm membrane.
Preparation of sample solutions: An adequate volume of SKI or semi-product was filtered through a 0.20-μm membrane. For intermediates A and B and the raw materials, 0.5 ml of sample solution was diluted to 10 ml with water and filtered through a 0.20-μm membrane.
For structural elucidation, an LTQ-Orbitrap mass spectrometer (Thermo Fisher Scientific) connected to an ultra high-performance liquid chromatography instrument (Dionex UltiMate 3000) via an ESI source was operated in negative ion mode, with a mass range from m/z 50 to 1000 and resolutions of 30000 for full scan and 7500 for MSn (n ≥ 2). The chromatographic separation conditions were the same as those in the HPLC-UV analysis. High-purity helium (He) and nitrogen (N2) were used as the collision gas and nebulizing gas, respectively. The parameters were set as follows: source voltage, 4.0 kV; sheath gas flow, 35 arb; auxiliary gas flow, 10 arb; source heater temperature, 350°C; capillary temperature, 325°C; collision energy, 35%; tube lens, 90.0 V. The LC eluate was introduced into the mass spectrometer via a splitter at a ratio of 1:2. Data were post-processed using the QualBrowser part of the Thermo Scientific Xcalibur 2.2 software.

Method validation
Method validation was performed in terms of precision, repeatability, and stability. The results were assessed by similarity, which was calculated with the "Similarity Evaluation System for Chromatographic Fingerprint of Traditional Chinese Medicine, Version 2004 A" software (abbreviated as the similarity software in the following text). In brief, the test solution was injected six consecutive times to evaluate precision; six test solutions were prepared from the same SKI sample using the method described in Sample preparation to assess repeatability; and a freshly prepared test solution was analyzed at 0, 4, 8, 16, and 24 h to evaluate the stability of the sample solution at room temperature.

Quality assessment by similarity and chemometrics
The HPLC-UV fingerprints (*.cdf files) were imported into the similarity software for similarity calculation.
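The algorithm used inside the similarity software is not described here. As a rough illustration only, the sketch below assumes that similarity is the cosine of the angle between two peak-area vectors that have already been aligned peak by peak, which is a common choice for chromatographic fingerprints. The function name and all peak areas are hypothetical; only the 0.90 acceptance limit comes from the text.

```python
import math

def cosine_similarity(sample, reference):
    """Cosine of the angle between two aligned peak-area vectors."""
    if len(sample) != len(reference):
        raise ValueError("fingerprints must be aligned to the same peaks")
    dot = sum(s * r for s, r in zip(sample, reference))
    norm_s = math.sqrt(sum(s * s for s in sample))
    norm_r = math.sqrt(sum(r * r for r in reference))
    if norm_s == 0 or norm_r == 0:
        raise ValueError("fingerprint has no signal")
    return dot / (norm_s * norm_r)

# Hypothetical peak areas for illustration only (not data from the study).
reference = [120.0, 45.0, 300.0, 80.0, 15.0]
batch_ok = [118.0, 47.0, 310.0, 76.0, 14.0]
batch_off = [400.0, 45.0, 120.0, 80.0, 90.0]

THRESHOLD = 0.90  # acceptance limit quoted in the text

for name, batch in [("batch_ok", batch_ok), ("batch_off", batch_off)]:
    sim = cosine_similarity(batch, reference)
    status = "qualified" if sim >= THRESHOLD else "unqualified"
    print(name, round(sim, 3), status)
```

Because the cosine is insensitive to overall scaling, a batch that is uniformly more concentrated still scores close to 1; this is one reason the absolute peak areas of marker components are checked separately.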
PCA and PLS-DA were applied to the statistical analysis of SKIs based on the HPLC-UV fingerprint data using the commercially available software SIMCA-P+ (Version 13.0, Umetrics, Umeå, Sweden). The retention times (tR, min) and peak areas were employed as the variables and observation IDs. The data from the ninety-three batches of SKIs were mean-centered prior to multivariate statistical analysis. PCA was used to observe the general clustering and discrepancies of the SKIs holistically and to find the characteristic components causing the differences, while PLS-DA was used to amplify the variations in a supervised manner and to confirm the discriminatory components based upon the Variable Influence on Projection (VIP) values [4].
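To make the mean-centering and PCA step concrete, the following is a minimal sketch in plain Python. It is not the SIMCA-P+ implementation: it assumes a small batches-by-peaks table of peak areas, extracts components by power iteration with deflation, and ranks peaks by their distance from the origin of the two-component loading plot. All function names, peak labels, and numbers are hypothetical.

```python
import math

def mean_center(rows):
    """Subtract each column mean (rows = batches, columns = peaks)."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[x - m for x, m in zip(row, means)] for row in rows]

def leading_component(X, iters=500):
    """Leading loading vector of centered X by power iteration on X^T X."""
    p = len(X[0])
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        t = [sum(x * wj for x, wj in zip(row, w)) for row in X]
        w_new = [sum(ti * row[j] for ti, row in zip(t, X)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w_new))
        if norm == 0.0:
            break
        w = [v / norm for v in w_new]
    t = [sum(x * wj for x, wj in zip(row, w)) for row in X]
    return w, t

def pca(rows, n_components=2):
    """Return loadings, scores, and explained-variance ratios."""
    X = mean_center(rows)
    total = sum(x * x for row in X for x in row)
    loadings, scores, ratios = [], [], []
    for _ in range(n_components):
        w, t = leading_component(X)
        loadings.append(w)
        scores.append(t)
        ratios.append(sum(v * v for v in t) / total)
        # Deflate: remove the variation explained by this component.
        X = [[x - ti * wj for x, wj in zip(row, w)] for row, ti in zip(X, t)]
    return loadings, scores, ratios

# Hypothetical peak areas: six batches (rows) by four peaks (columns).
areas = [
    [10.0, 5.0, 2.0, 7.0],
    [11.0, 5.2, 2.1, 7.1],
    [10.5, 4.9, 1.9, 6.9],
    [4.0, 5.1, 6.0, 7.0],
    [3.5, 5.0, 6.2, 7.2],
    [4.2, 4.8, 5.9, 6.8],
]
peaks = ["peak A", "peak B", "peak C", "peak D"]

loadings, scores, ratios = pca(areas, n_components=2)
# Distance of each peak from the origin of the (PC1, PC2) loading plot.
distance = {name: math.hypot(loadings[0][j], loadings[1][j])
            for j, name in enumerate(peaks)}
for name in sorted(distance, key=distance.get, reverse=True):
    print(name, round(distance[name], 3))
print("explained variance ratios:", [round(r, 3) for r in ratios])
```

With only mean-centering and no scaling, peaks with large absolute areas dominate the model, so the choice of scaling affects which components appear discriminatory.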

Establishment of the fingerprint and characterization of chemical constituents
In the study, the chromatographic conditions, including the column, gradient program, column temperature, and flow rate, were optimized to obtain a suitable chromatographic separation of as many peaks as possible in a short analysis time. The final conditions are summarized in the Instrumentation section. Since the chemical constituents of the crude drugs in SKI show good UV absorption, the fingerprint was established based on HPLC-UV analysis with a monitoring wavelength of 280 nm. Ninety-three batches of SKIs were analyzed, and a typical chromatogram is shown in Fig. 2. The method validation results (S2 Table), with similarity values over 0.99, indicated that the method was reliable and that the sample solution was stable within 24 h.
Structural identification was performed using ultra high-performance liquid chromatography coupled with LTQ-Orbitrap mass spectrometry (UHPLC-LTQ-Orbitrap MS). The representative base peak intensity (BPI) chromatogram of SKI is shown in S2 Fig. A total of fifty-six compounds were characterized (Table 1). Six compounds, namely gallic acid (4), propanoid acid (6), protocatechualdehyde (8), hydroxysafflor yellow A (10), salvianolic acid D (29), and salvianolic acid B (42), were unambiguously identified by comparing their retention times and MS data with those of reference standards. The unknown peaks were tentatively characterized by comparing the accurate mass and the diagnostic ions in the MSn experiments with those reported in the literature [18][19][20][21][22][23][24].
In addition, the fingerprints of the four raw materials and of intermediates A and B were established by the same method. The fingerprints are outlined in Fig. 2. By comparing the retention time and MS data of each peak in the SKI sample with those in the raw materials, each peak was assigned to its source, and the results are shown in Table 1. Unexpectedly, peaks 17 and 49 were not detected in any raw material but were detected in intermediate A, so they were likely produced during the coextraction of RRR and SMRR.

Analysis of SKIs by similarity and chemometrics
The batch-to-batch consistency was initially evaluated by a routine similarity analysis using the similarity software, which is easy to operate. The similarity values are listed in Table 2. Most of the values were over 0.90, demonstrating the chemical uniformity of these SKI products. However, eleven batches, S11, S15-S17, and S86-S92, had similarity values lower than 0.90 and were considered unqualified products.
PCA was performed to visualize the holistic distribution of the SKI products and to further evaluate the quality consistency of the samples. A two-component PCA model was obtained that cumulatively accounted for 68.1% of the variation; the first principal component explained 53.1% of the total variance and the second 15.0%. Visual inspection of Fig. 3A shows that the samples are mainly separated into two groups. Forty-three batches were tightly clustered in Circle I, including S32-S34, S47-S85, and S93, and thirty-nine batches in Circle II, including S1-S46 except S11, S15-S17, and S32-S34. The remaining samples, S11 and S15-S17 in Circle III and S86-S92 in Circle IV, diverged significantly as outliers. Consistently, these were exactly the samples with poor similarity (Table 2). The loading plot (Fig. 3B) displays the contribution of each variable to the discrimination. Theoretically, the further a variable lies from the origin of the X-axis and the Y-axis, the more it contributes to the clustering [12,23]. Based on that rule, five major representative discriminatory variables were preliminarily identified, corresponding to the peaks at retention times of 61.0, 7.95, 24.3, 53.1, and 41.0 min.
Based upon the results from PCA, PLS-DA was conducted. The fitting (R²X) and predictive (Q²X) values for the model were 0.681 and 0.522, respectively. The biplot (Fig. 3C) spreads the variables out more widely and, by displaying both samples and variables in one plot, provides a better understanding of their relationships [25,26]. In general, components near the periphery are present at high levels in the samples in close proximity and at low levels in the samples in the opposite quadrant. Thus, the positive or negative correlation between components and samples can be obtained. The biplot shows that the components at retention times of 61.0, 53.1, and 38.1 min are positively correlated with the samples in group II in the (+, -) quadrant and negatively correlated with those in groups IV and I, mainly in the (-, +) quadrant. In contrast, the peaks at 24.3 and 41.0 min are positively correlated with the samples in groups I and IV but negatively correlated with those in group II. These components are evidently the most significant markers separating the samples into the two major groups, I and II. The peak at 7.95 min, positively correlated with the outliers in group III in the (+, +) quadrant, is the most discriminatory component for those samples.
The VIP values from PLS-DA reflect the importance of the variables in the model [4]. Variables with a larger VIP are more relevant for sample classification. The VIP plot (Fig. 3D), which displays the variables with VIP values greater than 0.5, was used to confirm the most relevant variables. Emodic acid (61.0 min), gallic acid (7.95 min), and eight other components are the most relevant. In the present study, the first five variables were selected as representatives of the discriminatory components to illustrate the retracing strategy.
Among these constituents, gallic acid, p-coumaroyl-O-galloyl-glucose, and chrysophanol-O-glucoside were from RRR, and caffeic acid was from SMRR. Emodic acid, the most significant factor, originated from the coextraction of RRR and SMRR.
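The VIP formula is not given in the text. For reference, a commonly used definition for a PLS model with $A$ components and $p$ variables is shown below; the symbols are introduced here for illustration and are not taken from the study, and the exact variant implemented in SIMCA-P+ may differ.

```latex
\mathrm{VIP}_j \;=\; \sqrt{\,p \,\frac{\displaystyle\sum_{a=1}^{A} \mathrm{SSY}_a \left(\frac{w_{ja}}{\lVert \mathbf{w}_a \rVert}\right)^{2}}{\displaystyle\sum_{a=1}^{A} \mathrm{SSY}_a}\,}
```

Here $w_{ja}$ is the weight of variable $j$ in component $a$, and $\mathrm{SSY}_a$ is the sum of squares of the class response explained by component $a$. Under this definition the squared VIP values average to 1 over all variables, so a value above 1 indicates a variable with above-average influence.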

Retracing the origins of the discriminatory components
Based on the results from PCA and PLS-DA, five discriminatory components were selected as representative targets to retrace their origins and to find the causes of the outlier samples. The preparation of the injection involves coextraction, mixing, and sterilization, and the inconsistent quality of SKI products may arise from any condition in any of these procedures. By comparing the distribution of the targets among the SKI products, semi-products, intermediates, and raw materials, the preparation procedure that suffered from unknown variations could be identified.
Emodic acid. Fig. 4A plots the content trends of emodic acid in all tested samples. For SKIs (blue line), there was an obvious tendency: the contents of emodic acid in S1-S46, except S31-S33, were relatively high but varied significantly, while those in S47-S93 were steadily low. Consistently, similar distribution patterns appeared in the ninety-three batches of semi-products (red line) and the thirty-three batches of intermediate A (green line), from which it could be inferred that the variations were not produced in the sterilizing or mixing procedures. As shown in Table 1, emodic acid was not detected in RRR (purple line) but was detected in intermediate A. It was thus deduced that the significant variations likely originated from the coextraction of RRR and SMRR. A study reported that emodic acid is a metabolite of emodin formed via an oxidation reaction [27]; hence the wide distribution of phenolic acids in SMRR probably led to this reaction during the heating extraction procedure. Consequently, the extraction temperature and time should be strictly controlled in the preparation of intermediate A to obtain relatively stable products.
Gallic acid. Analysis of the biplot (Fig. 3C) showed that SKIs S11 and S15-S17 contained high levels of gallic acid. According to the preparation process, a retracing approach was conducted to explore the key step causing the variation (Fig. 4B). Gallic acid was found to be at a stable level in the corresponding semi-products S11 and S15-S17 (S1 Table), intermediates S4 and S6 (S1 Table), and the raw material of RRR S1 (S1 Table). In other words, this compound was a discriminatory variable in the SKI products only. Therefore, it was likely the sterilizing process that caused the content variation in these SKI samples. It has been reported that stilbenes esterified with gallic acid on the glucosyl residue and glucose gallates are widely present in RRR [18]; these might degrade to gallic acid during sterilization. This suggests that the sterilizing temperature, time, and position in the container should be kept as consistent as possible to ensure stable products.
Caffeic acid. Fig. 4C shows the variation of caffeic acid in the tested samples. SKIs S86-S92 had a higher content of caffeic acid, and the content was also high in the corresponding semi-products S86-S92 and intermediates S31-S32. However, in the crude drug of SMRR (S5, S9, and S10) the concentration varied only slightly. Consequently, it was concluded that the coextraction of RRR and SMRR led to the remarkable increase of caffeic acid in these intermediates. According to the literature, SMRR contains many phenolic acids formed by the condensation of danshensu and caffeic acid; they are unstable in solution and tend to degrade during heating extraction, which is severely affected by extraction temperature, time, and pH [28].
Chrysophanol-O-glucoside. In contrast to the content distribution of caffeic acid among the ninety-three batches of SKIs, the concentration of chrysophanol-O-glucoside was low in S86-S92. The concentrations were also low in the corresponding semi-products, intermediate A, and the raw material of RRR (Fig. 4D). This suggests that the intensity variation came from the crude drug of RRR. RRR, as included in the Chinese Pharmacopoeia 2010, is derived from the roots and rhizomes of Rheum officinale Baill., R. palmatum L., and R. tanguticum Maxim. ex Balf., all of which belong to Sect. Palmata. Aside from Sect. Palmata, unofficial species from Sect. Rheum, including R. franzenbachii Munt., R. hotaoense C.Y. Cheng et C.T. Kao, and R. emodi Wall., are also used as rhubarb drugs in practice. A study found that the contents of the major constituents differ among these species [18], which would result in very different bioactivities. It is therefore highly recommended to define the origins of the crude drugs.
p-Coumaroyl-O-galloyl-glucose. Compared with the other discriminatory components, p-coumaroyl-O-galloyl-glucose contributed less to the classification. Its content was slightly higher in SKIs S86-S92. Fig. 4E shows that this compound was also at a relatively high level in the corresponding semi-products, intermediates, and crude drugs; hence, the variation presumably came from the raw material of RRR.
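The retracing logic used for the five components can be summarized as finding the earliest stage of the process at which an abnormal level appears. The sketch below is an illustrative simplification, not the authors' software: the stage names, the step labels, and the True/False encoding are assumptions, while the observations for each component paraphrase the findings reported above.

```python
# Stages in process order, each paired with the step that produces it.
STAGES = ["raw material", "intermediate", "semi-product", "final product"]
STEP_BEFORE = {
    "raw material": "raw material sourcing",
    "intermediate": "coextraction",
    "semi-product": "mixing",
    "final product": "sterilization",
}

def retrace(deviates):
    """Return the step at which an abnormal level first appears.

    `deviates` maps each stage name to True if the component shows an
    abnormal level at that stage for the batches in question.
    """
    for stage in STAGES:
        if deviates[stage]:
            return STEP_BEFORE[stage]
    return None  # no abnormal level at any stage

def pattern(raw, intermediate, semi, final):
    """Helper that builds the stage-to-flag mapping in process order."""
    return dict(zip(STAGES, [raw, intermediate, semi, final]))

# Qualitative observations paraphrased from the text (True = abnormal).
observations = {
    "emodic acid": pattern(False, True, True, True),
    "gallic acid": pattern(False, False, False, True),
    "caffeic acid": pattern(False, True, True, True),
    "chrysophanol-O-glucoside": pattern(True, True, True, True),
    "p-coumaroyl-O-galloyl-glucose": pattern(True, True, True, True),
}

for component, flags in observations.items():
    print(component, "->", retrace(flags))
```

In practice the True/False flags would come from comparing measured contents with an acceptance range at each stage, and a deviation that appears at the mixing stage would need a further check of both intermediates.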

Establishment of SKI reference fingerprint
In this research, a reference fingerprint was established. The forty-three batches of SKI products in group I in Fig. 3A were considered to be of stable and controllable quality. The reference fingerprint was constructed as the average of the fingerprints of those samples using the similarity software and was subsequently applied to the quality evaluation of new products. The peak areas of the discriminatory components were limited to a narrow range to help find the possible reasons for unqualified products. In addition, it is suggested that the preparation procedures be standardized by taking the discriminatory components as markers to ensure stable and consistent quality across batches of raw materials, intermediates, semi-products, and final products.

Application to the guidance of production
This strategy was subsequently applied to the guidance of production. The fingerprints of ten new batches of SKI products were established by the same method, and their similarity to the reference fingerprint was calculated. The results indicated that one sample was unqualified, with a similarity lower than 0.90. Comparison of the peak areas of the five discriminatory components with those in the reference fingerprint showed that the deviation was due to emodic acid. According to the retracing strategy, the problem arose in the coextraction procedure used to prepare intermediate A. Therefore, the conditions of that step should be checked and adjusted.
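The check described above can be sketched as follows. This is a minimal illustration under stated assumptions: the reference fingerprint is taken as the per-peak mean of the qualified batches, and the "narrow range" for each marker is assumed to be the mean plus or minus three standard deviations, a limit the text does not specify. Function names, marker positions, and peak areas are hypothetical.

```python
from statistics import mean, stdev

def reference_fingerprint(batches):
    """Per-peak mean of aligned peak areas from qualified batches."""
    return [mean(col) for col in zip(*batches)]

def marker_limits(batches, marker_index, k=3.0):
    """Acceptance range per marker, assumed here to be mean +/- k*SD."""
    limits = {}
    for name, j in marker_index.items():
        areas = [batch[j] for batch in batches]
        m, s = mean(areas), stdev(areas)
        limits[name] = (m - k * s, m + k * s)
    return limits

def out_of_range(batch, limits, marker_index):
    """Names of markers whose peak area falls outside its range."""
    return [name for name, j in marker_index.items()
            if not limits[name][0] <= batch[j] <= limits[name][1]]

# Hypothetical aligned peak areas for four qualified batches.
qualified = [
    [20.0, 50.0, 100.0],
    [21.0, 52.0, 98.0],
    [19.0, 49.0, 101.0],
    [20.0, 51.0, 99.0],
]
# Hypothetical column positions of two marker peaks.
markers = {"emodic acid": 0, "gallic acid": 1}

reference = reference_fingerprint(qualified)
limits = marker_limits(qualified, markers)

new_batch = [35.0, 50.0, 100.0]  # hypothetical new product
flagged = out_of_range(new_batch, limits, markers)
print("reference:", reference)
print("markers outside range:", flagged)
```

A flagged marker then points to the preparation step associated with it, for example the coextraction step in the case of emodic acid.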

Conclusion
This study proposed a retracing strategy that combines HPLC-based fingerprinting with chemometric analysis to monitor the entire preparation procedure of Chinese patent medicines. SKI was taken as a model to describe the process. Similarity calculation, PCA, and PLS-DA based upon a large number of SKI products helped to reveal the holistic distribution of the samples and identified five major discriminatory components. The retracing strategy was carried out by analyzing the distribution patterns of the five discriminatory components successively in the corresponding semi-products, intermediates, and raw materials. A reference fingerprint was established and applied to the quality evaluation of new products based on similarity calculation. The concentrations of the discriminatory components were employed as diagnostic markers to identify the reasons for unqualified products. The effectiveness and practicality of the strategy were validated by applying it to the guidance of production. The strategy is expected to be widely applicable to the quality control of CPMs.