Selection of Reference Genes for Quantitative Real-Time PCR Normalization in Panax ginseng at Different Stages of Growth and in Different Organs

Quantitative real-time reverse transcription PCR (qRT-PCR) has become a widely used method for gene expression analysis; however, its data interpretation largely depends on the stability of reference genes. The transcriptomics of Panax ginseng, one of the most popular and traditional ingredients used in Chinese medicines, is increasingly being studied. Furthermore, it is vital to establish a series of reliable reference genes when qRT-PCR is used to assess the gene expression profile of ginseng. In this study, we screened out candidate reference genes for ginseng using gene expression data generated by a high-throughput sequencing platform. Based on the statistical tests, 20 reference genes (10 traditional housekeeping genes and 10 novel genes) were selected. These genes were tested for the normalization of expression levels in five growth stages and three distinct plant organs of ginseng by qPCR. These genes were subsequently ranked and compared according to the stability of their expressions using geNorm, NormFinder, and BestKeeper computational programs. Although the best reference genes were found to vary across different samples, CYP and EF-1α were the most stable genes amongst all samples. GAPDH/30S RPS20, CYP/60S RPL13 and CYP/QCR were the optimum pair of reference genes in the roots, stems, and leaves. CYP/60S RPL13, CYP/eIF-5A, aTUB/V-ATP, eIF-5A/SAR1, and aTUB/pol IIa were the most stably expressed combinations in each of the five developmental stages. Our study serves as a foundation for developing an accurate method of qRT-PCR and will benefit future studies on gene expression profiles of Panax Ginseng.


Introduction
Ginseng (Panax ginseng C.A. Meyer) is a perennial herb and is well-known for its adaptogenic and restorative properties. It has been widely used in traditional Chinese medicine and Western herbal medicine [1,2]. Ginseng root, the most commonly used part of the plant, contains ginsenosides that are major bioactive constituents with complex and multiple pharmacological effects [3,4]. Ginseng leaf-stem extract also contains numerous important bioactive components [5,6]. A recent report demonstrated that American ginseng leaf contains similar pharmacologically active ingredients in higher quantity than found in ginseng root [7]. Research has shown that ginseng leaf-stem may as well be a valuable source of ginsenosides as ginseng root [8].
From germination to withering, the stages of growth of ginseng can be generally classified into the leaf-expansion period (LP), the flowering stage (FS), the green fruit stage (GFS), the red fruit stage (RFS), the root growing after fruit stage (RGS), and the withering stage [9]. In recent years, the research focus has expanded considerably towards elucidating the gene expression of ginseng at different developmental stages. Various researchers have highlighted the genetic aspects of ginseng, including the marker gene identification or authentication, genes that confer resistance to environmental and biological stresses, the regulatory factors of its growth and development, and key enzymes involved in the ginsenoside biosynthetic pathway [7,8,[10][11][12][13][14][15].
qRT-PCR has been widely used as a powerful technique to quantify the expression levels of transcripts. The accuracy of qRT-PCR largely depends on the stability of the reference gene(s) applied to data normalization [16]. A series of presumably stable expressed genes have been used as internal references. Some of the best known and most frequently used reference transcripts, often referred to as housekeeping genes [17], include actin (ACT), tubulin (TUB), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), polyubiquitin (UBQ), and translational initiation factor (eIF). These have been extensively used as reference genes in different organisms because of their stable and uniform expression patterns [18]. However, these references have shown a significant variance when tested across species and under a broad range of experimental tests [19]. Failure to use suitable reference genes may deflect gene expression profiles and lead to misguiding results [16]. So far, there have been no reports on the use of such genes in ginseng. Therefore, it is essential to determine appropriate reference genes in order to undertake genetic engineering studies in ginseng. Our laboratory has constructed 15 ginseng transcriptome databases (including samples of three organs in five growth stages) using high-throughput sequencing technology. These databases provide more than 73,000 genetic data containing the gene sequences, gene expression levels, gene annotations, and other related information. In the present study, by analyzing the gene annotation process, we aimed to find appropriate reference genes for ginseng. After conducting a comprehensive literature search, the gene expression levels of ten commonly used reference genes [19][20][21][22][23][24][25][26][27][28] and ten novel expression stable genes were evaluated to select the best candidate reference genes. This selection was based on the statistical tests involving RPKM values at different growth stages and in different organs. In addition, the expressions of 20 candidate genes were measured by qRT-PCR, and the expression stability of each gene was further measured using quantitative software applications, such as geNorm, NormFinder, and Best-Keeper. This study provides greater insights into the optimal control genes involving different growth stages and various organs of P. ginseng, and will significantly contribute to the development of ginseng transcriptomics.

Ethics Statement
No specific permissions were required for the locations used or activities undertaken in the present study. The samples of Panax ginseng C.A Meyer were originally collected from Fu-song County (longitude: 127.28, latitude: 42.33), Jilin province, China. No endangered or protected species were involved in the field studies.

Plant material
Five stages of P. ginseng were harvested from Fu-song County, Jilin province, China. 5-year-old ginseng plants were used for library construction. After cleaning with distilled water, the main roots, stems, and leaves were minced into small pieces, and immediately frozen in liquid nitrogen.

Total RNA samples
Total RNA was isolated using TRIzol reagent (Invitrogen) according to the manufacturer's instructions. Quality of RNA was ascertained by measuring absorbance at 260 nm using the BioSpec-nano Spectrophotometer and through 1% ethidium bromide (EtBr)-stained agarose gel electrophoresis. The total RNA integrity [29] was further tested using the 2100 Bioanalyzer (Agilent Technologies).
cDNA library construction, sequencing, assembly, and gene expression analyses The samples, processed according to the Illumina kit instructions, were prepared for the transcriptome analysis. Protocols for the cDNA library construction, sequencing, assembly, and gene expression level analysis have been previously described by Baojin Yao [30]. Based on the RPKM values, the estimated gene expression was used directly for comparing the differences in gene expressions between samples. Distinct sequences were used for the BLAST search and annotated against the NCBI nr database using an E-value cut-off of 10 25 [31].
Using the Illumina sequencing platform, we generated more than 39 million high-quality sequencing reads for each sample. After clustering via the TGICL software, more than 80,000 unigenes were produced in every database. Unigene sequences were aligned by BLASTX to four common protein databases (Nr, Swiss-Prot, KEGG, and COG; e-value ,0.00001). Simultaneously, we obtained the highest sequence similarities Unigenes along with their protein functional annotations.

Selection of candidate reference genes for normalization
On analyzing the existing databases, ten commonly used housekeeping genes were selected as endogenous control genes. Based on the calculated statistical values of the coefficient of variation (CV = SD/Mean) and the maximum fold change (MFC = Max (RPKM) /Min (RPKM) ) [32], we obtained ten novel reference genes from the 15 databases. Notes: Descriptive statistics of the candidate genes based on the coefficient of variance (CV) and the maximum fold change (MFC). In total, 10 untraditional reference genes were screened, which had the CV less than 20% and MFC less than 1.5. LP, leaf-expansion period; FS, the flower stage; GFS, the green fruit stage; RFS, the red fruit stages; RGS, the root growing after fruit stage. doi:10.1371/journal.pone.0112177.t001 In total, 20 candidate reference genes were selected, including 10 housekeeper reference genes (ACT1, GAPDH, UBQ, 18SrRNA, eIF-5A, aTUB, bTUB, CYP, F-box, and EF-1a) and 10 novel reference genes (CDP, 6-PG, 30S RPS20, 60S RPL13, V-ATP, pol IIa, ARF, QCR, SAR1, and TCTP).

Primer design and validation
Based on the sequences obtained from high-quality cDNA sequencing, primers were designed using primer 5.0 software. The specificity of the primers was confirmed by BLAST searches.
In order to examine the target specificity of primers, reverse transcription PCR was employed. With 500 ng of total RNA (each from five stages) as the template, a thermal cycling profile was conducted according to the following protocol: 30uC for 10 min, 50uC for 30 min, 95uC for 5 min, 5uC for 5 min; 30 cycles at 94uC for 30 s, 60uC for 30 s, 72uC for 1 min. The products were visualized by 2% agarose gel electrophoresis along with the DL1000 DNA marker.

Quantitative Real-Time PCR
The test of transcript variability among the fifteen samples (three organs and five stages) was carried out using qRT-PCR reactions for mRNA. These reactions were performed in triplicate using the MxPro 4.1 system assays and the One Step SYBR PrimeScript PLUS RT-PCR kit (TaKaRa, TaKaRa code: DRR096A), including minus reverse transcription (RT) controls to assess the genomic DNA and non-template controls, thereby ensuring a lack of background signal in the assay. The final volume of the RT reaction was 25 ml, which consisted of 12.5 ml 26One Step SYBR RT-PCR Buffer, 1.5 ml TaKaRa Ex Taq HS Mix, 0.5 ml PrimeScript PLUS RTase Mix, 10 mM PCR Forward Primer, 10 mM PCR Reverse Primer, 40 ng total RNA, and 6.5 ml RNase-free H 2 O. The reactions were incubated in thin-wall polypropylene 8-tube strips using MxPro 4.1. The PCR cycling conditions were as follows: 42uC for 5 min, 95uC for 10 sec, followed by 40 cycles of 95uC for 5 sec and 60uC for 30 sec. Finally, the steps, 95uC for 15 sec, 60uC for 30 sec, and 95uC for 15 sec were carried out for dissociation. Data were collected during each cycle at the 60uC extension step.

Analysis of stability of candidate reference genes
The variation among 20 reference genes was determined by cycle threshold (Ct) using the MxPro 4.1 software, following the manufacturer's instructions. Generally, the Ct value of every single reaction and the mean efficiency of each amplicon were used to calculate their relative expression levels [17]. To compare the stability of the 20 candidate reference genes, three Visual Basic Applications (VBA) for Microsoft Excel -geNorm (http:// medgen.ugent.be/,jvdesomp/genorm/), NormFinder (http:// www.mdl.dk/publicationsnormfinder.html), and BestKeeper (http://www.gene-quantification. de/bestkeeper.html) were used. The Ct values of the candidate reference genes were divided into nine sets of samples for further analysis, which included the total set (all data set), roots, stems, leaves, LP, FS, GFS, RFS, and RGS.

Screening of the candidate reference genes
In the present study, we screened ten housekeeping genes (ACT1, GAPDH, 18SrRNA, UBQ, aTUB, bTUB, CYP, eIF-5A, F-box, and EF-1a). Besides 18S rRNA, the RPKM value distribution of the remaining nine housekeeping genes was in the     1-b). Screening of the potential reference genes was based on the statistical tests (CV and MFC), which reflected the RPKM values of stably and moderately or highly expressed genes among all the databases. RPKM value expression abundance ratios are presented in Figure 1. To determine the distribution of transcript populations of 20 candidate reference genes in three vegetative organs of ginseng during the five stages of growth, the quantity of transcript for each gene was estimated as a ratio relative to the sum of the 20 transcript populations. The results clearly revealed a fluctuation in the relative magnitude of RPKM values and the ratios, thus indicating that all of the 20 genes did not exhibit stable expression patterns. A summary of the sequence information for the 20 ginseng candidate reference genes is presented in Table 2.
Validating the expression levels of candidate reference genes by qRT-PCR By reverse transcription PCR, the specificity of the primers used for candidate reference genes was verified. A single band for each gene was revealed through electrophoresis, without primer-dimers or non-specific amplification (Figure 2).
Based on SYBR Green detection, qRT-PCR analysis was employed to evaluate the stability of the expressions of the 20 candidate reference genes in different organs and different developmental stages of P. ginseng. The samples were divided into fifteen groups comprising of three organs (roots, stems, and leaves) and five developmental stages. The Ct values of the reference genes of each group were then used to compare the various degrees of expression.

Statistical data analysis
The gene expression data were analyzed by Ct value, geNorm, NormFinder, and BestKeeper applets to obtain the expression stability of 20 candidate reference genes.
With a higher gene expression, a smaller Ct value was obtained, and vice versa. Figure 3 shows a relatively broad range of Ct values for all the 20 putative reference genes. The highest Ct value was 26.40 (bTUB), while the lowest was 15.06 (18S rRNA). Ct values of the remaining genes were distributed between 19 and 24. On comparing the Ct values of the 20 candidate reference genes, the expression level of each reference gene was found to differ, with respect to the developmental stage or the organ under study. The expression patterns of the 20 reference genes displayed irregular variation; this may be attributed to change in the level of reference gene expression abundance with the cell type and the developmental stage [33]. Therefore, successful gene expression analysis under different experimental conditions in ginseng requires careful selection of reliable reference genes.
Based on the expression stability of the genes and the assumption that two ideal reference genes should not vary with each other under different test conditions [34], geNorm ranked the best out of the three data analysis applications used. geNorm computes the average pair-wise variation of a given candidate reference gene with all the other genes and assigns a score of its expression stability (M) to each gene. Stepwise exclusion of genes with the highest M values (indicating the least stable expressions) before recalculation finally reveals the two most stable candidate genes [35]. After calculating the pair-wise variation Vn/n+1, geNorm selects the optimal number of control genes. The cut-off value is usually set to a default value of 0.15 [34]. Gene expression stability and ranking of 20 candidate reference genes, as calculated by geNorm using nine sets of samples, are presented in Figure 4. Analyses of all fifteen samples revealed that the CYP and EF-1a combination showed the lowest M value (0.31), while 30S RPS20 showed the highest M value (0.89). Among the different organs, GAPDH/30S RPS20, CYP/60S RPL13, and CYP/QCR were the most stably expressed gene combinations in roots, stems, and leaves, respectively; while 18S rRNA, UBQ, and TCTP were the least stably expressed. Among the five developmental stages under study, CYP/60S RPL13, CYP/eIF-5A, aTUB/V-ATP, eIF-5A/ SAR1, and aTUB/pol IIa were the most stably expressed combination, respectively, and 30S RPS20 was the least stably expressed gene in all the five stages. Based on these observations, CYP was evidently the most stably expressed gene and may be considered as the most suitable reference gene for the analyses of gene expressions in P. ginseng. Furthermore, the addition of a third reference gene would not have significantly increased the statistical reliability of this calculation, as V2/3 = 0.033 or V3/ 4 = 0.041 (in roots) was significantly below the default cut-off value of 0.15 ( Figure 5). Although the pair-wise variation for all the samples (V2/3) was estimated as 0.145, it was still less than the limiting value. Hence, our study showed that two reference genes were sufficient to normalize gene expression for all the samples of P. ginseng.
The NormFinder algorithm uses a model-based approach to evaluate modifications amongst the reference gene expression levels [36]. Similar to the geNorm method, NormFinder imparts a score of expression stability (M) to each gene, which is negatively correlated with the stability of gene expression [37]. In addition, NormFinder can determine the estimated inter-and intra-group variances [38]. The calculated values generated by NormFinder are shown in Table 3. In the final outcome, CYP, QCR, and aTUB show the most stable expression levels for the total samples, stems, leaves, and the five developmental stages, while 30SRPS20 and TCTP were observed to be less stable. In the roots, GAPDH and V-ATP were the most stably expressed genes with values of 0.013 and 0.110, while 18S rRNA was the least stable. Nevertheless, EF-1a and eIF-5A were found to be in the forefront of the rankings. The results of NormFinder and geNorm were almost consistent.
The stability of the candidate reference gene expression was also analyzed using BestKeeper, an Excel-based tool. In this analysis, the average Ct value of every single reaction is applied to analyze the stability of each candidate reference gene [25]. Rankings of the candidate reference genes are based on their pair-wise correlation with this index value, which is indicated by the Pearson correlation coefficient (r) [35]. BestKeeper calculates the standard deviation (SD) and the coefficient of variation (CV) based on the Ct values. The most stable reference genes exhibit the lowest CV and SD (CV6SD) [39]. Because the maximum number of genes analyzed by this algorithm is 10 [40], the candidate genes that rank lower in the previous analyses are generally ruled out. The ranking of the genes revealed through BestKeeper analysis is presented in Table 4. These results were mostly consistent with those obtained Figure 4. Gene expression stability and ranking of 20 candidate reference genes as caluculated by geNorm. The stability value (M) was determined by assessing the mean pairwise variations of all genes; the least stable gene (the highest M value) was excluded, and the M value was recalculated until the most stable pair was selected. doi:10.1371/journal.pone.0112177.g004 Figure 5. Determination of the optimal number of reference genes required for effective normalization. The geNorm program calculated an NF and used the variable V to determine pairwise variation (Vn/Vn+1) between two sequential NFs (NFn and NFn+1). Additional genes are included when V exceeds the cutoff value, which is typically set at 0.15 but is not always achievable. The number of reference genes is deemed optimal when the lowest possible V value is achieved, at which point it is unnecessary to include additional genes in the normalization strategy. doi:10.1371/journal.pone.0112177.g005 Table 3. Ranking of candidate reference genes in order of their expression stability as calculated by NormFinder software. using geNorm, including the total samples, roots, stems, leaves and LP.
In summary, CYP and EF-1a were demonstrated to be the best reference genes under all the treatment conditions. In addition, GAPDH and V-ATP showed the highest CV6SD values (0.3760.08 and 0.4660.11, respectively) in the roots. However, ACT1 and QCR were the most stable reference genes in FS, and their CV6SD values were 0.2960.06 and 0.4460.11, respectively, which slightly differed between geNorm and NormFinder.

Discussion
Selection of suitable reference genes is a crucial pre-condition to a successful gene expression study based on qRT-PCR. Using inaccurate reference genes can lead to conflicting results, particularly when the variations in the rate of transcription between sample groups are small [41]. Herein, we have described a systematic analysis involving the stability of mRNA expression of candidate genes for data normalization in qPCR experiments using different developmental stages and the three vegetative organs of Panax ginseng. Investigation of 20 candidate reference genes by Ct value, geNorm, NormFinder, and BestKeeper applets led to the identification of the best reference genes for differential gene expression analyses at different developmental stages and various organs of ginseng. In qRT-PCR analysis, certain housekeeping genes (such as, ACT, UBQ, F-box) are considered stably expressed in different environmental conditions and are commonly employed as reference gene(s) [36]. The analysis data revealed certain changes in the mRNA gene expression levels in majority of the traditional housekeeping genes of ginseng under different treatment conditions; therefore, these genes could not be considered as ideal ginseng reference genes. However, a stable reference gene is essential for genetic engineering studies in ginseng. To the best of our knowledge, this is the first report on the identification and validation of suitable reference genes for qRT-PCR analysis of ginseng.
An ''ideal'' reference gene(s) should be continually transcribed in all cell types and organs. Additionally, its RNA transcription level should be relatively constant in response to the internal and external stimulations [42]. For example, during housekeeping gene selection for qRT-PCR normalization in potato, it was found that the expression of EF-1a was not influenced by cold, salt, or late blight stressors [29]. In the analysis of reference genes for Arabidopsis, EF-1a was relatively stable in different organs [43]. However, under nutrition deficiency or abiotic stress, the stability of EF-1a was poor [44]. Selected as the appropriate reference gene in cucumber, the CYP gene was the most stable gene under cold and heat stress treatments; nevertheless it was less stable in various other tissues [45]. Based on our statistical analyses using Ct value, geNorm, NormFinder, and BestKeeper applets, the mRNA expression level of CYP, a traditional housekeeping gene, was found to be the most stable in different organs and developmental stages, and was followed by EF-1a (Table 5). Furthermore, out of the 10 novel reference genes, it was interesting to note that QCR was relatively stable in all the experimental samples.
Although the results of all the three applets were reasonable, they were not found to be completely consistent. However, this variation was not surprising, since the three software applications are based on different calculation algorithms [25]. geNorm is known to be a more effective and feasible algorithm for ensuring the optimal stability of reference genes, whereas NormFinder and BestKeeper are best applied for assessing the quality of the gene rankings obtained by geNorm [26,46,47]. The results of the geNorm analysis have been satisfactorily accepted by many Table 3. Cont.  Notes: LP, leaf-expansion period; FS, the flower stage; GFS, the green fruit stage; RFS, the red fruit stages; RGS, the root growing after fruit stage. Descriptive statistics of 10 candidate genes based on the coefficient of variance (CV) and standard deviation (SD) of their Ct values were determined using the whole data set. Reference genes were identified as the most stable genes, i.e. those with the lowest coefficient of variance and standard deviation (CV% 6 SD). doi:10.1371/journal.pone.0112177.t004 Table 5. Stability ranking of 20 candidate reference genes using geNorm,Normfinder and Bestkeeper.