Selection of Reference Genes for Gene Expression Normalization in Peucedanum praeruptorum Dunn under Abiotic Stresses, Hormone Treatments and Different Tissues

Peucedanum praeruptorum Dunn is one of the main traditional Chinese medicines producing coumarins and plenty of literatures are focused on the biosynthesis of coumarins. Quantitative real-time reverse transcription PCR (qRT-PCR) is a widely used method in studying the biosynthesis pathway and the selection of reference genes plays a crucial role in accurate normalization. To facilitate biosynthesis study of coumarins, twelve candidate reference genes were selected from the transcriptome database of P. praeruptorum according to previous studies. Then, BestKeeper, geNoFrm and NormFinder were used for selecting stably expressed reference genes in different tissues and under various stress treatments. The results indicated that, among the twelve candidate reference genes, the SAND family protein (SAND), actin 2 (ACT2), ubiquitin-conjugating enzyme 9 (UBC9), protein phosphatase 2A gene (PP2A) and polypyrimidine tract-binding protein (PTBP1) were the most stable reference genes under different experimental treatments, while glyceraldehyde 3-phosphate dehydrogenase (GAPDH) and tubulin beta-6 (TUB6) were the least stable genes. In addition, the suitability of SAND, TIP41-like protein (TIP41), UBC9, ACT2, TUB6 and their combination as reference genes were confirmed by normalizing the expression of 1-aminocyclopropane-1-carboxylate oxidase (ACO) in different treatments. This work is the first survey of the stability of reference genes in P. praeruptorum and provides guidelines to obtain more accurate qRT-PCR results in P. praeruptorum and other plant species.


Introduction
Plant secondary metabolites have various biological activities and pharmacological importance to human beings. As important members of plant secondary metabolites, coumarin compounds have received continuous attention, at the same time their chemical structures and praeruptorum to investigate and choose the suitable reference genes for normalization [14]. To determine the stability of reference genes, different experimental treatments, including osmotic stress (polyethylene glycol, PEG), salt stress (NaCl), oxidative stress (H 2 O 2 ), mental stress (CuSO 4 ), hormones (methyl jasmonate (MeJA) and salicylic acid (SA)), cold (4°C) and heat (42°C) stress, and tissue pattern, were conducted to ensure that the reference genes selected in this study could be used in various experimental treatments designed for qRT-PCR [44,45]. To analysis the raw data, three kinds of Excel-based software, geNorm [24], NormFinder [23] and BestKeeper [35] were employed according to the manufacturer's procedures. The results indicated that SAND, ACT2, UBC9, PP2A and PTBP1 were the most stable reference genes, while GAPDH and TUB6 were the least stable genes. In addition, geNorm was used to determine the optimal numbers of the reference genes required for accurate normalization by pairwise variation (Vn/Vn+1) [24] and the results indicated that under most groups, selection of two reference genes could be sufficient for normalization. To confirm the suitability of selected reference genes, SAND, TIP41, UBC9, ACT2, TUB6 and their combination were used as reference genes to normalize the expression of ACO in different treatments. The results showed a good correlation with the stability rank revealed by the method of predication used in this study, which proved that the reference genes identified in this work were desirable. To the best of our knowledge, this work is the first survey of the stability of reference genes in P. praeruptorum, which would provide the basis for further research in exploring the metabolism and regulation mechanism to environment stresses, especially the secondary metabolism involved in biosynthesis of coumarin compounds [6,9,11].

Plant sample preparation and treatment
One-year-old P. praeruptorum materials were collected from the fields of Ningguo City, Anhui Province, China (longitude: 118.95E, latitude: 30.62N). The field of P. praeruptorum was a private land and the owner of the land had given permission to us for using the plant to conduct the study. Then the plants were transplanted in plastic basins containing a mixture of vermiculite, perlite, and peat moss at a ratio of 1:1:1. The plants were grown in a greenhouse with a long photoperiod (16 h light and 8 h dark) at 25°C, 40-65% relative humidity and 3000 lux of light intensity until use. For drought treatment, plants were subjected to 200 mL of 25% PEG 6000 per day for one week. In salt stress treatment, about 200 mL (600 mM) of NaCl was applied for seven days. For cold and heat shock treatments, plants were transferred to a greenhouse with the temperature of 4°C and 42°C for 48 h, respectively. For hormone treatments, MeJA (25 mM) and SA (5 mM) were imposed for 6 h according to the method described before [43]. Heavy metal stress was conducted using 500 mM copper sulfate (CuSO 4 ) for 24 h and oxidative stress was carried out with 50 mM H 2 O 2 for 24 h. All of the treatments were processed in three biological replicates. The plants without treatments were collected as control and the harvested samples were washed with MINIQ-filtered water and frozen and stored in liquid nitrogen prior to RNA isolation.

Selection and validation of candidate reference genes and primer design
According to previous studies [14,22,[38][39][40][41][42], twelve candidate genes (ACT2, CYP2, EF-1α, eIF-4α, GAPDH, NCBP20, PP2A, PTBP1, SAND, TIP41, TUB6 and UBC9) were used to identify the most stable reference genes of P. praeruptorum in different treatments and their totally information was listed in Table 1. To meet this goal, the protein sequences of the twelve Arabidopsis genes were selected from TAIR database (http://www.arabidopsis.org) to set as the templates. And then, a local BLAST was conducted using the tblastn program in Bioedit Sequence Alignment Editor. The nucleotide sequences were also downloaded to find the intron for primer design using Primer 5. The primers listed in Table 1 were optimized for the primer lengths of 22 bp, GC content of 44% to 60% and amplification lengths from 84 bp to 170 bp. To validate the reliability of the selected candidate reference genes, the relative expression profiles of ACO were measured and normalized with the most and least stable reference genes under different experimental designs. The samples without treatments were used as control. The relative expression data was calculated by the 2 -ΔΔCT method [46] and presented as relative expression level. Three biological and technical replicates were used to obtain the qPCR data and the raw data were listed in S1 Table. Total RNA, DNA extraction and cDNA synthesis According to the manufacturer's recommendations, TransZol Plant reagent (TransGen Biotech, Beijing, China) was used to isolate the total RNA, then Spectramax plus384 enzyme-labeling instrument (Molecular Devices, Sunnyvale, USA) and agarose gel electrophoresis were used to assess the quantity and quality of the RNA. To eliminate DNA contamination, the samples were treated with DNase I (Takara, Dalian, China) according to the manufacturer's protocol. For qRT-PCR experiments, cDNA was synthesized in a 20 μl reaction volume using a quantity of 1 μg total RNA (Takara, Dalian, China). Then, the plasmid DNA containing each gene was diluted 10-fold with nuclease-free water for qPCR analysis and continue diluted (10 2 ×, 10 3 ×, 10 4 ×, 10 5 ×, 10 6 × and 10 7 ×dilutions) for determining the amplification efficiency (E) and correlation coefficient (R 2 ). To achieve the aim of amplification of the genomic DNA

RT-PCR and qPCR analysis
To verify the specificity of the target genes, PCR was performed using cDNA as a template. To verify the primer design containing intron, an analogous PCR procedure was conducted using total genomic DNA as a template. And then, all PCR products were examined by 1% (w/v) agarose gel electrophoresis. To further confirm the PCR products and obtain the nucleotide sequences of the target genes, the PCR products were inserted into pMD19-T Vector (Takara, Dalian, China) for sequencing and the results are listed in S2 Table. qRT-PCR analysis was conducted using LightCycler 480 (Roche Molecular Biochemicals, Mannheim, Germany) and AceQ qPCR SYBR Green Master Mix (Vazyme, Nangjing, China). Reaction mixtures contained 10 μL AceQ qPCR SYBR Green Master Mix, 2 μL diluted cDNA, 0.4 μM primer and ddH 2 O in a total volume of 20 μL. According to the procedures of specification, the following amplification conditions were applied: 1 cycle of 95°C for 5 min, 40 cycles of 95°C for 10 sec and then 60°C for 30 sec, followed by 1 cycle of 95°C for 15 sec, 60°C for 60 sec and 95°C for 15 sec. The RNase-free water was used as a control and three technical replicates were contained in each plate.

Gene expression stability analysis
To analyze the expression stability of candidate reference genes, geNorm [24], NormFinder [23] and BestKeeper [35] were used according to the experimental design and manufacturer's procedures. For geNorm and NormFinder analysis, the raw Cp values under different experimental designs were transformed into relative quantities using the formula 2 -ΔCT (ΔCT = each corresponding Ct value-lowest Ct value and Cp is another name of Ct) and then imported to geNorm to analyze gene expression stability value (M). Similar to geNorm, NormFinder was further used to investigate the expression stability values (M) for each gene and the pairwise variation (V) of that gene with other reference genes. The reference gene with the highest M value is considered as the most unstable gene while the lowest M value indicated the most stable gene. BestKeeper analysis was based on the untransformed Cp values and using coefficient of variance (CV) and the standard deviation (SD) of the Cp to evaluate the stability of reference genes. It was also used to rank the candidates from the most to least stably expressed genes. By the combination of the three kinds of Microsoft Excel-based software, we could easily rank the expression stability of reference genes in different experimental sets.

Statistical analysis
Three biological and technical replicates were used to obtain the qPCR data. Unless the special comments, all data were presented as mean ± standard error of the mean (SEM). Statistical analyses were performed using the Student's t-test. Graphs were generated using OriginPro 8 (OriginLab Corporation, Northampton, MA, USA). Data analysis was performed using geNorm [24], NormFinder [23] and BestKeeper [35] according to the manufacturer's procedures.

Selection of candidate reference genes, amplification specification and PCR efficiency
According to previous studies [14,22,[38][39][40][41][42], the protein sequences of the twelve genes (TIP41, TUB6, SAND, ACT2, CYP2, GAPDH, BCBP20, eIF-4α, EF-1α, PP2A, UBC9 and PTBP1) of Arabidopsis were selected from the TAIR database to identify the candidate reference genes of P. praeruptorum in different treatments. Then, a local BLAST program was conducted to obtain the sequences. The gene name, description, Arabidopsis homolog locus, amplification length combined with TBLASTN score and E-value of the twelve genes were listed in Table 1.
The identity (ID), from 59% to 96%, indicated that they had a good homology to the selected protein sequences. To prove the specificity of the primers designed in this study, the PCR products were detected with agarose gel electrophoresis and melting curve analysis was also conducted. As shown in S1 and S2 Figs, a single band and peak were observed, indicating that they had a good specificity. To calculate the amplification efficiency, the standard curve of the candidate genes was conducted using a 10-fold serial dilution of plasmid DNA containing given genes. According to the slope of the standard curve (S3 Fig), the PCR efficiency (E) and the regression coefficient (R 2 ) were calculated and listed in Table 1

Expression stability of candidate reference genes
In order to further evaluate the stability of candidate reference genes under a given treatment, the selected twelve reference genes were supposed to different treatments (osmotic stress, salt stress, oxidative stress, heavy metal stress, hormone and temperature) and tissues, and then, 1188 Cp values (three biological and technical replicates) were collected for data analysis with three Excel-based programs (geNorm [24], NormFinder [23] and BestKeeper [35]).
geNorm analysis geNorm analysis uses the reference gene expression stability measurement (M) value which is calculated as the level of pairwise variation for each reference gene with all other control genes and as the standard deviation (SD) of the logarithmically transformed expression ratios to evaluate the gene expression stability and a high M value means a low stability [24]. To collect data for geNorm analysis, the plant was exposed to different treatments and the Cp values were processed via a linear scale using the ΔCp method [24]. As shown in Fig 2,  when it came to NaCl or SA, the span was enlarged to 1. In another word, although CYP2 was not a good reference gene in H 2 O 2 induced oxidative stress, it might also be used as a reference gene for its low M value. To be the most important findings, SAND seemed to be a most stable reference gene under different stresses (5 times to be the most stable genes in 9 stresses or 10 groups) and could be selected as a good reference gene in different experimental designs.

NormFinder analysis
NormFinder is an algorithm to identify the optimal normalization gene in a given experimental design. Similar to geNorm, the data from qRT-PCR run should not be used directly and needs to be transformed [23]. After data collection and analysis, the results of the gene stability ranking were shown in Table 2. As it was revealed, the rank of stability values gradually increased, which represented that the stability gradually decreased from top to bottom of the table.
Hence, it could easily be seen that, PTBP1, UBC9, ACT2 and SAND ranked the most stable reference genes in the treatment of PEG, CuSO 4 , H 2 O 2 (hot) and NaCl (MeJA, SA, cold), respectively. Among the most stable reference genes, SAND had a lowest value which may be considered as the most important reference genes. More interestingly, SAND ranked nearly on the top of all the treatments (6 out of 10 groups) with which similar to the outcomes of geNorm analysis (Fig 2). However, there are also slightly differences between the results of geNorm and NormFinder analysis. For instance, PTBP1 and ACT2 were considered as the most stable reference genes by geNorm (Fig 2), while they ranked second and forth by NormFinder (Table 2), respectively. Hence, another analysis method needed to be involved.

BestKeeper analysis
BestKeeper is an Excel-based tool using pairwise correlations to determine the stability of housekeeping genes, differentially regulated target genes and sample integrity [35]. The coefficient of variance (CV) and the standard deviation (SD) of the candidate reference genes were used to evaluate the stability of reference genes in each experimental design and the gene with the lowest CV and SD was considered to be the most stable reference gene [47]. It differs from the geNorm and NormFinder analysis and could use the raw data of Cp values to analyze. Similar to the results of NormFinder analysis, the CV ± SD rank of the candidate genes gradually increased from top to bottom of the table, which represented that the stability gradually decreased. For instance, CYP2 had a CV ± SD value of 1.83 ± 0.53 and ranked as the most stable genes in PEG induced osmotic stress, while, GAPDH was listed as the least stable gene for it had a CV ± SD value of 7.40 ± 1.51 (Table 3). Owing to the fact that SD>1 was considered as inconsistent and such values should be excluded [41], none of the reference genes could be used in the group of 'total' for the least SD = 1.5. Fortunately, in another 9 groups or experimental designs, nearly all SD values were below 1.5 except the most unstable one. Some reference genes, namely CYP2, ACT2 and TIP41, had a tendency to be the most stable genes for they were listed on top 3 of the ranks. On the contrary, TUB6 and EF-1α seemed not to be good reference genes.

Optimal numbers of reference genes for accurate normalization
Apart from using average pairwise expression ratios (M) to evaluate the gene expression stability, geNorm can also determine the optimal numbers of reference genes for normalization by calculating the pairwise variation (Vn/Vn+1) between the normalization factors (NF) in all samples of the different experimental sets using 0.15 as the proposed cut-off value [24]. According to this principle, the Vn/Vn+1 value was calculated and listed in Fig 3. As indicated, Selection of Reference Genes in Peucedanum praeruptorum the two most stable reference genes were sufficient for reliable normalization under all treatments and there was no need to add a new reference gene in the treatment. However, when the experimental design related to different tissues, choosing three reference genes were necessary since V2/3>0.15. The same settlements were also suited for the total samples. Despite the fact that adding a reference gene might make it credible in qPCR analysis, the proposed 0.15 value must not be taken as a too strict cut off in most time, simply using the two best reference genes were reliable enough for normalization [24], and the results of this study also proved this idea.

Reference gene validation
To evaluate the reliability of the selected reference genes, the relative expression level of ACO was calculated by the selected reference genes. As depicted in Fig 4A, an enhanced expression level of ACO was observed when normalized with the most stable reference gene, SAND. While, when it came to TUB6, one of the least reference genes, no obvious change was observed. To further evaluate the reliability of the selected reference genes, another stimulus was imposed and the three most stable reference genes were used to analyze the expression level of ACO at the same time. The results displayed that the expression level of ACO was enhanced at a same level (no significant difference) and all the three stable reference genes were reliable for normalization ( Fig 4B). However, a significant difference (P<0.001) was observed when using UBC9, one of the most unstable reference genes. According to the results of geNorm, the optimal numbers of reference genes used for normalization was also investigated. The results showed that, though UBC9 was not a good reference gene, the outcomes for normalization became advisable when it was combined with other stable genes (Fig 4C).

Discussion
Considering the high sensitivity and specificity, qPCR is now commonly used in many laboratories for high-throughput analysis of gene transcript level. And, the utilization of suitable reference genes is necessary to ensure the reliability and accuracy of the data. With the awareness of the importance of reference genes in normalization and the deviation aroused by selection of the unstable reference genes, numerous studies have been conducted to investigate the stability of reference genes under different stresses or experimental designs [22,38,39]. Hence, in this study, the expression stability of twelve candidate reference genes in P. praeruptorum was systematically analyzed by geNorm, NormFinder, and BestKeeper under the treatment of osmotic stress (PEG), salt stress (NaCl), oxidative stress (H 2 O 2 ), mental stress (CuSO 4 ), hormones (MeJA and SA), cold (4°C) and hot (42°C) stress, and different tissue types. The results indicated that different reference genes needed to be selected under the different stresses since they shared different stabilities. At the first step of the study, the twelve reference genes were cloned from the cDNA template and then inserted into T-Vector for sequencing. The correctness of the genes was confirmed by multiple sequence alignment with our transcriptome data (S3 Table). PCR was also conducted with the genomic DNA as a template and the products were inserted into T-Vector for sequencing, which could ensure the rationality of the primer design. As shown in S1 Fig, the signal band indicated the specificity of the primers while the difference of the band displayed the length of products with different templates. At the same time, the melting curve analysis was conducted to confirm the specificity of the primer pairs (S2 Fig). Moreover, the amplification efficiency of the selected candidate genes was calculated based on the slope of the standard curve. The R 2 >0.99 and E-value ranged from 92.34% to 109.23% (Table 1 and S3 Fig), indicating that the curves showed a good linear relationship and the PCR conditions were acceptable. As an important part of reference gene selection, the expression level of the selected genes was also investigated and the mean Cp values were listed in Fig 1. It can be easily seen that, the mean average expression level ranged from 19.73 to 27.74, which was consistent with the previous studies [18,41]. Based on the fact that the moderate expression level (a Cp value of 15 to 30, for instance) could give an accurate normalization [48], the genes selected in this study were sufficient for experimental needs. According to the fact that, low Cp values corresponded to high expression levels, some candidate genes selected in this study were abundantly distributed in P. praeruptorum. For instance, the GAPDH has a mean Cp value about 17 in P. praeruptorum, while the same value was up to 27 in carrot [40]. Considering that a narrow distribution range tends to represent the low variability, the variation of Cp values indicates that SAND is a best reference gene and EF-1α is the least one. The results were somehow consistent with the outcomes calculated by geNorm and NorFinder (Fig 2 and Table 2). Hence, considering the difference in the stability and the expression level of the candidate reference genes, the stability analysis and the expression analysis need to be combined.
In an ongoing effort to be more accurate in analyzing the stability of the candidate genes, three Excell-based programs were used according to the reports previously [23,24,35,49]. Owing to the fact that different software had its own method to rank the stability of the candidate genes and there might be some extent of disunity in the results, it was necessary to choose at least two methods to analyze the data. In addition, considering the reference genes had different stability under different treatments, osmotic stress, salt stress, oxidative stress, mental stress, hormones, temperature and different tissue types was imposed. The treatments conducted in the study nearly included all the treatments in similar studies and would present a systematically assess in gene stability [22,30,40,50].
With the help of the three analysis methods, the stability of the candidate genes was ranked in their own way of calculation. According to the results of geNorm analysis, SAND and ACT2 were the two most stable reference genes in total samples. The outcomes were consistent with the results of NormFinder but against the outcomes of BestKeeper, in which eIF-4α and TIP41 were the most stable reference genes with a lower CV. The same event also appeared in the treatment of PEG, CuSO 4 , H 2 O 2 , MeJA, SA and temperature stress or different tissue patterns. The results accounted for the fact that, geNorm and NormFinder analysis used the same way of calculation while BestKeeper using CV ± SD to rank the stability. This phenomenon was also reported by Zhuang and Tian in their studies [22,40]. However, it tended to be a good consistency when the five most stable reference genes were compared. For instance, under the treatment of PEG induced osmotic stress, the rank orders in geNorm, NormFinder and Best-Keeper analysis were PTBP1>PP2A>NCBP20>CYP2>UBC9, PTBP1>PP2A>NCBP20>CYP2>UBC9 and CYP2>ACT2>PP2A>NCBP20>SAND, respectively. In fact, there were no obvious differences in the order of the top five most stable reference genes. It also can be seen in Fig 4B, when normalized with TIP41, ACT2, and SAND, there were no significance differences in the expression level of ACO. Hence, it is sufficient to predict the stability of the reference genes by combination of the three kinds of software, which is a good strategy for the selection of reference genes for normalization [51,52]. For example, when merging the results with the three kinds of software, SAND, ACT2, UBC9 PP2A and PTBP1 were the most stable reference genes identified under different treatments and they could be easily found on the top list of Fig 2, Tables 2 and 3. One of them could be chosen as the best reference gene in different stress experiments and the results were consistent with the previous studies [53][54][55]. However, the candidate genes with low stability could also be used for normalization. For instance, although GAPDH nearly ranked in the last of the candidate genes, it could also be used in MeJA induced stress experiment since it had a low CV value and the highest expression abundance. There were also numerous studies indicating that, GAPDH was among the most stable genes and usually used for measuring gene expression [34,[56][57][58][59]. Hence, to be mentioned, stress and expression abundance shared equal importance in choosing a suitable reference gene.
The results above gave us enough information in reference gene selection under different treatments. A suitable reference gene could be chosen in qPCR analysis according to different experimental designs or treatments. However, the question that how many reference genes needed to be used in a specific set aroused. To resolve this problem, geNorm was employed according to the handbook which known as the 'pairwise variation (V)' [24]. They also recommended a V score of 0.15 as an ideal cut-off value, below which the inclusion of an additional reference gene was not required [24]. Based on the guidance, the optimal numbers of reference genes were calculated and listed in Fig 3. In our experiments, 8 out of 10 groups had a V score of below 0.15, indicating that there was no need to add an additional reference gene in those 8 groups. The results were also proved in Fig 4B where there were no visible differences between two or three reference genes for normalization. The same results were also proved by the study of Gimeno et al, in which different combination of reference genes was used for normalization of the target genes [39]. However, when the V score exceeded 0.15, it was desirable to add one more reference gene. As shown in Fig 4C, when UBC9, one of least stability reference genes, combined with other stable genes, the outcomes for normalization seemed to be credible. It also indicated that the proposed 0.15 value must not be taken as a too strict cut-off [24]. This was indeed true for several reports using a higher V values in some species according to the research purpose [48,60].
Conclusions qRT-PCR is the most commonly used method for gene expression analysis and a suitable reference gene is necessary for normalization. In the present study, the stability of twelve candidate reference genes in P. praeruptorum was investigated to select the most stable reference gene under different treatments. The stability analysis of gene expression by geNorm, NormFinder and BestKepper revealed that SAND, ACT2, UBC9, PP2A and PTBP1 were the most stable reference genes which could be used for normalization. However, GAPDH and TUB6 were the least stable genes. The optimal numbers of reference genes for normalization were also calculated by geNorm using the pairwise variation (Vn/Vn+1) and the results indicated that, in most of the treatments, two most stable reference genes were sufficient for normalization. In addition, the suitability of the most stable reference genes and their combination were confirmed by normalizing the expression of ACO. Apart from updating the first survey of the stability of reference genes in P. praeruptorum and providing the basis for further research in P. praeruptorum, the study also provided guidelines to obtain more accurate qRT-PCR results in other plant species.
Supporting Information S1 Fig. Agarose gel (1%) electrophresis of the twelve candidate reference genes. 1-12 represent TIP41, TUB6, SAND, ACT2, CYP2, GAPDH, NCBP20, eIF-4α, EF-1α, PP2A, UBC9, PTBP1, respectively. The left part is the PCR products with cDNA as template and the right part is the PCR products with gDNA as template.   Table. Nucleotide acid sequences of twelve candidate reference genes from P. praeruptorum. The PCR was conducted with the cDNA and genomic DNA as templates and then the products were inserted into T-Vector for sequencing, respectively. 1-12 represent the twelve candidate reference genes and the sequences with the italics represents the intron of each gene. (DOCX) S3 Table. The transcriptome data of twelve candidate reference genes. (XLSX)