Selection of Suitable Reference Genes for RT-qPCR Analyses in Cyanobacteria

Cyanobacteria are a group of photosynthetic prokaryotes that have a diverse morphology, minimal nutritional requirements and metabolic plasticity that has made them attractive organisms to use in biotechnological applications. The use of these organisms as cell factories requires the knowledge of their physiology and metabolism at a systems level. For the quantification of gene transcripts real-time quantitative polymerase chain reaction (RT-qPCR) is the standard technique. However, to obtain reliable RT-qPCR results the use and validation of reference genes is mandatory. Towards this goal we have selected and analyzed twelve candidate reference genes from three morphologically distinct cyanobacteria grown under routinely used laboratory conditions. The six genes exhibiting less variation in each organism were evaluated in terms of their expression stability using geNorm, NormFinder and BestKeeper. In addition, the minimum number of reference genes required for normalization was determined. Based on the three algorithms, we provide a list of genes for cyanobacterial RT-qPCR data normalization. To our knowledge, this is the first work on the validation of reference genes for cyanobacteria constituting a valuable starting point for future works.


Introduction
Cyanobacteria are a multifaceted group of photosynthetic prokaryotes, with a large morphological diversity including unicellular, colonial and filamentous forms. Some filamentous strains exhibit cellular differentiation, with cells specialized in photosynthesis (vegetative cells) and in nitrogen fixation (heterocysts) [1][2]. Besides the morphological diversity, cyanobacteria are versatile microorganisms with simple growth requirements that are able to produce several secondary metabolites of commercial interest [3][4][5]. To exploit this biotechnological potential and use cyanobacteria as cell factories it is necessary to understand their physiology and metabolism at a systems level. The comprehensive evaluation of gene expression patterns is an important part of this characterization and, at present, the quantitative real-time polymerase chain reaction (RT-qPCR) is considered the gold standard for measurement of transcript abundance [6]. This technique allows the simultaneous amplification and quantification of a target amplicon by measuring the fluorescence increment in each PCR cycle in a fast, extremely sensitive and accurate way [7]. However, it has been shown that the poor design of RT-qPCR experiments can lead to erroneous or biologically irrelevant data [6]. A number of parameters such as the sample origin and amount, RNA quality and integrity, PCR efficiency, qPCR protocol and validation have been shown to compromise the quality of the final RT-qPCR experiment [8][9][10]. Additionally, the data normalization strategy used to eliminate technical or experimentally induced variation is very important. Normalization against internal control genes is frequently used, yet the expression of housekeeping genes can vary significantly depending on the samples and experimental conditions [11][12][13][14]. To minimize the influence of individual variation experimental validation of each reference gene is required for a particular sample and condition, and the use of three or more reference genes is now standard practice [10,15]. Moreover, several algorithms have been developed to allow the evaluation of candidate reference genes in terms of their expression stability and determination of the minimum number of reference genes to be used [16][17][18][19].
In this work we have validated candidate reference genes for accurate normalization of RT-qPCR data in cyanobacteria. Several genes were selected and their transcription analyzed in three morphological distinct organisms: the unicellular Synechocystis sp. PCC 6803 (Synechocystis), the filamentous Lyngbya aestuarii CCY 9616 (Lyngbya), and the filamentous heterocystous Nostoc sp. PCC 7120 (Nostoc). Our analysis provides a list of reference genes that can be used in RT-qPCR experiments with these cyanobacteria, grown in conditions routinely used in the laboratory. Although validation is always mandatory, our work constitutes a valuable starting point for the selection of reference genes, even for other cyanobacterial strains.

Results and Discussion
Choice of candidate reference genes Two approaches were followed to choose the candidate reference genes to be validated: (i) genes that have been used as controls in previous RT-qPCR studies and (ii) genes whose transcription did not vary significantly in two microarray studies [20][21] available in the CyanoBIKE server [22]. Twelve genes from independent pathways were selected to minimize the effects of co-regulation (Table 1): seven based on the bibliography (i) and five from CyanoBIKE (ii). A preliminary RT-qPCR analysis was performed using cells grown under a 16 h light/8 h dark regimen (LD) and collected in the middle of the phases (L8 and D4), in medium with combined nitrogen (N + ) or without combined nitrogen (N 2 ) for Lyngbya and Nostoc, while in Synechocystis only N + medium was used (non-N 2 -fixing strain). The six candidate reference genes with lowest Cq variation were selected for further studies in each organism: rrn16S (16S), rnpB and secA (tested in all organisms), ilvD (Nostoc), petB (Nostoc and Synechocystis), ppc (Lyngbya and Synechocystis), purC (Lyngbya), rnpA (Lyngbya and Nostoc) and rpoA (Synechocystis). In the second RT-qPCR analysis, samples collected in continuous light (CL) were included and four time points were tested in all conditions (CL.N + , CL.N 2 , LD.N + and LD.N 2 ).

Amplicon specificity, RT-qPCR efficiencies and analysis of Cq values
The size of the amplicons was confirmed by agarose gel electrophoresis (one sample per primer pair per organism, see Fig.  S1) and their specificity confirmed by DNA sequencing. In all the RT-qPCR runs, standard curves were included (using cDNA serial dilutions) and the amplification efficiencies (E) obtained were in the recommended range (90-110%), except for the rpoA gene (Synechocystis) in which E was slightly higher (117.4%, see Table 2). The candidate reference genes exhibited desirable E values with acceptable R 2 values demonstrating linearity and reproducibility of the reactions across organisms and variety of conditions tested. The melting curves showed single peaks corresponding to unique amplicons, and in the no template controls (NTCs) no amplification or non-specific products (unstable at the specific amplicon melting temperature) were detected ( Table 2). The parameters derived from the RT-qPCR analysis required by the MIQE guidelines are listed in Table 2.
The raw quantification cycle (Cq) values were extracted from the iQ5 Optical System Software v2.1 (Bio-Rad) and represented by box-and-whiskers plots (Fig. S2). The median Cq value for the 16S was the lowest in all organisms (Cq<20), which was expected since it is one of the most abundant transcripts. For the other genes, similar median Cq values were observed in Lyngbya (Cq<30), while for Nostoc and Synechocystis larger variation was found (23,Cq#36). The use of 16S as reference gene is controversial since the degradation machinery may not affect rRNA and mRNA in the same manner [23][24], and it is also recommended that reference and target genes should have similar Cq values [25]. The latter issue can be circumvented by ensuring that the calculations are performed within the linear range of the amplification curves of reference and target genes [25].
Assessing gene-stability and minimum number of reference genes required Gene expression stability was analyzed using three distinct algorithms: geNorm [16], NormFinder [18] and BestKeeper [17]. These algorithms rank the candidate reference genes according to the calculated gene expression stability values (M, geNorm and NormFinder) or Pearson's correlation coefficients (r, BestKeeper) (see Material and Methods for further details). Two types of analyses were performed to assess the stability: (i) under each growth condition (CL.N + , CL.N 2 , LD.N + and LD.N 2 ) pooling the data corresponding to four collection time points, or (ii) under each light regimen (CL and LD) pooling the data corresponding to eight collection time points (four in each medium). For each time point, data from each biological and technical replicates were considered. The analysis of reference genes stability using the different algorithms revealed similar gene rankings for all organisms under the different conditions (Tables S1, S2 and S3). Furthermore, the optimal number of reference genes for each organism in each specific experimental condition was determined using geNorm PLUS , which calculates the pair-wise variation -V n / petB Cytochrome b 6 , involved in electron transport and ATP generation [38] ppc Phosphoenolpyruvate carboxylase (PEPC), central enzyme in carbon concentrating mechanism [39] rnpA Protein subunit of ribonuclease P (RNase P) [38] rnpB RNA subunit of ribonuclease P [40][41][42][43][44] rps1B Small subunit ribosomal protein S1 [42,44]  V n+1 . This analysis showed that, in most of the conditions, the value of V 2/3 was below 0.15 indicating that the minimum number of reference genes required for normalization is two (Fig. 1). For Lyngbya, under the conditions CL, LD.N 2 and LD the pair-wise variation was always above the threshold and therefore the use of three genes for the first two conditions and five for latter is necessary (Fig. 1A). Taking into consideration the ranking of the three algorithms as well as the pair-wise analysis of geNorm PLUS , a comprehensive selection of adequate reference genes is presented in Table 3. In all the conditions tested and in the three organisms, a minimum of three stable reference genes was always considered since it is more appropriate for a reliable normalization of data [16]. In this study the 16S gene was shown to be a stable reference gene in all organisms, and can be used for normalization in all the conditions tested in Nostoc (six conditions) and in Synechocystis (two conditions). The rnpB gene was stable in the three cyanobacteria, but the conditions in which it can be used are more limited. This first study on the validation of cyanobacterial reference genes identified 16S and rnpB as the most stable in the majority of the conditions tested; by coincidence these two genes were the more commonly used for normalization in previous cyanobacterial RT-qPCR studies ( Table 1). The rnpA and secA genes can also be considered for data normalization in Lyngbya and Nostoc in a wide range of conditions. All the other reference genes listed in Table 3 are more strain-dependent: ilvD can be used in Nostoc, while ppc and purC are specific for Lyngbya, and petB is suggested for Synechocystis and Nostoc, but in the latter only in one condition (LD.N 2 ).

Choice of reference genes
To demonstrate the effects of the choice of a non-optimal reference gene, an expression analysis was simulated using data from Nostoc grown under a light/dark regimen. In this particular condition, 16S was identified as the most stable gene, followed by secA, and rnpB was identified as the least stable. Therefore, the simulation was performed considering secA as target and its expression was normalized against 16S or rnpB ( Fig. 2A). The relative fold expression of secA was calculated and normalized to the lowest value: 1.8860.98 in N + and 1.0060.61 in N 2 medium. This normalization resulted in a significant difference in relative fold expression when comparing the two media (P,0.0001). The expression level of secA was calculated relative to 16S, and it was found to be stable in both media, P = 0.7157 (1.0860.77 in N + medium and 1.0061.00 in N 2 media). In contrast, when rnpB was used as reference the relative expression of secA varied between 1.0060.66 and 1.7761.53 in N + and N 2 media, respectively, with a significant P value of 0.0112. Consequently, the difference between gene expression levels of secA is an artifact caused by the wrong choice of reference gene. This finding is supported by the results of the gene expression stability for secA (see above). A second analysis was performed taking rnpB as target and 16S and secA as reference genes (Fig. 2B). The rnpB relative fold expression was also calculated and normalized to the lowest value: 3.3261.33 in N + and 1.0060.61 and N 2 medium (P,0.0001). Subsequently, rnpB expression was normalized to 16S, secA or both genes generating similar results: rnpB expression was always significantly lower in N 2 medium. Overall, this analysis confirms the expression stability of 16S and secA genes and the inaccuracy of using the rnpB gene as a reference, emphasizing that the choice of non-optimal reference genes leads to data misinterpretation.

Conclusions
This is the first study on validation of reference genes from cyanobacteria and, although a ''universal'' set could not be identified, a list of recommended genes is provided for the normalization of RT-qPCR data. The expression of a set of candidate reference genes was analyzed in three morphologically distinct cyanobacteria grown under different conditions. The different algorithms used to assess expression stability did not rank the candidate genes in the same order, but in general identified the same as the most stable. This reinforces not only the need to use more than one algorithm for accurate validation, but also the requirement of normalizing data against three or more reference genes. The array of experimental conditions tested makes this work a valuable starting point in the choice of reference genes when studying cyanobacteria.

Materials and Methods
Strains, culture conditions and sample collection Three cyanobacterial strains, representing different sections among cyanobacteria, were used: the unicellular Synechocystis sp. PCC 6803 (Pasteur Culture Collection of Cyanobacteria, Paris, France), the filamentous non-heterocystous Lyngbya aestuarii CCY    at 4500 g, 4uC), resuspended in 0.5 mL medium and mixed with 1 mL RNAprotectH Bacterial reagent (Qiagen) and processed according to manufacturer's instructions, the cell pellets were frozen at 280uC until further use. Escherichia coli strain DH5a (Stratagene) was used for cloning purposes. E. coli transformants were cultivated at 37uC in Luria-Bertani (LB) medium supplemented with 100 mg mL 21 ampicillin.

Candidate reference genes, primer design and amplicon specificity
The selection of the candidate reference genes was performed by bibliographic search, and using the CyanoBIKE server (http:// biobike.csbc.vcu.edu:8003, accessed 2008) [22] selecting genes whose relative transcript levels varied between 0.7 and 1.3 but their difference in all conditions tested was below 0.5.
The sequences of the twelve selected genes were obtained from GenBank (accession number and location in Table S4). The primers were designed using the Beacon Designer 6 software (PREMIER Biosoft International) and synthesized (with cartridge purification) by STAB Vida (Lisbon). For each primer pair, the annealing temperature was optimized through gradient PCR, in reaction mixtures (20 mL) containing: 0.25 mM of each primer, 10 mL of iQ TM SYBRH Green Supermix (Bio-Rad) and 2 mL cDNA (1/9 dilution). The PCR profile was: 3 min at 95uC followed by 60 cycles of 30 s at 95uC, 30 s at 51, 53 or 56uC (Table 4 and Table S5) and 30 s at 72uC. A melting curve analysis was performed at the end of each PCR program, to exclude the formation of nonspecific products. After optimization, the size of the PCR products (for each gene and in each organism) was checked by agarose gel electrophoresis performed by standard protocols, using 16 TAE buffer [28]. These products were also cloned into the pGEM-TH Easy (Promega) vector according to the manufacturer's instructions, and the amplicon specificity was confirmed by DNA sequencing (STAB Vida).

RNA extraction and cDNA synthesis
For RNA extraction, the TRIzolH Reagent (Ambion) was used in combination with the Purelink TM RNA Mini Kit (Ambion). Briefly, the cells were disrupted in TRIzol containing 0.2 g of 0.2 mm-diameter glass beads (acid washed, Sigma) using a Mini-Beadbeater (Biospec Products) or a FastPrepH-24 (MP Biomedicals) and the following extraction steps were performed according to the manufacturer's instructions. RNA was quantified on a NanoDrop ND-1000 spectrophotometer (NanoDrop Technologies, Inc.) and the quality and integrity was checked using the Experion TM RNA StdSens Analysis Kit (Bio-Rad). The RNA samples (<300 ng) were treated with 1 U of RQ1 RNase-free DNase (Promega) according to manufacturer's instructions. The absence of genomic DNA contamination was checked by PCR, in reaction mixtures containing: 0.5 U of GoTaq Flexi DNA Polymerase (Promega), 16 GoTaq Flexi buffer, 200 mM of each deoxyribonucleotide triphosphate (dNTP), 1 mM of each BD16S primer (Table 4), and 2 mL total RNA. The PCR profile was: 2 min at 95uC followed by 25 cycles of 30 s at 95uC, 30 s at 51uC and 30 s at 72uC, and a final extension at 72uC for 7 min. The PCR reactions were run on agarose gel electrophoresis.
For cDNA synthesis, 150 ng of total RNA was transcribed with the iScript TM Reverse Transcription Supermix for RT-qPCR (Bio-Rad) in a final volume of 20 mL, following the manufacturer's instructions. A control PCR was performed using 1 mL of cDNA as template and the same reaction conditions and PCR program described above. Three-fold standard dilutions of the cDNAs were made (1/3, 1/9 and 1/27) and stored at 220uC.

Transcription analysis by Real-time RT-PCR
The RT-qPCRs were performed on iQ TM 96-well PCR plates covered with Optical Sealing Tape (Bio-Rad). Reactions were manually assembled and contained 0.25 mM of each primer, 10 mL of iQ TM SYBRH Green Supermix (Bio-Rad) and 2 mL of template cDNA (dilution 1/9). The PCR profile was: 3 min at 95uC followed by 60 cycles of 30 s at 95uC, 30 s at 51 or 56uC (Table 4) and 30 s at 72uC. Standard dilutions of the cDNA were used to check the relative efficiency and quality of primers. Negative controls (no template cDNA) were included and a melting curve analysis was performed in all assays. RT-qPCRs were performed with three biological replicates and technical triplicates/duplicates of each cDNA sample in the iCycler iQ5 Real-Time PCR Detection System (Bio-Rad). The data obtained were analyzed using the iQ5 Optical System Software v2.1 (Bio-Rad). Efficiency values were calculated and the Cq values for each data set were exported to a Microsoft Office Excel file, and imported into the qbase PLUS2 software (Biogazelle). The relative quantities of each sample were calculated using the gene-specific efficiency acquired from the dilutions series and normalized to the mean Cq value.
GraphPad Prism v5 (GraphPad Software, Inc.) was used to create the box-and-whisker plots. For each gene in a given condition, the Cq values of all technical replicates corresponding to the four collection time points were pooled together. Boxes correspond to Cq values within the 25 th and 75 th percentiles and the median is represented by an horizontal line. Whiskers include Cq values within the 10 th and the 90 th percentiles and Cq values outside this range (outliers) are represented as dots (Fig. S2).

Analyses of gene stability and number of optimal reference genes
Gene stability was assessed using three different mathematical algorithms: geNorm PLUS [16], NormFinder [18], and BestKeeper [17]. The geNorm PLUS algorithm determines internal control gene stability measure (M) as the average pair-wise variation of each reference gene with all the other reference genes, and enables the elimination of the least stable gene and the recalculation of the M values, resulting in the ranking of the most stable genes. The lower the M value, the higher is the gene stability; a good reference gene should have an M below 0.5 in homogeneous sample sets [15,29]. Furthermore, the minimum number of genes required for the calculation of a reliable normalization factor is determined, calculating the pair-wise variation V n /V n+1 between two sequential normalization factors NF n and NF n+1 (geometric mean of the expression levels of n reference genes). The stepwise inclusion of reference genes is performed until V n /V n+1 drops below the recommended threshold of 0.15, when the benefit of using an extra gene (n+1) is limited [16,30]. Gene stability determination by geNorm PLUS was performed using the qbase PLUS2 software v2.2 (Biogazelle).
NormFinder focuses on the expression variations both between different groups (inter-group variation) and inside one group (intragroup variation), in a model-based approach of mixed linear effect modeling. This algorithm determines inter-and intra-group variations and combines both results in a stability value for each investigated gene. According to NormFinder, genes with lowest M value will be top ranked. For the analysis with this algorithm, expression values were calculated using qbase PLUS2 and first exported to a Microsoft Office Excel file for use with the NormFinder applet.
BestKeeper calculates standard deviations (SD) and the coefficient of variance (CV) based on the Cq values of all candidate reference genes, considering the genes with SD lower than 1 as stably expressed. The algorithm calculates the BestKeeper index (BI) as the geometric mean of the stable reference genes Cq values and then establishes pair-wise correlations between each gene and the BI. The genes with highest Pearson correlation coefficient (r<1) and probability (p) value lower than 0.05 are considered the most stable. The average Cq value of each technical triplicate/ duplicate was calculated (without conversion to relative quantity) and imported to the Microsoft Excel based application BestKeeper to analyze gene stability.

Choice of reference genes
The gene expression simulation was performed using data from the three biological replicates of Nostoc grown under a light/dark

MIQE guidelines
In this work, the Minimum Information for Publication of Quantitative Real-Time PCR Experiments (MIQE) guidelines [10] were followed to promote the effort for experimental consistency and transparency, and to increase the reliability and integrity of the results obtained.