Caenorhabditis elegans Operons Contain a Higher Proportion of Genes with Multiple Transcripts and Use 3′ Splice Sites Differentially

RNA splicing generates multiple transcript isoforms from a single gene and enhances the complexity of eukaryotic gene expression. In some eukaryotes, operons exist as an ancient regulatory mechanism of gene expression that requires strict positional and regulatory relationships among their genes. It remains unknown whether operonic genes generate transcript isoforms in a manner similar to non-operonic genes, whose expression is less likely to be limited by their positions and relationships with surrounding genes. We analyzed the number of transcript isoforms of Caenorhabditis elegans operonic genes and found that C. elegans operons contain a much higher proportion of genes with multiple transcript isoforms than non-operonic genes do. For genes that express multiple transcript isoforms, there is no apparent difference between the number of isoforms in operonic and non-operonic genes. C. elegans operonic genes also have a different preference for the 20 most common 3′ splice sites compared to non-operonic genes. Our analyses suggest that C. elegans operons enhance expression complexity by increasing the proportion of genes that express multiple transcript isoforms and maintain splicing efficiency by differential use of common 3′ splice sites.


Introduction
RNA splicing generates multiple transcript isoforms from a single gene and is believed to be a driving force for biological complexity in evolution [1,2]. In C. elegans, over 13% of genes are alternatively spliced [3]. In humans, most genes are alternatively spliced [4,5,6]. Compared to RNA splicing, operons provide a different regulatory form of gene expression. An operon is a cluster of genes that are transcribed from a single promoter and controlled by the same regulatory sequences [7]. Operons exist abundantly in prokaryotes and are also found in eukaryotes, which include the nematode Caenorhabditis elegans, the fruit fly Drosophila melanogaster and some mammals [7,8]. In C. elegans, it was initially estimated that 15% of genes were in about 1000 operons, with an average of 2.8 genes per operon [9,10]. Recently the number of annotated operons in the C. elegans genome has increased to approximately 1250 (WormBase Release 205), which gives an average of 2.3 genes per operon, given that the number of operonic genes remains largely unchanged (around 2880, see the Results). In C. elegans, genes in an operon form a closely spaced cluster with an ~100 bp intergenic distance [10]. However, it is not known how operonic genes increase expression complexity, e.g., by RNA splicing, to adjust to the pressure of evolution and at the same time maintain their positional and regulatory relationships. C. elegans has a large number of operonic genes that are alternatively spliced, which provides an interesting model to understand the relationship between operons and RNA splicing.

Results
We examined the average number of transcript isoforms per gene for genes of the whole genome, for all non-operonic genes and for all operonic genes. As shown in Figure 1A, non-operonic genes had about 1.26 transcript isoforms per gene, which was similar to the average of 1.31 transcript isoforms per gene for the whole genome. Operonic genes had 1.68 transcript isoforms per gene, which was over 30% more than that of the non-operonic genes.
One reason that operonic genes have more transcript isoforms per gene than non-operonic genes do is that operons may contain a higher proportion of genes that generate multiple transcript isoforms. Indeed, about 40% of all operonic genes have multiple transcript isoforms (Figure 1B and Table 1). In contrast, only 14% of non-operonic genes and 17% of all genes have multiple transcript isoforms (Figure 1B and Table 1). We next examined whether there is any difference in the average number of isoforms for genes that have multiple transcript isoforms. For all such non-operonic genes, there were about 2.81 isoforms per gene. For all such operonic genes, there were 2.71 isoforms per gene (Figure 1C). For all such genes of the whole genome, this number was 2.78, which was similar to that of operonic and non-operonic genes (Figure 1C). These results suggest that alternatively spliced operonic and non-operonic genes do not differ appreciably in the number of transcript isoforms they generate. Therefore, operonic genes may utilize the splicing machinery as efficiently as non-operonic genes do to enhance their expression complexity.
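The three statistics reported above (mean isoforms per gene, the proportion of genes with multiple isoforms, and the mean among multi-isoform genes) are linked by a simple identity: every single-isoform gene contributes exactly one transcript, so the overall mean equals 1 + p × (m − 1), where p is the multi-isoform proportion and m is the mean among multi-isoform genes. The following is a minimal sketch of that arithmetic; the function names and the toy list are hypothetical and are not the WormBase data used in this study.

```python
from statistics import mean


def isoform_summary(isoform_counts):
    """Summarize a list holding one isoform count (>= 1) per gene."""
    counts = list(isoform_counts)
    multi = [c for c in counts if c > 1]  # genes with multiple isoforms
    return {
        "mean_per_gene": mean(counts),
        "fraction_multi": len(multi) / len(counts),
        "mean_among_multi": mean(multi) if multi else float("nan"),
    }


def expected_mean(fraction_multi, mean_among_multi):
    """Overall mean implied by p and m: 1 + p * (m - 1)."""
    return 1 + fraction_multi * (mean_among_multi - 1)


# Hypothetical toy counts for ten genes (illustration only).
toy_counts = [1, 1, 1, 2, 3, 1, 1, 4, 1, 1]
summary = isoform_summary(toy_counts)
print(summary)

# Rounded values reported in the text, used as a consistency check.
print(expected_mean(0.40, 2.71))  # operonic genes
print(expected_mean(0.14, 2.81))  # non-operonic genes
print(expected_mean(0.17, 2.78))  # all genes
```

With the rounded figures from the text, the identity gives roughly 1.68, 1.25 and 1.30, which agrees with the reported 1.68, 1.26 and 1.31 isoforms per gene to within rounding.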
To investigate whether operonic introns utilize 3′ splice sites differently from non-operonic introns, we analyzed the nucleotide sequences of positions −7 to −1 of C. elegans introns. This sequence (the 3′ splice site) is recognized by the large and small subunits of the splicing factor U2AF and plays important roles in regulating splicing efficiency and alternative splicing [11,12,13,14]. Among all 3′ splice sites, the top 20 most commonly used sites were found in over 80% of introns (Table 2), suggesting that these sites are responsible for the splicing of the majority of introns. As shown in Figure 2, operonic introns use ttttcag, atttcag, tttccag and tttgcag significantly more frequently than non-operonic introns do, with tttgcag used over 30% more frequently in operonic introns than in non-operonic introns. The other 16 sites were used equally or less frequently in operonic introns. Among them, the frequencies of tttttag, gtttcag, ctttcag, attttag and tgttcag were significantly reduced relative to those in non-operonic introns.
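The site tabulation described above can be sketched as follows: take the last seven nucleotides of each intron (positions −7 to −1), group identical sites, and report each site's share of all introns. This is an illustrative Python sketch, not the C program used in this study; the function names and the toy intron sequences are hypothetical.

```python
from collections import Counter


def three_prime_site(intron, width=7):
    """Return positions -width to -1 of an intron sequence, lowercased."""
    if len(intron) < width:
        raise ValueError("intron is shorter than the splice-site window")
    return intron[-width:].lower()


def site_proportions(introns, width=7):
    """Map each 3' splice site to its share of introns, most common first."""
    counts = Counter(three_prime_site(seq, width) for seq in introns)
    total = sum(counts.values())
    return {site: n / total for site, n in counts.most_common()}


# Hypothetical toy introns: an arbitrary 5' portion followed by a 3' site.
toy_sites = ["ttttcag", "ttttcag", "atttcag", "tttttag"]
toy_introns = ["gtaagtaaaa" + site for site in toy_sites]

proportions = site_proportions(toy_introns)
for site, share in proportions.items():
    print(site, round(share, 2))
```

Running the same function separately on operonic introns and on all introns yields the per-site proportions that are compared in Figure 2.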

Discussion
It is a challenge for operonic genes to increase expression complexity and maintain splicing efficiency while keeping strict positional and regulatory relationships. C. elegans operons may achieve these goals by at least two approaches. First, C. elegans operons significantly increase the proportion of genes that express multiple transcript isoforms (Figure 1). However, for genes that express multiple transcript isoforms, there is no apparent difference between the number of isoforms in operonic and non-operonic genes. This result suggests that C. elegans operons are more permissive for their genes to increase expression complexity by RNA processing than non-operonic genes are. By increasing the proportion of genes that express multiple transcript isoforms, C. elegans operons may compensate for stricter transcriptional regulation and achieve the goal of expression complexity. Alternatively, C. elegans operonic genes may be under more evolutionary pressure to enhance their transcript complexity, e.g., in order to perform more complex biological functions. Second, C. elegans operonic genes use four of the 20 most abundant 3′ splice sites (ttttcag, atttcag, tttccag and tttgcag) more frequently and use the other 3′ splice sites equally or less frequently (Figure 2). The differential usage of common 3′ splice sites may help maintain efficient splicing of operonic genes, which are often highly expressed and have essential biological functions [9,10]. The differential usage of common 3′ splice sites by operonic genes is also consistent with the notion that transcription and RNA splicing are coupled processes [1,2]. Compared to individual genes, it is plausible that the coupling of transcription and splicing of multiple genes in an operon presents a more challenging task for the splicing machinery, which may favor those 3′ splice sites that optimize the splicing process and result in a differential use of common 3′ splice sites by operonic genes.
The expression of transcript isoforms by C. elegans operonic genes may also depend on other regulatory mechanisms, e.g., by using different splicing silencers or enhancers and by generating alternative 5′ and 3′ untranslated regions (UTRs). Further analysis of these possibilities will provide a more comprehensive picture of the expression complexity of C. elegans operonic genes.

Methods
We downloaded C. elegans gene names and annotated transcripts from WormMart (WormBase Release 195) as html files. The data were processed using MS Excel to identify genes with different numbers of transcripts. Non-operonic genes were identified by subtracting operonic genes from all genes of the whole genome. A random examination of over 100 operonic genes that are annotated to have multiple transcript isoforms indicated that the isoforms for each gene share at least one coding exon.
The total number of each analyzed 3′ splice site (positions −7 to −1) for the whole genome was obtained from the Intronerator (http://genome-test.cse.ucsc.edu/Intronerator/) [15]. We downloaded 16,087 unique operonic intron sequences from WormMart (WormBase Release 195) and processed the sequences using software written in the C programming language and Microsoft Excel. Identical 3′ splice sites (positions −7 to −1) were grouped and the proportion of each site was determined. The number of each 3′ splice site for non-operonic genes was obtained by subtracting the number of the same site for operonic genes from the number for the whole genome. The online calculator for pairwise Z-test analysis is found at http://www.dimensionresearch.com/resources/calculators/ztest.html.
Figure 2. Common 3′ splice sites are used differentially by C. elegans operonic genes. The proportions of each 3′ splice site (X axis) of operonic and non-operonic genes were compared to that of all genes of the whole genome and are presented as fold changes (Y axis). A pairwise Z-test was performed (see Table 2) to evaluate the significance of the difference between the proportions of each 3′ splice site in operonic genes and non-operonic genes. *: p ≤ 0.01. doi:10.1371/journal.pone.0012456.g002
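The count subtraction and the significance test described in the Methods can be sketched as below. The source names only an online Z-test calculator, so the pooled two-proportion form and the two-sided p-value used here are assumptions rather than a description of that tool. The function names and the counts in the example call are hypothetical.

```python
from math import erfc, sqrt


def nonoperonic_count(genome_count, operonic_count):
    """Non-operonic count = whole-genome count minus operonic count."""
    if operonic_count > genome_count:
        raise ValueError("operonic count exceeds whole-genome count")
    return genome_count - operonic_count


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion Z-test (assumed form); returns (z, two-sided p)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    if se == 0:
        raise ValueError("standard error is zero; proportions are degenerate")
    z = (p1 - p2) / se
    p_value = erfc(abs(z) / sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts for one site: 60 of 100 operonic introns versus
# 40 of 100 non-operonic introns (illustration only, not Table 2 data).
z, p = two_proportion_z(60, 100, 40, 100)
print(round(z, 3), p)
```

In practice, `x2` and `n2` for non-operonic introns would come from `nonoperonic_count` applied to the whole-genome and operonic totals for each site and for all introns.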