The τCstF-64 Polyadenylation Protein Controls Genome Expression in Testis

The τCstF-64 polyadenylation protein (gene symbol Cstf2t) is a testis-expressed orthologue of CstF-64. Mice in which Cstf2t was knocked out had a phenotype that was only detected in meiotic and postmeiotic male germ cells, giving us the opportunity to examine CstF-64 function in an isolated developmental system. We performed massively parallel clonally amplified sequencing of cDNAs from testes of wild type and Cstf2t−/− mice. These results revealed that loss of τCstF-64 resulted in large-scale changes in patterns of genome expression. We determined that there was a significant overrepresentation of RNAs from introns and intergenic regions in testes of Cstf2t−/− mice, and a concomitant use of more distal polyadenylation sites. We observed this effect particularly in intronless small genes, many of which are expressed retroposons that likely co-evolved with τCstF-64. Finally, we observed overexpression of long interspersed nuclear element (LINE) sequences in Cstf2t−/− testes. These results suggest that τCstF-64 plays a role in 3′ end determination and transcription termination for a large range of germ cell-expressed genes.


Introduction
Polyadenylation is the co-transcriptional process by which most mRNAs form their 3′ ends in eukaryotic organisms. The polyadenylation machinery is highly conserved in all eukaryotes [1], highlighting the fundamental importance of polyadenylation for gene expression and cell viability. Further, mRNA processing in general and polyadenylation in particular are forces that modify chromatin structure, changing the dynamics of gene expression [2,3,4,5]. Together, these functions imply that polyadenylation is necessary for global gene expression, chromatin function, and genomic integrity.
At least five multi-protein factors and up to 85 proteins are involved in polyadenylation [6,7]. Of these factors, the cleavage stimulation factor (CstF) appears to have key functions in regulating alternative polyadenylation [5], in coupling polyadenylation to transcription termination [8], and in integrating polyadenylation with DNA damage responses [9]. CstF is composed of three polypeptides, of which the 64,000 Mr polypeptide, CstF-64 (gene symbol CSTF2), is the RNA-binding component that binds to the GU-rich downstream polyadenylation element [10,11,12,13], resulting in changes in polyadenylation site choice [14,15].
Mammals have two paralogous genes encoding CstF-64: CSTF2 is X-linked and encodes the somatic CstF-64 that is expressed in all tissues [16]. In contrast, CSTF2T (Cstf2t in mice) is an autosomal retrotransposed gene that encodes τCstF-64 [17,18]. τCstF-64 is expressed at highest levels in testis and brain, and at lower levels in other tissues and cell types [7,19]. In testis, τCstF-64 is expressed exclusively in male germ cells, because male sex chromosome inactivation (MSCI) results in transcriptional inactivation of the somatic CstF-64 [20].
Congruent with this hypothesis, although τCstF-64 is expressed in multiple tissues, targeted deletion of Cstf2t revealed that the function of τCstF-64 was primarily in spermatogenesis [21]. Male mice homozygous for Cstf2t tm1Ccma (i.e., Cstf2t−/− mice) exhibited male infertility and spermatogenic defects resembling human oligoasthenoteratozoospermia (ibid.). While spermatogenesis was greatly disrupted, a few morphologically defective but active spermatozoa were produced [22,23]. This suggested that deletion of τCstF-64 resulted in systematic alterations in gene expression, but not absolute loss of expression of key spermatogenic genes. Therefore, we wanted to examine changes in global gene expression that correlated with the absence of τCstF-64 polyadenylation function.
Here we compare results of massively parallel clonally amplified sequencing of cDNAs from testes of wild type and Cstf2t−/− mice. We found that poly(A)-selected cDNAs from Cstf2t−/− mouse testis contained significantly less representation from annotated exonic regions, but more representation from intronic and intergenic regions. In agreement with these data, we observed down-regulation of intronless small genes (ISGs) in Cstf2t−/− mouse testis, and a concomitant increase in the amount of read-through transcription, suggesting these effects were due to aberrant transcriptional termination. Finally, we observed increased representation of long interspersed nuclear element (LINE) L1 sequences in Cstf2t−/− mouse testis, but not of other repetitive elements such as short interspersed nuclear elements (SINEs; e.g., B2 and B4 elements) or LTR elements. This suggests that τCstF-64 represses L1 elements selectively in mouse testes. Together, these data support a model in which τCstF-64 in male germ cells (and, by extension, the process of polyadenylation in all cells) is critical for global control of genomic gene expression.

Materials and Methods

Animals
Animal studies were performed in accordance with National Institutes of Health guidelines, under protocols approved by the Texas Tech University Health Sciences Center Institutional Animal Care and Use Committee. The Cstf2t tm1Ccma mice used in these studies were of mixed C57BL/6-129SvEv background. All genotyping was done as described previously [21].

RNA Preparation and Sequencing
Total RNA was extracted from the testes of 25 day postpartum (dpp) wild type or Cstf2t tm1Ccma/tm1Ccma (Cstf2t−/−) mice using the TRIzol reagent (Invitrogen, Carlsbad, CA) and treated with DNase (Ambion, Austin, TX). Poly(A)+ RNA was prepared using oligo(dT) columns (New England Biolabs, Ipswich, MA), and oligo(dT)-primed double-stranded cDNA was synthesized using the Just cDNA Double-Stranded cDNA Synthesis Kit (Agilent Technologies, Santa Clara, CA; Fig. 1A). The resulting cDNA (4 μg) was nebulized to produce fragments of ~500 bp, adapters were ligated onto the cDNA fragments, and emulsion PCR was performed to amplify the cDNA products. Massively parallel pyrosequencing was performed using the 454 Titanium protocol (454 Life Sciences, Branford, CT).

Sequence Analyses
454 sequencing reads were mapped to the mouse genome (Mouse Genome Assembly version mm8) using BLAT [24]. For reads with multiple hits, an alignment score was calculated for each hit as the number of matched nucleotides minus the number of mismatched nucleotides. If the alignment score of the best hit exceeded that of the second-best hit by ≥10, the read was considered uniquely mapped to the genome. Uniquely mapped reads were annotated by gene structure based on RefSeq sequences. Identification of transposable element sequences was based on RepeatMasker [25]. Uniquely mapped reads were also compared to the LINE-1 sequences in the genome based on the LINE track in the UCSC genome browser. Reads mapped to LINE-1 sequences ≥6 kb in length were used to support expression of full-length LINE-1, and the 3′ end positions of the reads along the LINE-1 sequences were further examined. Data will be deposited into the NCBI Sequence Read Archive (SRA), and a BED format file will be available upon request.

Figure 1. High-throughput cDNA sequencing (RNA-seq) finds significant differences between wild type and Cstf2t−/− mouse testis RNAs. (A) RNA was pooled from testes of five 25 dpp mice of either wild type or Cstf2t−/− genotype, cDNA synthesized, and high-throughput sequencing performed (see Materials and Methods). (B) RNA-seq from wild type (~55,000 reads) and Cstf2t−/− (~77,000 reads) mouse testis samples were not biased when mapped to the mouse genome. 454 sequencing reads were mapped to the mouse genome (Mouse Genome Assembly version mm8) using BLAT [24]. Pie graphs show that similar proportions of reads mapped to unique genomic regions (blue), mapped to multiple regions (nonunique, green), or could not be mapped to known regions (unmapped, tan) in samples from wild type or Cstf2t−/− mouse testes. The proportion of uniquely mapped reads did not differ significantly between wild type and Cstf2t−/− mice (85.4% vs. 85.2%; P = 0.14, Fisher's exact test). (C) Introns and intergenic regions were more highly expressed in testes of Cstf2t−/− mice, while exons were less expressed. Pie graphs show percentages of reads that were uniquely mapped to different regions of the genome for wild type and Cstf2t−/− mice. Exon (blue), reads fully aligned to exons; exon & intron (green), reads aligned to both exonic and intronic regions; intron (tan), reads fully aligned to introns; 3′ UTR-ext (orange) and 5′ UTR-ext (purple), reads aligned to within 4 kb downstream of the 3′ UTR or 1 kb upstream of the 5′ UTR, respectively; intergenic (grey), reads aligned to regions not within annotated genes or their extended regions. The difference in the proportion of reads mapped to different genomic regions is significant: P < 10⁻³²³ for both the intergenic region and the intronic region (Fisher's exact test, exon region used as control). doi:10.1371/journal.pone.0048373.g001
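The unique-mapping rule described under Sequence Analyses can be illustrated with a short Python sketch. It is not the authors' pipeline: the `Hit` class, its field names, and the `unique_hit` helper are hypothetical, and the sketch assumes that match and mismatch counts have already been extracted from the BLAT output for each hit. Only the score definition (matches minus mismatches) and the ≥10 margin come from the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hit:
    """One genomic alignment of a read (hypothetical container)."""
    chrom: str
    start: int
    matches: int      # number of matched nucleotides
    mismatches: int   # number of mismatched nucleotides

    @property
    def score(self) -> int:
        # Alignment score as described in the text: matches minus mismatches.
        return self.matches - self.mismatches


def unique_hit(hits: List[Hit], min_margin: int = 10) -> Optional[Hit]:
    """Return the best hit if the read maps uniquely, otherwise None.

    A read with several hits counts as uniquely mapped only when the best
    score exceeds the second-best score by at least `min_margin`.
    """
    if not hits:
        return None  # unmapped read
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    if len(ranked) == 1:
        return ranked[0]
    if ranked[0].score - ranked[1].score >= min_margin:
        return ranked[0]
    return None  # ambiguous (non-unique) read


# Illustrative values only, not data from the study.
clear = [Hit("chr1", 100, 240, 2), Hit("chr5", 500, 230, 6)]      # scores 238 vs 224
ambiguous = [Hit("chr1", 100, 240, 2), Hit("chr2", 50, 238, 4)]   # scores 238 vs 234
```

Here `unique_hit(clear)` returns the chr1 hit because the margin is 14, while `unique_hit(ambiguous)` returns `None` because the margin is only 4.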

Relative Usage of Distal Poly(A) Sites
Relative usage of distal poly(A) site (RUD) scores were determined from our previous microarray data [21] for 17, 22, 25, and 85 day postpartum (dpp) mouse testis RNA. Each gene was assigned a RUD score value that reflected the relative 3′ untranslated region (UTR) length in each sample [26,27]. The mean RUD of all genes in a sample is the RUD for the entire sample. To decrease sample bias, RUD scores were normalized to genes that do not exhibit alternative polyadenylation as described (op. cit.).
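The sample-level calculation can be sketched as follows. The text states only that each gene receives a RUD score reflecting relative 3′ UTR length, that the sample RUD is the mean over genes, and that scores are normalized to genes without alternative polyadenylation. The per-gene formula used here (log2 ratio of signal from the distal 3′ UTR region to signal from the region common to all isoforms) and the normalization by subtracting the control-gene mean are assumptions made for illustration; the cited references [26,27] give the actual definition. All function and variable names are hypothetical.

```python
from math import log2
from statistics import mean
from typing import Dict, Tuple

# Each gene maps to (distal_signal, common_signal): probe intensity from the
# distal 3' UTR region and from the region shared by all isoforms (assumed layout).
Signals = Dict[str, Tuple[float, float]]


def gene_rud(distal_signal: float, common_signal: float) -> float:
    """Per-gene RUD, assumed here to be a log2 ratio of distal to common signal."""
    return log2(distal_signal / common_signal)


def sample_rud(apa_genes: Signals, control_genes: Signals) -> float:
    """Mean RUD over genes, normalized to genes without alternative polyadenylation.

    Normalization is assumed to be subtraction of the control-gene mean.
    """
    raw = mean(gene_rud(d, c) for d, c in apa_genes.values())
    baseline = mean(gene_rud(d, c) for d, c in control_genes.values())
    return raw - baseline


# Illustrative intensities only, not data from the study.
apa = {"geneA": (200.0, 100.0), "geneB": (100.0, 100.0)}
controls = {"ctrl1": (100.0, 100.0)}
```

Under these assumptions a higher sample RUD corresponds to greater relative use of distal sites, that is, longer 3′ UTRs on average.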

Quantitative RT-PCR (qRT-PCR)
Complementary DNA was prepared from wild type or Cstf2t−/− testes as described [21]. Experiments were performed using RNA from testes of at least three wild type and three Cstf2t−/− mice at 25 dpp. Real-time PCR was performed using the indicated primers (Table S1) with a 96-well format ABI 7500 Real-Time PCR System (Applied Biosystems, Foster City, CA), with iTaq SYBR Green Supermix with ROX (Bio-Rad, Hercules, CA). PCR conditions were 95°C for 3 min, followed by 40 cycles of 95°C (15 sec) and 55°C (45 sec), followed by a dissociation stage. Ct data were normalized to the ribosomal protein Rps16 mRNA, which was run in every experiment.

Figure 3. ISGs are down-regulated and have increased read-through in Cstf2t−/− mouse testes. (A) Cumulative frequency of microarray log2 mRNA expression changes of Cstf2t−/− (KO25) versus wild type (WT25) mouse testis at 25 dpp. Short genes were defined as the lowest 20% in length, with a cutoff of 6658 bp or shorter. Indicated are long multi-exon genes (11,451 genes, blue), short multi-exon genes (2,324 genes, green), and short single-exon genes (541 genes, red). There are 276 short single-exon genes in the region between −2 and 0 log2 expression change. P values are 4.2×10⁻⁴ between short single-exon and short multi-exon genes and 1.0×10⁻¹⁵ between long multi-exon and short multi-exon genes by a K-S test. (B) qRT-PCR was performed using primers specific for the indicated genes (see Table S1) normalized to Rps16. Each bar represents the amount (in percent) of the indicated mRNA in 25 dpp Cstf2t−/− mouse testis RNA compared to wild type. The asterisks indicate values that are significantly different (P<0.001) from Rps16 and Actb by ANOVA (Bonferroni multiple comparisons test). (C) Polyadenylation read-through assay. Random-primed cDNA is made from RNA from wild type or Cstf2t−/− mouse testes. qRT-PCR is then performed using primer pairs within the body of the gene ("Upstream") or downstream of the polyadenylation site ("Downstream"). An increase in read-through is measured as an increase in the downstream value compared to the upstream value in Cstf2t−/− mice after normalization. (D) Read-through increases for ISGs in Cstf2t−/− mouse testes. The polyadenylation read-through assay described in (C) was performed on the indicated genes and normalized to 1.0 in the wild type mice. The asterisk (P<0.05) and double asterisk (P<0.01) indicate values that differ significantly from the wild type by a Student's t-test. doi:10.1371/journal.pone.0048373.g003
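The normalization of Ct data to Rps16 can be illustrated with a minimal Python sketch. The text does not state which quantification model was used; the sketch assumes the common 2^(−ΔΔCt) calculation with roughly 100% amplification efficiency, and expresses knockout mRNA as a percentage of wild type. The function names and Ct values are hypothetical.

```python
from statistics import mean
from typing import Sequence


def delta_ct(target_cts: Sequence[float], reference_cts: Sequence[float]) -> float:
    """Mean Ct of the target gene minus mean Ct of the reference gene (Rps16 here)."""
    return mean(target_cts) - mean(reference_cts)


def percent_of_wild_type(
    ko_target: Sequence[float],
    ko_reference: Sequence[float],
    wt_target: Sequence[float],
    wt_reference: Sequence[float],
) -> float:
    """Knockout expression as a percentage of wild type.

    Assumes the 2^(-ddCt) model, i.e. product doubles every cycle.
    """
    dd_ct = delta_ct(ko_target, ko_reference) - delta_ct(wt_target, wt_reference)
    return 100.0 * 2.0 ** (-dd_ct)


# Illustrative Ct values for three animals per genotype, not data from the study.
ko_gene, ko_rps16 = [26.0, 26.0, 26.0], [18.0, 18.0, 18.0]
wt_gene, wt_rps16 = [25.0, 25.0, 25.0], [18.0, 18.0, 18.0]
```

With these values the target amplifies one cycle later in the knockout after normalization, so `percent_of_wild_type(ko_gene, ko_rps16, wt_gene, wt_rps16)` gives 50.0.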

Results and Discussion
High-throughput RNA Sequencing Reveals That Intergenic Regions and Introns Are Overrepresented in Testes of Cstf2t−/− Mice
Many aspects of mRNA polyadenylation are different in mammalian male germ cells from somatic cells. In germ cells, RNA signals differ [28], alternative sites are used [26,27,29], and germ cell variants of core proteins are involved [16,30,31]. One such variant protein is τCstF-64 (gene symbol: Cstf2t), which is a paralog of the CstF-64 polyadenylation protein [18]. τCstF-64 is essential for spermatogenesis, as male Cstf2t knockout mice are infertile [21,22,23]. Most likely, τCstF-64 is necessary because it is involved in polyadenylation of genes critical for postmeiotic germ cell development. To examine the differences in mRNA species expressed in wild type and Cstf2t−/− mouse testes, we performed high-throughput sequencing of cDNAs using the 454 method (Fig. 1A) from 25-day postpartum (dpp) wild type (~65,000 reads) and Cstf2t−/− (~90,000 reads) mouse testis RNAs (25 dpp was chosen because effects of Cstf2t are greatest at that age [21]). High-throughput cDNA sequencing was used because it offered an unbiased sampling of expressed genome sequences. A similar percentage of reads could be uniquely mapped to the mouse genome for each genotype (Fig. 1B). However, significant differences were found between wild type and Cstf2t−/− mice in the fraction of reads mapped to different regions of the genome (Fig. 1C). In Cstf2t−/− mouse testis RNA, exonic regions were dramatically less represented in cDNAs, and intronic and intergenic regions were greatly more represented. This indicated large-scale changes in transcriptional and mRNA processing patterns in these mice. Twelve percent of the reads from KO were mapped to genomic regions annotated as repetitive elements, whereas only 5% of the reads from WT were, consistent with our other results.
Distal Polyadenylation Sites Are Used More Frequently in RNA From Testes of Cstf2t−/− Mice
Recent surveys have revealed changes in patterns of poly(A) site usage from more distal sites to more proximal sites in proliferating cells such as cancer cells [26,32,33]. The relative usage of distal poly(A) site (RUD) score is a method for determining overall polyadenylation site use from microarray data [26,27]. Examining both wild type and Cstf2t−/− samples, we note that RUD scores decrease progressively from 17 to 85 dpp (Fig. 2). This suggests that 3′ UTRs of testicular mRNAs shorten as these animals age. Cstf2t−/− mouse samples do not differ significantly from wild type samples at 17 dpp (Fig. 2), when appearance of τCstF-64-expressing pachytene spermatocytes is minimal [19]. However, at 22 dpp, when testis composition of pachytene spermatocytes increases [34], RUD scores of Cstf2t−/− mouse testis RNA are increased relative to wild type RNA (Fig. 2). The differences are more pronounced at 25 dpp and in adulthood (85 dpp), demonstrating that lack of τCstF-64 partially ameliorates the progressive decrease of 3′ UTR length seen in wild type mice. We note that changes in the cell types present in Cstf2t−/− mice also contribute to the differences in adult mice [18]. This suggests that τCstF-64 is responsible for the progressive use of proximal polyadenylation sites in male germ cell development, and that in its absence more distal sites are used. A second hypothesis is that, in the absence of τCstF-64, transcription reads through distal polyadenylation sites and continues to intergenic regions. This latter hypothesis is supported by the finding of increased representation of intergenic and intronic genomic regions in Cstf2t−/− mouse testis (Fig. 1C). However, these hypotheses are not mutually exclusive, and both might be in effect.
ISGs generally consist of expressed retroposons, cDNA copies of existing genes that are reinserted into the genome [35,36]. ISGs are most prominently expressed in mammalian testis, most likely for spermatogenesis-specific functions and to compensate for MSCI [37,38]. This leads us to hypothesize that an important function of τCstF-64 is to control efficient polyadenylation of ISG mRNAs. Cstf2t is itself an expressed retroposon and thus an ISG [17,18]. Because most testis-expressed ISGs (including τCstF-64) arose around 165 million years ago when mammals diverged from archosaurs [35,39], germ cell-expressed ISGs must have co-evolved with τCstF-64. τCstF-64 therefore could accumulate specialized functions in polyadenylation while the more broadly expressed CstF-64 could maintain its more generalized functions. An additional possibility is that mRNA processing involving τCstF-64 would promote nuclear export of these non-intron-containing transcripts [36].
Polyadenylation Read-Through Increases for ISGs in Cstf2t−/− Mouse Testis
A leading hypothesis for how lack of τCstF-64 would affect polyadenylation and gene expression is that transcription will fail to terminate at typical polyadenylation sites, and instead will continue for up to several kilobases downstream [8,40,41]. Consequently, in Cstf2t−/− testes, we expected increased read-through of affected transcripts. We used quantitative reverse transcriptase-mediated PCR (qRT-PCR) to measure transcript abundance both upstream and downstream of reported sites of polyadenylation for two ISGs, Cetn1 and Tssk6 (Fig. 3B, C). These experiments were performed using both oligo(dT)-primed and random oligonucleotide-primed cDNAs with identical results. We chose to show the random-primed results because they lack bias to potential changes in poly(A) addition. In Cstf2t−/− mouse testis RNA, we observed over 3-fold greater read-through in Cetn1 gene transcription and nearly 1.7-fold greater read-through in Tssk6 transcription than in wild type testis RNA, implying that 3′ end formation and transcription termination did not occur in the normal location for these genes. These data support a model in which τCstF-64 (and, by extension, CstF-64) is necessary for accurate placement of the 3′ end processing machinery at polyadenylation sites.
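The read-through measurement (downstream signal relative to upstream signal, normalized to 1.0 in wild type) can be expressed as a short calculation. This is a sketch under stated assumptions, not the authors' analysis: it assumes product doubles every PCR cycle, so that a Ct difference converts to a ratio of 2 raised to that difference, and the function names and Ct values are hypothetical.

```python
from typing import Tuple

CtPair = Tuple[float, float]  # (Ct with upstream primers, Ct with downstream primers)


def read_through_ratio(ct_upstream: float, ct_downstream: float) -> float:
    """Downstream transcript abundance relative to upstream, within one sample.

    Assumes doubling per cycle: a later (higher) downstream Ct means less
    transcript extends past the polyadenylation site.
    """
    return 2.0 ** (ct_upstream - ct_downstream)


def read_through_fold_change(knockout: CtPair, wild_type: CtPair) -> float:
    """Read-through in the knockout with the wild type value set to 1.0."""
    return read_through_ratio(*knockout) / read_through_ratio(*wild_type)


# Illustrative Ct values only, not data from the study.
wt_cts = (24.0, 29.0)  # downstream product appears 5 cycles later
ko_cts = (24.0, 27.0)  # downstream product appears 3 cycles later
```

With these values `read_through_fold_change(ko_cts, wt_cts)` is 4.0, meaning four times as much transcript extends past the polyadenylation site in the knockout as in wild type.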

LINE-1 Sequences Are More Abundant in Cstf2t−/− Mouse Testis RNA
We examined the sequences of intergenic (Fig. 4A), intronic (Fig. 4B), non-uniquely mapped (Fig. 4C) and unmapped (Fig. 4D) reads. These revealed significant overrepresentation of transposable elements in Cstf2t−/− mice, primarily LINE-1 sequences. Other repetitive DNA elements such as short interspersed nuclear elements (SINEs) and LTRs were also affected, although to a much lesser extent (Figs. 4A-D). These differences occur in both intergenic (Fig. 4A) and intronic regions (Fig. 4B). This suggests that LINE gene sequences are most highly represented in the intergenic and intronic regions that are affected by the Cstf2t knockout. There are two possible interpretations of this finding. The first is that LINE-1 sequence elements, including non-functional and fragmented sequences, are present in these genomic regions and thus are more highly expressed when those regions are represented. The second interpretation is that loss of τCstF-64 somehow activates or de-represses LINE gene expression, perhaps by altering chromatin structure or by affecting LINE mRNA polyadenylation directly. This second interpretation is interesting, and we are designing future experiments to distinguish these hypotheses.

Conclusions
Perhaps it was not surprising to see global changes in genome expression in the testes of Cstf2t knockout mice. Polyadenylation has long been linked to transcription and termination. Recent studies have even shown a role for both CstF-64 and τCstF-64 in histone mRNA expression in human epithelial cells [42], which might have further impact on euchromatin structure and expression. While those authors saw a strong effect of τCstF-64 on histone mRNA expression, we did not see evidence for a similar effect on germ cell histone variants in Cstf2t−/− mice (with our methodology, we would have detected only the polyadenylated variants [43]). Again, detection of effects on the class of ISGs should not have been surprising. As a class, these genes co-evolved with τCstF-64, and therefore might be assumed to have a full or partial requirement for it.
More surprising was the finding that LINE sequences, but not other repetitive sequences such as SINEs, were over-represented in Cstf2t−/− mouse testis. Two possible explanations come to mind. The first is that these are LINE fragments within intronic and intergenic sequences that are not full-length or active [44] and are therefore not physiologically relevant. More relevant, however, is the second possibility: that the absence of τCstF-64 has relieved repression of LINE mRNA expression in germ cells. An earlier report demonstrated LINE mRNA truncation by alternative polyadenylation in mouse fibroblasts [45]. This suggests the exciting possibility that τCstF-64 plays a role in control of LINE mRNA levels in germ cells. Future experiments will differentiate these possibilities.

Supporting Information
Table S1 Primers used for quantitative RT-PCR. (DOC)