The authors have declared that no competing interests exist.
Conceived and designed the experiments: YF XW AX. Performed the experiments: YG YS JL LW. Analyzed the data: YF. Wrote the paper: YF YG AX.
Gene transcription with alternative polyadenylation (APA) sites produces mRNA isoforms that may encode different proteins or harbor different 3'UTRs. APA plays an important role in regulating gene expression networks in various physiological processes, such as development, immune responses and cancer. Several library construction methods have been developed to study APA by high-throughput sequencing. However, the high RNA input required and the time-consuming nature of current methods have limited APA studies of samples that are difficult to obtain. Here, we describe a new method based on our SAPAS in combining
Transcription termination by RNA Pol II (RNAPII) in eukaryotes involves recognition of the poly(A) signals by cleavage and polyadenylation complex factors. More than half of the genes harbor alternative polyadenylation (APA) sites. Tandem APA sites located in the 3'UTR can lead to transcription of different mRNA isoforms with various 3'UTRs. Various biological effects associated with tandem APA have been investigated, including cancer transformation [
Recognizing the important function of APA, several groups [
A breast cancer cell line MCF7 (a gift from Dr. Erwei Song’s lab, Department of Breast Surgery, No. 2 Affiliated Hospital, Sun Yat-sen University, Guangzhou, China) was cultured in Dulbecco’s modified Eagle’s medium (DMEM), and a human normal mammary epithelial cell line MCF10A (a gift from Qiang Liu’s lab, State Key Laboratory of Oncology in South China, Sun Yat-sen University, Guangzhou, China) was cultured in monolayer in DMEM/F12. Total RNA was extracted from the cells using QIAGEN RNeasy® Mini kits, and maintained in RNase-free water. The quality of the samples was checked with agarose gel electrophoresis and OD260/280 ratio greater than two.
Twelve libraries with different barcodes were pooled and sequenced on a HiSeq 2500 in rapid run mode. To avoid the problem caused by the homopolymeric T stretch in the first 20 bases, 20 dark cycles were run before a further 55 bp were sequenced.
The raw reads were mapped to the human genome (hg19) using Bowtie [
A gene dataset was constructed by selecting, for each stop codon, the gene with the longest 3’UTR from the UCSC known genes. The merged poly(A) sites were mapped to the 3'UTRs of this new dataset, and only the poly(A) sites mapped to a single 3’UTR were used for further analysis. The expression level of each poly(A) site was calculated as the number of raw reads scaled to the sample with the lowest number of reads.
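The scaling step described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that "scaled to the sample with the lowest number of reads" means multiplying each sample's counts by the ratio of the smallest library total to that sample's total. The function name and the input layout are hypothetical.

```python
def scale_to_smallest_library(counts):
    """Scale per-site read counts so every sample matches the smallest library.

    counts: dict mapping sample name -> dict mapping poly(A) site id -> raw reads.
    Assumption: scaling uses each sample's total read count over the listed sites.
    """
    totals = {sample: sum(sites.values()) for sample, sites in counts.items()}
    if any(total == 0 for total in totals.values()):
        raise ValueError("every sample needs at least one read")
    smallest = min(totals.values())
    scaled = {}
    for sample, sites in counts.items():
        factor = smallest / totals[sample]  # 1.0 for the smallest sample
        scaled[sample] = {site: reads * factor for site, reads in sites.items()}
    return scaled


# Hypothetical counts for two samples and two poly(A) sites.
example = {
    "sample_A": {"site_1": 10, "site_2": 30},
    "sample_B": {"site_1": 20, "site_2": 60},
}
scaled_example = scale_to_smallest_library(example)
```

After scaling, every sample has the same total, so expression levels of a given site can be compared directly between samples.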
To reduce the variance of 3’UTR length across genes, we standardized the length by designating the longest 3’UTR as 1.0 and calculated the weighted mean of 3’UTR length with multiple APA sites for each gene.
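A minimal sketch of this standardization is given below. It assumes that the weights are the expression levels of the individual APA sites and that the longest 3’UTR is the one ending at the most distal site supplied. Neither detail is spelled out in the text, and the function name is hypothetical.

```python
def weighted_standardized_utr_length(sites):
    """Weighted mean of standardized 3'UTR length for one gene.

    sites: list of (utr_length_nt, expression) pairs, one per tandem APA site.
    Each length is divided by the longest length (so the longest equals 1.0),
    then averaged using expression as the weight (an assumption of this sketch).
    """
    longest = max(length for length, _ in sites)
    total_expression = sum(expression for _, expression in sites)
    if longest == 0 or total_expression == 0:
        return None  # undefined when there is no length or no expression
    weighted_sum = sum(expression * length / longest for length, expression in sites)
    return weighted_sum / total_expression


# Hypothetical gene with a proximal site (500 nt, 30 reads)
# and a distal site (1000 nt, 10 reads).
value = weighted_standardized_utr_length([(500, 30), (1000, 10)])
```

For this hypothetical gene the result is (30 × 0.5 + 10 × 1.0) / 40 = 0.625. Values closer to 0 indicate preferential use of proximal sites, and values closer to 1 indicate preferential use of distal sites.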
A test of linear trend alternative to independence [
The current methods of poly(A) sequencing, including our SAPAS method [
The method mainly includes the steps of RNA fragmentation, two rounds of reverse transcription, PCR and size selection. See the
Using IVT-SAPAS with 200 ng of initial total RNA, we profiled the poly(A) sites of a human mammary epithelial cell line (MCF10A) and a breast cancer cell line (MCF7) on a HiSeq 2500 in rapid mode. Three biological replicates were performed for each cell line. On average, 27.9 million raw reads were obtained for each sample (
To check the poly(A) site sequencing efficacy of IVT-SAPAS, we first annotated the reads and poly(A) sites to known poly(A) sites, 3'UTRs, introns, CDS, 1 kb downstream, and noncoding gene and intergenic regions. We found that 90.5% of reads could be mapped to poly(A) sites in the UCSC and Tian's poly(A) databases (
The reads and poly(A) sites were first mapped to the known sites of the UCSC and Tian's poly(A) databases, and the unmapped ones were annotated to 3'UTR, intron, CDS, 1kb_downstream and intergenic regions. A) Pie chart of the mapping locations of reads; B) Distribution of the mapping locations of poly(A) sites; C) Distribution of poly(A) signals; D) Nucleotide composition flanking poly(A) sites.
Some genes have multiple stop codons owing to the interplay of alternative splicing and APA, so we constructed a gene dataset by selecting the known genes with the longest 3’UTR for each stop codon from the UCSC known genes database. We mapped the poly(A) sites to the 3'UTRs of this new dataset. In total, 10,341 genes were mapped with 9,812 UCSC known gene clusters. We created a scatter plot and calculated pairwise correlation coefficients (R²) of poly(A) site expression levels (
The 3'UTRs with multiple poly(A) sites were defined as tandem 3'UTRs. These poly(A) sites were tandem APA sites. The data were used for the subsequent analysis. We first calculated the mean standardized 3'UTR length of each gene in each sample. Both box (
We found 511 UCSC known gene clusters that had different stop codons with reads mapped to their 3'UTR regions, which can lead to truncated proteins. We summed the poly(A) reads of each gene in each sample to obtain its expression level. Then, using the mean expression level of each cell line, we compared gene expression between MCF7 and MCF10A and found 72 significant gene clusters (Fisher's exact test, p≤0.05 after Bonferroni correction) (
One of the examples switched to distal stop codon in MCF7 is
MCF7 preferentially uses the full-length transcript compared with MCF10A. The inset shows qRT-PCR validation (p<0.01, t-test). Two pairs of primers (proximal and distal) were used to measure the expression levels of the mRNA isoforms.
Here, based on the SAPAS method, we developed a new APA sequencing method by integrating
At least 1 μg of total RNA is usually needed to prepare an RNA-seq library with the Illumina standard protocol. The average length of human mRNA is about 2 kb, and only the 200 bp upstream of the poly(A) tail are usually amplified for poly(A) site sequencing, so these methods usually require more total RNA than RNA-seq. Indeed, tens of μg of total RNA are required by previous methods to construct a poly(A) site sequencing library [
In this method, we add the Illumina adaptors directly by reverse transcription and PCR, which makes the experiment less difficult than 3P-seq [
We found genome-wide shorter 3'UTRs in the cancer cell line than in the normal cell line, which is consistent with previous findings [
The poly(A) sites were classified into eight classes as described in the text.
(PDF)
(PDF)
The red lines indicate the diagonals.
(PDF)
(DOCX)
p values were corrected by the Bonferroni method. The genes confirmed by qRT-PCR are highlighted in yellow.
(DOCX)
(DOCX)
(DOCX)
The authors are grateful to Dr. Jingde Zhu and Xueqin Wang for their suggestions. This work was supported by National High-tech R&D Program of China (863 Program) (No. 2012AA02A520 to W.X.), National Natural Science Foundation of China (No. 30730089 to A.X. and No. 91331113 to Y.F.), and Committee of Science and Technology Innovation of Shenzhen (No. JCYJ20120831173744342 to Y.F.). A.X. is the recipient of an “Outstanding Young Scientist Award” from the National Natural Science Foundation.