Conceived and designed the experiments: CS. Performed the experiments: CS. Analyzed the data: CS. Contributed reagents/materials/analysis tools: CS. Wrote the paper: CS CG. Assisted with interpretation of results: CG.
The authors have declared that no competing interests exist.
In eukaryotes, mRNA transcripts of protein-coding genes in which an intron has been retained in the coding region normally contain premature stop codons and are therefore degraded through the nonsense-mediated mRNA decay (NMD) pathway. Evidence that this is an important and conserved quality-control mechanism comes from selective pressure for in-frame stop codons in introns and from a depletion of introns whose length is a multiple of three. Yet recent reports have revealed that the efficiency of NMD varies across tissues and between individuals, with important clinical consequences.
Using previously published Affymetrix exon microarray data from cell lines genotyped as part of the International HapMap project, we investigated whether there are heritable, inter-individual differences in the abundance of intron-containing transcripts, potentially reflecting differences in the efficiency of NMD. We identified intronic probesets using EST data and report evidence of heritability in the extent of intron expression in 56 HapMap trios. We also used a genome-wide association approach to identify genetic markers associated with intron expression. Among the top candidates was a SNP in the
While we caution that some of the apparent inter-individual difference in intron expression may be attributable to different handling or treatment of the cell lines, we hypothesize that there is significant polymorphism in the process of NMD, resulting in heritable differences in the abundance of intronic mRNA. Part of this phenotype is likely to be due to a polymorphism in a gene on human chromosome 3 that encodes a decapping enzyme.
The transcriptome of higher eukaryotes is complex and diverse, with multiple isoforms present for most genes, resulting from heterogeneity at several stages of the generation and processing of RNA from transcription initiation to splicing and polyadenylation. The diversity of the splicing step has been intensively studied. Microarrays that target splice junctions were used to demonstrate that the majority of human genes are alternatively spliced
The rate at which mRNA of a transcript comes into the system is just one part of the equation determining mRNA levels. Different mRNA isoforms have different stabilities and can have decay rates ranging over orders of magnitude
There is good evidence of variability in the efficiency of NMD across tissues
In this study, we set out to determine whether there is evidence of transcriptome-wide differences in the abundance of intron-containing mRNA isoforms observed in different cell lines, using exon microarray data from published studies. We identified putative intronic probesets on the Affymetrix Human Exon 1.0 ST microarray and estimated the relative frequency with which these probesets were included in the final transcript. Intron-containing transcripts may be unprocessed, partially processed or fully processed mRNA in which one or more introns have been retained. Differences in the abundance of intron-containing transcripts may, therefore, reflect differences in the efficiency of splicing or in the efficiency with which intron-containing transcripts are degraded through NMD.
The microarray data included 56 complete trios from the HapMap
We used data generated with the Affymetrix Human Exon 1.0 ST array from human cell lines to investigate heritability in the relative expression of introns in humans. Because the probes on the array target computationally predicted exons as well as known exons and exons derived from ESTs and other sources, the targeted regions are likely to include a proportion of introns as well as correctly predicted exons. From the approximately 1.4 million probesets on the array, we identified 269,007 that are likely to lie within introns, as described in
To investigate the possibility of transcriptome-wide differences in the proportion of intron-containing transcripts, we restricted the analysis to those intronic probesets that were detectable above background (p<0.05) in at least 20 of the 176 HapMap individuals and for which at least one core probeset in the corresponding gene was detectable above background in the majority of the individuals. For each of these probesets, we ranked its normalized intensities across all of the arrays. We then obtained a summary statistic for each array by averaging the ranks of that array across all intronic probesets. This intron expression summary statistic showed substantial variability across arrays, indicating that in some of the cell lines the normalized intensity of intronic probesets (i.e. their relative inclusion in the mature transcript) is generally lower than in other cell lines. Values ranged from 56.6 (indicating an array for which, on average, the normalized intensity of intronic probesets ranked 56.6th out of 176 arrays, i.e. in the lower 32.2%) to 119.4 for the array with the highest value (119.4/176 = 67.8%). The distribution of the summary statistic across arrays is provided as supplementary
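The rank-based summary statistic described above can be illustrated with a minimal sketch. It assumes a small matrix of normalized intensities (rows are intronic probesets, columns are arrays) that has already been filtered for detection above background. The function names, the toy values and the use of average ranks for ties are assumptions made for illustration. They are not taken from the original analysis.

```python
from statistics import mean


def rank_across_arrays(values):
    """Return 1-based ranks of one probeset's values across arrays.

    Tied values receive the average of the ranks they span
    (an assumption; the source does not state how ties were handled).
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1  # mean of 1-based positions pos+1..end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = avg_rank
        pos = end + 1
    return ranks


def intron_expression_statistic(normalized):
    """normalized[p][a] = normalized intensity of probeset p on array a.

    Returns one value per array: the mean rank of that array
    over all intronic probesets.
    """
    per_probeset_ranks = [rank_across_arrays(row) for row in normalized]
    n_arrays = len(normalized[0])
    return [mean(r[a] for r in per_probeset_ranks) for a in range(n_arrays)]


# Toy data: 3 intronic probesets measured on 3 arrays (hypothetical values).
toy = [
    [0.10, 0.30, 0.20],
    [0.05, 0.50, 0.25],
    [0.40, 0.60, 0.20],
]
stats = intron_expression_statistic(toy)
```

With 176 arrays, a value near 88.5 would indicate an array whose intronic probesets rank, on average, in the middle of the distribution.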
Although there was evidence of substantial differences in relative intron expression across arrays, it was not clear to what extent these reflected biologically significant differences between the samples or merely differences in the arrays or sample preparations. To investigate this, we obtained a second set of data, consisting of exon microarrays applied to members of the CEPH 1444 pedigree by Kwan
a) Stripchart showing technical and biological replicates for two lymphoblast cell lines. Separate cell passages are shown in different colours. b) Boxplots and superimposed stripcharts of biological replicates (separate cell passages) of four cell lines.
To search for evidence of heritability in relative intron expression we returned to the exon array data from the HapMap trios
Points in the scatter plot of offspring values of the summary statistic against mean of the parent values are colored in blue for CEU trios and black for YRI trios. The estimated regression line (from the combined data) together with upper (green) and lower (red) bounds on the regression line estimates are shown.
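A regression of offspring values on the mean of the parent values, as plotted in this figure, can be sketched as follows. Under standard quantitative-genetic assumptions (no shared environment and no batch effects, which may not hold for these data), the slope of this regression is an estimate of narrow-sense heritability. The function name and the toy trio values are hypothetical. The bounds drawn in the figure are not reproduced here.

```python
def midparent_offspring_regression(trios):
    """Least-squares regression of child value on mid-parent value.

    trios: list of (father, mother, child) summary-statistic values.
    Returns (slope, intercept).
    """
    x = [(father + mother) / 2 for father, mother, _ in trios]
    y = [child for _, _, child in trios]
    n = len(trios)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical trios constructed so that child = 0.5 * mid-parent + 40.
toy_trios = [(80, 100, 85), (60, 80, 75), (100, 120, 95)]
slope, intercept = midparent_offspring_regression(toy_trios)
```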
As an additional check, we compared the intron expression statistic between the two parents of each trio and found that the values were significantly correlated (r = 0.44; p = 0.0007), suggesting that a batch effect may contribute to the apparent heritability of the intron expression summary statistic. When we analyzed the CEU and YRI samples separately, the correlation between parents was significant only for the YRI trios (p = 0.0003 for YRI and p = 0.23 for CEU). The YRI parents have been found to be more closely related to one another than random pairs of YRI samples, whereas this was not the case for the parents of the CEU trios
We used a genome-wide association approach to search for loci that are associated with the relative intron expression statistic. To avoid the inclusion of close relatives, we took only the data from the parents of the HapMap trios and carried out an additive test of association between SNPs genotyped as part of the HapMap project and the intron expression summary statistic. We carried out the test separately on the CEU and YRI samples, to avoid the effects of population structure on the analysis. Histograms of the distributions of p-values obtained from individual tests of association and quantile-quantile (Q-Q) plots of log10 of the p-values against log10 of random draws from the uniform distribution are shown in
Quantile-quantile plots of log10 of the p-values from the CEU (c) and YRI (d) association tests against log10 of random values, drawn from the uniform distribution.
Successive chromosomes are shown in alternating colours on the plot. Results from the CEU and YRI populations are shown in panels a and b, respectively.
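An additive test of association of the kind described above is commonly implemented as a linear regression of the phenotype on the number of copies of one allele (0, 1 or 2) carried by each individual. The sketch below illustrates that general approach for a single SNP. It is not the software or exact model used in the study, and the function name and toy data are hypothetical. A two-sided p-value would be obtained from the returned t statistic using a t distribution with n − 2 degrees of freedom (for example with `scipy.stats.t.sf`).

```python
import math


def additive_association(dosages, phenotype):
    """Regress phenotype on allele dosage (0/1/2) for one SNP.

    Returns (beta, standard_error, t_statistic), where beta is the
    estimated change in phenotype per additional allele copy.
    """
    n = len(dosages)
    mean_g = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((g - mean_g) ** 2 for g in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(dosages, phenotype))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_g
    rss = sum((y - alpha - beta * g) ** 2 for g, y in zip(dosages, phenotype))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    t_stat = beta / std_err if std_err > 0 else math.inf
    return beta, std_err, t_stat


# Hypothetical data: six unrelated individuals, two per genotype class.
toy_dosages = [0, 0, 1, 1, 2, 2]
toy_phenotype = [80, 84, 90, 92, 99, 101]
beta, std_err, t_stat = additive_association(toy_dosages, toy_phenotype)
```

Running the test separately within each population, as in the text, avoids confounding allele-frequency differences between populations with differences in the phenotype.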
Markers associated with the intron expression phenotype are shown in
SNP | Location | Function | Associated gene | Population | P-value |
rs7088129 | 10: 84230104 | Intronic | NRG3 | CEU | 1.9×10−9 |
rs659554 | 10: 84341085 | Intronic | NRG3 | CEU | 1.2×10−8 |
rs2115904 | 15: 29424204 | Intergenic | None | YRI | 9.2×10−9 |
rs4878127 | 9: 89577039 | Near 5′ | CTSL3 | YRI | 7.1×10−8 |
rs17053466 | 9: 89579157 | Intronic | CTSL3 | YRI | 7.1×10−8 |
rs10512189 | 9: 89580577 | Intronic | CTSL3 | YRI | 7.1×10−8 |
rs9311496 | 3: 53347118 | Intronic | DCP1A | YRI | 9.6×10−8 |
rs16889633 | 6: 24811421 | Intergenic | None | YRI | 1.1×10−7 |
rs686394 | 12: 9966342 | Intergenic | None | YRI | 1.3×10−7 |
rs17101452 | 14: 74157738 | Intergenic | None | YRI | 2.0×10−7 |
rs888419 | 14: 74158848 | Intergenic | None | YRI | 2.0×10−7 |
rs4899337 | 14: 69740936 | Intergenic | None | YRI | 3.5×10−7 |
rs13329672 | 15: 56487229 | Intergenic | None | YRI | 3.5×10−7 |
rs410509 | 3: 5216309 | Synonymous | EDEM1 | YRI | 3.5×10−7 |
rs377120 | 3: 5216223 | Intronic | EDEM1 | YRI | 3.6×10−7 |
rs11201378 | 10: 86856283 | Intergenic | None | YRI | 3.7×10−7 |
rs12244919 | 10: 86860618 | Intergenic | None | YRI | 3.7×10−7 |
rs12252660 | 10: 86860718 | Intergenic | None | YRI | 3.7×10−7 |
rs11201382 | 10: 86862554 | Intergenic | None | YRI | 3.7×10−7 |
The table shows the combined results of the association test in the CEU and YRI populations.
As a further test for evidence of a contribution of NMD to the differences in intron expression observed between individuals, we identified a set of unprocessed pseudogenes expressed above background (p<0.05) in at least 10 cell lines. Because NMD is involved in degrading pseudogenes
In conclusion, we found evidence that the normalized expression intensity, averaged across intronic probesets, shows reproducible differences between individuals and report that this appears to be a heritable trait in humans. However, we caution that this analysis is subject to any batch effects relating to the collection and treatment of the cell lines and report a correlation between parents of the YRI HapMap trios that we were not able to explain fully. This is the first study to explore transcriptome-wide differences between human individuals in the types of mRNA isoforms observed and it points to the contribution of
Raw microarray intensity data generated using the Affymetrix GeneChip Human Exon 1.0 ST array from 176 HapMap cell lines by Huang
For each probeset from the exon array, we counted the number of times the probeset fell within the spliced-out portion of an EST, and was thus putatively in an intron. Spliced-out portions of ESTs were inferred from gaps of at least 50 bp in the alignments of ESTs to the genome, flanked by upstream and downstream aligned blocks of at least 20 bp. We also counted the number of times the probeset fell within the exonic (i.e. aligned) portion of an EST. Probesets that occurred in intron-like gaps in the genomic alignments of at least 10 ESTs and that occurred at least five times as frequently in gaps as in aligned blocks were designated as intronic. In some cases, these probesets may overlap skipped exons that are included infrequently in the mature transcript. However, given the large number of probesets identified in intron-like alignment gaps (269,007) and given the much greater length of introns than exons, the majority of these probesets are likely to be non-exonic.
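The thresholds in this paragraph can be expressed as a short sketch. It assumes each EST alignment is represented as a sorted list of half-open genomic blocks `(start, end)` and that per-probeset counts of gap and block occurrences have already been tallied. These representations and all names are assumptions made for illustration, not the original pipeline.

```python
MIN_GAP_BP = 50        # minimum length of an intron-like alignment gap
MIN_FLANK_BP = 20      # minimum length of each flanking aligned block
MIN_GAP_ESTS = 10      # minimum number of ESTs with the probeset in a gap
MIN_GAP_TO_BLOCK = 5   # gap count must be at least 5x the aligned-block count


def intron_like_gaps(blocks):
    """Return (start, end) gaps of one EST alignment that look like introns.

    blocks: sorted, non-overlapping half-open intervals (start, end).
    """
    gaps = []
    for (s1, e1), (s2, e2) in zip(blocks, blocks[1:]):
        long_gap = s2 - e1 >= MIN_GAP_BP
        good_flanks = (e1 - s1 >= MIN_FLANK_BP) and (e2 - s2 >= MIN_FLANK_BP)
        if long_gap and good_flanks:
            gaps.append((e1, s2))
    return gaps


def is_intronic(gap_count, block_count):
    """Apply the two count-based criteria to one probeset."""
    return (gap_count >= MIN_GAP_ESTS
            and gap_count >= MIN_GAP_TO_BLOCK * block_count)


# Hypothetical alignment: a 250 bp gap with long flanks, then a 10 bp gap.
example_gaps = intron_like_gaps([(100, 150), (400, 460), (470, 480)])
```

A probeset seen in gaps of 12 ESTs and in aligned blocks of 2 ESTs would pass both criteria, whereas one seen in 12 gaps and 3 aligned blocks would fail the five-fold ratio.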
We calculated normalized probeset intensity by dividing the probeset intensity by the estimated intensity of the corresponding meta-probeset (i.e. transcript)
We obtained genotype data from the CEU and YRI trios from HapMap
Distribution of the summary statistic of intron expression across cell lines.
The authors would like to thank Andrew Flaus and three anonymous reviewers for comments that resulted in improvements to the manuscript.