Cryptic Variation in the Human Mutation Rate

Alan Hodgkinson; Emmanuel Ladoukakis; Adam Eyre-Walker

doi:10.1371/journal.pbio.1000027

Open Access

Peer-reviewed

Research Article

Alan Hodgkinson,

Affiliation Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom
Emmanuel Ladoukakis,

¤ Current address: Department of Biology, University of Crete, Iraklio, Greece

Affiliation Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom
Adam Eyre-Walker

*To whom correspondence should be addressed. E-mail: a.c.eyre-walker@sussex.ac.uk

Affiliation Centre for the Study of Evolution, School of Life Sciences, University of Sussex, Brighton, United Kingdom
Published: February 3, 2009
https://doi.org/10.1371/journal.pbio.1000027

Reader Comments

Paralogous sequences

Posted by AdamEyreWalker on 26 Oct 2010 at 16:36 GMT

It has been pointed out to us that the excess of coincident SNPs could be due to paralogous sequences that have been incorrectly assembled onto the same location. Substitutions between such paralogous sequences would appear to be SNPs. If the same mis-assembly error occurred in both humans and chimpanzees (and macaques), and the substitutions occurred before the species split, then an excess of coincident SNPs would be generated. Musumeci et al. [1] have recently estimated that ~8.3% of all human single SNPs in dbSNP may be artifacts due to this problem.

First, we note that the pattern of coincident SNPs is not consistent with the mis-assembly hypothesis. Under this hypothesis we would expect an excess of transition coincident SNPs, since transitions dominate the process of mutation and substitution, but we observe a stronger excess of transversions, and in particular of AT/AT coincident SNPs.

However, to explore the issue further we performed two analyses. In the first we repeated the analysis of Musumeci et al. [1] on our coincident SNPs. They BLASTed human SNPs from dbSNP against the human genome and considered cases in which the SNP mapped to two or more locations, where a successful match was defined as one in which at least 20% of the full-length SNP sequence had at least 90% identity. The SNP was considered to be potentially artifactual if the two bases involved in the SNP were found in the two different mapped locations at the site of the putative SNP. We repeated this analysis using coincident SNPs and found that, of our 11571 coincident SNPs, 9611 mapped to a unique location; 233 had multiple matches but did not contain the nucleotides involved in the SNP; 269 had multiple matches to the reference genome, with the nucleotides involved in the SNP found at the site of the SNP in the two locations; and 95 did not match the reference. We therefore estimate that at most 2.3% of coincident SNPs are due to known duplicated sequences. This analysis suggests that known paralogy can explain only a very small fraction of our coincident SNPs.
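The 2.3% upper bound can be reproduced from the counts above. The following is a minimal Python sketch, assuming the bound is simply the number of multiply mapped SNPs that carry both alleles divided by all coincident SNPs; the variable names are illustrative and not taken from the original analysis.

```python
# Counts as reported in the comment above.
total_coincident = 11571
multi_match_with_alleles = 269  # multiple matches, both SNP nucleotides found at the site

# Assumed definition of the upper bound: share of all coincident SNPs
# that could be explained by known duplicated sequence.
fraction_paralogous = multi_match_with_alleles / total_coincident
print(f"{100 * fraction_paralogous:.1f}%")  # prints "2.3%"
```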

However, this analysis does not allow us to assess the impact of duplicated regions that are yet to be identified. We therefore considered whether the minor allele frequency (MAF) of the human SNP in each coincident SNP was greater than that of randomly selected human SNPs from dbSNP. If a disproportionate number of coincident SNPs are due to mis-assembly then they should have a higher MAF, because an artifactual SNP generated by substitution between two duplicated regions should have a MAF of 50%. We were able to obtain the MAF for 7801 of our coincident SNPs; these have a mean MAF of 0.274 (95% CI: 0.270, 0.277). The same number of randomly chosen SNPs have a mean MAF of 0.271 (95% CI: 0.267, 0.274); the difference is not significant (t-test, p = 0.241). Again there is no evidence that paralogy is contributing significantly to the excess of coincident SNPs.
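As a rough consistency check on the MAF comparison, the test statistic can be approximated from the summary figures alone. The Python sketch below is not the original t-test, which was run on the per-SNP data. It assumes the intervals are symmetric normal-theory 95% intervals (so each standard error is the half-width divided by 1.96), treats the two samples as independent, and uses a normal approximation, which is reasonable with 7801 SNPs per group. The helper name `se_from_ci` is hypothetical.

```python
import math

def se_from_ci(lower, upper, z_crit=1.96):
    """Standard error implied by a symmetric 95% confidence interval (assumption)."""
    return (upper - lower) / (2 * z_crit)

# Reported means, with standard errors backed out from the reported intervals.
mean_coincident, se_coincident = 0.274, se_from_ci(0.270, 0.277)
mean_random, se_random = 0.271, se_from_ci(0.267, 0.274)

# Two-sample z statistic for the difference in mean MAF.
diff = mean_coincident - mean_random
se_diff = math.sqrt(se_coincident**2 + se_random**2)
z = diff / se_diff

# Two-sided p-value under the normal approximation.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))
print(f"z = {z:.2f}, p = {p_two_sided:.2f}")
```

Because the inputs are rounded to three decimal places, the result should only be expected to land near the reported p = 0.241, not to match it exactly.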

Acknowledgements: We are very grateful to Richard Durbin, Ewan Birney, Peter Keightley, Philip Johnson and Ines Hellman for helpful discussion.

1. Musumeci L, Arthur JW, Cheung FS, Hoque A, Lippman S, et al. (2010) Single nucleotide differences (SNDs) in the dbSNP database may lead to errors in genotyping and haplotyping studies. Hum Mutat 31: 67-73.

No competing interests declared.

RE: Paralogous sequences

AdamEyreWalker replied to AdamEyreWalker on 12 Nov 2010 at 09:53 GMT

There is a slight mistake in the report above: I incorrectly stated that 95 coincident SNPs did not match the reference human genome, whereas in fact the number was 1363. Therefore at most 2.6% of the SNPs that match the reference genome are due to paralogy. Our conclusions remain unaffected.
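The revised figure can be checked with the same kind of arithmetic. This Python sketch assumes the denominator is the total number of coincident SNPs minus the corrected number that did not match the reference; the variable names are illustrative.

```python
# Counts from the original post and the correction in this reply.
total_coincident = 11571
unmatched = 1363                # corrected number of SNPs not matching the reference
multi_match_with_alleles = 269  # multiple matches, both SNP nucleotides found at the site

# Assumed denominator: only the SNPs that matched the reference genome.
matched = total_coincident - unmatched
fraction_paralogous = multi_match_with_alleles / matched
print(f"{100 * fraction_paralogous:.1f}%")  # prints "2.6%"
```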

No competing interests declared.
