Fig 1.
mtDNA sequence quality control.
(A) Correlation of variant allele frequencies between 17,815 sub-haplogroup-tagged mtDNA sequences and the remaining 12,691 mtDNA sequences without a sub-haplogroup tag. (B) Correlation of the frequencies of 59 mtDNA variants between 9,935 independent population controls from the Wellcome Trust Case Control Consortium and this study. (C) Correlation of variant allele frequencies between 1,370 mtDNA sequences from the 1000 Genomes Project (1kg) and 30,506 NCBI GenBank mtDNA sequences. (D) Ratio of non-synonymous (NS) to synonymous (S) mtDNA variants in the 30,506 mtDNA sequences compared with an independent published dataset [9]; the numbers of NS and S variants are shown in the bar chart at the bottom. (E) Allele frequencies of ten common disease-causing mutations. These frequencies did not differ from previously published values determined in a population-based study of sequential healthy live births in Europeans [10].
Fig 2.
Distribution of variant frequencies and pathogenicity scores in 30,506 mtDNA sequences.
(A) Circos plot summarizing all of the genetic data in the 30,506 mtDNA sequences. From the outside of the circle inward: (1) mtDNA position; (2) mtDNA genes; (3) mtDNA complex; (4) frequency of all variants in the 30,506 mtDNA sequences (range 0 to 98.70%); (5) frequency of disease-causing mutations in the 30,506 mtDNA sequences (range 0 to 0.89%); (6) frequency of all variants in the L group (range 0 to 99.45%); (7) frequency of disease-causing mutations in the L group (range 0 to 0.65%); (8) frequency of all variants in the M group (range 0 to 99.60%); (9) frequency of disease-causing mutations in the M group (range 0 to 3.11%); (10) frequency of all variants in the N group (range 0 to 98.60%); (11) frequency of disease-causing mutations in the N group (range 0 to 1.26%). Color code for circles (4)–(11): red, frequency of disease-causing mutations; blue, frequency of all variants. (B) Distribution of variant frequencies in each macro-haplogroup. MtDNA variants in each macro-haplogroup were ordered by frequency from high to low. The right-hand panel highlights variants with a frequency above 0.5% in each group. (C) Probability distributions of the observed pathogenicity scores for all population variants and for defined disease-causing mutations. (D) Probability distributions of the pathogenicity scores for all variants and disease-causing mutations within each macro-haplogroup. (E) Proportion of samples carrying disease-causing mutations. (F) Percentage of mtDNA sequences harboring two or more disease-causing mutations.
Fig 3.
Frequency of disease-causing mtDNA mutations in each macro-haplogroup.
(A) Frequency in each haplogroup of the 13 disease-causing mutations present at >0.1% frequency. (B) Disease-causing mutations significantly associated with specific mtDNA haplogroups. Uncorrected p-value thresholds are shown.
Fig 4.
The mutational signatures observed in 30,506 mtDNA sequences.
Each signature is displayed according to the 96-substitution classification, defined by the substitution class and the sequence context immediately 5’ and 3’ to the mutated base. The probability bars for the six substitution types are displayed in different colors and labeled at the top of the graph. The mutation types are shown on the horizontal axis at the bottom of the graph. (A) All possible variants in macro-haplogroup L. (B) All possible variants in macro-haplogroup M. (C) All possible variants in macro-haplogroup N. (D) All disease-causing mutations in macro-haplogroup L. (E) All disease-causing mutations in macro-haplogroup M. (F) All disease-causing mutations in macro-haplogroup N. The arrows highlight variants at CpG dinucleotides.
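The 96-class scheme described in the Fig 4 caption can be sketched as follows. This is a minimal Python sketch assuming the common convention from somatic mutational-signature analysis, in which each single-base substitution is expressed relative to the pyrimidine (C or T) of the mutated base pair, giving 6 substitution types × 4 possible 5’ bases × 4 possible 3’ bases = 96 classes. The function names (`classify`, `count_spectrum`) are hypothetical, and whether this study collapsed the two mtDNA strands in exactly this way is an assumption rather than something the caption states.

```python
from itertools import product

# Watson-Crick complements for the four unambiguous bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# The six substitution types, expressed relative to the pyrimidine (C or T).
# Assumption: pyrimidine-centred convention, strands collapsed.
SUBSTITUTION_TYPES = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]

# All 96 classes, written as 5'base[REF>ALT]3'base (6 types x 4 x 4 contexts).
CLASSES = [
    f"{five}[{sub}]{three}"
    for sub in SUBSTITUTION_TYPES
    for five, three in product("ACGT", repeat=2)
]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def classify(trinucleotide: str, alt: str) -> str:
    """Map a substitution to one of the 96 classes (hypothetical helper).

    trinucleotide: reference bases 5'-X-3', with the mutated base in the middle.
    alt: the alternate (mutant) base at the middle position.
    """
    tri, alt = trinucleotide.upper(), alt.upper()
    if len(tri) != 3 or len(alt) != 1 or any(b not in COMPLEMENT for b in tri + alt):
        raise ValueError(f"unsupported input: {trinucleotide!r}, {alt!r}")
    if tri[1] == alt:
        raise ValueError("reference and alternate bases are identical")
    # Purine reference (A or G): read the same event from the opposite strand.
    if tri[1] in "AG":
        tri, alt = reverse_complement(tri), COMPLEMENT[alt]
    return f"{tri[0]}[{tri[1]}>{alt}]{tri[2]}"


def count_spectrum(variants):
    """Return the fraction of (trinucleotide, alt) variants in each of the 96 classes."""
    counts = dict.fromkeys(CLASSES, 0)
    for tri, alt in variants:
        counts[classify(tri, alt)] += 1
    total = sum(counts.values())
    # Normalize to probabilities, matching the probability bars in the figure.
    return {cls: n / total for cls, n in counts.items()} if total else counts
```

For example, a C>T change in an ACG context and a G>A change in a CGT context both map to the CpG class `A[C>T]G`, because they are the same event read from opposite strands.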
Fig 5.
Predicted age of each mtDNA sequence.
Age calculations are based on the Rho (ρ) statistic. (A) Distribution of the mean pathogenicity score of each mtDNA sequence and predicted age (Rho). Orange “x”: mtDNA sequences not carrying disease-causing mutations; purple “x”: mtDNA sequences carrying disease-causing mutations. (B) Distribution of the mean pathogenicity score of each mtDNA sequence and predicted age (Rho) within each macro-haplogroup. (C) Distribution of mtDNA sequences carrying disease-causing mutations according to predicted age (Rho). (D) Distribution of mtDNA sequences not carrying disease-causing mutations according to predicted age (Rho).
Fig 6.
Pathogenicity of mtDNA tRNA variants in 30,506 mtDNA sequences.
(A) Probability distributions of the observed pathogenicity scores for all tRNA variants and for defined disease-causing tRNA mutations. (B) Probability distributions of the pathogenicity scores for all tRNA variants and disease-causing tRNA mutations within each macro-haplogroup. (C) Distribution of the mean pathogenicity score for tRNA variants of each mtDNA sequence and predicted age (Rho). Orange “x”: mtDNA sequences not carrying disease-causing tRNA mutations; purple “x”: mtDNA sequences carrying disease-causing tRNA mutations. (D) Distribution of the mean pathogenicity score for tRNA variants of each mtDNA sequence and predicted age (Rho) within each macro-haplogroup.