Figure 1.
Algorithm for Identifying Losses of Well-Established Genes in the Human Lineage Since the Common Ancestor of Euarchontoglires
TransMap predicts (pseudo)genes in the human and dog genomes by syntenically mapping mouse mRNA gene structures to the target genomes through genome alignments and then transferring the corresponding genomic coordinates. The predicted coding regions are conceptually translated and scanned for ORF-disrupting mutations. In this example, a stop codon (labeled with an “*”) is detected in the first coding exon mapped to the human genome, which has also experienced an insertion. Genomic insertions or deletions are shown as red rectangles. Of 19,541 mouse RefSeq genes, 1,008 are identified as initial candidate gene losses in the human lineage based on the differential mutation status in the TransMap results. The list is narrowed to 72 after removing candidates that overlap human transcription evidence and candidates rejected by manual inspection. Twenty-six are identified as losses of well-established genes in the human lineage after analysis of their duplication histories.
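The scanning step described above can be illustrated with a minimal Python sketch. It is not the TransMap implementation: the function name `find_orf_disruptions`, its inputs, and the assumption that the mapped coding sequence starts in frame at index 0 are all hypothetical and chosen only for illustration. It flags two kinds of ORF-disrupting events, in-frame premature stop codons and indels whose length is not a multiple of three.

```python
# Illustrative sketch only; names and input format are assumptions,
# not part of TransMap.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orf_disruptions(cds, indel_lengths=()):
    """Return a list of (kind, position) tuples for ORF-disrupting events.

    cds           -- mapped coding sequence (A/C/G/T), assumed to start
                     in frame at index 0
    indel_lengths -- lengths of insertions/deletions observed in the
                     mapped coding region (hypothetical input)
    """
    cds = cds.upper()
    events = []

    # An indel whose length is not a multiple of 3 shifts the reading frame.
    for index, length in enumerate(indel_lengths):
        if length % 3 != 0:
            events.append(("frameshift", index))

    # An in-frame stop codon before the final codon truncates the protein.
    last_codon_start = len(cds) - len(cds) % 3 - 3
    for pos in range(0, last_codon_start, 3):
        if cds[pos:pos + 3] in STOP_CODONS:
            events.append(("premature_stop", pos))

    return events


if __name__ == "__main__":
    # TGG (tryptophan) replaced by TGA (stop) in the second codon.
    print(find_orf_disruptions("ATGTGAAAATAA"))
```

A gene would count as an initial candidate loss under this sketch when the human mapping yields at least one event and the dog mapping yields none, which mirrors the "differential mutation status" criterion in the caption.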
Table 1.
Gene Losses in the Human Lineage since the Common Ancestor of Euarchontoglires
Figure 2.
A Nonsense Mutation Inactivated ACYL3 During Great Ape Evolution
The mutation occurred after the divergence of gorillas from the human lineage and before the human–chimp split. It is located in exon 13 of ACYL3. A multispecies syntenic alignment shows that the nonsense mutation (“*”) lies in a highly conserved protein-coding region. The stop codon (TGA) is present in the human and chimp genomes, but a TGG tryptophan (W) codon is present in the rhesus, mouse, rat, dog, and other mammalian genomes. The region maps to human Chromosome 18 at location 54,881,070–54,881,124. We sequenced the genomic region in a gorilla DNA sample to show that the TGG (W) codon is present in the gorilla genome.
Figure 3.
Timing of the Gene Losses in the Human Lineage Since the Common Ancestor of Euarchontoglires, Estimated Based on Shared Mutation Analysis
Branch intervals enclosing the earliest ORF-disrupting mutations shared between human and other mammals are illustrated on the human lineage of a mammalian species tree. Genes are represented by numbers, which correspond to their row numbers in Tables 1 and 2. Marks on the rhesus lineage represent independent ORF-disrupting mutations that are not shared with those in the human lineage. The approximate time at which each species diverged from the human lineage is shown in Mya. Species with complete genome sequences are enclosed by rectangles; the others have only trace sequences available for analysis. Orang.: orangutan; Marm.: marmoset; T.shrew: tree shrew.
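The logic of shared mutation analysis can be sketched as follows, assuming the species are supplied in order of divergence from the human lineage, most recent first. The function `loss_interval` and its input format are hypothetical and are not taken from the paper. The mutation is placed after the split from the closest species that lacks it and before the split from the most distant species that shares it. Species with no usable sequence are skipped.

```python
# Illustrative sketch only; the function and its inputs are assumptions.
def loss_interval(divergence_order, shares_mutation):
    """Bracket the branch on which an ORF-disrupting mutation arose.

    divergence_order -- species names ordered from the most recently to
                        the most anciently diverged from human
    shares_mutation  -- dict mapping species to True (shares the human
                        mutation), False (lacks it), or None (no data)

    Returns (shared_with, absent_from):
      shared_with -- most distant species sharing the mutation, or None
                     if the mutation is seen only in human
      absent_from -- closest species lacking the mutation, or None if
                     every species with data shares it
    """
    shared_with = None
    for species in divergence_order:
        status = shares_mutation.get(species)
        if status is None:
            continue  # no sequence coverage for this species
        if status:
            shared_with = species
        else:
            return (shared_with, species)
    return (shared_with, None)


if __name__ == "__main__":
    # ACYL3 example from Figure 2: stop codon in human and chimp,
    # intact TGG codon in gorilla and rhesus.
    order = ["chimp", "gorilla", "rhesus"]
    status = {"chimp": True, "gorilla": False, "rhesus": False}
    print(loss_interval(order, status))
```

For the ACYL3 case in Figure 2, the result brackets the mutation between the gorilla divergence and the human-chimp split, which matches the caption.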
Table 2.
Losses of Well-Established Genes in the Human Genome, Classified Using Estimated Functional Time Length
Figure 4.
Timing of Gene Birth Is Estimated by Determining Duplication Histories of Genomic Regions Surrounding the Gene Losses
For the subset of gene losses with detectable human self-alignment, the duplication branch is determined by tracing each duplicate of the best human self-alignment through a seven-species syntenic genomic alignment. For those without detectable self-alignments, the duplication branch is determined by the seven-species syntenic alignments plus alignments with the chicken genome and, when available, the elephant, tenrec, and armadillo scaffolds. Filled rectangles represent syntenic alignment to the human genome, and open rectangles represent genomes or scaffolds aligned to the human genome without the syntenic constraint. The approximate time at which each species diverged from the human lineage is shown in Mya.