Figure 1.
The mammalian L1TD1 gene was born from a tandem insertion of L1 ORF1.
A. Genomic context of L1TD1. The single-copy genes INADL and KANK4 flank L1TD1 in the human, mouse and dog genomes; this shared syntenic arrangement helped us identify L1TD1 orthologs in other mammalian genomes. B. L1TD1 evolved according to the accepted species tree. We aligned L1TD1 nucleotide sequences and generated a maximum-likelihood phylogeny (see Materials and Methods). Bootstrap values show the percentage of 1000 replicates in which descendant taxa cluster together, and the scale bar shows substitutions per site according to the GTR+I+G evolutionary model. C. L1TD1 comprises two L1-ORF1p-like regions. L1 elements encode an approximately 6.5 kb transcript containing two open reading frames. L1 ORF1 encodes a protein, ORF1p, with RNA-binding and chaperone activity. ORF1p contains a coiled-coil motif (CC), an RNA-recognition motif (RRM), and a C-terminal domain (CTD). ORF2 encodes a protein with endonuclease and reverse transcriptase activities. Sequence identity demonstrates that L1TD1 was formed by the domestication of two copies of L1 ORF1. The two copies may derive from independent insertions or from duplication after a single insertion. The regions encoded by human coding exon 1 and coding exon 2 share 30% and 43% amino acid identity, respectively, with ORF1p of human L1.3. In coding exon 1, only the CTD is conserved, whereas in coding exon 2, the CC, RRM, and CTD are all conserved. Coding exon 2 also contains a variable-length glutamic acid-rich region (ER). After splicing, the human L1TD1 transcript is 3849 nucleotides long and encodes a single protein product of 865 amino acids.
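As an illustration of how a percent amino acid identity such as the 30% and 43% values in panel C can be computed from a pairwise alignment, the following is a minimal sketch. It assumes two pre-aligned sequences of equal length with `-` as the gap character, and it counts only columns in which neither sequence has a gap. The function name `percent_identity` and the choice of denominator are assumptions made for illustration; the legend does not state which convention was used for the reported values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence is gapped.

    Assumption: the denominator is the number of ungapped columns. Other
    conventions (alignment length, shorter sequence length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0  # columns with a residue in both sequences
    matches = 0   # columns where those residues are identical
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy example with made-up sequences (not L1TD1 or ORF1p data).
example = percent_identity("MKT-LV", "MRTAL-")
```

In the toy example, four columns are ungapped in both sequences and three of them match, so `example` is 75.0.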
Figure 2.
Phylogenetic tree of representative L1-ORF1p sequences and mammalian orthologs of the two L1TD1 ORF1p-like regions.
Predicted protein sequences of L1 ORF1p and of the ORF1p-like regions encoded by each L1TD1 exon were aligned, and this alignment was used to generate a maximum-likelihood tree (see Materials and Methods). Bootstrap values show the percentage of 1000 replicate trees in which the descendant taxa clustered together (only values >50% are shown). The scale bar shows the number of substitutions per site. The tree was rooted using fish swimmer elements as outgroups. The N-terminal L1TD1 ORF1p-like sequences cluster together with a high bootstrap value (node A), as do the C-terminal L1TD1 ORF1p-like sequences (node B), confirming that we have identified true L1TD1 orthologs, that the double ORF1p structure arose just once since the divergence of placental mammals, and that it has not been subject to gene conversion between the two exons since. The L1TD1 N-terminal and C-terminal clades branch off from placental mammal L1-ORF1p sequences (node C) after marsupial and placental mammal L1-ORF1p sequences diverged (node D). The tree does not distinguish whether L1 ORF1 was domesticated twice independently, or just once with a subsequent genomic tandem duplication. Extensive sequence divergence between paralogous ORF1p-like sequences means that deep nodes of the tree are poorly resolved. Nonetheless, the tree supports our model in which both L1TD1 exons were born after marsupials and placental mammals diverged.
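The bootstrap values described above are, in essence, the percentage of replicate trees that contain a given clade. The sketch below shows that counting step only. It assumes each replicate tree has already been reduced to a set of clades, with each clade represented as a frozenset of taxon names; the names `bootstrap_support` and `replicates` are hypothetical, and this is not the software used to build the tree in this figure.

```python
from typing import FrozenSet, Iterable, Set

Clade = FrozenSet[str]


def bootstrap_support(clade: Iterable[str],
                      replicate_clades: Iterable[Set[Clade]]) -> float:
    """Percentage of replicate trees in which `clade` appears exactly.

    Assumption: each replicate is given as the set of clades (taxon sets)
    found in that replicate tree.
    """
    target = frozenset(clade)
    total = 0
    hits = 0
    for clades in replicate_clades:
        total += 1
        if target in clades:
            hits += 1
    if total == 0:
        raise ValueError("no replicate trees supplied")
    return 100.0 * hits / total


# Toy data: four replicates over made-up taxa, three of which group A with B.
replicates = [
    {frozenset({"A", "B"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"C", "D"})},
    {frozenset({"A", "C"}), frozenset({"A", "B", "C"})},
    {frozenset({"A", "B"}), frozenset({"A", "B", "D"})},
]
support_ab = bootstrap_support({"A", "B"}, replicates)
```

With 1000 replicates, a node would be labeled in this figure only if the resulting percentage exceeded 50%.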
Figure 3.
L1TD1 has been lost multiple times in eutherian mammals.
A species tree shows the presence or absence of L1TD1 across mammals. Arrows depict L1TD1's genomic locus; black (INADL) and white (KANK4) arrows depict the flanking genes we used to identify syntenic regions, and the blue/green arrow depicts L1TD1. L1TD1's presence in the armadillo genome but not in platypus, opossum, wallaby or Tasmanian devil indicates that it was most likely born before the divergence of placental mammals, but after their divergence from marsupials (solid branches). L1TD1 function was lost in three lineages (X's and gray branches); it is present as a pseudogene in megabat and entirely missing from Afrotherian and Cetartiodactylan genomes. The bushbaby L1TD1 gene acquired a novel N-terminal region (depicted in red) through a more recent L1 ORF1p domestication event (red asterisk) that occurred after bushbaby diverged from lemurs (Figure 4B).
Figure 4.
Novelty in L1TD1 of primates and mice.
A. Site-specific PAML analyses reveal a signature of positive selection in L1TD1. The labels on the annotated schematic indicate positions that are highly likely to be evolving under positive selection (P>90%) according to PAML NSsites (Table 1) in primates and mice (above and below the gene diagram, respectively). A species tree of primates and an L1TD1 gene tree of mice show branches with statistically significant episodic diversifying selection (p<0.05) according to HyPhy's Branch-site REL (marked with a red asterisk). To the right of each tree, the amino acids found at each positively selected position are shown, along with the length of the glutamic acid-rich region in each primate. Position numbering is based on the human and M. musculus (C57BL/6) sequences. B. The L1TD1 gene of bushbaby has acquired a novel 5′ end of coding exon 1 through the insertion of a portion of an L1 element from the L1PA15-16 class (shown in red). The gene retains high sequence conservation with L1TD1 of lemurs and simian primates across the latter half of its first coding exon and all of its second coding exon (shown in blue and green). This insertion is unique among all the species we have examined, and is not evident in lemur genomes (mouse lemur or aye-aye). Elsewhere in the genome, bushbaby contains at least two complete and two partial processed L1TD1 pseudogenes, which allowed us to infer the structure of the active L1TD1 gene.
Table 1.
Primate and mouse L1TD1 are evolving under positive selection.
Figure 5.
Loss of L1TD1 in megabats appears to follow the loss of L1 activity.
We obtained L1TD1 sequences from thirteen bat species (Materials and Methods). We show a species tree based partly upon a published megabat phylogeny [82], inferring the placement of additional taxa using species from the same genus. In addition, we used our L1TD1 sequences to resolve relationships in the Pteropus and Myotis/Eptesicus clades. Species in which L1TD1 appears intact are shown in black, and those in which L1TD1 harbors inactivating mutations (stop codons, frameshifting insertions/deletions) are shown in gray. Some species share the same inactivating mutation(s), as indicated by the subscripts of the gray X symbols, suggesting that L1TD1 was lost independently three times in the megabats. For one species, R. eloquens (starred), we were able to obtain only part of coding exon 1 of L1TD1, but this region is intact. Presence or absence of active L1s is based upon previous data [44] and analysis of the M. davidii genome assembly (Yang and Wichman, unpublished data).
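The two classes of inactivating mutation named above (stop codons and frameshifting insertions/deletions) can be screened for with a simple scan, sketched below. This is an illustrative sketch under stated assumptions, not the procedure used for this figure: it assumes the standard nuclear stop codons, a coding sequence that starts in frame, and a pairwise nucleotide alignment to an intact reference with `-` as the gap character. The function names `premature_stops` and `frameshifting_gaps` are hypothetical.

```python
import re
from typing import List, Tuple

# Assumption: standard nuclear genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(cds: str) -> List[int]:
    """Return 0-based codon indices of in-frame stops before the final codon."""
    cds = cds.upper()
    n_codons = len(cds) // 3
    return [i for i in range(n_codons - 1)
            if cds[3 * i:3 * i + 3] in STOP_CODONS]


def frameshifting_gaps(aligned_ref: str,
                       aligned_query: str) -> List[Tuple[str, int, int]]:
    """Return (kind, start_column, length) for gap runs not divisible by 3.

    A gap run in the reference row is an insertion in the query; a gap run
    in the query row is a deletion in the query.
    """
    if len(aligned_ref) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    events = []
    for kind, row in (("insertion", aligned_ref), ("deletion", aligned_query)):
        for run in re.finditer(r"-+", row):
            length = run.end() - run.start()
            if length % 3 != 0:
                events.append((kind, run.start(), length))
    return sorted(events, key=lambda event: event[1])


# Toy sequences (not bat L1TD1 data).
stops = premature_stops("ATGTAAGGGTGA")
shifts = frameshifting_gaps("ATG--GCCTGA", "ATGAAGCCTGA")
```

In the toy data, the second codon (index 1) is a premature stop, and the two-nucleotide insertion starting at alignment column 3 is reported as frameshifting. Mutations shared by several species would show up as identical events at the same alignment position.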