Molecular Evolution of PTEN Pseudogenes in Mammals

Phosphatase and tensin homolog (PTEN) is a tumor-suppressor gene. In humans, the PTEN pseudogene (PTENp) acts as a competitive endogenous RNA that regulates its parental gene by competing for the microRNAs that bind the 3' UTR of PTEN. Despite the importance of this pseudogene, little is known about the molecular evolution of PTENps in mammals. In this study, we identified 37 pseudogenes from 65 mammalian genomes, 32 of which were from rodents or primates. Phylogenetic analyses revealed a complex evolutionary history of this gene family. Some PTENps were shared by primates and rodents, whereas others were species-specific, such as the Tasmanian devil PTENp1, nine-banded armadillo PTENp1 and gibbon PTENp1. Most interestingly, the naked mole rat (NMR), an anticancer model organism, possessed 17 copies of PTENps, which were classified into four clades based on the phylogenetic analyses. Furthermore, based on our prediction of specific microRNA binding sites, the 3' UTRs of PTEN and of all PTENps in the NMR shared common microRNA binding sites. Our findings suggest that multiple gene duplications occurred during the formation of the PTEN/PTENp gene family in mammals. Some PTENps are relatively ancient and shared by primates and rodents; others originated recently through species-specific gene duplications. PTENps in the NMR may function as competitive endogenous RNAs (ceRNAs) that regulate their cognate gene by competing for common microRNAs, which may be one explanation for the cancer resistance of the NMR.


Introduction
In 1977, Jacq et al. found a truncated copy of the 5S ribosomal RNA gene in Xenopus laevis that was homologous to the native gene; the term pseudogene was first coined for this genomic fragment [1]. Traditionally, pseudogenes were defined as functionless relatives of protein-coding genes, mainly because of premature stop codons or frame shifts, and they have long been viewed as non-functional genomic remnants of evolution [2]. Based on their formation mechanisms, pseudogenes can be classified into three categories: unitary, unprocessed and processed pseudogenes. Unitary pseudogenes originate from functional genes that have been disabled by mutations without prior duplication. Unprocessed pseudogenes are derived directly from duplications of DNA sequences and retain the original intron-exon structure and promoter elements. Processed pseudogenes are formed by retrotransposition of mRNA transcripts, so introns and regulatory elements such as enhancers and promoters are lost during pseudogenization.
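The three categories can be read as a simple decision rule. The sketch below is a hypothetical illustration, not a procedure used in this study: it assumes two pieces of evidence are already known for a candidate (whether an intact paralog of the gene exists in the same genome, and whether the candidate retains the parent's introns), and it cannot resolve hard cases such as copies of single-exon genes.

```python
from dataclasses import dataclass


@dataclass
class PseudogeneEvidence:
    # Hypothetical evidence fields; real annotation pipelines use richer signals.
    has_functional_paralog: bool  # an intact copy of the gene exists in the same genome
    retains_introns: bool  # the parent's intron-exon structure is preserved


def classify_pseudogene(evidence: PseudogeneEvidence) -> str:
    """Assign one of the three mechanistic categories described in the text."""
    if not evidence.has_functional_paralog:
        # No intact copy left: the gene itself was disabled by mutation.
        return "unitary"
    if evidence.retains_introns:
        # Copied at the DNA level, so introns were duplicated too.
        return "unprocessed"
    # Intronless copy of a gene with an intact paralog: mRNA retrotransposition.
    return "processed"


print(classify_pseudogene(PseudogeneEvidence(True, False)))  # processed
```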
It has been proposed that messenger RNAs, transcribed pseudogenes and long non-coding RNAs can crosstalk by competing for common microRNAs [3,4]. These transcripts were termed competitive endogenous RNAs (ceRNAs), and their activity forms a large-scale regulatory network across the transcriptome. A growing body of experimental evidence, including the pairs PTEN-PTENP1 [5], TUSC2-TUSC2P [6], HMGA1-HMGA1Ps [7], CYP4Z1-CYP4Z2P [8] and BRAF-BRAFP1 [9], supports the ceRNA hypothesis. PTEN negatively regulates intracellular levels of phosphatidylinositol-3,4,5-trisphosphate and acts as a tumor suppressor by negatively regulating the Akt/PKB signalling pathway [10]. The PTEN pseudogene PTENp1 is a processed pseudogene that shows high sequence similarity to PTEN in humans. The binding sites of several microRNAs, including miR-20a, miR-19b, miR-21, miR-26a and miR-214, are highly conserved between the 3'UTRs of PTEN and PTENp1, and these microRNAs regulate PTEN translation in humans [5]. PTENp1 can therefore regulate PTEN by competitively binding these microRNAs and serving as a decoy for PTEN-targeting microRNAs. Furthermore, copy-number loss of PTENp1 was observed in sporadic colon cancer and correlated with decreased PTEN expression, leading to the proposal that PTENp1 is a bona fide tumour suppressor gene [5]. In addition, Johnsson et al. reported that PTENp1 transcripts can also act as antisense RNAs (asRNAs) that regulate PTEN expression at both the transcriptional and post-transcriptional levels [11]. PTENp1 encodes two asRNA isoforms, PTENp1 asRNA alpha and beta. The alpha isoform negatively regulates PTEN transcription: guided by sequence homology, PTENp1 asRNA alpha recruits DNA methyltransferase 3a (DNMT3a) and Enhancer of Zeste Homolog 2 (EZH2) to the PTEN promoter, where the resulting H3K27me3 mark suppresses PTEN transcription.
In contrast, the beta isoform forms RNA-RNA interactions with the PTENp1 sense transcript. This interaction stabilizes the PTENp1 sense transcript, thereby affecting microRNA sequestration and, ultimately, PTEN protein levels.
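The decoy (sequestration) effect can be made concrete with a toy equilibrium model. This is a minimal sketch under simplifying assumptions that are ours, not the cited studies': a single microRNA species, the same dissociation constant `kd` for every site, one microRNA per site, and arbitrary concentration units. Free microRNA m is found by bisection from the conservation equation total = m + Σ T_i·m/(kd + m), and the fraction of any target left unbound is kd/(kd + m).

```python
from typing import Sequence


def free_mirna(total_mirna: float, target_sites: Sequence[float], kd: float,
               tol: float = 1e-12) -> float:
    """Solve total = m + sum(T_i * m / (kd + m)) for free microRNA m by bisection."""

    def excess(m: float) -> float:
        bound = sum(t * m / (kd + m) for t in target_sites)
        return m + bound - total_mirna

    lo, hi = 0.0, total_mirna  # excess(lo) <= 0 and excess(hi) >= 0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


def unbound_fraction(total_mirna: float, target_sites: Sequence[float], kd: float) -> float:
    """Fraction of any one target's sites not occupied by the microRNA."""
    m = free_mirna(total_mirna, target_sites, kd)
    return kd / (kd + m)


# Hypothetical numbers: PTEN alone versus PTEN plus a pseudogene decoy with twice as many sites.
pten_alone = unbound_fraction(1.0, [1.0], kd=0.1)
pten_with_decoy = unbound_fraction(1.0, [1.0, 2.0], kd=0.1)
print(f"unbound PTEN: {pten_alone:.2f} alone, {pten_with_decoy:.2f} with decoy")
```

Adding decoy sites lowers free microRNA, so the unbound (unrepressed) fraction of PTEN rises; the magnitude depends entirely on the assumed numbers.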
Besides PTENp1, several other pseudogenes have been reported to act as ceRNAs. The tumour suppressor candidate-2 pseudogene (TUSC2P) communicates with TUSC2 through shared microRNA response elements (MREs), in the same way as PTENP1 with PTEN. The 3'UTR of TUSC2P captures TUSC2-targeting microRNAs, which increases the translation of TUSC2 and thereby inhibits cell proliferation [6]. In addition, Esposito and co-workers found seven pseudogenes homologous to the high mobility group AT-hook 1 (HMGA1) gene, which is associated with insulin resistance and carcinogenesis [7]. Two of them, HMGA1P6 and HMGA1P7, show high sequence similarity to each other and share conserved MREs with the parental gene. HMGA1P6 and HMGA1P7 also act as ceRNAs by competing with HMGA1 for microRNA binding, thereby regulating HMGA1 expression and increasing cell proliferation and migration [7]. Karreth et al. discovered that BRAFP1 functions as a ceRNA of BRAF in humans and mice, competing for miR-134, miR-543, miR-653, miR-30a, miR-182 and miR-876 [9]. Most interestingly, over-expression of the 3'UTR of BRAFP1 had a stronger effect on parental gene expression and proliferation than over-expression of its CDS [9]. Overall, these findings suggest that the 3'UTRs of both pseudogenes and coding genes may possess powerful biological activity through their ability to act as endogenous decoys for microRNAs.
Despite the importance of these functional pseudogenes, their evolutionary histories remain largely unknown. In this study, we investigated the molecular evolution of the PTEN/PTENp gene family in mammals. By searching the available mammalian genome sequences, we identified 37 pseudogenes in 65 mammalian genomes. Most intriguingly, we identified 17 copies of PTENps in the naked mole rat (NMR), an anticancer model organism, and found that all of them share common microRNA binding sites with PTEN, suggesting that the PTENps in the NMR may regulate their cognate gene by competing for microRNA binding sites, as reported in humans.

Materials and Methods
Our animal experiment was approved by the Institutional Animal Care and Use Committee of Sichuan Agricultural University under permit number DKY-B20150301.

Sequence retrieval
The PTEN mRNA sequences of 65 mammals were downloaded from the National Center for Biotechnology Information (NCBI) GenBank database, and their PTENps were identified by BLAST searches against the reference genomic sequences database, using the PTEN mRNA as the query. All potential pseudogenes met at least one of the following three criteria: (1) an incomplete open reading frame (ORF), (2) frame shifts and (3) premature stop codons. All of them were labelled as pseudogenes in GenBank.
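The three screening criteria can be checked mechanically on a candidate coding sequence. The function below is a simplified sketch, not the procedure used in this study: it reads the sequence in frame from its first base, treats a length that is not a multiple of three only as a hint of a frame shift (a real frame-shift call needs an alignment to the parental PTEN CDS), and assumes the standard genetic code.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code (assumption)


def orf_defects(cds: str) -> List[str]:
    """Return the pseudogene-like defects found in a candidate coding sequence."""
    seq = cds.upper().replace("U", "T")
    defects = []
    if len(seq) % 3 != 0:
        defects.append("length not a multiple of 3 (possible frame shift)")
    # Split into complete codons in the frame starting at the first base.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    if not codons or codons[0] != "ATG":
        defects.append("no ATG start codon (incomplete ORF)")
    if not codons or codons[-1] not in STOP_CODONS:
        defects.append("no terminal stop codon (incomplete ORF)")
    if any(codon in STOP_CODONS for codon in codons[:-1]):
        defects.append("premature stop codon")
    return defects


print(orf_defects("ATGTAAAAATAA"))  # ['premature stop codon']
```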

Phylogenetic analyses
Because different regions of a gene play different roles and are subject to different levels of functional constraint, it is customary to analyse them separately. Evolutionary rates are usually higher in non-coding regions than in coding regions, and non-coding regions also vary greatly in length among sequences. For example, the 3'UTRs of PTEN/PTENps in primates are longer than 1000 bp, whereas those of most other species are shorter than 1000 bp. This variation makes phylogenetic analyses of non-coding regions very difficult. Therefore, in this study we analysed only the CDS of PTEN/PTENps. The CDS regions of mammalian PTEN and PTENp sequences were aligned using ClustalW in BioEdit [12], followed by manual adjustment. Maximum Likelihood (ML), Maximum Parsimony (MP) and Neighbour-Joining (NJ) phylogenetic trees were constructed using MEGA6.0 [13]. Fourteen of the 102 identified PTEN and PTENp sequences were excluded from the phylogenetic analyses because of too many ambiguous bases, long gaps or incomplete sequences: degu PTENp1, NMR PTENp12, NMR PTENp15, NMR PTENp16, horse PTEN, orangutan PTENp1 and chimpanzee PTENp1 (ambiguous bases); cattle PTEN, duckbill platypus PTEN, domestic water buffalo PTEN, southern American pika PTEN and guinea pig PTENp1 (long gaps); and guinea pig PTEN and European domestic ferret PTEN (incomplete sequences). The coding regions of the remaining 88 sequences were then used for phylogenetic tree construction. In addition, a dataset of 67 sequences from primates, rodents, even-toed ungulates, carnivores, Cingulata and Dasyuromorphia, the groups in which both PTEN and PTENps were identified, was also used to construct phylogenetic trees. The Kimura 2-parameter method [14] was used to infer the NJ tree in MEGA6.0.
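For reference, the Kimura 2-parameter distance between two aligned sequences is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of compared sites that differ by a transition or a transversion. The sketch below computes it with pairwise deletion of gapped or ambiguous sites, then applies the textbook Neighbour-Joining algorithm to a distance matrix. It illustrates the methods named above and is not a reimplementation of MEGA6.0, whose gap handling and tie-breaking may differ; the function names and the toy matrix are our own.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
VALID = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance; sites with gaps or ambiguous bases are skipped."""
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in VALID or b not in VALID:
            continue  # pairwise deletion of gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p, q = transitions / compared, transversions / compared
    if 1 - 2 * p - q <= 0 or 1 - 2 * q <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


Distances = Dict[FrozenSet[str], float]


def neighbor_joining(names: List[str], dist: Distances) -> List[Tuple[str, str, float]]:
    """Textbook NJ; returns unrooted tree edges as (node, child, branch length)."""
    d = dict(dist)
    nodes = list(names)
    edges: List[Tuple[str, str, float]] = []

    def D(x: str, y: str) -> float:
        return d[frozenset((x, y))]

    while len(nodes) > 2:
        n = len(nodes)
        r = {x: sum(D(x, y) for y in nodes if y != x) for x in nodes}
        # Join the pair minimising the Q criterion (the first pair wins ties).
        i, j = min(combinations(nodes, 2),
                   key=lambda pair: (n - 2) * D(*pair) - r[pair[0]] - r[pair[1]])
        branch_i = D(i, j) / 2 + (r[i] - r[j]) / (2 * (n - 2))
        branch_j = D(i, j) - branch_i
        u = f"node{len(edges) // 2 + 1}"
        edges += [(u, i, branch_i), (u, j, branch_j)]
        for k in nodes:
            if k not in (i, j):
                d[frozenset((u, k))] = (D(i, k) + D(j, k) - D(i, j)) / 2
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    edges.append((a, b, D(a, b)))
    return edges


# Hypothetical additive matrix generated by the tree ((A:1,B:2):1,(C:3,D:4)).
toy = {frozenset(p): v for p, v in [(("A", "B"), 3), (("A", "C"), 5), (("A", "D"), 6),
                                    (("B", "C"), 6), (("B", "D"), 7), (("C", "D"), 7)]}
for edge in neighbor_joining(["A", "B", "C", "D"], toy):
    print(edge)
```

On an additive matrix like this one, NJ recovers the generating tree exactly; with real K2P distances the branch lengths are estimates.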
For the ML tree, the Tamura 3-parameter model with a discrete gamma distribution was used, as suggested by MEGA6.0. Default settings in MEGA6.0 were used to reconstruct the MP tree. For the ML, MP and NJ methods, 1000 bootstrap replicates were performed to evaluate the reliability of the reconstructed trees.
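The bootstrap resamples alignment columns with replacement, rebuilds a tree from each pseudo-alignment and reports how often a clade reappears. The sketch below shows only the resampling and counting; `infer_clades` is a hypothetical caller-supplied function standing in for the NJ, ML or MP tree search that MEGA6.0 performs, and the toy alignment in the usage lines is invented.

```python
import random
from typing import Callable, Dict, FrozenSet, Iterable, Set

Alignment = Dict[str, str]


def bootstrap_replicate(alignment: Alignment, rng: random.Random) -> Alignment:
    """Draw as many columns as the alignment has, with replacement."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences must be aligned to a common length")
    ncol = lengths.pop()
    columns = [rng.randrange(ncol) for _ in range(ncol)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


def clade_support(alignment: Alignment,
                  infer_clades: Callable[[Alignment], Set[FrozenSet[str]]],
                  clade: Iterable[str], replicates: int = 1000, seed: int = 0) -> float:
    """Percentage of bootstrap trees that contain `clade`.

    `infer_clades` (hypothetical) builds a tree and returns its clades as frozensets.
    """
    rng = random.Random(seed)
    target = frozenset(clade)
    hits = sum(target in infer_clades(bootstrap_replicate(alignment, rng))
               for _ in range(replicates))
    return 100.0 * hits / replicates


# Toy usage with invented sequences.
toy_aln = {"seq1": "ATGACAGCC", "seq2": "ATGACAGCA", "seq3": "ATGACTGCT"}
print(bootstrap_replicate(toy_aln, random.Random(1)))
```

Keeping the tree builder as a parameter makes the resampling logic independent of whichever tree-building method is used.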

MicroRNA binding site prediction
MicroRNA binding sites in the 3'UTRs of PTENs and PTENps were predicted with the PITA algorithm [15]. The sequences of five microRNAs (miR-19b, miR-20a, miR-21, miR-26a and miR-214), which competitively bind PTEN and PTENp1 in humans [5], were downloaded from the miRBase database [16]. These microRNAs and the 3'UTRs of PTENs or PTENps were then submitted to the PITA online prediction tool to predict microRNA binding sites [15]. The minimum seed size was set to 6, and all other parameters were left at their default settings. ΔΔG is an energy score: the lower (more negative) its value, the stronger the expected binding of the microRNA to the given site. We first calculated the ΔΔG values for the known microRNA binding sites in humans and then conservatively set the lowest of these values, -3.8, as the cut-off in this study.
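PITA scores each candidate site with a folding-based energy (roughly, the energy gained by forming the microRNA-target duplex minus the energy needed to open the target site), which is not reproduced here. The sketch below shows only the two simpler steps around that score, under stated assumptions: locating perfect Watson-Crick 6-mer seed matches (microRNA nucleotides 2 to 7) in a 3'UTR, and applying the -3.8 cut-off to ΔΔG values obtained elsewhere. Unlike PITA, it ignores G:U wobble pairs and seed mismatches, and treating a score exactly at the cut-off as passing is our assumption. The sequences are hypothetical.

```python
from typing import List

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    return seq.upper().replace("T", "U")


def seed_sites(mirna: str, utr: str, seed_len: int = 6) -> List[int]:
    """0-based UTR positions pairing perfectly with microRNA nucleotides 2..seed_len+1."""
    seed = to_rna(mirna)[1:1 + seed_len]
    # The target site is the reverse complement of the seed (antiparallel pairing).
    site = "".join(COMPLEMENT[base] for base in reversed(seed))
    utr = to_rna(utr)
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


def passes_cutoff(ddg: float, cutoff: float = -3.8) -> bool:
    """Lower (more negative) ddG means stronger predicted binding; inclusive cut-off assumed."""
    return ddg <= cutoff


# Hypothetical sequences, for illustration only.
print(seed_sites("AGGCAUUCGAUCC", "CCAAUGCCGG"))  # [2]
print([passes_cutoff(x) for x in (-5.2, -3.8, -2.0)])  # [True, True, False]
```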

Results

Pseudogene Sequences
In total, we identified 65 functional PTEN genes and 37 pseudogenes in 65 mammalian genomes by BLAST searches using the PTEN mRNA as the query (S1 and S2 Tables). Seventeen of the 65 species possess one or more copies of PTENps, and 32 of the 37 pseudogenes identified in this study were from primates and rodents. Nine primate species each possessed a single PTENp; interestingly, these PTENps occurred only in Old World monkeys and hominoids. Five rodent species possessed PTENps and, most strikingly, 17 copies of PTENps were identified in the NMR (Table 1). In addition, one PTENp each was found in the nine-banded armadillo and the Tasmanian devil, and three PTENps were found in the pig.
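The counts reported here can be cross-checked with simple arithmetic. In the tally below, the per-group numbers come from the text, except that the six pseudogenes assigned to the four rodent species other than the NMR are not stated separately; that figure is only implied by the totals (37 - 9 - 17 - 1 - 1 - 3 = 6).

```python
# (species carrying PTENps, number of PTENps) per group, as stated in the text.
groups = {
    "primates": (9, 9),  # nine species, one PTENp each
    "naked mole rat": (1, 17),
    "other rodents": (4, 6),  # implied by the totals, not stated separately
    "nine-banded armadillo": (1, 1),
    "Tasmanian devil": (1, 1),
    "pig": (1, 3),
}
genomes = 65
species_with_ptenp = sum(s for s, _ in groups.values())
pseudogenes = sum(p for _, p in groups.values())
primate_rodent = sum(groups[g][1] for g in ("primates", "naked mole rat", "other rodents"))
sequences = genomes + pseudogenes  # PTEN genes plus PTENps
print(species_with_ptenp, pseudogenes, primate_rodent, genomes - species_with_ptenp)
print(sequences, sequences - 14)  # 14 sequences were excluded from the trees
```

The results match the 17 species, 32 primate and rodent PTENps, 48 species without PTENps, and 102 and 88 sequences given in the Methods and Discussion.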

Phylogenetic analyses
To explore the evolutionary relationships of these PTENps and PTEN genes in mammals, we constructed phylogenetic trees based on the coding regions of 88 sequences using the Neighbour-Joining (NJ) [17], Maximum Likelihood (ML) and Maximum Parsimony (MP) methods. All trees showed an overall similar topology. In these trees, PTENps were dispersed into several clades rather than forming a single clade, suggesting that multiple gene duplications occurred during the evolution of the PTEN/PTENp gene family (Fig 1 and S1-S5 Figs). As shown in Fig 1, some PTENps have existed for a relatively long time, such as clade 1, which is shared by the NMR and the pig, and clade 9, which is shared by species from primates and rodents. In addition, these two clades showed longer branch lengths than the other PTENp clades, indicating that their PTENps are relatively old. Other PTENps are relatively young. For example, clades 2, 4 and 6 displayed a species-specific evolutionary pattern in which the PTENp clustered with its cognate gene, suggesting that these PTENps emerged after the divergence of these species from their sister groups. Moreover, the branch lengths of PTENps were longer than those of PTENs, suggesting that the CDS of PTENps evolves faster than that of PTENs in mammals (Fig 1). The NMR PTENps fell into four clades in the phylogenetic tree: clade 1 includes PTENp17, PTENp8 and PTENp4; clade 7 contains only PTENp9; clade 8 includes PTENp1, PTENp2, PTENp3, PTENp5, PTENp6 and PTENp7; and clade 9 includes PTENp10, PTENp11, PTENp13 and PTENp14. The PTENps in clade 8 were NMR-specific and had shorter branch lengths, suggesting that these genes appeared recently.

MicroRNA binding site prediction
To investigate whether the 3'UTRs of these PTENps could potentially bind specific microRNAs, as in humans, we used the PITA algorithm to predict the binding sites of these microRNAs in the 3'UTRs of PTENps.
We chose PITA for microRNA target prediction because it considers the accessibility of target sites rather than their conservation, and it has been reported to have high prediction accuracy and a low false-positive rate [15,19]. It has been shown that five microRNAs (miR-19b, miR-20a, miR-21, miR-26a and miR-214) bind the 3'UTRs of both PTEN and PTENp1, so that PTENp1 acts as a microRNA sponge that protects its parental gene from microRNA repression in humans [5]. In this study, we predicted the binding sites of these five microRNAs in the 3'UTRs of the PTEN and PTENp sequences identified here (S3 and S4 Tables). Interestingly, the 3'UTRs of PTENps and PTEN shared microRNA binding sites in most cases; the only exception was PTENp1 of the blind mole rat (BMR), for which no shared binding site was found. This result suggests that these microRNAs can potentially bind the 3'UTRs of both PTENps and their cognate PTENs. Most importantly, binding sites for miR-19b were predicted in the 3'UTRs of PTEN and of all PTENps in the NMR (Table 2). In addition, eight NMR PTENps shared binding sites for three different microRNAs, and four shared binding sites for two microRNAs. The microRNA binding sites identified in this study are illustrated in Fig 2.
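Comparing predicted sites between a parental gene and its pseudogenes reduces to set operations on the microRNAs that pass the ΔΔG cut-off in each 3'UTR. The sketch below uses invented gene names and site sets purely for illustration (the actual predictions are in S3 and S4 Tables and Table 2). It lists the microRNAs each pseudogene shares with the parent, tallies pseudogenes by how many microRNAs they share, and finds microRNAs common to every 3'UTR, the kind of result reported for miR-19b in the NMR.

```python
from collections import Counter
from typing import Dict, Set


def shared_with_parent(parent: Set[str], pseudogenes: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """MicroRNAs predicted to bind both the parental 3'UTR and each pseudogene 3'UTR."""
    return {name: parent & sites for name, sites in pseudogenes.items()}


def tally_by_shared_count(shared: Dict[str, Set[str]]) -> Counter:
    """How many pseudogenes share 0, 1, 2, ... microRNAs with the parent."""
    return Counter(len(sites) for sites in shared.values())


def common_to_all(parent: Set[str], pseudogenes: Dict[str, Set[str]]) -> Set[str]:
    """MicroRNAs with predicted sites in every 3'UTR, parent included."""
    return set.intersection(parent, *pseudogenes.values())


# Hypothetical predictions (microRNAs passing the cut-off in each 3'UTR).
pten = {"miR-19b", "miR-20a", "miR-21"}
ptenps = {
    "PTENp_a": {"miR-19b", "miR-20a", "miR-21"},
    "PTENp_b": {"miR-19b", "miR-26a"},
    "PTENp_c": {"miR-19b", "miR-214"},
}
shared = shared_with_parent(pten, ptenps)
print(tally_by_shared_count(shared))
print(common_to_all(pten, ptenps))
```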

Discussion
In this study, we found that PTENps exist not only in humans but also in some species of primates, rodents, even-toed ungulates, carnivores, Cingulata and Dasyuromorphia, suggesting that PTENps emerged before the divergence of these mammalian orders. However, most other mammals (48 of 65) lacked PTENps, which may be due either to the loss of PTENps during evolution or to PTEN pseudogenization never having taken place in these species. Because no PTENp clade was shared by all mammalian orders, there is no evidence that PTENps originated before the divergence of mammals, and it therefore remains unclear whether PTENps were lost in these 48 species. The sequence alignment showed that some PTENps are complete duplicates of their parental gene, whereas others, such as NMR PTENp10, 11, 13 and 14, are partial duplicates. In addition, some PTENps showed a species-specific evolutionary pattern, such as the Tasmanian devil PTENp1, nine-banded armadillo PTENp1 and gibbon PTENp1. These results suggest that mammalian PTENps originated through multiple gene duplications and experienced so-called 'birth and death' evolution [20]. [22]. Similarly, the multiple copies of PTENps in the NMR may also contribute to its unusual resistance to cancer, but further studies of PTENp expression and of their interaction with microRNAs are needed to support this hypothesis. However, the BMR, which, like the NMR, is strikingly resistant to cancer [23], differs from the NMR in the copy number of PTENps and in the microRNA binding sites. First, we found only one PTENp in the BMR, compared with 17 copies in the NMR. Second, unlike in the NMR, no common microRNA binding sites were predicted in the 3'UTRs of PTENp1 and PTEN in the BMR. This may indicate that the anticancer mechanism of the BMR differs from that of the NMR.
However, the stringent cut-off value we set may explain why no shared microRNA binding sites were found in the BMR, and it is also possible that gene-pseudogene crosstalk is mediated by different microRNAs in the two species. Other factors also differ between these two species. For example, Fang et al. reported that the BMR has evolved a cancer-resistance mechanism that depends on a heightened immunoinflammatory response via gene amplification within the interferon-β1 pathway [23]. In contrast, Tian et al. suggested that the NMR has evolved a higher concentration of high-molecular-mass hyaluronan (HA), which restricts cell division when cells become crowded and thereby confers cancer resistance [21]. Hence, it is possible that the multiple PTENps in the NMR, by functioning as ceRNAs that regulate their cognate gene through competition for common microRNAs, play an important role in cancer resistance, although this mechanism may not apply to the BMR.

Conclusions
In conclusion, our findings suggest that mammalian PTENps originated through multiple gene duplications and followed a 'birth and death' evolutionary pattern. Some PTENps have existed for a long time, whereas others appeared recently. PTENps may function as ceRNAs to regulate PTEN in mammals. Interestingly, the multiple PTENps in the NMR, as in humans, may compete for common microRNA binding sites, which may contribute to the anticancer trait of this species. These results provide a possible explanation for cancer resistance in this model organism; however, further experiments are needed to test this hypothesis.