Adaptive Evolution of a Stress Response Protein

Background: Some cancers are mediated by an interplay between tissue damage, pathogens and localised innate immune responses, but the mechanisms that underlie these linkages are only beginning to be unravelled.
Methods and Principal Findings: Here we identify a strong signature of adaptive evolution on the DNA sequence of the mammalian stress response gene SEP53, a member of the epidermal differentiation complex fused-gene family known for its role in suppressing cancers. The SEP53 gene appears to have been subject to adaptive evolution of a type that is commonly (though not exclusively) associated with coevolutionary arms races. A similar pattern of molecular evolution was not evident in the p53 cancer-suppressing gene.
Conclusions: Our data thus raise the possibility that SEP53 is a component of the mucosal/epithelial innate immune response engaged in an ongoing interaction with a pathogen. Although the pathogenic stress mediating adaptive evolution of SEP53 is not known, there are a number of well-known candidates, in particular viruses with established links to carcinoma.


INTRODUCTION
Stress responses classically involve heat shock proteins or molecular chaperones that maintain protein function or repair damage after cell injury. As such, the integrity of chaperone systems can critically alter the progression of diseases associated with ageing, DNA damage and chronic injury [1,2]. Although the molecular chaperone proteins are among the most evolutionarily conserved proteins and have a ubiquitous function in all repair processes, there is a high degree of tissue specificity in chaperone induction [3,4,5], indicating that some cells have evolved unique stress responses due to unique microenvironmental pressures.
Surface squamous epithelium is one such example, as it is not buffered from environmental stresses by the circulatory system, and is subject to a range of relatively unique stresses including thermal stresses and, in the gut, refluxed acid and bile adducts. Squamous epithelia may additionally be subject to bacterial infestation and viral infection, for example by papillomaviruses, which can lead to cervical cancer [6,7,8]. Recently a ''functional proteomics'' study showed that stressed squamous cells do not synthesize the classic stress-induced protein HSP70 [9], but instead express a novel class of stress proteins [10], including the Squamous Epithelial induced stress Protein of 53 kDa (SEP53; synonyms include c1orf10 and cornulin). SEP53 was independently cloned as a gene expressed in normal oesophagus but downregulated in oesophageal squamous cancers and was named Clone 1 open reading frame 10 [10]. The SEP53 gene is located on chromosome 1q21 within the epidermal differentiation complex fused-gene family that is silenced as part of a general mechanism that suppresses genes from this locus in cancer cells [11,12]. Molecular characterization of SEP53 has demonstrated that its stress-responsive functions are linked to its activity as a survival factor. For example, death induced by exposure of cells to normally lethal levels of deoxycholic acid (DCA; a bile stress imposed upon the gastrointestinal tract) can be attenuated by SEP53 [13].
SEP53 may have additional functions, and because some epithelial cancers are linked to microbes, our particular interest was the role that SEP53 might play in the defense against pathogenic infection of epithelial tissue. A hallmark of genes involved in host-pathogen interactions is an exceptionally fast rate of protein evolution (corrected for mutation rate) [14,15,16,17,18]. This arises because fighting pathogens leads to arms races: a series of selective sweeps caused by repeated adaptation and counter-adaptation between host and pathogen. Indeed, almost half the documented cases of arms races are pathogen defense genes [18], raising the possibility that the identification of molecular arms races provides a first approach to implicating a gene's involvement in defense. In the present study, comparisons between human and primate SEP53, as well as phylogenetic analyses on a broader range of mammal species, identified the signature of an arms race, suggesting that SEP53 could be engaged in antagonistic interactions with a pathogen. As we were unsure if a high rate of molecular evolution might be a general feature of cancer-associated genes, we performed a similar analysis on p53, but this gene, well known for its central role in a range of cancers in all tissue types, showed no evidence of rapid adaptive evolution.

RESULTS AND DISCUSSION
Analysis of SEP53 coding sequence across 9 mammalian species revealed that this gene evolves rapidly through positive selection. We compared the rate of replacement nucleotide substitutions (substitutions which result in an amino acid change, KA) to the rate of synonymous nucleotide substitutions (substitutions that do not result in an amino acid change, KS, which evolve at an approximately neutral rate). As most genes are subject to purifying selection, i.e. where replacement mutations produce inferior phenotypes that are pruned from the population, KA/KS tends to be much less than 1.0. Reflecting this, mean KA/KS between species is typically much less than 0.2 [19,20]. KA/KS at SEP53 is in the upper range of that seen in the vast majority of genes, especially in the C-terminal open reading frame (Table 1). For comparison, the p53 gene, also known to suppress cancers, but in a functionally dissimilar way, showed very low KA/KS ratios (Table 1).
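The distinction between replacement and synonymous changes can be illustrated with a minimal sketch. It is not the calculation used in this study: it only counts observed differences between two aligned, in-frame sequences at codons that differ at a single position, and it neither normalises by the number of replacement and synonymous sites nor corrects for multiple hits, so it does not yield KA or KS themselves. The function name and the toy sequences are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_differences(seq1, seq2):
    """Count synonymous and replacement differences between two aligned,
    in-frame coding sequences.

    Returns (synonymous, replacement, skipped). Codons containing gaps or
    ambiguous bases, or differing at more than one position, are skipped
    because the order of changes cannot be inferred in this simple sketch.
    """
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = rep = skipped = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue
        n_diff = sum(a != b for a, b in zip(c1, c2))
        if c1 not in CODON_TABLE or c2 not in CODON_TABLE or n_diff > 1:
            skipped += 1
        elif CODON_TABLE[c1] == CODON_TABLE[c2]:
            syn += 1  # same amino acid: synonymous difference
        else:
            rep += 1  # different amino acid: replacement difference
    return syn, rep, skipped


# Toy (hypothetical) sequences: GCT->GCC is synonymous (Ala->Ala),
# AAA->AGA is a replacement (Lys->Arg).
counts = classify_differences("ATGGCTAAAGGT", "ATGGCCAGAGGT")
```

A full estimate would additionally count the synonymous and replacement sites available in each sequence and apply a multiple-hit correction, as dedicated packages do.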
Whilst KA/KS ratios >>1.0 are evidence of positive selection (see also [21]), a simple KA/KS calculation is thought to be conservative because it represents the average ratio across all codons, and does not account for the possibility that some codons are highly conserved while others evolve rapidly. Indeed, sliding window analysis along the length of SEP53 showed depressed KA/KS in the highly conserved N-terminal Ca2+-binding domain and a series of peaks in the C-terminal domain (Figure 1; full alignment of SEP53 is presented in supplementary information). We therefore used an approach that allows KA/KS to vary among codons to compare the likelihood of a model that permits a proportion of codons to be under selection to the likelihood of a model that assumes neutrality at all sites [22,23,24]. These comparisons of neutral and selection models indicated that SEP53 was subject to adaptive evolution (Table 2), with as many as 8 percent of codons showing KA/KS substantially above 1.0. For the same analysis on the p53 gene, KA/KS was low and neutral models were as likely as selection models (Tables 1, 2). Analysis of variation within the human population did not provide additional evidence of selection at SEP53 (or p53) (Table S1); however, there was little power to do so, as polymorphism at both loci was extremely low.
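The sliding window idea can be sketched as follows. This is an illustration only and not the implementation used for Figure 1: it assumes that per-codon replacement and synonymous estimates are already available as two lists (hypothetical inputs), and the function name, window size and step are likewise assumptions.

```python
def sliding_window_ratio(rep, syn, window=3, step=1):
    """Return (start_index, ratio) for each window along a sequence.

    rep and syn hold per-codon replacement and synonymous estimates.
    The ratio is the windowed sum of rep divided by the windowed sum of
    syn, or None where the synonymous sum is zero and the ratio is undefined.
    """
    if len(rep) != len(syn):
        raise ValueError("rep and syn must have the same length")
    if window < 1 or step < 1:
        raise ValueError("window and step must be positive")
    results = []
    for start in range(0, len(rep) - window + 1, step):
        r = sum(rep[start:start + window])
        s = sum(syn[start:start + window])
        results.append((start, r / s if s > 0 else None))
    return results


# Hypothetical per-codon values: conserved at the start, divergent at the end.
profile = sliding_window_ratio([0, 0, 1, 2, 2, 1], [1, 1, 1, 1, 0, 1], window=3)
```

A whole-gene average over these toy values would mask the contrast between the first and last windows, which is the reason for examining the ratio locally.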
Although the same domains appear to be present in all mammalian SEP53 open reading frames, the chimpanzee SEP53 gene contains a large insertion. Such a difference in amino acid sequence is unusual in Pan-Homo comparisons. SEP53 contains repeat regions, and the insertion sequence in Pan is two additional tandem repeats that other species lack (Figure 2). Many positively selected sites were in the repeat regions that the primates share, suggesting that these are evolutionary hotspots. Examination of the four tandem repeats in the chimpanzee offers further support for the action of positive selection. Assuming these repeats arise through duplication and that differences between repeats then accumulate independently (i.e. without concerted evolution), KA/KS between chimpanzee repeats averaged 1.07 and ranged up to 1.97. Additionally, we created a data set that included repeats from all primate SEP53 sequences (Pan (four repeats), Homo, Gorilla, Orangutan and Macaca (two repeats each)). Average KA/KS in this data set was again high (0.89) and by using a gene tree of these repeats we performed maximum likelihood analyses as above to show that selection models were in all cases significantly more likely than neutral ones (for example, model 8 vs 8a, χ2 = 19.8, p<0.0001). Indeed, almost half of the sites showed evidence of positive selection. Thus, both SEP53 divergence between species and the diversification of repeats within SEP53 appear to be driven by positive selection.
The low KA/KS and lack of positively selected sites in the calcium binding domain (Table 1; Figure 1) indicate evolutionary constraint and that SEP53 functional activity relies strongly on the conservation of this domain. By contrast, the high KA/KS ratio in the SEP53 C-terminal domain indicates that new mutations in some codons may be adaptive and selected up to high frequency. Past examples of adaptive evolution of this sort have been dominated by two classes of genes: i) genes associated with reproduction, including seminal fluid factors and proteins of the female reproductive tract [25], where the selective pressure is likely to be driven by arms races linked to sexual conflict; ii) immune system genes engaged in an arms race of adaptation and counter-adaptation with parasites or pathogens. Such an anti-pathogen role is the most likely explanation for rapid evolution of SEP53. The epidermis is a notable point of pathogen entry, and indeed a regional immune response is well developed [26,27,28,29,30]. Thus, in addition to its role in mediating epidermal damage associated with adenocarcinoma, the pattern of adaptive evolution of SEP53 indicates that it is part of the epithelial immune response. We thus forward SEP53 as a candi- This raises intriguing possibilities regarding the role of SEP53 in attenuating stress and in cancer progression. Although the SEP53 protein's capacity to limit stress and control carcinogen-mediated DNA damage may be an important contribution to cancer avoidance, a pronounced role for SEP53 during mammalian evolution might have been to eliminate lethal squamous-cell viruses that affect both overall fitness and cancer progression. Playing a dual role as a stress response protein and a viral defence molecule could limit SEP53's effectiveness at one of these tasks. For example, the need to coevolve with a pathogen could alter the protein to a degree that compromises stress-response functionality or the capacity to interact with other epithelial proteins, thus leading to cancer.
To conclude, much genomic research is centred on identifying homologous regions of a protein to acquire information on the essential function of a gene product. Recent whole-genome comparisons [20,31,32,33] have, however, noted the rapidly evolving set of genes that play a role in speciation or are subject to rapid adaptive evolution. We speculate that the evidence for adaptive evolution in SEP53 reveals its role in immunity because the majority of cases of selective sweeps involve host-pathogen arms races. Thus, the present study highlights how documenting regions of high amino acid divergence can reveal hitherto unthought-of roles for particular genes.

MATERIALS AND METHODS
Human SEP53 was used to search the non-redundant, Expressed Sequence Tag, and high throughput genomic sequences in EMBL/GenBank or the EMBL trace repository with BLAST [34]. This revealed homologous sequences for a range of mammals, but none for non-mammalian species. Species obtained from GenBank and used in analyses were: human (Homo sapiens), pig (Sus scrofa), rat (Rattus norvegicus), mouse (Mus musculus), chimpanzee (Pan troglodytes), cow (Bos taurus), and the macaque (Macaca fascicularis).
We further isolated orangutan and gorilla SEP53 from genomic sequences obtained from the ECACC (European Collection of Cell Cultures). For this, we used the polymerase chain reaction (PCR) and oligonucleotide primers designed from human intronic sequence so that complete coding sequences were obtained for the SEP53 gene's two coding exons. One set of primers (FW: GAGCCTCCAAGGGAACTTTT; RV: CTGCTATGTCCCCTCTCCAC) amplified the exon that codes for the EF-hand domain of SEP53, the other set (FW: GGATGCTGACTCCACCTCAT; RV: GCAGGACAAGCCAAACTCTC) amplified the exon corresponding to the C-terminal domain. PCR amplicons were electrophoresed, excised from agarose gels, cleaned and then sequenced (with the primers above plus additional internal ones to ensure that we had two-fold coverage in all areas) in both directions using BigDye reagents and an ABI capillary sequencer. The sequence chromatograms were inspected by eye to confirm the validity of all differences within and between species, and assembled using SeqManII (DNAstar Inc., Madison USA).
Figure 2. The SEP53 gene contains repeat regions; the chimpanzee has four of these and the other primates only two. A neighbour joining tree indicated that copy 4 in the chimpanzee was orthologous with copy 2 in the other species. Chimpanzee copy 1 is probably orthologous with copy 1 in the other species, but this was less clear. doi:10.1371/journal.pone.0001003.g002
Sequences were aligned with ClustalW and KA and KS were calculated using the program DnaSP [35]. We also used DnaSP to perform sliding window analysis along the length of the SEP53 sequence to visualise areas with high or low KA/KS. To statistically test for positive selection, we used the phylogeny-based analysis of KA/KS as implemented in the Codeml program of the PAML package [22,23,24]. Specifically, we varied the NSsites option to generate log likelihood values for models where KA/KS is specified to vary among sites, but can be constrained to be <1.0 (neutral models) or allowed to rise above 1.0 (selection models). We studied a range of neutral (denoted M1a, M7 and M8a) and selection models (M2a, M3 and M8) but focused attention on the comparisons thought to be the most robust (M1a vs M2a, M8a vs M8; [25,36]). Significance was assessed using two times the difference in log-likelihood (2Δl) between the models being compared, which is expected to follow a chi-squared distribution with degrees of freedom determined by the difference in the number of parameters between the models.
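The likelihood ratio test described above can be sketched in a few lines. This is an illustrative calculation and not output from Codeml: the two log-likelihood values are hypothetical, chosen only so that the statistic equals the value of 19.8 reported for the repeat data set, and the use of one degree of freedom for M8a vs M8 is stated here as an assumption. The helper names are also hypothetical, and the chi-squared tail probability is implemented only for one and two degrees of freedom, where closed forms exist.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable, for df = 1 or 2."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(lnl_null, lnl_alt, df):
    """Return (statistic, p_value), where statistic = 2 * (lnl_alt - lnl_null)."""
    # Small negative values can arise from optimisation noise; clamp at zero.
    stat = max(2.0 * (lnl_alt - lnl_null), 0.0)
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods for a neutral model (M8a) and a selection
# model (M8); df = 1 is assumed for this comparison.
stat, p_value = likelihood_ratio_test(lnl_null=-5010.0, lnl_alt=-5000.1, df=1)
```

With these inputs the statistic is 19.8 and the tail probability falls below 0.0001, in line with the threshold quoted for that comparison.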
One notable feature of SEP53 was repeat regions approximately 60 amino acids long (Figure 2). All of the primates except the chimp have two tandem repeats, while the chimp has four, leading to an insertion sequence of approximately 120 amino acids. We performed alignment and phylogenetic analysis separately on these repeat regions to ensure their accurate placement in the global SEP53 alignment (Figure 2, S1). Porcine SEP53 also contains an insert in this region (further details and functional analyses of porcine SEP53 are being presented elsewhere). KS ranged from 0.0095 (human-chimp) to 0.79 (macaque-mouse). The latter value is somewhat high, raising the possibility that some sites are saturated and thus our estimates of KS may be inaccurate. We therefore repeated the analysis using only the five primates (mean KS = 0.04). While this reduced data set lacked power due to the small number of species and slight divergence between them, the primate-only data set has the most confident alignment. The primate results confirmed the analysis of all nine species (for example, model 8 vs 8a, χ2 = 9.37, p<0.001). KS also appeared to vary along the length of the SEP53 gene (Figure 1) and this is known to influence the detection of selection [37]. We tested for variation in KS using the codon selection analysis component of the computer program HyPhy [38]. Whilst this did not indicate significant variation in KS (two times the difference in log-likelihood between dual variable rates model and variable nonsynonymous rates model = 0.94, df = 4, p = 0.92), we nevertheless repeated the selection analysis (with all nine species) using the HyPhy/Datamonkey interface. Using the random effects likelihood method (REL), which allows for rate heterogeneity in both KA and KS, and the TrN93 model (as chosen by the built-in model selection tool), Datamonkey confirmed the results from PAML by identifying 20 positively selected sites in SEP53 (Bayes factor = 50).
Additional evidence for selection can be gained by comparing patterns of polymorphism within species to divergence between species. To obtain polymorphism data, we sequenced both SEP53 and p53 in 24 individuals from the human DNA polymorphism discovery resource (Coriell Institute). For SEP53, methods were identical to those described for the orangutan and gorilla above. For p53, which has many small exons interspersed with intronic sequence, we placed 6 PCR primer pairs in introns such that we amplified and obtained sequences from exons 2 to 11. Following alignment and removal of intronic sequence, we used the 48 alleles obtained from each gene to perform McDonald-Kreitman tests as implemented in DnaSP. These test for a departure from the neutral expectation that the ratio of non-synonymous to synonymous fixed differences between species will be the same as the ratio for polymorphism within species [39].
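The logic of this test can be sketched as a 2x2 contingency analysis. The example below is an assumption-laden illustration and not the procedure or output of the software used here: it applies a two-sided Fisher's exact test (one common choice for such tables), the function names are hypothetical, and the counts are invented for demonstration and are not data from this study.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, total)


def mk_test(dn, ds, pn, ps):
    """McDonald-Kreitman style comparison.

    dn, ds: non-synonymous and synonymous fixed differences between species.
    pn, ps: non-synonymous and synonymous polymorphisms within species.
    Returns (neutrality_index, p_value); the index is None when undefined.
    """
    ni = (pn * ds) / (ps * dn) if ps * dn > 0 else None
    return ni, fisher_exact_two_sided(dn, ds, pn, ps)


# Hypothetical counts, for illustration only.
ni, p = mk_test(dn=4, ds=2, pn=1, ps=2)
```

A neutrality index below 1 corresponds to an excess of non-synonymous fixed differences relative to polymorphism, the direction expected under repeated adaptive substitution. With very few polymorphic sites, as at the loci studied here, such a table has little power whatever the index.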