Pervasive adaptation in Plasmodium-interacting proteins in mammals

The protozoan genus Plasmodium causes malaria in dozens of mammal species, including humans, non-human primates, rodents, and bats. In humans, Plasmodium infections have caused hundreds of millions of documented deaths, imposing strong selection on certain populations and driving the emergence of several resistance alleles. Over the deep timescale of mammalian evolution, however, little is known about host adaptation to Plasmodium. In this work, we expand the collection of known Plasmodium-interacting-proteins (PIPs) in mammalian hosts from ~10 to 410, by manually curating thousands of scientific abstracts. We use comparative tests of adaptation to show that PIPs have experienced >3 times more positive selection than similar mammalian proteins, consistent with Plasmodium as a major and long-standing selective pressure. PIP adaptation is strongly linked to gene expression in the blood, liver, and lung, all of which are clinically relevant tissues in Plasmodium infection. Interestingly, we find that PIPs with immune functions are especially enriched for additional interactions with viruses or bacteria, which together drive a 3.7-fold excess of adaptation. These pleiotropic interactions with unrelated pathogens, along with pressure from other Plasmodium-like Apicomplexan parasites, may help explain the PIP adaptation we observe in all clades of the mammalian tree. As a case study, we also show that alpha-spectrin, the major membrane component of mammalian red blood cells, has experienced accelerated adaptation in domains known to interact specifically with Plasmodium proteins. Similar interactions with Plasmodium-like parasites appear to have driven substantial adaptation in hundreds of host proteins throughout mammalian evolution.


1
. Expression changes were the most common form of evidence (72% of PIPs), 15 but 28% of PIPs were supported by multiple sources of evidence, and 41% by 16 multiple studies. Virtually all of the studies were conducted on five Plasmodium 17 species infecting humans or mice (Fig 1A). 18 19 20

PIP expression and function support role in malaria 1
If PIPs are truly a set of malaria--relevant genes, we would expect the 2 pathophysiology of malaria to be reflected in their tissue expression profiles. We 3 tested this hypothesis by examining human gene expression in each of the 53 tissues 4 collected by the GTEx Consortium (2015; Methods III). We first found that, on 5 average, PIPs have 9.5% higher total expression than other genes (p<0.0001; Fig  6   2A). To fairly evaluate PIP overexpression in each tissue, we designed a matched 7 permutation test that compares PIPs to many, similarly--sized sets of control genes 8 with similar total expression (Methods III). After controlling for total expression in 9 this way, we find seven tissues in which PIPs are significantly overexpressed ( 14 Similarly, we expected PIPs to be enriched for GO functions that reflect 15 malaria pathology. We tested 17,696 GO functional categories (Methods V) for PIP 16 enrichment using Fisher's Exact Test. After correcting for multiple testing, over 17 1,000 categories contained significantly more PIPs than expected (S2 Table). These 18 categories are dominated by immune functions, especially for the highest levels of 19 enrichment (Fig 1C). Other functions, including apoptosis, cell--cell signaling, and 20 coagulation, are also highly enriched for PIPs (S2 Table). These results confirm the 21 biological connections between PIPs and malaria, and suggest that immune 22 pathways present a major functional interface between host and parasite. 23 1  4 axis. That is, sets with higher average values will traverse more x--axis space (and appear as 'lower' 5 lines) before reaching the maximum density of 100%. Descriptions of data sources are available in 6 Methods V. *** = p<0.0001; ** = p<0.001; ns = p>0.05. when testing the link between Plasmodium, as a single causal pathogen, and 14 pleiotropy has many interesting implications, including the need to carefully isolate 1 any single selective pressure when linking it to protein adaptation.

3
PIPs are not like other proteins 4 We have already shown that PIPs have two unusual properties-high mRNA 5 expression, and excess overlap with other pathogens-that may influence their rate 6 of evolution. We assessed several additional metrics for differences between PIPs 7 and other proteins, in order to fairly evaluate PIP adaptation. 8 First, we tested three more broad measures of gene function in humans: the 9 density of DNAseI hypersensitive elements; protein expression, as measured by 10 mass spectrometry; and the number of protein--protein interactions (see Methods 11 V). For each of these metrics, PIPs have significantly higher mean values than sets of 12 random controls, indicating that PIPs are more broadly functional in humans (Fig 2,  13 B--D; all p<0.01). We next tested four measures of genomic context, which have been 14 linked to the rate of protein evolution: aligned protein length; the regional density of 15 protein--coding bases; the density of highly conserved, vertebrate elements; and GC 16 content (Methods V). Most of these metrics do not differ between PIPs and other 17 genes (Fig 2 E--H), with the exception of conserved element density, which is slightly 18 but significantly lower in PIPs (mean=8.0% vs. 8.8%; p=0.0004; Fig 2G). 19 Based on these results, we expanded our permutation test to find matched 20 controls for each PIP. Control genes were considered acceptable matches if their 21 values for each of the five significantly different metrics (Fig 2A-- be generated. About 9% of PIPs were too dissimilar from other proteins to be 4 matched, and were excluded from subsequent analysis. 5 Finally, one of the largest differences between PIPs and other proteins is the 6 frequency with which they are discussed in the scientific literature ( Fig 2I). The 7 average PIP has 6.5 times more PubMed citations, and 9.1 times more scientist--8 contributed References Into Function (GeneRIFs), than the average mammalian 9 protein (Methods V). This difference was too large to control for in the matched 10 permutation test without excluding the majority of PIPs. However, we show that the In contrast, we find that PIPs do have a significantly higher ratio of non--2 synonymous to synonymous substitutions across 24 mammal species ( Both models find evidence of excess adaptation in PIPs. Over 37% of PIPs 14 have BUSTED evidence (at p≤0.05) of recurrent adaptation in mammals, versus 23% 15 of matched controls (p<10 --5 ; Fig 3C). Similarly, PIPs have BS--REL evidence for 16 adaptation on more branches of the mammalian tree (p=1.87× 10 --4 ; Fig 3D), and for 17 more codons per protein (p<10 --5 ; Fig 3E). This excess is robust to the BUSTED p--18 value threshold used to define adaptation, and increases as the threshold becomes 19 more stringent (Fig 3F, p=0.001). Overall, these matched tests show that PIPs have 20 indeed experienced an accelerated rate of adaptive substitutions, consistent with 21 malaria as an important selective pressure. 22

High rate of adaptation in PIPs known to interact only with Plasmodium 1
We have shown that a large set of host proteins with strong connections to 2 Plasmodium (STable 1 , Fig 1 A--C) have, over deep time scales, evolved under 3 exceptionally strong positive selection (Fig 3). Given that nearly half of PIPs are 4 known to also interact with viruses and/or bacteria (Fig 1D), one critical question is 5 whether Plasmodium is truly the source of this selection. We attempted to isolate 6 Plasmodium as a selective pressure by dividing PIPs into 'Plasmodium--only' and 7 'multi--pathogen' categories, based on the available information regarding viruses 8 and bacteria (Fig 1D; Methods IV). We find that Plasmodium--only PIPs have a 2.2--9 fold excess of adaptation compared to matched controls (p=0.008; Fig 4A, far left), 10 when adaptation is measured as the proportion of adaptive codons per gene (Fig  11   3E). This suggests that Plasmodium may have specifically driven adaptation in a 12 large number of mammalian proteins, apart from any pleiotropic interactions they 13 may have with other pathogens. 14 Nonetheless, multi--pathogen PIPs have 3.7 times more adaptation than 15 matched controls-significantly higher than the excess in Plasmodium--only PIPs 16 (p=0.005; Fig 4A, left). This suggests that an increased number and diversity of 17 pathogen interactions may drive a cumulative increase in host adaptation. 18 Importantly, however, these multi--pathogen interactions are concentrated in 19 immune PIPs (Fig 1D; Fig 4A). Since immune genes are well known to evolve at 20 PIPs. 15 Before disentangling this issue, we first verified the correlation between 16 immune function and adaptation (Methods VI). We find that while PIPs overall have 17 adapted at a 3.1--fold higher rate than matched controls, non--immune PIPs have 1 adapted at a 1.7--fold higher rate than matched, non--immune controls (Fig. 4A, 2 middle). This difference, which is highly significant (p<0.001), reinforces that 3 immune enrichment could confound adaptation in multi--pathogen PIPs. To isolate 4 these two effects, we then considered only non--immune PIPs, divided into groups by 5 their total number of pathogen interactions (Fig 4A, right; S7 Fig). In these non--6 immune PIPs, in contrast to all PIPs, we find that additional interactions beyond 7 Plasmodium have no additional effects on adaptation. 8 Together, these results suggest that adaptation in immune genes is difficult to 9 attribute to any single selective pressure. The immune system appears to be the 10 most efficient avenue for hosts to simultaneously adapt to multiple pathogens. In 11 contrast, host adaptation to Plasmodium is apparent through both immune and non--12 immune pathways (Fig 1D; Fig 4A). We have shown that non--immune genes evolve 13 more slowly and have less pathogen pleiotropy (Fig 4A; Fig 1D). Thus, though 14 Plasmodium has likely played a major role in immune evolution, we can be more 15 confident that selection imposed by Plasmodium has specially driven adaptation in 16 non--immune PIPs. 17 18 19 Malaria infections are biologically complex, and host adaptation to 20

PIP adaptation is related to expression in blood, liver, and lung
Plasmodium could occur in genes expressed in several malaria--relevant tissues (Fig  21   1B). We used multiple linear regression to test whether the rate of adaptation in a 22 gene, as measured by BS--REL and BUSTED, was related to its tissue--specific 23 expression, as measured by GTEx 1 For PIPs, rates of adaptation are significantly and positively related to 2 relative expression in blood, liver, and lung, but not in other malaria--related tissues 3 (Table 1, column 2). Overall, in a multiple linear model, PIP expression in these 4 tissues explains 17.4% of the variance in the proportion of adaptive codons. In 5 contrast, the tissue--specific expression of matched control genes (Methods III) 6 explains only 4.6% of this variance in adaptation, or 3.8 times less (p<0.001). When 7 compared to samples of control genes matched for total expression, as well as for 8 expression in blood, liver, and lung, PIP relationships between adaptation and tissue 9 expression are significantly stronger than expected (

PIP adaptation is not limited to Plasmodium--infected lineages 1 A number of Plasmodium species infect mammalian hosts in the orders 2
Primates and Rodentia (Carlton, Perkins, and Deitsch, 2013). In contrast, 3 Artiodactyla and Carnivora are parasitized by other genera of Apicomplexan 4 parasites, which also reproduce in the blood and are transmitted by insects (Clark 5 and Jacobson, 1998). To further test the specificity of PIP adaptation, we applied the 6 BUSTED and BS--REL models to separate protein alignments for each mammalian 7 order (Methods VIII). 8 When all PIPs are considered, we find significant excesses of adaptation in 9 rodents (p<0.001), primates (p=0.005), and carnivores (p=0.02; Fig 4B). The signal 10 is positive, but not significant, in artiodactyls (p=0.28; Fig 4B). Artiodactyls are the 11 most poorly--represented group in our mammalian tree (S1 Fig Apicomplexan pathogens. Other ubiquitous pathogens that interact with PIPs, 18 namely viruses and bacteria (Fig 1D), may further contribute to these mammal--wide 19 patterns. 20 21

Understanding a single case of adaptation to Plasmodium 22
We have shown that Plasmodium has driven, at least in part, an accelerated 23 rate of adaptation in a set of 410 mammalian PIPs. In order to understand this 1 adaptation at a more mechanistic level, we selected a single PIP for more detailed 2 investigation. 3 Of the top ten PIPs with the strongest BUSTED evidence of adaptation, only 4 one candidate-alpha--spectrin, or SPTA1-has been extensively characterized for 5 molecular interactions with Plasmodium proteins. Alpha--spectrin is a textbook 6 example of a major structural component of the red blood cell (RBC) membrane. In 7 humans, several polymorphisms in this gene are known to cause deformations of 8 the RBC, which may either be symptomless or cause deleterious anemia (reviewed 9 in, e.g., Gallagher, 2004). The SPTA1 protein has a well--defined domain structure, 10 and specific interactions with Plasmodium proteins are known for three domains 11

6 7
We wished to test whether sites of mammalian adaptation in SPTA1 mapped 8 to any of these Plasmodium--relevant domains. To identify adaptive codons with 9 higher precision and power, we aligned SPTA1 coding sequences from 61 additional 10 mammal species (S5 Table) for analysis in MEME (Murrel et al., 2012; Methods IX). 11 Of the 2,419 codons in this large mammalian alignment, we found that 63 show 12 strong evidence of adaptation (p<0.01), and that these are distributed non--13 randomly throughout the protein. 14 Remarkably, three domains-Repeat 1, Repeat 4, and EF--hand 2-are 15 significantly enriched for adaptive codons, after controlling for domain length and 16 conservation (Fig 5; Methods IX). That is, all three SPTA1 domains with strong 17 evidence of adaptation in mammals are known to either interact specifically with P. 18 falciparum proteins, or harbor human mutations that provide resistance to P. 19 falciparum. This overlap is unlikely to occur by chance (p=0.015), and is robust to 20 the p--value thresholds chosen (S6 Table). Thus, evidence from SPTA1 suggests a 21 meaningful and specific connection between host adaptation and the mechanics of 1 Plasmodium infection. 2 3 Discussion 4 In this work, we have examined decades of malaria literature to expand the 5 collection of mammalian, Plasmodium--interacting proteins by over an order of 6 magnitude (Fig 1). We show that, compared to control proteins matched for various 7 properties (Fig 2), these 410 PIPs have adapted at exceptionally high rates in 8 mammals (Fig 3). The highest rates of adaptation are evident in immune PIPs, 9 especially those that share interactions with viruses and bacteria (Fig 4A). However, 10 we show that Plasmodium itself (or related Apicomplexans) has likely been an 11 important driver of this adaptation, especially for non--immune proteins (Fig 4A). 12 We used collections of available data on other pathogens to isolate a set of 13 PIPs that, to the best of our knowledge, lack any 'multi--pathogen' interactions. These 14 'Plasmodium--only' PIPs, whether immune or not, have adapted at over twice the 15 expected rate in mammals (Fig 4A). This suggests that Plasmodium has had an 16 appreciable effect on PIP evolution, beyond the effect of unrelated pathogens. Still, 17 many interactions with other pathogens likely remain unknown, making it 18 difficult-based on this evidence alone-to dismiss their importance. 19 However, two other pieces of evidence support Plasmodium as a key selective 20 pressure. First, mammal--wide adaptation in PIPs is strongly linked to PIP 21 expression in human blood, liver, and lung (Table 1). Plasmodium parasites are well 22 known to replicate within red blood cells (RBCs) and hepatocytes, and infected 23 RBCs tend to sequester in the lungs, with serious consequences (e.g. Aursudkij et al., 1 1998). Thus, the pathophysiology of malaria is reflected in the tissues where PIPs 2 show the strongest evidence of adaptation. 3 Second, in the well--studied case of alpha--spectrin, we show that domain--level 4 interactions with Plasmodium perfectly explain the observed patterns of adaptive 5 substitution (Fig 5). Besides validating the ability of codon evolution models to 6 detect adaptation at particular residues (Methods VII), this result affirms a specific 7 role for Plasmodium in mammalian evolution, beyond the immune--focused role 8 played by pathogens in general ( Fig 4A). Thus, despite the inevitability of at least 9 some pleiotropy (Wagner and Zhang, 2011), we show that phenotypic information 10 can be leveraged to link genetic adaptation to specific sources of selection. 11 Throughout this work, we showcase the utility of phenotypic information for 12 studying evolution. We demonstrate that recent, well--funded projects like GTEx and 13 Encode can provide, among many other uses, the raw information required for 14 meaningful evolutionary comparisons (Fig 2; Methods). Smaller--scale projects, 15 including most of the scientific papers contained in PubMed, also contain an 16 impressive quantity of valuable data ( Fig 1A). However, we find that heavy manual 17 curation is still required to remove false positives from literature searches (Methods 18 I). In the future, unique and automatic indexing of existing data will be key to 19 understanding the evolution of complex phenotypes, and should be a major 20 research focus, alongside the accessibility of new data. 21 Finally, this work provides an interesting contrast with previous studies, 22 which have associated only a few dozen human genes with malaria resistance 23 (Verra et al., 2009). Only a handful of these genes are backed by convincing evidence 1 of positive selection in humans, and nearly all of these are RBC proteins (Hedrick, 2 2011;MalariaGEN, 2015). In contrast, our work provides a repository of hundreds 3 of diverse human genes with phenotypic links to malaria (Fig 1; S1 Table). Why, 4 then, do we know of so few examples of recent human adaptation to Plasmodium? 5 This disconnect may depend simply on the timescale of human evolution, which is 6 only a fraction of the 105 million years of mammalian evolution (Murphy et al., 7 2007). Or, perhaps the difficulty of detecting balancing selection (Charlesworth, 8 2006) has obscured additional, important human variants. Future work will utilize 9 the large set of PIPs to better understand the evolution of malaria resistance in 10 humans. 11 In conclusion, we have found evidence of substantially accelerated 12 adaptation in mammalian proteins that interact with Plasmodium. In the case of 13 rapidly evolving immune proteins, Plasmodium appears to share responsibility with 14 other groups of pathogens, including viruses and bacteria. We show that it can be 15 difficult to attribute evolutionary changes to a single selective agent, given the 16 surprising pleiotropy of host genes with regard to very different pathogenic agents. 17 But in many cases-as well as in the case of alpha--spectrin-our approach allows us 18 to infer that Plasmodium--like parasites have imposed a substantial selective 19 pressure on mammals. We hope that our collection of 410 mammalian PIPs will 20 continue to prove a powerful resource for exploring host interactions with 21 Plasmodium. 22 addressing non--genetic aspects of malaria. 12 For papers discussing genes, we examined the abstracts for the presence and 13 type of evidence connecting genes to malaria phenotypes. In cases where the 14 abstract was ambiguous, we examined the full text of the paper. To limit the number 15 of false positives, we did not include results from RNAseq or other high--throughput 16 experiments. 17 18 19 We used BLAT to identify homologs of 22,074 human coding sequences in 24 20 high--depth mammal genomes (S1 Fig). We retained orthologs which (1) had best 21 reciprocal hits in all 24 mammal species, (2) lacked any in--frame stop codons, (3) 22

II. Generation of mammalian ortholog alignments
were at least 30% of the length of the human sequence, and (4) had clearly 23 conserved synteny in at least 18 non--human species. Coding sequences for the 1 resulting 9,338 proteins were aligned with PRANK, and any codon present in fewer 2 than eight species was excluded from analysis. Additional details are available in 3 values for each tissue. RA is simply the proportion of each gene's total RPKM found 10 in each tissue. For matching controls, we summed RPKM values over all tissues to 11 yield total expression. 12 Because PIPs have substantially higher total expression than other proteins 13 ( Fig. 2A) Alpha--spectrin homologs were initially identified in 88 mammal species 2 using NCBI Gene (http://www.ncbi.nlm.nih.gov/gene/?Term=ortholog_gene_6708). 3 The sequence of the longest mRNA transcript for each species was downloaded 4 using E--Utilities, and each transcript was trimmed to the longest ORF using  Table). The alignment was manually inspected and corrected using  Table). Then, for each domain, we 16 calculated an 'adaptation score' as: 17 a/v 18 where a measures adaptation (the proportion of codons within the domain with 19 MEME p≤0.01*) and v measures variability (the proportion of codons within the 20 domain that vary among species, i.e., are not 100% conserved). This score also 21 controls for domain length, as it uses the proportion of codons within the domain. 22 To calculate the significance of each domain's adaptation score (i.e., to ask, is it 23 higher than expected?), we randomly permuted codons among domains 10,000 1 times. 2 *We also tested MEME p--value cutoffs of 0.1, 0.5, 0.005, and 0.001 for 3 defining a; these results are available in S6 Table. The results for p≤0.01, which are 4 reported in the main text, are representative across these cutoffs. 5 6

Data Access 7
All data used in this work are publicly available (Methods I--V). The collection of PIPs 8 is available in S1 Table. 9 10

Acknowledgements 11
We wish to thank Kerry Geiler--Samerotte for her thoughtful comments on the 12 manuscript, along with the rest of the Petrov lab. ERE thanks Jane Carlton for 13 abbreviation advice; Jamie Blundell and Anisa Noorassa for figure advice; and Daniel 14 Friedman, for conceding that 'protein' can mean 'gene.' 15 16 Author Contributions 17 E.R.E curated the PIPs. D.E., E.R.E., and N.T. collected other data. E.R.E. and D.E. 18 performed the analyses, with design input from D.A.P. and S.V. E.R.E. and D.A.P. 19 wrote the paper, with contributions from all other authors. 20 This work was supported by NIH grants R01GM089926 and R01GM097415 and 21 NSF grant R35GM118165--01 to DAP, and an NSF Graduate Research Fellowship to 22