Naegleria fowleri: Protein structures to facilitate drug discovery for the deadly, pathogenic free-living amoeba

Naegleria fowleri is a pathogenic, thermophilic, free-living amoeba which causes primary amebic meningoencephalitis (PAM). Penetrating the olfactory mucosa, the brain-eating amoeba travels along the olfactory nerves, burrowing through the cribriform plate to its destination: the brain’s frontal lobes. The amoeba thrives in warm, freshwater environments, with peak infection rates in the summer months, and infection has a mortality rate of approximately 97%. A major contributor to the pathogen’s high mortality is the lack of sensitivity of N. fowleri to current drug therapies, even in the face of combination-drug therapy. To enable rational drug discovery and design efforts, we have pursued protein production and crystallography-based structure determination efforts for likely drug targets from N. fowleri. The genes were selected if they had homology to drug targets listed in DrugBank or were nominated by principal investigators engaged in N. fowleri research. In 2017, 178 N. fowleri protein targets were queued to the Seattle Structural Genomics Center for Infectious Disease (SSGCID) pipeline, and to date 89 soluble recombinant proteins and 19 unique target structures have been produced. Many of the new protein structures are potential drug targets and contain structural differences compared to their human homologs, which could allow for the development of pathogen-specific inhibitors. Five of the structures were analyzed in more detail, and four of the five show promise that selective inhibitors of the active site could be found. The 19 solved crystal structures build a foundation for future work in combating this devastating disease by encouraging further investigation to stimulate drug discovery for this neglected pathogen.

combination with other repurposed drugs such as rifampin, miltefosine, and fluconazole [9,12]. Recent studies suggest that posaconazole is more efficacious than fluconazole in vitro and in animal models of PAM [12]. Miltefosine was used in combination to successfully treat two patients, one reported and one unreported in the literature. Miltefosine in combination is not always effective: one patient treated with miltefosine suffered permanent brain damage and another had a fatal outcome [11,13]. The multi-drug therapy is associated with severe adverse effects and requires higher than normal dosages to penetrate the blood-brain barrier and to reach the CNS [11,14]. Development of new rapid-onset, brain-permeable, efficient, and safe drugs is urgently needed.
Given the lack of efficacy of current drugs against N. fowleri and the urgent need for new drugs, we investigated the proteome of N. fowleri for likely drug targets, aiming to enable further drug discovery efforts by producing material for characterization of the proteins. This work is a first step towards the discovery of drugs specifically designed against N. fowleri.

The N. fowleri proteome contains hundreds of potential drug targets
Potential drug targets were selected by sequence homology to DrugBank protein targets [15]. Additional targets were requested by the amoeba research community, leading to a total of 178 N. fowleri targets entering the Seattle Structural Genomics Center for Infectious Disease (SSGCID) structure determination pipeline. The SSGCID is a National Institute of Allergy and Infectious Diseases (NIAID) supported preclinical service for external investigators (www.SSGCID.org) [16]. All targets were filtered according to the standard SSGCID target selection protocol and criteria [13], which eliminate proteins with over 750 amino acids, proteins with 10 or more cysteines, proteins with 95% sequence identity over 70% coverage to proteins already in the PDB, targets claimed or worked on by other scientific groups, and targets with transmembrane domains (except where a soluble domain could be expressed separately) [17]. These criteria resulted in the selection of 178 proteins which entered the SSGCID production pipeline. These proteins, homologous to other known drug targets, consisted of metabolic enzymes, protein synthetases, kinases, and others. Real-time updates to target status can be viewed at the SSGCID website (https://apps.sbri.org/SSGCIDTargetStatus/TargetStatus/Naegleria). Additionally, a detailed table of the Naegleria protein crystallography statistics is available in Supplemental information.
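The selection criteria listed above can be illustrated with a short sketch. This is not the SSGCID implementation: the Candidate record, its field names, and the treatment of the 95% identity and 70% coverage thresholds as inclusive are assumptions made for illustration, and the example sequences are invented.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one candidate target (field names are assumptions)."""
    name: str
    sequence: str
    pdb_identity: float = 0.0    # best percent identity to a protein already in the PDB
    pdb_coverage: float = 0.0    # percent coverage of that best hit
    claimed_elsewhere: bool = False
    has_tm_domain: bool = False
    soluble_domain_expressible: bool = False


def rejection_reasons(c):
    """Return the list of criteria from the text that this candidate fails."""
    reasons = []
    if len(c.sequence) > 750:
        reasons.append("over 750 amino acids")
    if c.sequence.upper().count("C") >= 10:
        reasons.append("10 or more cysteines")
    # Assumption: both thresholds are treated as inclusive.
    if c.pdb_identity >= 95.0 and c.pdb_coverage >= 70.0:
        reasons.append("already represented in the PDB")
    if c.claimed_elsewhere:
        reasons.append("claimed by another group")
    if c.has_tm_domain and not c.soluble_domain_expressible:
        reasons.append("transmembrane domain")
    return reasons


# Invented toy candidates, for illustration only.
ok = Candidate("toy_ok", "MKTAYIAKQRQISFVKSHFSRQ")
cys_rich = Candidate("toy_cys", "M" + "C" * 10 + "K")
tm_with_soluble = Candidate("toy_tm", "MKTAYIAK", has_tm_domain=True,
                            soluble_domain_expressible=True)

for cand in (ok, cys_rich, tm_with_soluble):
    print(cand.name, rejection_reasons(cand) or "passes")
```

Returning the reasons rather than a bare pass/fail flag keeps a record of why each candidate was excluded.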

One third of targets attempted produced soluble protein
The open reading frames of each target were obtained from AmoebaDB.org. Progression of the targets through the SSGCID protein production pipeline is shown in Table 1. Of the 178 NIAID-approved targets, 177 were selected for cloning. One protein target was eliminated due to redundancy, given its 100% identity match and 84% coverage to another protein target already in the SSGCID pipeline. Of the 177 targets attempted in PCR amplification using N. fowleri cDNA, 133 were successfully amplified and cloned into SSGCID expression vectors (75%) [18]. In small-scale expression screening, 82 of the 133 successfully cloned targets (61.6%) demonstrated soluble expression with an N-terminal His6-tag vector [18]. Of these 82 soluble proteins, 64 proteins were purified to >95% purity with yields ranging from 1.1 mg to 348 mg. These proteins are available, upon request, at SSGCID.org.

N. fowleri proteins resulted in an 11% structure determination success rate
Following purification of high-purity preparations of protein, the targets were submitted for structure determination by X-ray crystallography. Of the 64 proteins produced, 26 crystallized (29%), and 20 diffracted, 19 of which met the SSGCID resolution quality criteria and were submitted to the Protein Data Bank (PDB). Table 2 lists the 19 targets deposited in the PDB that are reported in this paper, including five structures with a unique ligand bound, for a total of 23 PDB deposits. It should be noted that the annotated functions of the proteins are based on orthologs from other species, and biochemical evidence of function is limited to glucokinase [19] at this time. Overall, this resulted in a structure determination success rate of 11%, which is higher than the usual structural genomics pipeline rates reported by us or other structural genomics groups [20].
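As a quick arithmetic check, the pipeline counts reported above can be tabulated as a funnel. The sketch below uses only counts stated in the text; the stage labels are shorthand introduced here. Each step percentage is computed relative to the preceding stage, so it need not match a percentage in the text that was calculated against a different denominator. The overall rate of about 11% corresponds to 19 deposited targets out of 178 approved targets.

```python
# Counts as reported in the text; stage labels are shorthand for illustration.
STAGES = [
    ("approved targets", 178),
    ("selected for cloning", 177),
    ("cloned", 133),
    ("soluble expression", 82),
    ("purified", 64),
    ("crystallized", 26),
    ("diffracted", 20),
    ("deposited in PDB", 19),
]


def step_rates(stages):
    """Percent of targets retained at each stage, relative to the preceding stage."""
    rates = []
    for (_, previous), (name, count) in zip(stages, stages[1:]):
        rates.append((name, count, 100.0 * count / previous))
    return rates


def overall_rate(stages):
    """Percent of first-stage targets that reach the final stage."""
    return 100.0 * stages[-1][1] / stages[0][1]


for name, count, pct in step_rates(STAGES):
    print(f"{name:22s} {count:4d}  {pct:5.1f}% of previous stage")
print(f"overall: {overall_rate(STAGES):.1f}%")
```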

Comparative analysis of N. fowleri and Homo sapiens enzyme active sites suggests some targets have promise for selective inhibition
We have already published that glucokinase inhibitors can be obtained that are selective for N. fowleri vs. the human homolog, supporting glucokinase as a target for N. fowleri therapeutics [19]. We analyzed the other structures determined, comparing each N. fowleri structure to human homolog structures, in order to identify opportunities for selective design of chemical inhibitors. Comparison of the N. fowleri structures to human structures available in the PDB was done by superimposition of the coordinate files (Table 3). With the exception of a pair of 96% identical N. fowleri and human ubiquitin-conjugating enzymes E2, all of the N. fowleri enzymes differed from human homologs by more than 31% (Table 3). We wanted to focus on the known ligand binding sites, to search for differences that inhibitors could exploit. A PDB search revealed that five of the 19 N. fowleri structures determined also had human homolog structures determined which contained a known inhibitor of the human protein (Table 3). We then manually inspected and compared the binding sites of these five proteins, described below for each protein. Despite sequence similarities of the active sites, there were four cases where a case for active site specificity could be made, supporting these proteins as targets for therapeutics for N. fowleri.
N. fowleri S-adenosyl-L-homocysteine hydrolase (NfSAHH) catalyzes the breakdown of S-adenosyl-homocysteine (SAH) into adenosine and homocysteine. SAH is a byproduct of S-adenosyl-L-methionine-dependent methyltransferase reactions: the transfer of a methyl group to cellular substrates such as DNA or rRNA produces SAH [21]. SAH hydrolases play a central role in methylation reactions required for growth and gene regulation, and inhibitors of SAH

PLOS ONE
hydrolase are expected to be antimicrobial drugs, especially for eukaryotic parasites [21]. Ribavirin is structurally similar to adenosine and has been shown to produce a time-dependent inactivation of human (Hs) SAHH and Trypanosoma cruzi (Tc) SAHH [22]. The NfSAHH asymmetric unit contains a homo-tetramer (Fig 1A). Although each chain contains an active site, structural analysis indicates that two chains must be present for the hydrolysis reaction to occur successfully. Each chain consists of three domains: a substrate-binding domain, a cofactor-binding domain, and a C-terminal domain [23]. When substrates are not bound, the substrate-binding domain is located on the exterior, far from the meeting point of all four subunits of the asymmetric unit [24]. The C-terminal domain is involved in both cofactor binding and protein oligomerization [23]. In addition to the three main domains, the structure contains two hinge regions that connect the substrate-binding and cofactor-binding domains. When substrates bind, the hinge region changes conformation, closing the cleft between the substrate-binding domain and the cofactor-binding domain of the respective chain [24]. In the structure of NfSAHH, all subunits exhibit a closed conformation.
SAHH is one of the most highly conserved proteins among species, with many of the same amino acids binding the same substrates across homologs. NfSAHH is 62% identical to the human homolog. In the NAD binding region, conserved Lys and Tyr residues characteristic of most SAHHs bind via hydrogen bonds to oxygens of NAD in both NfSAHH and HsSAHH [24] (Fig 1C). Residues involved in binding adenosine (ADO) via polar interactions are also highly conserved [24]. To regulate the passage of substrates into and out of the active site, there is a highly conserved His-Phe sequence within the cofactor-binding domain. This works as a molecular gate that, when the protein is in the open conformation, allows access to the substrate pocket [23].
A search of the PDB for HsSAHH structures found multiple inhibitor-bound human structures including 1LI4 (neplanocin) and 5W49 (oxadiazole compound). A comparison of the NfSAHH structure to the neplanocin-bound human structure revealed a highly conserved conformation of the protein. In the neplanocin-bound structure, the two domains of the monomer are arranged similarly to those in the NfSAHH structure. However, in the oxadiazole-bound structure, the two domains of the monomer are in a more open conformation, where the C-terminal and N-terminal domains have opened up relative to each other in a hinge-opening motion. The oxadiazole compound stretches across the interface and is surrounded by 11 residues within 4 Å. Of the 11 residues coordinating the oxadiazole inhibitor in the 5W49 structure, 10 are identical between the Naegleria and human SAHH. Only one change, Met351Thr relative to the human enzyme, is present, suggesting a highly conserved inhibitor binding site (Fig 1C and 1D).
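The notion of residues "within 4 Å" of a ligand can be made concrete with a minimal sketch. It assumes atoms are already available as plain tuples; parsing of PDB coordinate files is omitted, and the coordinates and residue numbers below are invented toy values, not taken from 5W49 or 5V96. A residue is counted as contacting the ligand if any of its atoms lies within the cutoff of any ligand atom.

```python
import math


def residues_within(protein_atoms, ligand_atoms, cutoff=4.0):
    """Return sorted (chain, residue number, residue name) tuples for residues
    with at least one atom within `cutoff` angstroms of any ligand atom.

    protein_atoms: iterable of (chain, resnum, resname, (x, y, z))
    ligand_atoms:  list of (x, y, z)
    """
    hits = set()
    for chain, resnum, resname, xyz in protein_atoms:
        if any(math.dist(xyz, lig) <= cutoff for lig in ligand_atoms):
            hits.add((chain, resnum, resname))
    return sorted(hits)


# Invented toy coordinates, for illustration only.
toy_protein = [
    ("A", 10, "LYS", (0.0, 0.0, 0.0)),
    ("A", 10, "LYS", (1.0, 0.0, 0.0)),
    ("A", 11, "TYR", (2.0, 0.0, 0.0)),
    ("A", 50, "MET", (10.0, 0.0, 0.0)),
]
toy_ligand = [(0.0, 0.0, 3.0)]

contacts = residues_within(toy_protein, toy_ligand)
print(contacts)
```

Applying the same function to two superimposed structures and comparing the residue names at aligned positions would give a count of identical contact residues of the kind reported above.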
The crystal structure of NfSAHH (Fig 1A) contains the adenosine substrate and NAD cofactor bound to the active site, which can guide structure-activity relationships that could help to optimize adenosine analog compounds. The sequence differences that line the access channel at the dimer interface allow a rational approach to selectively inhibit the otherwise highly conserved active site [27]. Amoeba SAHHs have an additional helix insertion that in NfSAHH forms a hydrophobic groove accessible from the adenosine binding site (Fig 1B and 1C). Specificity could be achieved by designing compounds that simultaneously target this hydrophobic pocket and the active site (Fig 1E). Thus, we feel a reasonable case can be made that structural differences close to the active site would allow development of specific NfSAHH inhibitors, supporting development of a therapeutic.
Fig 1. A. Quaternary structure of NfSAHH (5V96) with chains A, B, C and D colored blue, green, pink and yellow, respectively, and ligands ADO and NAD colored red and orange, respectively. B. Structural alignment of Chain A HsSAHH (1LI4) and Chain A NfSAHH (5V96) colored pink and blue, respectively, with NAD ligand colored orange and neplanocin inhibitor colored red. C. Depiction of the electron density surrounding the NAD ligand from the 5V96 model. |Fo|-|Fc| electron density omit map shown sculpted around the NAD ligand at 2.0 Å, calculated using the composite-omit-map function of Phenix [25]. Residues which contact the ligand within 4 Å are shown in ball and stick (blue: nitrogen; red: oxygen; gray: carbon). Dotted lines represent polar contacts within 4 Å of the ligand, calculated using PyMol [26]. Conserved Lys and Tyr residues are directly labelled with Thr383 exhibiting the human amino acid substitution. D. Depiction of the electron density surrounding the oxadiazole compound from the 5W49 model. |Fo|-|Fc| electron density omit map shown sculpted around the oxadiazole compound at 2.0 Å, calculated using the composite-omit-map function of Phenix [25]. Residues which contact the ligand within 4 Å are shown in ball and stick (blue: nitrogen; red: oxygen; gray: carbon). Dotted lines represent polar contacts within 4 Å of the ligand, calculated using PyMol [26].

N. fowleri phosphoglycerate mutase (NfPGM), a glycolysis enzyme, catalyzes the isomerization of 3-phosphoglycerate and 2-phosphoglycerate during glycolysis and gluconeogenesis and is regarded as a key enzyme in the central metabolism of most organisms [28]. There are two distinct forms of PGM, differentiated by their need for 2,3-bisphosphoglycerate as a cofactor. PGMs in mammals require the cofactor whereas PGMs present in nematodes and bacteria do not [29]. NfPGM is likely the cofactor-dependent type. The crystal structure of NfPGM (PDB: 5VVE) was solved at a resolution of 1.7 Å and consists of 250 amino acid residues (~30 kDa).
The HsPGM and NfPGM structures share 61% identity. Residues surrounding the binding pocket for citrate are all conserved, with the exception of a conservative change from Thr30 (Nf) to Ser30 (Hs) (Fig 2B). A comparison of the NfPGM structure to the homologous human enzyme HsPGM (PDB: 5Y65) shows a conformational opening of the substrate binding site to accommodate the KH2 ligand. However, the residues surrounding the inhibitor molecule and supporting the movement of the peptide are identical between the two enzymes. Given this high identity, NfPGM is unlikely to be a strong candidate for selective active site inhibitor design.
N. fowleri protein arginine N-methyltransferase (NfPRMT1) methylates the nitrogen atoms found on guanidinium side chains of arginine residues within proteins. The methylation of nucleotide bases is a well-known mechanism of importance that influences DNA, nucleosomes, and transcription functionalities [30]. The enzyme is highly conserved across eukaryotes. Faulty regulation or deviating expression of PRMTs is associated with various diseases, including inflammatory, virus-related, and pulmonary diseases and carcinogenesis [31]. Overexpression of PRMTs has been observed in multiple forms and types of cancer, including PRMT1v1 overexpression in colon cancer [32] and large increases of PRMT1v2 in breast cancer [33]. Inhibitor discovery and testing using PRMTs in cancer has been frequently employed [31]. NfPRMT1 was compared to the drug-bound structure of HsPRMT1 (PDB: 6NT2) (Fig 3). The protein binds ligands at a dimer interface closing around two inhibitor molecules, one on each monomer. A large ligand-binding loop is disordered in the NfPRMT1 structure; the equivalent loop is ordered and visible in the human structure, presumably because of the presence of inhibitor. Due to the large binding surface for peptide substrates, PRMTs typically are promiscuous in nature with a wide range of binding substrates [31]. Comparison of over 40 PRMT-inhibitor complexes revealed 5 distinct binding mechanisms at multiple sites including active site and allosteric binding pockets [34]. Isozyme-specific peptide mimics have been identified which preferentially bind HsPRMT1 vs. HsPRMT5. A similar approach could be considered for selective NfPRMT inhibitor development [35,36]. There is still a need to improve both the affinity and selectivity of these micromolar and sub-micromolar PRMT inhibitors, as well as to better understand the enzyme's roles in biological and disease processes [37].
Despite high sequence identity in the ligand binding pocket, there are distinct differences in side chain orientation between the structures. These residues may change conformation upon binding inhibitor. A number of distinct features of NfPRMT1 exist which can be exploited for a potential structure-based approach to developing selective allosteric inhibitors against the Nf enzyme. A methionine is present in the NfPRMT1 structure adjacent to the adenine moiety of the S-adenosyl homocysteine (SAH), which differs significantly from all nine known human PRMTs. The substrate binding region is lined by residues that vary between Nf and all nine known human PRMTs; for example, though the NfPRMT1 pocket is similar to the allosteric inhibition pocket of HsPRMT1, there are two tyrosines lining the pocket of HsPRMT1 [38] (Fig 3C). These N-terminal residues, which interact with inhibitors of HsPRMT1, are largely not present in NfPRMT1 [39]. Thus, inhibitors that selectively target NfPRMT1 vs. the nine HsPRMTs are envisioned due to structural differences near the ligand binding sites.

Fig 3. Structural alignment of Chain A HsPRMT1 (6NT2) and Chain A NfPRMT1 (6CU5) colored pink and blue, respectively, with SAH ligand and GSK3368715 colored yellow and red, respectively. C. Depiction of the electron density surrounding the GSK3368715 inhibitor from the 6NT2 model. |Fo|-|Fc| electron density omit map shown sculpted around the GSK3368715 inhibitor at 2.0 Å, calculated using the composite-omit-map function of Phenix [25]. N-terminal residues which contact the ligand within 4 Å are shown in ball and stick (blue: nitrogen; red: oxygen; gray: carbon) and are labelled. All labelled residues are within the allosteric inhibition site of HsPRMT1. https://doi.org/10.1371/journal.pone.0241738.g003
N. fowleri peptidylprolyl isomerase (NfPPI) is a member of a superfamily of proteins comprising three structurally distinct main families: cyclophilins, FK506 binding proteins (FKBPs), and parvulins. Based on structural and sequence alignment, the N. fowleri structure falls in the FKBP family, a group of enzymes inhibited by compounds such as FK506 and rapamycin [40]. PPIs assist protein folding and influence protein denaturation kinetics by catalyzing the cis/trans isomerization of peptide bonds preceding prolyl residues [41]. The enzymes participate in a diverse array of processes ranging from signal transduction to gene regulation and have been found to interact closely with heat shock protein 90 [42]. PPI inhibitors are an emerging class of drugs for many therapeutic areas including infectious diseases, and many potent small molecule inhibitors have been derived for each of the members of the superfamily. However, selective inhibitor design has been difficult due to the shallow, broad, solvent-exposed active sites and their conservation between homologs and protein families [43].
The interior of the binding pocket of NfPPI is mostly hydrophobic. Only four putative hydrogen-bonding interactions are observed between the enzyme and substrate. All residues involved in polar interactions in NfPPI are also present in the human homolog HsFKBP51 (PDB: 1KT0) (Fig 4C), but the regions occupied by two hydrophilic residues in HsFKBP51 (Ser118 and Lys121) are instead occupied by hydrophobic residues (Ile and Leu, respectively) (Fig 4D) [42]. Another difference found in the conformation of this loop region is the insertion of an additional residue after Gly95 of NfPPI. These changes in structure and sequence may lead to selective inhibition and thus establish PPIs as a selective drug target for Naegleria.
N. fowleri prolyl-tRNA synthetase (NfProRS). Aminoacyl-tRNA synthetases (ARSs) are essential enzymes in all living species. Their roles in protein translation and biosynthesis have been heavily researched, and they are regarded as attractive therapeutic targets. Recently, evidence of their propensity for adding new sequences or domains during ARS evolution hints at broader functions and complexity outside of translation [44]. Protein translation as a drug target has been validated for anti-infective compounds for a wide array of microbes [45]. The natural product febrifugine, a quinazolinone alkaloid, and its analogues have shown antiparasitic activity by targeting ProRS. Halofuginone, a halogenated derivative of febrifugine, has shown promising potency but lacks specificity, in that it inhibits both the parasite and human ProRS [45].
The structure of NfProRS folds into an α2 homodimer (Fig 5A) with each subunit containing three domains characteristic of Class II ARSs: the catalytic domain, the anticodon binding domain, and the editing domain (Fig 5C). The NfProRS catalytic domain features the three motifs which are exclusively conserved among class II ARSs in both sequence and structure-function (Fig 5D). Motif 1 is located at the interface of the dimer and is hypothesized to facilitate communication between the active sites of the two subunits [46]. Motif 2 consists of β-strands connected by a variable loop which makes critical contacts with the acceptor stem of tRNAPro and thus plays an important role in proper tRNA recognition [47]. Motif 3 is made up entirely of hydrophobic residues and comprises an integral part of the aminoacylation active site.
Fig 4. B. Structural alignment of Chain A HsPPI (4DRO) and Chain A NfPPI (6MKE) colored pink and blue, respectively, with FK506-AN inhibitor and FKBP51 colored red and yellow, respectively. C. Depiction of the electron density surrounding the FK506-AN inhibitor from the 6MKE model. |Fo|-|Fc| electron density omit map shown sculpted around the FK506-AN inhibitor at 2.0 Å, calculated using the composite-omit-map function of Phenix [25]. Residues which contact the ligand within 4 Å are shown in ball and stick (blue: nitrogen; red: oxygen; gray: carbon). Dotted lines represent polar contacts within 4 Å of the ligand, calculated using PyMol [26]. Conserved polar interactions are labelled. D. Depiction of the electron density surrounding the FKBP51 compound from the 4DRO model. |Fo|-|Fc| electron density omit map shown sculpted around the FKBP51 compound at 2.0 Å, calculated using the composite-omit-map function of Phenix [25]. Surrounding in a gray cartoon is Chain A of the HsPPI, with the hydrophilic residues surrounding the active site that diverge from the NfPPI hydrophobic residues annotated. https://doi.org/10.1371/journal.pone.0241738.g004

Alignment of NfProRS bound to AMP and proline ligands (PDB: 6NAB) with apo HsProRS (PDB: 4K87) exhibits no significant structural changes between the apo and ligand-bound forms of the ARS (Fig 5B). The eukaryotic and archaeal origins of these ProRS make them suitable comparisons for the reason mentioned earlier: their strict conservation in all three structural domains. Both the proline and AMP bound NfProRS (PDB: 6NAB) and the halofuginone and AMP-PNP bound NfProRS (PDB: 6UYH) structures have been solved. The proline and AMP bound NfProRS (6NAB) shares structural homology with the halofuginone-liganded ProRS (6UYH), and halofuginone binding induces a conformational change of residues 80-87 of the N. fowleri enzyme. In the proline-bound 6NAB structure, residues 80-88 form a two-turn alpha helix. However, the halofuginone compound displaces Phe87 and disrupts the short helical structure. Residues making up this helix (EKDHVEGFS) are disordered in the 6UYH coordinate set. The equivalent region of the human ProRS (PDB: 4K87) is structurally homologous to the proline-bound N. fowleri enzyme in the absence of halofuginone binding. The equivalent helix in 4K87, residues 90-98 (EKTHVADFA), includes non-conservative amino acid substitutions adjacent to the crucial phenylalanine, which must be displaced for halofuginone to bind the human enzyme; for example, Glu85-Gly86 of NfProRS are Ala95-Asp96 in the human sequence. Exploiting differences in the mobility of this non-conserved loop adjacent to the active site of NfProRS and HsProRS could enable selective targeting. In addition, allosteric inhibitors that take advantage of sequence differences throughout the NfProRS might be found by screening, as was the case for P. falciparum ProRS (PfProRS) [48]. Thus, ProRS may be a reasonable drug target for N. fowleri drug development.
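The substitutions in this helix can be listed directly from the two nine-residue sequences quoted above. The sketch below assumes a simple ungapped, position-by-position pairing of NfProRS residues 80-88 with HsProRS residues 90-98, which is how the text describes the equivalent regions; the function name is introduced here for illustration.

```python
NF_HELIX = "EKDHVEGFS"  # NfProRS residues 80-88, as quoted in the text
HS_HELIX = "EKTHVADFA"  # HsProRS residues 90-98, as quoted in the text


def substitutions(seq_a, start_a, seq_b, start_b):
    """List positions that differ between two equal-length, ungapped segments.

    Each entry pairs the residue and number in seq_a with those in seq_b.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("segments must have the same length")
    return [
        (f"{a}{start_a + i}", f"{b}{start_b + i}")
        for i, (a, b) in enumerate(zip(seq_a, seq_b))
        if a != b
    ]


diffs = substitutions(NF_HELIX, 80, HS_HELIX, 90)
identical = len(NF_HELIX) - len(diffs)
print(diffs)       # [('D82', 'T92'), ('E85', 'A95'), ('G86', 'D96'), ('S88', 'A98')]
print(identical)   # 5 of 9 positions are identical
```

The output includes the Glu85/Ala95 and Gly86/Asp96 pair adjacent to the phenylalanine (Phe87 in NfProRS, position 97 in the human segment), which is identical in both sequences.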

Conclusion
This manuscript reports 19 new protein structures from N. fowleri that are potential targets for structure-based drug discovery. Eighteen of the 19 differ by more than 31% in amino acid sequence from their human homologs, suggesting selective inhibitors may be found by screening campaigns. In this paper we analyzed five of the N. fowleri enzymes that have ligands that define the active sites and compared them to human homologs. Though all are somewhat homologous at the active site, differences in four of the five N. fowleri enzymes analyzed support the hypothesis that selective active site inhibitors could be developed as therapeutics.
There are therapeutic opportunities as well for some of the other 14 unexamined proteins. One example is the Nf serine tRNA synthetase (NfSerRS) structure (PDB: 6BLJ). SerRS is required for charging tRNAs with serine, which is critical for protein synthesis, and thus is an essential gene. An insertion of four residues (391-395) adjacent to the substrate and tRNA binding sites creates a pocket with differential sequence identity to HsSerRS and provides a foothold for the design of selective inhibitors blocking tRNA charging.
Even if selective active site inhibitors cannot be identified, high-throughput screening of compound libraries can still reveal selective inhibitors, as was found for Plasmodium falciparum ProRS compared with human HsProRS [48]. In this case, two allosteric inhibitors were found to bury themselves into a lobe of the PfProRS enzyme, distant from the active site, and inhibit the activity of the PfProRS enzyme, but not HsProRS. Selective high-throughput screening of a eukaryotic enzyme, including counter-screening against the homologous human enzyme, can also identify selective inhibitors, as has been shown by us in the case of Plasmodium N-myristoyltransferase [34]. Another approach that could be considered is fragment-based screening for novel building blocks to support a medicinal chemistry effort [49]. Indeed, this fragment-based approach can take advantage of some of the selectivity pockets discussed for each target above.
It is a good time to pursue drug development for N. fowleri therapeutics, for in addition to the new structures of potential drug targets reported here, a number of advances are being made. The availability of N. fowleri growth assays for moderate-to-high-throughput screening efforts has allowed identification of compounds for repurposing efforts [12,50] and has led to chemical implication of potential targets for drug development [50]. Some targets, such as HMG-CoA reductase [51-53] and other sterol biosynthesis pathway enzymes [54], farnesyltransferase [53], and glucokinase [19], have been explored through chemical inference and biochemical inhibition, and these targets are well-positioned for structure-guided drug development. A mouse model for N. fowleri meningoencephalitis has long been available and recapitulates the pathophysiology, while responding to anti-amoebic drugs [12,55]. Some aspects of N. fowleri research could be improved to support drug development. A facile means to manipulate the genome of N. fowleri is not yet available, and would be useful for establishing essentiality of targets as well as chemical validation of targets. The need for new therapeutics, as well as advances in N. fowleri biology and biochemistry, supports a concerted effort for N. fowleri drug discovery. By focusing on essential genes and drug targets of other eukaryotes and producing a pool of potential drug target structures, SSGCID has created a foundation on which to build structure-based drug discovery. The relatively quick, successful progress of targets through the pipeline has catalyzed a consortium of investigators interested in addressing N. fowleri drug discovery.

Bioinformatics
The complete genome and transcriptome are available on the EuPathDB BRC website (www.amoebadb.org) [2]. The complete ORFs and annotated predicted proteome from Naegleria fowleri strain ATCC30863 were downloaded from AmoebaDB release 24. Analysis of the ORFs indicated that 39% were missing a start codon and 12% were missing a stop codon. The sequence authors, the Wittwer group at the Spiez Laboratory, confirmed that the 40% of transcripts without an AUG start codon were most likely due to the ORF finder they used, which searches for the longest ORFs in the RNA assembly, but has no start codon finding function. To address this issue, we applied a conservative strategy to select high quality sequences from the draft genome. A sequence homology search using BlastP against DrugBank v.4.3 targets (4,212 sequences) [15] and potential drug targets in the SSGCID pipeline (9,783 sequences) was performed. Sequences with at least 50% amino acid sequence identity over 70% coverage were selected for further filtering. Manual inspection indicated that half the potential targets without a start codon appeared to be significantly truncated when compared to the Naegleria gruberi and other closely related Eukaryota orthologues. Therefore, additional filters were applied to remove likely truncated sequences: (1) targets without a start or stop codon were discarded, (2) remaining candidates were searched with BLAST against the Naegleria gruberi proteome and sequences with over 10% length difference to their Naegleria gruberi orthologues were discarded, and (3) shorter variants with 100% match to a longer ORF transcript were discarded. In the end, 178 ORFs with a start and stop codon were identified, nominated, and approved by the SSGCID target selection board and NIAID to attempt structure determination.
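The three truncation filters described above can be sketched as a small pipeline. This is an illustration under stated assumptions, not the scripts used in this work: the Orf record and its field names are hypothetical, the length difference is assumed to be measured relative to the N. gruberi orthologue length, a "100% match to a longer ORF" is approximated as the shorter protein sequence being an exact substring of a longer one, and the sequences are invented.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    """Hypothetical ORF record (field names are assumptions)."""
    name: str
    protein: str
    has_start: bool
    has_stop: bool
    ortholog_length: int  # length of the N. gruberi orthologue


def length_ok(orf, max_diff=0.10):
    """Filter 2: keep ORFs within 10% of the orthologue length.
    Assumption: the difference is relative to the orthologue length."""
    diff = abs(len(orf.protein) - orf.ortholog_length) / orf.ortholog_length
    return diff <= max_diff


def drop_contained_variants(orfs):
    """Filter 3: discard shorter variants fully contained in a longer ORF.
    Assumption: '100% match' is approximated by exact substring containment."""
    kept = []
    for orf in orfs:
        contained = any(
            len(orf.protein) < len(other.protein) and orf.protein in other.protein
            for other in orfs
        )
        if not contained:
            kept.append(orf)
    return kept


def select(orfs):
    with_codons = [o for o in orfs if o.has_start and o.has_stop]  # filter 1
    full_length = [o for o in with_codons if length_ok(o)]         # filter 2
    return drop_contained_variants(full_length)                    # filter 3


# Invented toy ORFs, for illustration only.
toy = [
    Orf("a", "MKTAYIAKQR", True, True, 10),   # passes all filters
    Orf("b", "KTAYIAKQRL", False, True, 10),  # no start codon
    Orf("c", "MKTAYI", True, True, 10),       # 40% shorter than orthologue
    Orf("d", "MKTAYIAKQ", True, True, 9),     # shorter variant contained in "a"
]
print([o.name for o in select(toy)])
```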

High-throughput protein expression and purification
All proteins discussed were PCR-amplified using cDNA as a template. RNA template of Naegleria fowleri ATCC30215 was provided by Dr. Christopher Rice (University of Georgia, Athens), who performed RNA extraction and cDNA synthesis using methodology previously published for Acanthamoeba [56]. PCR, cloning, screening, sequencing, expression screening, large-scale expression and purification of proteins were performed as described in previous SSGCID publications [18,57]. All described constructs were cloned into a ligation-independent cloning (LIC) pET-14b derived, N-terminal His tag expression vector, pBG1861. Targets were expressed using chemically competent E. coli BL21(DE3)R3 Rosetta cells and grown in large-scale quantities in auto-induction media [58]. All purifications were performed on an ÄKTAexplorer (GE) using automated IMAC and SEC programs in adherence to previously established procedures [18].

Crystallization and structure determination
Crystal trials, diffraction, and structure solution were performed as previously published [16]. Protein was diluted to 20 mg/mL and single crystals were obtained directly through vapor diffusion in sitting drops. The screens and conditions that yielded the crystals are listed in S1 Table. The screens that were used to find the crystallization conditions were typically JCSG+ (Rigaku Reagents), MCSG1 (Microlytic/Anatrace), and Morpheus (Molecular Dimensions), in some cases supplemented by ProPlex (Molecular Dimensions) and JCSG Top96 (Rigaku Reagents). All data were integrated and scaled with XDS and XSCALE [59]. Structures were solved by molecular replacement with MOLREP [60][61][62], as implemented in MoRDa. The structures were refined in iterative cycles of reciprocal space refinement in Phenix and real space refinement in Coot [63,64]. The quality of all structures was continuously checked using MolProbity [65] as implemented in Phenix. Structural comparisons among homologues were done using the DALI Protein Structure Comparison Server.
Supporting information S1