Identification of Attractive Drug Targets in Neglected-Disease Pathogens Using an In Silico Approach

Background
The increased sequencing of pathogen genomes and the subsequent availability of genome-scale functional datasets are expected to guide the experimental work necessary for target-based drug discovery. However, a major bottleneck has been the difficulty of capturing and integrating relevant information in an easily accessible format for identifying and prioritizing potential targets. The open-access resource TDRtargets.org facilitates drug target prioritization for major tropical disease pathogens such as the mycobacteria Mycobacterium leprae and Mycobacterium tuberculosis; the kinetoplastid protozoans Leishmania major, Trypanosoma brucei, and Trypanosoma cruzi; the apicomplexan protozoans Plasmodium falciparum, Plasmodium vivax, and Toxoplasma gondii; and the helminths Brugia malayi and Schistosoma mansoni.

Methodology/Principal Findings
Here we present strategies to prioritize pathogen proteins based on whether their properties meet criteria considered desirable in a drug target. These criteria are based upon both sequence-derived information (e.g., molecular mass) and functional data on expression, essentiality, phenotypes, metabolic pathways, assayability, and druggability. This approach also highlights the fact that data for many relevant criteria are lacking in less-studied pathogens (e.g., helminths), and we demonstrate how this can be partially overcome by mapping data from homologous genes in well-studied organisms. We also show how individual users can easily upload external datasets and integrate them with existing data in TDRtargets.org to generate highly customized ranked lists of potential targets.

Conclusions/Significance
Using the datasets and the tools available in TDRtargets.org, we have generated illustrative lists of potential drug targets in seven tropical disease pathogens. While these lists are broadly consistent with the research community's current interest in certain specific proteins, and suggest novel target candidates that may merit further study, the lists can easily be modified in a user-specific manner, either by adjusting the weights for chosen criteria or by changing the criteria that are included.


Introduction
Several strategies exist for the pursuit of drugs to treat neglected tropical diseases. Major approaches can generally be classified as: (A) label extension, extending the indications of existing drugs for other conditions to tropical diseases; (B) piggyback discovery, in which the discovery of new drugs is focused on one or a few classes of well-studied and validated targets; and (C) de novo drug discovery [1]. These strategies collectively seek to exploit two possible sets of drug targets: those that have been validated in other organisms and diseases, and those that have not (perhaps because they are unique to neglected-disease pathogens) but that nevertheless have potential as novel sites of action.
Since experimental investigations of possible drug targets are time-consuming and expensive, it is worthwhile to conduct in silico analyses [2][3][4][5][6][7][8] to identify the proteins most worthy of experimental follow-up. These analyses consider traits commonly thought to be desirable in a drug target, including essentiality, druggability (whether drug-like molecules are likely to interact with the target), assayability, specificity/selectivity (potential for inhibiting the pathogen without harming the host), and importance in life-cycle stages of the pathogen relevant to human health. Inferring these traits from experimental data is a nontrivial task. For example, guesses at a target's essentiality can be made from gene knockout experiments with the pathogen of interest [9] or related organisms [3,6], from naturally occurring gene deletions in clinical isolates [10], from microarray and/or proteomic data [11], and/or from metabolic chokepoint (flux balance) studies [12,13]. Since the best choices are partly a matter of opinion, there is a clear need for databases that are flexible enough to integrate datasets from different sources and to filter these datasets based on the preferences of individual researchers.
To facilitate target-focused analyses for pathogens prioritized by the World Health Organization's Special Programme for Research and Training in Tropical Diseases (TDR), TDRtargets.org [14] was created as a central repository of target-related data. The database may be used for two general scientific tasks: (A) analysis of individual proteins, finding information that relates to their potential as drug targets; and (B) genome-level analysis, sorting and ranking multiple proteins as drug target candidates according to user-specified criteria. The latter task is the main focus of this paper.
TDRtargets.org is designed to facilitate multiple approaches to target prioritization. Users can browse target lists that others have posted (http://tdrtargets.org/published), generate their own lists from standard criteria offered by the database, and/or extend the criteria used to rank prospective targets by uploading files representing additional published or unpublished data. A previous publication [14] has outlined the user interface and concepts underlying the possible queries. In this study, we provide examples of whole-genome prioritization of targets, focusing on key issues for the specific diseases covered. We use these prioritization tools to generate lists of promising drug targets for TDR organisms; these lists provide useful starting points for target characterization in these organisms and also illustrate the general utility and versatility of TDRtargets.org in identifying and ranking targets.

Database Infrastructure
We have previously described the construction of the TDRtargets.org database, as well as the formulation of searches (queries) to identify proteins meeting criteria of interest and the viewing, saving, and exporting of search results [14]. Since then, while the overall workflow of the database has remained the same, additional genomes and datasets have been included (see below), and several improvements have been implemented on the user interface side of the database. Although users have always been able to perform "weighted union" queries, with different weights (point values) assigned to different user-specified criteria, formulating these queries and viewing and adjusting their results has recently been made more convenient. To construct a weighted union query from the website's target search page, a user (1) selects a pathogen (e.g., P. falciparum), (2) selects a criterion (e.g., functional category = enzyme) with which to query the pathogen genes, (3) enters a name and a weight for the query in the "Run this query" sub-menu at the bottom of the page, (4) clicks the "Next Query" button, and (5) repeats steps 2 to 4 until the last criterion is entered, at which point the user selects "Run this query" rather than "Next Query." The search results are displayed on a page where users have the option of changing the previously entered weights for each criterion (Figure 1). (These results are archived on the user's history page, where he/she can combine different subsets of previous queries with the Union function to obtain new ranked target lists.) The presentation of ranked lists has also been revised to display the criteria met by each protein (Figure 1). Further flexibility in data analysis is provided by an option to export the results to a dynamic spreadsheet so that proteins' fulfillment of individual criteria can be viewed and the weights of the criteria can be adjusted offline.

Using External Data in TDRtargets.org
The TDRtargets.org web application lets users take advantage of datasets obtained externally or in-house. Lists of genes matching user-defined criteria may be saved as text files (each containing a column of gene identifiers, one per line, plus an optional second column of point values if the targets have been ranked outside of TDRtargets.org) and uploaded at the user's history page. Uploaded lists can be combined with other gene sets from the same organism using any of the history page tools, including ranking by weighted union.
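As a rough illustration of the upload layout just described, the sketch below reads and writes such a gene list. It assumes whitespace-separated columns (a tab when writing) and uses hypothetical placeholder identifiers; it is not TDRtargets.org code, and the site's own parser may accept other layouts.

```python
from typing import Dict, Iterable, Mapping, Optional


def parse_gene_list(lines: Iterable[str]) -> Dict[str, Optional[float]]:
    """Read a gene list: one identifier per line, optional point value in column 2.

    Columns are assumed to be whitespace-separated (e.g., by a tab).
    Genes without a point value map to None.
    """
    genes: Dict[str, Optional[float]] = {}
    for raw in lines:
        fields = raw.split()
        if not fields:
            continue  # ignore blank lines
        points = float(fields[1]) if len(fields) > 1 else None
        genes[fields[0]] = points
    return genes


def format_gene_list(genes: Mapping[str, Optional[float]]) -> str:
    """Write genes back out as tab-separated text, one gene per line."""
    rows = [gene if points is None else f"{gene}\t{points:g}"
            for gene, points in genes.items()]
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    text = "geneA\t50\ngeneB\n\ngeneC\t12.5\n"  # hypothetical identifiers
    parsed = parse_gene_list(text.splitlines())
    print(parsed)  # {'geneA': 50.0, 'geneB': None, 'geneC': 12.5}
    print(format_gene_list(parsed), end="")
```

Keeping the point column optional mirrors the text: an unranked list simply marks genes as meeting a criterion, and the weight can then be assigned inside TDRtargets.org.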
In the present work, a number of target lists meeting different criteria were obtained from external resources, uploaded into TDRtargets.org, and used in various prioritization strategies (see Results), as follows. (A) T. cruzi genes with proteomic evidence of expression in amastigotes (at least 2 mass spectra/peptides mapped to the protein) were obtained from TriTrypDB.org [15]. (B) S. mansoni genes with evidence for expression at the transcript level (i.e., genes with mapped expressed sequence tags derived from the "egg," "schistosomula," and "adult worm" cDNA libraries) were taken from SchistoDB.net [16]. (C) Drosophila melanogaster genes associated with abnormal phenotype tags (i.e., "lethal" and "neurophysiological defect") were taken from FlyBase.org [17]. This list was converted into a list of the corresponding S. mansoni orthologs (available from OrthoMCL.org [15]) before uploading into TDRtargets.org.

Author Summary
In cell-based drug development, researchers attempt to create drugs that kill a pathogen without necessarily understanding the details of how the drugs work. In contrast, target-based drug development entails the search for compounds that act on a specific intracellular target, often a protein known or suspected to be required for survival of the pathogen. The latter approach to drug development has been facilitated greatly by the sequencing of many pathogen genomes and the incorporation of genome data into user-friendly databases. The present paper shows how the database TDRtargets.org can identify proteins that might be considered good drug targets for diseases such as African sleeping sickness, Chagas disease, parasitic worm infections, tuberculosis, and malaria. These proteins may score highly in searches of the database because they are dissimilar to human proteins, are structurally similar to other "druggable" proteins, have functions that are easy to measure, and/or fulfill other criteria. Researchers can use the lists of high-scoring proteins as a basis for deciding which potential drug targets to pursue experimentally.

Identification of Drug Targets In Silico www.plosntds.org

Genome Data and Functional Datasets
The current version of the database includes genome data for ten different pathogens (Brugia malayi, Leishmania major, Mycobacterium leprae, Mycobacterium tuberculosis, Plasmodium falciparum, Plasmodium vivax, Schistosoma mansoni, Toxoplasma gondii, Trypanosoma brucei, and Trypanosoma cruzi) and one endosymbiont bacterium (Wolbachia, endosymbiont of B. malayi). The depth of data coverage in various functional datasets (searchable at http://tdrtargets.org/search) varies among organisms; wherever possible, gaps in coverage are compensated for by mapping relevant information from orthologous proteins in other organisms. (For example, protein structure data available for P. falciparum proteins were mapped to P. vivax proteins.) Ortholog identification on whole genomes was carried out using tools available from OrthoMCL.org [18]. Data recently added to TDRtargets.org include curated data on production of recombinant proteins and activity assays from BRENDA [19]; three-dimensional models of proteins from B. malayi and its endosymbiont Wolbachia, M. leprae, and S. mansoni, obtained from ModBase [20]; and phylogenetic information on Arabidopsis thaliana (so that users can search for proteins with or without orthologs in plants).
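The ortholog-based gap filling mentioned above can be pictured as a set operation over ortholog groups. The sketch below is an assumption about the general idea rather than the TDRtargets.org pipeline; in practice the gene-to-group table would come from an OrthoMCL release, and all identifiers shown are hypothetical.

```python
from typing import Dict, Set


def map_via_orthologs(
    annotated: Set[str],
    gene_to_group: Dict[str, str],
    target_genes: Set[str],
) -> Set[str]:
    """Transfer a yes/no annotation to target-species genes via shared ortholog groups.

    annotated:     genes (any species) known to have the property, e.g. a solved structure
    gene_to_group: gene identifier -> ortholog group identifier
    target_genes:  genes of the species whose coverage gap we want to fill
    """
    groups_with_data = {gene_to_group[g] for g in annotated if g in gene_to_group}
    # Genes with no ortholog group (.get returns None) never match.
    return {g for g in target_genes if gene_to_group.get(g) in groups_with_data}


if __name__ == "__main__":
    # Hypothetical identifiers: two P. falciparum genes, three P. vivax genes.
    groups = {"pf_1": "OG_A", "pf_2": "OG_B", "pv_1": "OG_A", "pv_2": "OG_C", "pv_3": "OG_B"}
    has_structure = {"pf_1"}
    print(sorted(map_via_orthologs(has_structure, groups, {"pv_1", "pv_2", "pv_3"})))
    # ['pv_1']
```

Because the transfer works at the level of whole groups, it inherits any errors in ortholog assignment, which is one reason mapped data are best treated as weaker evidence than direct measurements.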

Ranking Target Genes via Weighted Unions
TDRtargets.org has a flexible ranking system for prioritizing target proteins. In multi-criteria searches, it is possible to take a Boolean intersection of the criteria so that only those proteins with all of the desired traits (e.g., essentiality AND druggability AND assayability, etc.) are selected. However, a protein may lack one or more preferred properties and still be the target of an effective drug (Table 1). Therefore the prioritization queries presented below are devised as weighted unions (see "Database infrastructure" above), in which each criterion is assigned a subjective weight (point value) and targets earn points for each criterion they meet. (Less important and undesirable criteria are given small and negative weights, respectively.) These queries return ranked lists of all potential targets, ordered by cumulative score. Target lists can then be re-ranked, if desired, by changing the weights and/or adding additional criteria (see "Database infrastructure" above).
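The contrast between a Boolean intersection and a weighted union can be made concrete with a small sketch. The criteria, genes, and weights below are hypothetical, and the code illustrates the scoring idea only; it does not reproduce the TDRtargets.org implementation (which, for instance, lists all genes in the genome, whereas this sketch only returns genes meeting at least one criterion).

```python
from typing import Dict, List, Set, Tuple

# criterion name -> (genes meeting the criterion, weight in points)
Criteria = Dict[str, Tuple[Set[str], float]]


def weighted_union(criteria: Criteria) -> List[Tuple[str, float, List[str]]]:
    """Rank genes by the summed weights of the criteria they meet.

    Returns (gene, total score, criteria met) tuples, highest score first.
    Ties are broken alphabetically only to make the output deterministic.
    """
    scores: Dict[str, float] = {}
    met: Dict[str, List[str]] = {}
    for name, (genes, weight) in criteria.items():
        for gene in genes:
            scores[gene] = scores.get(gene, 0.0) + weight
            met.setdefault(gene, []).append(name)
    ranked = sorted(scores, key=lambda g: (-scores[g], g))
    return [(g, scores[g], met[g]) for g in ranked]


def boolean_intersection(criteria: Criteria) -> Set[str]:
    """Genes meeting every criterion (the stricter AND-style search)."""
    gene_sets = [genes for genes, _ in criteria.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


if __name__ == "__main__":
    criteria: Criteria = {
        "essential": ({"g1", "g2"}, 50),
        "druggable": ({"g2", "g3"}, 30),
        "human_ortholog": ({"g3"}, -20),  # undesirable trait: negative weight
    }
    for gene, score, reasons in weighted_union(criteria):
        print(gene, score, reasons)
    # g2 80.0 ['essential', 'druggable']
    # g1 50.0 ['essential']
    # g3 10.0 ['druggable', 'human_ortholog']
    print(boolean_intersection(criteria))  # set()
```

Here the AND search returns nothing, while the weighted union keeps every candidate and merely demotes proteins that miss a preferred trait or carry an undesirable one.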

Overview of Queries Presented in This Paper
The criteria used in generating the lists presented below are summarized in Figure 2. As a starting point, a basic set of criteria of general interest was chosen to frame a "standard" query for identifying targets in L. major (see Query 2 in Figure 2). In compiling this basic set of criteria, we included most datasets that are commonly available for organisms with complete genomic information so that the standard query could be easily applied to different pathogens. Queries 3, 4, and 5 of Figure 2 are examples of extending the standard query. Queries 6, 7, 8, and 9 of Figure 2 are framed in a pathogen-specific manner to prioritize target proteins from a particular metabolic pathway, subcellular location, or life-cycle stage. These queries make use of criteria based on external datasets uploaded to TDRtargets.org. (Readers can explore the upload tool at http://tdrtargets.org/history.) Queries 10 and 11 of Figure 2 were based heavily on data obtained by manual curation of the literature [21] and on homology/orthology analysis for protein-specific information, illustrating how even incompletely annotated genomes are amenable to target identification. Additional details of these queries are noted below.

Searching for Candidate Drug Targets in Leishmania
An example of the weighted-union approach to target prioritization (see Methods) is shown in Query 2 of Figure 2, which covers the Leishmania major genome. In this example, points are awarded for many of the criteria covered in Table 1, plus some additional conditions. From these criteria a prioritized list of targets is generated (Table 2). Such a list is hardly the final word in Leishmania target selection, however. The researchers who generated the list in Table 2 may subsequently decide that, since essentiality data for Leishmania genes are very limited, they will consider the presence of an essential ortholog in at least one other organism to be an acceptable predictor of essentiality. Orthologous proteins usually have the same function [22], and several studies indicate that having essential orthologs is predictive of essentiality [23,24]. The researchers could then amend their initial query so that, for example, 50 additional points are awarded to targets whose orthologs are essential in C. elegans, E. coli, M. tuberculosis, and/or S. cerevisiae (the four organisms for which genome-wide essentiality data are available in TDRtargets.org). Such a revision can easily be made by running a new query using the "Any evidence of essentiality in any species" option within the Essentiality subsection of the Search For Genes/Targets page and then using the query history page to find the union of this query and the previous one. The results are similar to but distinct from the previous results (Table 3).

In general, the following might be considered desirable target traits: a low molecular weight and a lack of transmembrane (TM) domains (to favor expression and solubility of recombinant protein), existence of 3D crystal structures and ModBase models (for structure-based drug design), absence of orthologs in humans (to favor selectivity), high druggability and compound desirability scores (0-to-1 scale), and a precedent for assayability.
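The Query 3 revision of the Leishmania search amounts to adding a fixed bonus to every protein returned by the essentiality query and re-sorting. A minimal sketch, with hypothetical gene names and scores; the 50-point bonus follows the example in the text, and the `new_entrants` helper (a name introduced here for illustration) flags proteins that enter the top of the list only after the revision, in the spirit of the italicized rows of Table 3.

```python
from typing import Dict, List, Set


def add_query(scores: Dict[str, float], hits: Set[str], weight: float) -> Dict[str, float]:
    """Union a new query into an existing ranked list: each hit gains `weight` points."""
    combined = dict(scores)
    for gene in hits:
        combined[gene] = combined.get(gene, 0.0) + weight
    return combined


def top_n(scores: Dict[str, float], n: int) -> List[str]:
    """Highest-scoring genes; ties broken alphabetically for reproducibility."""
    return sorted(scores, key=lambda g: (-scores[g], g))[:n]


def new_entrants(before: Dict[str, float], after: Dict[str, float], n: int) -> List[str]:
    """Genes in the revised top n that were absent from the original top n."""
    old = set(top_n(before, n))
    return [g for g in top_n(after, n) if g not in old]


if __name__ == "__main__":
    query2 = {"gA": 100.0, "gB": 90.0, "gC": 80.0, "gD": 40.0}  # hypothetical scores
    essential_ortholog = {"gC", "gD"}  # hypothetical hits of the essentiality query
    query3 = add_query(query2, essential_ortholog, 50.0)
    print(top_n(query3, 3))                 # ['gC', 'gA', 'gB']
    print(new_entrants(query2, query3, 2))  # ['gC']
```

Note that gB and gD tie at 90 points after the revision; as the Table 2 legend warns, the display order of tied targets carries no meaning, and here it is alphabetical only by construction.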
Now consider a more drastic revision of the Leishmania search: application of the previous criteria (Figure 2, Query 3) to two other pathogens, namely P. falciparum and M. tuberculosis. This too is readily done within TDRtargets.org (there is a "Change species" option on the Query History page), again highlighting the ease of modifying previous searches. While use of exactly the same criteria to prioritize targets in different species might seem naïve, the results (Tables 4 and 5) are instructive. First, the top-ranked proteins of each species are rather different, showing that this search strategy is sensitive to species differences rather than being unalterably biased toward the same proteins in every species. Second, many of the top-ranked targets in each species appear to be appealing options. For example, the three top-scoring targets from each species (dihydrofolate reductase/thymidylate synthase, enoyl-ACP reductase, and triose-phosphate isomerase in P. falciparum; enoyl-ACP reductase (InhA), glutamine synthetase, and 5-enolpyruvylshikimate-3-phosphate synthase (AroA) in M. tuberculosis) have all attracted interest as proven or prospective targets [25][26][27][28][29][30]. It is interesting that legitimate candidates such as these rise to the top of the target rankings despite certain quirks of this "one set of criteria fits all species" example. In the M. tuberculosis prioritization, for instance, many of the top-ranked targets are essential even though the genome-wide mutagenesis data available for this species were not queried. Thus, although these lists are imperfect, they generally suggest that rational choices of criteria lead to plausible and informative rankings of target desirability across species.

Table 2. Preliminary genome-wide prioritization of Leishmania major targets.
Columns: Ranking, Gene name, Gene product, Weight.
Top targets according to the criteria shown in Query 2 of Figure 2. Complete genome-wide rankings for this example and all other examples discussed in the paper (Tables 3-11) are available online at http://www.tdrtargets.org/published/browse/366. Please note that multiple targets often receive the same total weight, and that the order in which these "tied" targets are displayed has no significance.
doi:10.1371/journal.pntd.0000804.t002

Figure 2. A summary of the multiparameter search queries presented in this study. Ten different queries (Queries 2-11) are listed as individual columns, with the criteria shown on the left. For each criterion, the number of qualifying proteins from a given pathogen is shown in black and the associated weight is shown in red within parentheses. Symbols: (#) enzymes were selected by combining searches by EC number and by functional category, except for Queries 10 and 11, which were based only on EC number; (&) the conserved-in-taxon criterion refers to the presence of orthologs in L. major, T. brucei, and T. cruzi (Tables 2 and 3), P. falciparum and P. vivax (Tables 4 and 7), M. tuberculosis and M. leprae (Table 5), and L. major and T. cruzi (Table 8); (") druggability and compound desirability scores were queried using respective cutoff values of ≥0.6 and >0.3 (Tables 2 to 5), ≥0.4 and >0.2 (Tables 6 and 7), and ≥0.5 (druggability scores only; […]
T. brucei and P. falciparum: Metabolic Pathway- and Organelle-Specific Targets
While TDRtargets.org integrates numerous datasets relevant to target prioritization, it cannot anticipate every prioritization strategy that a given researcher might use. Accordingly, users can upload (weighted or unweighted) lists of targets meeting any criteria for which they have relevant data; these lists may then be combined with other queries covered by TDRtargets.org. Supplementation of standard TDRtargets.org criteria with a user-provided criterion is illustrated in the following example. Researchers specializing in the T. brucei glycolytic pathway are convinced that this pathway is essential in these parasites and wish to rank the enzymes within this pathway for their suitability as drug targets. Since they already assume the pathway to be essential and know glycolysis is also present in host cells, they may not address these issues in their search criteria, but may instead award points as listed in Query 6 of Figure 2. The query shown there combines criteria addressing integral TDRtargets.org data (such as availability of structural models) with a user-generated list that awards "bonus points" to some T. brucei enzymes in proportion to their relative control over the glycolytic flux [31]. The rationale for such a scoring might be that the greater an enzyme's flux control, the less completely it must be inhibited for flux through the entire pathway to be affected (and thus the better a target it is). In this example, the inclusion of flux control as a criterion lifts the two glycosomal orthologs of glyceraldehyde-3-phosphate dehydrogenase, the enzyme with the highest control […] Table 6). The recent genetic validation of this enzyme [32] likewise identifies it as a possible target of interest. Interestingly, hexokinase was thought to have a much lower control coefficient [31] but may also have promise as a drug target [33].

Table 3. Revised L. major rankings after incorporating an essential-in-other-species criterion.
Columns: Ranking, Gene name, Gene product, Weight.
Top targets according to the criteria shown in Query 3 of Figure 2. Italicized targets are those that were not top-ranked in the list shown in Table 2.

The next scenario also employs a user-provided list, which in this case permits scrutiny of a specific organelle rather than a specific metabolic pathway. Consider a newly independent crystallographer with a special interest in Plasmodium apicoplasts, which are absent from the human host and thus are likely to contain many appealing drug targets [34]. The PlasmoAP algorithm [35] predicts that 485 proteins are localized to the apicoplast; the user can download this list from PlasmoDB.org [36], manually delete proteins that seem unlikely to reside in the apicoplast, and then upload the modified list to TDRtargets.org. In sorting through the roughly 400 proteins likely to reside in the apicoplast, the user may decide to minimize competition with labs already working on apicoplast biology by penalizing well-studied proteins (e.g., subtracting 100 points from targets whose 3D structures have already been solved) while rewarding other desirable characteristics such as those discussed above (likely essentiality, lack of orthologs in humans, etc.). Finally, a previous publication [37] has convinced the hypothetical user that a low molecular weight and low isoelectric point (pI) improve the odds of successful expression of soluble Plasmodium proteins, so those factors are weighted accordingly (Query 7 of Figure 2).
The most highly ranked proteins in this example (Table 7) include some proteins (e.g., pseudouridine synthetase and cysteine desulfurase) that are rarely mentioned in the Plasmodium literature, consistent with this researcher's desire to explore truly novel target options.
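For the T. brucei glycolysis scenario in this section, the user-generated bonus list could be produced along the following lines. This is a hypothetical sketch: linear scaling to a maximum bonus is only one possible reading of "in proportion to their relative control," and the coefficients are placeholders, not the published values from [31]. The underlying intuition is that, to first order, lowering an enzyme's activity by a small fraction x lowers pathway flux by roughly C·x, where C is that enzyme's flux control coefficient. The output uses the two-column upload layout described earlier.

```python
from typing import Dict


def flux_control_bonus(control: Dict[str, float], max_bonus: float) -> Dict[str, float]:
    """Bonus points proportional to each enzyme's flux control coefficient.

    The enzyme with the largest coefficient receives `max_bonus`; the others are
    scaled linearly. Non-positive coefficients earn no bonus.
    """
    top = max(control.values(), default=0.0)
    if top <= 0:
        return {enzyme: 0.0 for enzyme in control}
    return {enzyme: max_bonus * max(c, 0.0) / top for enzyme, c in control.items()}


if __name__ == "__main__":
    # Illustrative coefficients for hypothetical enzymes; not the values from [31].
    coefficients = {"enz_A": 0.6, "enz_B": 0.1, "enz_C": 0.3}
    for enzyme, points in flux_control_bonus(coefficients, 30.0).items():
        print(f"{enzyme}\t{points:.1f}")  # tab-separated, ready to upload
```

Scaling to the largest coefficient keeps the bonus on the same point scale as the other criteria regardless of how the coefficients were normalized in the source study.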

Trypanosoma cruzi: Candidate Targets Associated with an Intracellular Lifestyle
Table 4. Application of standard search criteria to P. falciparum.
Columns: Ranking, Gene name, Gene product, Weight.

Unlike the bloodstream trypomastigotes of African trypanosomes (Salivaria), the bloodstream forms of T. cruzi (Stercoraria) do not replicate and instead invade cells. In this parasitic strategy, which is shared with Leishmania spp., the replicative amastigotes are the intracellular parasite forms that persist and maintain the infection. Given the early evolutionary divergence of Salivarian trypanosomes [38] and the different strategies used by Salivarian and Stercorarian parasites to mount and maintain an infection, these groups of parasites may exhibit numerous instances of (A) gene loss and (B) gene duplication followed by neofunctionalization [39]. Proteins that are orthologous between T. cruzi and Leishmania but that lack T. brucei counterparts may represent proteins vital to intracellular survival and/or growth, which could be excellent targets for drug development.
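The phylogenetic pattern sought here (present in T. cruzi and Leishmania, missing from T. brucei) reduces to a simple set test over ortholog-group membership. A minimal sketch, assuming a precomputed table mapping each ortholog group to the species it contains; the group names and memberships are hypothetical. It captures only outright absence, not proteins that are "substantially changed" in T. brucei, which would require sequence-level comparison.

```python
from typing import Dict, Set


def conserved_but_lost(
    group_species: Dict[str, Set[str]],
    required: Set[str],
    absent: Set[str],
) -> Set[str]:
    """Ortholog groups present in every `required` species and in none of the `absent` ones."""
    return {
        group
        for group, species in group_species.items()
        if required <= species and not (absent & species)
    }


if __name__ == "__main__":
    # Hypothetical ortholog groups and their species membership.
    groups = {
        "OG_1": {"T. cruzi", "L. major", "S. cerevisiae"},
        "OG_2": {"T. cruzi", "L. major", "T. brucei"},
        "OG_3": {"T. cruzi", "T. brucei"},
    }
    print(conserved_but_lost(groups, {"T. cruzi", "L. major"}, {"T. brucei"}))
    # {'OG_1'}
```

In a weighted union, the genes belonging to the returned groups would simply form one more criterion carrying extra points, rather than a hard filter.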
To look for such proteins, we used a general strategy similar to that used for Leishmania (see Query 3 of Figure 2) but now focused on T. cruzi, with an extra phylogeny-based restriction: additional weight was added to proteins that have been conserved in Leishmania and T. cruzi but that have been lost or substantially changed in T. brucei. The attributes and weights used in this query are shown in Query 8 of Figure 2. The strategy also relies on proteomic evidence of expression in intracellular amastigotes [40]. However, because the proteomic data have a low coverage of the proteome, only a moderate weight has been assigned to this criterion. (This illustrates users' ability to assign relative weights based not only on which characteristics they consider predictive of target desirability, but also on their confidence that available experimental datasets accurately reflect those characteristics.) The results of this prioritization of T. cruzi targets are shown in Table 8. Because of the hybrid nature of the strain used to sequence the T. cruzi genome, the list is somewhat redundant: most single-copy genes appear twice in all genome databases. The top 32 targets include representatives of validated pathways (ergosterol biosynthesis, represented by sterol C-24 reductase, and glycolysis, represented by glucokinase) as well as other interesting alternatives for drug development. As suggested above, glycolysis is an essential pathway in trypanosomes, and the glycosome-localized glucokinase has attracted interest as a possible target since it was discovered in the sequenced Leishmania and T. cruzi genomes [41]. The top-ranked sterol C-24 reductase, for its part, provides a good example of the value of the phylogenetic criteria used in this strategy. The ergosterol biosynthesis pathway is also present in T. brucei, although it is not essential for the bloodstream forms, which scavenge sterols from the host [42].
This highly ranked C-24 reductase belongs to the OrthoMCL ortholog group OG4_16908 (OrthoMCL version 4), which contains orthologs from the genomes of T. cruzi, L. major, and yeast (ERG4). However, this enzyme is apparently absent from the genomes of T. brucei TREU927, T. brucei gambiense, T. vivax, and […]

Another top-ranking target in Table 8 is the T. cruzi serine acetyltransferase (TcSAT), involved in the de novo synthesis of cysteine, which is present in Leishmania and T. cruzi and absent in T. brucei [43]. Cysteine in these parasites is important for the biosynthesis of polyamines and for antioxidant metabolism based on trypanothione, the trypanosome equivalent of glutathione. Inhibitors of the E. coli SAT enzyme have recently been shown to inhibit the growth of Entamoeba histolytica, another pathogen that is highly sensitive to oxidative stress [44].
Other interesting targets in this list include a putative amine oxidase (Tc00.1047053511277.600) which further analysis shows is conserved in several Leishmania species but absent in sequenced T. brucei subspecies; a putative phosphatidate cytidylyltransferase (Tc00.1047053509073.70) that belongs to an ortholog group with a very restricted phylogenetic distribution (OG4_29276), with members from many Leishmania species with complete genomes, Entamoeba histolytica (another pathogen), and two non-pathogenic species (Thalassiosira pseudonana and Aquifex aeolicus); a protein kinase (Tc00.1047053509287.20) whose yeast orthologs regulate endocytosis through the organization and function of the actin cytoskeleton; and a tyrosine protein phosphatase (Tc00.1047053506839.60) that also shows an unusual phylogenetic distribution, being almost exclusively present in T. cruzi, Leishmania spp., and metazoa.

Mycobacterium tuberculosis: Exploiting Previous Prioritizations
Table 6. Prioritization of glycolytic enzymes in T. brucei.
Columns: Ranking, Gene name, Gene product, Weight.

Previous target prioritization efforts [2][3][4][5][6][7][8] raise the question of how these efforts should be viewed in relation to TDRtargets.org. We consider TDRtargets.org to be complementary to others' prioritization work, and anticipate that it can be used to combine and apply the ranking methods of other target identification efforts. For instance, a recent paper on M. tuberculosis by Hasan and colleagues [4] provided an excellent synthesis of experimental data to rank targets by persistence in dormant stages. These data (available in [4] as Supplemental Dataset S1, and also at http://tdrtargets.org/published/browse/379) can be easily interrogated and combined with other queries using TDRtargets.org. For example, while Hasan et al.'s rankings considered proteins essential for growth on defined medium in vitro [45,46], they did not reward proteins thought to be essential for growth in macrophages or in the infection of mice [47,48], which may well be very relevant to human infection. In addition, because Hasan et al. awarded points to proteins with solved crystal structures, it seems apt to give points to proteins whose structures have been solved during the four years that have elapsed since the original analysis was published. TDRtargets.org was therefore used to make a few modifications to one set of Hasan et al.'s rankings: the set that emphasized targets' likely importance in persistent-stage disease. We uploaded a modified version of this list that excluded points for PDB structures, then gave additional points to all genes represented in the Protein Data Bank of crystal structures [49] as of April 2010.
To these subtotals, we added points based on an analysis of latent-stage infections by Murphy & Brown [7]. In that analysis, genes were given upregulation and downregulation scores based on their expression in various models of dormancy, thus offering a distinct estimate of genes' importance during latency, and "attenuation" scores based on the effect of gene knockouts on growth in various contexts, including the macrophage and mouse studies noted above. (See "Additional file 1" from [7]; see also http://tdrtargets.org/published/browse/383.) The combined input of the two previous studies was thus used to create a "consensus list" (Table 9) that might be considered superior to either one alone. Combining the two previous analyses could also be done off-line using spreadsheets, but performing these operations within TDRtargets.org is considerably faster and facilitates retrieval of TDRtargets.org-compiled information on each individual target. Naturally, our "consensus list" reflects the limitations of the previous analyses, e.g., the low rankings of important persistent-stage proteins such as Rv0470c (mycolic acid synthase, PcaA) and Rv2583c (GTP pyrophosphokinase, RelA), as discussed by Hasan et al. [4].
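In outline, the consensus list is a per-gene sum of three components. The sketch below assumes each source has already been reduced to numeric per-gene scores; how the Murphy & Brown upregulation, downregulation, and attenuation scores should be signed or scaled relative to Hasan et al.'s points is a judgment call, so this version simply adds them unweighted. All gene identifiers and numbers are hypothetical.

```python
from typing import Dict, Set, Tuple


def consensus_scores(
    hasan_no_pdb: Dict[str, float],
    pdb_genes: Set[str],
    pdb_points: float,
    murphy_brown: Dict[str, Tuple[float, float, float]],
) -> Dict[str, float]:
    """Combine two prior M. tuberculosis rankings into one total per gene.

    hasan_no_pdb: persistence-stage scores with the original PDB points removed
    pdb_genes:    genes with a deposited structure at the time of the update
    pdb_points:   points awarded for having a structure
    murphy_brown: (upregulation, downregulation, attenuation) scores per gene
    """
    genes = set(hasan_no_pdb) | pdb_genes | set(murphy_brown)
    totals: Dict[str, float] = {}
    for gene in genes:
        total = hasan_no_pdb.get(gene, 0.0)
        total += pdb_points if gene in pdb_genes else 0.0
        total += sum(murphy_brown.get(gene, (0.0, 0.0, 0.0)))
        totals[gene] = total
    return totals


if __name__ == "__main__":
    hasan = {"g1": 10.0, "g2": 5.0}  # hypothetical subtotals
    pdb = {"g2", "g3"}
    murphy = {"g1": (1.0, 0.0, 2.0), "g3": (0.0, 1.0, 0.0)}
    totals = consensus_scores(hasan, pdb, 3.0, murphy)
    print(sorted(totals.items(), key=lambda kv: -kv[1]))
    # [('g1', 13.0), ('g2', 8.0), ('g3', 4.0)]
```

Taking the union of all three gene sets ensures that a protein scored by only one source (g3 here) still appears in the consensus list rather than being dropped.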

Helminths: The Importance of Homology
Since many valuable helminth datasets are only starting to emerge, our attempts to prioritize helminth targets required some analysis beyond the standard TDRtargets.org queries. For example, B. malayi and S. mansoni proteins are not yet scored for druggability in TDRtargets.org, so we assessed their druggability by comparing their amino acid sequences to those of known drug targets in the StARLITe/ChEMBL database [50]. The sequence similarity analysis was performed using BLAST; a helminth protein was considered druggable if (A) it is ≥80% of the length of the corresponding druggable target, (B) it has an amino-acid sequence that aligns with ≥80% of the druggable target, and (C) the BLAST expectation value of the alignment is less than 10⁻¹⁰ (database size: 11,508 genes for B. malayi, 13,331 genes for S. mansoni). In addition, proteins' functional importance in helminths was inferred from knockout data taken from their orthologs in C. elegans and D. melanogaster (see Materials and Methods and Queries 10 and 11 of Figure 2). Being able to connect the helminth proteins to similar proteins in other species was thus critical in allowing us to evaluate their potential as drug targets.
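The three similarity criteria can be applied to BLAST results with a short filter. This is a minimal sketch assuming BLAST+ tabular output with the column list given in the `parse_hits` docstring; the original analysis may have used a different BLAST version or output format. Criterion (B) is read here as the fraction of the target's length spanned by the alignment, and all identifiers in the example are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Set


@dataclass
class Hit:
    query: str        # helminth protein
    subject: str      # known drug target
    query_len: int
    subject_len: int
    s_start: int      # alignment start on the target (1-based)
    s_end: int        # alignment end on the target
    evalue: float


def parse_hits(lines: Iterable[str]) -> Iterator[Hit]:
    """Parse tab-separated BLAST+ output written with
    -outfmt "6 qseqid sseqid qlen slen sstart send evalue" (assumed layout)."""
    for line in lines:
        if line.strip() and not line.startswith("#"):
            q, s, qlen, slen, sstart, send, ev = line.rstrip("\n").split("\t")
            yield Hit(q, s, int(qlen), int(slen), int(sstart), int(send), float(ev))


def passes_criteria(hit: Hit, min_frac: float = 0.8, max_evalue: float = 1e-10) -> bool:
    length_ok = hit.query_len >= min_frac * hit.subject_len     # criterion (A)
    covered = abs(hit.s_end - hit.s_start) + 1
    coverage_ok = covered >= min_frac * hit.subject_len         # criterion (B)
    return length_ok and coverage_ok and hit.evalue < max_evalue  # criterion (C)


def druggable_proteins(hits: Iterable[Hit]) -> Set[str]:
    """Helminth proteins with at least one hit meeting all three criteria."""
    return {h.query for h in hits if passes_criteria(h)}


if __name__ == "__main__":
    lines = [
        "bm_1\tT_1\t400\t450\t5\t420\t1e-30",  # passes all three criteria
        "bm_2\tT_2\t300\t450\t1\t440\t1e-40",  # (A) fails: query too short
        "bm_3\tT_3\t440\t450\t1\t200\t1e-50",  # (B) fails: covers too little of target
        "bm_4\tT_4\t450\t450\t1\t450\t1e-5",   # (C) fails: e-value too weak
    ]
    print(druggable_proteins(parse_hits(lines)))  # {'bm_1'}
```

Requiring both (A) and (B) guards against two distinct failure modes: a short helminth fragment that matches only one domain of a large target, and a full-length protein whose alignment covers only a small part of the target.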
Our strategy of relying heavily on orthology and sequence similarity to rank helminth targets is broadly similar to those used by Kumar et al. [6] to rank Brugia targets and by Caffrey et al. [3] to rank Schistosoma targets. However, these authors sought targets that met each of several desired criteria (Boolean "AND"); for example, Kumar et al. only considered Brugia proteins that have orthologs in C. elegans but not in humans, and whose absence causes deleterious phenotypes (as judged by RNAi of the C. elegans orthologs). In contrast, we again used the "weighted union" approach to avoid prematurely eliminating any proteins from consideration as targets. Kumar et al. also took a distinct approach to druggability, rewarding proteins with domains targeted by compounds obeying the Lipinski "Rule of 5" [51] and having EC numbers associated with druggability. Additionally, Kumar et al. penalized proteins for hydropathicity (which makes recombinant expression more difficult) and rewarded them for being expressed (according to a small dataset of expressed sequence tags, or ESTs, encompassing 250 genes); in contrast, we gave additional points to all proteins having EC numbers (and therefore presumed to be enzymes), 3D structural models, and/or bibliographic references.
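To make the contrast between Boolean "AND" filtering and a weighted union concrete, the toy sketch below scores three hypothetical proteins both ways. The attribute names, weights, and function names are invented for illustration and do not correspond to the actual TDRtargets.org criteria or to the settings used by Kumar et al.

```python
# Toy comparison of strict Boolean filtering with a weighted union.
# Attribute names and weights are hypothetical, chosen only for illustration.


def boolean_and(proteins, criteria):
    """Keep only proteins that satisfy every criterion."""
    return [name for name, attrs in proteins.items()
            if all(attrs.get(c, False) for c in criteria)]


def weighted_union(proteins, weights):
    """Rank every protein by the summed weights of the criteria it meets.

    Negative weights act as penalties; nothing is eliminated outright.
    """
    scores = {
        name: sum(w for c, w in weights.items() if attrs.get(c, False))
        for name, attrs in proteins.items()
    }
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


proteins = {
    "p1": {"ce_ortholog": True, "no_human_ortholog": True, "rnai_lethal": True},
    "p2": {"ce_ortholog": True, "rnai_lethal": True, "druggable": True},
    "p3": {"ce_ortholog": True, "no_human_ortholog": True, "druggable": True},
}
weights = {"ce_ortholog": 10, "no_human_ortholog": 15,
           "rnai_lethal": 20, "druggable": 30}

print(boolean_and(proteins, ["ce_ortholog", "no_human_ortholog", "rnai_lethal"]))
# ['p1']
print(weighted_union(proteins, weights))
# [('p2', 60), ('p3', 55), ('p1', 45)]
```

In this toy example, p2 (which has a human ortholog) is discarded by the strict filter but ranks first under the weighted union because it is druggable; a negative weight could demote such proteins without removing them from consideration.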
A comparison of our helminth prioritizations (Tables 10 and 11) with previously published lists shows little overlap between our Brugia ranking and that of Kumar et al. (Table S1 of [6], also available at http://tdrtargets.org/published/browse/282). This lack of overlap is likely due in part to (A) our emphasis on druggability, as inferred from sequence similarity against targets in the ChEMBL database, and (B) the fact that we did not penalize proteins with human orthologs (see Discussion subsection "No List Is Canonical"). By adding two conditions to the weighted union to penalize proteins with orthologs in human and in mouse (with weights −40 and −20, respectively), some overlap between the two lists can be observed: among our top 200 targets, 32 are also among the top 200 as ranked by Kumar et al. One unique aspect of our list is that it includes four tRNA synthetases among the top 39 proteins. These enzymes have been proposed as drug targets in Brugia and are also considered good drug target candidates in other parasites such as trypanosomes [52], since they are expected to be essential yet often differ markedly in structure from their human orthologs.
The list of 57 recommended Schistosoma targets generated by Caffrey and colleagues (see Table S1 of [3], also available at http://tdrtargets.org/published/browse/247) includes 18 targets they considered to be of the highest priority because they are druggable, are expressed in relevant life-cycle stages, yield deleterious phenotypes, and are homologous to proteins with solved crystal structures including co-crystallized ligands. Of these 18 targets, eight rank within our top 170 Schistosoma targets. An obvious difference between the two lists is that ours includes nine tubulins among the top 23 proteins. The prominence of the tubulins is consistent with beta-tubulin's validation as a helminth drug target [53]. A number of ATPases also appear among our top targets.
The top-ranked target in our list is the alpha (catalytic) subunit of a Na+/K+ ATPase (Smp_015020), which in mammals (and probably also in schistosomes) is the target of ouabain and other cardiac glycosides [54]. This target does not appear in the list of 57 targets published by Caffrey et al.; however, the beta subunit of this or a closely related Na+/K+ ATPase (Smp_124240) is ranked #52 in that study. Other attractive targets include a putative extracellular-signal-regulated kinase (ERK, Smp_142050) and a putative HMG-CoA reductase (Smp_138590), which is the target of cholesterol-lowering drugs such as mevinolin [55].
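The list comparisons above (e.g., counting shared proteins between two top-200 lists after penalizing host orthologs) reduce to re-ranking with penalty weights and intersecting the top n identifiers. The sketch below assumes scores and attributes are available as plain dictionaries; the function names and toy identifiers are hypothetical, and the penalty weights are set to −40 for human and −20 for mouse orthologs as in the Brugia comparison.

```python
# Sketch: penalize proteins with host orthologs, re-rank, and measure overlap
# with another published top-n list. Identifiers and scores are toy values.

PENALTIES = {"human_ortholog": -40, "mouse_ortholog": -20}


def rerank_with_penalties(scores, attributes, penalties=PENALTIES):
    """Add penalty weights to base scores and return IDs, best first.

    scores:     {protein_id: base score from the weighted union}
    attributes: {protein_id: set of attribute names the protein has}
    Ties are broken alphabetically for reproducibility.
    """
    adjusted = {
        pid: score + sum(w for attr, w in penalties.items()
                         if attr in attributes.get(pid, set()))
        for pid, score in scores.items()
    }
    return [pid for pid, _ in sorted(adjusted.items(),
                                     key=lambda kv: (-kv[1], kv[0]))]


def top_n_overlap(ranked_a, ranked_b, n=200):
    """Number of identifiers shared by the top n entries of two ranked lists."""
    return len(set(ranked_a[:n]) & set(ranked_b[:n]))


scores = {"a": 100, "b": 90, "c": 80, "d": 70}
attributes = {"a": {"human_ortholog", "mouse_ortholog"}, "c": {"mouse_ortholog"}}
ours = rerank_with_penalties(scores, attributes)
print(ours)                                             # ['b', 'd', 'c', 'a']
print(top_n_overlap(ours, ["d", "b", "x", "y"], n=3))   # 2
```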

Stability of Ranked Lists
A relevant question for any ranked list of targets, whatever the strategy, is: how different would the list be if the weight given to a certain attribute were changed? Using the M. tuberculosis queries whose results are in Table 5, we analyzed the robustness of the final ranked list by selecting one attribute at a time and changing its weight from a very low (negative) value to a very high (positive) value. To assess the change in the ranked list, we counted the number of curated targets (i.e., those with some level of validation) within the top 200 targets of the ranked list and used this count as our objective function (see panel B in Figure 3). Using this measure, we observed that, as expected, attributes that are enriched in validated targets (see panel A in Figure 3) must be given a high weight in order to find well-known targets at the top of the list. This is also true for attributes that are correlated with these "good" attributes (e.g., availability of 3D structures). In contrast, changing the weight of attributes that are not expected to be enriched in validated drug targets (e.g., low molecular weight) does not affect the final result: the resulting lists differ from one another, but in all of them the highest ranks remain enriched in validated targets. In general, rankings within a list can be further stabilized by including additional relevant criteria in the prioritization.
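The one-attribute-at-a-time analysis can be sketched as a loop that overwrites a single weight, re-ranks, and counts validated targets in the top n. This is a minimal illustration on toy data, assuming attributes are simple present/absent flags; the function names and data are hypothetical and are not the code behind Figure 3.

```python
# Sketch of a one-attribute-at-a-time weight sweep. The objective is the
# number of validated targets among the top n proteins. Toy data only.


def rank(proteins, weights):
    """Order protein IDs by summed attribute weights, best first (ties alphabetical)."""
    scores = {pid: sum(weights.get(a, 0) for a in attrs)
              for pid, attrs in proteins.items()}
    return [pid for pid, _ in sorted(scores.items(),
                                     key=lambda kv: (-kv[1], kv[0]))]


def validated_in_top(ranked, validated, n=200):
    """Objective function: validated targets among the first n ranks."""
    return sum(1 for pid in ranked[:n] if pid in validated)


def sweep(proteins, weights, attribute, values, validated, n=200):
    """Vary one attribute's weight, holding the other weights fixed."""
    results = []
    for value in values:
        trial = dict(weights)
        trial[attribute] = value
        results.append((value, validated_in_top(rank(proteins, trial), validated, n)))
    return results


proteins = {"v1": {"structure"}, "v2": {"structure"},
            "x1": {"low_mw"}, "x2": {"low_mw"}}
validated = {"v1", "v2"}
print(sweep(proteins, {"low_mw": 5}, "structure", [-10, 0, 10], validated, n=2))
```

With `n=2`, the output is `[(-10, 0), (0, 0), (10, 2)]`: in this toy setting, validated targets reach the top only once the attribute they are enriched in receives a weight higher than that of the competing attribute.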

Old Targets Versus New Targets
In analyzing target candidates, we often ask what mix of well-studied and less-studied proteins is most "desirable" at the top of a ranked list. On the one hand, having well-known targets or targets of known drugs at the top of our lists offers some assurance that our search strategies are reasonable (i.e., they serve as "positive controls" of the strategy). On the other hand, a method that only identifies well-established targets would not serve the important purpose of suggesting novel targets, so the presence of novel (even "hypothetical") targets near the top of a list is also welcome. With the deliberate exception of Table 7, our lists reflect a desire to spotlight both previously validated and newly emerging targets.
In addition to seeking a mix of new and established targets in prioritization lists, users also need to consider carefully which established targets they classify as "successful." Some targets have long enjoyed high esteem within the research community in the absence of any clinical validation, while other proteins, particularly in the organisms studied here, are targets of clinically used drugs whose product profiles are unlikely to be acceptable in future drug development programs.

False Negatives
Previous bioinformatic analyses of drug targets [4] have suggested that certain established targets never rank highly unless given artificial boosts in points for that specific purpose. Examples of such "false negatives" are also apparent in the lists presented here. For instance, cytochrome b is the known target of the antimalarial drug atovaquone [56], yet it ranks in the bottom 25% of the ranked list summarized in Table 4 because it has transmembrane domains (making recombinant expression difficult), is not easy to assay in isolation, lacks a known crystal structure, and so on. Likewise, certain targets of anthelmintic drugs (such as the acetylcholine and GABA receptors, the glutamate-gated chloride channel, and the SLO-1 potassium channel [57,58]) do not appear near the top of our helminth lists. There are several possible (nonexclusive) explanations for this. First, some drugs were found through phenotypic screens, and their targets do not meet many of the criteria required in a target-based approach; such targets might therefore not be expected to rank highly. Second, current target prioritization strategies are generally based on the assumption that drugs will cause a loss-of-function phenotype, but most existing helminth drugs lead to gain-of-function phenotypes [57]. Ranking proteins according to their potential as gain-of-function targets might be a fruitful direction for future work. Finally, it is conceivable that the total number of viable drug targets vastly exceeds the number that have been clinically validated, in which case ranking many nonvalidated targets ahead of some validated ones is appropriate.

False Positives
The failure of some validated targets to be highly ranked in our lists is not particularly surprising or troublesome, as discussed above. A more interesting issue is that of "false positives," i.e., proteins that do rank highly but have not been validated as drug targets despite considerable effort. For instance, the Leishmania adenosine kinase ranks among the top 25 proteins in Tables 2 and 3, yet turns out to be nonessential in promastigotes [59]. Similarly, the Plasmodium enoyl-ACP reductase (FabI) ranks 2nd in Table 4, yet is nonessential for blood-stage growth [60]. Among M. tuberculosis proteins, pantothenate kinase (PanK or CoaA) is in the top 100 of the Query 5 rankings (though not among the top 24 and thus not shown in Table 5), yet screens targeting this enzyme yielded no leads active against wild-type M. tuberculosis cells (C. E. Barry, personal communication). PanK activity in vivo appears to be so far in excess of what is required for growth that killing M. tuberculosis cells by inhibiting this enzyme is virtually impossible.
Although such examples can be seen as discouraging, we can use them to ask whether the incidence of false positives can be reduced through the use of additional datasets and search strategies. The nonessentiality of the Plasmodium FabI during erythrocyte stages is perhaps suggested by the fact that expression of the enzyme is neither high nor tightly regulated during the erythrocytic life-cycle stages [61]. While TDRtargets.org does not currently offer a metric for the periodicity of gene expression in blood-stage Plasmodium, such a metric could be added to future versions of the database.

No List Is Canonical
The target rankings presented here are meant to be illustrative rather than definitive. The lists presented here were sent to experts on relevant neglected diseases for evaluation, and, predictably, we encountered numerous reasonable differences of opinion. For helminths, arguments were made both for and against penalizing proteins with orthologs in humans. The presence of human orthologs suggests an increased likelihood of toxicity in the host; on the other hand, several existing drug targets do have human orthologs. For M. tuberculosis, it was noted that existing drugs tend to target information-processing enzymes (DNA and RNA polymerase, DNA gyrase) rather than metabolic enzymes, so searches for new drugs might pay special attention to that area. Generally applicable suggestions included penalties for proteins that are part of macromolecular complexes, since they are hard to study in isolation, and for proteins of unknown function, since they are hard to study with biochemical or biophysical methods. In addition to legitimate differences of opinion among researchers, the relative appeal of individual targets will continue to change as additional data are gathered. Fortunately, the infrastructure of TDRtargets.org is flexible enough to accommodate different individuals' interests (as seen especially in the lists focused on T. brucei glycolysis and Plasmodium apicoplasts) and the incorporation of new data (most prominent in the rankings for the helminths and for M. tuberculosis persistence). We therefore see TDRtargets.org as a tool that individual scientists may use to explore new research directions, rather than as a final arbiter of proteins' potential as drug targets.
As noted, target prioritization with TDRtargets.org or any other computational method is probably most useful as a prelude to (rather than a replacement for) laborious experimental follow-up work. Experimental characterization of promising targets often requires chemical inhibitors of target activity; therefore, lists of target-specific inhibitors would be of great value to the research community. Although TDRtargets.org currently includes a preliminary dataset of such inhibitor-target associations, future editions of the database should offer major expansions and refinements of this dataset.