Decoding the Ubiquitin-Mediated Pathway of Arthropod Disease Vectors

Protein regulation by ubiquitin has been extensively described in model organisms. However, the ubiquitin machinery of disease vectors remains largely uncharacterized. This fundamental gap in knowledge is a concern because new therapeutics are needed to control vector-borne diseases, and targeting the ubiquitin machinery as a means of disease intervention has already been adopted in the clinic. In this study, we employed a bioinformatics approach to uncover the ubiquitin-mediated pathway in the genomes of Anopheles gambiae, Aedes aegypti, Culex quinquefasciatus, Ixodes scapularis, Pediculus humanus and Rhodnius prolixus. We observed that (1) disease vectors encode a lower percentage of ubiquitin-related genes than Drosophila melanogaster, Mus musculus and Homo sapiens, but not Saccharomyces cerevisiae; (2) overall, more proteins are categorized as E3 ubiquitin ligases than as E2-conjugating or E1-activating enzymes; (3) the ubiquitin machinery within the three mosquito genomes is highly similar; (4) ubiquitin genes are more than doubled in the Chagas disease vector (R. prolixus) when compared to other arthropod vectors; (5) the genomes of the deer tick (I. scapularis) and the body louse (P. humanus) carry low numbers of E1-activating enzymes and HECT-type E3 ubiquitin ligases; (6) R. prolixus has low numbers of RING-type E3 ubiquitin ligases; and (7) C. quinquefasciatus presents elevated numbers of predicted F-box E3 ubiquitin ligases and of JAB and UCH deubiquitinases. Taken together, these findings provide novel opportunities to study the interaction between a pathogen and an arthropod vector.


Introduction
Vector-borne diseases are some of the most prevalent infectious illnesses worldwide. According to the World Health Organization, there are approximately 216 million cases of malaria alone, with over 1.2 million estimated deaths [1]. Dengue fever, another prominent vector-borne disease, is responsible for 50-100 million cases each year, making it one of the fastest growing infectious maladies. An estimated 120 million individuals are affected each year by lymphatic filariasis [2]. Lyme disease, the most common tick-borne illness in the northern hemisphere, is responsible for 30,000 clinical cases in the United States alone, with the actual number possibly being much higher [3]. Overall, vector-borne diseases are responsible for 16% of disability-adjusted life years [4], and these statistics are compounded by the social costs borne by heavily affected communities.
In Drosophila, it is established that developmental regulation by the evolutionarily conserved Notch receptor depends on ubiquitin [8], and some proteins associated with neuronal control and neurodegenerative disorders undergo ubiquitination [9]. Additionally, there is abundant evidence that ubiquitination licenses the Toll, the Janus kinase (JAK)/signal transducer and activator of transcription (STAT) and the immunodeficiency (IMD) pathways during immune challenge against bacterial, viral and fungal infection [5]. This immunological circuitry is not unique to Drosophila because ubiquitination also regulates these pathways in disease vectors such as Anopheles gambiae, Aedes aegypti, Culex quinquefasciatus and Ixodes scapularis [5].
The lack of information regarding the role of ubiquitin in disease vectors is problematic because many molecular interactions depend on this posttranslational modification. These biochemical signatures may reveal novel factors for controlling pathogen transmission and/or acquisition [5]. For instance, drugs targeting the ubiquitin machinery have entered clinical trials (http://www.clinicaltrials.gov) and have been approved for use by the Food and Drug Administration (e.g., Bortezomib, Millennium Pharmaceuticals; and Nutlin, Roche) [5]. In this study, we used computational biology to identify the ubiquitin machinery of six clinically relevant arthropod vectors: An. gambiae, Ae. aegypti, C. quinquefasciatus, I. scapularis, Pediculus humanus and Rhodnius prolixus. We compared our findings to datasets available for Saccharomyces cerevisiae, Drosophila melanogaster, Mus musculus and Homo sapiens [10][11][12][13]. While computational biology has been used extensively in disease-causing arthropods for comparative genomics [14,15], transcriptomics [14,16,17,18,19] and quantitative proteomics [14,20,21], it has not been fully utilized to uncover the machinery associated with posttranslational modifications.

Protein identification
Using the selected Pfam domains, the hmmsearch program of the HMMER v3.0 package [27] was employed to identify analogous proteins within the chosen species (Tables S2, S3, S4, S5, S6, S7, S8, S9, S10). Proteins were assigned to the corresponding Pfam domain if the search returned an E-value <0.5. Importantly, discrepancies between our results and findings previously published for S. cerevisiae, D. melanogaster, H. sapiens and M. musculus [10][11][12][13] were determined to result from one or more of the following: (1) changes made in the databases; (2) different programs and search parameters employed across studies; and (3) the high sensitivity of the relatively new HMMER 3.0 algorithm in detecting remote sequence similarities.
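The assignment rule described above (a protein is assigned to a Pfam domain when the hit has an E-value <0.5) can be sketched in Python. This is a minimal illustration, not the pipeline used in this study: it assumes the hmmsearch results were saved in HMMER's tabular per-target format (the `--tblout` option), in which the target name, query name and full-sequence E-value are the first, third and fifth whitespace-separated columns. The function name `assign_proteins` and the example rows are hypothetical.

```python
from collections import defaultdict


def assign_proteins(tblout_text, max_evalue=0.5):
    """Group target proteins by Pfam query for hits with E-value < max_evalue.

    Assumes HMMER3 --tblout layout: column 1 is the target (protein) name,
    column 3 the query (Pfam model) name, column 5 the full-sequence E-value.
    """
    assigned = defaultdict(set)
    for line in tblout_text.splitlines():
        # Skip blank lines and the '#' header/footer comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, query = fields[0], fields[2]
        evalue = float(fields[4])
        if evalue < max_evalue:
            assigned[query].add(target)
    return dict(assigned)


# Hypothetical rows for illustration only; not data from this study.
EXAMPLE = """\
# target name  accession  query name  accession  E-value  score  bias
protA          -          UCH         -          1.2e-30  105.3  0.1
protB          -          UCH         -          0.7      5.1    0.0
protC          -          HECT        -          3e-12    44.0   0.2
"""

if __name__ == "__main__":
    for domain, proteins in sorted(assign_proteins(EXAMPLE).items()):
        print(domain, len(proteins), sorted(proteins))
```

Counting the proteins in each set gives per-domain tallies of the kind reported in the tables. Note that a cutoff of 0.5 is permissive, so a protein may be assigned to more than one domain.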

Computation of multiple alignments
The resulting protein sequences from I. scapularis, An. gambiae, Ae. aegypti, and C. quinquefasciatus were aligned using the multiple sequence alignment tool MUSCLE [23], available on the Phylogeny.fr site [24]. MUSCLE was run in full mode, which comprises three stages: (1) draft progressive alignment, (2) improved progressive alignment, and (3) alignment refinement. The maximum number of iterations was 16, the default option. We did not remove poorly aligned regions prior to phylogeny estimation because we were primarily interested in the relationships among proteins.

Generation of protein phylogenies
For each aligned dataset, protein phylogenies were estimated with the maximum likelihood method, employing the PhyML program [28] available through Phylogeny.fr. We present cladograms, as opposed to phylograms, to maintain readability given the large number of sequences considered in this study. Cladograms are also sufficient for illustrating the underlying function of related proteins, as our aim was not to infer evolutionary relationships.
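The difference between the two tree representations can be illustrated with a short Python sketch. A phylogram carries branch lengths, whereas a cladogram shows topology only, so a cladogram can be obtained from a Newick-formatted tree by dropping the branch lengths. This is only an illustration under simple assumptions (unquoted labels, numeric branch lengths introduced by a colon); it is not how Phylogeny.fr renders trees, and the function name `to_cladogram` is hypothetical.

```python
import re

# A branch length in Newick is a colon followed by a number,
# possibly in scientific notation (e.g. ":0.12" or ":1e-05").
_BRANCH_LENGTH = re.compile(r":[0-9.eE+-]+")


def to_cladogram(newick):
    """Return the Newick string with branch lengths removed (topology only).

    Assumes labels are unquoted and contain no colons; support values
    written as internal node labels are left untouched.
    """
    return _BRANCH_LENGTH.sub("", newick)


if __name__ == "__main__":
    phylogram = "((A:0.1,B:0.2):0.05,C:0.3);"
    print(to_cladogram(phylogram))
```

For the example tree, the grouping of A with B is preserved while the distances are discarded, which is what keeps trees with many sequences readable.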

Composition of the ubiquitin machinery in model organisms versus disease vectors
The ubiquitin machinery regulates fundamental biological processes within eukaryotic cells (Figure 1). Thus, we used a bioinformatics approach to characterize this pathway in arthropod disease vectors. Our analysis demonstrated that the genomes of three clinically relevant mosquitoes carried a similar ubiquitin pathway to total gene ratio (%U), ranging from 2.80 to 2.86 (Figure 2

Ubiquitin-like proteins
Ubiquitin-like proteins were first discovered in 1979, beginning with the product of interferon-stimulated gene 15 (ISG15) [29].
Since its discovery, several other ubiquitin-like proteins have been identified. In our study, we analyzed the paralogs of several ubiquitin-like proteins (Table 1; Figure 3). Specifically, we used hmmsearch with the Pfam domains for APG12, ATG8, UFM1 and URM1. For direct classification of ubiquitin-like proteins, we manually analyzed and categorized each protein encoded in the genomes of An. gambiae, Ae. aegypti, C. quinquefasciatus, and I. scapularis (Figure S1). The ubiquitin-like proteins analyzed were SUMO, NEDD8, URM1, HUB1 and ATG8.
Figure 2 (beginning of caption missing): prolixus and I. scapularis). The statistics shown are genome size in gigabase pairs (Gb) and "all", referring to the total number of predicted genes. "Ubi" refers to the total number of characterized (model organisms) or predicted (vectors) ubiquitin machinery proteins encoded within each genome, with "%U" the proportion of ubiquitin-related genes. Proteins were categorized into three major groups: ubiquitin and ubiquitin-like proteins, E1-E3 enzymes, and deubiquitinases. For each genome, a breakdown within each major group depicts the composition of protein families (sub-groups). doi: 10.1371/journal.pone.0078077.g002
One of the most extensively studied ubiquitin-like proteins is SUMO, a polypeptide with around 100 residues and 18% identity to ubiquitin [30]. SUMO modification is known to regulate a multitude of biological functions. Among the most prominent are protein localization, gene expression, and DNA repair [31]. We observed a total of six proteins under the description of SUMO: An. gambiae (2), Ae. aegypti (2), C. quinquefasciatus (1), and I. scapularis (1) (Figure S1). NEDD8 is a ubiquitin-like protein with 81 amino acid residues [32]. It is around 60% identical to ubiquitin. Previous reports have identified the ligation of NEDD8 to members of the cullin protein family, suggesting a critical role in cullin-RING E3 ubiquitin ligase (CRL) activation [33]. We observed four proteins categorized under the NEDD8 hmmsearch description for An. gambiae, Ae. aegypti, C. quinquefasciatus and I. scapularis, one from each vector (Figure S1).
URM1 is a ubiquitin-like protein that shares very little homology with ubiquitin [34]. Four proteins were categorized under the URM1 description for An. gambiae, Ae. aegypti, C. quinquefasciatus, and I. scapularis (Table 1). HUB1 differs from ubiquitin in that, instead of a C-terminal tail with a GG motif, it carries a double tyrosine (YY) motif [35]. In our analysis, we identified four HUB1 proteins within An. gambiae, Ae. aegypti, C. quinquefasciatus and I. scapularis, one in each vector (Figure S1). ATG8 is one of the most extensively studied ubiquitin-like proteins and has been identified as a key component in the regulation of autophagy [36]. Of the ubiquitin-like proteins examined, proteins similar to ATG8 were the most numerous: An. gambiae (6), Ae. aegypti (2), C. quinquefasciatus (3), and I. scapularis (3) (Table 1).
We further analyzed the RING E3s in An. gambiae, Ae. aegypti, C. quinquefasciatus, and I. scapularis by phylogeny estimation (Figures S4, S5). Because of computational limitations and the large number of RING proteins, we ran a separate analysis for each of the four vectors. We also identified clusters related to each RING Pfam domain within the phylogeny (Figures S4, S5).
Cullin. Cullin proteins assemble with a RING E3 to form a scaffold that facilitates the ligation of ubiquitin to the target substrate [41]. We calculated the number of proteins associated with Cullin in I. scapularis (5), An. gambiae (6), Ae. aegypti (9), C. quinquefasciatus (4), P. humanus (7) and R. prolixus (8) (Table 1; Figure 4). The number of Cullin proteins is the smallest among the subcategories of ubiquitin ligases within each vector. This pattern is consistent with model species: S. cerevisiae (4), D. melanogaster (14), H. sapiens (16), and M. musculus (11) (Table 1). We then compared the Cullin proteins within An. gambiae, Ae. aegypti, C. quinquefasciatus, and I. scapularis against other E3 ligases (Figure S6). Similar to HECT ligases, most of the Cullin protein clusters were separated in the estimated tree.
U-box Ubiquitin Ligase. In addition to RING, HECT, and Cullin, recent studies have identified proteins that share many similarities with the RING finger but lack the metal-chelating residue and many signature Cys residues of the original RING domain [42]. This protein family has been classified as the U-box, although it was previously identified as E4 [43]. Here, we report the number of U-box ubiquitin ligases in I. scapularis (8), An. gambiae (10), Ae. aegypti (10), C. quinquefasciatus (12), P. humanus (7) and R. prolixus (8) (Table 1). Much like Cullin, U-box proteins account for only a small fraction of the total E3 ligases in each vector. When comparing U-boxes with other E3 ligases, we observed only two clusters of U-box proteins within the estimated tree (Figure S6).
F-box. The F-box domain is characterized by a structural motif of approximately 50 amino acids [44]. We used hmmsearch to reveal the following numbers of F-box proteins in the various disease vectors: I. scapularis (43), An. gambiae (39), Ae. aegypti (49), C. quinquefasciatus (151), P. humanus (32) and R. prolixus (39) (Table 2; Figure 4). The proteins associated with F-box form the second largest component in five out of six arthropod vectors. In model species, we observed a similar increase in the number of F-box and most other ubiquitination components: S. cerevisiae (12), D. melanogaster (46), H. sapiens (111), and M. musculus (104) (Table 1). We then extracted the sequences of F-box proteins in An. gambiae, Ae. aegypti, C. quinquefasciatus, and I. scapularis and estimated phylogenies for each species (Figures S7, S8).
Peptidase_C12, Peptidase_C48, Peptidase_C54 domains. Peptidase_C12 has been linked to the UCH family of DUBs, while the Peptidase_C48 domain is a C-terminal catalytic domain associated with the Ubl-specific protease 1 (ULP1) protease family [51,52]. Peptidase_C54 has not been previously studied, although our analysis has identified it as a domain similar to a DUF related to URM1 proteases [53]. We analyzed the number of DUBs containing each of the three peptidase domains: I. scapularis (Figure 5). In most analyzed species, the number of Peptidase_C48 DUBs was the highest among the peptidase domain DUBs. The two exceptions were Ae. aegypti, where the number of Peptidase_C12-associated proteins was the highest, and C. quinquefasciatus, which contained one more deubiquitinase with a Peptidase_C54 domain. Each species possessed at least one DUB containing each of the peptidase domains. When analyzing the DUBs, we observed differences in the number of clusters formed for each vector (Figures S9, S10).

Conclusion
Vector-borne illnesses threaten public health in the tropics. Environmental changes due to globalization and the lack of effective vaccines are also contributing to the spread of these maladies to temperate climates. Thus, novel treatments are necessary in the clinic. One effective therapeutic strategy recently used to treat cancer, neurodegenerative disorders and some infectious diseases is pharmacological intervention targeting (de)ubiquitination. However, the use of (de)ubiquitination as a therapeutic target to combat arthropod-borne diseases has not been established. A possible reason for this gap in translational research is the inherent complexity Here we took advantage of publicly available arthropod genomes and described the ubiquitin machinery of An. gambiae, Ae. aegypti, C. quinquefasciatus, I. scapularis, P. humanus and R. prolixus. Although independent work will be necessary to validate our bioinformatics analysis, empirical evidence suggests that the ubiquitin machinery is present in disease vectors. Recently, Severo et al. (2013) characterized a RING-type E3 ubiquitin ligase named X-linked inhibitor of apoptosis protein (XIAP) and showed the importance of ubiquitination for microbial pathogenesis in ticks [54]. Similarly, RNAi targeting of genes from the ubiquitin-mediated pathway affected bacterial infection in arthropod vectors [55]. Corroborating these findings, Huang and colleagues demonstrated that monoubiquitinated proteins decorate pathogen-occupied vacuolar membranes during infection of embryonic tick cells [56]. In summary, this report should: (1) provide a framework for studying ubiquitination in disease vectors; (2) generate a basis for empirical experimentation correlating arthropod physiology and disease; and (3) potentially unveil novel pharmacological targets to interfere with vector-borne diseases.