Characterization of Bacterial Endosymbionts Represented in the D. citri Metagenome Sequence
The goal of the strategy employed here was to maximize detection of endosymbionts in the unassembled metagenome by mapping reads to reference sequences for organisms previously identified by rDNA amplification. In addition to establishing the level of support present in the metagenome sequence for these candidates, this approach reveals whether coverage levels of any component endosymbionts are sufficient to proceed with draft genome assembly. A total of 17 genomes were selected for endosymbiont identification based on the list of endosymbionts previously identified by rDNA amplification from Florida and Indonesian D. citri isolates, with endosymbionts of B. cockerelli, the psyllid vector of another Ca. Liberibacter pathogen, also included (Table 1).
Wolbachia
As shown in Table 1, the template genome exhibiting the highest level of read coverage was that of Wolbachia endosymbiont of Culex quinquefasciatus Pel, with illuminated regions covering over 1.2 Mb or 82% of the reference genome. The titer of Wolbachia strains can vary significantly among insects and isolates [33], [34] and the relatively high incidence of reads from the D. citri metagenome mapping to the reference Wolbachia sequences may reflect a relatively high titer in these samples.
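The coverage figures quoted here (illuminated bases as a fraction of the reference genome) correspond to what is often called breadth of coverage. A minimal sketch of that calculation follows, assuming the mapped regions are available as half-open (start, end) intervals on the reference; the function names and the toy intervals are illustrative and are not taken from the pipeline used in this study.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals (start, end)."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of double counting bases.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def breadth_of_coverage(intervals, genome_length):
    """Return (covered bases, covered fraction) for a reference genome."""
    covered = sum(end - start for start, end in merge_intervals(intervals))
    return covered, covered / genome_length


# Hypothetical toy data: three mapped regions on a 1,000 bp reference.
toy_hits = [(0, 100), (50, 150), (300, 400)]
covered, fraction = breadth_of_coverage(toy_hits, 1000)
print(covered, fraction)
```

Merging before summing matters because overlapping read alignments would otherwise inflate the illuminated total.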
Ca. Carsonella
In contrast to Wolbachia, a relatively low number of reads mapped to the Ca. Carsonella ruddii PV reference genome sequence, producing illuminated regions totaling 5 kb or 3% of the genome (Text S1). Ca. Carsonella is assumed to be present, and the reads accounting for the illuminated regions appear specific for Ca. Carsonella, as no non-Carsonella nucleotide sequences in GenBank share over 80% identity with these regions. The most likely reason for the low coverage is the previously demonstrated bias of next generation sequencing technologies for regions of DNA with higher GC content [35]. Ca. Carsonella strains have the lowest GC content (14–17%) among sequenced bacterial genomes, and successful sequencing by Illumina technology has required alterations to standard protocols [7]. Consistent with this explanation, the small number of regions that were illuminated in the reference genome have higher GC content (24%) than the genome overall.
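The GC comparison above can be illustrated with a short sketch that computes GC content for a whole sequence and for the subset of regions that received read coverage. The helper names and the toy sequence are hypothetical; ambiguous bases such as N are simply excluded from the denominator, which is one common convention among several.

```python
def gc_fraction(seq):
    """GC fraction over unambiguous bases (A, C, G, T) only."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


def gc_of_regions(genome, regions):
    """GC fraction of the concatenated half-open regions (start, end)."""
    return gc_fraction("".join(genome[start:end] for start, end in regions))


# Hypothetical toy genome: AT-rich background with one GC-richer island.
toy_genome = "AT" * 10 + "GCAT" + "AT" * 10
overall_gc = gc_fraction(toy_genome)
island_gc = gc_of_regions(toy_genome, [(20, 24)])
print(overall_gc, island_gc)
```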
Enteric endosymbiont
In contrast to Wolbachia and Ca. Carsonella, which have been found in psyllid isolates from diverse sources, the repertoire of other psyllid-associated bacteria identified by rDNA amplification varies depending on psyllid species and geographical origin [11], [12]. To identify those candidates supported by the metagenome sequence from the Florida isolate, sequence reads were mapped to reference genome sequences of bacteria identified from multiple D. citri isolates and B. cockerelli. Among the endosymbionts identified from the Florida D. citri isolate by rDNA sequencing was an enteric bacterium closely related to Klebsiella variicola and Salmonella enterica [12]. Our read mapping supports the presence of an enteric bacterium, with 604 kb and 387 kb cumulatively illuminated in the genomes of Salmonella and Klebsiella, equivalent to 12.6% and 7.1% of their respective genomes (Text S2, S3). While ribosomal DNA sequencing was insufficient to distinguish between Salmonella and Klebsiella, the higher coverage for Salmonella shown here suggests that the enteric bacterium represented in the metagenome is more closely related to Salmonella than Klebsiella. This is further supported by taxonomic analysis of the illuminated regions. Of the illuminated regions in Salmonella, 27% are specific to that genus, with the remainder mapping to regions of the Salmonella genome that are more generally conserved among enteric bacteria. In contrast, only 14% of illuminated regions in Klebsiella are specific to that genus, with the remainder being shared with Salmonella. Interestingly, while enteric bacteria have been found in the gut microflora of a variety of insects [36]–[38], Salmonella is less commonly found than other genera such as Klebsiella and Enterobacter.
Endosymbiont candidates with low coverage
Of the remaining bacteria identified by ribosomal DNA amplification, only Acidovorax displayed read coverage exceeding 1% of the genome (Text S4). Taxonomic analysis of the 3% of the Acidovorax genome illuminated during read mapping indicated that 15% of the regions illuminated were specific to Acidovorax at the sequence identity cutoff used, with 85% being more generally conserved among the Comamonadaceae. Members of the Comamonadaceae have been found in association with diverse insects [36], [39] and the closely related genus Verminephrobacter is known to be a symbiont of earthworms [40].
In contrast, mapping to reference genome sequences for Acinetobacter, Janthinobacterium, and Herbaspirillum yielded illuminated regions of just a few kilobases, amounting to less than 1% of these genomes. rDNA amplification from both D. citri and B. cockerelli revealed a sequence having 99% sequence identity to a Staphylococcus isolate, suggesting that bacteria in this genus may be widely distributed among psyllids. However, mapping of the D. citri metagenome reads against four different Staphylococcus species did not yield any illuminated regions at the cutoffs used. Methylibium, Ralstonia, and Bradyrhizobium have also been reported present in the potato psyllid, B. cockerelli, but read mapping did not yield illuminated regions exceeding 1% of the genome. Closer examination of the few kilobases that are illuminated in these cases of exceptionally low coverage indicates that they either correspond to mobile elements such as insertion sequences that are not specific to the genus in question (as in the case of Herbaspirillum) or map to regions more broadly conserved across higher taxonomic levels. For example, regions illuminated in Ralstonia and Methylibium are broadly conserved among the Burkholderiales and Comamonadaceae, respectively, corresponding to a subset of the regions illuminated in the Acidovorax genome. While the limited coverage observed for these bacteria does not rule out their presence as shown by rDNA sequencing, these data suggest that the major impact on the biology of D. citri and Ca. L. asiaticus likely derives from Wolbachia, Ca. Carsonella, and the enteric bacterium.
Draft Genome Sequence of the D. citri Wolbachia Strain
Wolbachia are maternally inherited, intracellular, Rickettsia-like bacteria known to infect a wide range of arthropods. Recent surveys indicate that as many as 66% of all insect species may be infected with Wolbachia, making it one of the most ubiquitous endosymbionts described to date [8]. Infections with this agent have been associated with various reproductive abnormalities in the host, including cytoplasmic incompatibility (CI), the most common phenotype in arthropods, whereby the offspring of uninfected females and infected males fail to develop. Wolbachia infection additionally leads to parthenogenesis in wasps, in which infected virgin females produce infected female offspring, and to feminization of genetic males in an isopod species [41]–[43]. The ability of Wolbachia to modify the reproductive success of its host enables it to increase in frequency in host populations without the need for horizontal transmission. Introduction of life-shortening Wolbachia strains into mosquitoes has proven an effective strategy for control of the vectored virus causing dengue fever [44], [45].
Read mapping to the wPip genome sequence suggested that coverage for Wolbachia in the metagenome data was sufficient for generation of a draft genome sequence. To more comprehensively isolate Wolbachia-derived reads, the D. citri metagenome sequence data were filtered using the complete genome sequences for Wolbachia strains wMel, wBm, wPip, and wRi. The resulting read set was assembled and the 167 contigs were evaluated for overlaps, reducing the total scaffold number to 104. The wDi contigs were aligned with closed Wolbachia genome sequences using MAUVE [25] to gain a better picture of gene conservation and synteny. As shown in Figure 2 and Figure S1, wDi contigs exhibited a higher degree of gene synteny with the wPip sequence than with wMel or other Wolbachia genome sequences, resulting in selection of wPip as the reference genome for contig ordering. As shown in Table 2, the number of protein coding genes in wDi is very similar to that in wPip, though the total genome size is somewhat lower, likely because repeat regions are under-represented in assemblies from short-read sequence data.
To assess completeness of the wDi draft genome, annotated genes in Wolbachia strains wPip, wRi, wMel, and wBm were categorized using OrthoMCL (Table 3). A total of 670 core gene clusters were identified for the four genomes using an E-value cutoff of 1e-5. Each of the 670 core clusters is represented in the wDi draft genome annotation, with the exception of a single core group composed entirely of hypothetical genes. Small differences in the numbers of genes assigned to core clusters result from instances where gene products were assigned to more than one cluster.
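The completeness check described above amounts to two set operations: find clusters with at least one member in every closed reference genome, then ask which of those lack a member in the draft. The sketch below assumes the clustering result has already been parsed into a dictionary of cluster identifier to (strain, gene) pairs; that data structure, the function names, and the toy clusters are hypothetical and do not reproduce OrthoMCL's own output format.

```python
REFERENCE_STRAINS = ("wPip", "wRi", "wMel", "wBm")


def core_clusters(clusters, strains=REFERENCE_STRAINS):
    """Cluster ids that contain at least one gene from every strain."""
    required = set(strains)
    return {
        cluster_id
        for cluster_id, members in clusters.items()
        if required <= {strain for strain, _gene in members}
    }


def core_missing_from_draft(clusters, draft="wDi", strains=REFERENCE_STRAINS):
    """Core cluster ids with no member from the draft genome."""
    return {
        cluster_id
        for cluster_id in core_clusters(clusters, strains)
        if draft not in {strain for strain, _gene in clusters[cluster_id]}
    }


# Hypothetical toy clusters: c1 and c2 are core, but only c1 is found in wDi.
toy_clusters = {
    "c1": [("wPip", "p1"), ("wRi", "r1"), ("wMel", "m1"), ("wBm", "b1"), ("wDi", "d1")],
    "c2": [("wPip", "p2"), ("wRi", "r2"), ("wMel", "m2"), ("wBm", "b2")],
    "c3": [("wPip", "p3"), ("wDi", "d3")],
}
core = core_clusters(toy_clusters)
missing = core_missing_from_draft(toy_clusters)
print(sorted(core), sorted(missing))
```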
Genes determined by OrthoMCL to be lineage specific in wPip and wDi were manually curated, and those arising from different annotation calls in conserved regions were eliminated. Blastp analysis of the remaining 32 lineage specific gene products in wDi and 65 lineage specific gene products in wPip was conducted. All of the unique gene products in wDi were of unknown function, with 11 having homologs in strains wAlbB and wAna which, like wPip, are endosymbionts of mosquitoes [46], [47]. Of the 65 gene products present in wPip but absent from wDi, 40 are hypothetical and 16 correspond to mobile elements. Those with known function include a predicted glyoxylase and an aminoglycoside phosphotransferase, both associated with antibiotic resistance.
Ankyrin domain proteins
Among the most interesting proteins encoded by Wolbachia strains are those having ankyrin domains, characterized by the presence of tandemly arranged 33-residue long repeats of variable number but sufficiently divergent at the nucleotide level to permit assembly even when sequenced by short read technologies. Typically associated with eukaryotes, ankyrin proteins have been shown to mediate protein-protein interactions [48]. They are secreted by other members of the Anaplasmataceae and interact with host DNA and/or protein [49], [50]; it has been speculated that reproductive manipulation of host by Wolbachia might be achieved through ankyrin binding of host proteins [51], [52].
The number of ankyrin proteins varies among sequenced Wolbachia strains, with as few as five in wBm to as many as 60 in wPip [53], [54]. Annotation of the wDi genome revealed the presence of 54 predicted proteins containing ankyrin repeats (Text S6). Blastp analysis of these against the four closed Wolbachia genomes reveals that four of the predicted wDi ankyrin gene products are common to all of these genomes. Of the remaining 50, 38 exhibit a high level of similarity with those encoded by wPip, 10 and 11 with wMel and wRi, respectively, and two with wBm. Twenty-five of those shared with wPip are also present in the three draft sequences for other mosquito-associated Wolbachia strains from Culex quinquefasciatus (JHB [55], wAlbB [47]), and from C. pipiens molestus, suggesting that the mosquito may be a useful model for understanding psyllid-Wolbachia interactions (Table S1).
Extensive studies attempting to correlate ankyrin protein repertoire and/or expression with reproductive impacts such as cytoplasmic incompatibility suggest a complex relationship involving a network of factors [51], [56], [57]. A homolog of the phage-associated pk2 group of ankyrin proteins, which correlates with cytoplasmic incompatibility in Culex [51] and feminization in isopods [52], is present in one of the two wDi phage regions.
That said, there are also significant differences in the ankyrin repertoire between wDi and mosquito-associated strains. Twelve predicted wDi ankyrin proteins diverge significantly from previously characterized Wolbachia ankyrin proteins. Although five cases of apparent divergence likely result from fragmentation due to contig boundaries, seven predicted ankyrin proteins represent candidates for involvement in a psyllid-specific host-endosymbiont interaction. Conversely, 11 of the ankyrin protein encoding genes in wPip do not have close homologs in wDi, including four ankyrin proteins noteworthy for their length and present in two or more of the other mosquito-associated Wolbachia strains: WP0293 (5.9 kb); WP0292 (8.2 kb); WP0407 (7.8 kb); WP0462 (7.9 kb). These four gene products share regions of similarity with one another and with two non-ankyrin proteins conserved in both wDi and the mosquito-associated strains (WP0364 and WP1346 in wPip), indicative of a rapidly evolving family derived in part from the non-ankyrin genes. The presence of WP0364 and WP1346 homologs in wDi suggests that the wPip and wDi lineages split prior to the evolution of this family [54].
Type IV secretion
Ankyrin proteins produced by Legionella pneumophila and Coxiella burnetii [58], and by Anaplasma phagocytophilum, which is in the same family as Wolbachia [59], are secreted by the type IV secretion system. This has led to speculation that Wolbachia may employ the type IV secretion system to secrete ankyrin proteins or other effectors involved in manipulation of host biology [60]. A two-cluster arrangement of type IV secretion genes is widely conserved in Wolbachia genomes [60], [61], and appears to be shared by wDi. The arrangement of the type IV secretion genes in wDi aligns with the clusters in wPip, with alignment extending into flanking genes; the only exception is the second copy of virB9, which in the wDi draft is on a contig of its own, preventing evaluation of flanking genes (Figure S2).
Nutritional provisioning
Many insect endosymbionts provide a fitness advantage to their hosts through metabolic provisioning, and it has been proposed that a nutritional relationship with the host may enhance selection for Wolbachia infection, particularly for strains that have successfully invaded host populations in the absence of reproductive manipulation [62]. Kremer et al. have demonstrated that Wolbachia can alter iron homeostasis both in hosts for which it is an obligate mutualist and in cases of facultative parasitism. They speculate that by reducing iron toxicity in cases of high iron, Wolbachia may provide a selective advantage to its hosts [63]. The required bacterioferritin gene is present in all sequenced Wolbachia strains including wDi.
In contrast to well-studied insect-endosymbiont systems like that between aphids and Buchnera, there is no evidence for Wolbachia providing its host with essential nutrients such as amino acids. However, given the extent of gene loss, Wolbachia is clearly nutritionally dependent upon its host. Predicted metabolic pathways and transporters have been tallied in both the wMel and wBm Wolbachia strains revealing retention of pathways for glycolysis, pentose phosphate pathway, purine metabolism and catabolism of select amino acids in addition to transporters for diverse substrates including carbohydrates, amino acids, and inorganic cations [53], [64]. Comparison of the fully sequenced genomes and wDi reveals conservation of these remaining metabolic genes and transporter genes among Wolbachia genomes, with variation observed only for three transporters that are limited to the wMel and wRi genomes. Ca. Liberibacter is a reduced genome bacterium dependent on its host plant or insect vector for provisioning of many essential nutrients. Comparison of the predicted metabolic capabilities of wDi and Ca. L. asiaticus reveals several metabolic capabilities present in wDi and absent from Ca. L. asiaticus, including the ability to synthesize thiocysteine, homocysteine, methylmalonyl-CoA and L-erythro-4-hydroxyglutamate from precursor compounds. However, there is no evidence for an evolved symbiotic relationship involving the provisioning of Ca. L. asiaticus with essential nutrients.
DNA repeat analysis
Wide variation has been observed among Wolbachia strains regarding the proportion of the genome comprised of repeated sequences, with strain wRi having the highest (22.1% of the total genome) and others significantly lower (wMel = 14%; wBm = 5.4%). Draft genome sequences derived from short-read next generation technologies typically underestimate the extent of repeated regions owing to the difficulty of assembling non-unique sequences. However, repeat characterization provides a valuable tool for future development of strain-specific diagnostic markers, and analysis with RepeatMasker [65] and RepeatScout [66] succeeded in identifying known and novel repeats in the wACP scaffold including 16 ab-initio repeat families with an average length of 184 bp and comprising 20315 bp or 1.63% of the wACP scaffold. A total of 196 known repeats with an overall length of 9256 bp (0.74%) were identified by RepeatMasker. A majority of known repeats are either small RNA or low complexity regions (Text S7). The annotated draft genome sequence for wDi, including the locations of predicted ankyrin proteins and repeat sequences, can be viewed on the GBrowse genome viewer at http://citrusgreening.org/.
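As a quick consistency check on the repeat figures above, each base-pair total can be divided by its stated percentage to recover the scaffold length it implies. The sketch below uses only the numbers quoted in the text; the variable names are illustrative, and because the percentages are rounded the two estimates are only expected to agree approximately.

```python
# Values quoted in the text: (repeat bases in bp, share of scaffold in percent).
ab_initio_repeats = (20315, 1.63)
known_repeats = (9256, 0.74)


def implied_scaffold_length(repeat_bp, percent_of_scaffold):
    """Scaffold length in bp implied by a repeat total and its percentage."""
    return repeat_bp / (percent_of_scaffold / 100.0)


ab_initio_estimate = implied_scaffold_length(*ab_initio_repeats)
known_estimate = implied_scaffold_length(*known_repeats)

# Both estimates land near 1.25 Mb, so the two percentages are mutually consistent.
print(round(ab_initio_estimate), round(known_estimate))
```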
Phylogenetic Characterization of Wolbachia
Genetic differences among populations of D. citri and associated endosymbionts hold potentially important insights into differences in vector behavior and their contribution to geographical variations in the spread and control of citrus greening. For instance, several research groups have shown that the parasitoid wasp Tamarixia radiata, introduced in the New World to control invasive D. citri populations, varies significantly in effectiveness depending on geographical location [67]–[71] and as previously discussed, the complement of endosymbionts in the D. citri metagenome appears to vary in relation to isolate origin [11], [12].
Accumulated phylogenetic analyses indicate that the Florida D. citri isolates cluster with D. citri populations in Southwest Asia, distinct from D. citri populations in China [72]. Supporting data include analyses of the D. citri CoxI protein sequence [72], as well as comparison of prophage gene sequences from D. citri-derived Ca. Liberibacter asiaticus. Analysis of sequence variation in the phage terminase gene shows that Guangdong and Yunnan strains are highly similar or identical, suggestive of a common recent origin, while the single Florida strain evaluated showed significantly more divergence [73].
To determine whether Wolbachia phylogeny supports the same pattern, the FtsZ and Wsp gene products of wDi were analyzed. The sequence of the cell division protein FtsZ is routinely used for placement of Wolbachia strains into the established supergroups A–F [74]. Supergroups A and B include Wolbachia spp. from arthropods only, while known members of supergroups C and D are restricted to filarial nematodes. Wolbachia spp. from the Collembolan F. candida represent a divergent lineage, named supergroup E by [75], and supergroup F comprises representatives from filarial nematodes (Mansonella spp.) and the termite Kalotermes flavicollis [76]–[78]. Phylogenetic analysis of the FtsZ sequences from Wolbachia in diverse D. citri isolates clearly places wDi within Wolbachia supergroup B, confirming the previously observed superior alignment of the wDi draft genome to supergroup B strain wPip (Figure 3). The FtsZ phylogenetic tree also supports the hypothesis that Wolbachia strains from the Chinese D. citri isolates fall within a different clade than the Florida isolate characterized here. Distinction between Chinese isolates and the Florida isolate is further supported by phylogenetic analysis of Wsp, an outer membrane protein frequently used for distinguishing relationships among more closely related strains [79] (Figure 4, Table 4). Interestingly, the Wolbachia strain present in B. cockerelli, the psyllid vector of Ca. Liberibacter solanacearum, clusters with the four Chinese wDi isolates.
Sequence diversity in D. citri, wDi, and Ca. L. asiaticus underlies variation in the biology of citrus greening disease, including but not limited to observed differences in parasitoid effectiveness. In combination with the availability of primary cell cultures for D. citri-USA [80], genome sequence data for the Florida isolate of D. citri (http://www.sohomoptera.org/), for a Florida isolate of Ca. L. asiaticus [81], and for the Wolbachia endosymbiont described here provide a valuable basis for comparison from which to explore the genetic sources of variation in vector and disease biology for citrus greening disease worldwide.