Whitefly Genome Expression Reveals Host-Symbiont Interaction in Amino Acid Biosynthesis

Background The whitefly (Bemisia tabaci) complex is a serious insect pest of several crop plants worldwide. It comprises several morphologically indistinguishable species; however, very little is known about their genetic divergence and biosynthetic pathways. In the present study, we performed transcriptome sequencing of the Asia 1 species of the B. tabaci complex and analyzed the interaction of host and symbiont genes in amino acid biosynthetic pathways. Methodology/Principal Findings We obtained about 83 million reads using Illumina sequencing, which assembled into 72716 unitigs. A total of 21129 unitigs were annotated at stringent parameters. Annotated unitigs were mapped to 52847 gene ontology (GO) terms and 131 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. Expression analysis of the genes involved in amino acid biosynthesis pathways revealed complementation between the whitefly and its symbiont partner Candidatus Portiera aleyrodidarum. Most of the non-essential amino acids and intermediates of essential amino acid pathways were supplied by the host insect to its symbiont. The symbiont expressed the pathways for the essential amino acids arginine, threonine and tryptophan and the immediate precursors of valine, leucine, isoleucine and phenylalanine. High-level expression of amino acid transporters in the whitefly suggested a molecular mechanism for the exchange of amino acids between the host and the symbiont. Conclusions/Significance Our study provides comprehensive transcriptome data for the Asia 1 species of the B. tabaci complex and sheds light on the integration of host and symbiont genes in amino acid biosynthesis pathways.


RNA isolation, sample preparation and Illumina sequencing
Total RNA was isolated from each sample separately using Tri reagent (Sigma, USA), following the manufacturer's protocol. DNA contamination in the RNA preparations was removed with the DNA-free Kit (Ambion, USA). The integrity of the RNA samples was analysed on the 2100 Bioanalyzer (Agilent Technologies, USA). An equal amount (5 μg) of total RNA from each sample was pooled, and mRNA was purified only once so as to retain the symbiont transcripts, as described [17]. The mRNA was fragmented and a cDNA library was prepared using the TruSeq RNA Sample Preparation Kit v2 (Illumina), following the manufacturer's protocol (http://www.illumina.com/systems/hiseqsystems/hiseq20001000/kits.ilmn). The library was sequenced in a paired-end 100-base run, using the TruSeq PE Cluster Kit v3-cBot-HS for cluster generation on the cBot and the TruSeq SBS Kit v3-HS for sequencing, on the Illumina HiSeq1000 platform following the standard protocols provided by the manufacturer. The Illumina sequencing data are available at the NCBI Short Read Archive (SRA) database (accession number: SRR1159208).

Assembly of Illumina sequencing reads
Adapter sequences, low-quality reads (reads containing ambiguous 'N' bases) and empty reads were removed from the raw sequencing data. The reads were assembled into scaffolds using the ABySS (Assembly By Short Sequences) paired-end assembly program (http://www.bcgsc.ca/platform/bioinfo/software/abyss) at a k-mer size of 21 [18], which gave the best assembly result. Scaffolds were further assembled with CAP3 (contig assembly program) under stringent parameters: a minimum overlap of 40 bp with more than 95% sequence similarity [19]. Contigs and singletons obtained after CAP3 assembly were pooled and used as distinct sequences (unitigs) for further characterization. The assembled sequences have been deposited as Transcriptome Shotgun Assembly projects at DDBJ/EMBL/GenBank under the accession numbers GAUC00000000 and GAUL00000000. The version described in this paper is the first version.

Functional annotation
For functional annotation, unitigs were used for a blastx search against the NCBI non-redundant (nr) protein database at an E-value cutoff of 10⁻⁵. Additionally, we used 27397 protein sequences of Tribolium castaneum (considered a model insect) and 20590 of Acyrthosiphon pisum (the closest insect to B. tabaci) for blastx searches of the unitigs. Top blast hit data were analysed for percent similarity and E-value distribution. The contribution of different species to the annotation was also investigated using the top blast hit data of B. tabaci unitigs against the NCBI nr protein database.
For GO, InterProScan, enzyme code and KEGG pathway analysis, BLAST2GO software was used [20]. The blastx output file (in XML format) was loaded into the BLAST2GO server and used for the above analyses following the standard protocol.

Gene expression analysis
Illumina sequencing data can be used to estimate gene expression levels accurately by counting the reads that map to each gene. We estimated expression following reported protocols [21,22]. To calculate the gene expression level, we mapped the reads to each unitig and counted them. The raw mapped read count was then normalized by the length of the unitig and the total number of mapped reads to give reads per kilobase per million mapped reads (RPKM), because the number of mapped reads also depends on the size of the reference sequence and the sequencing depth [21].
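The RPKM normalization can be illustrated with a minimal sketch. It assumes the standard definition (read count scaled by unitig length in kilobases and by total mapped reads in millions); the function name, unitig names and counts below are hypothetical and are not taken from the study's data or pipeline.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase of sequence per million mapped reads.

    read_count         -- reads mapped to one unitig
    length_bp          -- unitig length in base pairs
    total_mapped_reads -- all mapped reads in the library
    """
    if length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return read_count * 1e9 / (length_bp * total_mapped_reads)


# Hypothetical example: (mapped reads, length in bp) per unitig.
counts = {"unitig_A": (500, 1000), "unitig_B": (500, 2000)}
total = 10_000_000  # assumed total number of mapped reads

values = {name: rpkm(n, length, total) for name, (n, length) in counts.items()}
for name, value in values.items():
    print(f"{name}\t{value:.1f}")
```

With the same read count, the unitig that is twice as long receives half the RPKM, which is the length correction described above.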
To further validate the expression of selected genes of the amino acid biosynthesis and transport pathways, quantitative real-time PCR was performed in triplicate on an ABI 7500 Real-Time PCR system using the SYBR Green PCR Master Mix (Applied Biosystems, CA, USA), following the standard protocol described [23]. Actin was used as the internal reference gene [24,25] and relative expression was calculated. The reaction mixture (10 μl) consisted of cDNA (0.5 μl), specific primers (5 pmol) (S1 Table) and 2X SYBR Green PCR Master Mix (5 μl), and the abundance of transcripts was quantified by the ΔΔCt method [23]. Further, the expression of these genes was also analysed in different developmental stages (egg, nymph and adult) of the whitefly.
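As a rough illustration of the ΔΔCt calculation, the sketch below averages triplicate Ct values, normalizes the target gene to the reference gene (actin in this study), and expresses one sample relative to a calibrator. It assumes roughly 100% amplification efficiency (a doubling per cycle); the Ct values and the choice of nymph as calibrator are hypothetical and only serve the example.

```python
from statistics import mean


def delta_ct(target_cts, reference_cts):
    """Mean Ct of the target gene minus mean Ct of the reference gene."""
    return mean(target_cts) - mean(reference_cts)


def fold_change(sample_dct, calibrator_dct):
    """Relative expression 2^(-ddCt), assuming ~100% PCR efficiency."""
    ddct = sample_dct - calibrator_dct
    return 2.0 ** (-ddct)


# Hypothetical triplicate Ct values (target gene, then actin).
adult_dct = delta_ct([22.0, 22.5, 21.5], [18.0, 18.5, 17.5])
nymph_dct = delta_ct([24.0, 24.5, 23.5], [18.0, 18.5, 17.5])

# Expression in adult relative to nymph (nymph used as calibrator here).
adult_vs_nymph = fold_change(adult_dct, nymph_dct)
print(f"fold change (adult vs nymph): {adult_vs_nymph:.2f}")
```

A lower ΔCt means higher expression, so a ΔΔCt of -2 corresponds to a fourfold increase relative to the calibrator.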

Amino acid biosynthesis and transport
To analyse the genes involved in the biosynthesis of amino acids, unitig sequences were searched for biosynthetic pathway mapping using BLAST2GO and the KOBAS server. Unitig sequences showing a significant match to enzymes involved in the biosynthesis of amino acids were extracted and analysed for their origin on the basis of similarity to insect and symbiont sequences. The genome and 246 protein sequences reported for the symbiont Candidatus Portiera aleyrodidarum (CPA; GenBank accession number NC_018618.1) were downloaded and used to build local nucleotide and protein databases. Unitig sequences were used for blast searches against the CPA nucleotide and protein databases. Sequences showing significant similarity with the symbiont were further analysed for their contribution to the amino acid biosynthetic pathways using BLAST2GO and the KOBAS server. The contribution of the symbiont and the host to amino acid biosynthesis was analysed on the basis of the expression of genes mapped to the insect and to the symbiont.
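The origin assignment can be sketched as a simple decision rule. This is an illustration under stated assumptions, not the authors' pipeline: the 90% similarity cutoff for the symbiont genome is the one reported in the results of this paper, while the function name, the input fields and the "unassigned" category are hypothetical.

```python
def assign_origin(cpa_identity, top_hit_is_insect, min_identity=90.0):
    """Label a unitig as symbiont, host or unassigned.

    cpa_identity      -- percent similarity of the best blastn hit against
                         the symbiont genome, or None if there was no hit
    top_hit_is_insect -- True if the best protein-database hit is from
                         whitefly or another insect
    min_identity      -- cutoff for calling a symbiont transcript
    """
    if cpa_identity is not None and cpa_identity > min_identity:
        return "symbiont"
    if top_hit_is_insect:
        return "host"
    return "unassigned"  # hypothetical fallback category


# Hypothetical unitigs: (name, similarity to symbiont genome, insect top hit?)
records = [
    ("unitig_1", 98.5, False),
    ("unitig_2", None, True),
    ("unitig_3", 72.0, False),
]
origins = {name: assign_origin(ident, insect) for name, ident, insect in records}
print(origins)
```

Checking the symbiont genome first reflects that a near-identical match to the symbiont genome is stronger evidence of origin than a best protein hit in a general database.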
The genes encoding transporters were also identified and used for expression analysis. The sequences of important amino acid transporters reported in aphid [26] were used as a database. Unitig sequences were searched against this transporter database at an E-value of 10⁻⁶ to identify similar transporters in whitefly. Top hit sequences with more than 60% similarity and query coverage were considered putative homologs. The identified amino acid transporters were also used for expression analysis.
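The homolog selection criteria can be expressed as a short filter. The sketch below is an assumption-laden illustration: it takes the top hit per unitig to be the one with the highest bit score, treats "similarity" and "query coverage" as percentages already computed from the search output, and uses hypothetical unitig names and hit values (only the aphid gene identifiers come from the text).

```python
from typing import NamedTuple


class Hit(NamedTuple):
    query: str         # unitig name
    subject: str       # aphid transporter identifier
    similarity: float  # percent similarity of the alignment
    coverage: float    # percent of the query covered by the alignment
    evalue: float
    bitscore: float


def putative_homologs(hits, max_evalue=1e-6, min_similarity=60.0, min_coverage=60.0):
    """Keep the top hit per query if it passes the similarity and coverage cutoffs."""
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue:
            continue  # below the search significance threshold
        if hit.query not in best or hit.bitscore > best[hit.query].bitscore:
            best[hit.query] = hit
    return {
        query: hit
        for query, hit in best.items()
        if hit.similarity > min_similarity and hit.coverage > min_coverage
    }


# Hypothetical search results.
hits = [
    Hit("unitig_10", "ACYPI001018", 75.0, 82.0, 1e-40, 210.0),
    Hit("unitig_10", "ACYPI008904", 64.0, 70.0, 1e-20, 120.0),
    Hit("unitig_11", "ACYPI000536", 55.0, 90.0, 1e-12, 90.0),
    Hit("unitig_12", "ACYPI008971", 80.0, 95.0, 1e-3, 40.0),
]
homologs = putative_homologs(hits)
for query, hit in homologs.items():
    print(query, "->", hit.subject)
```

Selecting the top hit before applying the cutoffs mirrors the wording of the criteria: a unitig whose best hit fails the thresholds is not rescued by a weaker second hit.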

Transcriptome sequencing and functional annotation
The transcriptome sequencing gave a total of 83828866 reads in whitefly, with an average length of 101 bp, in a single run on the Illumina sequencing platform. The reads were cleaned by removing the adapter sequences and poly A, T and N sequences. After cleaning, 82818787 reads containing 8364697487 bases were assembled into 1324517 scaffolds by the ABySS paired-end assembly program [18]. These scaffolds were further assembled by the CAP3 program [19]. A total of 34428 contigs and 38288 singletons were pooled to form 72716 unitigs, distinct sequences with a mean size of 592 bp, for further analysis (Table 1). Length distribution analysis showed that 68% of unitigs were in the range of 150 to 500 bp. A total of 10823 unitigs were longer than 1 kb, of which 201 were longer than 5 kb (S1 Fig). These results are in agreement with earlier reports for different species of whitefly [7-9,27]. The accuracy of the assembly was confirmed by a blastn search of randomly selected unitigs against the NCBI transcriptome shotgun assembly (TSA) sequences of whitefly. All the tested unitigs showed significant similarity to one or more TSA sequences (S2 Fig). A total of 76.9% (44425/57741) of MEAM1, 81.5% (37365/45831) of Asia II3 and 78% (21309/27288) of MED sequences were mapped at an E-value of 10⁻⁵, which revealed significant homology between these species.
A blastx search of unitig sequences against the NCBI nr protein database showed significant similarity of 21129 (29%) unitigs to reported proteins in the database (S2 Table). By comparison, about 27, 31 and 16% of distinct sequences were annotated for the MEAM1, Asia II3 and MED species, respectively [7-9]. Approximately 70% of unitigs were annotated with more than 60% similarity (Fig 1A). The E-value distribution of the best hits revealed that ~84% of sequences were annotated with an E-value smaller than 10⁻⁵ (Fig 1B). Top hit species distribution analysis showed that the largest number of sequences matched Acyrthosiphon pisum (2954), followed by Tribolium castaneum (1880) and Pediculus humanus (1509) (Fig 2A; S3 Table). High homology with A. pisum was expected because of their close taxonomic position (i.e. Homoptera) and similar feeding behaviour.
Whiteflies are known to harbor a number of primary and secondary symbionts, which help in biosynthesis and defence-related functions [28,29]. The NCBI nr top blast hits of transcripts matching these symbionts in the whitefly transcriptome data obtained in the current study were critically analyzed. Candidatus, with 313 top blast hits, was the most abundant bacterial group, followed by Wolbachia (78) and Clostridium (72) (Fig 2B; S4 Table). CPA is reported as the primary endosymbiont of whitefly and is expected to be involved in amino acid biosynthesis in whiteflies [13]. Wolbachia and other bacterial species are also reported as symbionts in hemipteran insects [30,31].

GO annotation and KEGG pathway mapping
The BLAST2GO server was used to assign GO terms to the whitefly unitigs. GO terms are classified into three categories: cellular components, molecular functions and biological processes. A total of 52847 GO terms were assigned to the 15335 annotated unitigs on the basis of similarity. Of the total assigned GO terms, 11403 were assigned to cellular components, 19890 to biological processes and 24237 to molecular functions (S5 Table). The sum of the GO terms did not match the number of assigned unitigs because several unitigs were classified into more than one group. The number of sequences assigned to GO terms was higher than that in the MED (7330), MEAM1 (4771) and Asia II3 (4819) species [7-9]. This might be due to differences in the amount of sequencing data from each sample. Among cellular components, cell (53%), extracellular matrix (26%) and macromolecular complex (12%) were highly represented (Fig 3A). Among biological processes, metabolic (32%), cellular (23%) and single-organism processes (9%) were the most enriched (Fig 3B). In the molecular function category, binding (47%) and catalytic (37%) activity were at the top (Fig 3C). A similar representation of GO functional groups is reported for the MED, MEAM1 and Asia II3 species [7-9].
KEGG pathway mapping was also performed using the BLAST2GO server. A total of 4492 annotated unitigs were mapped to 554 enzyme codes and 131 KEGG pathways (S6 Table). Enzyme classification analysis showed that transferases (34%) accounted for the highest proportion, followed by oxidoreductases (27%) and hydrolases (17%) (Fig 4A). Pathways related to purine, pyrimidine, amino acid and sugar metabolism were highly enriched. Energy-generating pathways such as oxidative phosphorylation, glycolysis and pyruvate metabolism were also significantly represented (Fig 4B). Besides these, several pathways involved in drug and xenobiotic metabolism, especially those related to cytochrome P450, were significantly represented in Asia 1, as reported for the MEAM1 and MED species [7,27].

Gene expression analysis
The level of gene expression was analyzed by calculating the reads per kilobase per million mapped reads (RPKM) [32]. A summary of the 100 most highly expressed transcripts is given in S7 Table. The genes involved in development (vitellogenin), ribosomes (large and small subunits), translation (initiation and elongation factors), cell structure (tubulin) and energy generation (NADH dehydrogenase, malate dehydrogenase) were highly expressed (Fig 5). Similar results are reported for the Asia II3 species of B. tabaci [12]. Since these genes play pivotal roles in survival and fecundity (vitellogenin), their high level of expression is understandable.

Amino acid biosynthesis and transporter genes in B. tabaci
Animals generally lack the genes for the synthesis of essential amino acids and mostly obtain them from food. Sap-sucking insects feed on the phloem sap of the host plant, which contains very small quantities of essential amino acids [10,33,34]. Therefore, these insects often depend on microbial symbionts for the synthesis of these amino acids [35,36]. In return, the host provides non-essential amino acids to the symbiont [12]. The symbiosis of A. pisum and Buchnera aphidicola is the best-studied model [36,37]. CPA is reported as a symbiont of whitefly and is expected to be involved in the biosynthesis of essential amino acids [13]. Genome sequence analysis of CPA clearly indicates that most of the genes responsible for the synthesis of essential amino acids are present in CPA. However, CPA lacks the genes for non-essential amino acid biosynthesis [13]. Since CPA is an endosymbiont, the host insect is its only source of these amino acids.
We analysed the interdependence of the whitefly and its symbiont for amino acid biosynthetic pathways by mapping unitigs for their origin, function and expression (Table 2 and S2, S6, S8 and S9 Tables). Unitigs mapped to the amino acid biosynthetic pathways were analysed for their origin on the basis of similarity with reported sequences. Unitigs of the symbiont were identified by a blastn search against the CPA genome sequence with more than 90% similarity (S9 Table), whereas unitigs showing similarity with whitefly or other insect sequences were considered host transcripts. Most of the enzymes involved in the biosynthesis of non-essential amino acids (glutamate, glutamine, aspartate, asparagine, glycine, serine, alanine, cysteine, tyrosine and proline) were encoded by whitefly transcripts with reasonably high expression (Table 2, Fig 6), which is probably sufficient to meet the requirement of the symbiont as well. This result agrees with earlier reports that CPA lacks the genes responsible for the synthesis of non-essential amino acids [13]. In the case of the essential amino acid biosynthetic pathways, we observed that the majority of the enzymes were encoded by the symbiont. Since the whitefly lacks the genes for the synthesis of essential amino acids (except threonine and methionine) and phloem sap is a very poor source of these amino acids, CPA remains the only source of these amino acids for whiteflies, as B. aphidicola is for aphids [36,37]. We observed that the enzymes responsible for the synthesis of tryptophan and threonine were encoded by the symbiont. Most of the genes involved in the biosynthesis of methionine, lysine, arginine and histidine were expressed in the symbiont. However, the complete biosynthetic pathways for these amino acids could not be established from the available transcriptome and genome data [13].
Further, genes involved in threonine and methionine biosynthesis were also detected from the host. Since the symbiont is localized in a limited number of cells, the expression values of genes cannot be compared between whitefly and CPA. Yet significant expression of most of the genes involved in the synthesis of essential amino acids, which were absent in the whitefly, was noticed in CPA (Table 2). For the biosynthesis of phenylalanine, valine, leucine and isoleucine, CPA lacks the genes encoding the enzymes (EC: 2.6.1.1 and 2.6.1.42) for the last step, as also reported earlier for aphid symbionts [11]. However, these genes were available in the whitefly (Fig 6). The results suggested the transport of immediate precursor molecules for the synthesis of these amino acids from symbiont to insect. Our results suggested integration of host whitefly and symbiont CPA genes in amino acid biosynthesis. Host- and symbiont-encoded genes together filled the gaps in the amino acid biosynthetic pathways, and each partner compensated for the requirement of the other by synthesizing significant quantities of amino acids, as evidenced by the high expression of the genes. Some non-essential amino acids (such as glutamine, glutamate, aspartate and serine) are used as donors of amino groups in the biosynthesis of essential amino acids. Glutamate and glutamine are readily available as substrates in amino acid and nucleic acid biosynthesis [38]. They act as direct amino group donors in arginine biosynthesis in CPA. Aspartate provides the amino group in lysine and threonine biosynthesis [39]. Glutamine and asparagine are reported at high concentrations in phloem sap [10], which might also affect the concentration of these amino acids in the hemolymph of phloem-feeding insects. Genes involved in the synthesis of these amino acids and the gene for aminotransferase (2.6.1.42) were found to be expressed at a high level in whitefly (Fig 6; Table 2). The results indicated that the pool of amino group donors for the synthesis of essential amino acids is available in sufficient quantity to meet the requirement.
The expression of selected genes (Table 2) involved in amino acid biosynthesis was also analysed in different developmental stages of whitefly by quantitative real-time PCR in three biological replicates. Significant expression of all the genes was observed in the nymph and adult stages of whitefly, suggesting their production in both developmental stages (Fig 6). The relative expression pattern was similar to that calculated from the transcriptome data (Fig 6). However, the expression of genes involved in essential amino acid biosynthesis (particularly the genes encoded by the symbiont) was not detected in the egg (Fig 6). This might be due either to the very limited presence of the symbiont and its transcripts in the egg, or to the insect providing enough nutrients to the eggs for their development. On the basis of the mapping and expression of symbiont and insect genes of the amino acid biosynthetic pathways, we predicted a distinct, complementary contribution of each partner. A hypothetical model for meeting the amino acid needs of the whitefly and its symbiont is given in the accompanying figure.
Phloem sap is highly deficient in available nitrogen because of its very high carbon/nitrogen ratio. However, nitrogen is important for the growth of insects. Therefore, sap-sucking insects have developed machinery for the assimilation of nitrogen from ammonia waste. This machinery is well studied in aphid [11], but has not been reported in whitefly. Genes encoding the enzymes glutamate synthase (GltS) and glutamine synthetase (GS) were also detected in whitefly with significant expression (Table 2). The results suggested that whitefly also has the capability to assimilate ammonia into amino acids, as reported for aphid [11]. These enzymes are reported to form a shuttle (GOGAT cycle) in plants and microbes for the assimilation of ammonia nitrogen into glutamate [40], which acts as a substrate for most of the amino acid pathways. GS/GltS is reported as the leading route for ammonia assimilation in the presence of a high glucose concentration [41]. This condition applies to phloem-feeding insects such as whiteflies and aphids, which ingest plenty of sugar and have to maintain a low ammonia concentration in cells because of its toxicity. In insects, the GS/GltS cycle has been characterized in aphids, mosquitoes and silkworms, where it is associated with ammonia detoxification and nitrogen assimilation in different tissues [11,42,43]. However, the quantitative contribution of this cycle to amino-nitrogen supply has not yet been determined.
Transporters play a significant role in facilitating cooperation between the host and the symbiont [26]. A total of 628 unitigs were annotated as transporter proteins (S10 Table), of which 71 were amino acid transporters (S11 Table) that might be involved in the transport of different amino acid-related metabolites and substrates across the host-symbiont boundary. A total of 34 transporters were considered highly expressed (RPKM>50), including amino acid, proton-coupled amino acid, sodium nucleoside, sugar, trehalose, organic cation, monocarboxylate and ABC transporters. Several types of amino acid transporters, such as proton-coupled, neutral and basic amino acid, cationic amino acid and sodium-dependent neutral amino acid transporters, were detected with high levels of expression. Besides these, the expression of several specific transporters, such as the glutamate transporter, tyrosine transporter and vacuolar amino acid transporter, was also detected. Similar transporters are highly expressed in aphids [11,44]. Cationic amino acid transporters are involved in arginine, lysine and histidine transport, whereas branched-chain amino acid transporters facilitate the transport of phenylalanine, valine, leucine and isoleucine [45]. The excitatory amino acid transporter plays a significant role in glutamate transport (http://www.sdbonline.org/sites/fly/genebrief/eaat1.htm).
Forty putative amino acid transporters were recently reported in A. pisum; five of them (ACYPI000536, ACYPI000550, ACYPI001018, ACYPI008904 and ACYPI008971) are reported to be more important in host-symbiont interaction [26,45]. Among these, two genes (ACYPI001018 and ACYPI008904) are involved in glutamine transport. We identified putative homologs of these five transporters in whitefly with significantly higher expression (Table 3). The expression of these transporters was further confirmed by quantitative real-time PCR in triplicate (Fig 6). The results suggested that whitefly expresses the amino acid biosynthesis pathways and the transporters for sharing their products across the host-symbiont boundary. Since glutamine is synthesized in the insect and is an important source of amino groups in the biosynthesis of several essential amino acids in the symbiont, its transport in significant quantity across the host-symbiont boundary is reported [11]. Of the two transporters reported to be actively involved in glutamine transport in aphids, ACYPI001018 is the dominant one, followed by ACYPI008904 [26]. The presence of genes encoding these transporters with high expression indicated a mechanism of cooperation between whitefly and CPA similar to that reported for aphids and Buchnera [26]. However, the exact role of these transporters in whitefly needs to be validated in future studies by recombinant expression, gene silencing, or both.

Conclusions
Comprehensive analysis of the whitefly transcriptome data has provided important information regarding the mechanism of cooperation between the host and its symbiont through the sharing of distinct amino acid pools. The whitefly feeds on phloem sap, which contains low levels of essential amino acids, and lacks its own genes for their synthesis. It contains genes for the synthesis of non-essential amino acids and expresses them at a significantly high level. The symbiont CPA, on the other hand, produces essential amino acids by sharing some metabolites with the host insect, but lacks the genes for non-essential amino acids. These results, along with the significant expression of several amino acid transporters, suggest various levels of cooperation between the host and the symbiont. Future studies at the protein and biochemical level will define further details of the interdependence between the two partners in the symbiosis.