Comparative transcriptome analysis reveals genetic diversity in the endosymbiont Hamiltonella between native and exotic populations of Bemisia tabaci from Brazil

The whitefly, Bemisia tabaci, is a species complex of more than 40 cryptic species and a major agricultural pest. It causes extensive damage to plants mainly by transmitting plant viruses. There is still a lack of genomic data available for the different whitefly species found in Brazil and their bacterial endosymbionts. Understanding the genetic and transcriptomic composition of these insect pests, the viruses they transmit and the microbiota is crucial to sustainable solutions for farmers to control whiteflies. Illumina RNA-Seq was used to obtain the transcriptome of individual whiteflies from 10 different populations from Brazil including Middle East-Asia Minor 1 (MEAM1), Mediterranean (MED) and New World 2 (NW2). Raw reads were assembled using CLC Genomics Workbench and subsequently mapped to reference genomes. We obtained whitefly complete mitochondrial genomes and draft genomes from the facultative bacterial endosymbiont Hamiltonella for further phylogenetic analyses. In addition, nucleotide sequences of the GroEL chaperonin gene from Hamiltonella from different populations were obtained and analysed. There was concordance in the species clustering using the whitefly complete mitogenome and the mtCOI gene tree. On the other hand, the phylogenetic analysis using the 12 ORF’s of Hamiltonella clustered the native species NW2 apart from the exotics MEAM1 and MED. In addition, the amino acid analysis of GroEL chaperonin revealed a deletion only in Hamiltonella infecting NW2 among whiteflies populations analysed which was further confirmed by PCR and Sanger sequencing. The genomic data obtained in this study will aid understanding the functions that Hamiltonella may have in whitefly biology and serve as a reference for further studies regarding whiteflies in Brazil.

Introduction different species (MEAM1, MED and NW2) of the B. tabaci complex followed by phylogenetic analysis. In addition, the diversity of the facultative endosymbiont Hamiltonella was inferred analyzing 12 different ORF's. The GroEL amino acid sequences of Hamiltonella from different B. tabaci species were also analyzed. Our goal was to add further details concerning phylogenetic diversity of mitochondrial genome and Hamiltonella among Brazilian populations of B. tabaci. In addition, a GroEL protein analysis was carried out that may give further insights about the functions that Hamiltonella may have in whitefly biology.

Whitefly sampling
Samples were obtained from pure colonies and straight from the field. Four different B. tabaci populations (153, 154, 156 and 320) were analysed, including two exotics species: Middle East-Asia Minor (MEAM1) and Mediterranean (MED); and a native species: New World 2 (NW2). Populations from colonies were previously identified by sequencing and analysis of the mtCOI gene using the primers C1-J-2195 and TL2-N-3014 [31].

RNA extraction
The RNA extraction of a single individual whitefly was carried out using the ARCTURUS PicoPure kit with modifications [32]. Extracted RNA was subjected to DNase treatment using the TURBO DNA free kit as described by the manufacturer (Ambion Life Technologies CA, USA). Subsequently, the RNA was concentrated using a vacuum centrifuge (Eppendorf, Germany) at 25˚C for one hour. The pellet was resuspended in 18 μl of RNase free water and stored at-80 C waiting further analysis. Integrity of RNA was quantified by 2100 Bio-analyser (Aligent Technologies).

cDNA and Illumina library preparation
Total RNA from each individual whitefly sample was used for cDNA library preparation using the Illumina TruSeq Stranded Total RNA Preparation kit as described by the manufacturer (Illumina, San Diego, CA, USA). Later on, sequencing of 10 samples was carried out using the HiSeq2000 on a rapid run mode generating 2x50 bp paired end reads. Base calling, quality assessment and image analysis were conducted using the HiSeq control software v1.4.8 and Real Time Analysis v1.18.61 at the Macrogen Korea.

Trimming and de novo sequence assembly
The raw transcriptome data was trimmed using the software CLC Genomics Workbench v8.5.1 (CLCGW) with quality scores limit set to 0.01, ambiguous limit set to 2. Trimmed reads were then assembled into contigs using de novo sequence assembly tool in CLCGW. The assembly parameters consisted of mismatch cost (2), insertion cost (3), deletion cost (3), length fraction (0.5), similarity fraction (0.9) and minimum contig length of either 500 bp or 1000 bp.

Obtaining and analysing complete mitochondrial genomes
The whiteflies complete mitochondrial genome was obtained by mapping of the assembled contigs to reference genomes from GenBank using the software Geneious v9.1.3 [33]. For each B. tabaci species of the complex, a different reference mitogenome was used: KU877168 for MEAM1, JQ906700 for MED and AY521259 for NW2. When the mapped contigs did not covering the full length of the mitochondrial genome of the reference sequence, we resorted to mapping trimmed reads to the reference sequence and thus the whole length of the reference was covered. Mapping was performed with the following setting in Geneious software; minimum overlap 10%, minimum overlap identity 80%, allow gaps 10%, fine tuning set to iterate up to 10 times at custom sensitivity. A consensus between the mapped trimmed reads and the reference was used to form new mitochondrial genome. Improvements on the draft mitochondrial genomes were carried out using the software Pilon, a tool for genome assembly improvement [34]. Subsequently, mitochondrial genomes were annotated by MITOS [35]. Other whiteflies mitogenomes sequences were downloaded from GenBank (KJ778614, KX714967, KY951451, KF734668, KR819174, KY951448, JQ906700, KY951447, KU877168, KY951449. KY951450, KY951452 and AY521259) [11,12,14,15,[36][37][38] and added to the analysis. Sequences obtained were aligned using MAFFT v7.309 [39] followed by visualization and analysis in Geneious software. A total of 19 sequences were aligned and analysed.

Analysing Hamiltonella genetic diversity
Assembled contigs as well as trimmed reads from each sample were mapped to a Hamiltonella reference genome (CP016303) to obtain the facultative endosymbiont draft genome. Mapping was performed with the following setting in Geneious software; minimum overlap 10%, minimum overlap identity 80%, allow gaps 10%, fine tuning set to iterate up to 10 times at custom sensitivity. Afterward, draft genomes were aligned to the reference using the whole genome alignment tool LASTZ version 1.02.00 [40] within Geneious v. 9.1.8. Genes with full coverage for all the samples were selected for further Bayesian phylogenetic analysis using the software ExaBayes version 1.4.1 [41].

Chaperonin GroEL gene analysis
Trimmed reads were mapped onto the reference sequence for chaperonin GroEL gene from Hamiltonella of B. tabaci (AF130421). As the mapped contigs did not cover the full length of the coding region of the reference sequence AF130421, we resorted to mapping trimmed reads to the reference sequence to get the full coverage of the whole length from the reference. A consensus between the mapped trimmed reads and the reference was used to form new chaperonin GroEL sequences, open reading frames (ORF) were predicted in Geneious. In addition, other chaperonin GroEL sequence from A. pisum was downloaded from Genbank (CP001277) and added to the analysis. Sequences obtained were aligned using MAFFT v7.309 [39] followed by visualization and analysis in Geneious software. A total of 9 sequences were aligned and analysed. In addition, primers were designed (GroEL 1,354 For-CCTC TGCG TCAG ATTG TGGT and GroEL 1,663 Rev-TCAT ACCA TTCA TTCC GCCC A) for a PCR reaction (95˚C for 5min, 35 cycles at 95˚C for 30 secs, 59.5˚C for 30 secs, 72˚C for 30 secs and 72˚C for 10 min) followed by nucleotide sequencing to confirm the results obtained by NGS.

Bayesian phylogenetic analyses
All the phylogenetic analyses were run on 384 nodes on the Magnus supercomputer (Pawsey Centre, Western Australia). Mitochondrial genome phylogenetic analysis were performed by MrBayes 3.2.2. [42]. Analyses were run for 30 million generations with sampling every 1000 generations. Each analysis consisted of four independent runs, utilizing four coupled Markov chains. The run convergence was assessed by finding the plateau in the likelihood scores (standard deviation of split frequencies < 0.0015). In each of the runs, the first 25% trees were discarded as burn-in and the posterior probability is shown on each node.
In addition, the Hamiltonella phylogenetic analysis was performed on DNA sequences of 12 protein-coding genes for a dataset with 9 taxa using ExaBayes version 1.4.1; [41]. Bayesian analysis was carried out for four independent runs for 1 million generations, with trees sampled every 500 generations. The run convergence was monitored by finding the plateau in the likelihood scores (standard deviation of split frequencies < 0.0015). In each of the runs, the first 25% trees were discarded as burn-in for the estimation of a majority rule consensus topology and posterior probability for each node. Bayesian run files are available from the authors upon request. Trees were visualized, edited and rooted using FigTree v1.4.3.

Results
The sequenced number of reads from the 10 samples ranged from 29,840,288 to 64,080,188 among the ten samples and the number of contigs assembled ranged from 7,730 to 41,165 (Table 1).

Hamiltonella genetic diversity
The diversity of Hamiltonella in B. tabaci populations from Brazil was carried out analysing 12 ORF's from eight single whitefly transcriptomes totalizing an alignment of 6,378bp length. The analysed ORF's were: DNA transformation protein tfoX, porin OmpA, acyl carrier, 50S ribosomal protein L3, 50S ribosomal protein L23, 30S ribosomal protein S17, 50S ribosomal protein L24, 50S ribosomal protein L5, rpsN, nucleotide exchange factor GrpE, porin and DNA-binding protein. The phylogenetic analysis from the sequenced accessions separate into two deeply divergent clades representing the native species from the Americas (NW2) and the exotics (MEAM1 and MED) (Fig 1). Furthermore, the identity percentages among the 12 ORF's from Hamiltonella were obtained (S1 Table).

GroEL chaperonin analysis
The GroEL (Chaperonin 60) gene was obtained from six B. tabaci single whitefly transcriptomes of Brazil. In addition to the data obtained in this study, the analysis also included Hamiltonella GroEL sequences downloaded from GenBank from the pea aphid A. pisum (CP001277) and from B. tabaci MEAM1 (CP016303). Analysis of the translated proteins revealed a three amino acids deletion present only in the B. tabaci NW2 species and in the pea aphid A. pisum (Fig 2). The deletion present in Hamiltonella from NW2 and pea aphid was a sequence of two glycine and one isoleucine. The nucleotide deletion was confirmed by PCR, which amplified an 300bp amplicon, followed by nucleotide sequencing.

Mitochondrial genome
Six complete mitochondrial genomes were obtained from three different species found in Brazil (NW2, MEAM1 and MED). Phylogenetic analysis of the complete mitochondrial genomes separated different species of the complex in distinct clades (Fig 3). Furthermore, the identity percentages among all the mitogenomes from B. tabaci were obtained (S2 Table).

Discussion
In this study we present the phylogenetic relationship of B. tabaci and its association with the facultative endosymbiont Hamiltonella. We show the evolutionary differences between native populations of B. tabaci species of the Americas and invasive B. tabaci species. From our data the genetic differences are not confined to the mtCOI gene, but extend to the rest of the mitogenome and to the facultative endosymbiont Hamiltonella. These findings will contribute to the understanding about the functions that Hamiltonella may have in B. tabaci biology and will aid the correct identification of B. tabaci specimens. Six complete mitochondrial genomes were obtained from three different species of the complex B. tabaci: MEAM1, MED and NW2. Complete mitochondrial genomes comparisons are essential for better understanding the evolutionary genetic relationships between members of the B. tabaci species complex [3,11]. There are 13 complete whitefly mitochondrial genomes characterized and available on GenBank to date [11,12,14,15,[36][37][38] which is a valuable contribution to a better understanding of the taxonomy of this global pest. The addition of six more full mitogenomes from B. tabaci to the global database will contribute for more accurate identification and will serve as references for further full mitochondrial genome phylogenies studies. The phylogenetic analysis of the full mitogenome (Fig 3) conducted in this study separated the species in different clades in a similar pattern compared to phylogenetic trees of partial fragment of the mtCOI gene [1]. This reinforces that using a partial fragment of the mtCOI gene to infer phylogeny in B. tabaci is a reliable way to delimitate the species boundaries and it represents the whole mitochondrial genome.
A multilocus phylogenetic analysis was carried out for the facultative endosymbiont Hamiltonella. Previous studies in Brazil phylogenetically analyzed Hamiltonella based on 16S rRNA gene and found genetically homogeneous populations of Hamiltonella from MEAM1 and MED across Brazil [29]. Another phylogenetic study based on 16S rRNA gene, in Southeast Europe, have grouped Hamiltonella from both B. tabaci and T. vaporariorum in the same clades. Our phylogenetic analysis based on 12 ORF's clustered Hamiltonella from NW2 populations in a different clade from MEAM1 and MED populations (Fig 1) suggesting that native populations are infected with a genetically different Hamiltonella from invasive populations, reinforcing the inexistence of endosymbiont horizontal transmission between invasive and native species.
The differences of Hamiltonella from native and exotic species extends to the GroEL gene. Our analysis revealed an amino acid deletion of two glycine and one isoleucine in the GroEL gene present only in Hamiltonella from native populations (Fig 2). Glycine is a non-essential amino acid and isoleucine is an essential amino acid for Hamiltonella [43]. The absence of these amino acids could be affecting the confirmation of GroEL on populations of Hamiltonella presenting this deletion. Thus, this deletion in Hamiltonella could imply in biological changes in populations of whiteflies harbouring this facultative endosymbiont.
It's known that facultative bacterial endosymbionts are associated with viral transmission [20,21] and different members of the B. tabaci species complex may transmit the same viruses with different efficiencies [44]. The facultative endosymbiont Hamiltonella has been identified as a key driver in the transmission of begomoviruses [24]. Hamiltonella encodes a GroEL chaperonin homologue protein that is crucial in safeguarding begomoviruses in the haemolymph [24]. Several studies have shown the interaction between the begomoviruses Tomato yellow leaf curl virus (TYLCV) coat protein (CP) and GroEL present in the haemolymph of B. tabaci [20,45]. Disturbing the association GroEL-TYLCV in vivo by feeding insects with an antibody raised against Buchnera GroEL leads to the degradation of the virus and to a markedly decrease in transmission efficiency of the virus [24,45,46].
There is still a lack of knowledge regarding the interactions among vector, symbionts and whitefly-transmitted viruses within Brazilian populations. Previous surveys conducted in Brazil have found that field populations of MEAM1, MED and NW2 analyzed frequently harbor Hamiltonella [28,41]. Therefore, it would be important to find out if there is a relationship between the diversity of H. defensa found in the current study and biological features of field populations of B. tabaci. The existence of biological and behavioral differences between the native species New World and the exotic MEAM1 have been reported already [5,47]. The New World B. tabaci are found more often colonizing weeds and wild plants [5]. In addition, previous transmission studies comparing NW2 and MEAM1 species, both harboring Hamiltonella, have found a different transmission efficiency of Brazilian whitefly-transmitted viruses between the species [47]. It was found that the begomovirus Euphorbia yellow mosaic virus (EuYMV) is transmitted more efficiently by NW2 compared to MEAM1 and that the crinivirus, Tomato chlorosis virus (ToCV) and the carlavirus, Cowpea mild mottle virus (CpMMV) are transmitted more efficiently by MEAM1 than NW2 [47]. Interestingly, each of these viruses has a different mode of transmission by the whitefly vector. The begomoviruses, EuYMV is transmitted in a persistent manner, the crinivirus, ToCV is transmitted in a semipersistent manner and the carlavirus, CpMMV is transmitted in a non-persistent mode [48,49], which is a very unusual mode of transmission among B. tabaci transmitted viruses.
The reasons for a difference in transmission efficiencies by NW2 and MEAM1 are still unknown but might be related to several factors related to the host, the viruses or other facultative endosymbiont found in the vector. However, the data found in this study of phylogenetic differences between insect symbiotic Hamiltonella from NW2 and MEAM1 added to the deletion of three amino acids present in the homologue GroEL Chaperonin protein might aid to explain the difference in the transmission efficiencies among native and exotic species present in Brazil if further studies were carried out.
The genomic data obtained in this study from the facultative endosymbionts, Hamiltonella and the complete mitochondrial of Brazilian B. tabaci populations is unprecedented and essential to serve as a reference for further studies regarding whiteflies in Brazil. The phylogenetic and amino acid analysis revealed genetic diversity between Hamiltonella from native and exotic populations that will aid for better understanding about the functions that Hamiltonella may have in whitefly biology.
Supporting information S1