Analysis of the Olive Fruit Fly Bactrocera oleae Transcriptome and Phylogenetic Classification of the Major Detoxification Gene Families

The olive fruit fly Bactrocera oleae has a unique ability to cope with olive flesh, and is the most destructive pest of olives worldwide. Its control has been largely based on the use of chemical insecticides, however, the selection of insecticide resistance against several insecticides has evolved. The study of detoxification mechanisms, which allow the olive fruit fly to defend against insecticides, and/or phytotoxins possibly present in the mesocarp, has been hampered by the lack of genomic information in this species. In the NCBI database less than 1,000 nucleotide sequences have been deposited, with less than 10 detoxification gene homologues in total. We used 454 pyrosequencing to produce, for the first time, a large transcriptome dataset for B. oleae. A total of 482,790 reads were assembled into 14,204 contigs. More than 60% of those contigs (8,630) were larger than 500 base pairs, and almost half of them matched with genes of the order of the Diptera. Analysis of the Gene Ontology (GO) distribution of unique contigs, suggests that, compared to other insects, the assembly is broadly representative for the B. oleae transcriptome. Furthermore, the transcriptome was found to contain 55 P450, 43 GST-, 15 CCE- and 18 ABC transporter-genes. Several of those detoxification genes, may putatively be involved in the ability of the olive fruit fly to deal with xenobiotics, such as plant phytotoxins and insecticides. In summary, our study has generated new data and genomic resources, which will substantially facilitate molecular studies in B. oleae, including elucidation of detoxification mechanisms of xenobiotic, as well as other important aspects of olive fruit fly biology.


Introduction
The olive fruit fly, Bactrocera oleae (Diptera: Tephritidae) is the most important pest of olive orchards worldwide. The fly lays its eggs in the olive fruit and the developing larvae tunnels through the olive, feeding on the fleshy mesocarp, a plant tissue with high content of phenolic compounds and phytotoxins. The infestation severely affects the quality (up to 80%) and value of the olive oil and cause the rejection of entire harvests of table olives which become unsuitable for consumption [1]. In Greece, 30-35% economic losses due to B. oleae have been recorded, and the annual cost for its control exceeds 2 M euros.
The control of B. oleae has been based on use of traps, sterile insect techniques (SIT) and primarily insecticides. An improved genetic method, Release of Insects containing a Dominant Lethal (RIDL), was recently developed [2], however, regulation issues restrict its wide application at present. The intense use of insecticides, as baits or cover sprays, has resulted in the selection of resistance which negatively impacts on our ability to control B. oleae [3]. The elucidation of insecticide resistance mechanisms at the molecular level, in light of the development of tools for sustainable control, has been achieved in some cases in B. oleae and closely related Tephritidae species (reviewed in [3]). Modified acetylcholinesterase (MACE) resistance mechanism has been studied extensively in B. oleae [3], and specific mutations which reduce sensitivity to organophosphate insecticides have been identified and characterized [4,5]. Target site resistance against spinosad was recently elucidated in the oriental fruit fly Bactrocera dorsalis, where truncated transcripts of nicotinic acetylcholine subunit gene Bda6 were strongly implicated in the phenotype [6]. The study of metabolic resistance, i.e. mechanisms that increased rates of insecticide detoxification and thus compromises the effective dosage of the insecticide that reaches the target, has not kept similar pace. Detoxifying enzymes (such as cytochrome P450s (P450s), carboxyl/choline esterases (CCEs), and glutathione Stransferases (GSTs) have been associated with B. oleae insecticide resistance phenotypes [3]; however, the analysis of the mechanism at the molecular level has been hampered by the lack of genomic information and the complexity of detoxification gene families and pathways involved in resistance. These metabolic enzymes might be among the major weapons of B. oleae larvae to cope with phytotoxins and phenolic compounds present in olive flesh, in line with the adaptation mechanisms operating in other insect species [7,8].
A study which investigated the interaction between B. oleae and olive was recently conducted from the plant perspective [9]. It revealed that the olive response to B. oleae larvae, its most damaging biotic stressor, resulted in the induction of genes implicated in the production of Reactive Oxygen Species (ROS), the activation of different stress response pathways and the production of compounds involved in direct defense against phytophagous larvae.
B. oleae is a strictly monophagous species and feeds, in contrast to the majority of fruit flies, on fresh instead of decaying substrates. The mechanism that B. oleae manages to overcome olive plant defense and utilize the fruit flesh has not been elucidated yet. It has been associated with symbiotic bacteria [10], which however only facilitate the development of the larvae in green olives, but not more mature ones.
To date (April 2013), only 802 B. oleae nucleotide and 876 protein sequences have been deposited in the NCBI database (http://www.ncbi.nlm.nih.gov), including less than 10 detoxification gene homologues in total. This data set is clearly not sufficient for the investigation of metabolism-based insecticide resistance mechanisms, or the study of molecular interactions between the olive and the fruit fly B. oleae. Here we report, for the first time, the use of 454-pyrosequencing technology to characterize the B. oleae (pooled stages) transcriptome and the identification and phylogenetic classification of a large number of genes potentially encoding detoxification enzymes.

FLX Titanium Sequencing and Assembly
A library of pooled life stages of B. oleae were sequenced, using 454 pyrosequencing, in a single run on a picotiterplate (PTP). This resulted in 482,790 aligned reads with an average read length of 421 nucleotides and a total of 147,882,767 bases (Table 1). These reads were assembled into 14,204 contigs. More than 60% of the contigs (8,630 contigs) were larger than 500 base pairs (bp), with a total of 8,675,718 bases. The average contig size was 1,005 bp and

Homology Searches
A total of 8,129 sequences of all contigs (65.36%) from the B. oleae transcriptome returned an above cut-off blast hit to the NCBI non-redundant protein database (Table S1). When an e-value cutoff of 1E 23 was used for blastx, 7,368 (59.24%) blast results were obtained, while when the e-value cut-off 1E 210 was used for blastn, 760 (6.12%) additional blast results were obtained. Blast statistics are presented in Figure S1. 83.88% (6,819 sequences) of the top blast hits correspond to Diptera, 5.75% (468 sequences) to other Arthropoda (except Diptera), 7.56% (615 sequences) to Fungi, 0.42% (34 sequences) to Bacteria and 2.37% (193 sequences) to other organisms ( Figure 1). From the B. oleae contigs having their best blast hit with Fungi, 317 encoded for ribosomal RNA, 296 for hypothetical proteins, one for a transposase (contig07917) and one for a mitochondrial protein (contig03600). These 615 ''fungi'' contigs were considered as contamination and excluded from further analysis.
Analyzing the blast statistics more into detail revealed that the majority of B. oleae sequences returned its best blast hit for Drosophila species (5,299 sequences, 65.18%). Out of these species, Drosophila virilis returned the majority of blast hits (751 sequences, 9.23%) followed by Drosophila willistoni (696 sequences, 8.56%) and Drosophila mojavensis (672 sequences, 8.26%). Among other Diptera, 695 (8.54%) sequences had their best hit for Glossina morsitans, 94 (1.15%) for Aedes aegypti and 55 (0.67%) for Culex quinquefasciatus. The distribution is in accordance with that obtained by transcriptome analysis of the closely related Tephritidae species, B. dorsalis [12], where approximately 80% of the genes were most closely related to Drosophila homologues. Notably, only 65 (0.79%) sequences had their best match with B. oleae sequences subjected to NCBI, reflecting the lack of genetic information for this insect species.

Gene Ontology (GO) Analysis
Gene Ontology (GO) terms were used for the functional categorization of the 5,426 predicted B. oleae proteins (38.2% of the total number of contigs). Data are shown in Table S2. In most cases more than one term was mapped to the same predicted protein. 10,098 terms for biological process categories, 3,662 for molecular function categories and 4,234 for cellular component categories were emerged. The sequences were categorized to 12 molecular function, 15 biological process and 7 cellular component categories in GO level 2 (general function categories) ( Figure 2).
The GO classification results are in line with the recently sequenced transcriptomes of B. dorsalis [12,13,14], T. vaporariorum [15] and Musca domestica [16] where binding, cell and metabolic processes were the three largest groups, suggesting that 454 pyrosequencing technology provided a comprehensive representation of the B. oleae transcriptome. Although another Tephritid species, the Mediterranean fruit fly Ceratitis capitata, may also be related to B. oleae, a comparison with that species is not possible this time, as the only published genomic dataset for C. capitata contains a small number of of ESTs derived from adult head and embryos [17], which is of limited use for annotation studies, such as the one conducted here.

Enzyme Classification and KEGG Pathway Analysis
After initial bioinformatic analysis of gene functions the potential enzymes were further characterized based on the chemical reaction they catalyze. This was achieved using the predictions of Enzyme Commission (EC) numbers for each sequence. Enzyme classification revealed that hydrolases are the largest group of B. oleae enzymes (38%, 644 enzymes), followed by transferases (30%, 519 enzymes), oxidoreductases (17%, 286 enzymes), ligases (7%, 129 enzymes), isomerases (4%, 64 enzymes) and lyases (4%, 61 enzymes). ( Figure 3A). The distribution/ relative proportion of each enzyme category is similar to that determined in the white fly T. vaporariorum [15] and B. dorsalis [14] transciptomes, with the possible exception of transferases and hydrolases, which were slightly over-and under-represented respectively ( Figure 3B). The 1,700 sequences having EC numbers were further characterized by Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis. The predicted enzymes are distributed in 122 pathways (Table S3). The best matches of KEGG mapping are presented in Table 2. Interestingly, a large number of contigs were found to be associated with drug metabolism (58 with the drug metabolism-cytochrome P450 pathway; 52 with the drug metabolism-other enzymes pathway), possibly indicating the evolution of a multi-gene system in B. oleae, which might have a role in xenobiotic/phytotoxin detoxification.

Transcripts Encoding Putative P450s
P450s are one of the largest super-families, playing a dominant role in plant-insect interactions and insecticide/xenobiotic metabolism. They are divided in 4 major clades, named CYP2, CYP3, CYP4 and mitochondrial. Eighty-eight P450s have been identified in the genome of Drosophila melanogaster [18,19], while 90 P450s were found in the transcriptome of the closely related B. dorsalis [12,14]. A total of 55 contigs were identified as P450 encoding genes in the B. oleae transcriptome (Table S4), and no allelic variants were found among P450 protein sequences encoded. Five additional P450 sequences (Table S5) were obtained by a degenerate PCR study.
The total number of P450s identified in the B. oleae transcriptome is similar to those found in the genomes of D. melanogaster and B. dorsalis but much lower than those of other dipteran species like e.g. Aedes aegypti and Culex quinquefasciatus having 160 and 170 P450 genes respectively [12,14,19,20,21] (Table 3). However, it is possible that additional P450 genes may await discovery in B. oleae genome. Notably, only one P450 sequence of B. oleae was available in the NCBI database prior to the present study.
Based on phylogenetic analysis with other known insect P450s or the identification of closest blastp hits in the NCBI nr database  Figure 4). Members of the CYP3 and CYP4 clades in other insect species are most commonly involved in environmental response/detoxifying functions against xeniobiotics and phytotoxins in other  [14], M. domestica [16] and T. vaporariorum [15]. doi:10.1371/journal.pone.0066533.g003 insects [19,22], and the over-representation of members of those clades possibly indicates an enhanced defense mechanism present in B. oleae against such compounds. Orthologues of major insecticide detoxification genes in other species, such as the Bemisia tabaci CYP6CM1, the first P450 from an agricultural pest that was demonstrated that is capable to detoxify neonicotinoid insecticides [23], and the D. melanogaster CYP4G1, an insect-specific P450 oxidative decarbonylase for cuticular hydrocarbon biosynthesis [24] were among the list of genes that were identified in this study (Figure 4).
To identify P450 genes of B. oleae undergoing positive selection, and thus possibly playing a role switching from feeding on decaying substrates to fresh ones, a dN/dS (v) analysis in B. oleae/ B. dorsalis ortholog pairs was performed (File S1).

Transcripts Encoding Putative GSTs
The GST super-family has also been involved in the resistance to phytotoxins and insecticides (reviewed [7]). Insect GSTs belong to seven classes, named delta (d), epsilon (e), omega (v), sigma (s), theta (h), zeta (f) and microsomal GSTs. Thirty seven GSTs have been identified in D. melanogaster (reviewed in [25]) while 42 putative GSTs have been identified in the transcriptome of B. dorsalis [14]. A total of 43 contigs encoding GSTs were identified in the B. oleae transcriptome (Table S6). Three genes with amino-acid homology 99% at the protein level and two additional ones with identical sequences at the amino-acid level but several silent SNPs, which might represent putative allelic variants were found among these sequences: contig07254, contig08723, contig09900 are allelic variants of contig07812, contig08608 and contig09901, respectively, and contigs 09162 and 09177 are identical.
Based on phylogenetic analysis with other known insect GSTs or the identification of closest blastp hits in the NCBI nr protein database in the case of misaligning protein sequences, B. oleae GSTs were assigned to the delta, epsilon, omega, sigma, theta zeta and microsomal classes: out of 39 unique GSTs, 8 belong to the delta class, 12 to the epsilon class, 3 to omega, 1 to sigma, 4 to theta, 3 to zeta and 6 to microsomal class ( Figure 5). The remaining 2 GSTs are described as delta/epsilon since they could not be assigned particularly to delta or epsilon GST class. A comparative summary of the cytosolic GSTs identified in B. oleae transcriptome versus those identified in other insect species is presented in Table 4. The number and distribution of cytosolic GSTs within classes in B. oleae is similar to that of other Diptera, such as D. melanogaster (reviewed in [25]) and A. gambiae [26], with the possible exception of epsilon GSTs which are overrepresented in B. oleae ( Table 4). The delta and epsilon GST classes are unique in insect species and seem to be implicated in xenobiotic detoxification [20]. For example, GSTE2 of A. gambiae (Agam_gi12007373 in Figure 5), a glutathione transferase with DDTase activity, is responsible for conferring DDT resistance in Anopheles gambiae [27]. More than half (22 out of 43) of GSTs identified in the transcriptome of B. oleae belong to delta and epsilon classes, which might indicate an enhanced potential for xenobiotic metabolism.
To identify GST encoding genes of B. oleae undergoing positive selection, and thus possibly playing a role switching from feeding on decaying substrates to fresh ones, a dN/dS (v) analysis in B. oleae/B. dorsalis ortholog pairs was performed (File S1).

Transcripts Encoding Putative Carboxylesterases (CCEs)
CCEs have been shown to be involved in the detoxification of insecticides as well as the metabolism of plant derived allelochemicals (reviewed in [7]). The CCEs can be divided into 13 clades [20,36], including acetylcholinesterases (AChE). These clades can in turn be organized into 3 classes, i.e. the dietary detoxification enzymes (clades A-C), the generally secreted enzymes (clades D-G) and the neurodevelopmental CCEs (clades I-M). Thirty-five CCEs have been identified in the genome of D. melanogaster (reviewed in [28]) while recently, 38 putative CCEs have been identified in the transcriptome of B. dorsalis [14]. A total of 15 contigs putatively encoding CCEs were identified in B. oleae transcriptome and no allelic variants were found among these sequences (Table S7).
Based on phylogenetic analysis with other known insect CCEs or the identification of closest blastp hits in the NCBI nr database in the case of misaligning or short CCE protein sequences, CCEs were assigned to respective clades and classes. Representatives of dipteran microsomal a-esterases (C clade), integument esterases (D clade), b-esterases and pheromone esterases (E clade) and glutactins and glutactin-like enzymes (H clade) were found in this dataset. Out of the 15 identified CCEs, 7 belong to the C clade, 2 to D clade, 1 to E clade and 2 to H clade ( Figure 6). The remaining 3 CCEs could not be assigned to any particular CCE clade (Table S7). Comparative analysis (Table 5) with CCEs from other known insect species shows that the number of identified CCEs is considerably less than those from other insects. However, the majority of them (7 out of 15) are assigned to the dietary class, which might indicate a possible association of this gene superfamily with the ability of olive fly to cope with substances present in the olive flesh.
To identify CCE enconding genes of B. oleae undergoing positive selection, and thus possibly playing a role switching from feeding on decaying substrates to fresh ones, a dN/dS (v) analysis in B. oleae/B. dorsalis ortholog pairs was performed (File S1).

Transcripts Encoding Putative ABC Transporters
The ATP-binding cassete (ABC) transporter superfamily is considered to play a major role in the ability of insects to cope with xenobiotics [29]. Fifty six ABC transporters have been identified in D. melanogaster (reviewed in [30]). A total of 18 contigs encoding ABCs were identified in the B. oleae transcriptome (Table S8). No allelic variants were found among those sequences. Based on phylogenetic analysis or, in the case where the nucleotide binding *numbers were derived from [14], [15], [40], [41] and this study. doi:10.1371/journal.pone.0066533.t004 domain (NBD) was missing, based on the closest blastp hits in the NCBI nr database, ABC transporters were assigned to different subfamilies. The number of identified ABC transporters is less than those from other insects [29]. Out of 18 B. oleae ABCs, four belong to the B subfamily of which only two are full transporters (those clustering with hsABCB1) (Figure S2), two to the C subfamily and four to the G subfamily. Interestingly, these families are believed to be the most relevant to xenobiotic detoxification [29] and are about half of the ABC transporters identified in this study. Furthermore, three ABC transporters belong to the A-family, one to the D subfamily, one to the E subfamily and three to the F subfamily ( Figure S3 and S4). A comparative summary of the ABCs identified in B. oleae transcriptome versus those identified in other insect species is presented in Table 6.

Conclusions
We have generated, using 454 pyrosequencing technology, a B. oleae transcriptome dataset containing 14,204 contigs. This dataset represents a very significant expansion of the number of cDNA sequences currently available for B. oleae. Although the function of the majority of the assembled sequences is unknown, it is likely that new ongoing projects will facilitate their future annotation and our efforts to understand their role in the physiology and fundamental biology of the olive fruit fly.
We have identified and phylogenetically classified at least 132 putative major detoxification genes (60 P450s; 39 GSTs; 15 CCEs; 18 ABC transporters) involved in the metabolism of xenobiotics, such as plant phytotoxins and insecticides.
These new data and genomic resources developed in this study for B. oleae will be useful to the community studying this significant crop pest and will substantially facilitate molecular studies of underlying mechanisms involved in insecticide resistance and adaptation of B. oleae larvae in olive fruits as well as other important aspects of olive fruit fly biology. As our understanding of the regulation of detoxification mechanisms increase, new strategies should be devised for the development of more efficient, eco-friendly and species-specific strategies for pest control.

Insect Sample and mRNA Isolation
In order to obtain a large and broad transcriptome data set, RNA was extracted from a pool of different life stages of B. oleae, including mixed lab strains (Vontas et al 2002) and field caught insects (collected from Herakleion, Crete in 2011-2012) fed on artificial diet and olives (equal representation in each stage), with a proportion: 10 eggs; 4 instar larva, 3 pupae; 4 adults (two males and two females, 4 and 20 day old). This pooled sample was snap frozen in liquid nitrogen and used for mRNA isolation, using an mRNA-Only Eukaryotic mRNA isolation kit (Epicente, USA).
The study was carried out on private land of University of Crete and the owner of the land gave permission to conduct the study on this site and collect insects from olive trees present in this land. No specific permissions were required for these locations/activities, since the study only involved collection of a few insects for sequencing analysis, from abandoned olive fruit fly populations widespread in Crete. The field studies did not involve endangered or protected species.
cDNA Library Preparation, Sequencing and Assembly cDNA synthesis and amplification was performed using Mint-Universal cDNA Synthesis kit (Evrogen, Russia) and 1 mg of B. oleae mRNA. About 800 ng of amplified cDNA were used as starting material in the normalization reaction using the Trimmer kit (Evrogen, Russia). Normalized material was re-amplified for 18 cycles and subsequently digested with 10 Units SfiI for 2 hours at 48uC. Fragments larger than 800 bp were isolated from a Low Melting Point agarose gel and purified using the MinElute Gel Extraction kit (Qiagen, Germany). Purified cDNA fragments (200 ng) were ligated into 100 ng of a dephoshorylated pDNR-lib Vector, pre-digested with SfiI (Clontech, USA) using the Fast Ligation kit (New England Biolabs, USA). Ligations were desalted by ethanol precipitation and re-dissolved in 10 ml water. Threefold desalted ligation was used to transform NEB10b competent cells (New England Biolabs, USA).
Roughly a million clones were plated on LB-Cm plates, scrapped off the plates and stored as glycerol stocks at 270uC. One half of the cells was used to inoculate a 300 ml Terrific Broth/Cm culture, which was grown for 5 hours at 30uC. Plasmid DNA was prepared using standard methods (Qiagen, Germany). Purified plasmid DNA (200 mg) was digested with 100 Units SfiI for 2 hours at 48uC. cDNA inserts were gel-purified (LMP-Agarose/MinElute Extraction kit) and ligated to high-molecularweight DNA using a proprietary SfiI-linker. *numbers were derived from [14], [15], [40], [41] and this study. **for full CCE clade names, see legend of Figure 6. doi:10.1371/journal.pone.0066533.t005 Library generation for the 454 FLX sequencing was carried out according to the manufacturer's standard protocols (Roche/ 454 life sciences, USA). In short, the concatenated inserts were sheared randomly by nebulization to fragments ranging in size from 400 to 900 bp. These fragments were end polished and the 454 A and B adaptors that are required for the emulsion PCR and sequencing were added to the ends of the fragments by ligation. The resulting fragment library was sequenced on a picotiterplate (PTP) on the GS FLX using the Roche/454 Titanium chemistry (LCG Genomics, Berlin, Germany). Prior to assembly, the sequence reads were screened for the SfiIlinker that was used for the concatenation, the linker sequences were clipped out of the reads and the clipped reads assembled to individual transcripts using the Roche/454 Newbler software at default settings (454 Life Sciences Corporation, Software Release: 2.6). Raw sequence data and the transcriptome assembly were submitted under the BioProject accession number PRJNA195424 in the NCBI-database.

Degenerate PCR
A degenerate PCR strategy for insect P450s [31] had been initiated prior to the transcriptomic study, to amplify orthologous regions from B. oleae. cDNA synthesis was carried out using the SuperScript III protocol and the PCR products were isolated, subcloned into pGEMT-easy vector (Promega) and sequenced in both directions. Obtained B. oleae P450 partial sequences were deposited in the NCBI-database (GenBank accession numbers: KC917331, KC917335, KC917340, KC917344, KC917345).

Blast Homology Searches and Sequence Annotation
For blast homology searches and sequence annotation Blas-t2GO software v.2.6.0 [32] was used, as previously described for the analysis of Manduca sexta and Trialeurodes vaporariorum transcriptomes [15,33]. For homology searches, all B. oleae contigs larger than 200 bp were searched using blastx against the NCBI non-redundant (nr) protein database, using an e-value cut-off of 1E 23 . The sequences that did not give any blastx hit were subsequently searched using blastn against the NCBI nr nucleotide database using an e-value cut-off of 1E 210 .
For gene ontology (GO) mapping, Blast2GO recovers all the GO terms associated to the hits obtained by the blast search. After the mapping step, results were subjected to GO annotation, a process of selecting GO terms from the GO pool and assigning them to the query sequences. The sequences were further annotated using InterPro. The functionality of InterPro annotation in Blast2GO allows retrieving domain/motif information in a sequence-wide manner. GO terms corresponding to these Interpro-domains, were then transferred to the sequences and merged with already existent GO terms. GO terms were modulated using the annotation augmentation tool ANNEX [34], followed by GOSlim. As described in the Blast2GO tutorial ''the ANNEX approach uses uni-vocal relationships between GO terms from different GO categories to add implicit annotation and GOSlim is a reduced version of the GO that contains a selected number of relevant nodes''. In this study, the generic GOSlim mapping term was used.
Finally, Enzyme classification (EC) codes and the Kyoto Encyclopedia of Genes and Genomes (KEGG) metabolic pathway annotations were obtained through the direct mapping of GO terms to the corresponding enzyme codes.

Analysis of Genes Related to Xenobiotic Detoxification
B. oleae contigs (.200 bp) encoding P450s, GSTs, CCEs and ABC transporters were identified using tblastn (E-value cutoff ,1E 23 ) and Drosophila melanogaster and Bactrocera dorsalis protein sequences of P450S, GSTs, CCEs and ABCs as query (D. melanogaster sequences were downloaded from GenBank and B. dorsalis sequences were taken from [12]). Each contig encoding a detoxification gene was manually curated for frame shifts and sequencing errors. Contigs were translated and trimmed using BioEdit version 7.1.9 and searched by blastx (E-value ,1E 23 ) against all the assembled contigs. Results with more than 99% similarity were considered as allelic variants.
Muscle 3.8.31 [35] was used to perform multiple sequence alignment of B. oleae P450, GST, CCE or ABC protein sequences with a representative dataset of their counterparts in other species (File S2). Unfortunately, not many detoxification genes of the closely related B. dorsalis could be included in our alignment as at the time of writing mainly raw sequence data (Sequence Read Archives) of this species was deposited in the NCBI database [12,13,14].
For each detoxification family, only protein sequences showing no misalignment were used in the final alignment for phylogenetic analysis. Since N-and C-termini of CCEs are variable, all CCE protein sequences were trimmed as previously described in [36]. Only the nucleotide binding domains (NBDs) of the ABC transporters were used for phylogenetic analysis. NBDs were extracted using the ScanProsite facility (http://expasy.org/tools/ scanprosite) in combination with the PROSITE profile PS5089. When an ABC protein sequence contained 2 NBDs, the Nterminus NBD was selected for further analysis.
Model selection was performed with ProtTest 2.4 [37] and the optimum model for phylogenetic analysis was selected according to Akaike information criterion (P450s: LG+G+F, GSTs: LG+I+G, CCEs: LG+I+G, ABC B: LG+I+G, ABC D: LG+G, ABC EF: LG+I+G). A maximum likelihood analysis was performed using Treefinder (version of March 2011, [38]) and a bootstrap analysis with 1000 pseudoreplicates (LR-ELW) was performed to evaluate the branch strength of each tree. The resulting tree was midpoint rooted and edited with MEGA 5.0 software [39].