Accidental Genetic Engineers: Horizontal Sequence Transfer from Parasitoid Wasps to Their Lepidopteran Hosts

We show here that 105 regions in two Lepidoptera genomes appear to derive from horizontally transferred wasp DNA. We experimentally verified the presence of two of these sequences in a diverse set of silkworm (Bombyx mori) genomes. We hypothesize that these horizontal transfers are made possible by the unusual strategy many parasitoid wasps employ of injecting hosts with endosymbiotic polydnaviruses to minimize the host's defense response. Because these virus-like particles deliver wasp DNA to the cells of the host, there has been much interest in whether genetic information can be permanently transferred from the wasp to the host. Two transferred sequences code for a BEN domain, known to be associated with polydnaviruses and transcriptional regulation. These findings represent the first documented cases of horizontal transfer of genes between two organisms by a polydnavirus. This presents an interesting evolutionary paradigm in which host species can acquire new sequences from parasitoid wasps that attack them. Hymenoptera and Lepidoptera diverged ∼300 MYA, making this type of event a source of novel sequences for recipient species. Unlike many other cases of horizontal transfer between two eukaryote species, these sequence transfers can be explained without the need to invoke the sequences ‘hitchhiking’ on a third organism (e.g. retrovirus) capable of independent reproduction. The cellular machinery necessary for the transfer is contained entirely in the wasp genome. The work presented here is the first such discovery of what is likely to be a broader phenomenon among species affected by these wasps.


Introduction
The transfer of genetic information between different species, horizontal gene transfer (HGT), is an unexpected and transformational discovery resulting from the expansion of whole genome sequence data. Examples of HGT violate the traditional idea that shared sequences among organisms come from a shared ancestral species. HGT can be a source of novel sequences that organisms would otherwise not have access to [1] such as a fungal carotenoid gene that gave pea aphids a bright orange coloration [2] or antifreeze genes transferred between very distant fish species [3]. Not all horizontal transfers are clearly adaptive. There are a number of cases where transposons have been transferred between parasite species and their host species [4,5,6,7,8].
Nevertheless, HGT is rarely reported among multicellular eukaryotes, presumably due to a combination of the poor efficiency of foreign sequences entering the germline and the difficulties of detecting HGT events. Viruses, however, can enter cells and facilitate HGT. Many fragments of virus genomes have been found in eukaryotes [9], and these endogenous viral elements provide a rich source of information on the evolution and history of viruses. Here we present horizontally transferred sequences (HTS) from polydnaviruses which were created by a process similar to the process by which endogenous viral elements are formed. A key difference between EVEs and polydnaviruses is that instead of transferring viral sequences, PDVs transfer genetic material from specific portions of the wasp genome.
Polydnaviruses (PDVs) are an unusual group of virus-like particles named after their dsDNA genomes that exist in many circular fragments [10]. PDVs are utilized alongside or instead of venom to aid in parasitoid reproduction by weakening the secondary host's immune system so that the parasitoid eggs can develop. Unlike typical viruses, PDVs are incapable of selfreplication and instead are endosymbiotic viruses of two clades of parasitic wasps, Braconids (bracoviruses) and Ichneumonids (ichnoviruses). These two wasp clades have stably integrated viral sequences into their genomes and inherit them vertically [11,12,13,14]. Viral particle production occurs exclusively in the ovaries of these parasitoid wasps [15,16].
Specific portions of the wasp's genome are packaged [12] into the virus-like particles (see Figure 1), but genes encoding viral functions are not included, leaving PDV particles without the molecular machinery to replicate in the secondary host. An important evolutionary consequence of the mechanism of PDV production is that the propagation of PDVs depends exclusively on survival of the wasp eggs growing in the secondary host. As a    result, suppression of secondary host immunity is crucial to the survival of the wasp/virus mutualism. PDV genomes encode a number of mechanisms to suppress secondary host immunity, such as controlling gene expression with modified histones [17], interfering with the haemocyte cytoskeleton [18], and inhibiting phagocytosis [19]. In addition to suppressing the host immune system, PDV genes can also function to shift resources from secondary host larval development to growing wasp larvae [20]. A number of cystatins in Cotesia bracoviruses are thought to suppress the development and immune response of the secondary host [21]. These cystatin genes have been under recent positive selection, consistent with classical host/parasite evolution [22].
Despite the many mechanisms PDVs employ to attack the host, these parasitoid attacks vary widely in lethality depending on their timing. Ultimately, a secondary host must survive a PDV attack and reproduce for a horizontally transferred sequence to be seen in the population. In one study, infection by a bracovirus quickly after the start of the fifth instar (the final stage of development before metamorphosis in the species) resulted in a 100% lethality rate, while infection 60 hours post-ecdysis (the shedding of the old cuticle) had 0% lethality and generally had little effect on the larvae [23]. These experiments illustrate the possibility of secondary host survival with a foreign sequence after PDV infection.
PDVs have now been sequenced from several wasps, [24,25] including many related braconid wasps in the Cotesia family [26,27,28]. Analysis of Cotesia relatives reveals that many of the genes packaged into PDVs are part of large gene families. These genes are eukaryotic in origin and some of these gene families only appear to be found in wasps [28]. In addition to positive selection acting on coding sequences, these gene families also appear to be subject to birth-death evolution typical of large gene families under strong or quickly changing selection [29].
Bracoviruses and ichnoviruses arose independently from different viruses early in the evolution of Braconid and Ichneumonid wasps [30]. Braconid wasps originated around 70-100 MYA [31,32,33] when a nudivirus stably integrated into the germ line of an ancestral braconid wasp [34,35]. The origin of ichnoviruses is less clear. Reflecting their different origins, the two groups have different viral gene content and structure. Ichnoviruses package a suite of dsDNA circles in each virion particle, while bracoviruses only include one (of many) circles per virion particle [36]. This means different bracovirus particles from the same wasp have different contents (see Figure 1) and that it is possible for a wasp to inject more of one dsDNA circle than another [37].
The fact that PDVs routinely transfer wasp DNA to the secondary host has prompted researchers to investigate how long this DNA persists and whether it can permanently integrate into the secondary host [38,39,40]. Bracovirus DNA circles from the wasp Cotesia congregata were shown to persist for at least 6 days throughout the living body of the secondary host, Manduca sexta [41]. Additionally, Lepidopteran cell cultures infected with PDVs show stable integration into chromosomes and expression of both bracoviruses [42] and ichnoviruses [43]. Interestingly, both studies found biased integration favoring some PDV segments over others. What has been Table 1. Cont. absent until now is a demonstration that this type of integration can occur outside of laboratory cell cultures, in living organisms.

Results
The initial screen of this study was an extensive bioinformatic search for polydnavirus sequences in eukaryotic genomes. In our initial screen there were 18,797 hits in total before any filters were applied. Within these, 99.44% of our tblastx matches were found in invertebrates even though the invertebrate genomes were only 65/165 (39.4%) of the genome assemblies queried (and an even smaller fraction of the sequence space searched). In the remaining 0.56% of tblastx matches there were 101 vertebrate matches and 11 plant matches. The invertebrate matches had an average tblastx score of 116.4 (e-values 1E-06 to 1E-200) while the noninvertebrates had an average match score of 58.3(e-values 3E-07 to 5E-024). As a result of being poor matches, 88.4% of the noninvertebrate matches were removed by the initial length (67 amino acids) and tblastx score (minimum 70) thresholds alone, with the rest being removed by later filters. In contrast, 31.3% of invertebrate hits were filtered out by those thresholds.
After applying a series of quality filters (see methods), our bioinformatic search revealed many candidate PDV-derived horizontally transferred sequences (HTS) in two Lepidopteran species (Table 1). This is consistent with the natural host range of these wasps, which is heavily enriched for targeting Lepidoptera larvae. These sequence similarities violate the known insect phylogeny and suggest horizontal gene transfer.
To test whether these matches could be explained by unknown insect genes giving a false signal resembling horizontal gene transfer, several analyses were performed. The candidate HTS Figure 1. Illustration of bracovirus production and integration. In part 1 specific portions of the wasp genome are targeted, replicated, and circularized into bracovirus DNA circles. These circles are produced in the wasp ovary then packaged in a viral capsid and lipid membrane. In Part 2 the bracoviruses are injected into the secondary host. In part 3 the bracovirus enters the secondary host cell, shedding the capsid. This typically occurs in haemocytes, with the proteins produced by the transferred DNA destroying or disabling the haemocyte's ability to defend the secondary host against wasp larvae. Occasionally, bracovirus DNA will integrate into the secondary host nuclear DNA as shown in part 4. If steps 3 and 4 occur in a germ line cell the bracovirus DNA may be passed on to the secondary host's offspring. Subsequent genetic drift or selection can result in the HTS becoming fixed in the population. doi:10.1371/journal.pone.0109446.g001 were used as queries in a search against a combined database with the same 165 eukaryotic genomes, the 387 PDV sequences, and NCBI's non-redundant viral sequence list(see methods). In all cases the candidates matched more closely to PDVs. The same search was performed by web-BLAST against NCBI's nonredundant database and all the candidates matched best to PDVs.
In an additional analysis, each of the remaining HTS candidates were searched against the set of 165 eukaryotic genomes with the goal of finding other sequences related to the HTS. In every case, no new matches were found. This indicates that the only sequences with significant sequence similarity to the HTS candidates in the two Lepidopterans are wasp PDV sequences. Thus, our reported 105 HTS are each only found in two places: the Lepidopteran genome assembly (either Bombyx mori or Danaus plexippus) and the PDV assembly from a braconid wasp.
This distant distribution is quite unusual. Sequences with a high relatedness between Lepidoptera and Hymenoptera typically come from highly conserved genes with a broad distribution among insects. In an additional analysis, we identified 2,631genes from the wasp Nasonia vitripennis with homology to genes in Bombyx mori. 300 of these were selected at random to perform a phylogenetic analysis examining how frequently these genes could be found in other insect species. We found that these random Nasonia genes are present in 96-99% of insect assemblies ( Figure 2). This is dramatically different from the sequences examined in this manuscript, which are only found in a single species (plus the donor wasp). We also measured the degree of nucleotide conservation for each gene between the wasp copy and the copy in the other insect. The tblastx match regions between Bombyx and Nasonia for these 2,631 genes had median length 537 nucleotides (range 183-2745) and median 71.3% nucleotide identity (range 58.5-87.3). We found a lower degree of conservation amongst these randomly selected wasp genes (median 71.3%) than in our HTS candidates (median 86.95%). The 105 HTS candidates were found to have significantly different percent identities than the 2,631 conserved Nasonia genes found in Bombyx using the Mann-Whitney u-test (p,2.3E-09). All 300 genes had a broader phylogenetic distribution than any of the PDV matches. The combination of the candidate HTS only being found in one very distant species and having a higher nucleotide identity than conserved genes is indicative of these candidate sequences being horizontally transferred. None of the HTS found in Bombyx mori had any homology to HTS in Danaus plexippus and vice versa, indicating separate integration events.
The HTS were sorted into homology groups, with a member being added to a group if it had 90% similarity over 90% of its length to another member of the group (HTS are listed with groups in Table 1). Some of these HTS exist as a single copy while others appear to be part of such groups (Figure 3). Homology groups containing more than one sequence were aligned by DIALIGN [44] and placed onto trees using PHYML [45] ( Figure  S1). Those that appear together in a homology group are the result of an unknown mix of two factors: repeated integration of the same PDV locus (circle) into the secondary host and subsequent duplication of the sequences in the secondary host. Many repeated integrations of the same sequence are expected from previous cell culture studies [42,43]. It is possible that there are mis-assemblies in the genome sequences that affect the apparent copy number of these sequences. This could increase or decrease the copy number by splitting alleles into paralogs or vice versa. For these reasons it is Figure 2. Representation of highly conserved sequences in insect genomes. Phylogenetic tree generated from sequence data using PHYML. Coding regions from Nasonia vittripennis were used as tblastx queries against the Bombyx mori genome assembly to identify highly conserved regions. 300 conserved regions were selected at random and searched for in a broad range of insect genome assemblies using tblastx. In each insect species, 290-299 of these highly conserved regions were found (a rate of 96-99%). Missing sequences are an unknown combination of genomic deletions and incomplete assemblies. With a divergence time of ,300 MYA between Bombyx and Nasonia, there has been sufficient time for genomic deletions to occur. Branch lengths are based on amino acid divergence between species. doi:10.1371/journal.pone.0109446.g002 difficult to know exactly how many copies exist in the destination genomes, though it must be at least one copy to show up in the assembly. Because the destination genome assemblies do not have chromosome information, only small contigs, there is no information regarding whether the transferred sequences preferentially integrate into certain portions of the destination genome.
Three PDV queries (NC_006658.1, HQ009558.1, EF067323.1) were found to contain significant matches to transposable elements (a gypsy, helitron, and Jockey respectively). Hits matching these sequences were discarded for clarity and focus in this manuscript. However, it is worth noting that if the central hypothesis of this paper is correct (that PDVs will tend to transfer genomic sequences from parasitoid wasps to their host species) it would predict that transposable elements like these would frequently make the transfer from wasp to host along with their surrounding genomic DNA. If a transposable element is targeted for packaging into a PDV, it will be among a small portion of the genome that is heavily enriched, copied in large numbers, and transferred to the host species making PDVs a strong vector for spreading of transposable elements. Once transferred to the host species, these could expand in number and spread throughout the new genome.
To gain insight into the function of ORFs in candidate HTS, we searched the translated protein sequence of the HTS against the PfamA and PfamB databases (http://pfam.sanger.ac.uk/search) using a highly permissive e-value (1e-03). Two sequences (PDV94 and PDV96) were found to contain the BEN domain, which is known to be found in PDVs and is associated with growth arrest and transcriptional regulation. PDV94 also matches to the protein family Pfam-B_3333 (function unknown). The function of ORFs in other candidate HTS is unknown.
Based on previous work, we expect that PDV loci transferred to secondary hosts were under recurrent positive selection before being transferred [25,46]. To test for positive selection in coding regions, the HTS were aligned by codon to the matching PDV sequence and then analyzed for synonymous and non-synonymous changes by maximum likelihood using PAML [47]. The results indicate that many of the sequences have a dn/ds.1, suggesting that they have undergone recent positive selection (Table 1). A dn/ds ,1 generally indicates that a sequence is conserved, though it does not rule out the possibility of positive selection on a small number of residues in the protein. It is possible that the positive selection we observed occurred prior to transfer [25,46], however it is also possible that some of these sequences were co-opted and selected upon after transfer to the secondary host as well.
To rule out the possibility of an apparent HTS being the result of genome assembly error, the presence of HTS were tested by PCR directly from Bombyx genomic DNA with DNA from other insects used as a control (Figure 4). Two representative HTS (PDV32 and PDV101) from Bombyx mori were arbitrarily selected: one larger sequence that appears a single time and one smaller sequence that appears many times in the assembly. Both HTS successfully amplified from all Bombyx strains tested and did not amplify in control species, confirming that these sequences are not artifacts of the Bombyx sequencing project. Additional PCR amplifications were performed on the same two HTS with alternate primers and similar results obtained and PCR amplifications of HTS100 and HTS99 were performed, amplifying in all strains tested (see Figure S2).

Discussion
While previous work has shown a strong trend for HGT between parasitic species and host species [4,5,6,7,8], the findings presented here are the first known instances of HGT between a parasitoid species and host species. Most parasites typically do not kill the host they feed upon, making them a passive vector for horizontal transfer if there happens to be a relevant infection by a vector species (e.g. retrovirus). Unlike parasites, successful parasitoids ultimately kill their host, meaning the secondary host must also defeat the parasitoid infection and survive to reproduce for HGT to occur. Parasitoid/PDV attacks vary widely in lethality. Attacks earlier in development can have a 100% lethality rate while attacks as little as 60 hours later can have 0% lethality and few effects [23]. Based on this, it seems likely that the ancestral infections discussed in this manuscript occurred in larvae that were at a later stage of development. This window of time in which PDV infections can occur but have little effect on the host provides a natural avenue for HGT.
Those parasitoids that utilize polydnaviruses likely increase the chance of horizontal transfer by many orders of magnitude by enriching a small portion of their genome and injecting it directly into the host in a specialized viral vector. Combined with the variable lethality of PDV infections, this gives a plausible range of natural circumstances in which horizontal transfer can occur. Previous work has shown this potential for PDVs to facilitate HGT in somatic insect cell lines, but never in germ line cells or living organisms in natural populations. Our findings represent the first documented cases of HGT using a polydnavirus vector in a live organism.
Though the mechanisms by which PDVs integrate into live organisms are poorly understood, this HGT raises the possibility of experimentally manipulating the system. These PDVs could be used as gene delivery vectors for the many plant [48,49] and animal species these viruses naturally affect, and possibly others. The targeting and efficiency of gene transfer using PDVs could be improved, making PDVs a potentially useful vector that will naturally self-terminate due to a lack of reproductive capability. If PDVs could be produced from wasp ovary cell cultures, this targeted gene transfer could be achieved using artificial injections without rearing wasps or infecting hosts with wasp larvae.
One interesting feature of these PDV producing wasps with respect to HGT is that, unlike many other instances of HGT from one multicellular eukaryote to another, there is no independently reproducing third species required to act as an intermediary. In many other instances of eukaryote HGT, this role is played by an organism (e.g. retrovirus) which is free living and then randomly acquires and carries the sequence from the donor species for a time, often replicating it as a part of its genome, until infecting the recipient species and transferring the sequence to the recipient germ line. HGT facilitated by PDVs is quite different. In this case everything needed to complete the HGT is contained within the wasp/host system. On the donor side, the sequences targeted for packaging in PDVs is non-random: a small subset of the wasp genome is replicated many times and packed into the PDV vector. On the recipient side, the host is a species specifically targeted by the wasp (and not an undirected infection like in the case of other vectors).
These differences predict several effects we expect to see when HGT is facilitated by PDVs. One is that there should be an increased number of horizontal transfers going from wasp to host. Another prediction is that the transferred regions should be regions targeted for PDV packaging. Because PDVs cannot replicate independently, another prediction is that these transferred sequences would not be expected to be found in species that are not targeted by these wasps. In the case of other vectors, the same virus that infects the recipient's germ line could continue reproducing and infect many other organisms (and species), producing a different, more dispersed pattern of HGT than one expects with PDVs.
This unusual genetic phenomenon creates an evolutionary paradigm in which a host species can acquire genetic information from a parasitoid species that attacks it. Previous work has discussed the acquisition of sequences from viral pathogens [50,51,52,53] however discussion of the acquisition of sequences from eukaryotic parasitoids has been absent up to this point. As with mutations or other horizontally transferred sequences, these sequences have the potential to be adaptive, maladaptive, or neutral for the host (probably more often neutral/maladaptive). Maladaptive sequences are typically removed over time by selection, so sequences found in extant organisms are likely depleted for maladaptive sequences. Occasionally, transferred sequences could be co-opted and adapted by the host in a number of ways. The chance of a sequence being co-opted would be expected to increase with the number and diversity of transferred sequences. The ways that the sequences could be adapted to the new host include controlling the host's gene regulation, altering its physiology, or using those sequences as a direct countermeasure against wasp parasitization to fight future PDV infections. For example, the secondary host could use the transferred sequences to produce anti-sense RNA to mark transcribed PDV mRNAs for degradation by normal processes that affect double stranded RNA. Another possibility is that the secondary host could co-opt protein interacting domains taken from the wasp to bind and block proteins produced by the virus. The secondary host could also find completely novel uses for the sequences, such as regulating its growth or immune system. Since it is estimated that there are hundreds of thousands of PDV producing wasp species (and their corresponding host species, ranging from insects to plants), our findings are likely a first look at a large and unusual phenomenon. This sort of horizontal transfer is likely to be discovered more frequently as more eukaryote species and PDVs are sequenced.
Tracking PDV HTS in secondary hosts can be a useful tool for evolutionary studies. It should be possible to deduce parasitoid/ host relationships, potentially even if there is no known parasitoid affecting the secondary host species. This is the case with the data presented here, which predict a parasitoid which affected Bombyx mori (or an ancestor species) though none is currently known. Given the moderate sequence divergence observed between the HTS found in Bombyx/Danaus and the donor PDVs, this suggests that the transfers were from some ancestors of Cotesia wasps to some ancestors of Bombyx/Danaus and may not involve any presently extant species. Evolutionary analysis of PDV HTS is applicable not just to present parasitoids but also parasitoids from the past, opening up the exciting possibility of tracking relationships that involve extinct species.

BLAST searches
The initial screen of this study was an extensive tblastx search of 387 PDV sequences against 165 eukaryote genomes including 75 vertebrate species, 25 plant species, and 65 invertebrate species. Among the invertebrates, there are 48 insect species including 15 Hymenoptera, 5 Lepidoptera, and 24 Diptera. A complete list of PDV queries and eukaryotic whole genome assemblies can be found in Table S1.
PDV sequences contain an unknown mix of coding sequence, regulatory sequence, introns, intergenic regions, and repeat content. When a PDV is naively searched against whole genome sequences, any of these features could match to the whole genome sequence. As a result, matches could appear in species that are not known to be parasitized by PDV wasps (e.g. mammals). These could be caused by simple repeat sequences or by proteins common to the larger group (e.g. metazoans). Tblastx was used because it is highly sensitive at detecting evolutionarily divergent proteins with different amino acids but conserved function (e.g. substituting a non-polar amino acid for a different non-polar amino acid) using the BLOSUM62 matrix. Sequences from this screen were kept if they had all the following: an e-value below 1.0e-10, a BLAST score greater than 70, and a match length of at least 67 amino acids. Remaining hits found on the same contig nearby one another (,5 kb) were merged into a single hit.
Using these ORF containing protein matches as initial regions, we expanded our investigation to regions surrounding these open reading frames to get a more complete understanding of how much sequence was transferred. The protein match from the PDV sequence plus 5 kb of flanking DNA in both directions were searched against the destination genomes for a strong match in DNA sequence. Portions of these ORF/extension regions that match to the destination genome could indicate a large stretch of DNA transferred by PDV to the destination genome. Sequences in this search were kept if they had the following: a match length of at least 200 nucleotides and a percent identity of at least 80%. The remaining sequences were our initial pool of potentially horizontally transferred sequences.

Removing repeats
Sequences matching a PDV sequence found to contain a transposon (NC_006658.1, HQ009558.1, EF067323.1) were removed. While we expect transposons to be transferred from PDVs to destination genomes just as a coding sequence would, they were removed for clarity and focus in this manuscript. Transposons were found by searching the 387 PDV sequences with RepeatMasker 3.3.0 (default settings). PDV sequences with significant matches to transposons were removed from the study.
Remaining candidate sequences were filtered for simple repeats. The DNA sequence of each candidate was searched by RepeatMasker 3.3.0 for significant matches with the following settings: -noint, -poly, default settings elsewhere. Any candidate with a significant match to simple repeats was removed.

Homology Analysis
The set of candidate sequences in the destination genomes were then analyzed and placed into homology groups. All the candidate sequences were searched against themselves using blastn with a max e-value of 1.0e-10. Candidate sequences were placed in a group together when a sequence has a 90% nucleotide identity over 90% of its length to any member in the group. Homology groups (and their candidates) containing more than one destination genome were removed from the study. Though such sequences may be horizontally transferred, we removed the sequences on the possibility that they could be vertically transmitted. There is no biological mechanism precluding protein sequences shared by metazoans or eukaryotes being included in a PDV. If such a shared sequence is included in a PDV, our screen for protein matches would find them and report a dispersed pattern of hits across many destination genomes. Because of this concern, homology groups showing that pattern were removed from the study.
In addition to searching for homology groups within our candidate transferred regions (above), we also searched for homology between our candidate sequences and the whole genome sequences of a large set of organisms. Specifically, we searched the candidate HTS against a database including our set of 165 eukaryotic genomes, the original 387 PDV sequences, and NCBI's non-redundant virus list (containing ''conventional'' viruses). In each search the strongest match of all the sequence space searched (whether viral, PDV, or eukaryote) was recorded. A sequence that matches more strongly to any eukaryotic sequence than to a PDV or virus sequence (self-hits excluded) could indicate that the sequence is an insect gene shared between those species by a distant common ancestor and not actually a PDV horizontally transferred sequence. If the sequences in the destination genomes are truly horizontally transferred regions, they should match more strongly (.20 blast score gap) to PDV sequences than to any other sequence. Searches were also performed by web-BLAST against NCBI's non-redundant database.

Testing conserved insect regions
We started with the coding regions of known Bombyx mori mRNAs (ftp://ftp.ncbi.nlm.nih.gov/genomes/Bombyx_mori/ RNA/) and used each as a tblastx query to the parasitic wasp Nasonia vittripennis genome assembly. Bombyx coding genes that had a Nasonia match score with the threshold used in the PDV searches (BLAST score 70 or higher) were extracted and a single splice form was retained for each such gene. This produced a set of 2,631 Bombyx genes that include a highly conserved segment present in the Nasonia genome. We selected 300 random genes from these 2,631 and used the same tblastx test to determine the representation of such highly conserved gene segments across a representative set of 26 insect genome assemblies. Results of the representation of these genes among tested insect genomes are shown in Figure 2, mapped onto a phylogenetic tree of the species tested.
The phylogenetic tree in Figure 2 was constructed as follows. A suite of 1,400 Apis mellifera coding exons that are present in single copy widely in insects was obtained, starting with all the coding exons annotated by Ensembl Amel4.0, and using tblastn to eliminate exons that are absent or present in more than one copy in the Apis mellifera (Amel 4.5 assembly), Tribolium casteneum (Tcas3.0 assembly), Drosophila melanogaster (dm3 assembly), or Ladonia fulva (Lful_1.0 assembly) genomes. This suite of exons was used as query in tblastn searches of all the species shown in Figure 2 plus the Arachnid Ixodes scapularis (IxscaW3 assembly) for rooting. The single best tblastn match for each genome was extracted and those that were reciprocal best blast matches to the correct Apis mellifera coding exon were retained for alignment and tree building. For each exon, muscle3.8.31 (default parameters) [54] was used to generate a multiple protein alignment, and positions in the alignment with more than 10% gaps were removed. All the exon alignments were then concatenated to generate a large protein multiple alignment (average of 59,188 amino acids per species). This alignment was used to construct a maximum-likelihood phylogenetic tree with phyml (LG model, 6 rate classes, SPR moves). The tree obtained agrees well with all recent analyses of these species [55,56,57], though no single published tree contains this exact set of species.

PCR
Two HTS were tested by PCR amplification. Primers were designed such that the reverse primer was placed in a region alignable between the wasp PDV sequence (see Fig. 3) and the Lepidopteran sequence (i.e. having strong homology between the sequences). The forward primer was placed in a region that was unalignable between the wasp and Lepidopteran sequences (regions lacking homology). Thus the PCR product spanned the junction between the transferred regions and non-transferred regions. PCR reactions where both primers were placed on transferred sequence were also performed, and are shown in Figure S2.
The PCR product was run on a 1.5% agarose gel with a 100 bp ladder and stained with ethidium bromide. The 20 ul PCR reaction contained: 5 ul genomic DNA at 30 ng/ul, 5 ul New England Biolabs 56 taq master mix, 1 ul fwd primer, 1 ul rev primer, 8 ul H20.
Primers are drawn on the relevant sequences in Figure 3 and detailed below. Gels shown in Figure 4. Six of different strains of Bombyx mori were tested for the presence of these hits (all strains courtesy of Marian Goldsmith, University of Rhode Island). p50: Inbred Chinese-originating strain used in sequencing by Japanese part of B. mori sequencing project. Originated from lab of Toru Shimada, University of Tokyo. Nistari: a multivoltine (multiple generations per year; no diapause) strain originally from India then brought to an INRA lab in Lyon. 401: Chinese strain reported to have BT-resistance. 418: Chinese strain. 214: Japanese strain. 555: European strain.
Genomic DNA from other insect species was also used as a negative control. Wild type Drosophila melanogaster was obtained from Celeste Berg. Apis melifera and Chlosyne lacinia were obtained from Charles Laird. Figure S1 DNA trees for homology groups containing more than one member. Each tree includes all members of the homology group and the original PDV sequences that the members of the homology group matched to. Alignments were performed using DIALIGN in ''genomic DNA'' mode and trees were created using PHYML A) Homology group 1 B) Homology group2 C) Homology group 3 D) Homology group 4 E) Homology group 5. (PNG) Figure S2 Gels displaying PCR amplification of HTS in Bombyx mori. Figures S2A and S2B show alternative PCR primers amplifying the same regions as the primers used in Fig 4.  Figures S2C and S2D show result for primers targeting PDV100. Figures S2E and S2F show result for primers targeting PDV99 Note that in parts C and D the Bombyx strain 106 appears to have a deletion polymorphism yielding a smaller fragment than other strains. Gels were ethidium bromide stained and run with a 100 bp ladder (brighter bands at 500 bp and 1000 bp). A) PCR results for reaction targeting PDV 101 with alternative primers. Lanes alternate between the two different forward primers for the reaction (expected product sizes of 1049 and 686). Tested four strains with each pair of primer sets: 418 (Chinese), 214(Japanese), Nistari (Indian multivoltene), 555(European). B) PCR results for alternative primers targeting PDV32 (expected product size of 947). Five strains were tested: 418 (Chinese), 214(Japanese), 401(Chinese BT-resistant), Nistari (Indian multivoltene), 555(European). C) PCR results for primers targeting PDV100. Lane1: B. mori 214(Japanese). Lane2: B. mori 401(Chinese). PDV99 was tested with the following primers in part F: Forward primer 59-GGGCGTTAACAATGCCAAGG-39 (nscaf2734: 12589-12608).

Supporting Information
Reverse primer 59-CGAAAGTTGC-CAGTTTCCGC-39 (nscaf2734: 13508-13489). Expected product size: 920. (PNG) Table S1 Summary information of PDV queries used and Eukaryotic whole genome assemblies used in this study. For the left portion: ''Polydnavirus queries'' lists all the PDV sequences used in the search by accession number. ''Wasp species'' gives the species of wasp the PDV was sequenced from. ''Length'' is the base pair length of the PDV sequence. ''Total'' gives the total amount of base pairs sequenced for that species of wasp. For the right portion: ''Abbreviation'' gives an abbreviation for the eukaryote database used. ''Eukaryote Genome'' lists the eukaryote genome databases used in full.

(XLS)
Table S2 Detailed information for PDV HTS. The first column ''PDV#'' is an arbitrary numbering scheme for the HTS. ''Group'' sorts the found PDV sequences into related groups. ''Donor wasp species'' gives the wasp species that the matching PDV derives from. ''matching viral sequence'' is the viral sequence a match was found for. ''Recipient species'' is the species the transferred sequence was found in. ''Contig'' shows the contig of a sequence. ''%id'' shows the percentage of matching nucleotides reported by BLAST. ''sequence length'' is the length of the horizontally transferred sequence. ''query start'' and ''query stop'' are the start and stop of the match on the query sequence (virus). ''database start'' and ''database stop'' are the start and stop of the match on the database (eukaryote genome). ''evalue'' is the probability of the match by chance. ''score'' is the BLAST alignment score. ''dn/ds'' is the ratio of synonymous to nonsynonymous changes between the query virus and the HTS found. ''Pfam'' shows protein family matches for the sequence. ''PCR'' shows if the sequence was tested by PCR. (XLSX)