Prospecting for viral natural enemies of the fire ant Solenopsis invicta in Argentina

Metagenomics and next generation sequencing were employed to discover new virus natural enemies of the fire ant, Solenopsis invicta Buren, in its native range (i.e., Formosa, Argentina) with the ultimate goal of testing and releasing new viral pathogens into U.S. S. invicta populations to provide natural, sustainable control of this ant. RNA was purified from worker ants from 182 S. invicta colonies, which was pooled into 4 groups according to location. A library was created from each group and sequenced using Illumina MiSeq technology. After a series of winnowing methods to remove S. invicta genes, known S. invicta virus genes, and all other non-virus gene sequences, 61,944 unique singletons were identified with virus identity. These were assembled de novo yielding 171 contiguous sequences with significant identity to non-plant virus genes. Fifteen contiguous sequences exhibited very high expression rates and were detected in all four gene libraries. One contig (Contig_29) exhibited the highest expression level overall and across all four gene libraries. Rapid amplification of cDNA ends analyses expanded this contiguous sequence yielding a complete virus genome, which we have provisionally named Solenopsis invicta virus 5 (SINV-5). SINV-5 is a positive-sense, single-stranded RNA virus with genome characteristics consistent with insect-infecting viruses from the family Dicistroviridae. Moreover, the replicative genome strand of SINV-5 was detected in worker ants indicating that S. invicta serves as host for the virus. Many additional sequences were identified that are likely of viral origin. These sequences await further investigation to determine their origins and relationship with S. invicta. This study expands knowledge of the RNA virome diversity found within S. invicta populations.


Introduction
The red imported fire ant, Solenopsis invicta Buren, is an invasive species native to southern South America [1]. The ant was introduced into North America sometime in the 1930s [1], most likely from somewhere in Formosa Province, Argentina [2]. This ant is a very serious pest in the U.S., but generally not in its native range, although it is one of the most ecologically dominant ant species in northeastern Argentina [3,4]. Damage is estimated at $6 billion annually in the U.S. [5]. Population studies on the two continents have shown that fire ant populations are 5-10 times greater in infested areas within the U.S. [6,7]. These inter-continental disparities support the supposition that S. invicta likely escaped its natural enemies during U.S. founding events. Indeed, direct evaluations have shown a paucity of natural enemies in founding populations of S. invicta [8].
In the U.S., early eradication efforts were attempted [1], but eventually gave way to the implementation of quarantine [9] to limit the spread of the ant. Concomitantly, research focus shifted from eradication to the discovery, characterization, and release of natural enemies of S. invicta with the intention of providing sustainable control in the U.S. This effort led to the discovery of many pathogens and parasites of fire ants in their native and introduced ranges [10], some of which have been utilized and released as natural control agents against invasive fire ants in the U.S. [11][12][13]. Still, there remains a large discrepancy in both the abundance and the number of natural enemies found between populations of S. invicta in South and North America [14,15] warranting continued efforts to identify new pathogens for use in providing natural control.
Despite the known usefulness of viruses to control insect pests [16], viruses have been only recently investigated for use against ants [11,17]. Indeed, the first ant viruses discovered and characterized were from S. invicta [14]. To date, four RNA viruses and one DNA virus have been discovered from S. invicta. The RNA viruses include Solenopsis invicta virus 1 [18], Solenopsis invicta virus 2 [19], Solenopsis invicta virus 3 [20], and Solenopsis invicta virus 4 [21]. All of these viruses are present in both U.S. and Argentine populations of S. invicta. Solenopsis invicta queens infected with Solenopsis invicta virus 1 (SINV-1) have lower body weights that reduce the probability of successful colony founding [22]. Solenopsis invicta virus 2 (SINV-2) infections are associated with significant reductions in queen fecundity and other detrimental fitness effects including longer claustral periods and slower growth of incipient colonies [22]. Solenopsis invicta virus 3 (SINV-3) also reduces queen fecundity [23] and alters the feeding behavior exhibited by the worker caste, which results in colony starvation [24]. The impacts of Solenopsis invicta virus 4 (SINV-4) and the sole DNA virus, Solenopsis invicta densovirus (SiDNV), have not been established [25].
The objective of this research was to discover new virus natural enemies of S. invicta from the native range (i.e., Formosa, Argentina), with the ultimate goal of their release into introduced U.S. populations as self-sustaining biocontrol agents. Metagenomics and next generation sequencing were employed to achieve this objective and resulted in the discovery of one new virus and multiple high-probability target sequences of likely viral origin providing future leads to pursue.

[...] frozen at -80˚C for future evaluations. Voucher specimens have been deposited in both the USDA-ARS, Center for Medical, Agricultural and Veterinary Entomology (CMAVE), Gainesville, Florida collection and the Fundación Para el Estudio de Especies Invasivas (FuEDEI), Hurlingham, Buenos Aires, Argentina collection.

RNA preparation
Total RNA was extracted from a pooled group of 15 worker ants from each colony fragment using the Trizol method followed by the PureLink RNA Mini Purification Kit according to the manufacturer's instructions (Thermo Fisher Scientific, Waltham, MA). RNA quality of each preparation was assessed by microfluidic analysis on an Agilent 2100 Bioanalyzer (Agilent, Cary, NC) using the RNA 6000 Nano kit according to the manufacturer's instructions. RNA samples were pooled from ant colonies into four groups according to geographic region (Table 1; Fig 1). The four groups included South American Library_1 (SAL_1), collected north of 25.

Library preparation and sequencing
Total RNA (200 ng) purified from each of the four pooled groups of worker ants (SAL_1, SAL_2, SAL_3, and SAL_4) was used for mRNA purification with the Illumina TruSeq Stranded mRNA Library Preparation Kit (Catalog # RS-122-2101). The low sample protocol was followed according to the manufacturer's instructions. The RNA fragmentation step was omitted to maximize library insert length. Rather than fragmenting the RNA, the sealed plate was incubated at 80˚C for 2 minutes to elute the primed mRNA from the RNA purification beads. This omission resulted in RNA fragmentation with an average final library size of 467 bp. Library sizes were determined empirically by microfluidic analysis on an Agilent 2100 Bioanalyzer and quantified using the Quant-iT dsDNA kit with broad range standards (Thermo Fisher Scientific, Q-33130). Samples were pooled together at equimolar quantities and sequenced twice using the Illumina MiSeq (2x300) cycle kit with version 3 chemistry. Using the Illumina indices, the data were demultiplexed and the runs combined to assign the data to individual samples. All other procedures were followed according to the manufacturer's instructions.
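Equimolar pooling requires converting each library's mass concentration and mean fragment size into a molar concentration. The sketch below illustrates one common way to do this, assuming an average mass of 660 g/mol per base pair of double-stranded DNA. The function names, the concentrations, and the target amount per library are hypothetical and are not taken from this study; only the 467 bp mean library size comes from the text.

```python
# Illustrative only: concentrations and the 10 fmol target are hypothetical.
AVG_G_PER_MOL_PER_BP = 660.0  # common approximation for dsDNA


def library_molarity_nm(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA concentration (ng/ul) to nanomolar (nM)."""
    return conc_ng_per_ul * 1e6 / (AVG_G_PER_MOL_PER_BP * mean_size_bp)


def equimolar_volumes(libraries: dict, fmol_each: float = 10.0) -> dict:
    """Return the volume (ul) of each library that contains `fmol_each` fmol.

    `libraries` maps a library name to (conc_ng_per_ul, mean_size_bp).
    1 nM equals 1 fmol/ul, so volume = fmol / nM.
    """
    volumes = {}
    for name, (conc, size) in libraries.items():
        volumes[name] = fmol_each / library_molarity_nm(conc, size)
    return volumes


# Hypothetical Quant-iT readings paired with the reported 467 bp mean size.
example = {
    "SAL_1": (12.0, 467),
    "SAL_2": (8.5, 467),
    "SAL_3": (15.2, 467),
    "SAL_4": (9.9, 467),
}
pool = equimolar_volumes(example)
```

Libraries with lower measured concentrations receive proportionally larger volumes, so each contributes the same number of molecules to the sequencing pool.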

Bioinformatics analysis
Sequences were aligned to the Solenopsis invicta reference genome downloaded from http://www.ncbi.nlm.nih.gov/Traces/wgs/?val=AEAQ01#contigs using the Burrows-Wheeler Aligner (bwa-0.7.5a), a software package for mapping low-divergent sequences against a large reference genome [27]. The S. invicta unmapped reads were selected and converted to FASTA format using NextGENe-2.3.4 (SoftGenetics, State College, PA). Each read was then filtered and retained if the median score was ≥ 20 and base number ≥ 25. Unmapped and filtered individual MiSeq sequences were analyzed using BLASTX [28] against the curated Swiss Protein database (http://www.uniProt.org; download date 11/14/2014). Sequences returning an expectation score less than 10⁻⁵ were tabulated. Based on the BLASTX results, each sequence was annotated and assorted taxonomically. The sequences were binned into the following groups: Animal, Plant, Fungi, Bacteria, Archaea, Phage, and Non-phage virus. Also at this stage, sequences exhibiting identity to Enterobacteria phage phiX174, an internal control for Illumina processing [29] were removed and not considered in subsequent analyses.
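The filtering and binning criteria described above can be sketched as follows. This is a minimal illustration rather than the pipeline used in the study, which relied on NextGENe and BLASTX. The function names and the tuple layout for hits are assumptions made for the example; only the thresholds (median score ≥ 20, length ≥ 25 bases, expectation score below 10⁻⁵) come from the text.

```python
from collections import Counter
from statistics import median

MIN_MEDIAN_QUALITY = 20   # median per-base quality score threshold
MIN_LENGTH = 25           # minimum number of bases
MAX_EVALUE = 1e-5         # hits must score strictly below this


def passes_read_filter(sequence: str, qualities: list) -> bool:
    """Keep a read if it is long enough and its median quality is high enough."""
    return len(sequence) >= MIN_LENGTH and median(qualities) >= MIN_MEDIAN_QUALITY


def significant_hits(hits):
    """Keep hits whose expectation score is below the cutoff.

    Each hit is a (read_id, taxon_group, evalue) tuple (hypothetical layout).
    """
    return [hit for hit in hits if hit[2] < MAX_EVALUE]


def bin_by_group(hits, exclude=("phiX174",)):
    """Count significant hits per taxonomic group, dropping control sequences."""
    return Counter(group for _, group, _ in significant_hits(hits)
                   if group not in exclude)


# Hypothetical hits for illustration.
demo_hits = [
    ("read1", "Non-phage virus", 1e-30),
    ("read2", "Animal", 1e-8),
    ("read3", "Plant", 1e-3),      # fails the expectation cutoff
    ("read4", "phiX174", 1e-50),   # internal control, excluded
]
demo_counts = bin_by_group(demo_hits)
```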

Virus sequences
Non-phage virus sequences from each library identified from the BLASTX analysis were assembled using the CAP3 algorithm [30] in the Vector NTI ContigExpress program (Invitrogen, Carlsbad, CA). Sequences from phage were not assembled because they infect bacteria and would not be expected to infect fire ant cells. Contiguous sequences (contigs) and remaining singletons were matched to the genomes of known fire ant viruses: Solenopsis invicta virus 1 (SINV-1, GCF_000854925.1), SINV-2 (GCF_000870805.1), SINV-3 (GCF_000881215.1), SINV-4 (MF_041808.1), and SiDNV (GCF_000912895.1). Those sequences matching known fire ant viruses (≥ 95% identity) were binned according to virus species and excluded from further analysis. Virus unmatched sequences/contigs were re-analyzed by BLASTX and sequences returning an expectation score of less than 10⁻⁵ were tabulated.
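The exclusion of known fire ant virus sequences amounts to a simple partition on percent identity. The sketch below assumes each sequence has already been compared with the known virus genomes and carries its best match. The data structure and function name are hypothetical, and only the 95% identity threshold and the virus names come from the text.

```python
from collections import defaultdict

KNOWN_VIRUSES = ("SINV-1", "SINV-2", "SINV-3", "SINV-4", "SiDNV")
MIN_IDENTITY = 95.0  # percent identity required to call a match


def split_known_unmatched(best_matches: dict):
    """Partition sequences into known-virus bins and an unmatched list.

    `best_matches` maps a sequence id to (virus_name, percent_identity),
    or to None when no alignment to a known virus was found.
    """
    bins = defaultdict(list)
    unmatched = []
    for seq_id, match in best_matches.items():
        if match is not None and match[0] in KNOWN_VIRUSES and match[1] >= MIN_IDENTITY:
            bins[match[0]].append(seq_id)
        else:
            unmatched.append(seq_id)
    return dict(bins), unmatched


# Hypothetical matches for illustration.
demo = {
    "seq_a": ("SINV-1", 99.2),
    "seq_b": ("SINV-4", 95.0),
    "seq_c": ("SINV-2", 81.5),   # below threshold, carried forward
    "seq_d": None,               # no match, carried forward
}
known_bins, carried_forward = split_known_unmatched(demo)
```

Only the sequences in the unmatched list would proceed to the second BLASTX pass.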

Data availability
Raw sequence data from each library were deposited into the GenBank database as a Sequence Read Archive under accession number, SRP113235 (Bioproject PRJNA394996). Assembled sequences with viral identity (Table 2) have been deposited at DDBJ/EMBL/GenBank as a Transcriptome Shotgun Assembly project under the accession GFUG00000000. The version described in this paper is the first version, GFUG01000000. The SINV-5 annotated genome was deposited in GenBank under accession number MF593921.

Virus genome re-sequencing
Contig_29 was unquestionably a near complete virus RNA genome. Therefore, this sequence was used as template for 5' and 3' RACE to acquire the entire genome sequence. For 3' RACE, cDNA was synthesized with the GeneRacer Oligo dT primer (Invitrogen, Carlsbad, CA). PCR was subsequently conducted with the GeneRacer 3' primer and gene-specific primer, P1604 (S1 Table). For 5' RACE, cDNA was synthesized with oligonucleotide primer P1601 and PCR conducted with P1601 and the GeneRacer Abridged Anchor Primer. Amplicons generated during RACE reactions were cloned into pCR4 vector and submitted for Sanger sequencing. After acquiring the genome termini, oligonucleotide primers were designed to provide complete, overlapping coverage of Contig_29. Amplicons were cloned and sequenced by the Sanger method and the genome assembled with CAP3 in Vector NTI (Life Technologies, Carlsbad, CA).

Preliminary RNA virus confirmation
In order to provide further evidence of whether a contig was of viral origin, or not, oligonucleotide primers (S1 Table) were designed to contigs with viral identity and expressed in all four libraries (i.e. the first 15 contigs in Table 2). RT-PCR was conducted with RNA pooled from all four libraries to establish the orientation of the template and to verify that cDNA synthesis was required for amplification. Once established, RT-PCR was conducted with RNA derived from S. invicta colonies (worker ants) field-collected from around Gainesville, Florida, (n = 27 from 3 locations) to determine whether the sequence was present in the U.S. field population and, if so, its prevalence. This experiment also provided additional confirmation of a potential viral origin. If amplification was observed in 100% of the samples, it was assumed that the sequence was of host origin and further experiments with the sequence were terminated. Viral infections rarely exhibit an incidence of 100% among field-collected arthropods [31].
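The prevalence rule described above can be expressed as a short decision function. This is a sketch of the stated logic only: the function name, the category labels, and the example counts are hypothetical, while the rule that detection in 100% of samples was taken to indicate host origin comes from the text.

```python
def classify_by_prevalence(n_positive: int, n_tested: int):
    """Return (prevalence, label) for one contig screened by RT-PCR.

    Detection in every sample is treated as evidence of host origin;
    partial detection keeps the contig as a viral candidate.
    """
    if n_tested <= 0 or not 0 <= n_positive <= n_tested:
        raise ValueError("counts must satisfy 0 <= n_positive <= n_tested, n_tested > 0")
    prevalence = n_positive / n_tested
    if n_positive == n_tested:
        label = "assumed host origin"
    elif n_positive == 0:
        label = "not detected"
    else:
        label = "viral candidate"
    return prevalence, label


# Hypothetical screening results for 27 colonies.
for positives in (0, 4, 27):
    prevalence, label = classify_by_prevalence(positives, 27)
    print(f"{positives}/27 positive ({prevalence:.0%}): {label}")
```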
Finally, to establish whether Contig_29 (SINV-5) was actively replicating in S. invicta, strand-specific RT-PCR was conducted to detect the replicative genome strand by the modified method of Craggs et al. [32]. Pooled total RNA (50 ng) used for library creation was mixed with 10 mM dNTPs, 1 μM of tagged reverse oligonucleotide primer p1600TAG and heated to 65˚C for 5 minutes. First strand buffer and Superscript reverse transcriptase (Invitrogen, Carlsbad, CA) were then added and the reaction mixture was incubated at 55˚C for 30 minutes before inactivating the RT at 70˚C for 15 minutes. Unincorporated cDNA oligonucleotides were digested with 10 units of Exonuclease I (New England Biolabs, Ipswich, MA) at 37˚C for 1 hour. The reaction was terminated by heating to 80˚C for 20 minutes. PCR was subsequently conducted with minus-strand specific cDNA as template. The reaction was conducted in a 25 μl volume containing 2 mM MgCl2, 200 μM dNTP mix, 0.5 units of Platinum Taq DNA polymerase (Invitrogen, Carlsbad, CA), 0.2 μM of each oligonucleotide primer p1601 and TAG (S1 Table), and 5 μl of the cDNA preparation. The temperature cycling program was 1 cycle at 94˚C for 2 minutes, 35 cycles of 94˚C for 15 seconds, 59˚C for 15 seconds, 68˚C for 40 seconds, and 1 cycle of 68˚C for 5 minutes. PCR products were separated on an agarose gel (1%) and visualized by SYBR-safe (Invitrogen, Carlsbad, CA) staining. Plus strand RT-PCR was included as a positive control as well as a non-template negative control.

[...] (n = 876,528). The number of non-phage virus sequences was 292,499, or 33.4% of the total. Surprisingly, phage were well represented despite low numbers of bacterial sequences. However, those phage sequences detected exhibited high expression levels and were limited to the Microviridae (ssDNA). The remaining sequences were annotated to the following taxa: Animal (41.7%), Fungi (3.4%), Plant (2.1%), Protist (1.1%), Bacteria (1.0%), and Archaea (0.3%).
The relative number of sequences across these broad taxonomic categories was fairly consistent among the four gene libraries (Fig 3A). The high percentage of animal-related sequences consisted largely of fire ant sequences that were not filtered out during the fire ant matching phase. Further winnowing was accomplished by removing the known S. invicta virus sequences from the non-phage virus pool (Fig 2). The total number of S. invicta virus-matched sequences was 230,555 (i.e., 78.8% of the non-phage sequences). SINV-4 (n = 126,086) represented the largest fraction of S. invicta known virus sequences in the pooled libraries, followed by SINV-1 (n = 84,769), SINV-2 (n = 19,338), SINV-3 (n = 353), and SiDNV (n = 9) (Fig 2). The small number of SiDNV sequences may be explained by the fact that SiDNV is a DNA virus, which would normally produce RNA only while actively replicating. Conversely, many RNA viruses would be detected whether replicating or not. Unlike the broader taxonomic assignments, the prevalence of S. invicta viruses varied by several orders of magnitude across each library/region (Fig 3B).
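The winnowing counts reported here are internally consistent, as the following arithmetic check illustrates. All numbers are taken from the text; the variable names are illustrative, and treating n = 876,528 as the total number of annotated sequences is an assumption based on the reported 33.4%.

```python
# Counts as reported in the Results; 876,528 is assumed to be the total.
total_annotated = 876_528
non_phage_virus = 292_499

known_virus_counts = {
    "SINV-4": 126_086,
    "SINV-1": 84_769,
    "SINV-2": 19_338,
    "SINV-3": 353,
    "SiDNV": 9,
}

known_total = sum(known_virus_counts.values())
unmatched = non_phage_virus - known_total

pct_non_phage = round(100 * non_phage_virus / total_annotated, 1)
pct_known = round(100 * known_total / non_phage_virus, 1)

print(f"non-phage virus share of total: {pct_non_phage}%")
print(f"known fire ant virus sequences: {known_total} ({pct_known}%)")
print(f"virus-unmatched sequences carried forward: {unmatched}")
```

The remainder agrees with the 61,944 unique singletons reported in the abstract as the input to de novo assembly.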
The S. invicta virus-unmatched sequences (n = 61,944) were assembled de novo yielding 171 contiguous sequences (comprised of 55,677 singletons) with significant identity to non-plant virus genes (Table 2 and S2 Table). An additional 30 contiguous sequences were assembled from 633 singletons, which showed significant identity to non-virus genes and 35 contiguous sequences (596 singletons) with significant plant virus identity, all of which were excluded from further examination. The remaining singletons and contiguous sequences (n = 5,038) were also excluded from further analysis because they either did not match any sequence in the GenBank database and/or were less than 100 nucleotides in length.
Among the 171 contiguous sequences with significant identity to non-plant virus genes, Table 2 summarizes those considered most likely to be of viral origin (n = 38) and to use S. invicta as host. This assumption was based on the total sequence expression representation, representation across the libraries (i.e., by geography), and contig size. Expression representation has been used successfully to detect pathogens and is simply based on the fact that actively replicating genes (e.g., from viruses infecting a host) will be highly represented in non-normalized gene libraries [33]. Among the 38 sequences in Table 2 exhibiting significant identity with viral sequences from the GenBank database, fifteen (9% of the 171 contigs) were represented in all four gene libraries and were composed of 46,845 of the 55,677 singletons. Thus, these fifteen contiguous sequences alone accounted for 84% of the singletons assembled with non-plant virus identity and were considered high likelihood viral prospects. Nine contigs contained sequences detected in three of the four libraries and were composed of 1,276 singletons; eight contigs contained sequences detected in two of the four libraries and were composed of 927 singletons. Finally, six contigs contained sequences detected in a single library, but were highly represented within the single library (composed of 5,180 singletons). In total, these 38 contigs contained 54,228 singletons, or 97.4% of all sequences with non-plant virus identity. All of these sequences exhibited identity to viruses with RNA genomes, the majority to viruses in the Dicistroviridae (n = 23). Fourteen of the sequences exhibited identity with unclassified viruses and one with a virus in the Nodaviridae (i.e., Contig_51).
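The breakdown of the 38 prioritized contigs by library representation can be tallied as a quick check. The counts below are those given in the text, and the dictionary layout and variable names are illustrative.

```python
# Number of libraries represented -> (contig count, singleton count), from the text.
tiers = {
    4: (15, 46_845),
    3: (9, 1_276),
    2: (8, 927),
    1: (6, 5_180),
}

all_viral_contigs = 171          # contigs with non-plant virus identity
all_viral_singletons = 55_677    # singletons assembled into those contigs

prioritized_contigs = sum(contigs for contigs, _ in tiers.values())
prioritized_singletons = sum(singletons for _, singletons in tiers.values())

share_four_library_contigs = round(100 * tiers[4][0] / all_viral_contigs)
share_four_library_singletons = round(100 * tiers[4][1] / all_viral_singletons)
share_prioritized = round(100 * prioritized_singletons / all_viral_singletons, 1)

print(prioritized_contigs, prioritized_singletons)
print(share_four_library_contigs, share_four_library_singletons, share_prioritized)
```

The family assignments sum the same way: 23 Dicistroviridae, 14 unclassified, and 1 Nodaviridae give 38 contigs.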
Sequences with the highest level of expression and represented in all four gene libraries (n = 15; Table 2) were considered the most likely prospects to be of viral origin (henceforth referred to as high likelihood viral prospects). Therefore, we focused our effort on these fifteen contigs (Table 2). Contig_29 exhibited the highest expression level overall and across all four gene libraries; this sequence appeared to be a near complete virus genome. The large contig sequence was 9,030 nucleotides in length and BLASTX analysis [28] indicated that the sequence had significant identity to Israeli acute paralysis virus (IAPV) and other RNA viruses in the Dicistroviridae. Sanger re-sequencing and RACE reactions revealed a 9,313 nucleotide polyadenylated genome containing two in-frame open reading frames (ORFs) separated and flanked by untranslated regions, and a short, overlapping ORF at the 5' end of ORF 2 (Fig 4A). The 5'-proximal ORF contained domains with identity to RNA helicase (pfam00910), virus peptidase (pfam12381), and RNA-dependent RNA polymerase (pfam00680). The 3'-proximal ORF contained domains with identity to CRPV capsid proteins (pfam08762) and the capsid protein, VP4, from dicistroviruses (pfam11492). These characteristics indicate that this genome sequence represents a new dicistrovirus [34]. The virus is provisionally named Solenopsis invicta virus 5 (SINV-5) and the sequence deposited in GenBank under accession number MF593921. Further evaluation of the SINV-5 genome revealed that the replicative strand was detected in S. invicta (from Argentinean colonies). The presence of a replicative genome strand and a high expression level of SINV-5 sequences detected in all four gene libraries indicate that S. invicta likely serves as host for the virus (Fig 4B).
Phylogenetic analysis of the conserved RdRp region of ORF 1 of SINV-5 with the known Dicistroviridae species shows that SINV-5 assorts with dicistroviruses within the Aparavirus genus, near SINV-1 (Fig 4C). The short, overlapping ORF at the 5' end of the structural ORF (ORF 2) provides further support for the Aparavirus placement.
Contiguous sequences (specifically, Contig_66, Contig_30 and Contig_16) also exhibited high expression levels and were represented in all four libraries. These contigs also exhibited significant identity to dicistrovirus non-structural and structural proteins and likely represent new virus species. The remaining 11 contigs, Contig_70 to Contig_80 (Table 2), had lower overall expression levels, but were considered high likelihood virus prospects because they were detected in all four libraries (geographic regions) and exhibited significant identity with viral genes.
Among the fifteen high likelihood viral sequences identified (Table 2), PCR amplification only occurred after reverse transcription (Table 3). No amplification was detected without reverse transcription confirming that these templates were RNA. A small number of field-collected S. invicta colonies from Florida (n = 27) were also examined by RT-PCR to determine whether any of the fifteen high likelihood viral sequences were present in the U.S. S. invicta population (Table 3). Five of the templates were apparently present in the U.S. population ranging in prevalence from 15 to 56%. However, the majority were not detected in U.S. S. invicta samples.

Discussion
In an effort to discover new viral pathogens to possibly control S. invicta in the U.S., we collected samples from 182 nests from four distinct geographic areas across the Formosa region of Argentina, created gene libraries from each of these pooled groups, and sequenced each of them by the Illumina MiSeq method. Through a series of winnowing methods, 171 contiguous sequences with significant viral identity were ultimately identified as viral candidates.
Among these 171 possibilities, we focused on 15 contigs because they exhibited the highest expression levels and were detected in all 4 geographic regions (Table 2). They were analyzed in an attempt to establish their origin, whether viral, host, or otherwise. Solenopsis invicta is an omnivorous insect, so viruses infecting prey or plant food items must be identified and excluded from consideration. We largely employed the step-by-step decision tree reported previously [35] as a general guide to determine the likelihood that a given sequence was of viral origin. Based on previous studies [18,36], this winnowing method significantly improves virus identification and discovery. In addition to this decision tree, the relative prevalence of each contig/sequence was considered supporting evidence for viral replication and host status. Because no form of mRNA subtraction was conducted before library preparation, the representation of each transcript was expected to be relatively proportional to the actual number of sequences present in the sample. Basically, ingested viruses would likely be represented by fewer sequence copies. Conversely, replicating viruses would be indicated by higher sequence copy numbers. Furthermore, sequences found in more than a single region/library would provide further support that they were, in fact, of virus origin and infecting S. invicta.

Fig 4 (caption fragment). [...] [43]. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches [44]. Only the most conserved region of the RdRp was aligned and a total of 243 positions were included in the final dataset (exact positions for the translated ORF1 are indicated in the phylogenetic tree within parentheses by each taxon). Evolutionary analyses were conducted in MEGA7 [45].
However, low representation does not preclude the possibility that a sequence is from a virus infecting S. invicta. Indeed, low gene/genome copy number has been reported for actively replicating viruses that have detrimental effects on their hosts [22]. Additionally, some RNA viruses that can be virulent and highly represented may occur in low numbers because of seasonal variation, low host infection rates, or other unknown factors [37,38].
One nearly complete virus genome was assembled among the high likelihood virus contigs (Contig_29; Table 2). RACE reactions using this contig as template and subsequent Sanger resequencing resulted in a complete virus genome. We have provisionally named this virus sequence SINV-5. It exhibits characteristics consistent with dicistroviruses in the Aparavirus genus, including a monopartite genome containing two in-frame ORFs that are flanked and separated by untranslated regions, a short, overlapping ORF at the 5' end of ORF 2, and a polyadenylated 3' terminus. The 5'-proximal ORF contained domains with identity to RNA helicase, protease, and RNA-dependent RNA polymerase, and the 3'-proximal ORF contained domains with identity to virus capsid proteins. SINV-5 was only detected in S. invicta colonies from Argentina; it was not detected in a limited number of S. invicta colonies from three locations in the U.S. In addition to the genomic architecture, phylogenetic analysis of the RdRp further supports a taxonomic assignment of SINV-5 in the Dicistroviridae (Fig 4C). Taken together, high expression levels, detection of a replicative strand, and phylogeny of SINV-5 indicate that S. invicta serves as the host for this novel virus (Fig 4B). Depending on the impact of this virus on fire ants and its host specificity, it may be a candidate for introduction into the U.S. as a classical biological control agent for S. invicta.
We also discovered what will likely be a second virus among the unidentified contigs in Table 2. Two contigs (Contig_66 and Contig_30) exhibited identity with Aphid lethal paralysis virus (ALPV) structural and non-structural proteins, respectively (Table 2). Because ALPV is a dicistrovirus, we postulated that these two fragments may have been part of the same genome, but were lacking sequence linking them. The number of sequences comprising each of these contigs further supported this notion. Notice that in every library (SAL_1, SAL_2, SAL_3, and SAL_4), more sequences comprised Contig_66 than Contig_30 (Table 2). This relationship would be expected in a host in which a dicistrovirus was actively replicating. Specifically, there would be a molar excess of capsid proteins compared with non-structural proteins. RT-PCR with a reverse primer (P1617) specific for Contig_30 and forward primer (P1631) specific for Contig_66 produced an amplicon (~3 kbp), whose sequence linked the two contigs. The joined contig was 4,505 nucleotides in length. This sequence was detected in both Argentinean and U.S. S. invicta populations (Table 3) and likely represents another virus as the replicative strand of this genome was also detected in U.S. and Argentinean S. invicta colonies (data not shown).
A majority of the contigs tested by RT-PCR (10/15; Table 3) were only detected in the RNA libraries from Argentina. This fits well with the hypothesis that most of the natural enemies of S. invicta were left behind in South America when it was accidentally introduced into the U.S. [7]. Nevertheless, it is of interest that 4 of the 5 S. invicta viruses discovered to date, plus the likely one just mentioned above, plus 5 of 15 likely viral contigs tested (Table 3) have been found in both North and South American populations of S. invicta. Similarly, the microsporidian pathogen Kneallhazia solenopsae has been found infecting fire ants on both continents, but the microsporidian Vairimorpha invictae is only found in South America [39]. This frequency of pathogens found on both continents supports the conclusion that multiple colonies of S. invicta were introduced into the U.S., perhaps over a period of years [1], because it is highly unlikely that a single invading colony or only a few colonies would carry this many pathogens [37]. It also appears that most of the viruses found in the U.S. were naturally common in native South American fire ant populations (Fig 3). Certainly, at least 3 of the viruses (SINV-1, SINV-2, SINV-3) can be seasonally abundant in U.S. populations [37]. Another possible explanation for viral pathogens on both continents is that some may be generalists with a wide host range that already occurred in the U.S. on other ant species, especially some of the native fire ants. However, the two viruses tested to date (SINV-1 and SINV-3) appear to have originated from South America and are specific for S. invicta [38,40,41].
While high expression levels are a logical place to start to discover new viruses from next generation sequencing data, low expression levels do not necessarily preclude a sequence from consideration. Indeed, SINV-2 was shown to exhibit comparatively low expression in S. invicta queens, yet had a profound impact on fecundity and gene expression during colony founding [22]. In fact, low copy sequences from gene libraries have previously resulted in virus discovery from the tawny crazy ant, Nylanderia fulva [35]. S2 Table contains 133 contiguous sequences with significant viral identity, all comprised of fewer than 50 singletons. Most of these sequences exhibited significant identity with RNA viral genomes; however, seven sequences showed identity to DNA viral genomes (5 single stranded and 2 double stranded). Thus, there are many possible virus leads resulting from this study that require additional investigation to establish their origin and relationship to S. invicta. All sequences have been deposited in GenBank to facilitate and encourage the discovery of additional viral pathogens of S. invicta in South America. Hopefully, new homologies can be discovered by employing new search algorithms and by periodically reanalyzing these libraries as new viral sequences are added to sequence databases.
In conclusion, our ongoing metagenomics/next generation sequencing efforts [14] have been very successful. In this study, we were able to match 79% of the non-phage virus sequences to known fire ant viruses (Fig 2). We also expanded the virome of S. invicta by discovery of a new S. invicta-infecting virus, provisionally named SINV-5. SINV-5 is of particular interest because it does not appear to occur in the introduced U.S. range of S. invicta and therefore may be suitable for release as a self-sustaining classical biological control agent for these invasive ants in the U.S. Another apparent virus sequence (Contig_30 and Contig_66; Table 2) appears to be a new virus of S. invicta, although the genome is incomplete. To date, we have discovered and sequenced the entire genomes of 6 viruses in South American fire ants (i.e., SINV-1, -2, -3, -4, -5 and SiDNV). By way of comparison, 31 viruses have been characterized from honey bees [42]. Future work with SINV-5 and other newly discovered fire ant viruses will focus on their pathogenicity, host specificity, and seasonality in order to assess their potential for use as self-sustaining biocontrol agents and/or biopesticides.
Supporting information
S1 Table. Oligonucleotide primers and their purpose from experiments in this study. (DOCX)
S2 Table. Contiguous sequences comprised of fewer than 50 singletons with significant viral identity by BLASTX analysis of the GenBank database from RNA libraries created from Solenopsis invicta worker ants. Contigs were first sorted in descending order based on the number of sequences comprising them, followed by the libraries represented. (DOCX)