Phylogenetic analysis has yet to uncover the early origins of flaviviruses. In this study, I mined a database of expressed sequence tags to discover novel flavivirus sequences. Flavivirus sequences were identified in a pool of mRNA extracted from the sea spider Endeis spinosa (Pycnogonida, Pantopoda). Reconstruction of the translated sequences and BLAST analysis matched the sequence to the flavivirus NS5 gene. Additional sequences corresponding to envelope and the NS5 MTase domain were also identified. Phylogenetic analysis of homologous NS5 sequences revealed that Endeis spinosa NS5 (ESNS5) is likely related to classical insect-specific flaviviruses. It is unclear if ESNS5 represents genetic material from an active viral infection or an integrated viral genome. These data raise the possibility that classical insect-specific flaviviruses, and perhaps medically relevant flaviviruses, evolved from progenitors that infected marine arthropods.
Citation: Conway MJ (2015) Identification of a Flavivirus Sequence in a Marine Arthropod. PLoS ONE 10(12): e0146037. doi:10.1371/journal.pone.0146037
Editor: Sibnarayan Datta, Defence Research Laboratory, INDIA
Received: August 25, 2015; Accepted: December 11, 2015; Published: December 30, 2015
Copyright: © 2015 Michael J. Conway. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited
Data Availability: Data can be found in the manuscript and in the NCBI dbEST database.
Funding: This work was supported by Start-up funds from Central Michigan University College of Medicine. The funder had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The author has declared that no competing interests exist.
Viruses in the genus Flavivirus share a common genomic organization and certain antigenic relationships; however, they can be divided into distinct phylogenetic groups [1]. Important groups include the mosquito-borne and tick-borne flaviviruses, flaviviruses with no known arthropod vector, dual host associated insect-specific flaviviruses (dhISFVs), and classical insect-specific flaviviruses (cISFVs) [2, 3]. Flaviviruses likely cluster into phylogenetic groups because of selective pressures imposed on the viruses by their unique transmission cycles and ecologies [4–8]. Many mosquito-borne and tick-borne flaviviruses are medically relevant in that they cause hemorrhagic and encephalitic disease in humans. It is unclear if vector co-infection with insect-specific flaviviruses impacts disease transmission or pathogenesis in nature [2, 9, 10].
Phylogenetic analysis of flaviviruses has been reported using individual genes and whole genomes, and trees have been built using both amino acid and nucleotide sequences [1, 11–16]. These studies have yet to uncover the early origins of flaviviruses [11, 12]. Molecular clock studies have been unreliable due to dramatic changes in the nucleotide substitution rate over time and differences in the mutation rate between individual flaviviruses [16, 17]. Further, our current database of genetic material may only represent the tips of the evolutionary tree [12, 17].
In order to increase the pool of flavivirus sequences available for phylogenetic analysis, I searched an expressed sequence tag (est) database using a number of flavivirus genomes as queries. Data mining identified flavivirus sequences in a library of sea spider Endeis spinosa (Pycnogonida, Pantopoda) cDNAs. Reconstruction of the translated sequences and BLAST analysis identified the protein as flavivirus NS5. Additional sequences corresponding to envelope and the NS5 MTase domain were also identified. Flavivirus sequences have been previously identified in the genomes of disease vectors, suggesting that integration of flaviviral genomes into the germline of arthropod hosts is a common event. It is unclear if ESNS5 represents genetic material from an active viral infection or an integrated viral genome. Phylogenetic analysis indicated that Endeis spinosa NS5 (ESNS5) shares a common ancestor that predates the evolution of classical insect-specific flaviviruses. These data suggest that classical insect-specific flaviviruses, and perhaps medically relevant flaviviruses, evolved from progenitors that first infected marine arthropods.
Materials and Methods
RNA extraction, library construction, and sequencing
RNA extraction, library construction, and sequencing were performed previously by Meusemann and Burmester et al., and the data were deposited in GenBank as unpublished data in the expressed sequence tag (est) database. Briefly, Endeis spinosa were collected with the help of scientists from the Istituto di Scienze Marine, Venice (Italy). Animals were shock-frozen and ground in liquid nitrogen. Total RNA was extracted as described by Holmes and Bonner (1973) and was further purified using the NucleoSpin RNA II kit (Macherey-Nagel, Dueren, Germany) including a DNase digest. Poly(A)+ RNA was enriched from the total RNA using Dynabeads Oligo(dT)25 (Invitrogen, Carlsbad, USA). ESTs were sequenced within the project "Molecular Phylogeny of the Arthropoda and the 'Ecdysozoa' Hypothesis", University of Hamburg, funded by the DFG priority program SPP 1174 "Deep Metazoan Phylogeny". Library construction and sequencing were performed at the MPI for Molecular Genetics, Berlin, Germany. The same research group constructed and uploaded cDNA libraries onto the est database representing the following terrestrial and marine arthropods: Anurida maritima, Acerentomon franzi, Campodea fragilis, Lepismachilis y-signata, Speleonectes cf. tulumensis, Archispirostreptus gigas, Limulus polyphemus, Peripatopsis sedgwicki, Tigriopus californicus, Pollicipes pollicipes, and Triops cancriformis.
Translated nucleotide BLAST (tblastn) database searches were performed by searching for flavivirus sequences in the expressed sequence tag (est) database. One hundred cDNA clones were identified, and their accession numbers are available in S1 Table. ESNS5 and NS5 amino acid sequences for nearly all flaviviruses were aligned using MUSCLE multiple sequence alignment [18], and conserved blocks were selected using Gblocks 0.91b software [19]. A stringent selection that did not allow contiguous nonconserved positions was chosen. Aligned sequences were uploaded into MEGA6 software in FASTA format [20]. Evolutionary histories were inferred by analyzing aligned amino acid sequences by the Maximum Likelihood method based on the JTT matrix-based model. The robustness of the resulting groupings was tested by 1,000 bootstrap replications. The tree with the highest log likelihood was shown. The percentage of trees in which the associated taxa clustered together was shown next to the branches. Initial trees for the heuristic search were obtained by applying the Neighbor-Joining method to a matrix of pairwise distances estimated using the JTT approach. Trees were drawn to scale, with branch lengths measured in the number of substitutions per site. All positions containing gaps and missing data were eliminated.
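Two of the steps described above, eliminating every position that contains a gap or missing data and resampling alignment columns for bootstrap replication, can be illustrated with a minimal Python sketch. This is not the MEGA6 implementation; the function names, the toy sequences, and the choice of '-' and '?' as the gap and missing-data symbols are assumptions made for illustration only.

```python
import random

def complete_deletion(alignment):
    """Drop every column that contains a gap ('-') or missing-data symbol ('?').

    `alignment` maps sequence names to aligned strings of equal length.
    The symbols treated as gap/missing are an assumption of this sketch.
    """
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    keep = [i for i in range(len(seqs[0]))
            if all(s[i] not in "-?" for s in seqs)]
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}

def bootstrap_replicate(alignment, rng):
    """Build one bootstrap pseudo-replicate by sampling columns with replacement."""
    names = list(alignment)
    length = len(alignment[names[0]])
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(alignment[name][i] for i in cols) for name in names}

# Toy alignment (hypothetical residues, not real NS5 data).
toy = {
    "ESNS5": "MK-LVDG",
    "RBV":   "MKALVDG",
    "SEPV":  "MRALV-G",
}
clean = complete_deletion(toy)            # columns 3 and 6 are removed
replicate = bootstrap_replicate(clean, random.Random(0))
print(clean)
print(replicate)
```

In a full analysis, a tree would be inferred from each of the 1,000 replicates, and the support value for a grouping would be the percentage of replicate trees that contain it.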
Bioinformatic detection and reconstruction of ESNS5
To identify novel flavivirus sequences, translated nucleotide BLAST (tblastn) database searches were performed by searching for flavivirus sequences in the expressed sequence tag (est) database. Whole flavivirus polyproteins were used to search the database. Tblastn analysis using the Sepik virus polyprotein identified multiple cDNA clones constructed from mRNA isolated from the sea spider Endeis spinosa. The top 10 clones are included in Table 1. To verify the presence of flavivirus sequences in the est database, nucleotide BLAST analysis was performed using the complete Sepik virus genome. Searches were performed with all three program selections: (1) highly similar sequences (megablast), (2) more dissimilar sequences (discontiguous megablast), and (3) somewhat similar sequences (blastn). Using this strategy, 76 cDNA clones were identified that matched the Sepik virus genome. The top 10 clones are included in Table 2. Each of the cDNA clones identified from both search strategies was translated, and the protein was reconstructed by amino acid alignment (Fig 1). Protein BLAST analysis suggested that this protein, referred to here as Endeis spinosa NS5 (ESNS5), is most closely related to “no known vector” and tick-borne flaviviruses (Table 3). ESNS5 represented a fragment of an NS5 RNA-dependent RNA polymerase (RdRp) domain. Protein alignment with Rio Bravo virus (RBV) NS5 RdRp showed significant internal homology, but deletions at the N- and C-termini (Fig 1). A stop codon was evident in ESNS5, confirming that a small deletion is present at the C-terminus. An additional cDNA mining attempt was performed with tblastn by searching the expressed sequence tag (est) database for sequences that match the RBV polyprotein. Using this search strategy, additional clones that matched the flavivirus envelope protein and NS5 MTase domains were identified (S1 Table).
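The translation step can be sketched in Python as a six-frame translation with the standard genetic code, followed by a search for the longest stretch uninterrupted by a stop codon. The source does not state which tool was used to translate the clones, so the function names and the example sequence below are hypothetical, and this is only one plausible way to carry out the step.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def translate(dna):
    """Translate DNA in frame 1; codons with ambiguous bases become 'X'."""
    dna = dna.upper()
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X")
                   for i in range(0, len(dna) - 2, 3))

def six_frame(dna):
    """Return translations of all six reading frames, keyed '+1'..'+3', '-1'..'-3'."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames

def longest_open_segment(protein):
    """Longest run of residues not interrupted by a stop codon ('*')."""
    return max(protein.split("*"), key=len)

# Hypothetical clone fragment, not a real ESNS5 sequence.
clone = "ATGGCCAAATAGGG"
for frame, protein in six_frame(clone).items():
    print(frame, protein, longest_open_segment(protein))
```

A '*' in the frame that matches the viral protein, as in frame +1 of the toy fragment, corresponds to the kind of in-frame stop codon noted for ESNS5.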
Amino acid sequence alignment was performed using reconstructed ESNS5 and Rio Bravo virus NS5 RdRp (Rio) using Clustal Omega. Stars represent perfect homology, colons represent partial homology, and periods represent weak homology. Dashes represent deleted sequence.
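The symbols in the alignment can be reproduced for a pair of aligned sequences with a short Python sketch. The strongly and weakly conserved residue groups below are those commonly documented for the Clustal family of programs and are assumed here to match the Clustal Omega output; the function names and the toy sequences are hypothetical.

```python
# Residue groups assumed from Clustal documentation (':' strong, '.' weak).
STRONG = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
        "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]

def column_symbol(column):
    """Return '*', ':', '.', or ' ' for one alignment column."""
    residues = set(column.upper())
    if "-" in residues:
        return " "                      # a gap (deleted sequence) is never scored
    if len(residues) == 1:
        return "*"                      # identical residues
    if any(residues <= set(group) for group in STRONG):
        return ":"                      # strongly similar residues
    if any(residues <= set(group) for group in WEAK):
        return "."                      # weakly similar residues
    return " "

def consensus_line(seq_a, seq_b):
    """Build the symbol line for two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(column_symbol(a + b) for a, b in zip(seq_a, seq_b))

# Toy aligned fragments, not real ESNS5 or Rio Bravo virus residues.
print(repr(consensus_line("ASCW-", "ATSAK")))  # -> '*:.  '
```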
The Endeis spinosa cDNA library was constructed following RNA extraction and purification using a NucleoSpin RNA II kit. A DNase digest was performed to eliminate genomic DNA contamination. Poly(A)+ RNA was enriched from the total RNA using Dynabeads Oligo(dT)25 and clones were sequenced by Sanger sequencing. Interestingly, no 3’ poly(A) tails were identified in any of the ESNS5 clones. Instead, internal tracts of poly(A) sequences were observed. The majority of clones contained the following internal poly(A) tract: “aaaaaaggaaa”. Additional internal poly(A) tracts were also identified in ESNS5 clones, suggesting that the flavivirus sequences may not have derived from an endogenous gene. G+C content was calculated for the Endeis spinosa envelope (39.9%) and NS5 (41.1%) sequences; Endeis spinosa actin (GenBank accession number FN213165) (46.8%), histone H3 (GenBank accession number FJ862879) (47.5%), and 18S ribosomal RNA (GenBank accession number FJ862848) (51.2%); and the Rio Bravo virus genome (GenBank accession number AF144692) (43.2%). The flavivirus sequences had noticeably lower G+C content than the Endeis spinosa genes and were more similar to the G+C content of Rio Bravo virus.
The same laboratory that uploaded the Endeis spinosa cDNA library constructed and uploaded cDNA libraries for 11 other terrestrial and marine arthropods: Anurida maritima, Acerentomon franzi, Campodea fragilis, Lepismachilis y-signata, Speleonectes cf. tulumensis, Archispirostreptus gigas, Limulus polyphemus, Peripatopsis sedgwicki, Tigriopus californicus, Pollicipes pollicipes, and Triops cancriformis. Together, these libraries comprise 62,739 cDNA clones. No flavivirus sequences were identified in any of these cDNA libraries when tblastn was used to search the est database for sequences that match the RBV polyprotein. Laboratory contamination was therefore unlikely to be responsible for the presence of flavivirus sequences in the Endeis spinosa library.
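Both measurements in this paragraph, G+C content and the detection of internal poly(A) tracts, are simple to compute, as the following Python sketch shows. The minimum run length and the distance from the 3’ end used to call a tract "internal" are not given in the text; the defaults below are assumptions, as are the function names and the toy clone.

```python
import re

def gc_content(seq):
    """Percentage of G and C among unambiguous nucleotides (A, C, G, T, U)."""
    counted = [base for base in seq.upper() if base in "ACGTU"]
    if not counted:
        raise ValueError("sequence contains no unambiguous nucleotides")
    return 100.0 * sum(base in "GC" for base in counted) / len(counted)

def internal_a_runs(seq, min_run=5, edge=20):
    """Find runs of at least `min_run` A residues that end at least `edge`
    bases before the 3' end. Both thresholds are assumptions of this sketch."""
    seq = seq.upper()
    hits = []
    for match in re.finditer(r"A{%d,}" % min_run, seq):
        if match.end() <= len(seq) - edge:
            hits.append((match.start(), match.group()))
    return hits

# Toy clone containing the tract reported in the text, flanked by made-up bases.
clone = "GCGT" + "aaaaaaggaaa" + "CGTACGTACGTACGTACGTACGTA"
print(round(gc_content(clone), 1))
print(internal_a_runs(clone))
```

With these defaults, a poly(A) run located at the very 3’ end of a clone is not reported, which separates a terminal tail from the internal tracts described above.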
The phylogenetic position of ESNS5
In order to determine the phylogenetic relationship of ESNS5 with extant flaviviruses, a phylogenetic tree was constructed with homologous NS5 amino acid sequences from almost all known flaviviruses (Fig 2). The Maximum Likelihood method was employed using a JTT matrix-based model for amino acid alignments. Initial trees were obtained by applying the Neighbor-Joining method. The phylogeny supports a relationship between ESNS5 and cISFVs, and raises the possibility that cISFVs may have derived from a virus of a marine arthropod.
The evolutionary history of ESNS5 and almost all known flaviviruses was inferred by aligning homologous NS5 amino acid sequences using MUSCLE multiple sequence alignment and Gblocks 0.91b software, followed by the Maximum Likelihood method based on the JTT matrix-based model. The tree with the highest log likelihood (-27149.5525) is shown. The percentage of trees in which the associated taxa clustered together is shown next to the branches. Initial trees for the heuristic search were obtained by applying the Neighbor-Joining method to a matrix of pairwise distances estimated using a JTT model. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 74 sequences. All positions containing gaps and missing data were eliminated. There were a total of 496 positions in the final dataset. Evolutionary analysis was conducted in MEGA6.
Endeis spinosa is a member of the Pycnogonida, which are commonly known as sea spiders [21, 22]. These marine arthropods are ubiquitous, and various species are distributed around the world. Most species have a body only millimeters in length; however, giant sea spiders have been found in Antarctic waters. Genetic evidence suggests that Pycnogonida may be an ancient sister group to all living arthropods, a history that spans hundreds of millions of years.
It is unclear at this point if ESNS5 represents an active infection, a very old integration event, or an integration event from an extant but previously unidentified flavivirus. The cDNA library was constructed using Dynabeads Oligo(dT)25, which purify mRNA with high sensitivity and specificity. Viral RNA from infected sea spider tissue may have been co-purified through binding of internal poly(A) tracts, which were identified in many of the ESNS5 clones, to the Dynabeads. RNA with internal poly(A) tracts can contaminate mRNA during enrichment with oligo-dT, and this would be more likely if the RNA had a high copy number, such as a replicating viral genome [25, 26]. Among flaviviruses, 3’ poly(A) tracts are found only in the prototype tick-borne encephalitis virus strain Neudoerfl [27, 28]. Further, it is unlikely that ESNS5 represents laboratory contamination since flavivirus sequences were not identified in 11 other cDNA libraries that were constructed, sequenced, and uploaded by the same laboratory. The G+C content of the flavivirus sequences was also lower than that of Endeis spinosa genes and was closer to that of Rio Bravo virus.
It is theoretically possible that ESNS5 and its related envelope and MTase sequences represent a genome-wide integration event or multiple integration events. Genetic studies have estimated a long history of flaviviral infections in mosquitoes, and this analysis extends into the genomes of key disease vectors [13, 29]. PCR techniques and whole genomic sequencing of Ae. albopictus and Ae. aegypti revealed fragments of ancestral viral infections throughout both genomes [29, 30]. Approximately two-thirds of a flavivirus-like genome was identified in Ae. albopictus, integrated as a single open reading frame spanning NS1-NS4A. Ae. albopictus and Ae. aegypti share a stretch of sequence that resembles flavivirus NS1, but the two sequences do not appear to be phylogenetically related. Additional unique sequences, including a stretch of sequence resembling NS5, were identified in Ae. aegypti (AENS5) but not in Ae. albopictus.
This study identified a flavivirus sequence in a cDNA library from a marine arthropod and showed a phylogenetic relationship with classical insect-specific flaviviruses. Additional research is necessary to determine if marine arthropods can be and are currently infected with flaviviruses.
S1 Table. GenBank accession numbers of sequences analyzed in this study.
Conceived and designed the experiments: MJC. Performed the experiments: MJC. Analyzed the data: MJC. Contributed reagents/materials/analysis tools: MJC. Wrote the paper: MJC.
- 1. Kuno G, Chang GJ, Tsuchiya KR, Karabatsos N, Cropp CB. Phylogeny of the genus Flavivirus. J Virol. 1998;72(1):73–83. Epub 1998/01/07. pmid:9420202; PubMed Central PMCID: PMC109351.
- 2. Blitvich BJ, Firth AE. Insect-Specific Flaviviruses: A Systematic Review of Their Discovery, Host Range, Mode of Transmission, Superinfection Exclusion Potential and Genomic Organization. Viruses. 2015;7(4):1927–59. Epub 2015/04/14. doi: 10.3390/v7041927 pmid:25866904.
- 3. Calzolari M, Ze-Ze L, Vazquez A, Sanchez Seco MP, Amaro F, Dottori M. Insect-specific flaviviruses, a worldwide widespread group of viruses only detected in insects. Infect Genet Evol. 2015. Epub 2015/08/04. doi: 10.1016/j.meegid.2015.07.032 pmid:26235844.
- 4. Forrester NL, Guerbois M, Seymour RL, Spratt H, Weaver SC. Vector-borne transmission imposes a severe bottleneck on an RNA virus population. PLoS Pathog. 2012;8(9):e1002897. Epub 2012/10/03. doi: 10.1371/journal.ppat.1002897 pmid:23028310; PubMed Central PMCID: PMC3441635.
- 5. Vasilakis N, Deardorff ER, Kenney JL, Rossi SL, Hanley KA, Weaver SC. Mosquitoes put the brake on arbovirus evolution: experimental evolution reveals slower mutation accumulation in mosquito than vertebrate cells. PLoS Pathog. 2009;5(6):e1000467. Epub 2009/06/09. doi: 10.1371/journal.ppat.1000467 pmid:19503824; PubMed Central PMCID: PMC2685980.
- 6. Weaver SC, Vasilakis N. Molecular evolution of dengue viruses: contributions of phylogenetics to understanding the history and epidemiology of the preeminent arboviral disease. Infect Genet Evol. 2009;9(4):523–40. Epub 2009/05/23. doi: 10.1016/j.meegid.2009.02.003 pmid:19460319; PubMed Central PMCID: PMC3609037.
- 7. Weaver SC, Brault AC, Kang W, Holland JJ. Genetic and fitness changes accompanying adaptation of an arbovirus to vertebrate and invertebrate cells. J Virol. 1999;73(5):4316–26. Epub 1999/04/10. pmid:10196330; PubMed Central PMCID: PMC104213.
- 8. Coffey LL, Vasilakis N, Brault AC, Powers AM, Tripet F, Weaver SC. Arbovirus evolution in vivo is constrained by host alternation. Proc Natl Acad Sci U S A. 2008;105(19):6970–5. Epub 2008/05/07. doi: 10.1073/pnas.0712130105 pmid:18458341; PubMed Central PMCID: PMC2383930.
- 9. Bolling BG, Olea-Popelka FJ, Eisen L, Moore CG, Blair CD. Transmission dynamics of an insect-specific flavivirus in a naturally infected Culex pipiens laboratory colony and effects of co-infection on vector competence for West Nile virus. Virology. 2012;427(2):90–7. Epub 2012/03/20. doi: 10.1016/j.virol.2012.02.016 pmid:22425062; PubMed Central PMCID: PMC3329802.
- 10. Hobson-Peters J, Yam AW, Lu JW, Setoh YX, May FJ, Kurucz N, et al. A new insect-specific flavivirus from northern Australia suppresses replication of West Nile virus and Murray Valley encephalitis virus in co-infected mosquito cells. PLoS One. 2013;8(2):e56534. Epub 2013/03/06. doi: 10.1371/journal.pone.0056534 pmid:23460804; PubMed Central PMCID: PMC3584062.
- 11. Pettersson JH, Fiz-Palacios O. Dating the origin of the genus Flavivirus in the light of Beringian biogeography. J Gen Virol. 2014;95(Pt 9):1969–82. Epub 2014/06/11. doi: 10.1099/vir.0.065227-0 pmid:24914065.
- 12. Gould EA, de Lamballerie X, Zanotto PM, Holmes EC. Origins, evolution, and vector/host coadaptations within the genus Flavivirus. Adv Virus Res. 2003;59:277–314. Epub 2003/12/31. pmid:14696332.
- 13. Cook S, Moureau G, Kitchen A, Gould EA, de Lamballerie X, Holmes EC, et al. Molecular evolution of the insect-specific flaviviruses. J Gen Virol. 2012;93(Pt 2):223–34. Epub 2011/10/21. doi: 10.1099/vir.0.036525-0 pmid:22012464; PubMed Central PMCID: PMC3352342.
- 14. Gaunt MW, Sall AA, de Lamballerie X, Falconar AK, Dzhivanian TI, Gould EA. Phylogenetic relationships of flaviviruses correlate with their epidemiology, disease association and biogeography. J Gen Virol. 2001;82(Pt 8):1867–76. Epub 2001/07/18. pmid:11457992.
- 15. Grard G, Moureau G, Charrel RN, Holmes EC, Gould EA, de Lamballerie X. Genomics and evolution of Aedes-borne flaviviruses. J Gen Virol. 2010;91(Pt 1):87–94. Epub 2009/09/11. doi: 10.1099/vir.0.014506-0 pmid:19741066.
- 16. Marin MS, Zanotto PM, Gritsun TS, Gould EA. Phylogeny of TYU, SRE, and CFA virus: different evolutionary rates in the genus Flavivirus. Virology. 1995;206(2):1133–9. Epub 1995/02/01. pmid:7856087.
- 17. Holmes EC. Molecular clocks and the puzzle of RNA virus origins. J Virol. 2003;77(7):3893–7. Epub 2003/03/14. pmid:12634349; PubMed Central PMCID: PMC150674.
- 18. Li W, Cowley A, Uludag M, Gur T, McWilliam H, Squizzato S, et al. The EMBL-EBI bioinformatics web and programmatic tools framework. Nucleic Acids Res. 2015;43(W1):W580–4. Epub 2015/04/08. doi: 10.1093/nar/gkv279 pmid:25845596; PubMed Central PMCID: PMC4489272.
- 19. Castresana J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol Biol Evol. 2000;17(4):540–52. Epub 2000/03/31. pmid:10742046.
- 20. Tamura K, Stecher G, Peterson D, Filipski A, Kumar S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol. 2013;30(12):2725–9. Epub 2013/10/18. doi: 10.1093/molbev/mst197 pmid:24132122; PubMed Central PMCID: PMC3840312.
- 21. Stock JH. Experiments on food preference and chemical sense in Pycnogonida.
- 22. Arango CP, Wheeler WC. Phylogeny of the sea spiders (Arthropoda, Pycnogonida) based on direct optimization of six loci and morphology. Cladistics. 2007;23(3):255–93.
- 23. Raiskii AK, Turpaeva EP. Deep-sea pycnogonids from the North Atlantic and their distribution in the World Ocean.
- 24. Regier JC, Shultz JW, Zwick A, Hussey A, Ball B, Wetzer R, et al. Arthropod relationships revealed by phylogenomic analysis of nuclear protein-coding sequences. Nature. 2010;463(7284):1079–83. Epub 2010/02/12. doi: 10.1038/nature08742 pmid:20147900.
- 25. Beilharz TH, Preiss T. Widespread use of poly(A) tail length control to accentuate expression of the yeast transcriptome. RNA. 2007;13(7):982–97. Epub 2007/06/26. doi: 10.1261/rna.569407 pmid:17586758; PubMed Central PMCID: PMC1894919.
- 26. Nam DK, Lee S, Zhou G, Cao X, Wang C, Clark T, et al. Oligo(dT) primer generates a high frequency of truncated cDNAs through internal poly(A) priming during reverse transcription. Proc Natl Acad Sci U S A. 2002;99(9):6152–6. Epub 2002/04/25. doi: 10.1073/pnas.092140899 pmid:11972056; PubMed Central PMCID: PMC122918.
- 27. Mandl CW, Kunz C, Heinz FX. Presence of poly(A) in a flavivirus: significant differences between the 3' noncoding regions of the genomic RNAs of tick-borne encephalitis virus strains. J Virol. 1991;65(8):4070–7. Epub 1991/08/01. pmid:1712858; PubMed Central PMCID: PMC248839.
- 28. Wallner G, Mandl CW, Kunz C, Heinz FX. The flavivirus 3'-noncoding region: extensive size heterogeneity independent of evolutionary relationships among strains of tick-borne encephalitis virus. Virology. 1995;213(1):169–78. Epub 1995/10/20. doi: 10.1006/viro.1995.1557 pmid:7483260.
- 29. Crochu S, Cook S, Attoui H, Charrel RN, De Chesse R, Belhouchet M, et al. Sequences of flavivirus-related RNA viruses persist in DNA form integrated in the genome of Aedes spp. mosquitoes. J Gen Virol. 2004;85(Pt 7):1971–80. Epub 2004/06/26. doi: 10.1099/vir.0.79850-0 pmid:15218182.
- 30. Nene V, Wortman JR, Lawson D, Haas B, Kodira C, Tu ZJ, et al. Genome sequence of Aedes aegypti, a major arbovirus vector. Science. 2007;316(5832):1718–23. Epub 2007/05/19. doi: 10.1126/science.1138878 pmid:17510324; PubMed Central PMCID: PMC2868357.