Choclo virus (CHOV) recovered from deep metatranscriptomics of archived frozen tissues in natural history biorepositories

Background Hantaviruses are negative-stranded RNA viruses that can sometimes cause severe disease in humans; however, they are maintained in mammalian host populations without causing harm. In Panama, sigmodontine rodents serve as hosts to transmissible hantaviruses. Due to natural and anthropogenic forces, these rodent populations are having increased contact with humans. Methods We extracted RNA and performed Illumina deep metatranscriptomic sequencing on Orthohantavirus seropositive museum tissues from rodents. We acquired sequence reads mapping to Choclo virus (CHOV, Orthohantavirus chocloense) from heart and kidney tissue of a two-decade old frozen museum sample from a Costa Rican pygmy rice rat (Oligoryzomys costaricensis) collected in Panama. Reads mapped to the CHOV reference were assembled and then validated by visualization of the mapped reads against the assembly. Results We recovered a 91% complete consensus sequence from a reference-guided assembly to CHOV with an average of 16X coverage. The S and M segments used in our phylogenetic analyses were nearly complete (98% and 99%, respectively). There were 1,199 ambiguous base calls of which 93% were present in the L segment. Our assembled genome varied 1.1% from the CHOV reference sequence resulting in eight nonsynonymous mutations. Further analysis of all publicly available partial S segment sequences support a clear relationship between CHOV clinical cases and O. costaricensis acquired strains. Conclusions Viruses occurring at extremely low abundances can be recovered from deep metatranscriptomics of archival tissues housed in research natural history museum biorepositories. Our efforts resulted in the second CHOV genome publicly available. This genomic data is important for future surveillance and diagnostic tools as well as understanding the evolution and pathogenicity of CHOV.


Background
Hantaviruses are negative-stranded RNA viruses that can sometimes cause severe disease in humans; however, they are maintained in mammalian populations without causing harm. In Panama, sigmodontine rodents serve as hosts to transmissible hantaviruses. Due to natural and anthropogenic forces, these rodent populations are having increased contact with humans.

Methods
We extracted RNA and performed Illumina deep metatranscriptomic sequencing on Orthohantavirus seropositive museum tissues from rodents. We acquired sequence reads mapping to Choclo virus (CHOV, Orthohantavirus chocloense) from the heart and kidney tissue of a two-decade-old sample from a Costa Rican pygmy rice rat (Oligoryzomys costaricensis) collected in Panama. Reads mapped to the CHOV reference were assembled and then validated by visualization of the mapped reads against the assembly.

Results
We recovered a 91% complete consensus sequence from a reference-guided assembly to CHOV with an average of 16X coverage. The S and M segments used in our phylogenetic analyses were nearly complete (98% and 99%, respectively). There were 1,199 ambiguous base calls, of which 93% were present in the L segment. Our assembled genome varied 1.3% from the CHOV reference sequence, resulting in 11 nonsynonymous mutations. Further analysis of all publicly available partial S segment sequences supports a clear relationship between CHOV clinical cases and O. costaricensis acquired strains.

Conclusions
Viruses occurring at extremely low abundances can be recovered from deep metatranscriptomics of archival tissues housed in museums or biorepositories. Our efforts resulted in the second CHOV genome publicly available. This genomic data is important for future surveillance and diagnostic tools as well as understanding the evolution and pathogenicity of CHOV.


Author Summary
Hantavirus cardiopulmonary syndrome (HCPS) in Panama, caused by Choclo virus (CHOV, Orthohantavirus chocloense), is intimately linked to the primary mammalian reservoir host, the Costa Rican pygmy rice rat (Oligoryzomys costaricensis). Although the prevalence of hantavirus disease is relatively low in Panama, over a quarter of the country has the agroecological conditions that favor this rodent. In addition, serologic evidence suggests infections are under-reported. Sequence data of the pathogen and host collected across temporal and spatial scales are necessary for diagnostics, surveillance, and forecasting; however, only one complete genome is available in NCBI GenBank. By leveraging deep metatranscriptomics of archived frozen mammal tissues, we generated a low-coverage genome using a reference-guided assembly approach. Sequence data can be used to develop pan-hantavirus diagnostic tools to facilitate acquisition of more detailed genetic data from archival samples to increase our understanding of the evolutionary and population dynamics of rare and neglected hantaviruses.
Generating additional genomic sequence data will also be essential for developing a rigorous taxonomic framework to improve the understanding of hantavirus diversity and distribution.

Introduction
Hantaviruses are tri-segmented negative-stranded RNA viruses within the family Hantaviridae that can cause two severe diseases in human populations, namely hemorrhagic fever with renal syndrome (HFRS) and hantavirus cardiopulmonary syndrome (HCPS).
Hantaviruses in the Americas are more closely associated with HCPS, which is characterized by fever, headache, myalgia, hypotension, and thrombocytopenia that can progress to cardiopulmonary failure. The mortality rate for HCPS is estimated at 15-50% and varies among virus species and across countries [1][2][3][4][5].
Despite high mortality rates in humans, hantaviruses are maintained naturally in rodent populations and can persist for months to the lifetime of the animal [6]. Infected rodents shed virus through saliva, urine, and feces, which can form aerosols that can be inhaled by other rodents or humans [7]. In Panama, the Costa Rican pygmy rice rat (Oligoryzomys costaricensis) serves as a rodent reservoir for hantaviruses. O. costaricensis is susceptible to habitat conversion from natural to agricultural lands, which is hypothesized to increase rodent populations and human contact with infected rodents, ultimately increasing pathogen transmission opportunities
16-60% of the individuals were hantavirus seropositive depending on region [14], documenting that many mild or asymptomatic exposures were not accounted for in clinical case count data.
Despite being first reported more than 40 years ago, hantaviruses are often described as 'emerging' pathogens due to their increasing number of infections, global distribution, and great breadth of pathogen diversity [15]. Current proactive approaches aimed at pathogen prediction are utilizing hantaviruses as a model for understanding spillover events [16]. Diverse specimens of wild mammals archived in museum biorepositories over temporal and spatial scales are increasingly being utilized for surveillance and characterization of emerging diseases [17][18][19][20].
For instance, the first complete CHOV sequence was obtained from archival Oligoryzomys costaricensis (=fulvescens, [21]) splenic tissue (MSB:Mamm:96073) using Sanger sequencing [13]. There are 69 CHOV sequences available in NCBI GenBank. Of these, only nine are considered complete segments, eight of which are derived from the same voucher specimen (accessed August 10, 2023). Here, we deep sequenced archived mammalian tissue specimens to generate a metatranscriptome-assembled CHOV genome, doubling the total available CHOV genomes.

Sample acquisition
We

RNA extraction, amplification, and sequencing
We performed a total RNA extraction on frozen tissue using the QIAamp viral RNA minikit (Qiagen Inc, Cambridge, MA, USA) according to the manufacturer's instructions, with slight modifications. Briefly, the tissue was bead beaten using 0.

Phylogenetic analyses
Reference and for the Old World hantavirus Orthohantavirus seoulense (Seoul virus, SEOV) as an outgroup [30] (Table S1). For each segment, sequences were aligned with mafft v.7.487 [31] using automatically determined settings (i.e., mafft --auto). The alignment was trimmed with trimal v.1.4.rev22 [32] in automated1 mode with the additional removal of positions with <50% representation (--resoverlap) and sequences with <60% representation (--seqoverlap). The resulting alignments were 1,825 bp for the S segment, 3,618 bp for the M segment, and 6,562 bp for the L segment. A concatenated alignment of the 5,443 bp complete S and M segments was also generated. Maximum likelihood trees were built in IQ-Tree v.1.6.12 [33] with the GTR+GAMMA model with 10,000 ultrafast bootstraps and 10,000 bootstraps for the SH-like approximate likelihood ratio (SH-aLRT) and visualized in ggtree [34] for each segment and the concatenated alignment. A tanglegram for evaluation of phylogenetic concordance between segments was visualized in R v.4.2.2.
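The align, trim, and tree-building steps described above can be sketched as a short Python wrapper. This is only an illustration under stated assumptions, not the authors' script: the file names and the `build_commands` and `run_pipeline` helpers are hypothetical, the trimAl thresholds are written as 0.5 and 60 on the assumption that the <50% and <60% cut-offs map onto that tool's residue-overlap fraction and sequence-overlap percentage, the trimAl options are given in the single-dash form used by its command line, and `GTR+G` is assumed to be the IQ-Tree spelling of the GTR+GAMMA model.

```python
import subprocess


def build_commands(fasta: str, prefix: str) -> dict:
    """Return the three command lines for one segment (hypothetical file names)."""
    aligned = f"{prefix}.aln.fasta"   # MAFFT output (MAFFT writes to stdout)
    trimmed = f"{prefix}.trim.fasta"  # trimAl output
    return {
        # Automatic strategy selection, as in "mafft --auto"
        "mafft": ["mafft", "--auto", fasta],
        # automated1 mode plus overlap filters (0.5 and 60 are assumed encodings)
        "trimal": ["trimal", "-in", aligned, "-out", trimmed,
                   "-automated1", "-resoverlap", "0.5", "-seqoverlap", "60"],
        # 10,000 ultrafast bootstraps (-bb) and 10,000 SH-aLRT replicates (-alrt)
        "iqtree": ["iqtree", "-s", trimmed, "-m", "GTR+G",
                   "-bb", "10000", "-alrt", "10000"],
    }


def run_pipeline(fasta: str, prefix: str) -> None:
    """Run the three tools in order; requires them to be installed on PATH."""
    cmds = build_commands(fasta, prefix)
    aligned = cmds["trimal"][2]  # path that follows "-in"
    with open(aligned, "w") as handle:
        subprocess.run(cmds["mafft"], stdout=handle, check=True)
    subprocess.run(cmds["trimal"], check=True)
    subprocess.run(cmds["iqtree"], check=True)
```

Calling `run_pipeline("S_segment.fasta", "S")` would run the three tools for one segment; `build_commands` alone can be used to inspect the command lines without executing anything.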
Because the nucleocapsid protein is a primary detection marker for clinical diagnostics [35] and therefore the most abundant in sequence archives, we obtained 55 partial S segment sequences from GenBank by searching 'Choclo orthohantavirus' to explore strain diversity (Table S2). The sequences were aligned, trimmed, and filtered as above, resulting in a 585 bp alignment. A maximum likelihood tree was built under the GTR+GAMMA model in IQ-Tree with ultrafast and SH-aLRT bootstraps and visualized in ggtree as described above.

Spatial distribution
We acquired GPS localities for capture sites of 32 rodent voucher specimens (https://arctos.database.museum) and for the approximate residency of 21 clinical cases of HCPS (Table S2). Samples were plotted in R using ggmap v.3.0.2 [36]. Samples collected within 12 km were aggregated based on maximum spatial movements of close rodent relatives [37,38].
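The 12 km aggregation can be illustrated with a small distance-based grouping. The text does not state how the grouping was performed, so the sketch below is an assumption: it uses the haversine great-circle distance on a spherical Earth and single-linkage grouping (any two sites within 12 km end up in the same group, directly or through intermediate sites). The function names and the example coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def aggregate_sites(points, max_km=12.0):
    """Group point indices so that linked points are at most max_km apart.

    Single-linkage grouping via union-find (an assumed rule, not the authors').
    """
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(*points[i], *points[j]) <= max_km:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Hypothetical (latitude, longitude) pairs: two nearby sites and one distant site
sites = [(7.40, -80.40), (7.45, -80.42), (8.50, -80.00)]
groups = aggregate_sites(sites)
```

Single linkage can chain sites into groups wider than 12 km; a centroid-based rule would be a reasonable alternative if that matters for the map.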
Although we only present samples here with sequence data, previous surveillance efforts have found hantavirus seropositive O. costaricensis across its broad geographic range in Panama in five out of the nine ecoregions: Central American Atlantic Moist Forests, Isthmian-Pacific Moist Forests, Panamanian Dry Forests, Pacific Mangrove S. America, and Choco/Darién Moist Forests [39].

Results and Discussion
An outbreak from late 1999 to early 2000 of hantavirus cardiopulmonary syndrome (HCPS) in western Panama [9,10] has spurred over two decades of epidemiological and wildlife surveillance [12,39]. During those 20 years, 712 clinical cases of HCPS were reported in Panama [12] and >11,000 specimens of non-volant mammals with archived biological materials were contributed to museum repositories (https://arctos.database.museum/). Of these, just shy of 800 rodents have currently been screened for prior hantavirus infection using an IgG strip immunoblot assay [40] with an average seropositivity of 16% [39]. We requested samples designated hantavirus positive at the University of New Mexico Museum of Southwestern Biology for metatranscriptomic sequencing. Initial taxonomic assignment by Kraken2 classified <15 reads as Orthohantavirus in all samples except MSB:Mamm:131232, for which it identified 658 reads. Hantavirus-specific IgG antibodies can be detected up to six months after initial infection [41]. Given that 90% of the hantavirus-positive samples had 0 to 14 reads, there was likely no active infection.
We then took a reference-based mapping approach to maximize recovery of CHOV reads. We generated 113,237,341 reads from deep sequencing the metatranscriptome of MSB:Mamm:131232, of which 2,072 reads (0.002%) mapped to the CHOV reference. The average depth of coverage from the mapping-based alignment was 16X; however, there was variation between segments with the greatest coverage of the small (S) segment (25X) followed by the medium (M) segment (19X) and the large (L) segment (12X) (Fig 1). Due to the low abundance of sequencing reads mapping to CHOV, we implemented a 3X coverage threshold for base-calling in the consensus genome. If the coverage at a site was below 5X, we only called a base when the allele frequency was 100% at that site. Using these parameters, we recovered a 91% complete CHOV assembly. Similar coverage thresholds have aided in the recovery of more complete assemblies [42,43]. This was an improvement upon our initial de novo assembly, which was only 64% complete. Under these thresholds, there were 1,199 ambiguous base calls, of which 93% (1,114 bp) were in the L segment. Therefore, completeness of the L segment (83%) was less than the M (99%) and the S (98%) segments. While our L segment is not complete, this marks a major improvement compared to many of the hantaviruses deposited in GenBank, which are missing the L segment (ELMCV, LANV, and NYV) or for which only small portions of the L segment were sequenced (BCCV). The International Committee on Taxonomy of Viruses Hantaviridae Study Group is revisiting minimum sequence requirements for proper hantavirid classification [44].
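The base-calling rule described above (no call below 3X, and unanimous reads required below 5X) can be written as a small function. This is a minimal sketch, not the authors' pipeline: the pileup strings, the function names, and especially the majority rule applied at 5X and above are assumptions, since the text does not state the frequency cut-off used at higher depth.

```python
from collections import Counter

MIN_DEPTH = 3       # below this depth, no base is called (from the text)
STRICT_DEPTH = 5    # below this depth, all reads must agree (from the text)
MAJORITY = 0.5      # ASSUMPTION: cut-off used at >= 5X is not given in the text


def call_base(pileup: str) -> str:
    """Return the consensus base for one site, or 'N' if it is ambiguous.

    `pileup` is a hypothetical string of the read bases observed at the site.
    """
    bases = [b for b in pileup.upper() if b in "ACGT"]
    depth = len(bases)
    if depth < MIN_DEPTH:
        return "N"
    base, count = Counter(bases).most_common(1)[0]
    freq = count / depth
    if depth < STRICT_DEPTH:
        return base if freq == 1.0 else "N"
    return base if freq > MAJORITY else "N"


def completeness(consensus: str) -> float:
    """Fraction of consensus positions that received an unambiguous call."""
    return 1 - consensus.count("N") / len(consensus)


# Four hypothetical sites: 3X unanimous, 2X, 4X unanimous, 5X with one mismatch
consensus = "".join(call_base(p) for p in ["AAA", "AC", "GGGG", "TTTTC"])
```

Under these assumptions the four example sites give three called bases and one ambiguous position, so `completeness(consensus)` is 0.75.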
The addition of more complete or full genomes from additional hantavirus isolates will greatly aid in our ability to design PCR amplicon tiling array sequencing strategies that can lead to affordable and scalable sequencing efforts.
to those in SNV [45], should be investigated. Non-synonymous mutations can have effects on protein stability and structure or alteration of protein-protein interactions [46].
Phylogenies were built with the GTR+GAMMA model with 10,000 ultrafast bootstraps and 10,000 bootstraps for the SH-aLRT on single segment and concatenated alignments. In the tanglegram, lines connect the same taxa/tip in each tree to one another such that crossing lines suggest topological discordance.
Hantavirus evolution has likely been shaped by their co-evolutionary history with their rodent reservoirs [47]. The phylogenetic relationships between South American and North American hantaviruses from sigmodontine rodents (Fig 2B) are most likely derived from a complex history of co-speciation events and the biogeographic constraints that influenced rodent expansion into South America [48]. However, intra- and inter-lineage reassortment between closely related variants has been reported for several hantaviruses [49][50][51][52][53].
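The tanglegram reading described above, where crossing lines suggest topological discordance, can be made concrete by counting crossings between the tip orders of two trees. The sketch below is illustrative only: the function name and tip labels are hypothetical, and it counts crossings for two fixed tip orders, whereas tanglegram software typically rotates internal nodes to minimize crossings before the figure is interpreted.

```python
from itertools import combinations


def count_crossings(left_order, right_order):
    """Count pairs of tips whose connecting lines cross in a tanglegram.

    Two lines cross when the tips appear in opposite relative order in the
    two trees. Tips missing from either tree are ignored.
    """
    position = {tip: i for i, tip in enumerate(right_order)}
    ranks = [position[tip] for tip in left_order if tip in position]
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


# Hypothetical tip orders read top to bottom from two segment trees
s_tree = ["A", "B", "C", "D"]
m_tree_same = ["A", "B", "C", "D"]
m_tree_swapped = ["A", "C", "B", "D"]

concordant = count_crossings(s_tree, m_tree_same)
discordant = count_crossings(s_tree, m_tree_swapped)
```

Here the identical orders give no crossings and the swapped pair gives one. A non-zero count is only a prompt for closer inspection, since it can also arise from poorly supported nodes rather than reassortment.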
Financial Disclosure
National Science Foundation (https://www.nsf.gov) PIPP Phase I grant (NSF 2155222, JAC).

[8]. Choclo virus (CHOV, Orthohantavirus chocloense) was first identified by RT-PCR after an outbreak of HCPS in the agroecosystems of western Panama from December 1999 to February 2000 [9,10]. Monitoring of human and rodent populations in Panama over the past two decades has discovered multiple hantaviruses in Panama (i.e., Calabazo virus and Rio Segundo virus); however, CHOV is responsible for almost all human cases [11-13]. From 2001 to 2007 multiple community-wide surveys of western Panamanians without reported HCPS symptomology found
requested samples designated Orthohantavirus positive by an immunoglobulin G (IgG) serological screening test from the University of New Mexico Museum of Southwestern Biology for metatranscriptomic sequencing. One sample, MSB:Mamm:131232, resulted in sufficient sequencing depth to assemble a Choclo virus (CHOV, Orthohantavirus chocloense) RNA genome. This voucher specimen was from an adult female Costa Rican pygmy rice rat (Oligoryzomys costaricensis) collected in El Bebedero, Tonosi, Los Santos, Panama in January 2003. Identification of host species was based on morphological characters and was subsequently verified using mitochondrial cytochrome b region sequence data (GenBank accession OR365535).
8 g of 1.0 mm Zirconia beads (BioSpec Inc, Bartlesville, OK, USA) and 1.5 g of 2.3 mm Zirconia beads (BioSpec Inc) in 800 µl of AVL buffer using the Benchmark Bead Bug-6 homogenizer at a speed of 4,350 rpm for 45 sec for 2 cycles with a 1 min rest in between. Homogenates went through a series of centrifugation and transfers, first at 4,000 rpm for 7 min, then again at 7,000 rpm for 10 min to pellet debris. The clear lysate was transferred to a new tube with the RNA carrier and vortexed for 15 sec. The final RNA isolation was conducted per the manufacturer's protocol, including final elution with 50 µl nuclease-free water. Utilizing the Zymo RNA Clean & Concentrator-5 kit, extracted RNA was concentrated and treated with DNAse I utilizing on-column digestion following the manufacturer's protocol (Zymo Research, Irvine, CA, USA). The resulting RNA was depleted of ribosomal RNA for two hours and converted to cDNA using random hexamers, and i7 and i5 sequencing adaptors were added. Finally, individual samples were barcoded and amplified by PCR (7 cycles). All depletion and library preparation steps were conducted using the Zymo-Seq RiboFree Total RNA Kit (Zymo Research) following the manufacturer's recommended protocol for degraded RNA. Prepared libraries were normalized to 2 nM, pooled, and combined with PhiX control (v.3, Illumina Inc, San Diego, CA, USA) at a final concentration of 1%. Pooled libraries were loaded at a final concentration of 750 pM and sequenced on an Illumina NextSeq 2000 using a P3 2x150 kit (Illumina Inc). De-multiplexing, adapter trimming, and preliminary QC were conducted using the Dragen pipeline (v.1.3.0, Illumina Inc). Reads were submitted to the NCBI Sequence Read Archive (SRA) under BioProject PRJNA1015235.

Fig 1. Coverage plot of sequencing reads mapped to the S, M, and L segments of the CHOV reference genome.

Fig 2. Maximum likelihood phylogenies of the S and M segments and a concatenated alignment (A) and a tanglegram used to visualize possible reassortment histories (B).

Fig 3. Geographic distribution of CHOV sequences from 32 capture sites of reservoir host pygmy rice rats (O. costaricensis) and approximate residence of 21 clinical cases.
Although assumptions of reassortment events are often based on conflicting phylogenetic tree topologies, reassortment has also been demonstrated in in vitro experiments [54][55][56]. The high degree of genomic similarity in the S and L segments suggests the exchange of the M segment is more common and potentially beneficial [57]. We did not find evidence of reassortment between the two CHOV genomes (Fig 2B). With additional genomic variants, this question should be revisited with more robust analyses [58]. One outstanding question is the relatedness of hantaviruses isolated from rodents to human clinical cases. A phylogenetic analysis of all publicly available partial S segment CHOV sequences clearly demonstrates that viruses isolated from O. costaricensis, including the strain described here, are intimately related to all clinical strains from Panama (Fig 4). Both clinical and rodent CHOV strains were captured from Los Santos, Veraguas, and Coclé provinces; however, only rodent-acquired sequences were found in Panamá (Fig 3). This is reflective of the clinical disease burden, with the greatest number of cases in Los Santos (77%) followed by Veraguas (12%) and Coclé (7%) [12]. The strains from the province of Panamá form a well-supported clade demonstrating potential geographic substructure (Fig 4), which is congruent with previous findings [39]. More sequencing is needed to determine how closely these are related to clinical strains; however, only seven Panamá residents have reported HCPS in the last 20 years (1% of all cases) [12]. Five other sequences from Panamanian hantaviruses were isolated from the short-tailed cane mouse (Zygodontomys brevicauda) and the Chiriqui harvest mouse (Reithrodontomys creper), but these are likely representative of other hantaviruses (i.e., Calabazo virus and Rio Segundo virus) that have yet to be fully sequenced [44] or associated with human disease (Fig S2).

Fig 4. A maximum likelihood phylogeny of the partial S segment of 51 CHOV genomes demonstrates sequences isolated from O. costaricensis are associated with human disease.

Fig S1. A concatenated 12,005 bp alignment of the S, M, and L segments was used to infer a maximum likelihood phylogeny of New World Hantaviruses. The phylogeny was built with the GTR+GAMMA model with 10,000 ultrafast bootstraps and 10,000 bootstraps for the SH-aLRT. LANV, BCCV, ELMCV, and NYV were limited to just the complete S and M segments.

Fig S2. A maximum likelihood phylogeny of the partial S segment of 56 Panamanian hantaviruses. Sequences were obtained from sick human patients and three rodent hosts (Oligoryzomys costaricensis, Zygodontomys brevicauda, and Reithrodontomys creper).


Table S1. Orthohantavirus sequences obtained from GenBank.
Table S2. Partial S segment of 56 Panamanian Orthohantavirus sequences obtained from GenBank.
Front Cell Infect Microbiol. 2020;10: 460.
Mills JN, Parmenter CA, Ksiazek TG, Parmenter RR, Vande Castle JR, et al. The ecology and evolutionary history of an emergent disease: Hantavirus Pulmonary Syndrome.