Genetic Detection and Characterization of Lujo Virus, a New Hemorrhagic Fever–Associated Arenavirus from Southern Africa

Lujo virus (LUJV), a new member of the family Arenaviridae and the first hemorrhagic fever–associated arenavirus from the Old World discovered in three decades, was isolated in South Africa during an outbreak of human disease characterized by nosocomial transmission and an unprecedentedly high case fatality rate of 80% (4/5 cases). Unbiased pyrosequencing of RNA extracts from serum and tissues of outbreak victims enabled identification and detailed phylogenetic characterization within 72 hours of sample receipt. Full genome analyses of LUJV showed it to be unique and branching off the ancestral node of the Old World arenaviruses. The virus G1 glycoprotein sequence was highly diverse and almost equidistant from that of other Old World and New World arenaviruses, consistent with a potential distinctive receptor tropism. LUJV is a novel, genetically distinct, highly pathogenic arenavirus.


Introduction
Members of the genus Arenavirus, currently comprising 22 recognized species (http://www.ictvonline.org/virusTaxonomy.asp?version=2008), are divided into two complexes based on serologic, genetic, and geographic relationships [1,2]: the New World (NW) or Tacaribe complex, and the Old World (OW) or Lassa-Lymphocytic choriomeningitis complex that includes the ubiquitous arenavirus type-species Lymphocytic choriomeningitis virus (LCMV; [3]). The RNA genome of arenaviruses is bi-segmented, comprising a large (L) and a small (S) segment that each code for two proteins in an ambisense coding strategy [4,5]. Despite this coding strategy, the Arenaviridae are classified together with the families Orthomyxoviridae and Bunyaviridae as segmented single-strand, negative-sense RNA viruses.
Humans are most frequently infected through contact with infected rodent excreta, commonly via inhalation of dust or aerosolized virus-containing materials, or ingestion of contaminated foods [13]; however, transmission may also occur by inoculation with infected body fluids and tissue transplantation [17-19]. LCMV, which is carried by the ubiquitous host species Mus musculus and is hence found worldwide, causes symptoms in humans that range from asymptomatic infection or mild febrile illness to meningitis and encephalitis [13]. LCMV infection is only rarely fatal in immunocompetent adults; however, infection during pregnancy bears serious risks for mother and child and frequently results in congenital abnormalities. The African LASV, which has its reservoir in rodent species of the Mastomys genus, causes an estimated 100,000-500,000 human infections per year in West African countries (Figure 1). Although Lassa fever is typically subclinical or associated with mild febrile illness, up to 20% of cases may have severe systemic disease culminating in fatal outcome [20,21]. Three other African arenaviruses are not known to cause human disease: Ippy virus (IPPYV; [22,23]), isolated from Arvicanthis spp., and Mobala virus (MOBV; [24]), isolated from Praomys spp. in the Central African Republic (CAR); and Mopeia virus (MOPV), which like LASV is associated with members of the genus Mastomys and was reported from Mozambique [25] and Zimbabwe [26], although antibody studies suggest that MOPV and LASV may also circulate in CAR [27], where the geographies of these viruses appear to overlap (Figure 1). To date, there have been no published reports of severe human disease associated with arenaviruses isolated from southern Africa.
In September 2008 an outbreak of unexplained hemorrhagic fever was reported in South Africa [28]. The index patient was airlifted in critical condition from Zambia on September 12 to a clinic in Sandton, South Africa, after infection from an unidentified source. Secondary infections were recognized in a paramedic (case 2) who attended the index case during air transfer from Zambia, in a nurse (case 3) who attended the index case in the intensive care unit in South Africa, and in a member of the hospital staff (case 4) who cleaned the room after the index case died on September 14. One case of tertiary infection was recorded in a nurse (case 5) who attended case 2 after his transfer from Zambia to Sandton on September 26, one day before barrier nursing was implemented. The course of disease in cases 1 through 4 was fatal; case 5 received ribavirin treatment and recovered. A detailed description of clinical and epidemiologic data, as well as immunohistological and PCR analyses that indicated the presence of an arenavirus, are reported in a parallel communication (Paweska et al., Emerg. Inf. Dis., submitted). Here we report detailed genetic analysis of this novel arenavirus.

Results/Discussion
Rapid identification of a novel pathogen through unbiased pyrosequencing
RNA extracts from two post-mortem liver biopsies (cases 2 and 3) and one serum sample (case 2) were independently submitted for unbiased high-throughput pyrosequencing. The libraries yielded between 87,500 and 106,500 sequence reads. Alignment of unique singleton and assembled contiguous sequences to the GenBank database (http://www.ncbi.nlm.nih.gov/Genbank) using the Basic Local Alignment Search Tool (blastn and blastx; [29]) indicated coverage of approximately 5.6 kilobases (kb) of sequence distributed along arenavirus genome scaffolds: 2 kb of S segment sequence in two fragments, and 3.6 kb of L segment sequence in 7 fragments (Figure 2). The majority of arenavirus sequences were obtained from serum rather than tissue, potentially reflecting lower levels of competing cellular RNA in random amplification reactions.
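As an illustration of how scaffold coverage of this kind can be tallied, the following minimal Python sketch merges aligned fragment intervals and sums the covered positions. The fragment coordinates are hypothetical placeholders chosen only so that the totals match the 2 kb (S segment, two fragments) and 3.6 kb (L segment, 7 fragments) figures reported above; they are not the actual alignment positions, and the function name `merged_length` is likewise an assumption of this sketch.

```python
def merged_length(intervals):
    """Count positions covered by 1-based, inclusive intervals.

    Overlapping or directly adjacent intervals are merged so that no
    position is counted twice.
    """
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end + 1:
            # Close the previous merged block (if any) and open a new one.
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Interval overlaps or touches the current block: extend it.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total


# HYPOTHETICAL fragment coordinates (not taken from the study).
s_fragments = [(200, 1199), (2100, 3099)]
l_fragments = [(100, 599), (900, 1399), (1800, 2299), (2700, 3199),
               (3600, 4099), (4500, 4999), (5500, 6099)]

s_cov = merged_length(s_fragments)
l_cov = merged_length(l_fragments)
print(f"S segment: {s_cov / 1000:.1f} kb in {len(s_fragments)} fragments")
print(f"L segment: {l_cov / 1000:.1f} kb in {len(l_fragments)} fragments")
print(f"Total:     {(s_cov + l_cov) / 1000:.1f} kb")
```

Merging before summing matters when contigs and singleton reads overlap on the scaffold, since a naive sum of fragment lengths would overstate coverage.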

Full genome characterization of a newly identified arenavirus
Sequence gaps between the aligned fragments were rapidly filled by specific PCR amplification at both CU and CDC, with primers designed on the pyrosequence data. Terminal sequences were added by PCR using a universal arenavirus primer targeting the conserved viral termini (5'-CGC ACM GDG GAT CCT AGG C, modified from [30]) combined with 4 specific primers positioned near the ends of the 2 genome segments. Overlapping primer sets based on the draft genome were synthesized to facilitate sequence validation by conventional dideoxy sequencing. The accumulated data revealed a classical arenavirus genome structure: a bi-segmented genome that encodes, in an ambisense strategy, two open reading frames (ORFs) separated by an intergenic stem-loop region on each segment (Figure 2) (GenBank Accession numbers FJ952384 and FJ952385).
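The universal terminal primer quoted above contains two degenerate positions in IUPAC notation (M = A or C; D = A, G, or T), so it represents a small pool of concrete oligonucleotides. The following Python sketch, offered only as an illustration, expands the primer into that pool; the function name `expand_degenerate` is an assumption of this example and not part of the original work.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer):
    """Return every concrete sequence encoded by a degenerate primer."""
    bases = primer.replace(" ", "").upper()
    choices = [IUPAC[base] for base in bases]
    return ["".join(combo) for combo in product(*choices)]


# Primer as quoted in the text (5' to 3'), spaces kept for readability.
UNIVERSAL_PRIMER = "CGC ACM GDG GAT CCT AGG C"

variants = expand_degenerate(UNIVERSAL_PRIMER)
print(f"{len(variants)} concrete sequences:")
for seq in variants:
    print(seq)
```

With one two-fold and one three-fold degenerate position, the pool contains 2 x 3 = 6 sequences of 19 nucleotides each.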
Our data represent genome sequences directly obtained from liver biopsy and serum (case 2), and from cell culture isolates obtained from blood at CDC (cases 1 and 2), and from liver biopsies at NICD (cases 2 and 3). No sequence differences were uncovered between virus detected in primary clinical material and virus isolated in cell culture at the two facilities. In addition, no changes were detected among the viruses derived from these first three cases. This lack of sequence variation is consistent with the epidemiologic data, indicating an initial natural exposure of the index case, followed by a chain of nosocomial transmission among subsequent cases.

Lujo virus (LUJV) is a novel arenavirus
Phylogenetic trees constructed from full L or S segment nucleotide sequence show LUJV branching off the root of the OW arenaviruses, and suggest it represents a highly novel genetic lineage, very distinct from previously characterized virus species and clearly separate from the LCMV lineage (Figure 3A and 3B). No evidence of genome segment reassortment is found, given the identical placement of LUJV relative to the other OW arenaviruses based on S and L segment nucleotide sequences. In addition, phylogenetic analysis of each of the individual ORFs reveals similar phylogenetic tree topologies. A phylogenetic tree constructed from deduced L-polymerase amino acid (aa) sequence also shows LUJV near the root of the OW arenaviruses, distinct from characterized species, and separate from the LCMV branch (Figure 3C). A distant relationship to OW arenaviruses may also be inferred from the analysis of Z protein sequence (Figure S1). The NP gene sequence of LUJV differs from that of other arenaviruses by 36% (IPPYV) to 43% (TAMV) at the nucleotide level, and by 41% (MOBV/LASV) to 55% (TAMV) at the aa level (Table S1). This degree of divergence is considerably higher than the proposed cut-off values both within (<10-12%) and between (>21.5%) OW arenavirus species [31,32], and indicates a unique phylogenetic position for LUJV (Figure 3D). Historically, phylogenetic assignments of arenaviruses have been based on portions of the NP gene [1,33], because this is the region for which most sequences are known. However, as more genomic sequences have become available, analyses of full-length GPC sequence have revealed evidence of possible relationships between OW and NW arenaviruses not revealed by NP sequence alone [34]. Because G1 sequences are difficult to align, some have pursued phylogenetic analyses by combining the GPC signal peptide and the G2 sequence [16]. We included in our analysis the chimeric signal/G2 sequence (Figure 3E) as well as the receptor binding G1 portion (Figure 3F); both analyses highlighted the novelty of LUJV, showing it to be almost equidistant from OW and NW viruses.

Author Summary
In September and October 2008, five cases of undiagnosed hemorrhagic fever, four of them fatal, were recognized in South Africa after air transfer of a critically ill index case from Zambia. Serum and tissue samples from victims were subjected to unbiased pyrosequencing, yielding, within 72 hours of sample receipt, multiple discrete sequence fragments that represented approximately 50% of a prototypic arenavirus genome. Thereafter, full genome sequence was generated by PCR amplification of intervening fragments using specific primers complementary to sequence obtained by pyrosequencing and a universal primer targeting the conserved arenaviral termini. Phylogenetic analyses confirmed the presence of a new member of the family Arenaviridae, provisionally named Lujo virus (LUJV) in recognition of its geographic origin (Lusaka, Zambia, and Johannesburg, South Africa). Our findings enable the development of specific reagents to further investigate the reservoir, geographic distribution, and unusual pathogenicity of LUJV, and confirm the utility of unbiased high throughput pyrosequencing for pathogen discovery and public health.
Analogous to other arenaviruses, SKI-1/S1P cleavage C-terminal of RKLM 221 is predicted to separate mature G1 (162 aa, 18.9 kDa, pI = 6.4) from G2 (233 aa, 26.8 kDa, pI = 9.5) [52,53,64]. G2 appears overall well conserved, including the strictly conserved cysteine residues: 6 in the luminal domain, and 3 in the cytoplasmic tail that are included in a conserved zinc finger motif reported in JUNV [65] (Figure 4). G2 contains 6 potential glycosylation sites, including 2 strictly conserved sites, 2 semiconserved sites N 335 (absent in LCMVs and Dandenong virus; DANV [19]) and N 352 (absent in LATV), and 2 unique sites in the predicted cytoplasmic tail (Figure 4). G1 is poorly conserved among arenaviruses [16], and G1 of LUJV is no exception: it is highly divergent from, and shorter than, the G1 of the other arenaviruses. LUJV G1 contains 6 potential glycosylation sites in positions comparable to other arenaviruses, including a conserved site N 93 HS (Figure 4), which is shifted by one aa in a motif that otherwise aligns well with OW arenaviruses and NW arenavirus clade A and C viruses. There is no discernible homology to other arenavirus G1 sequences that would point to usage of one of the two identified arenavirus receptors: alpha-dystroglycan (α-DG) [66], which binds OW arenaviruses LASV and LCMV and NW clade C viruses OLVV and LATV [67], or transferrin receptor 1 (TfR1), which binds pathogenic NW arenaviruses JUNV, MACV, GTOV, and SABV [68] (Figure S2).
In summary, our analysis of the LUJV genome shows a novel virus that is only distantly related to known arenaviruses. Sequence divergence is evident across the whole genome, but is most pronounced in the G1 protein encoded by the S segment, a region implicated in receptor interactions. Reassortment of S and L segments leading to changes in pathogenicity has been described in cultured cells infected with different LCMV strains [69], and between pathogenic LASV and nonpathogenic MOPV [70].
We find no evidence to support reassortment of the LUJV L or S genome segment (Figure 3A and 3B). Recombination of glycoprotein sequence has been recognized in NW arenaviruses [14,16,33,34,71-73], resulting in the division of the complex into four sublineages: lineages A, B, C, and an A/recombinant lineage that forms a branch of lineage A when NP and L sequence is considered (see Figure 3C and 3D), but forms an independent branch in between lineages B and C when glycoprotein sequence is considered (see Figure 3D). While recombination cannot be excluded in the case of LUJV, our review of existing databases reveals no candidate donor for the divergent GPC sequence. To our knowledge, LUJV is the first hemorrhagic fever-associated arenavirus from Africa identified in the past 3 decades. It is also the first such virus originating south of the equator (Figure 1). The International Committee on the Taxonomy of Viruses (ICTV) defines species within the Arenavirus genus based on association with a specific host, geographic distribution, potential to cause human disease, antigenic cross reactivity, and protein sequence similarity to other species. By these criteria, given the novelty of its presence in southern Africa, its capacity to cause hemorrhagic fever, and its genetic distinction, LUJV appears to be a new species.

Sequencing
Clinical specimens were inactivated in TRIzol (liver tissue, 100 mg) or TRIzol LS (serum, 250 µl) reagent (Invitrogen, Carlsbad, CA, USA) prior to removal from BSL-4 containment. Total RNA extracts were treated with DNase I (DNA-free, Ambion, Austin, TX, USA), and cDNA was generated by using the Superscript II system (Invitrogen) and 100-500 ng RNA for reverse transcription primed with random octamers that were linked to an arbitrary, defined 17-mer primer sequence [74]. The resulting cDNA was treated with RNase H and then randomly amplified by the polymerase chain reaction (PCR; [75]), applying a 9:1 mixture of a primer corresponding to the defined 17-mer sequence and the random octamer-linked 17-mer primer, respectively [74]. Products >70 base pairs (bp) were selected by column purification (MinElute, Qiagen, Hilden, Germany) and ligated to specific linkers for sequencing on the 454 Genome Sequencer FLX (454 Life Sciences, Branford, CT, USA) without fragmentation of the cDNA [19,76,77]. Removal of primer sequences, redundancy filtering, and sequence assembly were performed with software programs accessible through the analysis applications at the GreenePortal website (http://156.145.84.111/Tools).
Conventional PCRs at CU were performed with HotStar polymerase (Qiagen) according to manufacturer's protocols on PTC-200 thermocyclers (Bio-Rad, Hercules, CA, USA): an enzyme activation step of 5 min at 95°C was followed by 45 cycles of denaturation at 95°C for 1 min, annealing at 55°C for 1 min, and extension at 72°C for 1 to 3 min depending on the expected amplicon size. A two-step RT-PCR protocol was also followed at CDC using Invitrogen's Thermoscript RT at 60°C for 30 min followed by RNase H treatment for 20 min. cDNA was amplified using Phusion enzyme with GC Buffer (Finnzymes, Espoo, Finland) and 3% DMSO with an activation step at 98°C for 30 sec, followed by 35 cycles of 98°C for 10 sec, 58°C for 20 sec, and 72°C for 1 min, and a 5 min extension at 72°C. Specific primer sequences are available upon request. Amplification products were run on 1% agarose gels, purified (MinElute, Qiagen), and directly sequenced in both directions with ABI PRISM Big Dye Terminator 1.1 Cycle Sequencing kits on ABI PRISM 3700 DNA Analyzers (Perkin-Elmer Applied Biosystems, Foster City, CA).

Sequence analyses
Programs of the Wisconsin GCG Package (Accelrys, San Diego, CA, USA) were used for sequence assembly and analysis; percent sequence difference was calculated based on Needleman-Wunsch alignments (gap open/extension penalties 15/6.6 for nucleotide and 10/0.1 for aa alignments; EMBOSS [78]), using a Perl script to iterate the process for all-versus-all comparison. Secondary RNA structure predictions were performed with the web-based version of mfold (http://mfold.bioinfo.rpi.edu); data were exported as .ct files, and layout and annotation were done with CLC RNA Workbench (CLC bio, Århus, Denmark). Protein topology and targeting predictions were generated by employing SignalP, NetNGlyc, and TMHMM (http://www.cbs.dtu.dk/services), the web-based version of TopPred (http://mobyle.pasteur.fr/cgi-bin/portal.py?form=toppred), and Phobius (http://phobius.sbc.su.se/). Phylogenetic analyses were performed using MEGA software [79].
Figure S1. Phylogenetic tree based on deduced Z amino acid sequence. In contrast to phylogenetic trees obtained with the other ORFs (Figure 2), poor bootstrap support (43% of 1,000 pseudoreplicates) for the branching of LUJV off the LCMV clade was obtained with Z ORF sequence. For GenBank accession numbers see Figure 2.
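The all-versus-all percent difference calculation described under Sequence analyses could be sketched along the following lines. This is a simplified Python illustration, not the EMBOSS program or the Perl script used in the study: it assumes a linear gap penalty rather than the separate gap open/extension penalties quoted above, uses arbitrary match/mismatch scores, and defines percent difference as the share of non-identical alignment columns (gaps included). The function names and the toy sequences are hypothetical.

```python
from itertools import combinations


def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment with a LINEAR gap penalty (a simplification).

    Returns the two aligned strings, with '-' marking gaps.
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal path.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1])
            i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_difference(a, b):
    """Percent of alignment columns that are not identical (gaps count)."""
    aligned_a, aligned_b = needleman_wunsch(a, b)
    if not aligned_a:
        return 0.0
    identical = sum(x == y for x, y in zip(aligned_a, aligned_b))
    return 100.0 * (1 - identical / len(aligned_a))


def all_vs_all(sequences):
    """Percent difference for every unordered pair of named sequences."""
    return {(p, q): percent_difference(sequences[p], sequences[q])
            for p, q in combinations(sorted(sequences), 2)}


# Toy sequences for demonstration only (not viral data).
toy = {"seq1": "ACGTACGT", "seq2": "ACGTTCGT", "seq3": "ACGACGT"}
for pair, diff in all_vs_all(toy).items():
    print(pair, f"{diff:.1f}%")
```

Reproducing the reported values would require affine gap penalties and the scoring matrices used by the original software; the sketch only shows the structure of the pairwise iteration.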