Exploring Genomic, Geographic and Virulence Interactions among Epidemic and Non-Epidemic St. Louis Encephalitis Virus (Flavivirus) Strains

St. Louis encephalitis virus (SLEV) is a re-emerging arbovirus in South America. In 2005, an encephalitis outbreak caused by SLEV was reported in Argentina. The reason for the outbreak remains unknown, but may have been related to virological factors, changes in vectors populations, avian amplifying hosts, and/or environmental conditions. The main goal of this study was to characterize the complete genome of epidemic and non-epidemic SLEV strains from Argentina. Seventeen amino acid changes were detected; ten were non-conservative and located in proteins E, NS1, NS3 and NS5. Phylogenetic analysis showed two major clades based on geography: the North America and northern Central America (NAnCA) clade and the South America and southern Central America (SAsCA) clade. Interestingly, the presence of SAsCA genotype V SLEV strains in the NAnCA clade was reported in California, Florida and Texas, overlapping with known bird migration flyways. This work represents the first step in understanding the molecular mechanisms underlying virulence and biological variation among SLEV strains.

. SLEV first reemerged in the central region of Argentina (Córdoba and Santa Fe Provinces) in 2002, when two cases of encephalitis and three fever cases were reported [5]. The first human epidemic of SLEV outside of North America was reported in 2005, when 47 laboratory-confirmed clinical cases of SLE, including nine fatalities, were reported in Córdoba Province (Argentina) [1].
The geographic distribution of SLEV encompasses tropical, sub-tropical and much of the temperate-tropical zones of the Western Hemisphere which includes most of the populated land masses of North and South America [6]. In the USA, SLEV is naturally maintained by mosquito-to-bird transmission involving several Culex (Cx.) spp. Mosquitoes and a variety of bird species, including the house sparrow (Passer domesticus), house finch (Haemorhousmexicanus) and mourning doves (Zenaidamacroura) [6].
Most phylogenetic relationships among SLEV strains are based on the complete envelope gene sequence [7]. A total of 8 genotypes are presently recognized (I-VIII) [8]. Genotypes I and II are prevalent in the USA, while the others are traditionally found only in Central and South America [8]. During the 2005 SLEV outbreak in Argentina two genotype III SLEV strains were isolated from Cx. quinquefasciatus mosquitoes [9]. Moreover, during the reemergence of SLEV in Buenos Aires province, molecular evidence suggested an association with same genotype III [10]. Although not completely understood, evidence supports a multicausal event for SLEV reemergence. Introduction of a novel more virulent strain, an increase of Cx. quinquefasciatus and Cx. interfor mosquito vector abundance, and highly susceptible avian host availability are potential explanations [11,12]. A retrospective study shows that no previous activity was documented for this genotype in Cordoba city prior to the 2005 outbreak [11]. Moreover, some biological differences among epidemic (CbaAr-4005, Ep) and non-epidemic (79V-2533, NEp) viral strains were detected. For example, house sparrows inoculated with SLEV CbaAr-4005 Ep developed higher and long lasting viremias [12].
In flaviviruses, several mutations have been associated with phenotype alteration, including those linked to virulence. Unfortunately, the genotypic evidence associated with a virulent SLEV phenotype is lacking. The main objective of this study was to characterize the complete genome of SLEV strains from Argentina, to identify molecular differences among Ep and NEp SLEV strains, and to associate these differences with ecologic, epidemiologic and geographic trends.

Viral strains
Two SLEV genotype III strains were completely sequenced (79V-2533 and CbaAr-4005). The 79V-2533 strain was isolated from Culex (Culex) spp. mosquitoes collected in Esperanza city (Santa Fe Province, Argentina) in 1978, during a non-epidemic period. The epidemic strain (CbaAr-4005) was isolated from Cx. quinquefasciatus mosquitoes collected during the SLEV human encephalitis outbreak of 2005. Viruses were propagated on VERO cell monolayers inoculated with 100μl of CbaAr-4005 or 79V-2533viral strains. Virus was harvested on the 6 th day post-infection (dpi) by centrifuging the supernatant after one freeze/thaw cycle.

Primer design and genome sequencing
The sequencing strategy and primer design to obtain the complete sequence for CbaAr-4005 and 79V-2533 strains were based on a consensus sequence generated from Kern217 (DQ525916.1 and NC_007580.2) and Argentine66 (AY632544.1).
For the 5 0 UTR amplification and sequencing, a commercial kit (Ambion #AM1700 First Choice RLM Race) was used. For the 3 0 UTR, the A-Plus Poly (A) Polymerase Tailing Kit (Epicentre Biotechnologies) was used. The manufacturer instructions were applied with the following exceptions: 2.5μl RNAase inhibitor (40U/μl), and 0.5μl A-Plus Poly A (4U/μl) were added and the reaction incubated at 37°C for 10min. The Titan One Tube RT-PCR Kit (Roche) was used for genomic amplification. For sequencing of the amplified fragments the same protocol as that for the 5 0 UTR was used.
The partial fragments generated for each strain were analyzed and individually selected for assembly to generate a consensus genome sequence using the SeqMan II (v. 5.03) program provided within the LaserGene (DNAStar) package. The complete genome sequences for both strains were submitted to GenBank with the accession numbers FJ753286.2 and FJ753287.2.

Multiple alignments and bioinformatics analyses
Multiple alignments of 29 analyzed sequences were generated by ClustalW [13] with the MEGA 6 program (http://www.megasoftware.net). Most of sequences analyzed possessed coding region sequences only, missing the 5 0 and 3 0 non coding regions as well as the last portion of NS5 gene. Therefore, for the bioinformatics analysis, all sequences were truncated in order to use the same sequence range.
The coding region was identified using Clone Manager (http://www.scied.com/pr_cmpro. htm) software. The polyprotein cleavage sites were determined following the method described by Rice and Strauss [14]. The SignalP-NN program was employed (http://www.cbs.dtu.dk/ services/) for proteolytic sites and protease recognition signals. A comparative study among other flaviviruses was carried out based on previously published findings [15]. In the Ep and NEp comparative analyses, conservative and non conservative amino acid substitutions were identified and classified according to the Dayhoff matrix.
The relative homology (RH) profile was calculated for the nucleotide and amino acid sequences of the 29 ORF sequences analyzed. These profiles reflect the conservation grade between different regions of genome and proteins. The relative homology was calculated by a strategy based in overlapping windows, using the consensus symbols obtained in the ClustalW alignments. The media value was calculated using the addition of the total values divided for the window size and plotted (Ghiringhelli, P.D. and Iserte, J.A., unpublished). For homology calculations between protein sequences, the window size was 11 residues and arbitrary values of +1 for identical residues, +0.5 and +0.25 for conservative changes, and -1 for non identical residues were assigned. The non coding regions were analyzed by Sequence Logos tool (http:// weblogo.berkeley.edu/logo.cgi) and the RNA secondary structure was modeled (MFold, http:// www.bioinfo.rpi.edu/~zukerm). Briefly, the Sequence Logo is a graphic representation of a multiple alignment of sequences. After the alignments were generated, the sequences were manually curated (using Dengue 2 virus (DENV2) strains (NC001474.2: DENV-2/16681 and DQ181806.1: DENV-2/Th2_0038_74) and SLEV CbaAr-4005 and 79V-2533 strains), the 5 0 UTR and 3 0 UTR were compared, and the different signals and structures were located [16].
Recombination in SLEV polyprotein open reading frame sequences was tested for with SBP and GARD software on Datamonkey web server [17].

Phylogenetic analyses
An alignment containing 29 SLEV strains (Table 1) and 3 sequences as the outgroup strains (West Nile virus, Japanese encephalitis virus, and Murray Valley Encephalitis virus) was generated using MEGA6 program (http://www.megasoftware.net/). The substitution model used in the phylogenetic reconstruction was GTR with a proportion of invariant sites and Gamma distributed rates. The substitution model was selected with jModelTest 2.1.5 [18]. Phylogenetic reconstruction was made by Bayesian inference using BEAST 2.0 [19]. Site model was partitioned by codon position. Substitution models in Beast were tested and compared with the model selected with jModelTest. Monte Carlo Markov chain (MCMC) length was 50 million, sampling every 1000. The Tracer program (http://tree.bio.ed.ac.uk/software/tracer/) was used to analyze convergence of MCMC. The tree presented is the maximum clade credibility tree summarized from MCMC data with a burn in of 10%. Phylogenetic reconstruction was also made in MEGA6 software using the Maximum Likelihood method, with 1000 replicates for bootstrap analysis.

Geographic distribution and genetic diversity
The association between geographic and genetic distance of the 29 SLEV strains examined was explored through symmetric classic Procrustes analysis [20]. Genetic and geographic distance matrices were constructed. Geographic coordinates were obtained by Google Earth application and geodesic distances were calculated. Genetic distances were calculated using Kimura 2-parameter as the evolutionary model under MEGA 6 software [21]. Distance matrices were used as input for ordination analysis. Non Metric Multidimensional Scaling (NMDS) of each distance matrix showed the ordination constellation and potential outlier in pairwise fashion. After removal of the outliers, the genotype distance's NMDS was transposed, rotated and scaled on the geographic distance`s NMDS. The selected statistics metrics were Procrustes's correlation (t0), sum of squared errors (m 2 ), scaling and rotation. The significance of t0 was tested against of the distribution pseudo Procrustes`s correlation (t) with10,000 permutations. All analysis were performed in R [22] with the Vegan package [20]. The workflow script to run this analysis on R is available in the supplementary data file provided (please see S1 File).

Association of mutations with virulence and viremia
We used bibliographic data available in order to determine biological features associated with all studied viral strains including: source of isolation (mosquito, bird, mammals), viremia in birds, and virulence in mice [12,[23][24][25]. SLEV strains were classified as high virema (HV) or low viremia (LV), and high pathogenicity (HP) or low pathogenicity (LP) ( Table 1). An amino acid change survey was carried out for each protein in order to find non conservative mutations through the ORF.
Multivariate analyses were employed to explore patterns between genomic mutations and biological characteristics. Analysis of 200 point mutations observed in a pool of 14 SLEV strains with respect to the multidimensional distribution of all descriptors (biological features) was carried out. As a consequence of the binary nature of the data (i.e: a Chi-square distance between mutation) a Correspondence Analysis was selected [26]. Once CA analyses were carried out, correlations between source of isolation (birds, mammals, mosquitoes), viremogenic capacity in birds (High and Low) and virulence in mice (High and Low) were tested. Spearman's correlation indexes and bootstrap significance were calculated using "corrgram" and "psych" packages in R. After multivariate analysis, selected mutations were tested in contingency tables for virulence and viremia using a Fisher's exact test and ODD ratio. These analyses were carried out with Infostat software package (www.infostat.com.ar).

Epidemic vs Non-Epidemic strains
Full length, consensus genome sequences from both SLEV strains examined (CbaAr-4005 and 79V-2533) were obtained using traditional RT-PCR and capillary sequencing. Each full length genome consisted of a total of 10936 nt. The ORF is composed of 3429 amino acids (10287 nt) while the 5 0 and 3 0 untranslated regions (5 0 UTR and 3 0 UTR) consisted of 98 nt and 548 nt, respectively. When comparing the Ep vs NEp SLEV strains, a total of 133 changes were observed at the nucleotide level resulting in17 amino acids substitutions (13% of the total amino acids). Of these 17, 10 non-conservative changes were found (59%) (Fig 1). Only four changes in the structural proteins were identified, with only one being non conservative (E 88 ) ( Table 1). The remaining amino acid substitutions were identified in nonstructural proteins. Of these changes, 9 were considered non conservative: NS1 70 , NS1 112 , NS1 290 , NS2A 82 , NS3 92 , NS4B 139 , NS5 178 , NS5 189 , NS5 507 ( Table 2). The relative homology profile carried out with all 27 SLEV strains analyzed indicated that amino acid mutations occurred along the entire genome (without inclusion of the Palenque strains). Hypervariable regions (HR 0.6) were detected only in NS1 and NS4B (Fig 1), while variable regions were present in the rest of the proteins. Most of the compared sequences showed indices 0.8, with few regions between 0.6 and 0.8, indicating high conservation among SLEV strains. When Palenque strains were included the appearance of some areas with more variability in NS3 and NS5 proteins were observed.

Protease cleavage sites
The identified cleavage sites remained preserved in most of the SLEV strains analyzed. Exceptions include the BeAn-247377 strain where the standard SWPAS site between NS2A and NS2B was found to be GWPAS, and LALGM between NS3 and NS4A changed to AAGKR in strain I MP115. Differences in amino acids at positions 2, 4, and 5 relative to the cleavage site were also detected. Less conservation can be found in the C protein of the Palenque strains where the conserved motif PSKKR/GGTRS changed to PSKKK/GGSGS. The protease involved in this cleavage is of viral origin, suggesting this change may be related with changes in the catalytic site of the enzyme by modifying its recognition site. The cleavage sites located between NS4A/2K remain conserved among the flaviviruses analyzed (ROCV, WNV, JEV, MVEV); while most other sites exhibited some degree of variation at amino acids 1 and 2 (Table 3).

Secondary structure and signals in non-coding regions
Only a few changes were detected between Ep and NEp strains in the non-coding sequences. A total of 26 transitions and 3 transversions (1 for the 5'UTR and 2 for the 3'UTR) were identified. For SLEV, only three strains have been sequenced in these regions: Kern217, Hubbard, and Imperial Valley. When comparing available sequences with those reported here (CbaAr-4005 and 79V-2533), only a few changes were observed; the changes are mainly transitions as only 11.5% of the changes were transversions and only one INDEL was present.
The predicted structure was made using MFold with the sequence data from strain 79V-25533;the results are shown in Fig 2. For both Ep and NEp strains, the same structures were obtained. For the 5 0 UTR and 3 0 UTR ends, both stem loop structures and pseudoknots were identified. For the 5 0 UTR, there are three stem loop structures (SLA, SLB, and capsid hairpin-cHP-), while for the 3 0 UTR, one large stem loop (SL) and one short stem loop (SSL) were  found close to the 3 0 end followed by two pseudoknot structures (PK1 and PK2) and a third stem loop. Fig 2 shows the conserved structures identified in non-coding regions for the 2 SLEV strains sequenced and 2 sequences of DENV2 (NC001474.2: DENV-2/16681 and DQ181806.1: DENV-2/Th2_0038_74). The 5 0 upstream UAG region (5 0 UAR) and 5 0 conserved sequence (5 0 CS) in the 5 0 UTR (Fig 2A), and the 3 0 upstream UAG region (3 0 UAR), the 3 0 conserved sequence (3 0 CS) and repeated conserved sequence 1 and 2 (RCS1 and RCS2) in the 3 0 UTR ( Fig  2B) were highlighted. The Sequence Logos from alignment of the respective portions are shown.

Pathogenicity and Viremia phenotypes
A total of 200 non-conservative mutations were mapped across 14 SLEV strains. Their association with biological features was investigated. According to the CA, the first two axes account for 84.8% of the total variation observed in the data set (Fig 3). High correlations between HV-HP and LV-LP mutations were observed. This pattern was confirmed through the Spearmans' correlation test, where strains that developed high viremia in birds were also found to be highly pathogenic in mice (see Fig 4 for Spearman's correlation index values). Interestingly positive and significant correlations were detected among strains isolated from Mml/LP and Mml/LV. Strains isolated from mosquitoes showed significant correlations with those HV and HP strains (Fig 4). However, no statistically significant differences were observed between

Phylogenetic analyses
Both Maximum Likelihood ( Fig 5A) and Bayesian analyses (Fig 5B) methods showed the same phylogenetic reconstruction. The best fit model for both methods was GTR+I+G (parameters: G = 1, I = 0.38). High confidence values per node were observed with bootstrap values ranging from 70-100. The tree presented (Fig 5B) is the maximum clade credibility tree summarized from MCMC data with a burn in of 10%.
No pattern was found among phenotype (high/low viremia in birds, high/low pathogenicity in mice, or source of isolation) and phylogenetic classification. Additionally, no patterns based on either the host or year of isolation were identified. However, two main groups reflecting geographic origin could be distinguished: North America/northern-Central America (NAnCA) (Genotypes I and II: USA, MEX and GTM) and South America/southern-Central America (SAsCA) (Genotypes III, IV, V, VI and VII: ARG, BRA, PERU, PAN and TRN) (Fig 5C). Occasional exceptions were found such as one strain from USA (Imperial Valley, CA) that clustered in the SAsCA group. Although CbaAr-4005 and 79V-2533 strains were close to the NAnCA cluster, they constituted a separate lineage (Fig 5). In order to corroborate this result, a Procrustes analysis was carried out. It showed a high correlation among geographic and genetic distance. The goodness of fit in the NMDS was 0.012 and 0.039 for geographic and genetic distances, respectively. The configuration obtained between NMDSs based on 10000 permutations was highly significant (t0 = 0.521, m 2 = 0.728 scale factor = 0.521, p-value = 0.001).

Discussion
The occurrence of human encephalitis outbreaks in Central and South America by flaviviruses is historically unusual. The first and largest encephalitis outbreak in human populations reported in South America was caused by Rocío virus in coastal region of southern Sao Paulo, Brazil. During that outbreak, a total of 465 cases with 61 deaths was recorded [27]. Beginning in 2002, human encephalitis caused by SLEV became frequent in Argentina and Brazil, with extra heamorrhagic manifestations reported in patients from Brazil [3]. Human outbreaks in populated areas of Argentina were reported in Córdoba (2005), Paraná (2006), Buenos Aires (2010), and San Juan (2012) [1,5,28]. Molecular evidence suggested some association between genotype III SLEV strains and encephalitis [9,10]. In this study, we explored potential molecular markers with epidemiological and biological features through in silico sequence comparison and bioinformatic analysis of epidemic and non epidemic SLEV strains isolated in the Americas.
The substitutions identified in the flavivirus structural proteins (C, prM and E), nonstructural proteins (NS1, NS2A, NS2B, NS3, NS4A, NS4B and NS5) and the 3 0 and 5 0 UTRs influence the biological (virulence, neuroinvasiveness, and viremia profiles) and adaptive behavior of the viruses [29,30]. In our study comparing the coding region of Ep and NEp SLEV strains, we detected 17 amino acid changes of which 59% were non-conservative (Fig 1). Some of these mutations are responsible for differences in biological behavior observed between Ep and NEp SLEV strains in animal models [12,[23][24][25].
This study explored the association between 200 non-conservative point mutations and biological characteristics. Although multivariate analysis suggested there were associations of pooled mutations with biological features (Fig 3), no statistically significant differences were detected when analyzing mutations individually. This could be explained by the low number of viral strains characterized. Additionally, the difficulty in identifying molecular markers related to particular phenotypic characteristics could be masked or lost by compensatory sequence mutations that modify the viral fitness when the strains are subjected to cell culture for isolation [31,32].
It was also observed that several strains showed changes at cleavage site residues but only when the viral serine protease is involved in the processing at that site (Table 3). In the P1 and P2 positions, a predominance of positively charged amino acid residues were observed, while a glycine residue (G) is present in positions P1 0 and P2 0 (site P2P1/P1 0 P2 0 ), in both DENV and WNV. It was also observed in both viruses, that an Arg or Lys was absolutely necessary for the cleavage at P1 or P2, while the Gly presence was not exclusively required for DENV [33]. Likewise, Arg or Lys residues were present in every strain of SLEV analyzed here. While WNV and DENV had a G residue at P1' and P2', a Lys was commonly found at P2' in the SLEV strains. A reverse genetic system would be necessary to analyze these changes and determine their impact on the phenotype. Of particular interest is the strong correlation detected between viremia in birds and pathogenicity in mice. It is known that SLEV viral strains are not able to cause neurological damage in birds but they do develop high viremias. This finding is contrary to what is observed during SLEV infection in mice, where viral loads are low but neurological signs are observed. Our findings suggest that mutations important for viral replication in birds are also significant in the neuroinvasion process in mammals. Experimental evidences suggests a role for E and nonstructural proteins during this process [34].
Two reports described the conserved UTR secondary structure of SLEV and other flaviviruses [35,36]. Because these structures are highly conserved among flaviviruses, it is possible to infer their functional importance in viral RNA replication [37] and translation [38]. For example, it has been reported that changes in the WNV 3 0 UTR (along with other mutations in the coding region) generate temperature sensitive phenotypes, reduce plaque production, and result in attenuated neurovirulence in rodents [39]. A high degree of sequence conservation was observed when comparing DENV-2 and SLEV strains (Fig 2). For the potential 5 0 UAR (5 0 -AGCAGGGAAUUACCCAAUG-3 0 ) found in the 5 0 UTR, complementary residues (underlined) within the 3 0 UAR (5 0 -CAGGAGAUCCCCUGCUUU-3 0 ) were identified. These regions likely come together to form the panhandle structure for circularization of the genomic RNA such as is observed for other flaviviruses [40]. The RCS1 and RCS2 are totally conserved between these viruses and are present in all mosquito-borne flaviviruses. They have been shown in DENV to be implicated in RNA synthesis [41]. However, the impact of subtle variations in these regions between strains is unknown and therefore, it is very important to undertake studies assessing the structure and function of these regions in SLEV strains with different phenotypes.
The phylogenetic analysis based on complete ORF data and Procrustes analysis indicates a strong association between SLEV strains with their geographic distribution with two main clades based on geography (North America /northern Central America and South America/ southern Central America) (Fig 5). However, some exceptions are observed indicating that viral flow among the SLEV strains across its geographic range does occur [42]. Because SLEV is maintained by mosquito vectors and avian hosts, the actual distribution of SLEV along its northward and southward dispersion could be explained by bird migration. The same dispersal mechanism was postulated for WNV and its distribution across the Americas [43]. Although not completely understood, birds migrate across the continent through 3 main flyways (Pacific Americas, Mississippi Americas and Atlantic Americas flyways) (Fig 5C). Interestingly, the presence of South American genotype V SLEV strains in North America has been reported in California, Florida and Texas [8] overlapping with migration flyways (Fig 5C). Conversely, North America genotype II was reported in Argentina and Brazil [8,11]. Based on previously available genetic data, Auguste et al. [42] suggested that SLEV may have originated in Brazil. However, recent evidence obtained from the isolation of a new SLEV variant (Palenque strain) indicates a Central America origin [44].
This study represents the first theoretical step in understanding the molecular mechanisms underlying the virulence features and biological variation among SLEV strains. We have identified highly variable sites and point mutations among high pathogenicity/high viremia and low pathogenicity/low viremia strains that may provide the foundation necessary for development of molecular biology tools such as reverse genetic systems for SLEV. These tools will allow us to identify molecular markers for SLEV virulence and to dissect the mechanisms driving these interactions.
Supporting Information S1 File. Procrustes analysis script to run on R. (PDF)