Chikungunya Virus 3′ Untranslated Region: Adaptation to Mosquitoes and a Population Bottleneck as Major Evolutionary Forces

The 3′ untranslated genome region (UTR) of arthropod-borne viruses is characterized by enriched direct repeats (DRs) and stem-loop structures. Despite many years of theoretical and experimental study, ongoing positive selection on the 3′UTR has never been observed in ‘real time,’ and the role of the arbovirus 3′UTR remains poorly understood. We observed a lineage-specific 3′UTR sequence pattern in all available Asian lineage sequences of the mosquito-borne alphavirus, chikungunya virus (CHIKV) (1958–2009), including complicated mutation and duplication patterns of the long DRs. Given that a longer genome is usually associated with less efficient replication, we hypothesized that the fixation of these genetic changes in the Asian lineage 3′UTR was due to their beneficial effects on adaptation to vectors or hosts. Using reverse genetic methods, we examined the functional importance of each direct repeat. Our results suggest that adaptation to mosquitoes, rather than to mammalian hosts, is a major evolutionary force on the CHIKV 3′UTR. Surprisingly, the Asian 3′UTR appeared to be inferior to its predicted ancestral sequence for replication in both mammals and mosquitoes, suggesting that its fixation in Asia was not a result of directional selection. Rather, it may have resulted from a population bottleneck during its introduction from Africa to Asia. We propose that this introduction of a 3′UTR with deletions led to genetic drift and compensatory mutations associated with the loss of structural/functional constraints, followed by two independent beneficial duplications and fixation due to positive selection. Our results provide further evidence that the limited epidemic potential of the Asian CHIKV strains resulted from founder effects that reduced their fitness for efficient transmission by mosquitoes there.


Introduction
Genetic change, which can lead to adaptation to new hosts or vectors, is a major cause of the emergence or re-emergence of arthropod-borne viral (arboviral) and other RNA viral diseases [1,2]. However, compared to the numerous investigations of point mutations within viral genomic open reading frames, the evolution and determinants of fitness of untranslated genome regions (UTRs) have received far less attention. The 3′UTRs of arboviral genomes exhibit large size variations, ranging from ~100 nt to more than 700 nt, and involving extensive substitutions, insertions and deletions even within viral species. This length variation suggests that the heterogeneous regions may not be essential for replication, a view supported by experimental studies with genetically engineered viruses lacking a large part of the 3′UTR that remain viable, albeit with different levels of attenuation [3–8]. However, these seemingly redundant sequences must play some role favored by natural selection, because otherwise longer genomes should theoretically be less efficiently replicated. Improved understanding of the forces driving the evolution of the arboviral 3′UTR is needed to provide important insights on its role in pathogenesis and host/vector adaptation.
An interesting observation is that the variable region in the 3′UTR often contains direct repeats (DRs) in the arboviral genera Alphavirus [9,10] and Flavivirus [11]. These DRs can be relatively conserved in closely related viruses, indicating that repeat duplication may serve as a major evolutionary mechanism for the 3′UTR, and that these DRs may have functional significance. Indeed, sequence comparisons of flaviviruses suggest that duplication of long repeat elements (LREs) and extensive deletions are the main evolutionary mechanisms of the 3′UTR [12,13]. Despite their high level of sequence diversity, secondary structure predictions suggest that the flavivirus 3′UTR comprises enriched stem-loop structures, with some conserved structural motifs observed in all species, suggesting functional selection [14]. Furthermore, sequence comparisons of different eco-groups suggest that mosquito-borne flaviviruses, which usually use multiple invertebrate and/or vertebrate host species, have a more diverse 3′UTR than tick-borne or non-vector-borne flaviviruses, which have more limited host ranges and/or transmission dynamics [11]. This raises the hypothesis that the DRs may interact with host/vector factors to maintain efficient replication in multiple hosts, and to facilitate the adaptation to new hosts.
An interaction between the viral 3′UTR and host proteins has been indicated by several kinds of data. First, the 3′UTRs in both alphaviruses [15–18] and flaviviruses [19] interact with cellular proteins (in both mosquito and mammalian cells) to directly or indirectly facilitate genome replication. It has been observed that several alphaviruses usurp the cellular HuR protein, which enhances mRNA stability and therefore inhibits viral RNA decay [16–18]. Additionally, arboviral 3′UTRs can encode microRNAs (miRNAs, such as observed in West Nile virus) which regulate cellular gene expression to enhance viral replication [20]. Finally, flaviviruses generate one or more small subgenomic flavivirus RNAs (sfRNAs), which are essential for pathogenicity in vertebrate cells and in mice [21]. These sfRNAs are collinear with the 3′UTR and produced by incomplete viral RNA degradation by the host 5′-3′ exonuclease XRN1, mediated by pseudoknot (PK) structures upstream of the 3′UTR [22,23]. Their observed functions include: 1) regulation of antigenome synthesis [24], 2) inhibition of the cellular exoribonuclease XRN1 and alteration of host mRNA stability [25], 3) evasion of the Type I Interferon response [26], and 4) RNA interference suppression in both mammalian and insect cells by inhibiting Dicer-mediated in vitro cleavage of double-stranded RNA [27].
Despite these advances in understanding the functional roles of arboviral 3′UTRs, there is no solid evidence relating the indels that appear to occur frequently during their evolution to any particular adaptation to a given host or vector. The extensive within-species diversity in the 3′UTR of the alphavirus chikungunya virus (CHIKV), especially lineage-specific DR patterns (revealed in this study; Fig. 1), is unique within this genus of mainly mosquito-borne viruses. Together with prior reconstructions of CHIKV evolutionary history [28], as well as the relative sequence conservation and comparability among lineages, this diversity in 3′UTR sequences provides a unique opportunity for understanding their evolution and functional importance.
Chikungunya virus (Togaviridae: Alphavirus) is transmitted among nonhuman primates and humans via Aedes spp. mosquitoes. It causes chikungunya fever, a febrile illness associated with debilitating arthralgia and rash [29]. Chikungunya virus has a single-stranded, positive sense RNA genome of ~12 kb, including a notably long 3′UTR ranging from ~500 to 700 nt. Enzootic in tropical and subtropical regions of Africa, CHIKV has emerged several times into a human-mosquito urban cycle to cause major epidemics both within and outside of Africa. Phylogenetic analyses suggest that the currently circulating CHIKV strains form three major geographic lineages, namely the entirely enzootic West African lineage, the East, Central and South African (ECSA) enzootic lineage, which includes the recently emerged epidemic strains responsible for Indian Ocean basin and Asian outbreaks, and the Asian lineage [28], which has been circulating in an Aedes aegypti-human cycle for over 50 years. Interestingly, CHIKV genome comparisons suggest lineage-specific 3′UTR structures, with the Asian lineage exhibiting a unique pattern of mutation, duplication, and insertion (Fig. 1). Although the most recent common ancestor of the Asian lineage is estimated to have occurred in the early 1950s, just before the 1956 Thailand outbreak, it is not clear when this lineage was introduced from eastern Africa into Asia or whether the distinct mutations and structural rearrangement in the Asian 3′UTR occurred before or after this introduction and establishment of the urban cycle. It is also unknown whether this novel Asian 3′UTR structure is the result of adaptation to the urban transmission cycle there, such as to the urban vector A. aegypti implicated in all Asian outbreaks prior to 2007. (Although CHIKV antibodies have been detected in nonhuman primates in Asia [30], spillback from human transmission cycles is difficult to rule out.)
To address these questions related to evolution of the 3′UTR and its potential influence on the epidemic potential of the Asian CHIKV strains, we dissected the inferred structural changes in the 3′UTR of the Asian CHIKV lineage and explored their effects on replication in vertebrate hosts and vectors. Our findings provide important insights into the functional role of the mosquito-borne arbovirus 3′UTR.

Author Summary
The 3′ untranslated genome region (UTR) of arbovirus (arthropod-borne virus) RNA genomes is characterized by enriched direct repeats (DRs) and stem-loop structures, which are thought to play a specific role in maintaining efficient transmission in multiple hosts. However, this hypothesis is vague and has little experimental support. Based on our observation of a distinct, lineage-specific sequence structure in the chikungunya virus (CHIKV) Asian lineage 3′UTR, we tried to understand the underlying driving forces on arboviral 3′UTR evolution. Specifically, we sought to determine whether the dramatic genetic changes in the Asian lineage 3′UTR were the result of adaptation to new host or vector populations. Using reverse genetic methods, we obtained results suggesting that the DRs have a significant effect on viral fitness in mosquitoes but not in mammals. Interestingly, instead of directional selection from its ancestral state, our results suggest that the evolution of the CHIKV Asian lineage 3′UTR involved a population bottleneck with a deleterious deletion of DRs, followed by accumulation of mutations due to the loss of structural/functional constraints. Later, the duplication of two regions that are beneficial in mosquitoes led to the fixation of this 3′UTR sequence in the Asian lineage, possibly facilitating the 1958 Thailand outbreak.

Lineage-specific Direct Repeat structure in the CHIKV 3′UTR
To explore the repeat structure in the CHIKV genome, DNA matrix comparisons were conducted based on representative strains from each of the three major lineages (West African, ECSA and Asian). The results suggested that the 3′UTR, but not other genome regions, contains multiple DRs, with the Asian lineage having a distinct pattern (Fig. S1). Imposing these repeat patterns onto the rough sequence alignment generated from the guide tree based on the complete open reading frame sequences led to a refined and reliable alignment with striking lineage-specific structures and minor indels within each lineage. The complete sequence alignment is available upon request, with a simplified version shown in Figure S2. As illustrated in Fig. 1, the CHIKV 3′UTR contains two DR elements consistent in the West African and ECSA lineages, namely DR1 (39 nt, two copies) and DR2 (62 nt, three copies). However, the Asian 3′UTR is distinct, including 1) a long insertion (193 nt) near the 3′ end, which is the result of the direct duplication of its 5′-adjacent region (Fig. 1, shaded blue; the duplicated region is hereafter designated DR3), and 2) accumulated mutations (point mutations and insertions) around DR2a, including the DR1a region, and the duplication of this entire region [hereafter designated as DR(1+2)] to replace the DR1b/DR2b region, or vice versa. Previous studies annotated the alphavirus 3′UTR into three repeat sequence elements (RSEs) and a 19 nt conserved sequence element (CSE) at the 3′ end [9]. The DR2 found in our study corresponds to the RSEs defined previously [18]. However, DR1 and DR3 have not been described previously, and we found that the 19 nt CSE is not strictly conserved, with occasional mutations that change its length observed among sequences. Interestingly, DR3 is immediately adjacent to the 19 nt CSE, which is believed to be essential in viral replication [31].
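As a rough illustration of how a DNA matrix (dot-plot) comparison can expose direct repeats, the sketch below records every exact k-mer shared between two positions of one sequence and merges consecutive hits on the same diagonal into a single repeat. It is a minimal sketch and not the software used in this study: the function name, the toy sequence, and the choice of exact matching with k = 8 are assumptions made for illustration, and repeats that carry point mutations (as the Asian lineage DRs do) would require a mismatch-tolerant comparison.

```python
from collections import defaultdict


def find_direct_repeats(seq, k=12):
    """Return (start_1, start_2, length) for exact direct repeats of length >= k.

    Each shared k-mer is one "dot" in a self-comparison matrix; runs of dots
    on the same diagonal are merged into one maximal repeat.
    """
    seq = seq.upper()
    positions = defaultdict(list)
    for i in range(len(seq) - k + 1):
        positions[seq[i:i + k]].append(i)

    # All off-diagonal dots (i < j) where the k-mers at i and j are identical.
    hits = {
        (i, j)
        for pos in positions.values()
        for a, i in enumerate(pos)
        for j in pos[a + 1:]
    }

    repeats = []
    for i, j in sorted(hits):
        if (i - 1, j - 1) in hits:
            continue  # this dot lies inside a diagonal run, not at its start
        length = k
        # Walk down the diagonal while the next dot is present.
        while (i + length - k + 1, j + length - k + 1) in hits:
            length += 1
        repeats.append((i, j, length))
    return repeats


# Hypothetical toy sequence: one 14-nt unit present twice, with unrelated flanks.
unit = "ACGTTGCAAGGCTA"
toy = "TTTT" + unit + "CCCCC" + unit + "GGG"
print(find_direct_repeats(toy, k=8))
```

Exact k-mer matching keeps the example short; lowering k makes the search more sensitive at the cost of more spurious short matches.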
To determine if these DRs form any structural/functional units, as suggested in flaviviruses [11,14], RNA secondary structures were predicted via Mfold [32], STAR [33], and Vienna [34,35], at both 37 °C (typical primate host temperature) and 28 °C (typical mosquito vector temperature in the tropics). These programs generated slightly different secondary structures (data not shown). In most predictions the folds of different copies of DR1 and DR2 differed, and the overall structures generally differed at 28 °C and 37 °C (except those estimated by Vienna). Figure S3 shows a sample of top-ranked structures produced by Mfold. In summary, there are many short stem-loop structures distributed throughout the 3′UTR, and they form the basis for a higher-level secondary structure. Despite the many stem-loop structures in DR1 and DR2, the repeating elements themselves may not necessarily correspond to a specific uniform structure. This interpretation is consistent with previous analyses of flaviviruses [11]. According to Mfold, this region may also fold into different structures at 37 °C and 28 °C, with the former more compact and the latter looser. In contrast, the DR3 region is relatively conserved and contains a distinct Y-shaped structure of 80 nt (Fig. S3, dark blue) conserved in all three CHIKV lineages at both 37 °C and 28 °C, suggesting its functional importance. Notably, duplication of DR3 in the Asian lineage added another copy of this Y-shaped structure.

3′UTR differentially affects replication kinetics in C6/36 versus Vero cells
To evaluate the effect of the 3′UTR on CHIKV replication, we engineered a series of mutant viruses based on two wild-type (wt) CHIKV strains, the Mal06 strain (MY002IMR/06/BP; GenBank Acc. No. EU703759.1), a representative of the Asian lineage, and the SL07 strain (SL-CK1; GenBank Acc. No. HM045801.1), representing the ECSA lineage. Fig. 2A illustrates the genetic organization of the engineered viruses, including 1) the wt Mal06 and SL07, 2) a modified version of each including synonymous mutations as genetic markers, 3) chimeras of each strain with swapped 3′UTRs, and 4) Mal06 variants with either 1 or 2 copies of DR(1+2) or DR3 deleted. Their fitness levels were first compared through replication kinetics in both Vero (African green monkey) and C6/36 (A. albopictus) cells, and then further evaluated through competition tests in vivo.
Comparing the replication kinetics of these CHIKV strains provided interesting insights into the role of the 3′UTR in mammalian and insect cells (Fig. 2B). First, viruses from both the ECSA and Asian lineages replicated rapidly, reaching peak titers 24–48 h post-infection. Variants derived from the ECSA SL07 strain showed significantly (~3–4 log difference) higher replication in both C6/36 and Vero cells. These results are consistent with the observation that the Indian Ocean lineage (IOL) strains have been rapidly replacing the local Asian lineage strains in Southeast Asia since 2006 [36]. Our finding that chimeric viruses with 3′UTRs derived from a different lineage exhibit no significant fitness change in Vero cells and only a slight fitness change in C6/36 cells also suggests that the ORF, rather than the 3′UTR, is the main determinant of CHIKV replication efficiency. Strikingly, the Asian Mal06 strain variants with altered 3′UTRs exhibited significantly different replication kinetics in C6/36 cells, with RNA copies per mL at 24 h post-infection ranging from the lowest level for Del-DR(1+2)ab up to 9.2 × 10⁶ for the wt. In contrast, no significant difference was observed for any Mal06 variants at either 24 h (2.4–4.7 × 10⁷ copies/mL) or 48 h post-infection (4.8–8.9 × 10⁷ copies/mL) in Vero cells, indicating that the CHIKV 3′UTR may play a more important role in interacting with cell factors in mosquito than in mammalian hosts. These results also suggest that genetic variation in the 3′UTR does not have a major effect on its interaction with viral proteins and/or RNA. Consistent with reports from other arboviruses [3–8], CHIKV with only a partial 3′UTR was still viable in cell cultures, but with altered replication kinetics, especially in mosquito cells. In C6/36 cells, deleting either one or both copies of DR(1+2) led to a significant reduction in the replication rate, with the deletion of both copies having the most severe effect. These data indicate that the role of DR(1+2) does not rely on the presence of two copies, such as in the formation of a dimer proposed previously [12].
Similarly, deleting both copies of DR3 in the Asian Mal06 strain also led to a severe reduction in its replication rate in mosquito cells (Fig. 2). The fitness loss from deletion of both DR3 copies was intermediate between that resulting from deleting one copy of DR(1+2) and deleting both DR(1+2) copies. Interestingly, deleting only one copy of DR3 from the Mal06 strain did not significantly change its replication kinetics in C6/36 cells.

Competition tests to assess fitness effect of DRs and to compare the Asian and ECSA 3′UTRs
Because replication kinetics comparisons are subject to intrinsic experimental error caused by 1) small variations in the viral titer of initial inocula, 2) variation in cell density among triplicate test samples, and 3) RNA quantification, competition tests were conducted to more sensitively compare fitness levels between virus pairs. To determine the underlying reason for the fixation of the Asian lineage CHIKV 3′UTR, we used competition experiments to evaluate its fitness for infection and dissemination in A. aegypti, the main Asian mosquito vector during evolution of the Asian lineage (1950s–2007), and for viremia in a surrogate vertebrate host, CD1 mice infected at 11–12 days of age [37]. The Mal06 variants containing a deletion of one copy of either DR(1+2) or DR3 were competed against the genetically marked wt strain, and the results are shown in Fig. 3. Control competitions indicated that the synonymous genetic marker did not significantly influence viral fitness in mosquitoes [1] or CD1 mice (Fig. 3B).
Strikingly, deleting DR3a in the Asian Mal06 strain produced contrasting effects in mosquito versus mammalian hosts. In mosquitoes, only wt virus was detected in the majority (23/30) of the mosquito heads 10 days post-infection (dpi) after a mixed bloodmeal, whereas virus with only one copy of DR3 was found in only 5/30 samples, with two of them showing a mixture of both competitors (Fig. 3C). In contrast, Mal06 with a deletion of DR3a outcompeted the wt virus in CD1 mice, as indicated by virus ratios in all the blood samples taken 1–2 dpi (Fig. 3C). This result was consistent with our hypothesis that the direct effects of the 3′UTR can outweigh the potential detrimental effects of genome length to determine CHIKV fitness.
Similarly, deleting one copy of DR(1+2) led to a significant CHIKV fitness loss in mosquitoes, and a slight advantage in mice. Specifically, following mosquito infection and dissemination, and assay at 10 dpi, 14 of 15 infected mosquito heads contained only the wt Mal06 virus, whereas only one was infected by the mutant [Mal06/ΔDR(1+2)a] (Fig. 3D). Despite the nearly equal RNA ratio between the two viruses during the first two days after infection of mice, the mutant virus with only one copy of DR(1+2), and thus a shorter genome, showed a significantly higher prevalence at 3 dpi, indicating its selective advantage at later stages of infection (Fig. 3D). The less dramatic effect of deleting DR(1+2)a compared to DR3a may be due to its shorter length (155 vs. 193 nt). In addition, the initial inoculum ratio of Mal06/ΔDR(1+2)a was slightly lower in the competition test in CD1 mice, which may also have influenced the outcome. In conclusion, retaining two copies of DR(1+2) provided a selective advantage to CHIKV in mosquitoes but not in vertebrates, compared to only one copy.
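To show the kind of calculation that underlies such a comparison, the sketch below runs a chi-square goodness-of-fit test on the 14 versus 1 mosquito heads reported above. It assumes a null expectation of equal numbers of heads for each competitor (motivated by the 1:1 initial genome-copy ratio) and one degree of freedom; the exact test formulation used in the study is not stated in this section, so this is an illustrative sketch only, and the function name is hypothetical.

```python
import math


def chi_square_equal_split(count_a, count_b):
    """Chi-square goodness-of-fit test against an assumed 1:1 expectation.

    Returns (chi-square statistic, p-value) with one degree of freedom.
    """
    expected = (count_a + count_b) / 2
    chi2 = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # For 1 degree of freedom the chi-square survival function reduces to
    # erfc(sqrt(chi2 / 2)), so no external statistics library is needed.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# 14 heads with wt only versus 1 head with the DR(1+2)a deletion mutant.
chi2, p_value = chi_square_equal_split(14, 1)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

Samples containing a mixture of both competitors, as seen in the DR3a competition, would need an explicit counting rule before a test of this form could be applied.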
Despite the selective advantage for infection and dissemination in mosquitoes of having two DR3 copies, it is not clear whether the fixation of the current CHIKV Asian lineage 3′UTR was due to an improved fitness level compared to its ECSA ancestor, which parsimony analysis predicted shared the 3′UTR seen in extant ECSA strains. To address this question, chimeric viruses were generated with backbones from the Mal06 strain (Asian lineage) and SL07 strain (ECSA lineage) and swapped 3′UTRs, and their relative fitness levels were compared using competition tests. Surprisingly, the chimeric virus Mal06/SL07 3′UTR outcompeted the Mal06 wt strain in both mosquitoes and mice (Fig. 3E). In contrast, the chimeric virus with the SL07 backbone and the Mal06 3′UTR exhibited lower fitness than its wt ECSA counterpart (Fig. 3F).

Evolution path of the CHIKV Asian lineage 3′UTR
Investigations of the function and evolution of 3′UTRs in arboviruses, including repeat identifications [9,10,13] and secondary structure predictions [12,14,38], as well as experimental studies [3–8,39,40], have taken place for many years. However, ongoing positive selection on the 3′UTR has never been observed in 'real time,' and the role of the 3′UTR remains poorly understood. A distinct 3′UTR sequence pattern was observed in all Asian lineage CHIKVs sampled from 1958 to 2009. As shown in Fig. 1, this Asian lineage 3′UTR contains an insertion of 193 nt at the 3′ end, the result of a direct duplication of the 5′-adjacent UTR region. In addition, both copies of DR(1+2) contain the same accumulated mutations relative to the corresponding regions of the ECSA and West African UTRs. No intermediate 3′UTR form has been observed in CHIKV sequences from other lineages, raising several interesting questions: When did the unique pattern of the Asian 3′UTR arise, what caused it, and why did it become fixed in Asia? Was the current Asian lineage 3′UTR formed before or after its introduction into Asia? If it was formed before the introduction, why has it apparently disappeared from Africa? Or if it was formed after the introduction into Asia, what fitness advantage did it provide over the ancestral UTR?
Our results offer some insights into these questions. First, the 3′UTR from the ECSA lineage has significantly higher fitness than that of the Asian lineage in both A. aegypti and mice when placed into either the ECSA or Asian genetic backbone. This suggests that the fixation of the Asian 3′UTR was not due to an increased fitness level compared to its ancestor, and was not likely a result of directional (positive) selection. Next, the duplication of the Asian DR3 imparts increased fitness in mosquitoes, indicating its selective advantage in transmission. Similarly, the deletion of one copy of the DR(1+2) region leads to reduced fitness in mosquitoes but slightly higher fitness in mice. Finally, in contrast to the compact stem-loop structures observed in West African and ECSA 3′UTRs, part of the Asian DR(1+2) region contains a fragment of linear sequence that is not predicted to form a stable stem-loop structure, indicating that the mutations in the DR(1+2) region may have been tolerated due to its lack or loss of a structural/functional constraint.
Based on this, we propose an evolutionary path of the Asian CHIKV 3′UTR, illustrated in Fig. 4. First, a deletion occurred that resulted in the loss of one copy each of DR1 and DR2. Compared to its inferred ECSA ancestor, this mutant strain was presumably debilitated in its fitness for infection and dissemination in its principal vector, A. aegypti, although it may have had a slight fitness increase for replication in humans based on the murine model. In the large enzootic CHIKV populations that exist in Africa, this mutant could disappear quickly due to its low frequency. The rapid fixation of such a mutant in Asia can only be explained by a population bottleneck where stochastic events can facilitate the fixation of a beneficial allele, and even allow a mutant with reduced fitness to circumvent selection. It is possible that this mutation accompanied the intercontinental transmission from Africa to Asia, which probably involved one or a few infected persons; it is also possible that a CHIKV population bottleneck was influenced by a mosquito eradication campaign in Southeast Asia (1955–1969). Although this effort was designed to eliminate malaria [41,42], it included the use of DDT inside homes, which also reduced populations of A. aegypti responsible for urban CHIKV transmission. The use of DDT was also instrumental in the eradication of A. aegypti in many parts of the Americas during the 1950s to 1960s [43,44]. The coincidence of our estimated year of the most recent common ancestor of the currently circulating Asian CHIKV lineage (1948–1956) with this malaria eradication campaign suggests a possible link.
Second, due to the breakdown of previous structural/functional constraints on the now deleted 3′UTR region, many neutral mutations accumulated in the DR(1+2) region of the Asian lineage. Because the formation of a stem-loop structure in a viral RNA genome can facilitate polymerase slippage [45], a duplication eventually occurred in both the DR(1+2) and DR3 regions. These duplications improved the fitness of the Asian CHIKV strain in A. aegypti to an extent that outweighed possible fitness loss in humans, and the duplicated mutant therefore rapidly replaced the previous 3′UTR to become fixed in Asia. This adaptation to the mosquito vector may have facilitated the initiation of Asian epidemics in 1958.
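The bottleneck argument in the first step of this path can be made concrete with a standard population-genetic approximation. The sketch below evaluates the diffusion approximation for the fixation probability of a single new mutant in a haploid Wright-Fisher population, P = (1 − e^(−2s)) / (1 − e^(−2Ns)), where N is the population size and s the selection coefficient. The values of N and s used here are hypothetical and were not estimated in this study; the example only illustrates why a mildly deleterious variant that is effectively barred from fixing in a large population can fix by drift in a very small founding population.

```python
import math


def fixation_probability(pop_size, s):
    """Diffusion approximation for fixation of one new mutant copy.

    Assumes a haploid Wright-Fisher population of constant size `pop_size`
    and a constant selection coefficient `s` (negative = deleterious).
    Very large values of pop_size * |s| would overflow and are not handled.
    """
    if s == 0:
        return 1.0 / pop_size  # neutral case: fixation by drift alone
    # expm1 keeps precision when s is small.
    return math.expm1(-2.0 * s) / math.expm1(-2.0 * pop_size * s)


# Hypothetical 5% fitness cost, compared across population sizes.
for pop_size in (10, 100, 1000):
    print(pop_size, fixation_probability(pop_size, -0.05))
```

With these illustrative numbers the deleterious mutant fixes in a population of 10 with a probability of the same order as a neutral one (1/N), whereas in a population of 1000 its fixation probability is vanishingly small.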

Functional constraints and forces driving 3′UTR evolution
Studies in flaviviruses have provided important insights on the structure, evolution and functional importance of arboviral 3′UTRs. Basically, sequence duplications and deletions, in contrast to point mutations that are predominant in ORFs, are the major evolutionary mechanisms of flavivirus 3′UTRs [12]. Interestingly, the ORF region adjacent to the flavivirus 3′UTR [13], as well as the 5′UTR panhandle structure [46], may also have originated from duplications of 3′ long repeat sequences (LRS). Furthermore, conserved secondary structures have been observed in the 3′UTRs of all eco-groups of flaviviruses [14]. However, despite the conservation of these DRs, there is no obvious relationship between them and secondary RNA structures; thus it is not clear why they are preserved, why they retain their double or triple copy numbers, and what exactly are their biological roles.
Similar to most previous studies with other viruses [3–8], except those of the flavivirus genome cyclization motif [47,48], CHIKVs with deletions of different 3′UTR DRs remain infectious, although they exhibit a spectrum of replication reduction in C6/36 cells. The significant fitness differences in C6/36 cells but similar replication kinetics of these deletion mutants in Vero cells suggest that the 3′UTR plays a more important role in interacting with mosquito cell factors, and that adaptation to vectors may be a major driving force for the evolution of the CHIKV 3′UTR. The importance of the 3′UTR in mosquito transmission is also supported by our findings of a strong impact on mosquito infection and dissemination caused by viruses with swapped 3′UTRs. In addition, although competition tests between different virus groups in CD1 mice suggested only minor fitness differences caused by the 3′UTR, CHIKVs with shorter genome lengths consistently outcompeted those with longer genomes in the vertebrate model (Fig. 3). Therefore, it is not clear if the higher fitness in the vertebrate host is due to an enhanced functional role of the 3′UTR or simply faster replication rates of shorter genomes.

Figure 3. Competing viruses (one of which contains a synonymous genetic marker) were mixed in a 1:1 initial ratio based on genome copies, and used for mosquito and mouse infection. The viral RNA ratio was determined by RT-PCR amplification of a region containing the marker in the middle, followed by thorough digestion at the digestion sites created by the genetic marker. In agarose gel analyses, the lower band reflects the level of virus with the genetic marker, whereas the upper band reflects the RNA level of virus without the genetic marker. B. Competition results between the two wt viruses (Mal06 and SL07) and their corresponding mutants in CD1 baby mice. C-F. Competition between 4 pairs of viruses (C: Mal06/ΔDR3a vs. Mal06/WT+marker; D: Mal06/ΔDR(1+2)a vs. Mal06/WT+marker; E: Mal06/SL07-3′UTR vs. Mal06/WT+marker; F: SL07/Mal06-3′UTR vs. SL07/WT+marker) on the dissemination rate in A. aegypti (Thailand) and viral RNA level in CD1 baby mice. The mosquitoes were infected through a blood meal with a viral titer of ~1 × 10⁶ pfu/ml. On day 10 post-infection, the heads of mosquitoes were dissected to study viral dissemination. The numbers of samples infected by each virus are shown by pie graph, with statistical significance assessed using a Chi-square test. Viruses are labeled in the same colors as in Fig. 2B. CD1 baby mice were infected with an initial dose of 1 × 10⁴ pfu; 3 or 4 of them were sacrificed each day and the blood viral ratio was used to measure the fitness level of competing viruses (shown in the gel). doi:10.1371/journal.ppat.1003591.g003
Taken together, our results suggest that adaptation to mosquitoes is a major factor driving evolution of the CHIKV 3′UTR. This conclusion is in agreement with those from studies of flaviviruses. Deletion of the entire variable 3′UTR region of tick-borne encephalitis virus, but not the core element at its end, has no effect on BHK cell replication or murine virulence [6]. Interestingly, the longer 3′UTR favored for replication in arthropods has been taken to an extreme by Kamiti River virus (KRV), a flavivirus found only in mosquitoes and which cannot infect vertebrates. KRV, which contains a 3′UTR of 1208 nt that apparently resulted from self-duplication [49], suggests a major role for the 3′UTR for replication in insect cells. Likewise, the alphavirus Eilat, which also is restricted to insect cell infection, has a large 3′UTR of 520 nt [50]. In contrast, alphaviruses not known to be transmitted by vectors have very short 3′UTRs, including salmon pancreas disease virus (89 nt), sleeping disease virus (87 nt) [51], and salmonid alphavirus-3 (87 nt) [52].
However, the role of the 3′UTR in vertebrate cells should not be neglected entirely. For example, it is known that the 3′UTR affects alphavirus RNA stability in both mammalian and mosquito cells [17,18]. Short deletions of different parts of the SINV 3′UTR lead to host-dependent fitness changes in mammalian, chicken and mosquito cells, suggesting that they are involved in interactions with different host-specific cellular factors [4]. In many cases a SINV 3′UTR deletion mutant is more severely impaired in mosquito than in chicken cells, but the inverse phenotype has also been observed [3]. A similar pattern is seen in dengue-4 virus, where deletion of a long upstream region (~120 nt) of the 3′UTR leads to increased replication in simian LLC-MK2 cells but similar antibody responses in Rhesus monkeys, while other 3′UTR deletions reduce infectivity in both systems [5]. The balance between functional gain and reduced replication efficiency due to genome size may be key in determining the evolution of the 3′UTR.

Mechanism of viral-host interaction and the differential effects on mammalian versus insect cells
What remains obscure is the exact nature of the molecular interaction mechanisms between arboviral 3′UTRs and cellular proteins, which have been proposed to be mediated by the stem-loop RNA [12]. Flavivirus studies [53] suggested that the level of perturbation of these secondary RNA structures rather than the size of deletions might affect viral replication. Our RNA secondary structural estimations suggest that duplication of CHIKV DR3 provides additional secondary structure, including the 80 nt conserved Y-shaped structure, without significantly changing other 3′UTR structures. Also, the enhanced replication in mosquitoes of CHIKV with this insertion suggests that this Y-shaped structure interacts with mosquito factors. However, the repeated elements DR1 and DR2 do not correspond strictly to structural units (Fig. S3), although the two copies of DR1 in the West African lineage retain the same predicted structure. Rather, duplication is predicted to result in the formation of new, local stem-loop structures, and more complicated secondary structures on a larger scale at 37 °C.
The interaction of the cellular HuR protein with different alphaviruses (SINV, CHIKV, and Ross River virus) via different 3′UTR binding sites, probably all through AU-rich sequences [18], suggests that the DR1 and DR2 region may also interact with cellular factors via primary sequence. Moreover, the stem-loop-rich structures in arboviral 3′UTRs may encode viral miRNAs. The recent discovery of a miRNA generated from the West Nile virus 3′UTR in infected mosquito cells, as well as the discovery of its host cellular target [20], provides evidence that viral miRNAs can be important determinants of virus-host interactions. To explore the possibility of CHIKV-produced miRNA, we predicted potential pri-miRNA sites in its genome using Vmir [54] and found some, including several in the 3′UTR (data not shown). Further experimental studies should be carried out to confirm these predictions.
The significant effect of genetic change in the CHIKV 3′UTR on fitness in mosquitoes and mosquito cells, but not in vertebrate cells, could reflect an interaction between CHIKV and insect-specific genes or proteins, such as those in the antiviral RNA interference (RNAi) pathway, the major insect innate immune mechanism [55]. This hypothesis is supported by the flavivirus sfRNA's role in suppressing RNA interference in both mammalian and insect cells by inhibiting Dicer-mediated in vitro cleavage of double-stranded RNA [27]. Another possibility is that the 3′UTR may fold into different structures in vertebrate vs. mosquito cells maintained at different temperatures, as suggested by our Mfold results. This structural difference could affect protein binding. In addition, the generation of a WNV miRNA in mosquito but not mammalian cells suggests that miRNA processing may differ between these cell types [20]. Finally, mammalian cells may have more redundant gene expression regulation systems where the effect of down-regulation in one signal transduction pathway can be compensated via other intertwined pathways, making them more robust in their viral regulation of gene expression than insect cells.
In conclusion, we observed for the first time lineage-specific evolution of the 3′UTR in an arbovirus, and our results suggest that the CHIKV 3′UTR plays an important role in adaptation to the mosquito vector. The founder effect that our results suggest was apparently responsible for the establishment of an inferior CHIKV 3′UTR in Asia before or during the 1950s. This reinforces our previous findings, which demonstrated epistatic mutations in this CHIKV lineage that probably resulted from the same founder effect, and which limited the fitness and adaptation of Asian strains [56].

Ethics statement
This study was carried out in strict accordance with the recommendations in the Guide for the Care and Use of Laboratory Animals of the National Institutes of Health. The protocol # 02-09-068 was approved by the Institutional Animal Care and Use Committee of the University of Texas Medical Branch.

Sequence alignment and repeat pattern identification
All available complete genome sequences of CHIKV were downloaded from GenBank. A maximum likelihood (ML) tree was constructed based on the complete coding sequences (CDS) using PAUP* v4.0b [57], utilizing the best-fit model estimated by MODELTEST [58]. This ML tree was then used as a guide to generate a sequence alignment of the CHIKV 3′UTR utilizing MAFFT [59]. Strains with incomplete 3′UTRs and those with unique indel patterns were excluded, leading to a dataset of 108 sequences. Sequence repeats were identified using the DNA matrix analysis in MacVector® based on representative strains of each lineage, followed by manual adjustments based on sequence alignments.
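The repeat search described above can be illustrated with a minimal Python sketch. It is not the MacVector matrix analysis itself: it assumes exact matches only, seeds on shared k-mers (the equivalent of diagonal runs in a dot matrix), and uses a hypothetical function name and a toy sequence rather than any CHIKV 3′UTR.

```python
from collections import defaultdict


def find_direct_repeats(seq, min_len=8):
    """Return (start1, start2, length) for maximal exact direct repeats.

    Illustrative assumption: only exact repeats of at least min_len nt
    are reported; real dot-matrix tools also tolerate mismatches.
    """
    seq = seq.upper()
    n = len(seq)
    # Index every k-mer of length min_len by its start positions.
    index = defaultdict(list)
    for i in range(n - min_len + 1):
        index[seq[i:i + min_len]].append(i)

    repeats = set()
    for positions in index.values():
        for a, i in enumerate(positions):
            for j in positions[a + 1:]:
                # Only start from the left end of a diagonal run;
                # interior seeds belong to a repeat found elsewhere.
                if i > 0 and seq[i - 1] == seq[j - 1]:
                    continue
                # Extend the match to the right as far as it goes.
                length = min_len
                while j + length < n and seq[i + length] == seq[j + length]:
                    length += 1
                repeats.add((i, j, length))
    # Longest repeats first, then by position for a stable order.
    return sorted(repeats, key=lambda r: (-r[2], r[0], r[1]))


# Toy example: an 8-nt unit duplicated with a 4-nt spacer.
toy = "GG" + "ACGTTGCA" + "TTTT" + "ACGTTGCA" + "CC"
print(find_direct_repeats(toy))
```

Positions are 0-based. Hits from such a search would still need the manual adjustment against the alignment that the text describes, since diverged repeat copies will not match exactly.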

RNA secondary structure estimation
To determine whether DRs form structural units, the RNA secondary structure of the CHIKV 3′UTR from each lineage was estimated using several programs with different advantages, including Mfold [32], STAR (STructure Analysis of Rna) [33] and the Vienna RNA Secondary Structure Package [34,35], based on either representative strains (Mfold and STAR) or a sequence alignment of the entire lineage (RNAalifold in Vienna). Mfold provides a selection of top-rated structures, which are all plausible given the dynamic movement of molecules. The Vienna RNAalifold program is based on a sequence alignment rather than a single sequence, thus reflecting natural selection on the secondary structure. The "Genetic Algorithm" implemented in STAR uses stepwise selection of the most-fit structures, which simulates the process of RNA elongation during synthesis and allows the observation of important intermediate structures. The resulting structures were illustrated using the PseudoViewer web server [60].
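One way to make the "structural unit" question concrete is sketched below in Python. It assumes the predicted structure is available in dot-bracket notation (the format written by the Vienna package) and asks whether every paired base inside a repeat pairs with a partner that is also inside it. The function names, the toy structure, and the coordinates are hypothetical and are not taken from the analyses reported here.

```python
def pair_table(dot_bracket):
    """Map each paired position to its partner (0-based indices)."""
    stack = []
    pairs = {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            j = stack.pop()
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return pairs


def is_structural_unit(dot_bracket, start, end):
    """True if region [start, end) contains at least one base pair and
    none of its paired bases has a partner outside the region."""
    pairs = pair_table(dot_bracket)
    inside = [i for i in range(start, end) if i in pairs]
    return bool(inside) and all(start <= pairs[i] < end for i in inside)


# Toy structure: one hairpin at positions 2-12, another at 15-20.
toy_structure = "..((((...))))..((..))"
print(is_structural_unit(toy_structure, 2, 13))   # hairpin fully enclosed
print(is_structural_unit(toy_structure, 4, 16))   # region cuts through a stem
```

A repeat whose boundaries cut through a stem, as in the second call, would not count as a self-contained structural unit under this definition.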

Construction of infectious cDNA clones
Plasmids representing two wt CHIKV strains (Mal06 and SL07) from the Asian and ECSA lineages, respectively, have been described previously [56]. The Mal06 strain (MY002IMR/06/BP; GenBank Acc. No. EU703759.1), a representative of the Asian lineage, was isolated from a human in Bagan Panchor, Malaysia in 2006. Its cDNA was synthesized directly from the GenBank sequence. The SL07 strain (SL-CK1; GenBank Acc. No. HM045801.1), isolated from a human during the Sri Lanka outbreak in 2007, represents the ECSA lineage, and was passaged twice in C6/36 mosquito cells before cDNA synthesis from viral RNA. These two plasmids were used for rescuing wt viruses and as templates to generate other infectious clones with altered 3′UTRs. To study the genetic change of the 3′UTR in the Asian lineage, a series of deletions in the direct repeats DR(1+2) and DR3 was constructed using conventional PCR methods [61]. Similarly, chimeric viruses with the backbones of the Mal06 and SL07 strains and swapped 3′UTRs were constructed to compare the fitness of the lineage-specific 3′UTRs. Finally, to compare the relative fitness levels of two viruses using a competition test (described below), a synonymous point mutation was introduced into the nsP4 gene of both the Mal06 and SL07 plasmids to form a restriction site cleavable by endonucleases Apa I and PspOMI [56]. All PCR-generated genome regions used in cloning were completely sequenced to verify their genetic integrity. Detailed information for all plasmids is available from the authors upon request.

Rescue of viruses from the infectious clone
To generate infectious RNA, plasmids were first linearized with Not I restriction endonuclease, followed by in vitro transcription from the minimal SP6 promoter as described previously [61]. About 10 μg of RNA were electroporated into 10^7 BHK-21 cells (viruses with Mal06 backbone) or C7/10 cells (viruses with SL07 backbone) using the BTX-Harvard Apparatus ECM 830 Square Wave Electroporator (Harvard Apparatus) and 2-mm or 4-mm cuvettes, respectively. Cell culture supernatants were harvested between 24 and 48 h post-electroporation and stored at −80°C. Infectious viral titers were determined by plaque assay on Vero cells (ranging from 10^6 to 10^8 pfu/ml, data not shown). All viruses recovered from electroporation were used directly without any additional passages.

Competition experiments
To sensitively determine the relative fitness levels of mutant vs. wt viruses, competition tests were conducted in mosquitoes and mice as described previously [56]. The competing viruses, with and without the genetic marker, were mixed in a 1:1 ratio and then inoculated into mice or mosquitoes. Viral RNA was purified from each sample as well as from the initial inoculation stock using the QIAamp Viral RNA kit (QIAGEN). To evaluate the ratio of the two viruses from total viral RNA, the genome region from nt 6106-6794, which covers the position of the introduced ApaI/PspOMI digestion site in the middle, was amplified by RT-PCR with the QIAGEN OneStep RT-PCR kit (QIAGEN) and the following primers: 41855-nsF5: 5′-ATATCTAGACATGGTGGA-3′; 41855-nsR1: 5′-TATCAAAGGAGGCTATGTC-3′. The PCR products were digested with restriction endonucleases Apa I and PspOMI (which share the same recognition site) for 30 min at 27°C and 4 h at 37°C. Complete digestion was confirmed using controls that contained only one of the competing viruses (with or without the marker).
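The readout of such a digestion can be sketched in Python as follows. This is only an illustration under stated assumptions, not the quantification procedure of [56]: it assumes that band intensity is proportional to DNA mass, so that the two digestion products together represent the marked virus, and it expresses the change relative to the inoculum as an odds ratio. All function names, the toy amplicon, and the intensity values are hypothetical; only the shared recognition sequence GGGCCC and the cut positions of the two enzymes are standard.

```python
import re

# Recognition sequence shared by ApaI (GGGCC^C) and PspOMI (G^GGCCC).
MARKER_SITE = "GGGCCC"


def digest_fragments(amplicon, site=MARKER_SITE, cut_offset=5):
    """Top-strand fragment lengths after digestion.

    cut_offset=5 models ApaI; use cut_offset=1 for PspOMI.
    """
    amplicon = amplicon.upper()
    cuts = [m.start() + cut_offset
            for m in re.finditer("(?=%s)" % site, amplicon)]
    bounds = [0] + cuts + [len(amplicon)]
    return [b - a for a, b in zip(bounds, bounds[1:])]


def marked_fraction(uncut_intensity, cut_intensities):
    """Fraction of amplicon carrying the marker (assumes intensity ~ mass)."""
    cut = sum(cut_intensities)
    total = uncut_intensity + cut
    if total <= 0:
        raise ValueError("no signal")
    return cut / total


def relative_fitness(sample_fraction, inoculum_fraction):
    """Odds of marked:unmarked virus in the sample relative to the inoculum."""
    def odds(f):
        return f / (1.0 - f)
    return odds(sample_fraction) / odds(inoculum_fraction)


# Toy 689-bp amplicon (nt 6106-6794 spans 689 positions) with a central site.
toy_amplicon = "A" * 340 + MARKER_SITE + "T" * 343
print(digest_fragments(toy_amplicon))

# Hypothetical densitometry values: inoculum at 1:1, sample enriched for marker.
inoculum = marked_fraction(50.0, [25.0, 25.0])
sample = marked_fraction(25.0, [40.0, 35.0])
print(relative_fitness(sample, inoculum))
```

A value above 1 would indicate that the marked virus gained ground relative to the inoculum, and a value below 1 that it lost ground. The single-virus controls described above correspond to fractions of 0 and 1 and are excluded from the odds calculation.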

Fitness in mosquitoes
The relative replication and dissemination rates of competing viruses in mosquito vectors were examined in an A. aegypti colony established from mosquito eggs collected in 2009 in Bangkok, Thailand. The competing viruses were mixed with sheep blood to a final concentration of ~1 × 10^6 pfu/mL. Mosquitoes collected 4-5 days after eclosion were offered blood meals for ~45 min, and engorged mosquitoes were sorted and incubated at 28°C with 10% sucrose and 80% relative humidity under a 16 h light/8 h dark photoperiod. At 10 days post-infection (dpi), heads of 60 mosquitoes were dissected and homogenized in 1.5 ml of MEM. Because the titers of head suspensions were too low for consistent RT-PCR amplification, viruses from these samples were amplified by infecting Vero cells prepared in 96-well plates. Supernatants from Vero cells showing CPE were collected at 2 dpi and used for RNA extraction, RT-PCR and digestion, as well as gel analyses as described above. To ensure that major differences in fitness for replication in Vero cells did not affect these mosquito fitness assays, each mutant was competed first in Vero cells. Except for Mal06/ΔDR(1+2)a, there was no detectable difference in fitness for any of the mutants compared to wt (Fig. S4B). Mal06/ΔDR(1+2)a did have a moderate fitness advantage for replication in Vero cells (Fig. S4B), but this difference was contrary to the results of mosquito competitions (where Mal06/ΔDR(1+2)a consistently lost). Therefore, the Vero cell passages could not have confounded the mosquito competition results.

Fitness in mice
About ten 11- to 12-day-old outbred CD1 mice (Charles River) were subcutaneously infected with 50 μL of virus mixtures containing a total of ~1000 pfu. Three or four mice were sacrificed daily from 1 to 3 dpi and blood samples were collected for subsequent RNA extraction and genetic analyses as described above.