Evolutionary stasis of an RNA virus indicates arbovirus re-emergence triggered by accidental release

The mechanisms underlying virus emergence are rarely well understood, making the appearance of outbreaks largely unpredictable. Bluetongue virus serotype 8 (BTV-8), an insect-borne virus of ruminants, emerged in livestock in Northern Europe in 2006, spreading to most European countries by 2009 and causing losses of billions of Euros. Though the outbreak was successfully controlled through vaccination by early 2010, puzzlingly a closely-related BTV-8 strain re-emerged in France in 2015, triggering a second outbreak that is still ongoing. The origin of this virus and the mechanisms underlying its re-emergence are unknown. Here, we performed phylogenetic analyses of 164 whole BTV-8 genomes sampled throughout the two outbreaks. We demonstrate consistent clock-like virus evolution during both epizootics but found negligible evolutionary change between them. We estimate that the ancestor of the second outbreak dates from the height of the first outbreak in 2008. This implies that the virus had not been replicating for multiple years prior to its re-emergence in 2015. Given the absence of any known natural mechanism that could explain BTV-8 persistence over this period without replication, we conclude that the second outbreak was most likely initiated by accidental exposure of livestock to frozen material contaminated with virus from approximately 2008. Our work highlights new targets for pathogen surveillance programmes in livestock and illustrates the power of genomic epidemiology to identify pathways of infectious disease emergence.


Introduction
Infectious disease outbreaks are a major burden on human and animal health. They can dramatically reduce the productivity of entire countries, due to direct losses, control measures, trade bans or public fear [1]. Diseases caused by insect-borne viruses (arboviruses) in particular, have increased substantially in recent decades [2][3][4] and there is an urgent need to better understand the causes of their emergence in order to devise better control and prevention strategies. The factors leading to disease emergence are often unclear and case studies of intensely studied outbreaks can therefore provide important wider lessons.
Bluetongue is a major disease of domestic ruminants caused by the bluetongue virus (BTV), an arbovirus transmitted by Culicoides midges. BTV is the type species of the genus Orbivirus, within the family Reoviridae and possesses 10 double-stranded RNA genome segments encoding for 7 structural and 4 or 5 non-structural proteins [5][6][7]. BTV infection in sheep can induce a variety of clinical outcomes, which in the most extreme cases include a lethal hemorrhagic fever [8][9][10].
Like many other arboviruses, the geographical spread of BTV has increased significantly in the last 20 years [16][17][18]. In August 2006, BTV serotype 8 (BTV-8) emerged for the first time in the Netherlands [19][20][21][22][23][24], leading to dramatic losses of sheep and causing extensive economic damage to farming communities, costing on the order of billions of euros [25][26][27][28]. The virus quickly spread across the continent, with confirmed infections in 16 countries by 2008 (Fig 1). The outbreak was ultimately controlled through a pan-European vaccination campaign, using inactivated vaccines, with a few last cases detected in Europe in 2010 [29]. However, after a five-year period with no BTV-8 cases recorded throughout Europe, the virus re-emerged in France [30] and has since continued to spread. France was declared enzootic in 2018 and recent cases reported in adjacent countries, including Germany, Switzerland and Belgium [31].
The source and mechanism of BTV-8 re-emergence in France remains obscure. Initial genetic data from one isolate, suggested the re-emerging virus in France to be a close relative of the lineage causing the 2006-2010 outbreak [30,32,33]. The prevailing theory was that the virus had continued to be transmitted sub-clinically but remained unrecorded in livestock or wild ruminants after it had been declared absent from Europe in 2011 [30]. However, there is currently little evidence to support this hypothesis. Based on serological evidence, wild ungulates do not appear to have sustained transmission [34,35]. Similarly, serological testing of cattle sampled in 2014, indicated a rapid decline of seropositivity after vaccination ceased in France in 2010, consistent with a new (re-) introduction of the virus in, or just before, 2015 [34,36,37].
We describe the use of phylogenetic and evolutionary analyses of BTV-8 virus samples, collected during the first and second European outbreaks, to gain insights into the mechanisms that allowed BTV-8 to re-emerge in France in 2015. For this, we generated a novel data set of full genome sequences for more than 150 viruses sampled throughout both outbreaks. We show that the evolutionary signatures contained in these data are inconsistent with continuous circulation of BTV-8 between the outbreaks and instead point to re-emergence being caused by an anthropogenic factor, such as accidental release of contaminated material.  (Table S1). To minimise or exclude the possibility of including genome mutations acquired during extensive passage of the virus in culture, we sequenced the great majority of samples directly from clinical samples (blood) of infected animals or from isolates kept in culture for a minimum number of passages (Table S1).

Viruses from the first and second European BTV-8 outbreaks form a single monophyletic clade
A maximum likelihood (ML) tree revealed considerable genetic diversity both within the first (2006-2010) and the second outbreak (2015-2018) (Fig 2). The tree showed that all sequences from the second European outbreak form a well-supported monophyletic clade that is nested within the virus lineages circulating during the first outbreak in 2006-2010 (Fig 2, Fig S1). Specifically, the clade of the second outbreak derives from a clade from the first outbreak, including predominantly viruses from France and Germany collected in 2007 and 2008. The viruses from the second French outbreak can be distinguished into two further clades, one including viruses from 2015 and 2016, the other including samples spanning the entire outbreak (2015-2018). Interestingly, both clades were already present among the eight samples from the farm in France (in Auvergne-Allier) from which the first diagnosis of re-emerged BTV-8 was made in August/September of 2015. Surprisingly, the branch leading to the re-emerging virus appeared short, given the 5-year period between the outbreaks, implying a slow rate of evolution during this period (Fig 2).

BTV-8 re-emergence associated with exceptionally slow evolution
To test if the lower amount of divergence along the branch separating the two outbreaks was unusual, we estimated the evolutionary rate of BTV-8 from the set of 164 genomes. For this, we applied a lognormal relaxed clock model that allows for branch-specific heterogeneity in clock rates, implemented in the Bayesian phylogenetic software BEAST (Fig 3 and Fig S2). The mean evolutionary only a year apart during the first outbreak is in general far higher than that between the ancestor of the re-emerged BTV-8 strain and its closest relative in the first outbreak.
"Frozen evolution": clock-like evolution of BTV-8 during, but not in-between, outbreaks Next, we included the reconstructed ancestral sequence in our dataset and re-estimated the ML tree to get measures of genetic distance from the root of the tree. Consistent with clock-like evolution, the genetic distance between virus sample and the root increased linearly with time during both the first (TempEst: slope=7.2708x10 -4 subs/site/year, r 2 =0.8113) and second outbreaks (TempEst: slope = 6.9291x10 -4 , r 2 = 0.9605). There was no evidence that the evolutionary rate of the virus differed between the two outbreaks (p-value for Date:Outbreak interaction = 0.194). However, there is a clear discontinuity in the accumulation of mutations between the two outbreaks, consistent with a period where clock-like evolution had essentially ceased. Consequently, the reconstructed sequence of the ancestor of the second outbreak, when included in the ML tree, has an inferred distance from the tree root that is consistent with a virus from late March 2008 according to the root to tip regression ( Fig 4B).

Discussion
Diseases of livestock can be exceedingly interesting models to study virus emergence, given that harmonised international surveillance systems and regulatory frameworks provide opportunities to access field samples with associated metadata across national borders. Here, the BTV-8 European outbreaks provided us with the opportunity to investigate the mechanisms surrounding arbovirus emergence based on a uniquely rich dataset. Our results indicate that the re-emergence of BTV-8 in France in 2015 was caused by a virus that exhibits a lack of evolutionary changes since the first outbreak. This is inconsistent with the prevalent view of undetected low-level circulation of the virus in wild or domestic ruminants, between 2010 and 2015, and instead points to another mechanism of emergence.
We showed a large discontinuity in the number of mutations accumulated by BTV-8 between 2010 and 2015, even though the evolutionary rates of the virus during the first and second outbreak were indistinguishable and of the same order as rates reported in previous BTV studies [38,39]. If the virus had been replicating consistently in an undetected population from 2010 to 2015, we would expect the genetic distance of the isolates from the second European outbreak to continue the trend of increased divergence after the first outbreak. However, the sequences from the second outbreak exhibit genetic divergences that fall considerably below what would be expected if the trend-line from the first outbreak was extended, illustrating a paucity of mutations relative to expectations ( Fig   4B). Indeed, the divergence of the reconstructed ancestor of the second bluetongue outbreak is consistent with the virus stopping replication in March 2008. The lack of divergence is also illustrated by the fact that the reconstructed ancestor of the BTV-8 outbreak has only 7 mutations separating it from its closest relative in the analysed dataset, a French sample collected in August 2007 (BTV-8FRA2007-3673), despite putatively having been replicating for at least half a decade after that samples' collection. In comparison, BTV-8FRA2007-3673 showed 23 mutations compared to the genome of the earliest BTV-8 sample obtained from the Netherlands in August 2006, only a year earlier. The corresponding rate of evolution estimated for the emerging branch was almost an order of magnitude slower than the mean clock rate, highlighting it as exceptionally slow (Fig 3). Moreover, we hypothesise that some or all of the estimated seven mutations on this branch might have been accumulated during the first outbreak, given that the emerging branch connects to an internal node in the time-scaled phylogeny with a date of early 2007, at the height of the first outbreak. The subsequent accumulation of seven mutations is consistent with the idea that this virus continued to circulate until early 2008 (the inferred date from the root-to-tip regression) and then ceased to change all together until its re-emergence in 2015.
Given the unexpectedly low number of mutations observed between the two outbreaks, our data indicate that the common ancestor of the second European outbreak either ceased, or dramatically slowed its replication, in early 2008. This is inconsistent with current knowledge of the biology of BTV and RNA viruses in general. For example, a potential explanation could be that BTV persistently infected a host for several (5 to 8) years, with little or no replication, before being re-activated and starting the second outbreak. While this may be possible with DNA viruses, or RNA viruses with a DNA intermediate [40][41][42][43][44][45], it has never been described for Reoviruses such as BTV or other RNA viruses. Rabies virus may provide an exception based on a handful of case reports of virus reactivation after latency of several years [46] but it has not been documented whether these involved a lack of evolutionary changes. In other cases, as in foot-and-mouth disease, viral RNA and infectious virus have been shown to persist in reservoir hosts for multiple years. However, reisolation of virus (as opposed to detection of viral RNA) indicates that the virus replicates during persistent infection and accumulates nucleotide substitutions at a rate comparable to actively replicating viruses [47,48]. Hepatitis C virus for example is also known to persist in a number of patients for a number of years but, again, with continuing viremia, and thus virus replication [49].
Another hypothetical scenario could be envisaged if BTV-8 was to remain "latent" in midges' eggs for a number of years. However, there is no evidence of vertical transmission in BTV-infected Culicoides [50][51][52][53]. This, in conjunction with the need for infected midge eggs to survive for years, rather than a single overwintering season, makes this scenario highly unlikely.
Overall, we judge the possibility of persistence of BTV-8 in a mammalian or invertebrate host for longer than 5 years, in the absence of viral replication, followed by viral reactivation and subsequent onwards spread, to be implausible. Our data instead point to the release of a BTV-8 virus retained from the initial outbreak by other means as the cause of the second outbreak in France in 2015.
Anthropogenic causes of virus outbreaks have been described before. Accidental virus release is thought to have been responsible for the 1977 influenza A H1N1 outbreak, caused by a virus which closely matched a variant circulating in the 1950s [54]. Likewise, the 1995 Venezuelan equine encephalitis subtype IC epidemic which was caused by a virus closely related to a strain circulating in 1962-64 [55]. For livestock pathogens, a localised outbreak of FMDV in the UK in 2007 was linked to virus escaped from research facilities [56].
Our data cannot reveal the anthropogenic source from which BTV-8 was re-introduced in France in 2015. We speculate that laboratory escape of virus preparations, such as the case of FMDV in the UK in 2007, is unlikely as BTV needs an insect vector for efficient transmission and we are unaware of any in-vivo insect experiments in France with BTV during that period. However, due to specific animal husbandry procedures, there are important potential sources of frozen virus that apply to livestock viruses that are not present in viruses of most other animals, specifically the widespread use of bull semen for artificial insemination and embryo transfer in cows [57,58]. BTV has been detected in the semen of viremic bulls and rams, can initiate infection in the mother and be transmitted vertically to the embryo [59,60]. Additionally, contaminated embryos can cause transmission on implantation [61]. As such, both semen and embryos may represent potential sources of BTV infection. Contaminated frozen colostrum may also be a potential source, considering that oral transmission has been shown to be possible with BTV-8 [62]. However, it is not normal practice to keep colostrum frozen for a number of years. Interestingly, while international regulations specify that bull donors and semen that are exported internationally must be screened for various pathogens including BTV [63], this does not apply to premises trading only locally and carrying out private insemination procedures [64]. Thus, semen from a BTV-8 infected bull could have been collected or an embryo generated from an infected but asymptomatic animal and used years later without detection. The eventual use of this frozen animal product could have then led to the infection that caused the second Western European outbreak.
We stress that the link between bull semen trade and embryo implantation in France and the BTV-8 re-emergence in 2015 is only speculative. However, we have shown that the re-emergence of  in France in 2015 is unlikely to be due to cryptic continuing transmission and we can exclude a reintroduction from another endemic country. Thus, our data are incompatible with the two current dominant theories for explaining the 2015 outbreak [30]. The lack of accumulated mutations in the virus implies that there was either an ongoing persistent infection in the absence of viral replication for several years, or the virus was reintroduced by humans from material that had been frozen during the first outbreak. We argued the second of these explanations to be more likely. Our findings highlight new areas requiring thorough surveillance programmes for the control of infectious disease of livestock. In addition, our approach illustrates how unrecognised pathways of disease emergence can be revealed using pathogen genomic epidemiology.

Samples
Blood samples from animals infected with BTV-8 were received from 10 European countries during the bluetongue outbreaks from 2006 to 2018. In some instances, samples analysed were viruses isolated in tissue culture from blood of infected animals. Table S1 provides the metadata related to the dataset used in this study. These include virus strain names, animal species of origin, geographical location and date of sampling. In addition, metadata include whether the viral genome sequence was obtained directly from clinical material (blood) or from an isolate in tissue culture, sequencing methods and GenBank accession number.

RNA extraction and Illumina library preparation
Total RNA was extracted from infected blood samples, and virus isolates using Trizol LS  (Invitrogen, USA) and purified using Direct-zol RNA MiniPrep (Zymo Research, USA) as per manufacturer's protocol. RNA samples were treated with DNAse I (Ambion) and purified with 3X Agencourt RNAClean XP beads (Beckman Coulter, USA). Total RNA concentration was quantified using the Qubit Fluorimeter (Life Technologies, USA) and Qubit RNA HS Assay (Life Technologies, USA), while RNA integrity was assessed using Agilent 4200 TapeStation (Agilent, USA). In order to avoid cross-contaminations, RNA extractions, from virus isolates was performed separately from those of infected blood samples. Similarly, RNA extractions were carried out separately on the bases of geographical origin and year of collection of the samples. In addition, library preparations, target enrichment and sequencing runs (see below) were carried out also on separate days following the same criteria than above. Libraries from low and without measurable RNA (low input) were also prepared separately from those with measurable RNA (high input). Libraries were prepared for Illumina sequencing using the Illumina TruSeq Stranded mRNA HT kit (Illumina) using 5 μl of sample RNA (up to 250 ng of total RNA) according to the manufacturer's instructions. Briefly, after RNA was fragmented, it was reverse transcribed using SuperScript II Reverse Transcriptase (Invitrogen, USA) and random hexamers. Single strand cDNA was immediately converted to double stranded cDNA, cleaned-up with Agencourt® AMPure® XP magnetic beads (Beckman Coulter, USA), quantified using Qubit Fluorimeter and Qubit dsDNA HS Assay Kit (Life Technologies, USA) and size distribution was assessed using a 4200 TapeStation System with High Sensitivity D1000 Screen Tape assay (Agilent, USA). A-tailing was performed followed by adapters ligation. After a purification step, dual indexed libraries were PCR amplified and the purified PCR products were pooled in equimolar concentrations and sequenced using 150 paired end sequencing on MiSeq or NextSeq500 sequencers (Illumina USA).

Targeted enrichment sequencing
We carried out multiplexed viral targeted enrichment followed by Illumina sequencing using the NimbleGen SeqCap EZ system (Roche, USA), for improved viral detection from clinical material. This approach was followed in order to increase the number of BTV-8 samples from which we could obtain a complete viral genome sequence directly from clinical material, including those with very low amount of viral RNA. Libraries were prepared following the above described standard Illumina TruSeq Stranded mRNA protocol. They were quantified using Qubit Fluorometer and Qubit dsDNA high sensitivity (HS) Assay Kit (Life Technologies, USA). Quality and size distribution was validated using the High Sensitivity D1000 Screen Tape assay (Agilent) in a 4200 TapeStation System (Agilent, USA) and were normalized according to BTV viral load and mass. A 1000 ng aliquot of the pooled library was enriched using SeqCap EZ Developer Probes (Roche/NimbleGen) (see below), according to the manufacturer's protocol. After a 14 cycle post enrichment PCR amplification, the cleaned PCR products were pooled and were sequenced with a 151-base paired-end reads on a NexSeq500, 550 cartridge (Illumina, USA). Probes were designed using all BTV sequences available on NCBI Genbank, RefSeq, DNA Data Bank of Japan (DDBJ) and EMBL EBI databases (as accessed by October 2016). The resulting NimbleGen biotinylated soluble capture probe library ("BTV-Cap"), contains a probe set of more than 500,000 probes, designed to minimise capture of Culicoides sonorensis, Bos taurus, Ovis aries, Capra hircus and Mesocricetus auratus genomes.

Consensus calling
All consensus calling was performed on a cluster running Ubuntu v. 14.04.5 LTS. In all cases, BAM and SAM files were handled using samtools v. 1.3 [65]. The R packages ggplot2 [66], seqinr [67], stringr [68], and vcfR [69] [73] with the minimum coverage of the filtering step set to 5 and all other parameters at their default values. Any variants from the reference with an allele frequency of greater than 0.5 were replaced into their positions in the reference to build a new reference sequence. Reads were then remapped to this new reference. This process was then repeated either 5 times or until the reference generated after the process was identical to the reference at the start of the last round of mapping. Reads were then mapped again against this new reference and ambiguities were called. A base was called unambiguously if the allele frequency of the dominant allele was greater than 0.75, otherwise the base was called ambiguously over all alleles with a frequency of greater than 0.05 using a bespoke script in R (Data S1). For both of these consensus sequences, positions were masked with "N"s if the coverage at the site was less than 5 separate paired end reads.
Final sequences were processed and annotated for submission to GenBank using an extension to the BTV-GLUE resource (http://btv-glue.cvr.gla.ac.uk). GenBank accession numbers for each of the sequences in our dataset are indicated in Table S1.

Quality control
A sequence showing evidence of mixed infection or contamination was discarded. Potential contamination and/or mixed infection were detected by finding sequences that met the following two criteria: i) visible numbers of reads mapping to serotypes other than BTV-8 or the closely related BTV-18 in segments 2 and 6; and ii) the presence of regions, which when aligned showed large numbers of unique SNPs and ambiguous nucleotides. In total, 8 samples were discarded due to mixed infection with different BTV strains and/or contamination. During quality control, segment 7 from the sample FRA2008-28 was also removed, as it represented an obvious reassortment from a distinct BTV strain, but the rest of the sample was preserved. We used GiRaF v 1.02 [74] and MrBayes v 3.2.7a [75] on all the samples to test for the presence of less obvious reassortments between serotypes. Within the GiRaF algorithm, per segment trees were run for 1000000 iterations, with 500000 iterations discarded as burn-in. All other parameters were left at their default values. No reassortment was detected, so we opted to use all segments in a single concatenated phylogenetic tree. However, it should be noted that, as there is little variation in many segments, our ability to detect reassortment between two distinct but phylogenetically related strains is correspondingly low.

Phylogenetics
Two separate phylogenetic analyses were performed, a maximum likelihood analysis performed in PhyML v. 20120412 [76] and a Bayesian analysis performed in BEAST v. 1.10 [77].

Maximum likelihood analysis
To explore the diversity of the outbreaks, we generated a maximum likelihood tree. All segments were concatenated into a single sequence and a phylogeny using the GTR+G+I nucleotide model was run in PhyML v. 20120412 [78]. All parameters were optimised by maximum likelihood. The algorithm was the best of NNI and SPR moves with 10 random starts with 1000 bootstraps being performed on the best tree found. A maximum likelihood tree containing all sequences was then run, using the same settings as the first, and 1000 bootstraps were performed on the best tree found 13 in those 12 starts. Given the observed short branch between the first and second outbreaks, this tree was then rooted at the optimal root found from the tree containing only the sequences from the first outbreak generated under the same settings, as calculated by TempEst v. 1.5.1 [79]. The TempEst rooting procedure also confirmed clock-like evolution for this dataset.

BEAST analysis
Using known break points, the sequence for each segment was split into the untranslated region, and the first, second and third codon positions of the coding sequence. In the 9th and 10th segments there are regions with overlapping open reading frames, these were also placed together in their own partition. Separate evolutionary models, linked across segments, were applied to each of these partitions. The segments shared a lognormal relaxed molecular clock [80]. Given the difficulty, caused by combinatorial explosion, of model selection when there are multiple partitions, we performed a pre-analysis model selection protocol. Each segment was concatenated and a model was chosen for each partition using jModelTest v 2.1.10 [81]. The best model by Akaike Infomration Criterion corrected for small sample size (AICc) that was implemented in BEAST 1.10 was used. This was a GTR model for the first codon position, a HKY for the second, GTR+G with 4 gamma categories for the third, K80 for the UTR and JC for the regions with overlapping ORFs. We used a GRMF skyride model [82] for the tree prior. When the sampling date was not exactly known, the age of the tip was estimated in the MCMC with a uniform prior over the period of uncertainty. All priors were left at their default values except for the mean of the lognormal distribution for the relaxed molecular clock which was given a lognormal (-7.6, 3) prior. In all cases ambiguous nucleotides were used in the tree likelihood. Two trees were run, one containing only the sequences from the 2015 outbreak, and one containing all sequences. The tree containing all sequences was used to reconstruct the sequence of the ancestor of all the viruses in the second outbreak. BEAST will reconstruct sequence even in locations where the majority of sequences show gaps in the alignment. As such, there were three nucleotides that we removed from the final reconstructed sequence corresponding to locations in the original multiple sequence alignment where all sequences but one had gaps. The BEAST XMLs for the two trees described above are available as Data S2.

Downstream statistical analysis and figure generation
Observed genetic distance from the full ML tree and sampling date were combined in R. When the exact sampling date was unknown, if the day within the sampling month was unknown, the date was fixed to the 16th of the month, and when the month was unknown, the date was fixed to the 14 midpoint of the year. General linear models were fitted to test if the evolutionary rate of the virus was the same between two outbreak in the glm function in base R. The models used a gamma distribution with an identity link. The regression equation for the first model was: Genetic distance from root ~ Date + Outbreak. The regression equation for the second model was: Genetic distance from root ~ Date + Outbreak + Date:Outbreak. After no evidence of was found of differential rates between the two outbreaks, the general linear model without the interaction was run in brms [83] to generate predictive intervals for Figure 4b. A normal(0, 10) prior was placed over the intercept, standard normal priors were placed over all regression coefficients and a gamma(0.01, 0.01) prior over the shape parameter. The normalised rank of the evolutionary rate of the long branch was calculated by; for each tree in the posterior, ranking the estimated evolutionary rate of each branch from slowest to fastest, extracting the rank for the long branch, subtracting 1 so that the minimum was 0, [83]then dividing by the number of branches minus 1, so that a number between 0 and 1 was generated. Figures used the following R packages: ggplot2 [66], ggtree [84], ggthemes [85], cowplot [86,87], ggmap [87], viridis [88], tidybayes [89], lubridate [90], sp [91], raster [92], maptools [93], rgeos [94], rgdal [95], sf [96] and PBSmapping [97].        Figure   Fig S2