Phylogeographic analysis of Pseudogymnoascus destructans partitivirus-pa explains the spread dynamics of white-nose syndrome in North America

Understanding the dynamics of white-nose syndrome spread in time and space is an important component for the disease epidemiology and control. We reported earlier that a novel partitivirus, Pseudogymnoascus destructans partitivirus-pa, had infected the North American isolates of Pseudogymnoascus destructans, the fungal pathogen that causes white-nose syndrome in bats. We showed that the diversity of the viral coat protein sequences is correlated to their geographical origin. Here we hypothesize that the geographical adaptation of the virus could be used as a proxy to characterize the spread of white-nose syndrome. We used over 100 virus isolates from diverse locations in North America and applied the phylogeographic analysis tool BEAST to characterize the spread of the disease. The strict clock phylogeographic analysis under the coalescent model in BEAST showed a patchy spread pattern of white-nose syndrome driven from a few source locations including Connecticut, New York, West Virginia, and Kentucky. The source states had significant support in the maximum clade credibility tree and Bayesian stochastic search variable selection analysis. Although the geographic origin of the virus is not definite, it is likely the virus infected the fungus prior to the spread of white-nose syndrome in North America. We also inferred from the BEAST analysis that the recent long-distance spread of the fungus to Washington had its root in Kentucky, likely from the Mammoth cave area and most probably mediated by a human. The time to the most recent common ancestor of the virus is estimated somewhere between the late 1990s to early 2000s. We found the mean substitution rate of 2 X 10−3 substitutions per site per year for the virus which is higher than expected given the persistent lifestyle of the virus, and the stamping-machine mode of replication. Our approach of using the virus as a proxy to understand the spread of white-nose syndrome could be an important tool for the study and management of other infectious diseases.


Introduction
White-nose syndrome (WNS), caused by the filamentous fungus Psedogymnoascus destructans (Pd), is a highly pathogenic infection spreading among North American bats since around 2006 [1][2][3][4]. In the last fourteen years, the disease spread over a wide geographic range across the United States with largest movements in southward and westward directions from New York State, where it was first reported [5]. The fungus transmits to healthy bats by contact with infected bats or from the cave environment where the fungus can survive [6,7]. The ability of the fungus to survive for a considerable time without bat hosts [7][8][9][10] and at elevated temperatures [8] make the fungus a potential candidate for long-distance movement, most likely by humans through contaminated personal equipment or clothing [3]. The efficiency of transmission may also be impacted by other biological, environmental, and climatic factors. Understanding these variables that drive the spread is important for the epidemiology of WNS and planning conservation efforts to protect bats from WNS. Several models were published recently to explain the spread of WNS [11][12][13][14]. However, the models were based on theoretical assumptions and did not capture the dynamic characteristics of the spread over time. Phylogeographic analysis based on phylogenetic relationships of a pathogen over time and space is a valuable tool to characterize the dynamics of spread [15]. This approach not only determines epidemiologically critical dispersal patterns while spreading across the landscape, but also provides evolutionary information.
The North American population of Pd is essentially clonal, represented by a single mating type [16][17][18]. Studies on single nucleotide polymorphism across the genome and microsatellite data showed some genetic variations in Pd population however, the variations are insufficient to determine the spatial spread of the fungus [19,20]. We reported earlier a partitivirus, Pseudogymnoascus destructans partitivirus-pa (PdPV-pa), infection in all Pd isolates that we screened from North American bats with WNS [21]. All partitiviruses (family Partitiviridae) like other mycoviruses lack an external infectious phase in their life cycle [22][23][24]. They are transmitted vertically during cell division and sporogenesis within a host, and via hyphal anastomosis (cell fusion) within the same or closely related vegetatively compatible groups [22]. This lifestyle ensures persistent infection cycles in a host and makes mycoviruses suitable proxies to trace the spread of their hosts based on their genetic signatures. This approach provides a significant tool for understanding the spread of WNS in North America beyond variation in the fungus with its shallow genetic variation. We showed previously, using genetic information of the partitivius from 45 isolates of Pd, that the virus is geographically clustered [21]. Here, we examined the concept with a much larger sample set from across the disease fronts to test the hypothesis that the genetic information from the virus could be used to understand the patterns of WNS spread. We utilized variations in the virus sequences infecting Pd in a Bayesian phylodynamic framework to derive a credible approximation of WNS spread over space and time.

Sample collection
We used 139 Pd isolate samples collected from nine bat species across 17 US states and one province (New Brunswick) of Canada in this study (Table 1). Of the 139 isolates, 45 were from our work published in 2016 [21]. The remainder were from a new collection for this study.
The new isolates were obtained as generous donations from many individuals affiliated with various institutions involved in WNS research (Table 1). This sample set represents the maximum available samples we were able to obtain. Although not comprehensive, we are confident that the samples capture the genetic diversity of PdPV, as the sample location, time and bat species were random and included isolates from the early reporting states and the disease front where WNS was advancing. The samples from a few early reporting states like New York and Ohio were from 2008 to 2012 and not from later dates. However, the aim of our analyses was to examine the spatial-temporal dynamics across North America rather than at a state level, so the data coverage justifies our purpose. Moreover, samples from some states like Pennsylvania, West Virginia, Indiana and Kentucky include wide range time-series samples. The samples were obtained in various forms including bat wings, bat wing punches, bat wing swabs, spore suspensions of Pd, total nucleic acid extraction of Pd, or Pd cultures (Table 1). Except for the total nucleic acid extracts, we grew the fungus out of the material received by culturing on Sabouraud dextrose agar (SDA) media to obtain a pure culture of the isolate. The culture conditions and methods to obtain the pure cultures were as described previously [21]. When samples were obtained as total nucleic acid, we enriched the samples for dsRNA followed by RT-PCR to amplify the coat protein segment of PdPV-pa using specific primers as described previously [21].

Viral RNA amplification, sequence determination and sequence analysis
The dsRNAs of PdPV-pa infecting North American isolates of Pd, were extracted from fungal lyophilized tissue, followed by RT-PCR to amplify the coat protein gene of the virus, and direct sequence analysis, as described previously [21]. The primer pair (5' ACTCTGTGTTAACG GAGG 3'F, 5' CTGTAGTTGACACCTGTACC3' R) amplified 1224 bp of PdPV-pa coat protein; sequences were trimmed to 1088 bp to avoid noise. Nucleotide sequences were aligned using the multiple alignment fast fourier transform (MAFFT) algorithm in Geneious 10.2.6 [25]. We manually edited the sequences whenever necessary, particularly when the frame was shifted, to improve the alignment quality.

Phylogeographic analysis of PdPV-pa
A phylogeographic analysis using the coat protein sequences of PdPV-pa was performed in BEAST v2.5 [26]. The analysis is based on the Bayesian framework and includes molecularclock, evolutionary, and spatial models. Before conducting the analysis, the temporal signals of our molecular data were checked in the TempEst tool [27]. A Bayesian non-clock phylogeny was constructed in Geneious 10.2.6 using Hasegawa-Kishino-Yano (HKY) nucleotide substitution model with gamma distribution rates among sites. Then, a correlation between the genetic divergence and the sampling date was plotted in TempEst (S1 Fig) using the consensus tree file generated in Geneious. After confirming the positive temporal signature of our data, we used a discrete phylogeographic model in BEAST. The discrete phylogeographic model was appropriate because our data were time referenced and spatially descriptive. The sampling dates were entered as years and the sample locations were added as discrete traits. We used the HKY substitution model with a kappa value 2 and gamma category count of 4. HKY (121321, slight variance in r cg rate from HKY) was the most supported model (38% posterior support within 95% highest posterior density, HPD) (S2 Fig) out of 30 reversible models with symmetric rate matrices tested in bModelTest in BEAST [28]. A strict clock model with the prior set to coalescent constant population was used in the analysis. Markov chain Monte Carlo (MCMC) algorithms were run for 250 million states and sampled every 25,000 states. MCMC conversions were evaluated using Tracer v1.7 [29]. The phylogenies with the posterior distribution values were summarized in TreeAnnonator applying maximum clade credibility (MCC) trees excluding 10% burn-in. The consensus tree was visualized in FigTree v1.4.4 [30], using midpoint rooting, based on the assumption of a constant evolutionary rate with the strict clock model [15].
Spatial phylogenetic and reconstruction of evolutionary dynamics (SPREAD) application was used to map spatial diffusion and 95% HPD contours of the virus spread obtained from Bayesian inference [31]. SPREAD version 1.0.7 was used for all visualization in this study. The posterior summaries of diffusion from the MCC tree of BEAST were associated with the discrete locations and exported to a keyhole markup language (KML) file that was viewed in Google Earth [32]. We also performed the Bayesian stochastic search variable selection (BSSVS) analysis and visualized in SPREAD to determine the locations that were significant for PdPVpa dispersal [33]. In the analysis Bayes factors (BF), indicating deviation between the prior beliefs and the posteriors, distinguished the locations that have higher probabilities than expected [34]. We plotted the location pairs having BF�3 in Google Earth for visualization. Data was manually transferred to a USGS map for visualization in the figures.

Phylogenetic history of PdPV-pa in North America
The strict clock phylogeographic tree constructed with the coat protein sequences from 139 isolates of PdPV-pa in BEAST illustrated geographical clustering (Fig 1). We found well-supported clades (posterior probability, PP � 0.7 to 1) for all US states and New Brunswick in Canada. The virus isolates sampled were associated with nine Pd infecting species of bats (Table 1). Unlike geographical signatures, the virus variation showed no relationships with the bat species (Fig 1), implying that the fungus is transmitting without any genetic barrier among bat species. Some inner nodes of the tree closer to the root had PP values ranging from 0.22 to 0.6 (Fig 1). The clade associated with Tennessee showed strong posterior support with the root node that also appeared as a separate branch from the rest of the clades. In the tree, the branches are annotated with colors corresponding to the respective ancestral locations. No single ancestral location was found dominating across the tree topology. The temporal dynamics of PdPV-pa spread based on the discrete-trait (location) model was visualized with SPREAD v.1.0.7 (Fig 2). The area under concentric circles represents the number of branches in that particular location at that time and the lines connecting different locations represent branches in the MCC tree. Our sample size for the virus isolate corresponds to a particular location and thus matches well with the total area under the circles. However, the areas of the circles before our sampling time (before 2008) are the projected measure of the virus diversity at the

PLOS PATHOGENS
Phylogeography of white-nose syndrome ancestral locations. For example, during 1999, the virus appears to have existed in Connecticut, Alabama, and South Carolina (Fig 3A). By 2005, many branches emerged from Connecticut and established the virus in New York, Pennsylvania, West Virginia, Kentucky, and Tennessee (Fig 3B). Virginia and North Carolina were connected from the branches that emerged from West Virginia and Vermont connected from a branch from New York during that period. By 2010, the virus moved to New Brunswick, Ohio, Indiana, Illinois, and Georgia ( Fig 3C). By 2018, the virus had traveled to the rest of the sampled states including a long-distance jump from Kentucky to Washington (Fig 3D). The analysis indicates PdPV-pa presence in many sampled states before WNS was first reported in 2006, and Connecticut, West Virginia, New York, and Kentucky appeared as major source locations with multiple connectivity for the virus spread.

Inference on long-distance PdPV-pa spread
In 2016, WNS was detected in Washington state, over 3500 kilometers of aerial distance if we take New York state as the reference location where the disease was first reported [35]. In this study, we used the virus information from the isolate (ID:27099) first reported from Washington along with an isolate from 2017 and several isolates from 2018. The time referenced Bayesian tree showed that PdPV-pa isolates from Washington were phylogenetically close to an isolate (27WNS_KY_NLE_2012) sampled in 2012 from Breckinridge County, Kentucky (Fig  4). All of these isolates shared a common ancestor with PdPV-pa isolates collected from Mammoth Cave, Kentucky in 2018. The whole clade had posterior support of 100% (Fig 4). The result was further supported by a high BF value of 20.6 in the BSSVS analysis (Table 2) for the

Time to the most recent common ancestor, ancestral location, and the substitution rate
Ancestral reconstruction in the discrete phylogeographic analysis with BEAST supports the average time to the most recent common ancestor (TMRCA) of PdPV-pa was 1999 with 95% HPD interval ranging from 1996 to 2002 (S3 Fig). The TMRCA estimated from TempEst was close to the 95% HPD interval. As noted above, the location-annotated MCC tree (Fig 1) showed no single location dominated across the tree. The probability distribution of the root   locations (retrieved from the summary tree location file in our BEAST analysis) showed a range from 0.01 to 0.12 (Table 3). Some locations like Connecticut, Alabama, South Carolina, New York, and Illinois showed relatively higher probabilities (0.12, 0.11, 0.09, 0.08, and 0.08 respectively) than the rest. However, the probabilities were small in magnitude and the difference among locations was so small that none of the locations can be justified to call as a root location. In the analysis, the mean substitution rate for the virus was determined to be 2 X 10 −3 substitutions per site per year with 95% HPD interval ranging from 1.5 X 10 −3 to 2.3 X 10 −3 substitutions per site per year (S4 Fig).

Discussion
In this study, 139 coat protein sequences of PdPV-pa isolates obtained from Pd infected bats across 17 US states and one province (New Brunswick) of Canada were analyzed using the Bayesian phylogeographic inference framework [33]. We examined the virus spread based on its sequences sampled through time and with spatial characterization. These sequences allowed us to determine the substitution rates through time and space to infer spatial characteristics of the virus spread. We hypothesized that the phylodynamics of the virus is a reasonable approximation of white-nose syndrome spread because the virus infection is common in North American isolates of Pd; the genetic diversity of the virus is explained well by the geography [21]; the virus has a persistent lifestyle; and no incidence of the virus was found in any closely related fungi [21,36]. The shallow genetic diversity of North American population of Pd produces no spatial signatures for such analysis [18][19][20] hence, the genetics of the persistently infecting virus can be a significant proxy to study WNS spread and epidemiology. The large number of samples (139 isolates from diverse locations and bat species) analyzed through the time-referenced BEAST phylogeographic tool reconfirmed our earlier results [21]  that the virus isolates were geographically clustered and showed no patterns of variation associated with bat species. This implies that spread largely occurs within hibernacula, and movement to new areas is occasional. The summarized MCC tree with branches annotated with locations showed none of the locations dominating across the tree topology. This implies a lack of a precise root location where the majority of the isolates were phylogenetically linked. We did not find a clear directionality of the spread emerging from a single location in the phylogeny. It is unlikely that a single root location was significant for the spread but rather that multiple locations were involved and contributed to the epidemiology of WNS. This is evident from the SPREAD maps based on the MCC tree (Fig 3). In the map, the virus presence was evident by 1999 in Connecticut, Alabama, and in South Carolina but the major seeding events to other locations started appearing by 2005, earlier than the year WNS was first reported. A widespread occurrence of cryptic infections in some bat species without symptoms [37] and the variation in tolerance and resistance in a bat species in different sites [38] suggest the possibility of the fungus presence before the outbreak of WNS was observed. Over time (2005)(2006)(2007)(2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017)(2018), the dominating source locations for the virus spread emerged from Connecticut, New York, Kentucky, and West Virginia. Since the earliest sequence data of PdPV-pa infecting Pd in our analysis was from the year 2008, the spatial information of the virus prior to that was inferred based on the substitution rate algorithm created by the BEAST model. The model predicts the presence of the virus but we do not have any empirical evidence that supports the virus association with Pd and Pd transmission via bats during that time. It is likely that Pd acquired the virus infection a few years (range of 4-10 years) prior to the first report of WNS in 2006 [1,2] because when the WNS outbreak happened, Pd had already been infected with the virus. A successful microbial invasion and parasitism involve demanding events that are function of time. In general, a microbe must be able to reach the tissue it will parasitize by penetrating host tissue barriers, it should adapt and grow in the host environment, it must be able to absorb and digest components of host tissues, and it must withstand the host immune system [10,39]. Hence, although Pd appears to be well adapted to bats, given the extent of colonization of bats in other parts of the world [20] the lag of a few years before the WNS outbreak is not an unreasonable result. The origin of the virus in Pd is unclear. A recent report of the virus in Pd isolates from the Czech Republic [36] may suggest an alternative explanation to its origin in North America: that the virus came with European introduction of Pd. Previously we screened eight of the 18 Czech Republic isolates tested in the report and we did not find the virus in any of the isolates [21], but this report found five positive out of the eight. Unfortunately no genetic analysis of the fungus was conducted in this report, which would have helped to ascertain that these isolates were truly unique. The likelihood of finding the exact isolate that initiated the Pd infection in North America is extremely remote. Moreover, the report [36] did not describe any genetic variation in the virus from the Czech Republic that had phylogenetic signatures suggesting an ancestral relationship with the US isolates. Testing for the virus in a statistically valid pool from Europe is needed, as other European isolates did not have the virus.
Our analyses indicate that the virus-infected Pd established in local populations and maintained its lineages with mutations. The locations (here US states) with higher connectivity such as Connecticut, New York, Kentucky, and West Virginia appeared as major sources for the WNS spread. This dynamic can be described as a metapopulation model of patchy spread rather than a diffusive dispersal. It further suggests that the history of spread in each location is independent. This explains why the virus isolates from Tennessee appear as a separate clade from the rest. The virus isolates from Tennessee were similar to the root as suggested by a strong posterior support in MCC tree but lacked support for being the origin due to the low probability value for the root state. Diverse patterns of spatial spread have been identified depending on the ecology of pathogens and hosts. For example, a stratified diffusion pattern was evident in rabies virus [40], a random jump was found in the foot-and-mouth disease virus [41] and a diffusive spread was reported in the West Nile virus [42]. In addition, a combination of local diffusion and infrequent long-distance spread was shown in the sudden oak death pathogen, Phytopthora ramorum [43]. Maher et al. [11] applied maximum likelihood estimates on the county scale infection data of WNS in the United States and came to a similar conclusion that WNS spread is not diffusive but patchy. The authors found that the geographic heterogeneity associated with cave connectivity and winter length were the best predictors for their model. A multiscale process involving stochasticity in the spread of WNS was also emphasized in a recent study focusing on the data from 54 hibernacula of little brown bats in New York [14]. The patchy spread fits well with the stochasticity associated with multiscale processes because of the lack of deterministic directionality found in the phylogeny from a single location. Our analysis identifies the source locations for seeding and illustrates the temporal dynamics of the patchy spread that were not captured by previous studies.
In order to verify that our results were not biased by our sampling intensity, we created a random sample subset excluding the states with single isolates and selected only states that had at least two samples collected in different times. We ran the subset for BEAST phylogeographic and SPREAD analysis. Our results showed similar phylogenetic support and topological relationships among the virus isolates as we found with full data set (S5 Fig) and  The recent long-distance spread of WNS to Washington state appeared in 2016 from the field data. Our phylogeographic approach in BEAST reasonably traced the source of the spread to Kentucky that had close sequence similarity with the virus isolates from Mammoth Cave. The long distance spread of the virus is likely mediated by humans because Mammoth Cave National Park is one of the most heavily visited national parks in the United States [44]. The fungal mycelia or conidia could be transmitted through clothing, footwear or various gear used by cavers.
Although the geographical origin of the virus was not definite, we performed BSSVS analysis that provided BF tests to evaluate epidemiologically significant non-zero migration rates between locations. The epidemiological linkages that correspond to BF�3 identified Connecticut, Illinois, New York, Kentucky, and West Virginia as significant source states for the spread. Some transmissions such as Kentucky to Washington, Connecticut to New Brunswick, Kentucky to Ohio, West Virginia to Virginia, and Connecticut to Kentucky showed very high BF values of over 20, indicating that those spreads were extremely likely. Moreover, states like Connecticut, New York, West Virginia, and Kentucky identified in the MCC tree as the dominating seeding location for the virus movement were also supported by BSSVS analysis.
The time-referenced discrete phylogeographic analysis in BEAST also provided some important evolutionary information about PdPV-pa. We inferred the TMRCA of the virus was somewhere between the late 1990s to early 2000s (Mean = 1999, 95% HPD interval [1996,2002]) in North America. This suggests that the virus was present and existed before WNS spread happened. The lag period for WNS emergence likely coincided with the period of the virus infection of the fungus. The average nucleotide substitution rates as determined in BEAST analysis for the virus's coat protein gene (Mean: 2 X 10 −3 substitutions per site per year, 95% HPD interval [1.5 X 10 −3 , 2.3 X 10 −3 ]) is within the order of magnitude for dsRNA viruses (~10 −7 to 10 −3 ) reported earlier [45]. Although the substitution rate for dsRNA viruses, particularly for mycoviruses, is less known, the PdPV-pa substitution rate in the order of 10 −3 is very high for a virus with a persistent lifestyle and a stamping-machine mode of replication [46][47][48]. The higher substitution rate might reflect the recent infection history of the virus in Pd that seems likely from the recent origin of the virus in North America. Most viruses exhibit a higher level of variation after recent species jumping [49,50]. Our data does not allow us to determine a definitive origin for the virus, but it does imply that the virus only recently infected Pd. This could mean that the fungus arrived in North American without the virus, and acquired it from a related native North American fungus. Given the impact of the virus on sporulation [21], this could have led to the spread of WNS from the original site of Pd infection. Unfortunately finding the progenitor of the virus is a nearly impossible task.
The spatial and temporal characteristics of WNS spread described based on the sequence information of PdPV-pa have important management implications. Our analysis depicts the spread of WNS driven by the dynamics of some patchy locations over time. It is important to identify such locations to implement rigorous measures to control Pd and the subsequent spread of the disease. In our current analysis over 10 years, Connecticut, New York, West Virginia, and Kentucky were identified as major patches with higher connectivity both in the MCC tree and BSSVS analysis. Some of the locations such as Kentucky seem more relevant for management purposes as it appears as a major source for the long-distance dispersal of the fungus to Washington. We identify Mammoth Cave at a local scale as a significant location for possible human-mediated dispersal. We also noted Illinois as a significant location for a non-zero migration rate in BSSVS analysis but, unlike other locations, it is not supported in the MCC tree. A regular update of the time-structured phylogeographic analysis as new data become available would be a reliable tool for prioritizing management interventions in controlling WNS.
In conclusion, this study outlines a framework of how a persistent virus infecting a pathogenic fungus could be a helpful tool to understand the epidemiology of the fungal disease. We have shown the importance of the time-referenced phylogeographic approach in BEAST to evaluate the temporal and spatial dynamics of WNS spread and to the management of the disease. The approach will be equally relevant to examine the efficacies of competing mitigation strategies imposed in controlling the disease.