Co-circulating mumps lineages at multiple geographic scales

Despite widespread vaccination, eleven thousand mumps cases were reported in the United States (US) in 2016–17, including hundreds in Massachusetts, primarily in college settings. We generated 203 whole genome mumps virus (MuV) sequences from Massachusetts and 15 other states to understand the dynamics of mumps spread locally and nationally, as well as to search for variants potentially related to vaccination. We observed multiple MuV lineages circulating within Massachusetts during 2016–17, evidence for multiple introductions of the virus to the state, and extensive geographic movement of MuV within the US on short time scales. We found no evidence that variants arising during this outbreak contributed to vaccine escape. Combining epidemiological and genomic data, we observed multiple co-circulating clades within individual universities as well as spillover into the local community. Detailed data from one well-sampled university allowed us to estimate an effective reproductive number within that university significantly greater than one. We also used publicly available small hydrophobic (SH) gene sequences to estimate migration between world regions and to place this outbreak in a global context, but demonstrate that these short sequences, historically used for MuV genotyping, are inadequate for tracing detailed transmission. Our findings suggest continuous, often undetected, circulation of mumps both locally and nationally, and highlight the value of combining genomic and epidemiological data to track viral disease transmission at high resolution.

An unusually large number of mumps cases were reported in the US in 2016 and 2017, despite high rates of vaccination 1,2 . In the pre-vaccination era, mumps was a routine childhood disease, with over 180,000 cases reported in the US annually 1 . After the mumps vaccine was introduced in 1967, mumps incidence declined by more than 99% 1 . Case counts rose again briefly in the mid-1980s, and then continued to decrease after two Measles-Mumps-Rubella (MMR) vaccine doses were recommended in 1989 following a national outbreak of measles 3 . In the early 2000s, only a few hundred cases of mumps were observed annually in the US 1 . This low nationwide incidence was interrupted by a large outbreak (>5,000 cases) in the Midwestern US in 2006 4 , followed by a period of low incidence with minor outbreaks until 2016. The recent resurgence in mumps is not fully understood, although waning immunity 5 and genetic changes in circulating viruses may be contributing factors.
Within Massachusetts, over 250 cases were reported in 2016 and more than 170 in 2017, far exceeding the usual state incidence of <10 cases per year 6 ( Fig. 1a-b). As seen in other recent outbreaks, most cases were associated with academic institutions and other close contact settings 4,7 . Mumps was reported in at least 18 colleges and universities in Massachusetts, including Harvard University (Harvard), University of Massachusetts Amherst (UMass), and Boston University (BU). Of the individuals infected, at least 65% had the recommended two doses of the MMR vaccine (Extended Data Table 1a).
We set out to investigate the evolution and spread of MuV, generating 203 whole genomes (160 collected in Massachusetts and 43 in 15 other states) through a combination of unbiased and capturebased sequencing approaches 8,9 (see Methods, Supplementary Table 1). The genomes had a median of 99.48% unambiguous base calls and a median depth of 176x (Extended Data Fig. 1a). Technical replicates had high concordance: in 27 samples prepared more than once, only two base calls differed across replicates. We also sequenced a set of 29 PCR-negative samples from suspected mumps patients in Massachusetts to determine if we could detect MuV or other viruses that may explain their symptoms. We saw evidence for MuV in one sample, and identified 4 other viruses known to cause upper respiratory symptoms (at least two of which are known to cause parotitis) in four separate samples [10][11][12] (Extended Data Table 2).
In light of the large number of vaccinated individuals who contracted MuV during this outbreak, we looked for genetic variation that could suggest vaccine escape (Supplementary Table 2). We focused on genotype G MuV genomes, the predominant genotype found in the US, and estimated dN/dS on 200 genotype G genomes from our dataset together with all 25 genotype G whole genomes available from NCBI GenBank 13 . This provided no strong evidence for positive selection at any specific site (Extended Data Fig. 2a, Supplementary Table 3) or in any gene (Extended Data Fig. 2b). No fixed nucleotide substitutions were associated with vaccine status or time since vaccination (Extended Data Fig. 2c,e). We also investigated changes in immunogenic regions of the mumps genome between our samples and the vaccine strain used in the US, focusing on the hemagglutinin (HN) protein, the primary target of neutralizing antibodies 14 . We found numerous fixed differences (Extended Data Fig.  2d), including in regions of potential immunological significance, but did not find clear evidence for variants that might lead to escape from vaccine-induced immunity (see Supplementary Information).  Table 1). Samples from prior to 2016 are not shown. (c) Probability distributions for the tMRCA of selected sequences labeled in (a) (see Extended Data Table 3 for additional clades). Dotted line: mean of each distribution.
We investigated mumps transmission at multiple geographic scales, first examining how the Massachusetts outbreak fits into the larger context of resurgent mumps. We performed a phylogenetic analysis using the same 225 genotype G MuV genomes as above (Extended Data Fig. 3a). The resulting phylogeny (Fig. 1a) suggests that most MuV genomes from cases in Massachusetts descend from those in the 2006 US outbreak: these Massachusetts genomes fall within a clade that contains the 2006 samples, and the root of this clade lies within or very close to the 2006 samples. This conclusion is supported by a root-to-tip analysis (Extended Data Fig. 3b,c). Other US samples collected between 2006 and the Massachusetts outbreak (2016) also fall within this clade, suggesting sustained MuV transmission within the country. Additionally, we saw that MuV sequences from Massachusetts are interspersed with sequences across the US also collected in 2016-2017 (Fig. 1a), providing evidence for extensive geographic movement of MuV within the country on short time scales. A principal components analysis of MuV sequences (Extended Data Fig. 3d) similarly shows that sequences from other regions within the US often cluster with ones from Massachusetts.
We see clear evidence for several introductions of MuV into Massachusetts, including two distinct MuV lineages (clades I and II) that descend from the 2006 outbreak and diverged late in 2012 (Fig. 1c, Extended Data Table 3). To estimate when the two primary clades entered Massachusetts, we calculated the time to the most recent common ancestor (tMRCA) of each using only samples from the 2016-2017 Massachusetts outbreak (I-outbreak = July 2015, II-outbreak = November 2014) (Fig. 1c, Extended Data Table 3). These tMRCAs, as well as the estimated divergence time of the two clades, are well before the diagnosis of the first cases associated with the 2016-2017 outbreak, providing strong evidence for a minimum of two introductions into Massachusetts. We also saw evidence for additional introductions: three clades (0-UM, 0-HU, 0-BU) contain MuV sequences distinct from the majority of Massachusetts sequences, and each has at least one MuV sequence from a patient with foreign travel history during the incubation period of the virus (12-25 days before symptom onset 15 ).
The phylogeny illuminates details of the outbreak at finer scales as well, showing that multiple clades were co-circulating within academic institutions. Specifically, there are multiple viral lineages in individuals associated with UMass (clade II and 0-UM), in Harvard-associated individuals (clades I, II, and 0-HU), and in BU-associated individuals (clades I, II, and 0-BU) (Extended Data Fig. 4a). Thus, what appeared to be a single outbreak across multiple institutions is shown by sequence data to consist of multiple overlapping chains of transmission. This adds to the overall picture of mumps in the US as being sustained by multiple, widespread lineages that lead to local flare-ups. Viral genome sequence data allowed us to identify a spillover event from one institution into a nonuniversity community. Before genomic data were available, cases associated with Harvard and with an apparently unrelated local community (clade II-community) were inferred to be separate outbreaks due to the different populations affected (mostly students versus adults with no obvious university connection) and an apparent five month gap between the two sets of cases (Supplementary Table 1). From the phylogeny, however, it is clear that these two groups of cases are connected, and that the community-associated cases likely represent a spillover from Harvard into the broader population (Fig.  2a). Additional epidemiological investigation identified three infected individuals associated with both Harvard and the community who could have served as transmission links. The gap in time between these two sets of linked cases suggests local undetected mumps circulation, potentially due to unreported cases or asymptomatic infection 16 . In line with the picture seen above, this observation illustrates sustained transmission that is only sporadically detected. The comprehensive sampling of clinical mumps cases at Harvard again allowed us to combine genomic and epidemiological data to better understand outbreak dynamics, this time by estimating the number of MuV introductions into the university. The detailed epidemiological data alone were inadequate to distinguish whether the initial effective reproductive number, R E (t=0), within the university exceeded the critical threshold of 1.0 ( Fig. 2b left, Extended Data Fig. 5). However, incorporating the observed number of distinct viral lineages observed markedly improved our ability to infer transmission dynamics within the university, supporting an estimate of five (95% CI: 4-18) distinct introductions, each expected to cause R E (t=0)=1.70 (95% CI: 1.50-1.91) secondary infections (Fig. 2b right). That R E (t=0) is well above 1.0 has implications for the required reach of outbreak response vaccination.
We next investigated the usefulness of sequence data to supplement epidemiological data at the finest analysis scale: reconstructing individual transmission chains. Epidemiological links alone paint a sparse picture of transmission (Fig. 2c left). To determine how effective genomic data could be in inferring any missing epidemiological links, we examined the genetic distance between samples with a known epidemiological link (i.e., samples likely part of the same transmission chain). Samples with a known link group together according to genetic distance (Extended Data Fig. 6a), and genetic distance can be used as a predictor of epidemiological linkage (Extended Data Fig. 6b). Given this, we used genomic data (along with sampling dates, but without using known epidemiological links) to reconstruct transmission chains during the outbreak (Fig. 2c right), focusing on samples within clade II-outbreak. These reconstructed transmission chains in several cases support known epidemiological links, but also included links between individuals without known contacts. The reconstruction did not reproduce several non-direct contact epidemiological links ('shared activity links', see Methods), which highlights the utility of both epidemiological and genetic data in an outbreak context.
When shared between samples, within-host variants (intrahost variants, or iSNVs) can provide additional information about transmission chains during viral outbreaks [17][18][19] , but proved to be uninformative in our dataset. We found no evidence for transmitted iSNVs among our samples, including among sequences from five direct contacts (Supplementary Table 2). These data suggest that the MuV transmission bottleneck may be small enough to preclude shared within-host variation (see Supplementary Information). We did, however, detect changes to MuV during the course of individual infections. In the two patients for whom we had multiple samples, the MuV genomes differed at one site between time points (in both cases, nine days apart) (see Supplementary  Information).
To understand MuV circulation in its global context, we analyzed the 316-nucleotide SH gene. This gene was originally chosen for MuV genotyping 20,21 because it has a high density of variation and provides a small, convenient target amenable to sequencing methods available at the time; it is therefore the region of the genome for which the most sequence data is available. Our SH dataset consists of 3,646 SH sequences available on NCBI GenBank from around the world, including those from genomes generated in this study (Fig. 3a, Extended Data Fig. 7a). We calculated tMRCA for each of the 11 clinically relevant genotypes (Fig. 3b) and found that genotype A, the strain used in most vaccines (including the Jeryl Lynn vaccine used in the US), coalesces the earliest and appears to have stopped circulating (previously noted in ref. 22). The other genotypes, including genotype G, coalesce over a decade later. Each dot represents a sample (colored as in (a)) and each row contains samples with identical SH sequences, except the bottom, which includes samples whose sequences are distinct from those in the above 5 categories. Numbers right of each row are the percentage of all genotype G samples found in that row.
We performed a phylogeographic analysis on these SH sequences to understand movement of MuV between world regions. To reduce temporal and geographic sampling biases due in part to incomplete global MuV surveillance, we looked at samples collected since 2010 in four well-sampled regions (US, Europe, East Asia, and South/Southeast Asia). (For details regarding resampling, see Methods.) We show significant MuV migration between the US and Europe (Fig. 3c, Extended Data Fig. 7b,c) and see that most introductions to the US are from Europe, although there is also support for mumps migrations from East Asia and South/Southeast Asia to the US. Likewise, most introductions to Europe appear to be from the US. We also found that recent sequences from the US have primarily European ancestry, and vice versa (Extended Data Fig. 7d). Notably, recent sequences from East Asia and South/Southeast Asia have minimal external ancestry, indicating relatively little spread to these regions (Fig. 3c).
It is important to note that the SH gene is less than 3% of the MuV genome, represents only a fraction of MuV genetic diversity, and has fewer phylogenetically relevant nucleotide substitutions than other genes 23 . While these sequences are useful for tracking genotype spread on a global scale, the lack of variability across them (Fig. 3d) may inflate our estimates of migration. Moreover, lack of variability makes it difficult or impossible to reconstruct MuV spread on smaller temporal and geographic scales. For example, analysis using SH gene sequencing alone would not have shown a genetic link between Harvard and the geographic community (Extended Data Fig. 6c, Extended Data Fig. 8). Furthermore, there is limited variability in genotype G sequences in the US, even over a time period of several years (Fig. 3d), highlighting the value of whole genome sequencing.
The pairing of whole genome sequences with epidemiological data provides a detailed picture of outbreak dynamics at a full range of geographic scales, from national migration patterns to individual transmission chains. Our MuV sequence data show that multiple co-circulating strains underlie recent increases in mumps cases within Massachusetts, and even within single institutions. Unexpected insights using genomic epidemiology include the finding that two apparently unrelated outbreaks were in fact linked and that a sustained series of cases within one institution stemmed from multiple independent introductions. While SH gene sequencing has proved useful for tracking broad trends in MuV circulation, many of our findings -continuous circulation of MuV lineages, connections between cases in Massachusetts, specific dynamics of MuV at Harvard -were only possible through the use of whole genome sequencing (Extended Data Fig. 8).
The large number of mumps cases among vaccinated individuals has raised concerns about mumps vaccination. The US Advisory Committee on Immunization Practices recently recommended a third dose of MMR vaccine for persons at increased risk for acquiring mumps because of an outbreak 24 . Our finding of ongoing, undetected MuV circulation in the US gives additional weight to this recommendation, and underscores the importance of interrupting transmission in high-risk communities. This study presents a model for the future of genomics-informed infectious disease control, in which genomic data can both support and extend conventional epidemiological data to inform our understanding of how pathogens spread.

Data availability
The 203 MuV whole genome sequences generated in this study, as well as nine low quality sequences not included in the analysis, are available on NCBI GenBank 13 under BioProject accession PRJNA394142 (accession numbers MF965196-MF965318 and MG986380-MG986468).

Ethics statement
The study protocol was approved by the Massachusetts Department of Public Health (MDPH), Centers for Disease Control and Prevention (CDC), and Massachusetts Institute of Technology (MIT) Institutional Review Boards (IRB). Harvard University Faculty of Arts and Sciences and the Broad Institute ceded review of sequencing and secondary analysis to the MDPH IRB through authorization agreements. The MDPH IRB waived informed consent given this research met the requirements pursuant to 45 CFR 46.116 (d). The CDC IRB determined this project to be non-human subjects research as only de-identified leftover diagnostic samples were utilized. In compliance with the IRB agreement, Harvard University, University of Massachusetts Amherst, and Boston University granted approval for publication of their institution names in this paper.

Sample collections and study subjects
Buccal swab samples were obtained from suspected and confirmed mumps cases tested at MDPH and CDC.  Table 1) includes all confirmed and probable mumps cases reported to MDPH in that time period. Probable cases include cases with a positive mumps IgM assay result, or those with an epidemiological link to a confirmed case 25 . Samples from CDC are a selection of PCR-positive cases submitted to the CDC for testing between 2014-2017. See Supplementary Table 1 for de-identified information, including metadata, about study participants.

Viral RNA isolation
Sample inactivation and RNA extraction were performed at the MDPH, Broad Institute, and CDC. At MDPH, viral samples were inactivated by adding 300 µL Lysis/Binding Buffer (Roche) to 200 µL sample, vortexing for 15 seconds, and incubating lysate at room temperature for 30 minutes. RNA was then extracted following the standard external lysis extraction protocol from the MagNA Pure LC Total Nucleic Acid Isolation Kit (Roche) using a final elution volume of 60 µL. At the Broad Institute, samples were inactivated by adding 252 µL Lysis/Binding Buffer (ThermoFisher) to 100 µL sample. RNA was then extracted following the standard protocol from the MagMAX Pathogen RNA/DNA Kit (ThermoFisher) using a final elution volume of 75 µL. At CDC, RNA extraction followed the standard protocol from the QiaAmp Viral RNA mini kit (Qiagen).

PCR diagnostic assays performed at MDPH and CDC
Diagnostic tests for presence of MuV were performed at the MDPH and CDC using the CDC Real-Time (TaqMan) RT-PCR Assay for the Detection of Mumps Virus RNA in Clinical Samples 7,26 . Each sample was run in triplicate using both the Mumps N Gene assay (MuN) and RNase P (RP) assay using this protocol. RT-PCR was performed on the Applied Biosystems 7500 Fast Real-Time PCR system or Applied Biosystems Prism 7900HT Sequence Detection System instrument.

PCR quantification assays performed at Broad Institute
MuV RNA was quantified at the Broad Institute using the Power SYBR Green RNA-to-Ct 1-Step qRT-PCR assay (Life Technologies) and CDC MuN primers. The 10 µL assay mix included 3 µL RNA, 0.3 µL each MuV forward and reverse primers at 5 µM concentration, 5 µL 2x Power SYBR RT-PCR Mix, and 0.08 µL 125x RT Enzyme Mix. The cycling conditions were 48C for 30 min and 95C for 10 min, followed by 45 cycles of 95C for 15 sec and 60C for 30 sec with a melt curve of 95C for 15 sec, 55C for 15 sec, and 95C for 15 sec. RT-PCR was performed on the ThermoFischer QuantStudio 6 instrument. To determine viral copy number, a doublestranded gene fragment (IDT gBlock) was used as a standard. This standard is a 171 bp fragment of the MuV genome (GenBank accession: NC_002200) including the amplicon (sequence: GGA TCG ATG CTA CAG TGT ACT AAT CCA GGC TTG GGT GAT GGT CTG TAA ATG TAT GAC AGC GTA CGA CCA ACC TGC TGG ATC TGC TGA TCG GCG ATT TGC GAA ATA CCA GCA GCA AGG TCG CCT GGA AGC AAG ATA CAT GCT GCA GCC AGA AGC CCA AAG GTT GAT TCA AAC).
23S rRNA content in samples was quantified using the same Power SYBR Green RNA-to-Ct 1-Step qRT-PCR assay kit and cycling conditions. Primers were used to amplify a 183 bp universally conserved region of the 23S rRNA (fwd: 93a -GGG TTC AGA ACG TCG TGA GA, rev: 97ar -CCC GCT TAG ATG CTT TCA GC) 27 . To determine viral copy number, a double-stranded gene fragment (IDT gBlock) was used as a standard. This standard is a 214 bp fragment of the Streptococcus HTS2 genome (accession: NZ_CP016953) (sequence: AGC GGC ACG CGA GCT GGG TTC AGA ACG TCG TGA GAC AGT TCG GTC CCT ATC CGT CGC GGG CGT AGG AAA TTT GAG AGG ATC TGC TCC TAG TAC GAG AGG ACC AGA GTG GAC TTA CCG CTG GTG TAC CAG TTG TCT CGC CAG AGG CAT CGC TGG GTA GCT ATG TAG GGA AGG GAT AAA CGC TGA AAG CAT CTA AGT GTG AAA CCC ACC TCA AGA T). qPCR data from both assayseach performed only on a subset of samples -is reported in Supplementary Table 1.

Illumina library construction and sequencing
cDNA synthesis was performed as described in previously published RNA-seq methods 8 . In samples where bacterial rRNA was not depleted, 25 fg synthetic RNA was added at the beginning of cDNA synthesis to track sample cross-contamination. Positive control libraries were prepared from a mock MuV sample in which cultured Enders strain (ATCC VR-106) MuV was spiked into a composite buccal swab sample from healthy patients and diluted to MuV RT-qPCR Ct=21. This mock sample was extracted using the viral RNA isolation protocol described above, except that total nucleic acid was eluted in 100 µL. Negative control libraries were prepared from water. Illumina Nextera XT was used for library preparation: indexed libraries were generated using 16 cycles of PCR, and each sample was indexed with a unique barcode. Libraries were pooled equally based on molar concentration and sequenced on the Illumina HiSeq 2500 (100 or 150 bp paired-end reads) platform.
Hybrid capture Viral hybrid capture was performed as previously described 8 using two different probe sets. In one case, probes were created to target MuV and measles virus (V-MM probe set), and in one case, probes were created to target 356 species of viruses known to infect humans (V-All probe set) 9 . Capture using V-All was used to enrich viral sequences primarily in samples in which we could not detect MuV, as well as in other samples (see Supplementary Table 1 for a list of which samples were captured using which probe set). As described in ref. 9, the probe sets were designed to capture the diversity across all publicly available genome sequences on GenBank for these viruses. Probe sequences can be downloaded here: https://github.com/broadinstitute/catch/tree/cf500c69/probe-designs.
We replaced deletions in the coding regions with ambiguity ('N'). In one sample, MuVs/Massachusetts.USA/11.16/5[G], with an insertion at position 3,903 (based on a full 15,384-nucleotide MuV genome, e.g., accession JN012242.1) we removed a poorly-supported (<5 reads covering the site) extra 'A' in a homopolymer region.
To calculate sequencing metrics (Extended Data Fig. 1), we used SAMtools 29 to downsample raw reads for each replicate to 1 million reads and then re-ran assembly as described above. Samples from one contaminated sequencing batch were excluded, as were all replicates from PCR-negative samples. In cases where samples from two time points were sequenced from a single patient, we included only the first time point in the collection collection interval analysis (Extended Data Fig. 1d).

Metagenomic analysis
We used the V-All probe set for capture on all samples from suspected mumps cases with a negative MuV PCR result (n=29). A subset of PCR-positive samples were also sequenced with this probe set (n=145) or without capture ('unbiased', n=111). We used the mock Enders strain MuV sample as a positive control on a sequencing run containing all PCR-negative samples, as well as a water sample as a negative control. We used the metagenomic tool Kraken v0.10.6 30 via viral-ngs to identify the presence of viral taxa in each sample. We built a database similar to the one described in ref. 9 , except without insect species. This database encompasses the known diversity of viruses known to infect humans and can be downloaded in three parts at https://storage.googleapis.com/sabeti-public/meta_dbs/kraken_full_20170522/[file] where [file] is: database.idx.lz4 (595 MB), database.kdb.lz4 (75 GB), and taxonomy.tar.lz4 (66 MB). Due to the possibility of contamination, we prepared a second, independent sequencing replicate on all PCR-negative samples with evidence for MuV or another virus, and required both replicates to contain reads matching the virus detected in the sample. We found no evidence of viruses other than MuV in PCR-positive samples.
We required the total raw read count for any genus in any sample to be twice (in practice, seven times) that in any negative control from any sequencing batch. For any sample that had one or more pathogenic viral genera that passed this filter and had de-duplicated reads well distributed across the relevant viral genome, we attempted contig assembly: we used viral-ngs to filter all sample reads against all NCBI GenBank 13 entries matching the identified species and then de novo assembled reads using Trinity 31 through viral-ngs and scaffolded against the closest matching full genome identified by a blastn query 32 . We report all viruses identified via this method in Extended Data Table 2.
In parallel, we used SPAdes 33 within viral-ngs to de novo assemble contiguous sequence from all de-duplicated, depleted reads. We used the metagenomic tool DIAMOND v0.9.13 34 with the nr database downloaded 29 May 2017, followed by blastn 32 of DIAMOND-flagged contigs. Using this method, we confirmed the presence of all previously identified viruses except Influenza B virus, for which we never assembled a contiguous sequence. We found no evidence of additional viruses using this method.

Criteria for pooling across replicates
We prepared one or more sequencing libraries from each sample and attempted to sequence and assemble a genome from each of these replicates. We required a replicate of a sample to contain 3,000 unambiguous base calls for its read data to be included in that sample's final genome assembly. This threshold was based on the maximum number of unambiguous bases (2,820) observed in negative controls across all uncontaminated sequencing batches. One sequencing batch showed evidence of contamination: we were able to assemble 7,615 unambiguous MuV bases from a water sample, with a median coverage of 4x. For samples prepared in this batch only we implemented an additional requirement for including a replicate in pooling: the assembly must have a median coverage of at least 20x, 5 times the median coverage of the water sample.

Multiple sequence alignment of genotype G whole genomes
We required a MuV genome to contain 11,538 unambiguous base calls (75% of the total 15,384-nucleotide genome; accession JN012242.1) for inclusion in the alignment of whole genome sequences that we used for downstream analysis. We aligned MuV genomes using MAFFT v7.221 35 with default parameters. In Supplementary Data, we provide the sequences and alignments used in analyses.

Visualization of coverage depth across genomes
We plotted aggregate depth of coverage across the 200 samples whose genomes were included in the final alignment (Extended Data Fig. 1a) as described in ref. 36. We aligned reads against the reference genome with accession JX287389.1 and plotted over a 200-nt sliding window.

Analysis of within-and between-sample variants
We ran V-Phaser 2.0 37 via viral-ngs on all pooled reads mapping to a sample assembly to identify within-sample variants (Supplementary Table 2). To call a variant, we required a minimum of 5 forward and reverse reads, as well as no more than 10-fold strand bias, as previously described 17 . Samples with genomes generated by the sequencing batch that showed evidence of contamination (see 'Criteria for pooling across replicates' above) were not included in within-host variant analysis. When analyzing variants in known contacts, we used pairs of samples designated as 'contact links', as described in 'Relationship between epidemiological and genetic data' below.
Between-sample variants were called by comparing each final genome sequence to JX287385.1, the earlier of the two available whole genomes from the 2006 mumps outbreak in Iowa, USA (Supplementary Table 2). We ignored all fully or partially ambiguous base calls, and excluded sequences that did not descend from the USA_2006 clade from this analysis. When examining amino acid changes in HN given vaccination status (see 'SH and HN multiple sequence alignment' below), we ignored sequences from patients with unknown vaccination history.

Maximum likelihood estimation and root-to-tip regression
We generated a maximum likelihood tree using the whole genome genotype G multiple sequence alignment. We used IQ-TREE v1.3.13 38 with a GTR substitution model and rooted the tree on the oldest sequence in this dataset (accession KF738113.1) in FigTree v1.4.2 39 .
To estimate root-to-tip distance of samples in the primary USA lineage, we subsetted the full genotype G alignment to include only samples descendent of the USA_2006 clade, including samples in this clade (see Fig.  1a) and used TempEst v1.5 40 with the best fitting root according to the heuristic residual mean squared function to estimate distance from the root. We used scikit-learn 41 in Python to perform linear regression of distances on dates.
We also generated maximum likelihood trees using the SH gene only (full 316-nucleotide mRNA), HN (coding region only), F (coding region only), and a concatenation of the aforementioned SH, HN, and F regions (Extended Data Fig. 8). For each tree, we started with the whole genome genotype G alignment (225 sequences) and extracted the relevant region(s). We then removed any sequence with two or more consecutive ambiguous bases ('N's) in any of SH, HN, or F, leaving 209 sequences in each alignment. We used IQ-TREE v1.5.5 with a GTR substitution 42 model to generate maximum likelihood trees.

Molecular dating using BEAST
We performed all molecular clock analyses on whole genome sequences using BEAST v1.8.4 43 . We excluded from the CDS the portion of the V protein after the insertion site 44 because of reading frame ambiguity in that region. On the CDS, we used the SRD06 substitution model 45 , which breaks codons into two partitions (positions (1+2) and 3) with HKY substitution models 46 and allows gamma site heterogeneity 47 (four categories) on each. We used a separate partition on noncoding sequence with an HKY substitution model and gamma site heterogeneity. To accomodate inexact dates in seven sequences from NCBI GenBank, we used sampled tip dates 48 .
We tested six models as described in ref. 36. Each was a combination of one of two clock models (strict clock and uncorrelated relaxed clock with log-normal distribution 49 ) and one of three coalescent tree priors (constant size population, exponential growth population, and Bayesian Skygrid model 50 with 20 parameters). On each model, we estimated marginal likelihood with path sampling (PS) and stepping stone sampling (SS) 51,52 (Extended Data Table 3) after sampling 100 path steps each with a chain length of 2 million.
We sampled trees and other parameters on each model by running BEAST for 200 million MCMC steps, sampling every 20,000 steps, and removing 20 million steps as burn-in. We report the mean clock rate as the substitution rate for relaxed clock models. On the sampled trees, we used TreeAnnotator v1.8.4 to find the maximum clade credibility (MCC) tree and visualized it in FigTree v1.4.3 39 . To estimate tMRCAs (Fig. 1c, Extended Data Table 3), we ran BEAST again for each of the six models, drawing from these same sets of sampled trees (without any parameters), for 10,000 steps, sampling every step. We selected a relaxed clock and Skygrid model for plots of tMRCA distributions (Fig. 1c) and MCC trees over the whole genome genotype G sequences.

Gene-and site-specific dN/dS analyses
We used BEAST v1.8.4 43 to estimate dN/dS per-site (Extended Data Fig. 2a, Supplementary Table 3) and per-gene (Extended Data Fig. 2b) using the same alignment of 225 whole genome sequences described above (again, removing the portion of the V gene after the insertion site).
For site-specific dN/dS estimation, we used the CDS as input and created a separate partition for each codon position (three partitions). We used an HKY substitution model 46 on each partition, and an uncorrelated relaxed clock with log-normal distribution 49 for branch rates. Here, we sampled from the same set of trees that were sampled as described above in 'Molecular Dating using BEAST' (relaxed clock with Skygrid tree prior). We ran BEAST for 10 million MCMC steps, sampling every 10,000 steps. We estimated site-specific dN/dS at each sampled state using renaissance counting 53,54 and show summary statistics at each site after discarding 1 million steps as burn-in.
For per-gene estimation, we created eight separate partitions: seven correspond to the CDS of a gene (F, HN, L, M, NP, SH, partial V) and the last corresponds to noncoding sequence. For each gene partition, we used a Goldman-Yang codon model 55 with its own parameters for dN/dS (omega) and clock rate. For the noncoding partition, we used an HKY substitution model 46 and gamma site heterogeneity 47 (four categories). We sampled tip dates as with the molecular clock analyses above, and used a Bayesian Skyline tree prior 56 (10 groups). We ran BEAST for 200 million MCMC steps to sample trees and parameter values, discarded 20 million steps as burn-in, and plotted the posterior distribution of omega for each gene partition.

Principal components analysis
The dataset for PCA consisted of all SNPs from sites with exactly two alleles in the set of all genotype G genomes. We imputed missing data with the R package missMDA 57 and calculated principal components with the R package FactoMineR 58 . We discarded 14 samples as outliers based on visual inspection, leaving 211 samples in the final set.

Relationship between epidemiological and genetic data
We obtained detailed epidemiological data for samples shared by MDPH from the Massachusetts Virtual Epidemiologic Network (MAVEN) surveillance system, an integrated web-based disease surveillance and case management system 59 . We defined two types of epidemiological links: 'contact links', between individuals who were determined to be close contacts during public health investigation and had symptom onset dates 7-33 days apart (individuals with MuV are usually considered infectious two days before through five days after onset of parotid swelling, with a typical incubation period of 16-18 days, ranging from 12-25 days) 15 ; and 'shared activity links', between individuals who participated in the same extra-curricular activity (e.g., a sports team or university club) or frequented a specific residence or athletic facility. When we refer to epidemiological links without specifying link type, we include both types of links.
We calculated pairwise genetic distance between all pairs of samples in the whole genome genotype G alignment. For each pair, the genetic distance score is / , where s is the number of unambiguous differing sites (both sequences must have an unambiguous base at the site, and the called bases must differ) and n is the number of sites at which both sequences have an unambiguous base call.
To visualize the similarity between genomes and its relationship to epidemiological linkage, we performed a multidimensional scaling on sequences in clade II-outbreak (Fig. 1a). This clade is comprised of mostly cases from Harvard and the related community outbreak. Using their pairwise genetic distances, we calculated a metric multidimensional scaling to two dimensions in R with cmdscale 60 . We then evenly split the range of the output coordinates into a 100x100 grid and collapsed each point into this grid, and plotted the number of points at each grid coordinate; this improves visualization of nearly overlapping points (identical or near-identical genomes). We plotted curves that represent epidemiological links between cases within each of the grid coordinates. This is shown in Extended Data Fig. 6a.
To determine the ability of genetic distance to predict epidemiological linkage, we again looked specifically at cases within clade II-outbreak (Fig. 1a). Using the Python scikit-learn package 41 , we constructed a receiver operating characteristic (ROC) curve using pairwise distance between II-outbreak cases as the predictor variable and presence or absence of an epidemiological link as the binary response variable. This is shown in Extended Data Fig. 6b.

Model of mumps transmission in a university setting
We developed a stochastic model for mumps transmission accounting for the natural history of infection, vaccination status, and control measures implemented in response to the outbreak at Harvard. Our stochastic model of mumps virus transmission included the stages after initial infection, the durations of which we inferred using data from previous clinical studies (Extended Data Fig. 5a-c). These included the gamma-distributed incubation period from infection to onset of mumps virus shedding in saliva 61 ; the gamma-distributed period of latent infection from shedding onset to parotitis onset 61,62 ; and the log-normally distributed time from parotitis onset to the cessation of shedding (defined in ref. 63). For asymptomatic cases, we defined the total duration of shedding ( ) as the sum of independent random draws from the durations of shedding before and after parotitis onset, based on the lack of any reported difference in durations of shedding for symptomatic and asymptomatic cases 61 . To account for case isolation interventions implemented at Harvard, we modeled the removal of symptomatic individuals one day after onset of parotitis. In comparison to the 70% probability for symptoms given infection among unvaccinated individuals 64 , we modeled the probability of symptoms given infection as uniformly distributed between 27.3% and 38.3% 5,16 .
We used previous estimates of the effectiveness and waning rate of mumps vaccination 5 , and of the vaccination status distribution of individuals on a university campus 65 , to account for susceptibility to infection among the Harvard population (N=22,000). We scaled risk for MuV infection, given exposure, to time since receipt of the last vaccine dose, yielding the hazard ratio for an individual i who received his/her last dose ! years previously, relative to an unvaccinated individual. For fitted values from ref. 5, estimates were below 1.0 for individuals vaccinated since 1967, when the Jeryl Lynn vaccine was introduced (Extended Data Fig. 5d).

Given the instantaneous hazard of infection for an as-yet uninfected individual i exposed to I(t) infected individuals
the probability of evading infection over the course of a one-day simulated time step was !! ! (!) . The percontact transmission rate ( ) was measured from the initial (pre-introduction) value of the effective reproductive number:

Inferring transmission dynamics
The number of cases (71) and identification of multiple, distinct viral clades within Harvard suggested limited permeation of MuV after any introduction. We simulated dynamics of individual transmission chains to understand the epidemiological course of introduced viral lineages, and to infer values of ! (0) and the number of importations of MuV. We used the simulation model to sample from the distribution of the number of cases (X, including the index infection if symptomatic) resulting from a single introduction over a 1.5 year time course: We resampled according to { ! | ! (0)}to define the distribution of the cumulative number of cases (Z) resulting from Y introductions, conditioned on ! (0): Of the 71 cases at Harvard, 66 had MuV genomes in our dataset, so we ran simulations where ≥ 66, drawing k=66 cases at random to determine the number of distinct lineages (S, defined by the index infection) expected to be present within such a sample. The probability of obtaining 66 sequences, and observing = ! lineages among them, is The posterior density of our model also accounted for the probability of observing 71 symptomatic cases in total. Defined in terms of the number of introductions and the initial reproductive number, the model posterior was proportional to where 4 is the number of viral lineages in the 66 Harvard cases (representing clades 0-HU, II-community, and two sub-clades within clade II). We measured this probability from 100,000 iterates for each pairing of {0.10, 0.11, . . . , 2.50} and ∈ {1, 2, . . . , 200}. Last, we defined the minimum necessary third-dose vaccine coverage (C) to bring the effective number below unity using the relation sampling from previous estimates 5 of the effectiveness of mumps vaccination against infection, prior to waning of immunity (here defined as VE (0)).

Transmission reconstruction using outbreaker
We used the R package outbreaker 66 to reconstruct transmission for samples included in clade II-outbreak. We estimated the generation interval by fitting a gamma distribution, via maximum likelihood, to the time between symptom onset dates for cases with confirmed epidemiological links (Extended Data Fig. 5e, Supplementary  Data). The resulting estimates are nearly identical to those reported in previous studies 67 . We ran outbreaker six times in parallel, each with 1 million MCMC steps, and discarded the first 10% of states as burn-in. We assessed run convergence and combined results for five of the six parallel runs to determine the reconstructed transmission tree (Fig. 2c, right). To reconstruct transmission using SH sequences only (Extended Data Fig.  6c), we extracted the SH gene from the II-outbreak alignment and ran outbreaker as described above, using the results from all six parallel runs in the analysis.

SH and HN multiple sequence alignment
To analyze all published SH and HN MuV sequences, we searched NBCI GenBank in July 2017 for all nucleotide sequences with organism 'Mumps rubulavirus'. We performed a pairwise alignment between each sequence s i and a reference genome (accession: JX287389.1) using MAFFT v7.221 35 with parameters: '-localpair --maxiterate 1000 --preservecase'. We then extracted the SH sequence from each s i based on the reference coordinates in the alignment, removing all SH sequences without the full 316-nucleotide region and all SH sequences with an insertion or deletion ('indel') relative to the reference. We then used MAFFT with parameters '--localpair --maxiterate 1000 --retree 2 --preservecase' to create a multiple sequence alignment of the extracted SH gene sequences and removed any sequences with indels in this final alignment. We repeated the same process for the HN region, requiring the full 1749-nucleotide coding region.
In both the SH and HN alignments, we removed sequences from vaccine strains (i.e., genotype N, or another genotype marked as "(VAC)" or "vaccine"). We also removed sequences with GenBank records indicating extensive passaging. In the SH alignment only, we removed sequences with no reported collection date or country of origin, as these data are required for phylogeographic analyses. In samples with a collection decade (e.g., 1970s) but not a specific year, we assigned the first year of the decade; in samples with only a collection year, we assigned a decimal year of + 0.5 (e.g., 1970.5); in samples with year and month but no day, we used the day halfway through the given month (e.g., 2015-03 becomes 2015-03-15) to calculate the decimal year; and in samples with an epidemiological week but no specific day, we approximated the decimal year as + ( / 52), except samples collected in epidemiological week 52 were relabeled as week 51.999 to avoid confusion with year-only samples.
In both the SH and HN alignments, we relabeled outdated genotypes (M, E, and any sub-genotypes 68 ) and constructed a maximum likelihood tree (using IQ-TREE with a GTR substitution model, as described above) to assign a genotype if one was not reported on GenBank. We preserved genotypes designated as 'Unclassified' 68 .
To each alignment, we added all SH or HN sequences from individual patients generated in this study, except those with 2 or more consecutive ambiguous bases ('N's) in the SH or HN region. The sequences used in the SH and HN analyses are listed in Supplementary Table 4.

SH phylogeographic analysis
To perform phylogenetic and phylogeographic analyses of the SH gene sequence, we first sampled trees using BEAST v1.8.4 43 . We used constant size population and strict clock models. Other demographic and clock models would have likely provided a better fit to the data, but it appeared that using them (especially the use of a relaxed clock) would have made it impractical to achieve convergence and sufficient sampling of trees and all parameters on the full dataset. We used the HKY substitution model 46 with 4 rate categories and no codon partitioning. We ran BEAST in 4 replicates, each for 500 million states with sampling every 50,000 states, and removed the first 150 million states as burn-in. We verified convergence of all parameters across the 4 replicates and then combined the 4 replicates using LogCombiner.
We used TreeAnnotator to determine the maximum clade credibility (MCC) tree (Fig. 3a) from a resampling of 350 trees and visualized the result in FigTree v1.4.3 39 . We computed a kernel density estimate of the probability distribution of the tMRCA over all sampled states for the 11 genotypes in this dataset (Fig. 3b).
The full dataset has large temporal and spatial sampling biases that might affect phylogeographic results, such as estimates of migration. Using a structured coalescent to model migration and coalescent processes could alleviate these biases, but might face practical limitations on this large dataset 69 . Instead, we used resampling on the input sequences to construct distributions of estimates. To perform this resampling, we focused on only samples that were collected both within a window of time and from a geographic region with sufficient sampling. Namely, we considered only sequences sampled in 2010 or afterward and collapsed the locations shown on the full dataset (Fig. 3a) to just 4 regions: United States (consisting of only samples from the United States), Europe (consisting of samples whose location was labeled as Eastern Europe, Northern Europe, Southern Europe, Western Europe, or the United Kingdom), East Asia (consisting of samples whose location was labeled as Eastern Asia or Japan), and South/Southeast Asia (consisting of samples whose location was labeled as Southeast Asia or Southern Asia). We ignored samples from 5 locations: Canada; Caribbean, Central America, and South America; Middle and Eastern Africa; Northern Africa; Middle East). Then, we randomly sampled 10 sequences (without replacement) from each region for each year (i.e., 2010-2011, 2011-2012, etc.). We resampled the input sequences with this strategy 100 times.
Note that sampling biases affect this resampling strategy as well. Notably, several years in the East Asia and the South/Southeast Asia regions have fewer than 10 sequences (sometimes, zero) available for resampling (Supplementary Table 4), which may lead to an underestimate on the relative rates of migration involving these regions. Moreover, the high sequence similarity between SH gene sequences from United States and Europe (Fig. 3d) may bias upward the distributions on the relative rates of migration between these regions and on the proportion of ancestry shared between them (Extended Data Fig. 7e) (e.g., if there have been few true migrations between these regions, but the gene has not accumulated substitutions in the time between those migrations).
For each of the 100 resamplings of the input sequences, we ran BEAST to sample trees, as described above, for 100 million states sampling every 10,000 states; we removed 10 million states as burn-in and resampled to obtain 1,000 sampled trees.
Then, we performed phylogeographic analyses on each of the 100 samplings of input sequences by drawing from their sampled trees. We used a discrete trait substitution model 70 on location in BEAST v1. 8.4. To estimate transition rates between locations we used an nonreversible CTMC model with 4 2 -4 = 12 rates. Furthermore, to evaluate the significance of routes in the diffusion process, we added indicator variables to each rate through Bayesian stochastic search variable selection (BSSVS); we set the number of non-zero rates to have a Poisson prior with a mean of 3.0, placing considerable prior probability on having the fewest rates needed to explain the diffusion. We ran BEAST with 10 million states, sampling every 1,000 states, and removed the first 1 million as burn-in. At each sampled state, we logged the complete Markov jump history 71,72 , as well as a tree with the reconstructed ancestral location of each node.
To determine an MCC tree across the 100 samplings of input sequences, we ran TreeAnnotator on the sampled trees from each of the 100 samplings and then selected the MCC tree, from the 100 options, with the highest clade credibility score. We show this one, colored by reconstructed ancestral locations, in Extended Data Fig.  7b.
For each sampling x i of the 100 samplings of input sequences, we counted the number of jumps between each pair of locations at each state using the complete Markov jump history after resampling the jump history every 10,000 states. For each x i , at each state, we calculated the fraction of migrations between each region pair by dividing the number of migrations between the pair by the total number of migrations at that state. To quantify support for migration routes in each x i , we calculated Bayes factors (BF) on the rate indicator variables. We calculated the posterior probability that a rate is non-zero as the mean of the indicator variable over the MCMC states, thereby providing a posterior odds. We calculated the prior probability that a rate is non-zero as the expected number of non-zero rates divided by the number of rates, which reduces to 1/N, where N is the number of locations; thus, the prior odds is 1/(N-1) or, in this case, 1/3. We set an upper limit of 10,000 on the BF and a lower limit of 1.0. To estimate the proportion of ancestry for each x i , we used skyline statistics via PACT 73 to calculate proportion of ancestry at each location from each other location: in particular, for each x i , we used the sampled trees with ancestral locations as input (after resampling them every 10,000 states), padded the trees with migration events, broke the trees into temporal windows of 0.1 year going back five years prior to sampling, and estimated the proportion of history from tips in each time window.
To summarize phylogeographic results across the 100 resamplings x i of the input sequences, we show probability distributions across the x i . When plotting the fraction of migrations to each region from each other (Fig. 3c), we calculated the mean of this fraction across all the sampled states in each of the 100 runs to produce a point estimate for each x i , and show the distribution of these means across the 100 x i . We calculated the Bayes factors on migration routes between the 4 regions by combining the sampled indicator variables across all 100 x i to compute the posterior odds (Extended Data Fig. 7c). Similarly, the proportion of ancestry plotted between a pair of locations in a time window (Extended Data Fig. 7d) is the mean across the 100 x i of the mean proportion for that pair in that time window from each x i . We calculated the pointwise percentile bands in Extended Data Figure 7e from the mean proportions in each x i across the x i (i.e., they are percentiles across the resamplings of the input sequences).  HN protein only, (c) a concatenation of the HN protein, the fusion protein (F) coding region, and the SH gene, and (d) the complete MuV genome. In all panels, tips are colored by clades as defined in Fig. 1a and Extended Data Table 3. The HN protein sequence does a significantly better job at capturing the epidemiologically-relevant clades than the SH gene, and the tree created from SH+HN+F (nearly 25% of the genome) closely resembles the tree created from whole genome sequences.

Extended Data Figures and Tables
Extended Data Table 1 Extended Data Table 2. Viruses identified in mumps PCR-negative samples. Influenzavirus B is the only multipartite virus listed, and we identify 51 reads mapping to 6 of the 8 segments: in order, 2, 6, 0, 0, 8, 10, 4, 21 reads to each segment. Table 3. Model selection and tMRCA estimates across models. (a) Marginal likelihoods estimated in 6 models: combinations of three coalescent tree priors (constant size population, exponential growth population, and Skygrid) and two clock models (strict clock and uncorrelated relaxed clock with log-normal distribution). Estimates are with path-sampling (PS) and stepping-stone sampling (SS). The Bayes factors are calculated against the model with constant size population and a strict clock. (b) Mean estimates of clock rate, date of tree root, and tMRCAs of the clades shown in Fig. 1a (excluding clade 0-HU, which consists of one sample). USA-4 corresponds to 'Clades I and II' in Fig. 1. Below each mean estimate is the 95% highest posterior density interval. We calculated tMRCAs of additional unlabeled nodes (see Supplementary Data). Tables   Supplementary Table 1. Information and metadata regarding all mumps PCR-positive and PCRnegative samples on which sequencing was attempted. Table 2. List of identified nonsynonymous single nucleotide polymorphisms (SNPs), including the frequency of each SNP. Separate list of identified within-host variants above 2% frequency. Table 3. Calculated dN/dS value and confidence interval for each amino acid position across the MuV genome. Table 4. List of SH and HN sequences and corresponding metadata used in analysis, as well as SH sequences available for resampling.