Tracking a serial killer: Integrating phylogenetic relationships, epidemiology, and geography for two invasive meningococcal disease outbreaks

Background While overall rates of meningococcal disease have been declining in the United States for the past several decades, New York City (NYC) has experienced two serogroup C meningococcal disease outbreaks in 2005–2006 and in 2010–2013. The outbreaks were centered within drug use and sexual networks, were difficult to control, and required vaccine campaigns. Methods Whole Genome Sequencing (WGS) was used to analyze preserved meningococcal isolates collected before and during the two outbreaks. We integrated and analyzed epidemiologic, geographic, and genomic data to better understand transmission networks among patients. Betweenness centrality was used as a metric to understand the most important geographic nodes in the transmission networks. Comparative genomics was used to identify genes associated with the outbreaks. Results Neisseria meningitidis serogroup C (ST11/ET-37) was responsible for both outbreaks with each outbreak having distinct phylogenetic clusters. WGS did identify some misclassifications of isolates that were more distant from the outbreak strains, as well as those that should have been included based on high genomic similarity. Genomes for the second outbreak were more similar than the first and no polymorphism was found to either be unique or specific to either outbreak lineage. Betweenness centrality as applied to transmission networks based on phylogenetic analysis demonstrated that the outbreaks were transmitted within focal communities in NYC with few transmission events to other locations. Conclusions Neisseria meningitidis is an ever changing pathogen and comparative genomic analyses can help elucidate how it spreads geographically to facilitate targeted interventions to interrupt transmission.


Methods
Whole Genome Sequencing (WGS) was used to analyze preserved meningococcal isolates collected before and during the two outbreaks. We integrated and analyzed epidemiologic, geographic, and genomic data to better understand transmission networks among patients. Betweenness centrality was used as a metric to understand the most important geographic nodes in the transmission networks. Comparative genomics was used to identify genes associated with the outbreaks.

Results
Neisseria meningitidis serogroup C (ST11/ET-37) was responsible for both outbreaks with each outbreak having distinct phylogenetic clusters. WGS did identify some misclassifications of isolates that were more distant from the outbreak strains, as well as those that should have been included based on high genomic similarity. Genomes for the second outbreak were more similar than the first and no polymorphism was found to either be unique or specific to either outbreak lineage. Betweenness centrality as applied to transmission a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 networks based on phylogenetic analysis demonstrated that the outbreaks were transmitted within focal communities in NYC with few transmission events to other locations.

Conclusions
Neisseria meningitidis is an ever changing pathogen and comparative genomic analyses can help elucidate how it spreads geographically to facilitate targeted interventions to interrupt transmission.

Background
The incidence of meningococcal disease in the United States has declined steadily since 1996 [1]. Against this backdrop, New York City (NYC) experienced 2 outbreaks of invasive meningococcal disease (IMD) caused by serogroup C Neisseria meningitidis (MenC) that prompted vaccination interventions [2,3]. The first NYC MenC outbreak (Ob1) occurred in the years 2005-2006 and affected current and former drug users and their close contacts [2]. The second community outbreak of IMD (Ob2) began four years later and affected 22 men who have sex with men (MSM) [3]. Previous outbreaks of IMD among MSM were seen in the early 2000s in Toronto [4] and Chicago [5] and after the NYC outbreak, clusters of IMD in MSM were recognized in Southern California [6], Chicago [7], Germany [8], France [9], and Spain [10].
N. meningitidis (Nm) is found in the nasopharynx in up to 35% of healthy adults [11] and it is this colonization state that is believed to be responsible for IMD transmission and outbreaks. Secondary IMD cases are rare and outbreaks occur in either organizational settings, such as universities, or in members of discreet community groups, such as MSM. Pulsed-field gel electrophoresis (PFGE) has been used in NYC to detect links among IMD cases not identified by case investigations [12]. However, PFGE may misclassify due to a lack of discriminatory power [13]. Additionally, PFGE does not provide details on polymorphisms in the Nm genome that can be used to track its spread in space and time.
Advances in whole genome sequencing (WGS) technology have allowed epidemiologists to uncover disease transmission patterns through phylogenetic analyses. WGS has successfully been used to inform outbreaks caused by tuberculosis [14], cholera [15], Klebsiella [16], Legionella [17], and Listeria [18]. In an IMD outbreak associated with the 23rd World Scout Jamboree held in Japan, WGS was used to elucidate relationships among cases in Scotland and Sweden [19]. Also, WGS has led to recent reports suggesting that N. meningitidis has adapted to become a urogenital pathogen. Taha et al. sequenced N. meningitidis isolates from the 2012-13 MSM outbreak in Germany and France and several antecedent rectal and genitourinary cultures [20]. They found evidence of an in-frame aniA gene in MenC similar to that present in Neisseria gonorrhoeae that confers the ability to survive under anaerobic conditions [20]. Following a large cluster of N. meningitidis urethritis in heterosexual men in Ohio, U.S., WGS identified an insertion sequence that had replaced much of the cps capsular locus of a MenC strain, rendering it unencapsulated and nongroupable [21,22].
We sequenced a collection of archived N. meningitidis isolates' genomes from NYC from 2003-2013 and analyzed phylogenetic relationships of the isolates based on core genome alignment, patient risk factors, and demographics to explore spatial connections among cases not evident from the epidemiologic data alone. The study was undertaken to better understand how IMD transmission occurs within a large urban center with the goal of improving IMD outbreak response, prevention, and control.

Study population and case investigation
IMD is a reportable condition in NYC. Medical providers and laboratories are required to report suspected and confirmed cases within 24 hours as defined by the Council of State and Territorial Epidemiologists (CSTE)/Centers for Disease Control and Prevention's (CDC) case definition [23]. All reports are investigated to confirm the diagnosis, identify close contacts who qualify for post-exposure prophylaxis, and to uncover links that could indicate an outbreak. Sterile site isolates are sent to the NYC Public Health Laboratory for serogrouping and molecular typing. A significant proportion of IMD cases have negative cultures [24] and beginning in 2006, clinical samples were routinely sought on culture-negative cases for RT-PCR testing at the NY State Wadsworth Center [25]. Non-sterile N. meningitidis reports (pharyngitis, urethritis, proctitis) were occasionally received and processed as above on a case-by-case basis. Isolates from non-NYC residents who had an epidemiologic link to NYC were sought for molecular characterization and comparison. Ob1-associated cases were defined as those meeting the CSTE/CDC case definition with N. meningitidis serogroup C and PFGE findings of � 85% similarity to the designated outbreak strain, or for probable cases (culture negative), patients who were household contacts to or shared an epidemiologic link with an outbreakassociated case [2]. Ob2-associated cases were defined as confirmed or probable N. meningitidis serogroup C MSM NYC residents with onset during August 2010-February 2013 [3].
IMD cases were investigated using a structured data collection instrument. Death from IMD was defined as occurring within 30 days of diagnosis and was obtained through a data match with the NYC Office of Vital Statistics death registry; HIV status was determined through matching with the NYC HIV Surveillance Registry. The earliest available MenC isolate was from 2003, therefore, isolates from 2003-2013 were eligible for inclusion in the study. A selection of other serogroups, non-NYC residents, and non-sterile sources were included for comparison.
Isolate regrowth, DNA extraction, sequencing, assembly, and annotation N. meningitidis isolates were preserved in Trypticase soy broth with 20% glycol at -70˚C. Isolates were removed from storage and plated on Chocolate II Agar (BBL). Following overnight growth at 35˚in a humidified 5% CO 2 incubator, single colonies were selected to inoculate a subculture. DNA was extracted from fresh N. meningitidis subculture colonies using Genomic-tip 20/G (Qiagen). Post-extraction quality control was performed to ensure concentration, purity, integrity, and sterility. WGS was performed using the HiSeq 2000 platform (Illumina) at the Edgewood Chemical and Biological Center (ECBC) sequencing facility. DNA was prepared for sequencing using the Nextera DNA Library Preparation kit (Illumina) and pairedend sequencing was performed on the dual-indexed isolate libraries (Illumina) according to the manufacturer's protocol. After sequence capture, reads were assembled de novo using either Velvet (1.2.10) [26] for higher coverage read files (>200Mb) or CLC Genomics Workbench (6.5.1) (Qiagen) for lower coverage read files (<200Mb). Consensus genomes were then annotated using RAST toolkit (1.3.0) [27,28]. Sequencing personnel were blinded to the epidemiological and geographic metadata.

Determining orthologs, core genome, and pan genome
Ortholog clusters were determined based on the comparison of all protein coding sequences with OrthoMCL (version 2.0.9) [29,30] with an e-value cut off of 10 −5 on the BLAST+ (version 2.27) [31]. The output was then parsed with custom Python scripts to find the core genome for different sets of isolates and genes specific to only isolates with particular metadata characteristics. The ortholog analysis also included 10 additional finished N. meningitidis genomes from GenBank and one draft genome from Paris reported to be a part of the MSM outbreak [32]. Assembled genomes were queried using BLAST against the PubMLST Neisseria database (https://pubmlst.org/neisseria/) to identify N. meningitidis clonal complex and the presence of a frameshift mutation in the aniA gene [33].

Phylogeny
The phylogeny of the genomes included in the OrthoMCL analysis was determined based on the core genome. We used a strict definition of core genome as being those ortholog clusters that contain only a single gene from each genome, excluding clusters with any paralogs. No epidemiological data were taken into consideration for determining orthologs or the initial phylogeny. The DNA sequences for each cluster were individually aligned using MAFFT (version 7.130b) [34], then Trimal (version 1.2) [35] was used to obtain the mean percent identity from each individual alignment. Alignments with a mean percent identity less than 90% were inspected individually and those with poor alignments, suggesting that they may not be orthologs, were removed from further analysis. The remaining alignments were concatenated into one core genome alignment with FASconCAT (version 1.02) [36] to be used by RAxML (version 8) [37] to determine phylogenetic relationships using maximum likelihood optimality criterion. The same methods were used for determining the core genomes of a subset of genomes for analyses (i.e., Genome Wide Association Studies and Betweenness centrality (see below)).

Comparison of isolates from the same patient
The OrthoMCL output was parsed to determine the core and unique genes for each set of isolates from the same patient. The strict core genome clusters were aligned individually using MAFFT [34] then a custom python script was used to find SNPs in each of the alignments. A SNP was declared only in instances where there was known nucleotide differences, not in cases where there was a gap or N. Alignments with high numbers of SNPs (greater than 10 for the gene) were examined by eye to ensure the number of SNPs was not due to poor sequence alignment.

Genome wide association studies
Genome Wide Association Studies (GWAS) were done on two different input data types: (1) gene present or absent in a genome; (2) single nucleotide polymorphisms (SNPs) in the core genome. Custom Python scripts were written to parse the OrthoMCL output to determine if a gene was present or absent in each genome based on whether there was a gene listed within orthologous clusters and to produce PED and MAP files. Therefore, the number of genes considered is equivalent to the total number of gene clusters determined by OrthoMCL, 5,280 gene clusters. In the PED file, one base was assigned if the gene was present and a different base used if the gene was absent for that genome for each orthologous cluster. Custom python scripts were written to find SNPs in alignments made by MAFFT for each individual gene cluster of the strict core genome and to produce PED and MAP files. Those alignment locations that contained more than two SNP variations were excluded due to limitations in the GWAS software, but the presence of an N was included as missing data for that genome. The program PLINK (version 1.07) [38] was then used to perform the GWAS with the genomic location set to the haploid mitochondria in the input files. P-values were adjusted by PLINK for population stratification using a genomic control and for multiple testing using a conservative Bonferroni correction. Gene Ontology (GO) terms were determined by BLAST2GO [39].

Betweenness centrality
For the betweenness centrality analysis we used methods similar to those used in Janies et al. [40]. Betweenness measures the centrality of a node by computing how many times a node of interest is on the shortest paths between any two other nodes normalized by the number of pairs of other nodes. In our analyses, we used the patients' borough and neighborhood of residence as nodes. High betweenness centrality indicates that the geographic location is important in the traffic of the pathogen. We used the subset of isolates with home addresses that could be geocoded plus an isolate from Paris, LNP27256 (GenBank: AVOU01000001.1) related to the outbreak [33]. When multiple isolates originated from the same patient, one isolate was selected as a representative. We then used the phylogenetic methods describe above to produce a tree based on the strict core genome for only these isolates [41]. When RAxML was run on the concatenated core genome, the earliest isolate, S00-1196, was used as an outgroup. This tree was then brought into PAUP � (version 4.0a147) with the cases' home borough or neighborhood as units of geography. Borough was determined from the latitude and longitude of the home address using ArcGIS (http://www.esri.com/software/arcgis/explorer/index. html). New York City neighborhood was determined based on NYC Department of City Planning Neighborhood Tabulation Areas (NTA, Table A in S1 File). Isolates from outside NYC maintained their individual location for both analyses. PAUP � was then used to calculate the ancestor descendent changes in geographic location at each branch of the phylogenetic tree. The directionality and number of each type of change was then measured. These data were then integrated to form the network of bacterial transmission using igraph and qgraph packages in R [42,43].

Ethical statement
The Department of Health and Mental Hygiene (DOHMH) Institutional Review Board determined the study to be public health surveillance that is non-research.

Study population and isolate sequencing
A total of 102 isolates were submitted for WGS. Six isolates were excluded due to poor sequencing or assembly of their genomes and there were 11 repeat isolates (same patient, different specimen source) from 10 individuals leaving 85 unique patients (75 NYC residents and 64 invasive infections) in the analysis (S1 Fig). During the years 2003-2013, a total of 322 IMD cases were reported in NYC residents (incidence 0.4 per 100,000), of which 263 were culture positive (the universe of possible study isolates). The proportion of eligible invasive isolates in NYC residents included in the study by serogroup is as follows: 5/60 (8%) MenB, 52/86 (60%) MenC, 4/22 (18%) MenW, 3/89 (3%) MenY, and 0/6 (0%) nongroupable. Fourteen MenC patient isolates were included from each outbreak (Ob1 had 21 and Ob2 had 15 culture positive cases). The demographic characteristics of invasive MenC isolates from NYC residents included in the study were compared to MenC patients during the same interval whose isolates were unavailable (includes culture negative cases that met the case definition). Patients with MenC included in the study were more likely to be older, male and non-Hispanic (Table 1).
Our assembled draft genomes had between 218 and 1223 contigs with a mean contig number of 494 and read depth coverage ranging from 300 to 3000 (Table B in S1 File). The total genome length and gene count of these contigs are all within the expected size for N. meningitidis. The strict core genome, which explicitly left out orthologous clusters containing paralogs, was determined for the 96 genomes sequenced and 11 genomes from GenBank was found to contain 1,171 genes. When only the genomes sequenced in this study were considered for the core genome, it contains 1,373 genes. Since these were draft genomes, it was suggested that a better metric for determining the core genome would be to include gene clusters found in 95% of the genomes being analyzed [44]. Using this metric, the size of the core genome increased to 1,348 genes for 107 genomes and 1,545 genes for only the genomes sequenced in this study. The number of unique genes ranged from 7 to 376 (mean 33) per isolate ( Table B in S1 File). The estimated N. meningitidis pan genome based on these isolates contained 8,453 genes.

Core genome phylogeny
The phylogenetic relationships among the isolates was determined with maximum likelihood tree searches based on alignments of the strict core genome. From these analyses we determined that Ob1 isolates classified by epidemiology were mostly found on a single clade, with S00-1207 and S00-1244 as the exceptions (Fig 1, blue text). Likewise, the 14 Ob2 isolates mostly formed a separate, single clade, with only S00-1260 found on a different clade (Fig 1, orange  text). One Ob2 subclade with dates ranging from August 2011 to February 2013 showed less diversity (Fig 1, 11 isolates, outlined in red) and included two patients who were not NYC residents (S00-1266 and S00-1302). All of the patients in the subclade were male between the ages of 23 and 53 years; and of the nine NYC residents: seven were HIV-infected, three died, and seven reported drug use (methamphetamine, cocaine, and/or marijuana). The subclade contained 150 SNPs within the core genome, compared to 4,921 core genome SNPs in the entire outbreak, indicating that the isolates in the subclade were more closely related to one another than to the rest of the outbreak isolates. Of the eight isolates found within Ob2's clade that were not declared to be part of this outbreak based on epidemiology, six (S00-1269, S00-1259, S00-1227, S00-1226, S00-1302, and S00-1266) were collected during outbreak's time period, with S00-1220 and S00-1218 as the exceptions collected in 2008, before Ob2 began ( Table 2 and Fig 1). Non-resident patients S00-1266, S00-1269, and S00-1302 were not known to be MSM and were not considered part of Ob2 at the time of collection. Due to the isolates' position in the tree, they may have shared a common exposure with outbreak cases or their social network. Isolate S00-1259 was obtained from a 65 year-old woman in March 2012. She reported no epidemiologic link to the outbreak and was not included as an Ob2 case. While  the isolate's position on the tree suggests inclusion as part of Ob2, it is also possible that the case represents sporadic transmission.

Clonal complexes and in-frame aniA gene
The majority (78%, 66/85) of unique patient isolates included in the study were MenC and 85% (56/66) of these were ST-11/ET-37 complex. Two MenW isolates also belonged to the ST-11/ET-37 complex. No other clonal complex accounted for more than 4% of isolates. For 6 isolates the clonal complex could not be determined.

Comparison of isolates from the same person
Nine patients had two isolates collected from blood and cerebrospinal fluid (CSF), and 1 patient had 2 blood isolates with different collection dates and 1 CSF isolate (S00-1275, S00-1276, S00-1277). One additional patient isolate, obtained from the rectum, had two colony morphologies on regrowth and both isolates were sequenced (S00-1233, S00-1234). Phylogenetic analysis showed that isolates from the same patient were more similar to one another than they are to isolates from other patients (Fig 1).
The strict core genome for each set of isolates from the same patient consisted of an average of 2320 (stdev ±74) genes (Table 2). When comparing the isolates to the other(s) from the same patient, each isolate had an average of 174 (stdev ±43) genes that were not found in the other(s). The alignments of the core genome were examined for the presence of SNPs. Each group had an average of 148 (stdev ±90) SNP across the entire core genome. If a gene cluster contained SNPs, it was usually less than five SNPs. The few genes that did contain more than 10 SNPs were those that are known to have a high degree of variation in pilus related genes. No gene cluster had SNPs for every group of isolates.

Genome wide association studies
Genome Wide Association Studies were performed using only the genomes sequenced for the study using two metrics: (1) the presence or absence of genes; and (2) the presence of SNPs in the strict core genome. Due to software limitations, only SNPs with two variants could be considered. We examined 82,597 SNPs, while having to exclude 3,541 SNP locations from the analysis. We tested for the association of SNPs or genes to both Ob1 and Ob2 (Table B in S1 File). The analysis for associations with Ob2 excluded S00-1260 due to its phylogenetic distance from the rest of the Ob2 isolates.
We found the presence of five genes and the absence of one to be significantly associated with Ob1 (p<0.05, Table B in S1 File). Ob1 also had 30 SNPs significantly associated with it (p<0.05, Table B in S1 File). GO terms indicated that these, mostly synonymous, SNPs are found in genes of a wide variety functions.
Ob2 isolates have one gene encoding a protein associated with the outbreak clade which was annotated as a hypothetical protein (p = 0.0193, Table B in S1 File). One SNP in a TonBdependent receptor was found to be significantly associated with Ob2 (p = 0.04994, Table B in S1 File). This SNP is non-synonymous and results in a change from threonine to isoleucine. All 8 isolates that were not epidemiologically considered part of Ob2, but were present on the same clade also possess this change. Through a BLAST search, we found that this SNP is not unique to this clade of N. meningitidis.

Betweenness centrality
As detailed in the methods section, we optimized the home location of the patients on the phylogeny and converted the result of all the ancestor descendent changes to a network to determine how N. meningitidis moved around the city over the 11-year period. We then applied betweenness centrality as of a measure the importance of a place in the traffic of the pathogen across geography. Seventy-eight unique patient isolates (92%), with a strict core genome of 1320 genes, were included in the betweenness centrality analysis. At the NYC borough resolution, not all of the boroughs were connected to each other nor were all the connections bidirectional (Fig 2). Brooklyn had the highest betweenness centrality as the only borough connected to all other boroughs and the one with the most connections to non-NYC locations. Brooklyn also had the most intra-borough transmission events (Fig 2, indicated by the arrow that loops back). The Bronx had the second highest betweenness centrality of the boroughs. The Bronx was connected to all the boroughs except Staten Island. We observed bidirectional movement between the Bronx, Brooklyn and Queens, and connections to three non-NYC locations.
While betweenness centrality at the level of the borough allowed us to see an additive summary of N. meningitidis movement over 11 years, it did not elucidate the details of transmission events for the two outbreaks. When the betweenness centrality analysis was examined at the level of neighborhood, the arrows represented much more granular transmission events between locations elucidating the transmissions specific to each outbreak (Fig 3). Most transmission events at the level of neighborhood were single events. However, there were some multiple transmissions between or within locations: (1) two transmissions occurred between Stuyvesant Heights (BK35) and Bedford (BK75); (2) three transmissions occurred within Stuyvesant Heights (BK35); and (3) two transmissions occurred within Crown Heights North (BK61) (Fig 3). All three of these multiple transmission events were from Ob1 and only two non-Brooklyn neighborhoods were involved with spread which establishes the geographic concentrations of this outbreak in Brooklyn (Fig 3, colored blue). The two non-Brooklyn neighborhoods involved in Ob1 were the Bronx neighborhood of Co-Op City (BX13) and the Manhattan neighborhood of Carnegie Hill (MN40). The transmission to Co-Op City (BX13) was a dead end for further transmission (Fig 3, colored blue). From Carnegie Hill (MN40), two transmissions occurred to two different Brooklyn neighborhoods that were disconnected from the rest of the Ob1 outbreak (Fig 3, colored blue).
For Ob2 Midtown/Midtown South (MN17) is the most important neighborhood for transmission (Fig 3, colored red and orange). The closely related Ob2 clade originates in this neighborhood before moving to Brownsville (BK81) from which the transmission fans out to multiple boroughs.

Discussion
We identified 102 N. meningitidis isolates from 2003-2013 and successfully performed WGS on 96 of them to uncover connections among patients not found during case investigations. Our de novo genome assembly metrics were comparable to that of other researchers using similar sequencing methods. Multiple studies have estimated that the core genome of N. meningitidis is around 1300-1600 genes [45,46,47], which is consistent with what we found for the genomes analyzed here. As expected, the pan genome for this group is larger than what has been seen in previous studies. This is because N. meningitidis has an open pan genome with an Phylogenetic relationships, epidemiology, and geography of two invasive meningococcal disease outbreaks estimated 38 new genes added to its pan genome with each newly sequenced isolate [46]. Consistent with this estimate, our genomes had an average of 33 unique genes.
The core genome phylogenetic analysis provided evidence that the epidemiologically determined Ob1 and Ob2 were separate outbreaks caused by different, yet related, strains. The overall shape of the phylogenetic tree indicates that while many of the isolates in NYC have highly similar core genomes, including those from Ob1 and Ob2, there are a number of isolates that are very different, having greater similarity to isolates sequenced as parts of other studies. This could be due to the introduction of N. meningitidis strains from other parts of the world through immigration and tourism.
Included in the Ob2 clade are two isolates (S00-1218 and S00-1220) from 2008 indicating that isolates with highly similar core genomes were present in NYC prior to the start of this outbreak. These isolates could be missing important genetic elements outside of the core genomes that conferred the Ob2 strain added virulence or afforded it a competitive advantage. Alternatively, and more likely, patient behaviors and social network dynamics differed for those isolates, preventing them from triggering an outbreak in 2008. Our analysis also identified isolates that should have been considered part of an outbreak based on genetic relatedness that were not epidemiologically categorized as such. For example, S00-1211 is located with the Ob1 isolates on the phylogenetic tree, however, the patient had onset more than six months after the end of the outbreak, was not known to use drugs, and did not live in Brooklyn. The patient resided in a shelter and her onset was close in time to two other IMD cases associated with shelters and substance abuse (neither isolate was available for WGS), suggesting that it is likely the patient came in contact with a person carrying the Ob1 lineage strain.
During Ob2, one case of serogroup C IMD, indistinguishable by PFGE from Ob2 isolates, occurred in a woman (S00-1259) who denied contact with the MSM community and was therefore not considered part of the outbreak. The position of S00-1259 on the phylogenetic tree suggests that she was infected by the Ob2 strain. Two non-resident male cases (S00-1266 and S00-1302) were also not considered part of the outbreak. One patient died and was not known to family members to be MSM and it was later learned that the other individual reported MSM in the year prior to his IMD. Both of these patient isolates lie within the Ob2 subclade and therefore were likely part of the MSM outbreak. One other MSM outbreak-associated patient isolate (S00-1260), which was not related by PFGE criteria, appears in a different part of the phylogenetic tree than the Ob2 strains and we now consider this to have been a sporadic case and unrelated to Ob2.
We found that the amount of within host diversity of N. menigitidis was relatively minor by comparing isolates from different specimen sources from the same patient. Same person isolates cluster more closely together in the phylogenetic analysis than they do with isolates from different patients (Fig 1). Same person isolates also had fewer SNPs across their core genomes.
Furthermore, these results support that our use of phylogenetics to determine transmissions events is not likely to be complicated by within host evolution or variation [45]. The total number of unique genes in each isolate is much higher than the number of unique gene for those isolates when compared to the entire OrthoMCL output indicating that most of these genes are present in the Nm pangenomes. The same patient isolate pair S00-1262 and S00-1300 lie within the highly similar Ob2 clade (red box, Fig 1) which serves to emphasize how closely related the Ob2 isolates are to one another.
Both outbreaks were caused by MenC of clonal complex ST-11/ET-37. This strain of Nm is known for its epidemic potential, accounted for 15% of all Nm isolates in Europe in 2011, and is not limited to MenC [48]. MenW strains of this clonal complex were responsible for outbreaks following the Hajj pilgrimages of 2001 and 2002 and subsequent outbreaks in many countries [49,50]. The recently described nongroupable US Nm urethritis clade is also a member of this clonal complex [22]. Our inability to identify specific markers of virulence among subsets of patients included in this study is in line with current knowledge and understanding of Nm virulence factors [48].
We were able to document the appearance of an intact aniA gene in a NYC IMD patient from 2003, and that the proportion of patient isolates with this polymorphism increased from Ob1 to Ob2. The public health implications of this adaptation are not fully understood, however, two recent observations are of concern: 1) the emergence in the US Midwest of N. meningitidis urethritis in heterosexual men; and 2) a NYC newborn with N. meningitidis conjunctivitis suspected to have been acquired by vertical transmission (DOMHM unpublished data). The N. meningitidis strain in these patients contained an intact aniA gene [22].
We were limited to the core genome for the SNP GWAS analysis and therefore were unable to look for SNPs in genes that were present in some, but not all, genomes examined. This may have restricted the identification of SNPs important to the epidemiology of the outbreak. Due to bacterial clonal reproduction, GWAS analyses are complicated by phylogenetic relationships and linkage disequilibrium. For this reason, we included in our analysis correction for population stratification using a genomic control and for multiple testing using the more conservative Bonferroni correction. Although we found a number of SNPs that were associated with Ob1, they varied widely in their functions and could not be correlated with any known association with outbreak transmission. Given that Ob2 occurred in the MSM community, we anticipated that there would be one or more identifiable genetic components to explain the focal epidemiology. Unfortunately, the GWAS analysis did not provide data beyond what has been found in other studies; that genomic distinctions between carriage, non-invasive, and invasive isolates of N. meningitidis are not easily determined by gene content and may be polygenic [46].
We used a transmission network analysis to evaluate how N. meningitidis moved among NYC communities during and between two outbreaks. Isolates from different specimen sources in the same patient were more similar to each other than other isolates in the analysis, demonstrating that within host evolution or variation is not a complicating factor in using the phylogenetics to determine transmission events [47].
Network analysis could assist in resource allocation, such as expanded chemoprophylaxis or vaccination that may abate or interrupt future outbreaks. This methodology can be particularly effective with granular isolate location data, as that is more likely to produce an accurate representation of the movement of bacterial lineages across a population in an area. For example, this concept is demonstrated by the isolate from the Bay Shore, NY patient, which connects to the rest of the locations in different ways in the borough level analysis compared to the neighborhood level analysis. Both the Brooklyn MenC outbreaks began with cases outside of Brooklyn (Ob1: S00-1200 and Ob2: S00-1224) and serve as additional examples [2,3]. Whether circulating Nm strains, and the individual immunity they provoke in carriers, are localized to neighborhoods or boroughs is not known. It is conceivable that the routine use of network analyses could identify a borough level jump of an Nm strain representing an elevated risk of initiating an outbreak.
Brooklyn was identified as an important hub for all N. meningitidis infections over the 11 years examined and in particular for the spread of Ob1. There were two transmission events (MN40 to BK37 and MN40 to BK78, Fig 3) within the neighborhood network analysis that were part of the Ob1 outbreak based on epidemiological data but are disconnected from the rest of the outbreak. This is in part because the epidemiological categorization of which isolates are part of an outbreak is imperfect as discussed above. It is also possible for a number of reasons that we are missing isolates that were part of both outbreaks. Carriers of Nm who do not develop meningococcal disease serve as intermediaries between clinical cases, however, they do not get cultured and links in the transmission chain go undiscovered. Similarly, meningococcal cases who are not NYC residents but may work or travel within the five boroughs, are often not reported to DOHMH. One neighborhood in Manhattan was identified as a hub for to the spread of Ob2. Evidence suggests that the Ob2 strain was transmitted to Europe and caused outbreaks in Germany and France [33]. The disconnection in our network emphasizes the difficulty with performing these types of analyses as a single city. The availability of worldwide transportation allows individuals to move from country to country potentially spreading disease. Perhaps there is no greater recent example than the 2014-5 Ebola virus outbreak in West Africa that spread to Europe and the United States. This only serves to emphasize the importance of countries and institutions sharing data in real-time to obtain the most accurate characterization of global disease movement.
Two Nm outbreaks occurred in NYC in the 21 st century and both were centered in Brooklyn. Whole genome sequencing of N. meningitidis isolates, phylogenetic trees, and betweenness analyses of transmission networks were used to characterize the relationship among cases and illuminate transmission. These tools add an important dimension to our understanding of outbreaks of IMD and have the potential to inform prevention efforts by identifying high transmission areas. One such approach might be to afford two or more IMD cases occurring in space and time in a Brooklyn neighborhood closer scrutiny to uncover links not immediately detected through routine investigations. Additionally, Nm carriage studies in MSM have either been completed (New York City) or are underway (Los Angeles) and will add to our knowledge of the evolution of this serial killer pathogen.

S1 Fig. Flow chart of Neisseria meningitidis isolates included in the study.
(TIF) S1 File. Study data. New York City Communicable Disease Surveillance data was used for this study and contains confidential elements. A reduced data set is provided to prevent identification of patients is provided along with data dictionary. (XLSX)