Whole genome SNP analysis suggests unique virulence factor differences of the Beijing and Manila families of Mycobacterium tuberculosis found in Hawaii

While tuberculosis (TB) remains a global disease, the WHO estimates that 62% of the incident TB cases in 2016 occurred in the WHO South-East Asia and Western Pacific regions. TB in the Pacific is composed predominantly of two genetic families of Mycobacterium tuberculosis (Mtb): Beijing and Manila. The Manila family is historically under-studied relative to the families that comprise the majority of TB in Europe and North America (e.g. lineage 4), and it remains unclear why this lineage has persisted in Filipino populations despite the predominance of more globally successful Mtb lineages in most of the world. The Beijing family is of particular interest as it is increasingly associated with drug resistance throughout the world. Both of these lineages are important to the State of Hawaii, where they comprise over two-thirds of TB cases. Here, we performed whole genome sequencing on 82 Beijing family, Manila family, and outgroup clinical Mtb isolates from Hawaii to identify lineage-specific SNPs (SNPs found in all isolates from their respective families, and exclusively in those families) in established virulence factor genes. Six non-silent lineage-specific virulence factor SNPs were found in the Beijing family, including mutations in alternative sigma factor sigG and polyketide synthases pks5 and pks7. The Manila family displayed more than eleven non-silent lineage-specific and characteristic virulence factor mutations, including in genes coding for MCE-family protein Mce1B, two mutations in fatty-acid-AMP ligase FadD26, and virulence-regulating transcriptional regulator VirS. This study further identified an ancient clade that shared some virulence factor mutations with the Manila family, and investigated the relationship of those and other “Manila-like” spoligotypes to the Manila family with this SNP dataset. This work identified a set of virulence genes that are worth pursuing to determine potential differences in transmission or virulence displayed by these two Mtb families.

Introduction highest rates of TB in the United States, with Hawaii's TB rate remaining steady even as TB rates throughout the US continue to decrease [12].
Thus, the need to further understand the characteristics of the Beijing and Manila families of Mtb is of considerable importance not only to countries throughout the Pacific, but also to the United States. Investigating unique differences in virulence factors that are characteristic of each of these families may help explain those families' success in their originating environments, and may aid in developing improved control strategies if differences affecting latency, reactivation, or host preference are identified. The Manila family, for example, continues to predominate in Filipino hosts, where other, more globally successful lineages have been unable to displace the Manila family and rise to high incidence in the Philippines. Considerably less research has been performed on the Manila family than on the Beijing family or on lineage 4, thus which attributes may contribute to the Manila family's persistence remains an open question.
This study seeks to utilize next-generation Illumina whole-genome sequencing on a diverse set of Beijing and Manila family-focused clinical Mtb isolates from Hawaii in order to identify lineage-specific virulence factor differences of these two families. Here, we examine the Mtb genomes to find non-silent (i.e. non-synonymous), single-nucleotide polymorphisms (SNPs) specific to each family in virulence genes as well as other genes of interest. These SNPs are alleles that are unique to either the Beijing or Manila family isolates among the 82 that we studied. In order for a SNP to be described here as a Beijing family-specific SNP, that SNP must have been found in all Beijing family isolates from this study, but not have been found in any of this study's non-Beijing family isolates. Similarly, SNPs described as Manila family-specific must have been found in all Manila family isolates from this study, but not have been found in any of this study's non-Manila family isolates.
Additionally, Hawaii and the Pacific region frequently see Mtb isolates with spoligotyping fingerprints that resemble the Manila family, but are less commonly encountered than isolates possessing spoligotyping patterns that have been established as "Manila family." This study will utilize phylogenetics from full-genome sequences to better describe the relationship of such patterns and isolates to the Manila family.

Selection of Mtb isolates for study
A comprehensive database of all 1,076 M. tuberculosis isolates processed for genetic fingerprinting for the State of Hawaii from 2002 through 2016 (and including a partial set from 2017) was obtained from the State of Hawaii Department of Health Tuberculosis Control Program for this study. Spoligotype octal codes are presented as recorded by the Centers for Disease Control and Prevention (CDC) contracted reference laboratory that performed the typing. Lineage or family names were assigned to spoligotype octal codes using the SpolDB4 database, with "EAI2_MANILLA" entries being considered "Manila family." [13]. Eighty-two Mtb isolates from this set were selected for use in this study (Table 1). Additionally, a Mycobacterium bovis BCG P3 isolate was used as an outgroup for phylogenetic analysis, as M. bovis has been shown to have diverged from the M. tuberculosis ancestor prior to the differentiation of the TB lineages examined in this study [5].
Selection of Mtb isolates for identification of lineage-specific virulence factor differences. From the database, 25 Beijing family (spoligotype 000000000003771) isolates covering a chronological range from 2005 to 2016 were selected for whole genome sequencing. Of those, 19 isolates were selected from putative transmission clusters, four were selected to increase our represented Beijing family diversity (based on their MIRU-VNTR [Mycobacterial were selected, of which 15 were selected from putative transmission clusters and 12 were selected based on uniqueness of MIRU-VNTR fingerprints to maximize this study's Manila family diversity. To further increase the diversity of Manila family isolates examined in this study, two additional Manila family isolates with uncommon Manila family spoligotyping patterns (isolates 96128 and 20034) and two Manila family isolates with unique large-sequencepolymorphism (LSP) deletion patterns from a prior study (09L0735 and 09L1111) were selected from our laboratory's Mycobacterial DNA Bank at the University of Hawaii, where their extracted DNA was archived [14]. Fourteen outgroup isolates from non-Beijing, non-Manila lineages from Hawaii during the 2002-2016 time period were selected for comparison, primarily on the basis of uniqueness of spoligotyping and MIRU-VNTR patterns. These isolates included four with the H3 spoligotype, one LAM9, three T1, one T3, and five from a suspected outbreak cluster with an undefined spoligotyping pattern (777777760000000). An additional three-isolate suspected transmission cluster with no SpolDB4 spoligotype match (737777377413771) was also selected. Selection of Mtb isolates for comparison of "Manila-like" strains. Two putative twoisolate transmission clusters whose spoligotypes had no match in SpolDB4, but whose spoligotyping patterns otherwise exhibited substantial similarity to the Manila family, were selected for whole-genome comparison to the Manila family (Manila-like Cluster 1: isolates 21 and 44; Manila-like Cluster 2: isolates 30 and 37). Two EAI5 isolates were also selected (isolates 5 and 33) for sequencing due to the EAI5 clade's similarity to the Manila family. An additional clustered isolate with no spoligotype match in SpolDB4 (isolate 84) that we considered to be Manila family based on the similarity of its spoligotype and MIRU-VNTR fingerprints was also included. Finally, two isolates with no spoligotype match in SpolDB4 and no near-matches with MIRU-VNTR patterns from other isolates (isolates 8 and 42) were selected to represent lineage-agnostic outgroups.

Whole genome sequencing
Recall of state of Hawaii Mtb isolates. All required isolates were requested from the laboratories where they had been previously sent by the State of Hawaii for contracted fingerprinting, and where they were archived (California Department of Public Health State Laboratory and Michigan Department of Community Health). Spoligotyping and 24-loci MIRU-VNTR fingerprints are displayed as provided by those laboratories. The Michigan Department of Community Health returned extracted DNA from their isolates, while isolates from the California Department of Public Health State Laboratory were returned as "double-killed" sample preps using a treatment of immersion in 70% ethanol and then heating at 80˚C for one hour.
DNA extraction and whole genome sequencing. DNA extraction was performed as previously described by the National Institute of Public Health and Environmental Protection (RIVM), Bilthoven, The Netherlands (Isolation of Genomic DNA from Mycobacteria Protocol), or according to the source state laboratory's internal protocol. DNA was quantified with the Qubit 2.0 dsDNA Broad Range Assay. Genomic libraries were prepared using the Illumina Nextera XT DNA Library Kit using manual normalization and sequenced on the Illumina MiSeq Platform with v3 Chemistry for 300 bp paired-end reads.

Data analysis
SNP matrices were produced using a modification of the NASP pipeline, with Bowtie2 used for alignment, GATK used for SNP-calling, and SNPs being filtered for a minimum of ten-fold read coverage and 90% read consensus [15,16,17]. Repetitive regions were removed by the NASP pipeline. A M. tuberculosis type strain H37Rv genome (GenBank NC_000962.3) was used as the scaffold for read alignment for identification of virulence factor SNPs, and our de novo sequenced representative Manila family genome "96121" (GenBank CP009427) was used as the scaffold for read alignment for construction of phylogenies and comparison of "Manilalike" strains. The number and percentage of mapped reads was determined using SAMtools flagstat. Percent coverage of the 4,411,532 base pair H37Rv reference genome NC_000962.3 was determined using SAMtools mpileup.
Identification of lineage-specific virulence factor differences. Output matrices from the NASP pipeline were imported into a Microsoft Access database and SQL queries were utilized to identify SNPs relative to H37Rv that were shared by all members of either the Beijing or Manila families that we sequenced. SQL queries were further used to search all non-Beijing isolates for those SNPs in order to exclude them as non-specific to the Beijing family if found. For the Manila family, SNPs that were shared by all Manila family isolates were searched for in all non-Manila family isolates-however, their presence in Manila-like isolates or isolates with no spoligotype match in SpolDB4 did not automatically exclude them, as the relationships of those isolates to the Manila family was examined separately and individually. SNPs in genes described in Forrellad et al.'s 2013 review of Mtb virulence were designated as "virulence factor SNPs" [18]. The strength of the association between lineage-specific and lineage-characteristic SNPs and their respective groups versus outgroups was calculated using the Fisher Exact method using R Statistics version 3.4.1 [19].
Exploring the relationship of Manila-like isolates to the Manila family. Spoligotypes of isolates (presented in this work as their CDC octal codes) were compared in order to identify the number of gained or lost spacers relative to the representative Manila family spoligotype (677777477413771). All isolates examined were individually aligned against Manila family do novo sequenced genome 96121 using the NASP pipeline to determine the numbers of SNPs separating those isolates from 96121. Putative Manila family-specific non-silent virulence factor SNPs were examined for all Manila-like, EAI5, and no-match isolates, with Mtb type strain H37Rv and the Beijing family serving as known outgroups.
A phylogenetic tree was constructed using MEGA7 utilizing maximum parsimony, obtaining the most parsimonious tree using the Subtree-Pruning-Regrafting algorithm with search level 2 where the initial trees were obtained by the random addition of sequences with ten replicates, and branch lengths were calculated using the average pathway method [20]. The maximum parsimony phylogeny was supported by 1,000 bootstrap repetitions, and rooted with Mycobacterium bovis BCG P3. To further support that phylogeny, a model averaged phylogeny was generated using jModelTest 2.1.10 with likelihood scores computed using a maximum likelihood optimized base tree with NNI base tree search and 11 substitution schemes [21]. Akaike Information Criterion (AIC), Akaike Information Criterion corrected for small sample sizes (AICc), and Bayesian Information Criterion (BIC) (all with model averaging and 100% confidence intervals) were evaluated to find the best-fit. The model-averaged phylogeny was created using AICc as the criterion for tree weights and with majority rule consensus and a 100% confidence interval.
Isolates were designated as "ancestral" or "modern" on the basis of the presence or absence of the TbD1 deletion region [22]. The presence of the TbD1 deletion was identified in the sequenced isolates by a gap in the bases aligned to de novo sequenced genome 96121 (GenBank CP009427) in the range of nucleotide positions 1,759,200 to 1,761,320.

Whole genome sequencing
Of the 82 sequenced and analyzed Mtb genomes, 71 provided 90% or greater coverage of the H37Rv reference genome at 10X depth (Table 2). Eight more showed intermediate (40-90%)   However, the three other identically fingerprinting isolates from the same putative cluster displayed >95% coverage, ensuring that the putative cluster was still able to provide data for this study.
The numbers of SNPs displayed by each isolate are generally determined by the isolates' families, with ancestral Manila family isolates, as expected, displaying the most SNPs against the modern H37Rv genome. The number of unique SNPs for each isolate was mostly consistent with that isolate's clustering. Isolates from direct transmission clusters displayed zero to six unique SNPs, while isolates that shared a MIRU-VNTR fingerprint, but were not actually transmission linked (common for Beijing and Manila family isolates) displayed dozens or more.

Identification of lineage-specific virulence factor differences
Multiple Beijing family specific virulence factor SNPs were identified in this study. In addition to identifying virulence factor differences specific to the Manila family, we also identified "lineage characteristic" virulence factor differences that were also found in a relatively uncommon non-Manila clade, but which may regardless be important for charactering the Manila family's virulence.
Beijing family-specific virulence factor SNPs. Six SNPs in five previously described virulence factor genes were identified as lineage-specific non-silent virulence factor SNPs for the Beijing family (Table 3). Four additional Beijing-specific mutations in four genes possibly contributing to virulence or drug resistance, or otherwise of interest, were also identified.
Manila family-specific virulence factor SNPs. Seven SNPs in six previously described virulence factor genes were identified as non-silent lineage-specific virulence factor SNPs for

<0.005
List of non-silent SNPS in possible virulence factor genes or genes of interest that were exclusively found in all Beijing family isolates. None of these alleles were found in any non-Beijing-family isolates in this study. Rv numbers and gene titles are those provided by TubercuList [23]. Nucleotide loci are genomic nucleotide positions in Mtb H37Rv genome NC_000962.3. P-values were calculated using the Fisher Exact method. F: forward strand. R: reverse strand. AA: amino acid.
the Manila family ( Table 4). Two of those mutations were located several bases upstream of the start codons, possibly in the promotor region or ribosome binding site. Four additional virulence factor mutations found in all Manila family isolates, but also in one epidemiological cluster with no SpoldDB4 spoligotype match and in one phylogenetically distinct EAI5 isolate, are also listed in Table 4.

Manila family-specific possible virulence factor or gene-of-interest SNPs.
Five Manilaspecific SNPs in genes possibly contributing to virulence or drug resistance, or otherwise of interest, were identified (Table 5). Six additional such mutations found in all Manila family isolates, but also in one epidemiological cluster with no SpoldDB4 spoligotype match and in one phylogenetically distinct EAI5 isolate, are also listed in Table 5.

Relationship of Manila-like isolates to the Manila family
Phylogenetic analysis based on all genomic SNPs among selected isolates was conducted to further investigate the relationships of "Manila-like" and other strains to the Manila family (Fig 1). This phylogeny showed that the Manila family isolates that were selected due to their uncommon (but still Manila family) spoligotyping patterns (isolates 89 and 90) grouped within List of family-specific and family-characteristic non-silent SNPs in virulence factor genes. All family-specific SNPs were exclusively found in all Manila family isolates.
None of these Manila-specific SNPs were found in any non-Manila-family isolates in this study. Family-characteristic SNPs are non-silent SNPs in virulence factor genes that were found in all Manila family isolates, as well as a limited set of similar isolates in this study. # Non-Manila is the number of clusters that are neither Manila family nor Manila-like that possess this allele (number of contained isolates is in parenthesis). Rv numbers and gene titles are those provided by TubercuList [23]. Manila family isolates that displayed the common Manila pattern. Likewise, the isolates selected based on previous deletion analysis to maximize the Manila family diversity in this study (isolates 69 and 70) grouped with other Manila family isolates as expected.
Other isolates displayed diverse relationships with the Manila family. One cluster of "Manila-like" isolates whose spoligotypes had no match in SpolDB4 (isolates 21 and 44) were shown to group inside the Manila family branch. However, one no-match Manila-like cluster (isolates 30 and 37), which was also observed to exhibit pigmentation when grown on Löwenstein-Jensen medium and differed from the Manila family at only two virulence factor SNP loci, occupies a branch of the phylogeny immediately divergent to the Manila family. Interestingly, one EAI5 isolate (5) and a three-isolate no SpolDB4 match epidemiological cluster (71-73) occupy a distinct branch between lineage 1 (Manila family and other EAI lineages) and the modern lineages that include lineages 2 (including the Beijing family) and 4 (including the LAM lineages, H lineages, and others). This was also an expected result, as those isolates matched the Manila-characteristic virulence factor SNP alleles and deviated from them in roughly equal number (Table 6). Other isolates with no SpolDB4 match (isolates 8 and 42, selected for diversity due to appearing to belong to no obvious lineage) grouped with lineage 4 isolates. Table 5. Manila family specific and characteristic SNP mutations in possible virulence factor genes and genes of interest. List of family-specific and family-characteristic non-silent SNPs in possible virulence factor genes or genes of interest. All family-specific SNPs were exclusively found in all Manila family isolates. None of these Manila-specific SNPs were found in any non-Manila-family isolates in this study. Family-characteristic SNPs are non-silent SNPs in virulence factor genes that were found in all Manila family isolates, as well as a limited set of similar isolates in this study.  Table 6 further illustrates the distribution of loci that were designated as either Manila-specific (found only in Manila family isolates) or Manila-characteristic (found in all Manila family isolates as well as certain related groups). Notably, two SNPs (1,038,500 and 1,875,295) that were designated as Manila-specific were not shared by one two-isolate Manila-like cluster (isolates 30 and 37), though the cluster matched the Manila family at the other virulence loci. Additionally, since octal coding might not intuitively convey how many spacers are gained or lost with a change from one digit to another, Table 7 presents the numbers of spacers gained or lost by each isolate relative to Manila family reference genome 96121. This comparison of SNPs versus 96121 and spacers gained or lost clearly illustrates that the number of spacers lost by an isolate relative to 96121 fails to predict an isolate or lineage's closeness to the Manila family.   673,701  T  T  T  T  T  T  T  C  C  C  C  C  C   3,129,675  T  T  T  T  T  T  T  C  C  C  C  C  C   3,244,126  A  A  A  A  A  A  A  G  G  G  G  G  G   1,038,500  G  G  G  G  G  G  T  T  T  T  T  T  T   1,875,295  T  T  T  T  T  T  C  C  C  C  C  C  C Comparison of SNP alleles between Manila family, Manila-like, unknown, and outgroup isolates. Shaded alleles match those displayed by Mtb type strain H37Rv, while non-shaded alleles match representative Manila family isolate 96121. Loci displayed in the lower two groups are found in Tables 4 and 5 as family-specific SNPs. Loci in the upper group are found in Tables 4 and 5 as family-characteristic SNPs; although not exclusive to the Manila family, these SNPs shared only with isolates from outside of existing major spoligotyping lineages may be important for defining virulence of the Manila family. All Manila family diversity selection isolates present the same alleles, as do two Manila-like clusters and one EAI5 isolate. One Manila-like cluster differed from the Manila family at two of these selected loci. One other EAI5

Family specific possible virulence factor mutations and genes of interest of the Manila family, non-silent
isolate and one cluster with no SpolDB4 match diverged from the Manila family at thirteen selected loci. These differences suggest possible differences in virulence for this small clade. Nucleotide loci are genomic nucleotide positions in Mtb H37Rv genome NC_000962.3.
Spacers gained or lost are relative to the Manila family reference spoligotype 677777477413771. Ã No match in SpolDB4, but Manila-like; representative of a two-isolate epidemiological cluster with both isolates sharing the same alleles at these loci.
ÃÃ No match in SpolDB4, but almost certainly Manila family; representative of a two-isolate epidemiological cluster with both isolates sharing the same alleles at these loci. † Representative of a two-isolate epidemiological cluster with both isolates sharing the same alleles at these loci. ‡ Representative of a three-isolate epidemiological cluster with all isolates sharing the same alleles at these loci.
phenyloxazoline synthase MbtB, which is involved in iron acquisition, an important virulence factor for growth inside a macrophage, where iron levels can be 1/1000 th those outside the leukocyte [18]. The final Beijing family virulence factor found to display a lineage-specific mutation was lipid-transfer protein or keto acyl-CoA thiolase Ltp2, which may be involved in cholesterol transport [18]. The Beijing family's lineage-specific mutations in other genes of interest included one in antibiotic transporter permease Rv1217c, which has been shown to actively transport the ionophore antibiotic tetronasin across the cell membrane, and one in antibiotic transporter ATPbinding protein Rv1458c, which is similar to other described antibiotic transporters [29]. A mutation was also found in a penicillin-binding protein (Rv1730c). Penicillin is widely regarded as ineffective against Mtb, but the rise of extensively drug-resistant TB has renewed interest in the use of beta-lactam drugs against it [30]. Additionally, studies combining beta lactam drugs with beta lactamase inhibitors have shown promise [31]. Finally, CRISPR type III-a/mtube-associated ramp protein Csm4 showed one mutation. Although the exact role of Csm4 is still being deduced, CRISPR-related changes in the Beijing family are of interest due to its lack of many of the CRISPR spacers from the set of 43 utilized by spoligotyping that are present in many other lineages.
The Beijing family has been shown to be split into ancient (atypical) and modern (typical) sublineages on the basis of the presence or absence of one to two IS6110 insertion sequences in the NTF region, with the modern sublineage being shown to display increased virulence relative to the ancestral sublineage [32]. Depending on their sublineage, Beijing family isolates display mutations in two DNA repair genes (mutT4 and mutT2) as well as ogg, which are thought to result in an increased mutation rate, allowing for quicker adaptation to new host pressures or environments [32,33]. The genetic difference between ancient and modern Beijing strains has been further explored, and while mutations have been identified, their phenotypic results still require investigation [34]. Our present study, however, did not seek to include isolates of the less clinically important ancient (atypical) clade. A previous study identified a set of 81 SNPs that characterized all modern Beijing family strains [35]. That study identified virulence factor SNPs in the mce and vapBC gene families, but our study indicated that those SNPs are not specific to the Beijing family (data not shown). Another study that sought to characterize the modern/typical Beijing lineage identified 51 SNPs that defined the lineage and were not found in a search of 29 non-Beijing strains [36]. Notably, their set of SNPs found regulatory network mutations to be over-represented. As our study only searched for mutations in established virulence factor genes, mutations in regulatory genes that might be responsible for major differences in virulence would not have been captured by this study unless those regulatory genes had already been characterized as virulence factors.
Lineage-specific Manila family virulence factor SNPs. The Manila family exhibited a similar number of lineage-specific virulence factor mutations as the Beijing family. However, we also identified a group of isolates (EAI5 isolate 5 and a cluster with no SpolDB4 match composed of isolates 71-73) that showed virulence factor SNPs matching the Manila family at some loci, but matching an outgroup at others. Those SNPs that were found only in the Manila family and in this small, ancient clade were presented as Manila family characteristic SNPs rather than Manila family specific SNPs. While not 100% specific to the family, these mutations may still be important for understanding the virulence or transmission characteristics of the Manila family.
Among the notable virulence factor genes hosting the Manila family's lineage specific SNPs, one mutation was found in mce1B. One study showed that when the mce1 operon was knocked out, the resulting infection was unable to enter a state of persistent infection in mouse lungs, instead exhibiting hypervirulence and killing the mice more rapidly than wild-type Mtb. From ex vivo infection of murine macrophages, this was proposed to have resulted from the mutant failing to stimulate a Th-1 mediated immune response that would have induced protective granuloma formation [37]. One Manila family-specific mutation was found five bases upstream of the zinc metalloprotease zmp1 start codon, which might place it within the ribosome binding site (RBS). Mycobacterial ribosome binding sites are typically G and A-rich regions of six to eight base pairs in length, located four to seven base pairs upstream of the start codon [38]. Zmp1 is further of interest as it contains a second Manila family specific mutation, and zmp1's deletion has been shown to result in hypervirulence in a C57BL/6 mouse model [39]. One mutation was found in the gene encoding PknG, which is secreted into the macrophage cytosol upon mycobacterial infection, where it is required for inhibition of phago-lysosomal fusion and mediates persistence under stressful conditions [40,41]. Another mutation was found in pknD, which is required for invasion of brain endothelia, but not lung epithelia or macrophages [42]. While the Beijing family displayed a family-specific mutation in pks7, the Manila family displayed a family-specific mutation nine bases upstream of the gene, possibly affecting the promotor or ribosome binding site. Finally, the Manila family exhibits a mutation in fadD26, whose disruption results in impaired synthesis of phthiocerol dimycocerosates and attenuation in a mouse model [43]. Interestingly, one potential Mtb vaccine that has now completed a phase I clinical trial, MTBVAC, was produced from a double deletion mutant that removed phoP and fadD26 [44].
Manila family characteristic mutations were found in several additional genes. Nucleoside diphosphate kinase NdkA inactivates GTPase Rac1 in macrophages (which reduces production of reactive oxygen species), and knock down of ndkA resulted in increased susceptibility to intracellular killing and reduced persistence in the lungs of infected mice [45]. Another mutation was also observed for fadD26 (in addition to the family-specific mutation), suggesting that FadD26 may be important in directing the virulence of the Manila family. VirS is a transcriptional regulator controlling the mymA operon, which is induced by infection in macrophages, and may be involved in remodeling Mtb's cellular envelope under acidic, intracellular conditions [46]. Finally, a mutation was found in mmpL8, disruption of which resulted in attenuation presenting as significantly increased time to death in a mouse aerosol infection model, corresponding to less efficient suppression of cytokines directing a Th1-type immune response [47].
SNPs in genes of interest were likewise split between lineage-specific and lineage-characteristic for the Manila family. The most notable lineage-specific gene was transcriptional regulator whiB5, which influences the expression of 58 genes, and which our study revealed is truncated due to a premature stop codon in the Manila family [48]. Both chronic and progressive infections have been found to be attenuated in a mouse model when whiB5 was deleted. The mutant was also shown to be unable to resume growth after reactivation from chronic infection. However, as previously noted, human infections with Manila family strains do not appear to display any deficiency in reactivation [49]. Furthermore, two mutations were observed upstream of whiB5's start codon (-27 and -31). These mutations may be located in the -35 promoter sequence, but that is difficult to determine as mycobacteria have little homology in their -35 promoter sequences [38]. Although studies have shown that the -35 region is not essential for promoter function in mycobacteria, manipulation of the region can vary the wild-type promotor activity from as little as 6% to as much as 200% of its original activity [38]. Regardless, it is interesting that while the whiB5-31 mutation was observed only in Manila family isolates, a whiB5-27 mutation was observed in the Manila family, both EAI5 clades, and the ancient three-isolate epidemiological cluster with no SpolDB4 match. Another gene of interest is the oxyR pseudogene. oxyR nucleotide 37 T has been used as a marker for the Manila family and EAI5 lineage [50]. However, this study revealed that oxyR 37 T is not fully exclusive to the Manila family, as the newly identified ancient clade with an EIA5 isolate and a no SpolDB4 match cluster also exhibited the mutation. Finally, a Manila family characteristic mutation was observed in mycobacterial persistence regulator MRPA, which of Mtb's~214 transcriptional regulators, was one of seven shown to be upregulated during nonreplicating persistent stage stage 2 (NRP2) [51]. MRPA had been previously shown to be required for entrance into and maintenance of persistent infection [52].

Relationship of Manila-like isolates to the Manila family
Full genome sequencing provides insight here into the connection of isolates with putative "Manila-like" spoligotypes to the Manila family. A phylogenetic tree determined from those full genome SNP sets has further established their relationships (Fig 1).
In general, the number of spoligotyping/CRISPR spacers lost by an isolate relative to the Manila family did not serve as a predictor of the number of total SNP differences between that isolate and reference Manila family isolate 96121 (Table 7). Multiple examples demonstrating this are presented below. Manila family isolate 89, with its three missing spacers relative to the Manila family's prototypical pattern, displayed the same number of SNPs from 96121 as EAI5 isolate 33 (which lost one spacer and gained three). Furthermore, both of those isolates had fewer SNPs versus 96121 than isolate 2 (chosen to represent a large Manila family epidemiological cluster). The epidemiological cluster represented by isolates 21 and 41 was missing six spacers, but did not display many more SNPs versus 96121. However, other than for EAI5, gain of spacers relative to the Manila family appeared to be a better indication of distance from the Manila family than loss of spacers. EAI5 isolate 5 and a three-isolate no SpolDB4 match epidemiological cluster (isolates 71-73) occupy a branch together, with both gaining the same three spacers, but the one spacer lost by isolate 5 and the two spacers lost by the 71-73 cluster do not overlap. Other isolates whose spoligotypes had no match in SpolDB4 differed from Manila by the largest numbers of SNPs, similar to the Beijing family outgroup isolates. Of those no-match isolates, isolate 8 was selected due to having a large number of spacers gained versus 96121, and it clustered closest to LAM9 isolate 1, while isolate 42, selected to represent a large number of spacers lost, clustered closest to the Beijing family isolates (despite the apparent difference in spoligotypes).
The numbers of SNPs between individual isolates and our Manila family reference genome further clarify the relationships of isolates displaying various spoligotyping patterns to the Manila family. Isolates 69 and 70, selected from a prior study (which used deletion-based analysis) in order to maximize our present study's diversity within the Manila family, expectedly clustered within the other Manila family isolates, and displayed the fewest SNPs versus 96121 (210 and 193 SNPs, respectively). Manila family isolate 89, selected due to losing three of the Manila family's characteristic spoligotyping spacers, likewise clustered with the rest of the Manila family, while Manila family isolate 90 (which lost only a single spacer) was more distinct, and clustered closer to isolate 84 (which lost two spacers and had no SpolDB4 match). Another un-matched, but seemingly Manila-like spoligotype displayed by epidemiologically clustered isolates 21 and 44 (missing seven spacers) clustered within the Manila family. An additional un-matched, but seemingly Manila-like, epidemiological cluster of isolates 30 and 37 (missing seven spacers) clustered furthest from the Manila family, yet still on a distinct branch from the Manila family's known outgroups. Despite this, it was the only "Manila-like" group not to match the Manila family in all of the virulence factor mutations that characterized the family in this study, differing in a SNP at transmembrane serine/threonine-protein kinase D pknD and a SNP 9 bases upstream of Rv1661 polyketide synthase pks7, possibly in the promotor. Furthermore, isolates from this cluster were also observed to exhibit pigmentation when grown on Löwenstein-Jensen medium.
The EAI5 spoligotype revealed interesting results. We have previously identified in other EAI5 isolates that this spoligotype covers both Principle Genetic Groups 1 and 2 [53]. Thus, this EAI5 spoligotype spans evolutionarily distinct clades, possibly as the result of convergent evolution of CRISPR regions due to phage pressure. Both EAI5 isolates in this study were found to be ancient, despite both phylogenetics and virulence SNPs showing them to be quite distinct. While EAI5 isolate 33 clustered with the Manila family, EAI5 isolate 5 clustered with a three-isolate epidemiological cluster with no SpolDB4 match (isolates 71-73). This sub-clade of ancient isolates was roughly split between matching the Manila family and matching outgroup H37Rv at the Manila family's lineage specific and characteristic SNP loci (10 to 13). Some notable virulence factor differences for those isolates include losing the Manila family's truncated transcriptional regulator WhiB5 and one of its two upstream mutations, SNPs in MCE-family protein Mce1B, zinc metalloprotease Zmp1 and its upstream mutation, and one of the two fatty-acid-AMP ligase FadD26 mutations, among others ( Table 6). These differences suggest that this ancient sub-clade displays virulence characteristics different from the Manila family.

Conclusions
This study identified distinct sets of non-silent virulence factor SNPs that were specific to the Beijing and Manila families of Mtb. Further study of these mutations could lead to greater understanding of differences in virulence in these two families. Future work investigating the effects of these SNPs in animal or macrophage infection models, which is beyond the scope of this present study, will be required to fully determine their phenotypes. However, future studies should not be limited to investigating only single SNPs. Mtb lacks singular genomic features that account for host preference, and epistatic interactions and coordination of transcriptional regulation must be considered together to account for Mtb's transmission and virulence [5]. Thus, investigating only single SNPs from a specific family alone (in the absence of the family's other mutations) may be insufficient to produce a phenotype with altered virulence. Additionally, this study utilized only a small (82-isolate) set of Hawaii-focused Mtb isolates. Future work will further establish the lineage-specificity of these mutations by significantly expanding the number of genomes investigated. Although repositories such as the National Center for Biotechnology Information (NCBI) GenBank already hold a large number of Mtb genomes, the lack of paired spoligotyping data attached to those genomes limits their use in this study. A tool for in silico spoligotyping of Mtb genomes, SpolPred, enables identification of spoligotypes from FASTQ sequencing read files, allowing data from repositories such as the NCBI Sequence Read Archive (SRA) to be investigated [54]. However, while Beijing family sequencing data is abundant, Manila family sequencing data currently remains uncommon.