Increasing Nucleosome Occupancy Is Correlated with an Increasing Mutation Rate so Long as DNA Repair Machinery Is Intact

Deciphering the multitude of epigenomic and genomic factors that influence the mutation rate is an area of great interest in modern biology. Recently, chromatin has been shown to play a part in this process. To elucidate this relationship further, we integrated our own ultra-deep sequenced human nucleosomal DNA data set with a host of published human genomic and cancer genomic data sets. Our results revealed that differences in nucleosome occupancy are associated with changes in base-specific mutation rates. Increasing nucleosome occupancy is associated with an increasing transition to transversion ratio and an increased germline mutation rate within the human genome. Additionally, cancer single nucleotide variants and microindels are enriched within nucleosomes, and both the coding and non-coding cancer mutation rates increase with increasing nucleosome occupancy. There is an enrichment of cancer indels at the theoretical start (74 bp) and end (115 bp) of linker DNA between two nucleosomes. We then hypothesized that increasing nucleosome occupancy decreases access to DNA by DNA repair machinery and could account for the increasing mutation rate. Such a relationship should not exist in DNA repair knockouts, and we thus repeated our analysis in DNA repair machinery knockouts to test our hypothesis. Indeed, our results revealed no correlation between increasing nucleosome occupancy and increasing mutation rate in DNA repair knockouts. Our findings emphasize the linkage of the genome and epigenome through the nucleosome, whose properties can affect genome evolution and genetic aberrations such as cancer.


Introduction
With the advent of massively parallel DNA sequencing technologies it has become much easier to study and characterize somatic mutations and mutation rates across species [1]. Additionally, there are currently large projects underway attempting to catalog mutations responsible for the initiation and propagation of cancer [2][3][4][5][6][7][8][9]. These massive data sets represent some of the first and best sets for determining the various genomic and epigenomic factors that can affect mutation rates. Preliminary work has shown that various factors can affect regional mutation rates resulting in mutational heterogeneity. Of particular interest, recent work has shown that the mutation rate is strongly correlated with replication timing, transcriptional activity, and chromatin organization [10][11][12]. In eukaryotes, DNA is packaged into chromatin whose fundamental repeating unit is the nucleosome. Taken together, it is not surprising that previous work has demonstrated that nucleosome structure has played a role in human evolution [13]. Additionally, recent work in yeast has shown that nucleosome organization can affect base specific mutation rates [14]. In the context of the above, this study was carried out to further analyze the relationship between nucleosomes and mutation rates.
The nucleosome comprises 147 base pairs (bp) of DNA wrapped around two copies of each of the core histones (H2A, H2B, H3, and H4), with the symmetrical center being called the dyad [15]. Besides being involved in packaging DNA, nucleosome positioning (the genomic location of nucleosomes), nucleosome occupancy (how enriched a genomic location is for nucleosomes), and epigenetic modifications (post-translational modifications of histones and DNA methylation) are thought to play a role in development, transcriptional regulation, cellular identity, evolution, and human disease [13,16-24]. In order to determine the role of the nucleosome in affecting mutation rates, we utilized paired-end sequenced Micrococcal Nuclease (MNase) digested DNA from H1 human embryonic stem cells (hESC), yielding ~180x depth of coverage of the human genome. A nucleosome occupancy score (NOS) map, at single bp resolution, was then calculated (Methods) [25]. Finally, these nucleosome data were analyzed against a diverse set of genomic features and data sets [1-9, 22, 26-31].

Nucleosomes and human genetic variation, and mutations
We sought to integrate our data with human genetic variation [29,31]. Flagged single nucleotide polymorphisms (SNP) (SNPs deemed as potentially clinically significant with an allele frequency less than 1%) had an increased NOS in comparison to common SNPs (Fig 1A). By integrating genetic variation data from 1,092 individuals, we calculated average SNP densities, nucleotide diversity (π scores), and the transition to transversion (Ts:Tv) ratio in 1,000 bp bins for 10 equally sized groups of increasing nucleosome occupancy (Fig 1B, S1A and S1B Fig). Intrigued by the increase in the Ts:Tv ratio, the fact that nucleosomes in yeast can affect base-specific mutations, and the observation that on evolutionary time-scales SNPs are more likely to occur within nucleosomes while inversions and duplications are more likely to occur in nucleosome depleted regions (NDR), we sought to address the relationship between increasing nucleosome occupancy and the base-specific mutation rate (MR) in the human genome by strictly following previously used methodology [13,14]. Our H1 single base pair resolution NOS map was used in all subsequent analyses. The ancestral genome was used to define mutations, with analyses kept to non-conserved, non-coding sites with high confidence ancestral allele information [1,9]. Taking into account strand symmetry, we calculated the mutation rate for all 6 types of mutations (A→C, A→G, A→T, C→A, C→G, C→T) for 10 equally sized groups (bins) corresponding to increasing nucleosome occupancy. Nucleosomes suppress three types of mutations but are associated with increased mutations in the three others and an overall increased Ts:Tv ratio (Z-test with Bonferroni correction, all p-values < 0.01, Fig 1C and 1D and S2A Fig). These findings are highly consistent with previous work in yeast [14].
Fig 1. B, The average transition to transversion ratio in 1,000 bp bins as a function of NOS, calculated from 1,092 individuals. C, The ancestral transition to transversion ratio calculated for 10 groups corresponding to increasing nucleosome occupancy. D, Normalized base-specific mutation rates (MR) of 10 groups corresponding to increasing nucleosome occupancy. E, Ancestral AA→AG MR in relation to nearest dyad. F, Fast Fourier transform (FFT) of the AA→AG MR. G, Effect of increasing nucleosome occupancy on germline mutations, asterisk denotes statistical significance (p-value < 0.01 by Z-test with Bonferroni correction) between first and last group.
Overall, the data demonstrate an increase in the mutation rate for nucleosome favoring nucleotides, as previous work by others has shown that the nucleosome core particle is enriched for Gs and Cs and relatively depleted of As and Ts [32]. This is consistent with recent work in yeast that observed selection against nucleosome favoring sequences in NDR and nucleosome disfavoring sequences in nucleosomal DNA [33]. The greatest overall increase was observed in the rate of change from A→G. Intrigued by the possibility that the structure of the nucleosome could be involved in this process, we analyzed the mutation rate at previously well described and evolutionarily conserved DNA motifs within the nucleosome core particle. AA dinucleotides are an example of one such motif as they have been shown to be preferentially spaced approximately every 10 bp at sites where the minor groove of DNA bends interiorly. As such, we calculated the AA→AG mutation rate and then plotted this rate for the highest NOS group against the closest dyad, revealing that it increases closer to the dyad (Fig 1E). Interestingly, the mutation rate displays a 10 bp periodic decrease away from the dyad, as calculated by fast Fourier transform (FFT) (Fig 1F).
A Fourier transform is a mathematical method, with many different applications, that converts a signal in space into a combination of pure frequencies. As such, FFTs were performed for the AG dinucleotide to more precisely determine if a periodicity (1/frequency) existed, and if so what it is within the nucleosome core particle. This periodicity corresponds to the preferred 10 bp spacing of AA sites, as per theoretical rotational constraints [15]. We then became interested in the overall effect of nucleosome occupancy on mutation rates since this has not been previously done in humans. Calculating mutation rate as a function of nucleosome occupancy revealed a positive correlation of rate with NOS (Pearson's correlation coefficient (PCC) = 0.817, S2B Fig). We repeated this analysis in yeast and found a similar result (data not shown). To further corroborate these findings we repeated our analysis, using the same methodology, on a germline mutation data set generated from an Icelandic population [30]. This same trend was found with germline mutations (Fig 1G).
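As an illustration of how a periodicity can be read off a Fourier transform, the following is a minimal sketch in Python with NumPy. The original analysis was done in MATLAB, so this is not the code used in the study. The function name `dominant_period` and the synthetic rate profile are assumptions made only for this example.

```python
import numpy as np

def dominant_period(signal):
    """Return the period (in samples, here bp) of the strongest non-zero frequency."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                          # remove the constant (zero-frequency) component
    power = np.abs(np.fft.rfft(x)) ** 2       # power spectrum of the real-valued signal
    freqs = np.fft.rfftfreq(x.size, d=1.0)    # frequencies in cycles per bp
    k = int(np.argmax(power[1:])) + 1         # skip the zero-frequency bin
    return 1.0 / freqs[k]                     # periodicity = 1 / frequency

# Synthetic profile (not data from the study): a rate with a 10 bp ripple
positions = np.arange(140)
rate = 1.0 + 0.2 * np.cos(2 * np.pi * positions / 10.0)
period = dominant_period(rate)
```

Subtracting the mean before the transform keeps the constant offset of the rate from dominating the spectrum, so the peak reflects the oscillating part of the signal.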

Nucleosome occupancy and cancer mutations
We then hypothesized that nucleosome occupancy contributes to the heterogeneous nature of cancer mutations. As previously stated, there are currently major efforts underway to use sequencing technology to extensively catalog mutations involved in cancer [2-9]. Furthermore, one conclusion from analyses of these studies is that the cancer mutation rate in the genome is heterogeneous [10]. The large size of these data sets allowed us to calculate these relationships at the level of a single base pair. Hence, in addition to repeating the binning analyses conducted previously, we directly analyzed mutation rates against NOS without binning. We find that the same mutation rate associations are observed within non-coding regions of cancers (PCC = 0.833, Fig 2A). Further characterization demonstrated that cancer single nucleotide variants and microindels are enriched within nucleosomes, with a subset of indels being found at the theoretical start (74 bp) and end (115 bp) of linker DNA between two nucleosomes (Fig 2B and 2C). The total cancer mutation rate (non-coding and coding) is also highly correlated with increasing nucleosome occupancy (PCC = 0.989, Fig 2D). Finally, since large genetic and epigenetic changes can occur in cancer which, in theory, could affect nucleosome occupancy, we sought to validate these findings by calling mutations in H1 cells directly. To this end, we conducted whole genome sequencing and called mutations in the same H1 cells we had used to generate our NOS map. We restricted our analysis to non-coding regions and found the same positive correlation between mutation rate and nucleosome occupancy (S3 Fig). Most interestingly, the PCC of this data set was highly similar to that of the somatic mutation data set (0.854 for non-coding regions of H1 cells and 0.833 for the non-coding regions of cancers).

Nucleosome occupancy and DNA repair
These results are consistent with one of three possibilities: a confounding factor correlated with mutation rate that is also incidentally correlated with nucleosome occupancy; a biochemical mechanism mediated through nucleosomes that increases the number of mutations; or high nucleosome occupancy decreasing access of the DNA mismatch repair machinery to DNA to fix replication errors and chemically modified nucleotides [34]. While it has been shown that nucleosomes do not entirely block access to the DNA repair machinery, this does not rule out that increased nucleosome occupancy can decrease efficiency of access, leading to an increased mutation rate as a result of less efficient repair [35]. Furthermore, our findings are highly consistent with this third possibility since it would also explain our finding that the overall mutation trend is toward more nucleosome favoring bases. In order to test our hypothesis, we used a large data set of yeast DNA repair machinery knockouts consisting of 16 different mutant yeast strains to calculate mutation rates and analyzed them against yeast NOS [36][37][38]. These data demonstrated no correlation between mutation rate and nucleosome occupancy (Fig 3). Overall, these results are consistent with a model in which increasing nucleosome occupancy decreases access of DNA repair machinery to DNA, resulting in an increased mutation rate.

Discussion
We sought to understand the role nucleosomes play in affecting mutation rates, especially as it relates to human cancer and genome evolution. Previous work looking at potential epigenomic or chromatin effects has been done on kilo- or megabase scales. By utilizing our ~180x depth of coverage nucleosome map, we were able to analyze this relationship at single base pair resolution. We first integrated our data with genetic variation data. Most interestingly, we found an increasing transition to transversion ratio with increasing nucleosome occupancy. This was revealed by analyzing 1000 Genomes data and the ancestral genome in conjunction with our NOS map. This implies that these associations are related to DNA / histone interactions and not just a result of sequencing biases or biases in the 1000 Genomes data set. We kept our analyses to non-coding and non-conserved sites by excluding all areas under mammalian conservation [1]. By calculating base-specific mutation rates from the ancestral genome, we found that increasing nucleosome occupancy is associated with rate changes that are consistent with changes that would select for nucleotides which are favored within nucleosomes.
Under normal physiological conditions, DNA can locally denature to become single stranded. This concept is termed "DNA breathing" [39,40]. This phenomenon is important as "open" or "breathing" regions of DNA are more chemically reactive in comparison to those that are in a double helix. Importantly, the likelihood of a region of DNA to be breathing is inversely proportional to the nucleosome occupancy of that region (the higher the nucleosome occupancy, the lower the likelihood for a region to be breathing). As the different DNA bases have unique chemical reactivities, the nucleotide frequencies within the nucleosome core particle will also influence the mutation rate as a function of nucleosome occupancy. Conversely, there is a selective pressure against bases that are less favored within nucleosomes. The AA→AG mutation rate also corroborates this finding by demonstrating a periodicity within the nucleosome and decreasing at sites corresponding to preferred AA sites within the nucleosome core particle. Previous work demonstrates that nucleosomal DNA has an enriched G/C content [32,41,42]. In the context of these attributes, one would expect the absolute mutation rate of the different mutation types to reflect this. This can be appreciated with our data.
We have recently demonstrated that DNA methylation is associated with increasing nucleosome occupancy in the human genome, and in the context that methylcytosines are more likely to undergo spontaneous deamination in comparison to cytosines, we believe that the latter increase in the C to T rate at higher nucleosome occupancies is due to methylated cytosines [41,43]. The two types of mutations with the highest absolute baseline mutation rate (rate within bin "1") are C→T and A→G. These two transition mutations are the most commonly observed mutations in genomes and can be caused by oxidative deamination of Cs and oxidative deamination and tautomerization of As [44]. Given the mechanism of these changes, one would expect a decreasing mutation rate as the NOS increases as this would permit less DNA breathing and thus less reactivity. The opposite of this was observed for A→G mutations and thus led to our hypothesized mechanism.
Suppression of the mutation rate was observed for C→T mutations. However, the decrease is perhaps not as large as one would predict given the previously observed decrease in S. cerevisiae [14]. This can perhaps be explained in part by the increased spontaneous deamination of 5-methylcytosine in comparison to unmethylated cytosines, and by the fact that S. cerevisiae has relatively few 5-methylcytosines [43,45]. In addition, increasing 5-methylcytosine content within the nucleosome core particle was correlated with increasing nucleosome occupancy [41]. The decreased mutation rate for C→T mutations in humans as a function of nucleosome occupancy is thus perhaps attenuated by the increased content of 5-methylcytosines in regions with a high nucleosome occupancy.
C→A mutations were the third most common type of mutation at the lowest nucleosome occupancy level. Interestingly, this type of mutation had the greatest fold reduction with increasing nucleosome occupancy. This type of transversion mutation can arise when guanine residues undergo oxidation to become 8-oxoguanine that can then form a Hoogsteen base pairing with adenine [46]. This mismatching can result in G→T substitutions by DNA repair machinery and thus C→A mutations [47]. 2-hydroxyadenine arises when adenine residues undergo oxidation [48]. Previous studies have demonstrated that DNA polymerases can incorporate dAMP opposite 2-hydroxyadenine and thus introduce A→T mutations [49]. With increasing nucleosome occupancy, one would expect less DNA breathing and thus a decreased susceptibility of guanines and adenines to these oxidation reactions and thus to C→A and A→T mutations, respectively.
Previous work has indicated a selective pressure for an increase in nucleosome favoring DNA sequences [50,51]. In particular, G/C rich regions are more likely to be associated with increased nucleosome occupancy. Additionally, CC/CG/GC/GG dinucleotides are favored in locations where the minor groove faces away from the histone surface and AA/AT/TA/TT dinucleotides are favored where the minor groove is directed towards the surface of the histones. These selective forces may contribute to the increasing A→C mutation rate as a function of increasing nucleosome occupancy.
The absolute mutation rate of C→G varies the least of all the different types of mutations. Of note, the lowest mutation rate for this type of mutation was observed for regions with the lowest nucleosome occupancy, and the rate then increased only marginally and with little variation. Since nucleosomes favor both Gs and Cs within their core, the C→G mutation rate should be less affected by changes in nucleosome occupancy, and the slightly increased mutation rate with increasing nucleosome occupancy is probably largely a function of an increased G/C content within the nucleosome core particle. Overall, these findings strongly imply that the DNA sequence preferences within the core particle have had an impact on the evolution of the human genome. This is demonstrated by DNA sequences drifting over time to nucleotide compositions that are more favored by nucleosomes, especially in areas characterized by high nucleosome occupancy sans natural selection pressure. These findings are consistent with initial evolutionary analyses and especially with work done in yeast [50].
We then became interested in deducing the overall effect of nucleosome occupancy on mutation rate. When all base specific rates were analyzed together, we found that increasing nucleosome occupancy was associated with an increasing mutation rate. We corroborated this conclusion by performing the same analysis using germline mutation data from an Icelandic population.
To test this correlation on the somatic cell mutation rate, we turned our attention to the cancer mutation rate as the abundance of sequencing data sets can be used to test these associations. We sought to address mutational heterogeneity as a function of nucleosome occupancy as this heterogeneity represents a substantial problem in cancer genomics. In cancer, the coding and non-coding mutation rate increased with increasing nucleosome occupancy. Interestingly, the PCC of the cancer non-coding mutation rate was highly similar to the PCC of the ancestral mutation rate (0.833 and 0.817, respectively), implying that these associations are related to DNA / histone interactions and not artifacts of the mutation data sets used. Additionally, we repeated this analysis by calling mutations in H1 cells directly and found the same positive correlation between mutation rate and nucleosome occupancy. This falls in line with our current unpublished work and previous work that has demonstrated that on a global level nucleosome occupancies are correlated between different cell types [52].
While it is interesting to note that the data appears to show that the germline mutation rate is lower in the first binned group than the mutation rate observed for the lowest somatic mutation groups, it must be stated that the germline mutation rate analysis was generated by binning NOS into 10 equal size bins and is not a direct comparison of mutation rate to NOS. This was done because the germline mutations were very few in number, 4,934 to be exact [30]. Hence, there were not enough data points to accurately quantify the mutation rate for every corresponding NOS score. Additionally, due to the limited nature of the germline data set, making direct comparisons to the somatic data set is difficult due to the fact that the cancer mutation data is comprised of hundreds of data sets. For us, the bigger point, which the data does show, is that the same overall trend is observed in the germline data set. In the future, it would be of interest to find out if there is a difference between germline and somatic mutation rates as it relates to low nucleosome occupancy and what could be potentially driving that variability. Overall, we can surmise that variations in nucleosome occupancy can account for a large proportion of the mutation rate variation in the genome.
While microindels behaved like cancer single nucleotide mutations in relation to nucleosome occupancy, indels were increased at 74 and 115 bp from the dyad, which correspond to the theoretical entry sites of DNA in the linker region between two nucleosomes. These findings suggest that nucleosome architecture can have a substantial impact on cancer mutations by increasing mutation rate within the core particle and influencing the sites of insertions, deletions, and duplications. This is in line with recent data from the Roadmap Epigenomics Project, which demonstrated cell-type specific cancer mutations are influenced by cell-type specific chromatin architecture [53]. Future studies integrating nucleosome occupancy data into mathematical models of cancer genomics may better determine which aberrations are cancer driver mutations.
Finally, we sought to explore a potential mechanism that could explain these findings. Three of the potential mechanisms that could explain our findings are: nucleosome occupancy is associated with another parameter responsible for mutations; nucleosomes biochemically increase mutations; and/or increasing nucleosome occupancy decreases access of DNA repair machinery to DNA, thereby increasing the rate of mutation by decreasing the efficiency of repair. The third possibility seemed most likely based on the totality of our data. The most convincing evidence of this is our findings that, over time, the human genome seems to drift towards nucleosome favoring sequences and the near linear relationship between nucleosome occupancy and mutation rate. In order to test for this possibility, we repeated our analyses using 16 different large data sets from yeast DNA repair knockout strains. In order to eliminate as much bias as possible, we conducted the analysis in the non-coding regions only. Yeast coding mutations were excluded from the final analysis for the following two reasons. First, coding mutations can alter phenotype and therefore be associated with a corresponding change in fitness. As such, variations in selective pressure can alter or bias any analysis of mutation rates. We calculated the non-coding mutation rate for all data sets, because, in theory, these mutations are not under selective pressure that can alter or bias their calculations. Second, it is well known that coding regions have higher nucleosome occupancy than non-coding regions [54,55]. This could significantly bias any analysis on nucleosome occupancy and mutation rates. The main point of our yeast analysis was to demonstrate the loss of this correlation with DNA repair knockouts. In fact, in these strains there was no correlation between nucleosome occupancy and mutation rate. 
Future biochemical studies are needed to shed light on the exact nature of the interaction between nucleosomes and DNA repair proteins.
In summary, our analyses have revealed that mutation rates are affected by nucleosome occupancy so long as DNA repair machinery remains intact. This association has significantly impacted genome evolution and cancer mutagenesis. Finally, this relationship can partially explain the heterogeneous nature of cancer mutations. Going forward, it will be interesting to integrate this relationship into mathematical models of cancer, with the aim of developing better tools for determining which mutations are driving cancer pathophysiology.

Cell culture
The UC Irvine Human Stem Cell Research Oversight Committee (UCI hSCRO) approved the use of human embryonic stem cells in this study. The H1 human embryonic stem cell line was purchased from WiCell Research Institute, Inc. This is one of the first human embryonic stem cell lines ever derived and is approved by the NIH Human Embryonic Stem Cell Registry (http://grants.nih.gov/stem_cells/registry/current.htm) [56]. The NIH Registration Number for H1 human embryonic stem cells is 0043. Feeder free cultures of H1 human embryonic stem cells were grown and passaged in mTeSR 1 (STEMCELL Technologies Inc) as previously described and in accordance with ENCODE protocols [26]. In total, approximately 100 million H1 cells corresponding to passages 33-35 were used in experiments.

Generation of mono-nucleosomal DNA sequenced reads
H1 cells were subjected to MNase digestion by use of the EZ Nucleosomal DNA Kit (Zymo Research) in accordance with the manufacturer's protocol. The ideal digestion should yield approximately 80% mono-nucleosomal DNA [17][18][19][20]. In order to extract both easily digested nucleosomes and less digestible ones, we titrated the time of digestion in multiple replicates to yield 70% to 90% mono-nucleosomal DNA, with the average being 80% from all replicates combined. We then prepared paired-end libraries from this total mono-nucleosomal DNA with use of the Illumina Paired-End DNA Sample Prep Kit according to the manufacturer's instructions with the following exception. In order to reduce potential PCR amplification bias, we performed two separate PCR reaction steps and combined the product of the two reactions [57,58]. The libraries were then sequenced using PE54 chemistry on the Illumina HiSeq2000 in replicate on two flow cells (R51 and R54). Two biological replicates for H1 were performed, each consisting of six technical replicates.

Alignment and processing of nucleosome maps
Paired-end nucleosomal sequencing data for R54 was aligned to the hg19 reference genome using Bowtie 2 on default settings [59]. Data from R51 was processed similarly with the exception that 25 bases from the 3' end of read 2 were removed as these final cycles produced low Q-scores which caused excess reads to not align properly. All aligned data was processed using SAMtools to yield merged BAM files [60].

Nucleosome occupancy score map generation and calling nucleosomes
BAM files were run through the DANPOS algorithm in which reads were clonally cut to remove potential PCR amplification bias, smoothed, and adjusted for nucleosome size to enhance signal to noise ratio, resulting in a nucleosome occupancy score (NOS) for each base in the human genome [25]. DANPOS settings were as follows: -d 150, -a 1, -k 1, -e 1, --paired 1. -d 150 set the minimal distance between nucleosome dyads to 150 bp. The distance between dyads was set to 150 bp as the average fragment size from our H1 paired-end sequencing dataset was 151 bp (corresponding to 75 bp on either side of a dyad). -a 1 set the resolution of the NOS maps at a single bp and thus obviated any further downstream signal smoothing. The setting -e 1 allows for an edge-finding step to be taken, which estimates the edges of the predicted nucleosomes. -k 1 led to all data from intermediate steps being saved. --paired 1 indicated that the input BAM files were from paired-end sequencing data. We also generated NOS and called nucleosomes for the H1 dataset corrected for MNase digestion bias with use of a genomic control and found no significant differences in sequence preference analyses (data not shown) [32,36,41,61]. For all subsequent analyses we used our original NOS map.

General software used for analysis
Operations on genomic intervals were performed using BEDTools [62]. Fast Fourier transforms were done using MATLAB. Statistics were done in R. Additionally, we made use of in-house Python 2.7, C++, and shell scripts that are available upon request.

Genetic variation
Flagged and common SNP data were downloaded from the UCSC genome browser table [31,63]. 1000 Genomes data was downloaded and processed using the VCFtools software package to calculate SNP densities, pi scores, and the transition to transversion ratio in 1,000 bp bins [64]. For the same bins, we also calculated average H1 NOS. We then partitioned our data into ten equal sized groups corresponding to increasing average NOS and all parameters were averaged within these groups.
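The grouping step described above can be sketched as follows. This is a minimal illustration in Python with NumPy under the assumption that each 1,000 bp bin already carries an average NOS and one per-bin statistic (for example a Ts:Tv ratio). The function name `group_means_by_nos` and the toy values are hypothetical and are not taken from the study's scripts.

```python
import numpy as np

def group_means_by_nos(bin_nos, bin_values, n_groups=10):
    """Sort bins by average NOS, split them into n_groups (near-)equal sized groups,
    and return (mean NOS, mean value) for each group in order of increasing NOS."""
    bin_nos = np.asarray(bin_nos, dtype=float)
    bin_values = np.asarray(bin_values, dtype=float)
    order = np.argsort(bin_nos, kind="stable")   # bin indices from lowest to highest NOS
    groups = np.array_split(order, n_groups)     # group sizes differ by at most one bin
    return [(bin_nos[g].mean(), bin_values[g].mean()) for g in groups]

# Toy input (not data from the study): 20 bins whose statistic is twice the NOS
toy_nos = np.arange(20)
toy_values = 2.0 * toy_nos
summary = group_means_by_nos(toy_nos, toy_values)
```

Grouping by rank rather than by fixed NOS intervals keeps the number of bins per group equal, which matches the "equal sized groups" wording and avoids sparsely populated groups at extreme NOS values.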

Base-specific and total ancestral mutation rate
To ease downstream analysis we took our single bp resolution H1 NOS map, rounded all NOS values to the nearest whole integer, and capped all values above 545 at 545. This new whole integer NOS map was used for all subsequent analyses. We then downloaded coordinates of conserved elements in the human genome and converted them to hg19 using liftOver [1,63]. We combined these coordinates with gene coordinates from RefSeq and ENCODE blacklist regions [26,65]. We used these combined coordinates to remove all conserved, blacklist, and coding sites from our NOS map. Using the ancestral genome from Ensembl, we determined the ancestral allele and the current allele for all remaining sites, taking into account strand symmetry [9]. Additionally, we removed all sites without known ancestral or current allele information and kept the analysis to sites in which the ancestral allele had a high confidence call according to Ensembl. In total, after this filtering, we ended up with greater than 2 billion bases. For each ancestral base, A or C, the data was broken up into 10 roughly equal sized groups corresponding to increasing NOS. Base-specific MR were calculated for each group as the number of base-specific mutations divided by the total number of bases. Finally, we calculated a total NOS specific MR by dividing the number of mutations by the total number of bases found for each whole integer NOS of 0-545.
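The bookkeeping in this paragraph can be sketched as follows. This is a minimal Python illustration, not the in-house scripts used in the study. The function names (`fold`, `integer_nos`, `rate_per_nos`), the tuple layout of `sites`, and the choice to round ties upward are assumptions made for this example. Only the cap of 545 and the definition of the rate (mutations divided by bases at each integer NOS) come from the text.

```python
import math
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
NOS_CAP = 545  # cap stated in the text

def fold(ancestral, current):
    """Map a substitution onto the six strand-symmetric classes (ancestral base A or C)."""
    if ancestral in ("G", "T"):
        return COMPLEMENT[ancestral], COMPLEMENT[current]
    return ancestral, current

def integer_nos(score):
    """Round a non-negative NOS to the nearest integer (ties up, an assumption) and cap it."""
    return min(math.floor(score + 0.5), NOS_CAP)

def rate_per_nos(sites):
    """sites: iterable of (nos_score, ancestral_base, current_base).
    Returns {integer NOS: mutated sites / total sites at that NOS}."""
    bases = Counter()
    mutations = Counter()
    for score, ancestral, current in sites:
        nos = integer_nos(score)
        bases[nos] += 1
        if ancestral != current:
            mutations[nos] += 1
    return {nos: mutations[nos] / bases[nos] for nos in bases}

# Toy sites (not data from the study)
toy_sites = [(3.2, "A", "A"), (3.4, "G", "A"), (700.0, "C", "C"), (545.6, "T", "C")]
rates = rate_per_nos(toy_sites)
```

Folding by the ancestral base means that, for example, a G→A change is counted as C→T, which is what reduces the twelve possible substitutions to the six classes analyzed.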

Germline and cancer mutations
Germline mutations were downloaded as processed mutation calls with gene coordinates from supplementary information and converted to hg19 by use of liftOver [30,63]. We broke up the human genome into 10 roughly equal sized groups of increasing NOS. The germline mutation rate was calculated for each group by taking the total number of germline mutations in that group and dividing it by the total number of bases. For cancer mutations, we downloaded processed mutation data from six Cancer Genome Atlas Research Network studies and The Catalog of Somatic Mutations in Cancer (COSMIC) and combined all the mutations with two caveats [2][3][4][5][6][7][8]. First, we kept the analysis to one single nucleotide variant (SNV) per genomic coordinate for each cancer type, regardless of the frequency of a mutation within a cancer. SNVs in different cancers were analyzed as multiple mutations per genomic coordinate, with the number of mutations equaling the number of different cancers they were found in. For indels and microindels this same approach was used, but for an indel to be excluded both the start and end coordinates must have been the same. Indel and microindel start and stop coordinates were used as the genomic coordinates of the mutation and all bases that fell within these coordinates were ignored for downstream analysis. All insertion or deletion mutations 50 bp or greater were called indels and all those greater than 1 bp but less than 50 bp were called microindels. As before, to ease downstream analysis we took our single bp resolution H1 NOS map, rounded all NOS values to the nearest whole integer, and capped all values above 545 at 545. From this map we assigned a NOS to each cancer mutation. We initially extracted just the non-coding mutations and calculated NOS specific cancer non-coding MR as the total number of non-coding mutations divided by the total number of non-coding bases found for each whole integer NOS of 0-545.
The total cancer mutation rate was calculated as the total number of cancer mutations (coding and non-coding) divided by the total number of bases found for each whole integer NOS of 0-545.
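The counting rules for cancer mutations described above can be sketched as follows. This is a minimal Python illustration, not the in-house code. The record layout `(cancer_type, chrom, pos)`, the function names, and the cancer type labels in the toy input are hypothetical. Only the length thresholds and the rule of one SNV per coordinate per cancer type come from the text.

```python
from collections import Counter

def classify_length(length_bp):
    """Size classes from the text: >= 50 bp is an indel, > 1 bp and < 50 bp is a microindel."""
    if length_bp >= 50:
        return "indel"
    if length_bp > 1:
        return "microindel"
    return None  # 1 bp events fall outside both classes as defined in the text

def dedupe_snvs(records):
    """Keep one SNV per genomic coordinate for each cancer type.
    records: iterable of (cancer_type, chrom, pos)."""
    return sorted(set(records))

def mutations_per_site(records):
    """Count a coordinate once for every distinct cancer type in which it is mutated."""
    return Counter((chrom, pos) for _, chrom, pos in dedupe_snvs(records))

# Toy records with hypothetical cancer type labels (not data from the study)
toy_records = [("typeA", "chr1", 100), ("typeA", "chr1", 100), ("typeB", "chr1", 100)]
unique = dedupe_snvs(toy_records)
per_site = mutations_per_site(toy_records)
```

Here the recurrent hit within `typeA` collapses to one record, while the same coordinate mutated in `typeB` is still counted, giving two mutations at that site.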

H1 genome sequencing and analysis
We prepared paired-end libraries from H1 DNA with the use of the Illumina Paired-End DNA Sample Prep Kit according to the manufacturer's instructions with the following exception. In order to reduce potential PCR amplification bias, we performed two separate PCR reaction steps and combined the product of the two reactions [57,58]. The libraries were then sequenced using PE150 chemistry on the Illumina HiSeq2500 in replicate on two flow cells (R170 and R171). Two biological replicates for H1 were performed, each consisting of two technical replicates. In accordance with GATK Best Practices, this data was processed using base quality score recalibration, indel realignment, duplicate removal, SNP and INDEL discovery, standard hard filtering parameters, and variant quality score recalibration [66][67][68][69]. Called mutations were processed as above.

Yeast DNA repair knockout strains and mutations
We downloaded yeast genomic sequencing data from 16 mismatch repair knockouts, one control, and MNase-Seq data from the NCBI [19,37,38,70]. The yeast MNase-Seq data was processed through the same pipeline stated above. Using VarScan 2, mutations for all 16 knockouts were called against the control strain [71]. We kept our subsequent analysis to noncoding mutations and calculated mutation rates against yeast NOS, exactly as done for the human data as stated above.
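For reference, the Pearson correlation coefficient (PCC) reported for mutation rate against NOS can be computed as in the following minimal Python sketch. Statistics in the study were done in R, so this is only an illustration. The function name `pearson` and the two toy profiles are assumptions, and the numbers are synthetic rather than results from any strain.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()   # center both variables
    yc = y - y.mean()
    return float((xc * yc).sum() / np.sqrt((xc ** 2).sum() * (yc ** 2).sum()))

# Synthetic profiles (not data from the study)
nos = [0, 1, 2, 3, 4]
rate_rising = [1e-3, 2e-3, 3e-3, 4e-3, 5e-3]   # rate increases linearly with NOS
rate_unrelated = [2.0, 1.0, 0.0, 1.0, 2.0]     # no linear trend with NOS

r_rising = pearson(nos, rate_rising)
r_unrelated = pearson(nos, rate_unrelated)
```

A coefficient near 1 corresponds to the near linear increase described for strains with intact repair, while a coefficient near 0 corresponds to the absence of a linear relationship.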

Supporting Information
S1 Fig. Mutation density and nucleotide diversity as a function of NOS. A, The average SNP density was calculated as the number of SNPs per 1,000 bp then averaged for each equal sized group corresponding to increasing nucleosome occupancy. Genetic variation data was generated from The 1000 Genomes Project. B, Groups 1-10 correspond to groups with increasing nucleosome occupancy scores (NOS). The π score is a measure of nucleotide diversity and was calculated in 1,000 bp bins. (TIF)
S2 Fig. Base-specific and overall mutation rate as a function of increasing NOS. A, Ancestral base-specific mutation rates (MR) calculated for ten equally sized groups corresponding to increasing nucleosome occupancy scores (NOS) with color coded legend for the type of mutation at top, with asterisks denoting statistical significance (p-value < 0.01) between the first and last group.