Mutational impact of APOBEC3A and APOBEC3B in a human cell line and comparisons to breast cancer

A prominent source of mutation in cancer is single-stranded DNA cytosine deamination by cellular APOBEC3 enzymes, which results in signature C-to-T and C-to-G mutations in TCA and TCT motifs. Although multiple enzymes have been implicated, reports conflict and it is unclear which protein(s) are responsible. Here we report the development of a selectable system to quantify genome mutation and demonstrate its utility by comparing the mutagenic activities of three leading candidates—APOBEC3A, APOBEC3B, and APOBEC3H. The human cell line, HAP1, is engineered to express the thymidine kinase (TK) gene of HSV-1, which confers sensitivity to ganciclovir. Expression of APOBEC3A and APOBEC3B, but not catalytic mutant controls or APOBEC3H, triggers increased frequencies of TK mutation and similar TC-biased cytosine mutation profiles in the selectable TK reporter gene. Whole genome sequences from independent clones enabled an analysis of thousands of single base substitution mutations and extraction of local sequence preferences with APOBEC3A preferring YTCW motifs 70% of the time and APOBEC3B 50% of the time (Y = C/T; W = A/T). Signature comparisons with breast tumor whole genome sequences indicate that most malignancies manifest intermediate percentages of APOBEC3 signature mutations in YTCW motifs, mostly between 50 and 70%, suggesting that both enzymes contribute in a combinatorial manner to the overall mutation landscape. Although the vast majority of APOBEC3A- and APOBEC3B-induced single base substitution mutations occur outside of predicted chromosomal DNA hairpin structures, whole genome sequence analyses and supporting biochemical studies also indicate that both enzymes are capable of deaminating the single-stranded loop regions of DNA hairpins at elevated rates. 
These studies combine to help resolve a long-standing etiologic debate on the source of APOBEC3 signature mutations in cancer and indicate that future diagnostic and therapeutic efforts should focus on both APOBEC3A and APOBEC3B.

Here, we develop a human cellular system for mutation research and use it to compare the mutagenic potential of A3A, A3B, and A3H. The human cell line HAP1 was engineered to express a single copy of the HSV-1 thymidine kinase (TK) gene, which enables the drug ganciclovir to be used to select rare TK mutants and quantify mutation frequencies. Moreover, the TK gene can be amplified readily from ganciclovir-resistant (GanR) clones by high-fidelity PCR and sequenced to provide initial assessments of mutation spectra prior to undertaking additional experiments such as whole genome sequencing (WGS) and signature analysis. Using this system, only expression of A3A and A3B causes significant increases in GanR mutation frequencies. Sanger sequences from panels of individual TK mutant clones show a clear TC-biased mutation pattern including two hotspots with no obvious hairpin structure. WGS of independent clones demonstrates that both A3A and A3B can generate the APOBEC3 mutation signatures SBS2 and SBS13. A3A- and A3B-induced single base substitution mutations are mostly dispersed (non-clustered) throughout the genomes and both enzymes exhibit similar frequencies of mutation in TCW motifs in chromosomal DNA predicted to form non-hairpin versus hairpin structures. However, in comparison to catalytic mutant controls, both enzymes exhibit higher frequencies of APOBEC3 signature mutation in the single-stranded DNA loop regions of predicted hairpin structures. WGS also shows that A3A has a strong (slightly over 70%) preference for triggering cytosine mutations in YTCW sequence motifs, whereas A3B has a weaker (slightly under 50%) preference for the same motif. In comparison, APOBEC3 mutation signature-enriched primary breast cancers show predominantly intermediate frequencies of mutations in YTCW motifs, between 50% and 70%, suggesting involvement from both enzymes.

HAP1-TK-M9: a human cellular system to report DNA damage and mutagenesis
Model organisms such as E. coli and yeast are powerful systems for studying mutagens including DNA deaminases (e.g., original studies with A3 enzymes [34,53-57]). However, these model organisms only recapitulate a subset of DNA repair and regulatory mechanisms found in human cells. We therefore sought to combine strengths of both approaches by introducing a single copy of a selectable reporter, the HSV-1 thymidine kinase (TK) gene, into the genome of the human cell line HAP1. Expression of the thymidine kinase (TK) protein confers exquisite sensitivity to the drug ganciclovir and, as for many antimicrobial agents, only TK-mutant, ganciclovir-resistant (GanR) cells survive selection by this drug. GanR mutants can be characterized rapidly by conventional Sanger DNA sequencing because TK is a single open reading frame. Moreover, once informative GanR mutants are revealed by selection, secondary analyses including WGS can be used to uncover additional and potentially global features of a given mutation process.
The overall experimental workflow is shown in Fig 1A. To generate a "mother" clone of the commercially available HAP1 cell line, Sleeping Beauty (SB)-mediated transposition was used to introduce a single copy of a TK-Neo cassette into the genome [9]. NeoR clones were selected with G418, expanded into healthy clonal populations (ca. 10^6 cells/ml), and screened for ganciclovir sensitivity (GanS). One mother clone, HAP1-TK-M9, was selected for further studies because it is GanS, it is mostly diploid (apart from pre-existing chromosome aberrations), it cultures, engineers, and clones well (below), and it has a favorable A3 expression profile (Figs 1B and S1). In particular, RT-qPCR measurements showed that its A3A and A3B mRNA levels are lower than those of the original parent line and that A3H mRNA levels are very low and near the detection threshold (Fig 1B). Genomic DNA sequencing also revealed that this cell line's only A3H allele is haplotype III (ΔAsn15), which is known to produce an unstable protein [58-60].

Functionality of human A3s expressed from MLV-based constructs
Plasmid constructs were assembled for these studies in which human A3 expression is driven by an MND promoter from within an MLV-based retroviral construct. An additional feature of these constructs is a downstream puromycin resistance cassette, which facilitates selection of expressing cells. The functionality of each construct was assessed by transfection into 293T cells and, following 24-48 hrs incubation, protein analysis by immunoblotting and ssDNA deaminase activity assays (Fig 1C-1E; custom rabbit anti-human A3A monoclonal antibody validation described in S2 Fig). Each A3 was expressed at the expected kilodalton size and only the wildtype enzymes exhibited catalytic activity. A3H-I was expressed at lower levels than A3H-II and, accordingly, had less ssDNA deaminase activity in soluble extracts (Fig 1E), in agreement with prior reports [58-60].
We next used immunofluorescent (IF) microscopy to examine subcellular localization and DNA damage responses triggered by expression of each enzyme 3 days post-transduction into HAP1-TK-M9 cells (without puromycin selection). A3A appeared cell-wide and associated with a large increase in overall staining of the DNA damage marker γ-H2AX as well as an increase in individual γ-H2AX foci (representative images in [...] These results differed from prior reports on A3B and A3H overexpression causing elevated γ-H2AX levels [9,43,48,61], perhaps because expression levels here are lower with the MND promoter (versus strong Tet/Dox-inducible systems) and/or because the HAP1 system is somehow unique and more tolerant/adaptable to expression of these ssDNA deaminases. However, HAP1-TK-M9 cells are indeed capable of a canonical DNA damage response, as demonstrated by γ-H2AX accumulation following treatment with cisplatin (cis-diamminedichloroplatinum II; representative images in [...]
To extend these results to a different cell line, our MLV-based A3 expression plasmids were transfected transiently into HeLa cells and, after 24 hrs incubation, subjected to additional analysis by IF microscopy. As with HAP1-TK-M9 cells, A3A appeared cell-wide (except nucleoli), A3B predominantly nuclear, and A3H haplotype II cytoplasmic with nucleolar accumulations (S4A Fig). A3H haplotype I was not analyzed here due to low expression and weak ssDNA deaminase activity in HAP1-TK-M9 cells. Interestingly, A3A caused a strong pan-nuclear increase in γ-H2AX without concomitant focus formation (representative images in S4A and quantification in S4B Fig). Similar results have been reported for A3A overexpression in other cell types [30,62-64]. Notably, although a dose-responsive accumulation of γ-H2AX was expected given a wide range of transient transfection efficiencies for individual cells in each reaction, only a weak positive association was found because even cells with low A3A staining exhibit high γ-H2AX levels (S4B and S4C Fig). In contrast, comparatively low levels of nuclear γ-H2AX were observed in cells expressing A3B, A3H, and catalytic mutant derivative proteins, although A3B expression uniquely triggered a modest elevation of nuclear γ-H2AX levels in comparison to the background observed in cells expressing the corresponding catalytic mutant E255A protein (S4A-S4D Fig). Moreover, as expected from the strong increase in γ-H2AX levels, only wildtype A3A caused significant increases in DNA breakage as quantified by alkaline comet assays (representative images in S4E and quantification in S4F Fig). Taken together, results with two different cell lines demonstrate that A3A expression induces a strong DNA damage response (high γ-H2AX and DNA breakage), A3B expression triggers a modest DNA damage response (low γ-H2AX and no overt DNA breakage), and A3H or catalytic mutant A3A/B/H expression is indistinguishable from background levels in negative vector control conditions.

TK mutation spectra of HAP1-TK-M9 with A3A, A3B, and A3H
To directly test which A3 enzymes cause genomic mutation in the HAP1-TK-M9 system, 24 independent single-cell derived daughter clones were obtained for A3A, A3B, A3H-I, and A3H-II expressing conditions as well as catalytic mutant and vector controls (Methods, Table 1). A classical fluctuation analysis was performed by growing each single cell clone for 1 month to >10^7 cells, subjecting each population to selection by ganciclovir, and allowing time for single GanR mutant cells to grow into countable colonies. Vector control conditions yielded a median GanR mutation frequency of 3 mutants per 5 million cells (mean = 3.6, SEM = 0.56). In contrast, A3A- and A3B-expressing clones caused median GanR mutation frequencies to rise above 10 mutants per 5 million cells (A3A WT: median = 17, mean = 14, SEM = 2.[...]
We next asked what types of genetic alterations led to inactivation of the TK gene in GanR granddaughter clones derived from the different A3-expressing and control conditions. The TK gene was PCR-amplified from genomic DNA of GanR granddaughter clones and Sanger sequenced. C-to-T and C-to-G mutations in an APOBEC3-signature trinucleotide motif (APOBEC), all other single base substitution mutations (other SBS), and all insertion/deletion mutations (INDELs) were placed into groups for comparison (red, black, and blue tics in [...]
We next examined the broader sequence context of the 22 A3A- and 19 A3B-induced APOBEC3 signature mutations that occurred at 5'-TC dinucleotides in TK (S5B and S5C Fig). In both instances, A was preferred over T at the +1 nucleobase position relative to the mutated C, and this bias was not significantly different between the two enzymes (68% for A3A and 74% for A3B; p = 0.367 by Fisher's exact test). Similarly, no obvious biases were evident at the +2 or -2 nucleobase positions with all four nucleotides observed at similar frequencies for both enzymes. Moreover, even when pyrimidines and purines were grouped for comparison, A3A did not show an overt preference for C/T (Y) or A/G (R) at the -2 or +2 nucleobase positions (51% vs 49% and 52% vs 48%, respectively). Likewise, A3B also failed to show an overt preference for C/T (Y) or A/G (R) at the -2 or +2 nucleobase positions (45% vs 55% and 53% vs 47%, respectively). These similarities underscore the fact that small mutation numbers are primarily useful for delineating major signature differences such as the shifts described above from a heterogeneous pattern in catalytic mutant- or vector control-expressing cells towards a predominantly 5'-TC focused SBS mutation pattern in A3A- and A3B-expressing cells.

WGS shows SBS2 and SBS13 signatures reflecting the intrinsic biochemical preferences of A3A and A3B
Independent granddaughter clones were selected using the workflow described above and RNA sequencing was done to confirm expression of each exogenously expressed A3 construct and compare mRNA levels relative to established breast cell lines and primary breast tumors. All expression values were determined relative to those of the conserved housekeeping gene TBP to be able to compare RNAseq data from different sources (e.g., unrelated cell lines and tumors). First, as expected from utilizing the same expression vector, mRNA levels of exogenously expressed A3A and A3B are similar (i.e., both averaging near 1 TBP equivalent; S6A Fig). However, these A3A and A3A-E72A mRNA levels in granddaughter clones are over 5-fold higher than the average endogenous A3A expression levels of APOBEC3 signature-enriched breast cancer cell lines BT474 and MDA-MB-453, breast cancer cell lines of the CCLE, or breast tumors of TCGA (S6A Fig). In comparison, the A3B and A3B-E255A mRNA levels in granddaughter clones are similar to averages reported for breast cancer cell lines of the CCLE and breast tumors of the TCGA, and approximately 2-fold lower than those of BT474 and MDA-MB-453 (S6A Fig). In other words, A3A is overexpressed in this system and A3B approximates levels observed in breast tumors and cell lines. A3H (haplotype I) mRNA levels showed greater variance but only two clones were analyzed by RNAseq and WGS due to negative results above with IF microscopy experiments and TK mutation analysis.
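The TBP-relative normalization described above can be illustrated with a minimal sketch. The function name `tbp_relative` and the expression values are hypothetical and chosen only for illustration; the actual quantification pipeline is not specified in this section.

```python
def tbp_relative(expression, reference="TBP"):
    """Express each gene's abundance as a multiple of the reference gene.

    `expression` maps gene name -> abundance (e.g., TPM) for ONE sample.
    Dividing by TBP within each sample puts data from unrelated sources
    (cell lines, tumors) on a shared "TBP equivalent" scale.
    """
    ref = expression[reference]
    if ref <= 0:
        raise ValueError("reference gene must have positive expression")
    return {gene: value / ref
            for gene, value in expression.items()
            if gene != reference}


# Hypothetical abundances for one granddaughter clone (not real data).
sample = {"TBP": 20.0, "A3A": 22.0, "A3B": 18.0}
relative = tbp_relative(sample)
print(relative)
```

A value near 1 for a gene corresponds to "near 1 TBP equivalent" in the sense used in the text.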
The mRNA expression levels of the other four A3 genes, as well as AICDA (AID), APOBEC1, APOBEC2, and APOBEC4, were also quantified and compared with those of A3A, A3B, and A3H (S6B Fig). Endogenous A3C was expressed at similarly high levels in all granddaughter clones, providing a relatively stable internal control. Endogenous A3F and A3G were expressed at lower but still detectable levels, and endogenous AICDA, APOBEC1, APOBEC2, APOBEC4, and A3D were expressed at very low or undetectable levels. As expected, levels of ectopically expressed A3A, A3B, and A3H mRNA exceeded those in the HAP1-TK-M9 parent clone as well as those in vector-expressing granddaughter controls. In addition, protein expression of A3A, A3B, and A3H was confirmed in granddaughter clones by immunoblotting and activity by ssDNA deamination assays (S7A and S7B Fig). Finally, the integration sites of the TK reporter and MLV-based A3 expression constructs were determined using WGS reads. Data from representative clones demonstrated a single TK-Neo integration site in chromosome 3 between nucleotides 143996100-143996500, which is ~3.5 kbp downstream of the nearest gene (DIPK2A), as well as unique MLV-A3 insertion sites as expected from the independent reactions and low multiplicities of infection that were used initially to establish the clones (S8 Fig).
To investigate mutational differences genome-wide, Illumina short-read WGS was done for randomly selected granddaughter clones. Mutations unique to each granddaughter were identified by calling SBS variations versus the genomic DNA sequence of the HAP1-TK-M9 mother clone. This approach eliminated any somatic variation that accumulated in the GanS mother clone prior to transduction with each A3 or control expression construct. Thus, all new SBS mutations had to be present in a significant proportion of reads from the granddaughter clones and absent from the reads from the original mother clone and, as such, must have occurred in the presence of an active A3 enzyme or a catalytic mutant control.
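The filtering logic in this paragraph (keep single base substitutions supported in the granddaughter and absent from the mother clone) can be sketched as a set difference. This is an illustration only, not the variant-calling pipeline used in the study; the `Call` record, the function name, and the `min_vaf` threshold of 0.3 are assumptions standing in for "a significant proportion of reads".

```python
from typing import NamedTuple


class Call(NamedTuple):
    chrom: str
    pos: int
    ref: str
    alt: str
    vaf: float  # fraction of reads supporting the alternate allele


def clone_specific_sbs(granddaughter, mother, min_vaf=0.3):
    """Return SBS calls unique to the granddaughter clone.

    min_vaf is a hypothetical cutoff; the source only requires a
    "significant proportion of reads".
    """
    mother_sites = {(c.chrom, c.pos, c.ref, c.alt) for c in mother}
    return [
        c for c in granddaughter
        if len(c.ref) == 1 and len(c.alt) == 1       # single base substitution
        and c.vaf >= min_vaf                         # well supported in the clone
        and (c.chrom, c.pos, c.ref, c.alt) not in mother_sites  # new since mother
    ]


# Hypothetical calls (not real data).
mother = [Call("chr1", 100, "C", "T", 0.5)]
granddaughter = [
    Call("chr1", 100, "C", "T", 0.5),   # inherited from mother: removed
    Call("chr2", 250, "C", "G", 0.48),  # new, well supported: kept
    Call("chr3", 900, "G", "A", 0.05),  # low support: removed
    Call("chr4", 10, "CT", "C", 0.5),   # deletion, not an SBS: removed
]
unique_calls = clone_specific_sbs(granddaughter, mother)
print(unique_calls)
```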
In A3A-expressing granddaughter clones, the total number of unique SBSs ranged from 2057 to 5256 [...] This result was confirmed by assessing APOBEC3 signature enrichment scores [50,65], which indicated that 6/6 A3A-expressing clones and 4/5 A3B-expressing clones have significant enrichments of APOBEC3 signature mutations, whereas clones expressing catalytically inactive A3A or A3B, as well as clones expressing A3H-I or vector control have none (S11A Fig). A complementary bioinformatics analysis, non-negative matrix factorization (NMF [66]), yielded similar results with "signature A" resembling SBS2 and SBS13 in A3A- and A3B-expressing clones (i.e., APOBEC3 signature) and "signature B" occurring in all clones regardless of A3 presence or functionality (S11B Fig). Signature B is comprised in part of a C-to-A mutation bias, which may be explained by the CA dinucleotide biased mutagenic activity of ganciclovir [67-69] but may also reflect a combination of mutational processes and, in both cases, eclipse potential APOBEC3-instigated C-to-A mutational events; S11B [...]). First, a bias for +1 A over +1 T emerged in A3A-expressing clones (48.3% > 36.3%), whereas the opposite bias was evident in A3B-expressing clones (33.9% < 41.2%). However, for both enzymes, the percentage of +1 A and T (W) was similar (84.6% and 75.1%, respectively). Second, no significant bias was noted at the +2 position except that guanine is slightly over-represented in pentanucleotide motifs derived from A3A-expressing conditions compared to motifs derived from A3B-expressing clones. Third and most importantly, a strong bias for a pyrimidine nucleobase (C or T) occurred at the -2 position in A3A-expressing clones (72.9% YTC/NTC), which was also evident in the broader APOBEC3 tetranucleotide context (72.7% YTCW/NTCW). This cytosine mutation preference resembles the strong -2 pyrimidine bias reported for human A3A in murine hepatocellular carcinomas [70], the chicken B cell line DT40 [71], and yeast [50,51]. Fourth, in contrast, a slight bias for purine nucleobases (A or G) was apparent at the -2 position in A3B-expressing clones (51.1% RTC/NTC), which was also reflected in the broader APOBEC3 tetranucleotide context (52.8% RTCW/NTCW). This latter result also agreed with prior data from yeast [50,51], but contrasted slightly with the mutation signature detected in tumors derived from human A3B-expressing mice (47% RTCW [72]). Possible explanations for this variability are considered in Discussion.
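The YTCW/NTCW percentage used above can be computed from pentanucleotide contexts with a short sketch. It assumes each context is written 5' to 3' on the strand carrying the mutated cytosine, with the mutated C at the center (positions -2, -1, 0, +1, +2); the function name and the example contexts are hypothetical.

```python
def ytcw_fraction(contexts):
    """Fraction of TCW-context cytosine mutations with a pyrimidine at -2.

    Each context is a 5-mer centered on the mutated C, e.g. "CTCAG":
      index 0 = -2, index 1 = -1, index 2 = mutated C, index 3 = +1.
    NTCW: T at -1, C at 0, and A or T (W) at +1.
    YTCW: NTCW with C or T (Y) at -2.
    Returns None when no context matches NTCW.
    """
    ntcw = [c for c in contexts
            if len(c) == 5 and c[1:3] == "TC" and c[3] in "AT"]
    if not ntcw:
        return None
    ytcw = sum(1 for c in ntcw if c[0] in "CT")
    return ytcw / len(ntcw)


# Hypothetical contexts (not real data).
example = ["CTCAG", "TTCTA", "ATCAC", "GGCAT", "CTCGA"]
fraction = ytcw_fraction(example)
print(fraction)
```

In this toy input, three contexts match NTCW and two of them carry a pyrimidine at -2, so the fraction is 2/3; the RTCW fraction is simply one minus this value.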

Features of A3A and A3B mutagenesis in the HAP1-TK-M9 system
X-ray structures revealed a U-shaped bend in ssDNA substrates bound by A3A and A3B [16,29], and other studies indicated that similarly bent ssDNA loop regions of hairpins (i.e., DNA cruciform or stem-loop structures) can be preferred substrates for deamination by A3A [52,73,74]. To ask whether this preference extends to the HAP1-TK-M9 system described here, we analyzed our A3A and A3B TK PCR sequences and granddaughter clone WGS data for evidence of mutagenesis in the single-stranded loop regions of DNA hairpin structures. First, neither of the two A3A/B mutation hotspots in the TK gene reported above appeared to be part of predicted stem-loop structures. Second, none of the top-100 cruciform structures reported previously to harbor recurring APOBEC3 signature mutations in tumors [52] were mutated in our HAP1-TK-M9 WGS data sets. Third, in global comparisons of base substitution mutations, the frequency of APOBEC3 signature TCW mutations was similar in ssDNA loop regions of predicted hairpin structures versus non-hairpin regions (i.e., APOBEC3 signature mutation events were not enriched in the loop regions of stem-loop structures over those occurring in canonical ssDNA substrates; S13 Fig). Moreover, the frequency of APOBEC3 signature mutations in hairpin or non-hairpin structures appeared similar (not distinguishable statistically) in A3A- and A3B-expressing conditions. Interestingly, however, the frequency of APOBEC3 signature mutations in predicted loop regions of hairpin structures appeared higher in A3A- and A3B-expressing clones in comparison to all non-catalytic control conditions (S13 Fig; P = 0.0655 and P = 0.0367 by Welch's t-test, respectively). The A3A versus control comparison likely failed to reach statistical significance due to the small number of clones with WGS (n = 6) and the large variance in numbers of TCW mutations in hairpin loop regions in the different clones (range = 1-16 TCW mutations). Nevertheless, these results combined to indicate that single-stranded loop regions of hairpin structures in human chromosomal DNA may be similarly susceptible to deamination by both A3A and A3B.
To assess relative rates of A3A- and A3B-catalyzed deamination of experimental hairpin versus non-hairpin substrates, we used purified enzymes and ssDNA substrates representing two previously reported RNA editing hotspots of A3A: SDHB and NUP93 [52,75]. The SDHB hairpin is predicted to have a 5 bp stem and a 4 nt loop, and the NUP93 hairpin a 7 bp stem and a 4 nt loop (Fig 4A and 4B). The control oligonucleotides have the same loop region sequences and a randomization of one-half of the nucleobases in the hairpin stem to reduce base-pairing potential. In each case, the linear substrates migrated similarly on native and denaturing PAGE while the hairpin substrates migrated faster by native PAGE and similar to the linear substrates when denatured, thereby confirming the integrity of both hairpins (S14 Fig). A3A and A3B were affinity-purified from human cells and incubated under single hit conditions with these oligonucleotide substrates over time. First, both A3A and A3B showed a strong preference for deaminating the SDHB hairpin substrate in comparison to a linear control with the same nucleobase content scrambled (~4- and ~8-fold preference, respectively; Fig 4A). A3A showed higher rates of deamination than A3B on both the hairpin and the linear substrate in agreement with prior studies [52]. Thus, the relative deamination rates for SDHB substrates were: A3A/hairpin > A3B/hairpin > A3A/linear > A3B/linear (120, 41, 31, and 5.1 nM/min, respectively).
However, a different picture emerged from analyses of deamination of NUP93-based substrates (Fig 4B). Rates of A3A-catalyzed deamination were high for both the NUP93 hairpin and linear control with the same nucleobase content scrambled. In contrast, A3B showed higher rates of deamination of the linear substrate and was only able to deaminate the hairpin substrate with low efficiencies and linear kinetics. Thus, the relative deamination rates for NUP93 substrates are: A3A/hairpin ≈ A3A/linear >> A3B/linear > A3B/hairpin (93, 83, 7.7, and 1.3 nM/min, respectively). These results combined to show that both A3A and A3B can deaminate hairpin and linear ssDNA substrates and, further, that it is possible to identify sites such as the NUP93 hairpin (here with DNA and previously shown with RNA and DNA [52]) that are strongly (and perhaps even exclusively, in a few instances) preferred by a single A3 enzyme.
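The fold preferences quoted for these substrates follow directly from the reported rates, as the short calculation below shows. The rates (in nM/min) are taken from the text; the dictionary layout and variable names are only for illustration.

```python
# Reported deamination rates in nM/min: (hairpin, linear) per enzyme.
rates = {
    "SDHB": {"A3A": (120.0, 31.0), "A3B": (41.0, 5.1)},
    "NUP93": {"A3A": (93.0, 83.0), "A3B": (1.3, 7.7)},
}

# Hairpin preference = hairpin rate / linear rate (values > 1 favor the hairpin).
fold = {
    substrate: {enzyme: hairpin / linear
                for enzyme, (hairpin, linear) in enzymes.items()}
    for substrate, enzymes in rates.items()
}

for substrate, enzymes in fold.items():
    for enzyme, value in enzymes.items():
        print(f"{substrate} {enzyme}: {value:.2f}-fold hairpin/linear")
```

For SDHB this gives roughly 3.9-fold for A3A and 8.0-fold for A3B, matching the ~4- and ~8-fold preferences above, whereas for NUP93 the A3B ratio falls below 1, reflecting its preference for the linear control.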
Another feature of APOBEC3 mutagenesis in human cancer is clusters of strand-coordinated cytosine SBS mutations in TCA and TCT motifs most likely caused by processive deamination of exposed tracts of ssDNA (also known as kataegis; here defined as ≥2 strand-coordinated APOBEC3 signature mutations within a 10 kbp window) [5,7,66,76]. No APOBEC3 signature kataegis events were observed in control conditions. However, clear APOBEC3 signature kataegis events were evident in the genomic DNA of both A3A- and A3B-expressing granddaughter clones (e.g., Fig 4C). For instance, one A3A-attributable kataegis event was comprised of 5 T(C>T/G)W mutations, and an A3B-attributable kataegis event included 3 T(C>T/G)W mutations. Interestingly, however, the frequency of APOBEC3 signature kataegis did not differ significantly between A3A- and A3B-expressing granddaughter clones (A3A: 4 events; A3B: 1 event; p = 0.14 by Welch's t-test). These results show that both A3A and A3B can cause kataegis in a human cell line (also see S15 Fig for additional analyses and depictions of dispersed versus clustered single base substitution mutations).
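The working definition of kataegis given above (at least 2 strand-coordinated APOBEC3 signature mutations within a 10 kbp window) can be sketched as a simple scan. This is one possible reading of the definition, not the authors' implementation: strand coordination is approximated by requiring the same reference base (C or G on the plus strand), and a cluster is grown greedily from its first mutation. All names and example coordinates are hypothetical.

```python
from collections import defaultdict


def find_kataegis(mutations, window=10_000, min_muts=2):
    """Group strand-coordinated signature mutations into candidate events.

    mutations: iterable of (chrom, pos, ref) for APOBEC3 signature SBS only,
    where ref is the plus-strand reference base ("C" or "G").
    A cluster holds mutations on one chromosome with the same ref base whose
    positions all lie within `window` bp of the cluster's first mutation.
    """
    groups = defaultdict(list)
    for chrom, pos, ref in mutations:
        groups[(chrom, ref)].append(pos)

    events = []
    for (chrom, ref), positions in groups.items():
        positions.sort()
        cluster = [positions[0]]
        for pos in positions[1:]:
            if pos - cluster[0] <= window:
                cluster.append(pos)
            else:
                if len(cluster) >= min_muts:
                    events.append((chrom, ref, cluster))
                cluster = [pos]
        if len(cluster) >= min_muts:
            events.append((chrom, ref, cluster))
    return sorted(events)


# Hypothetical mutation calls (not real data).
muts = [
    ("chr1", 1_000, "C"), ("chr1", 4_000, "C"), ("chr1", 9_000, "C"),
    ("chr1", 6_000, "G"),      # opposite strand: not coordinated
    ("chr1", 500_000, "C"),    # isolated (dispersed) mutation
    ("chr2", 100, "G"),
]
events = find_kataegis(muts)
print(events)
```

Other conventions (for example, a maximum inter-mutation distance rather than a fixed window) would give different cluster boundaries.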

APOBEC signature etiology in primary breast tumors
Sequencing data from model systems such as HAP1-TK-M9 are powerful because the resulting mutation signatures can help to establish cause-and-effect relationships, which can then be compared with more complex tumor WGS data sets to identify similarities and, potentially, to infer the precise source of an observed mutation signature in individual tumors. We therefore performed an unsupervised clustering analysis to compare the pentanucleotide cytosine mutation signatures derived from sequencing A3A-, A3B-, and A3H-expressing HAP1 clones and those from primary breast tumors with WGS available through the ICGC data portal resource. This analysis revealed three distinct tumor groups with respect to APOBEC3 signature mutations: 1) a group that showed similarity to A3A-expressing HAP1-TK-M9 clones, 2) a group that showed similarity to A3B-expressing HAP1-TK-M9 clones, and 3) a group that showed little to no significant APOBEC3 mutation signature (Fig 5). As expected from a lack of substantial APOBEC3 signature mutations above in the TK reporter or in representative WGS, A3H-I expressing clones and catalytic-inactive clones clustered with non-APOBEC3 signature tumors.
Both the A3A-like and A3B-like groups were comprised of tumors that show significant levels of APOBEC3 signature mutations and correspondingly high enrichment scores (Fig 5 and S1 Table). However, only a small proportion of the A3A-like group of breast tumors exhibit a percentage of APOBEC3 signature mutations in YTCW motifs that approaches the level inflicted by A3A here in the HAP1-TK-M9 system (i.e., >70%). Similarly strong YTCW preferences have also been reported recently for A3A expression in the chicken B cell line DT40 [71] and for A3A-driven murine hepatocellular carcinomas [70]. The lower percentages of APOBEC3 signature mutations in YTCW motifs in most breast tumors therefore suggest that A3A alone may not account for the observed composite APOBEC3 mutation signature and, moreover, that many tumors in the A3A-like group may include contributions from another A3 enzyme, most likely A3B. As regards the A3B-like group, most tumors have higher percentages of C-to-T and C-to-G mutations in YTCW motifs than can be explained by A3B alone. For instance, most tumors manifest a greater proportion of YTCW mutations than observed here for A3B in the HAP1-TK-M9 system (47.2%) and for A3B-driven tumors in mice (53%) [72]. These intrinsic preferences therefore combined to suggest that the observed percentages of APOBEC3 signature mutations in most breast tumors may be a composite resulting from the combinatorial activities of both A3A and A3B. In comparison, the homologous recombination repair deficiency (HRD) signature (SBS3) appeared underrepresented in both the A3A- and A3B-like tumor groups, and the ageing signature (SBS1) occurred in all three groups regardless of the presence or absence of an APOBEC3 mutation signature enrichment (Fig 5).
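To make the "composite" reasoning concrete, the sketch below interpolates between the two YTCW preferences measured in the HAP1-TK-M9 system (72.7% for A3A, 47.2% for A3B). It is purely illustrative and is not an analysis performed in this study: it assumes that a tumor's APOBEC3 signature mutations are a simple two-component mixture and that each enzyme keeps its HAP1-TK-M9 preference in tumors, both of which are strong simplifications. The function name is hypothetical.

```python
A3A_YTCW = 0.727  # YTCW/NTCW fraction for A3A in HAP1-TK-M9 (from the text)
A3B_YTCW = 0.472  # YTCW/NTCW fraction for A3B in HAP1-TK-M9 (from the text)


def a3a_share(tumor_ytcw, p_a3a=A3A_YTCW, p_a3b=A3B_YTCW):
    """Estimate the A3A-attributable share of APOBEC3 signature mutations.

    Assumes tumor_ytcw = f * p_a3a + (1 - f) * p_a3b and solves for f,
    clipping to [0, 1] when the observed value lies outside the two
    reference preferences.
    """
    f = (tumor_ytcw - p_a3b) / (p_a3a - p_a3b)
    return min(1.0, max(0.0, f))


# A hypothetical tumor with 60% of APOBEC3 signature mutations in YTCW motifs.
share = a3a_share(0.60)
print(f"A3A-attributable share under the mixing assumption: {share:.2f}")
```

Under these assumptions, a tumor at 60% YTCW would sit roughly midway between the two enzymes, which is one way to read the intermediate percentages seen in most breast tumors.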

Discussion
Here we report the development and implementation of a genetic system to investigate mutational processes in the human HAP1 cell line. Like many bacterial and yeast model systems, the HAP1-TK-M9 system enables a uniform cytotoxic selection with ganciclovir such that only TK mutant cells survive. An analysis of clonally derived, A3A- and A3B-expressing TK mutants by Sanger sequencing of high-fidelity PCR amplicons demonstrated a strong shift in the mutational pattern from a variety of different base substitution mutations in control conditions to a strongly 5'-TC-biased pattern (Fig 2). Interestingly, the TK mutation spectra inflicted by A3A and A3B were very similar including two shared hotspots (Q8X, R212K) and no obvious mesoscale features such as palindromic sequences capable of hairpin formation. This result may be due to the limited number of mutable cytosines and TC motifs in TK that confer resistance to ganciclovir (local base composition) and/or to selective pressure. Regardless of the precise molecular explanation, an analogy can be drawn with the mutational spectrum of the PIK3CA gene in breast, head/neck, cervical, and other cancers, which has two prominent APOBEC3 mutation hotspots (E542K, E545K) and no obvious hairpin structures [52,77]. These observations combined to suggest that selective pressure has the potential to overshadow the intrinsic preferences of individual APOBEC3 enzymes and complicate assignment of direct cause-and-effect relationships.
Drawing direct connections between A3A and/or A3B and a given mutation, even a prominent hotspot, has been additionally challenging due to the fact that both enzymes can deaminate DNA cytosines in linear substrates as well as single-stranded loop regions of stem-loop structures and, importantly, rates can vary dramatically between different substrates (e.g., Fig 4 and prior biochemical studies [52,73,74]). Thus, the TK-based system described here is capable of yielding informative, rapid, and inexpensive mutation data sets with positive results motivating genome-wide analyses where most mutations are unselected and larger mutation data sets enable broader analyses. For instance, our unbiased WGS analysis of A3A- and A3B-expressing clones indicated that both enzymes are capable of deaminating the ssDNA loop regions of hairpin substrates at similar frequencies (S13 Fig). This result was somewhat anticipated by structural studies where both A3A and a mutant A3B catalytic domain were shown to bind to ssDNA in a U-shaped conformation [16,29], but it was also unexpected due to recent reports describing strong RNA and DNA hairpin biases for A3A [52,73]. Additional biochemical and WGS studies will be required to confirm these results as well as extend them to additional cell lines and experimental systems. One potential drawback of the HAP1-TK-M9 system is that an exclusive focus on selected granddaughter clones might overestimate the mutational impact of a given process. However, earlier work also used HAP1 cells to successfully characterize a variety of different mutation sources, but these studies did not leverage the power of a lethal genetic selection nor did they address the enzyme(s) responsible for APOBEC3 signature mutations [78,79].
Taken together, the HAP1-TK-M9 studies here have demonstrated unambiguously that both A3A and A3B can inflict an APOBEC3 mutation signature in human genomic DNA with, in both instances, ssDNA deamination events immortalizing predominantly as C-to-T and C-to-G mutations in TCA and TCT trinucleotide motifs (Figs 2 and 3, and S1 Table). Over identical month-long timeframes, A3A causes 4-fold more APOBEC3 signature mutations in comparison to A3B (6070 vs 1528 mutations from 6 and 5 subclone WGSs, respectively). This difference may be explained in part by super-pathological A3A expression levels in the HAP1-TK-M9 system (at least 5-fold higher than levels in breast tumors or cell lines) and in part by the higher intrinsic activity of this enzyme in comparison to A3B. Regardless, both A3A and A3B inflicted thousands of TC-focused APOBEC3 signature mutations, which enabled comparisons between extended intrinsic preferences. Most importantly, A3A has a strong preference for a pyrimidine at the -2 position relative to the target cytosine (72.7% YTCW). This -2 pyrimidine bias mirrors original results from human A3A expression in yeast [50,51], as well as recent WGS results from human A3A expression in the chicken B cell line DT40 (~70% YTCW [71]) and from human A3A-induced murine hepatocellular carcinomas (70% YTCW [70]). Also, similar to original studies in yeast [50,51], human A3B showed a slight negative preference for YTCW motifs (47.2% YTCW) and a corresponding enrichment for RTCW motifs (52.8% RTCW; Figs 3B and S9, and S1 Table). In comparison, recent studies showed that human A3B driven tumors in mice (hepatocellular carcinomas and B cell lymphomas) exhibit an opposite APOBEC3 mutation signature bias with 53% YTCW and 47% RTCW [72]. These differences in APOBEC3B local preferences in the different systems may be due to genetic and/or epigenetic factors including but not limited to different base content, chromatin states, DNA repair processes, and/or post-translational regulatory mechanisms. They could also be due to simple stochastic variation and, accordingly, we hypothesize that A3B is non-discriminatory at the -2 position relative to the target cytosine and that this may relate to the amino acid composition of catalytic domain loop 1 residues relative to those of A3A (loop 3, loop 7, and most other active site residues are identical or nearly identical).
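The YTCW and RTCW percentages discussed above come from tallying the base at the -2 position of each mutated TCW motif. As a minimal sketch of that tally (not the analysis code used in this study), the Python below assumes each mutation is supplied as a 5-nucleotide context spanning positions -2 to +2, already oriented to the strand carrying the mutated cytosine; the function names are hypothetical.

```python
from collections import Counter

PYRIMIDINES = {"C", "T"}  # Y
PURINES = {"A", "G"}      # R
WEAK = {"A", "T"}         # W


def classify_tcw(pentamer: str) -> str:
    """Classify a 5-nt context (positions -2..+2, mutated C at index 2).

    Assumes the context is already oriented to the cytosine strand.
    """
    s = pentamer.upper()
    if len(s) != 5 or s[1] != "T" or s[2] != "C" or s[3] not in WEAK:
        return "other"
    if s[0] in PYRIMIDINES:
        return "YTCW"
    if s[0] in PURINES:
        return "RTCW"
    return "other"  # e.g. an N at the -2 position


def ytcw_fraction(pentamers) -> float:
    """Fraction of TCW-context mutations that carry a pyrimidine at -2."""
    counts = Counter(classify_tcw(p) for p in pentamers)
    tcw_total = counts["YTCW"] + counts["RTCW"]
    return counts["YTCW"] / tcw_total if tcw_total else float("nan")
```

Applied to all TC-focused C-to-T and C-to-G mutations from one condition, `ytcw_fraction` would return the kind of proportion reported in the text (for example 0.727 for A3A), provided that the upstream strand orientation and motif extraction match those used for the published values.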
Regardless of the precise molecular explanation(s) for differences between experimental systems, the fact that both A3A and A3B can inflict YTCW mutations in human cells helps to inform interpretations of the APOBEC3 enzyme responsible for the overall APOBEC3 mutation program in cancer. For instance, based on comparisons of A3A- and A3B-attributable single base substitution mutation signatures observed here in the HAP1-TK-M9 system and extracted APOBEC3 mutation signatures from 784 breast cancer WGS, it is likely that neither enzyme's preferred motif fully explains the composite signature in most individual tumors (Fig 5 and S1 Table). As regards A3A, the majority of breast tumors have less than 72.7% APOBEC3 signature mutations in YTCW motifs, which is the overall preference of A3A here in the HAP1-TK-M9 system, suggesting the involvement of at least one other A3 enzyme. As regards A3B, most individual breast tumors have a larger proportion of mutations in YTCW motifs than can be explained by A3B alone (more than 47.2%). Thus, it is likely that both A3A and A3B contribute to the composite single base substitution mutation signature observed in individual APOBEC3 signature enriched breast tumors. However, we cannot exclude the possibility that some breast tumors may be mutated exclusively by A3A or A3B due to factors listed above, or that some breast tumors may have small mutagenic contributions from other APOBEC3 enzymes. Of course, A3B is not a direct factor in A3B-null breast tumors, although the inherited deletion that removes all A3B coding sequences may dysregulate A3A expression [45,46]. The ICGC breast tumor data set lacks sufficient A3B-null tumors for a robust SBS signature analysis but a retrospective analysis of 17 null tumors in the TCGA breast cancer cohort indicates a high proportion of APOBEC3 signature YTCW mutations indicative of (but perhaps not exclusively due to) A3A activity (67% YTCW from a total of n = 538 exomic SBS mutations) [46]. Although genomic analyses here were focused on breast cancer, the WGS data sets for A3A- and A3B-expressing HAP1-TK-M9 clones including the extracted mutation signatures may also be useful for comparing with APOBEC3 attributable events in other tumor types.
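To illustrate why an intermediate YTCW percentage points to a contribution from both enzymes, the sketch below treats a tumor's YTCW fraction as a simple linear mixture of the two HAP1-TK-M9 reference values (72.7% for A3A, 47.2% for A3B). This two-component model is purely an illustrative assumption: it ignores other APOBEC3 enzymes, system-specific differences in A3B preference, and sampling noise, and it is not a calculation performed in this study.

```python
A3A_YTCW = 0.727  # YTCW fraction for A3A in the HAP1-TK-M9 system
A3B_YTCW = 0.472  # YTCW fraction for A3B in the HAP1-TK-M9 system


def a3a_share(tumor_ytcw: float) -> float:
    """Illustrative A3A share under an assumed two-enzyme linear mixture.

    Solves tumor = f * A3A_YTCW + (1 - f) * A3B_YTCW for f and clamps the
    result to [0, 1], because values outside the two reference points
    cannot be explained by this simple model.
    """
    share = (tumor_ytcw - A3B_YTCW) / (A3A_YTCW - A3B_YTCW)
    return min(1.0, max(0.0, share))
```

Under these assumptions a tumor with 60% YTCW would sit roughly halfway between the two reference values. Such a number should be read only as a rough indication that both enzymes are plausible contributors, not as an estimate of the actual fraction of mutations caused by each.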
In addition to selective pressures and mesoscale features, additional factors are likely to influence the APOBEC3-attributable fraction of an overall tumor mutational landscape including whether A3A and/or A3B is expressed, expression levels, duration of expression, intrinsic activity, and accessibility of chromosomal DNA (replication stress, R-loop levels, chromatin state, etc.). With regard to studies here with the HAP1-TK-M9 system, both A3A and A3B were expressed constitutively from the same promoter/construct for identical durations prior to ganciclovir selection, A3A is intrinsically more active than A3B, A3A is cell-wide and A3B predominantly nuclear, and yet these enzymes and cellular factors combined to yield remarkably similar TK mutation frequencies and only a 4-fold difference in overall genome-wide SBS mutation level. With respect to cancer, the A3A gene is expressed at lower levels than A3B in almost all cell lines and tumors, A3A is cell-wide or predominantly cytoplasmic where A3B is constitutively nuclear, and A3A has higher enzymatic activity that can vary from 2- to 100-fold above that of A3B depending on substrate (e.g., Fig 4 and prior biochemical studies [11,52]). It is therefore notable here that the overall genome-wide level of APOBEC3 signature mutation from A3A is only 4-fold higher than that attributable to A3B. Endogenous A3A and A3B also have both distinct and overlapping transcription programs, and both genes can be induced by a variety of conditions including viral infection and inflammation [28, 33, 35-37, 39, 42, 61, 80-85]. In vivo, A3A and A3B gene expression is also likely to be affected by the local tumor microenvironment, which can vary both between and within cancer types, as well as by a patient's global state of health. Taken together with unknown and likely lengthy multi-year durations of pre-cancer and early cancer development prior to clinical manifestation, deducing the exact fractions of mutations attributable to A3A and/or A3B may be a fruitless endeavor (except in A3B-null tumors). Rather, it may be more prudent to focus on developing strategies to simultaneously diagnose and treat the mutagenic contributions of both enzymes.
Independent whole genome sequencing experiments have provided additional information on the APOBEC3 mutation process. Initial studies induced overexpression of A3B in 293-derived cell lines, documented the resulting DNA damage responses, and performed WGS to assess genome-wide associations [86,87]. However, an unambiguous APOBEC3 mutation signature was difficult to extract from these whole genome sequences due to large numbers of mutations attributable to defective mismatch repair [86,87]. A more recent study compared de novo mutations occurring in APOBEC3 signature positive cell lines during multiple generations of clonal outgrowth [88]. An intriguing finding from this work is that APOBEC3 signature mutations may be able to occur in an episodic manner, accumulating in some generations and not others, consistent with evidence discussed above that A3A and A3B expression can be induced by multiple signal transduction pathways. Episodic mutagenesis, however, is unexpected in cell-based systems in which continuous and relatively stochastic mutagenesis should predominate given defined media and well-controlled growth conditions. These studies were followed up more recently by WGS comparisons of subclones of the same cancer cell lines CRISPR-engineered to lack A3A, A3B, or both genes [89]. The results of over 250 WGS combined to indicate that A3A may be the source of a significant fraction of observed APOBEC3 signature SBS mutations, A3B a smaller fraction, and another as-yet-undefined APOBEC3 enzyme an additional minor fraction. These data are complementary to the major results here, with both A3A and A3B proving capable of generating genome-wide APOBEC3 signature single base substitution mutations. Differences in the overall magnitude of A3A vs A3B mutagenesis may be due to cell line selection and factors described above including differential intrinsic activity, protein expression levels, genomic DNA accessibility, cell culture conditions, and importantly durations of mutagenesis. Both studies were necessarily done in model cellular systems, each with obvious strengths, but neither capable of fully recapitulating the wide repertoire of factors that impact the actual pre- and post-transformation environments in vivo, including anti-tumor immune responses, which are further likely to vary between different tissue types, tumor types, and patients.
A role for A3H, haplotypes I or II, in cancer is disfavored by our results here showing that these variants are incapable of eliciting DNA damage responses or increasing the TK mutation frequency. Two A3H-I expressing TK mutant clones were subjected to WGS and no APOBEC3 mutation signature was evident. In addition, no specific evidence for A3H emerged from sequencing clonally-derived cancer cell lines [89]. However, all of these cell-based studies have limitations as discussed above and have yet to fully eliminate A3H as a source of APOBEC3 signature mutations in cancer. For instance, A3H-I may take more time to inflict detectable levels of mutation, it may be subject to different transcriptional and post-transcriptional regulatory processes, and/or it may only be mutagenic in a subset of cancer types subject to different stresses and different selective pressures.
Ultimately, the studies here show that A3A and A3B are each individually capable of inflicting a robust APOBEC3 mutation signature in human cells and, taken together with other work summarized above, support a model in which both of these enzymes contribute to the composite APOBEC3 mutation signature reported in many different tumor types. This conclusion is supported by clinical studies implicating A3A and/or A3B in a variety of different tumor phenotypes including drug resistance/susceptibility, metastasis, and immune responsiveness [9,20,21,27,32,42,47,90-96]. Thus, efforts to diagnose and treat APOBEC3 signature-positive tumors should take both enzymes into account, not simply one or the other. Such longer-term goals are not trivial given the high degree of identity between A3A and the A3B catalytic domain (>90%), the related difficulty of developing specific and versatile antibodies for detecting each enzyme, and the fact that each can be regulated differentially by a wide variety of common factors including virus infection and inflammation. Thus, we are hopeful that the HAP1-TK-M9 system, the whole genome sequences, and the A3A-specific rabbit monoclonal antibody described here will help to expedite the achievement of these goals.

HAP1-TK-M9 system
The HAP1-TK-M9 system was generated by co-transfecting HAP1 parent cells with a plasmid expressing the Sleeping Beauty transposase and a separate plasmid with TK-Neo coding sequences flanked by SB recognition sites [9,98]. Semi-confluent cells in 6 well plates were transfected, treated 24 hrs later with G418 (1 mg/mL), and subcloned by limiting dilution in 96 well plates to create single cell derivatives. Single cell clones were then expanded and characterized as described in the main text. The A3H genotype was determined by Sanger sequencing exon-specific PCR amplicons [99,100] and further confirmed by WGS (below).
Standard molecular cloning procedures were used to create derivatives of MLV pQCXIP for expressing each A3 protein (Table 1). First, pQCXIP was cut with MluI and PacI to excise the strong CMV promoter and replace it with a weaker MND promoter (a synthetic promoter containing regions of both the MLV LTR and the myeloproliferative sarcoma virus enhancer). Second, this new construct was cut with SfiI and BsiWI to insert intron-containing A3 coding sequences [70]. This was done for A3A, A3B, A3H haplotype-I, A3H haplotype-II, and appropriate catalytic mutant derivatives (E-to-A). An eGFP expressing construct was generated in parallel to use as a control in various experiments. All new constructs were confirmed by Sanger sequencing and functional assays as described in the main text.
Each MLV-based construct was co-transfected into 293T cells with appropriate packaging vectors and 48 hrs later the resulting viral supernatants were filtered (0.45 μm) and used to transduce semi-confluent HAP1-TK-M9 cells (MOI < 0.1). After 48 hrs incubation, transduced cells were selected with puromycin (1 μg/mL) and subcloned by limiting dilution to obtain A3 expressing daughter clones. Each daughter clone expressed only a single, integrated construct, which was anticipated by low MOI transduction and verified by mapping insertion sites for 5 representative daughter clones (S8 Fig). These A3 expressing and control daughter clones were expanded for 1 month and characterized as described in the main text. No overt growth/proliferation defects were noted, and all granddaughter clones expanded at similar rates. Mutation frequencies were determined by plating 5 x 10^6 cells in 96 well flat bottom plates, treating with 5 μM ganciclovir (ThermoFisher), and after 14 days incubation counting the number of TK mutant colonies that survived selection. Single granddaughter clones were counted using a light microscope and expanded and characterized as described in the main text.
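The mutation frequency readout described above can be expressed as a short calculation. The sketch below assumes the frequency is simply the number of surviving ganciclovir-resistant colonies divided by the number of cells plated (5 x 10^6), with no correction for plating efficiency; that formula and the function name are illustrative assumptions rather than a statement of the exact procedure used.

```python
CELLS_PLATED = 5_000_000  # 5 x 10^6 cells per condition, as in the text


def mutation_frequency(colonies: int, cells_plated: int = CELLS_PLATED) -> float:
    """TK mutant colonies per cell plated (assumed definition)."""
    if cells_plated <= 0:
        raise ValueError("cells_plated must be positive")
    if colonies < 0:
        raise ValueError("colonies must be non-negative")
    return colonies / cells_plated
```

For example, 50 resistant colonies from one culture would correspond to a frequency of about 1 in 100,000 cells under this definition.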

DNA deaminase activity assays
Whole cell extract (WCE) assays: ssDNA deamination activities were measured using WCE prepared using 100 μL HED lysis buffer per 1 million cells (25 mM HEPES, 15 mM EDTA, 10% Glycerol, 1 mM DTT, and 1 protease inhibitor tablet [Roche]). Samples were sonicated in a water bath sonicator to ensure complete lysis. A3-containing lysates were incubated at 37°C for 1 hr (A3A and A3B) or 4 hrs (A3H) with purified human UNG2 and a ssDNA substrate containing either a single TCA or a single TCT trinucleotide motif (5'-ATTATTATTATTCAAATGGATTTATTTATTTATTTATTTATTT-FAM [A3A and A3B]; 5'-ATTATTATTATTCTAATGGATTTATTTATTTATTTATTTATTT-FAM [A3H]) following established protocols [9,102]. After this initial incubation, the reaction was treated with 100 mM NaOH for 5 min at 95°C. The reaction was run out on a 15% TBE-urea acrylamide gel to separate substrate oligo from cleaved product oligo and imaged on a Typhoon FLA 7000 with ImageQuant TL 8.2.0 software (GE Healthcare).
Recombinant enzyme assays: A3A- and A3B-mycHis were prepared from transfected 293T cells as reported [14,16,103-105]. Single hit kinetics were ensured by incubating 25 nM of each protein with 800 nM substrate in reaction buffer (25 mM HEPES pH 7.4, 50 mM NaCl, 5 mM imidazole) for the indicated times at 37°C. Reactions were stopped by freezing in liquid nitrogen and then were heated to 95°C to denature the enzymes. Reactions were then treated with 0.5 U/reaction uracil DNA glycosylase (NEB, USA) for 10 min at 37°C. The resulting abasic sites were cleaved by incubation with 100 mM NaOH and heating to 95°C for 5 min. Products were separated by 20% TBE-Urea PAGE, imaged on a Typhoon FLA-7000 (GE Healthcare, USA), and quantified using ImageQuant TL 8.2.0 software (GE Healthcare). Deamination of the target cytosine was calculated by dividing the total reaction product by the total amount of starting substrate. The oligonucleotide substrates were analyzed by both denaturing and native 20% PAGE to determine the extent of hairpin formation. The oligos (and a previously reported NUP93-noHP oligo) were heated to 70°C and slowly cooled to 37°C in HEPES buffer as above. 1 pmol of each oligo was then mixed with agarose gel loading dye (30% Ficoll 400 in 1x TAE, xylene cyanol, bromophenol blue) or with DNA PAGE loading dye (80% formamide in 1x TBE, xylene cyanol, bromophenol blue) and separated by native and denaturing PAGE, respectively.
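The deamination calculation stated above (product divided by starting substrate) can be sketched as follows. The code assumes that, within a gel lane, the starting substrate equals the sum of the cleaved product band and the remaining substrate band; this is a common convention for gel quantification but is an assumption here, and the function names are hypothetical.

```python
SUBSTRATE_NM = 800.0  # substrate concentration in the recombinant enzyme assay


def fraction_deaminated(product_signal: float, substrate_signal: float) -> float:
    """Fraction of substrate converted to product within one gel lane.

    Assumes starting substrate = product band + remaining substrate band.
    """
    total = product_signal + substrate_signal
    if total <= 0:
        raise ValueError("lane has no measurable signal")
    return product_signal / total


def product_concentration_nm(fraction: float, substrate_nm: float = SUBSTRATE_NM) -> float:
    """Convert a deaminated fraction into nM of product formed."""
    return fraction * substrate_nm
```

A lane with band intensities of 25 (product) and 75 (remaining substrate) would give a deaminated fraction of 0.25, equivalent to 200 nM product from 800 nM substrate.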

DNA content by flow cytometry
Propidium iodide (PI) staining was used to assess the ploidy of HAP1 clones relative to THP1 (ATCC, Cat#: TIB-202) as a confirmed diploid control [106]. Cells were trypsinized and suspended in 100 μL of 1X PBS per 1 million cells. Then, 500 μL ice-cold ethanol was added to cell suspensions and incubated at -20°C for 1 hr to fix the cells. After fixation, cells were pelleted and washed in 1X PBS. Cells were finally suspended in 500 μL of FxCycle PI stain (Invitrogen) and incubated for 30 min at RT in the dark to stain the cells. Cells were spun down and resuspended in 300 μL of the PI stain solution and placed in a 96 well round bottom plate for flow cytometry analysis using a BD LSRFortessa flow cytometer (with high-throughput 96 well adapter system). A minimum of ten thousand events were acquired for each condition.

RT-qPCR
Total RNA was extracted using the High Pure RNA isolation kit (Roche). cDNA was synthesized using SuperScript First-Strand RT (ThermoFisher). Quantification of mRNA was done using validated primer sets for all human A3 genes relative to the housekeeping gene TBP [9,81,98,99]. All RT-qPCR reactions were performed using SsoFast SYBR Green mastermix (Bio-Rad) in 384-well plates on a LightCycler 480 (Roche) following the manufacturer's protocol. Statistical analyses were done using GraphPad Prism 6 and R.

Immunofluorescent (IF) microscopy
IF microscopy was conducted as described [107,108]. As a positive control for DNA damage, 4 μM cisplatin (cis-diamminedichloroplatinum II, Selleck Chemical) was incubated with cells for 24 hrs. Cells were grown at low density in 4-chamber, tissue culture-treated glass slides (Falcon) prior to fixation. The cells were then fixed with 4% paraformaldehyde in PBS for 15 min at room temperature. Permeabilization followed using PBS containing 0.2% Triton X-100 (Sigma Aldrich), before rinsing with PBS. The cells were blocked using an IF blocking solution (0.1% Triton X-100, 5% goat serum in PBS) for 1 hr at room temperature, then incubated with primary antibody overnight at 4°C. The primary antibodies used were anti-A3A/B/G (5210-87-13, 1:300) [101], anti-A3A (UMN-13, 1:300 [this study]), anti-A3H (Novus, 1:300) [58], and anti-γ-H2AX (JBW301, Millipore Sigma, 1:500). Following this, cells were washed with PBS and incubated with a fluorophore-conjugated secondary antibody for 1 hr at room temperature in the dark. The secondary antibodies used were Alexa Fluor 488 goat anti-mouse (Invitrogen, 1:1,000) and Alexa Fluor 594 goat anti-rabbit (Invitrogen, 1:1,000). Both primary and secondary antibodies were diluted in blocking buffer. Hoechst 33342 (Mirus) was used at a final concentration of 1 μg/mL to stain nuclei for another 15 min, before the slides were washed three times with PBS and mounted using antifade mounting media (Cell Signaling 9071). Images were captured at 60x magnification using a Nikon ECLIPSE Ti2 microscope. The number of γ-H2AX foci per cell was counted for at least 50 cells per condition for each experiment. The nuclear intensity of the γ-H2AX signal and the cellular intensity of each A3 were measured by ImageJ2 software (2.9.0/1.53t). Statistical analyses were conducted in GraphPad Prism 9.

Alkaline comet assays
HeLa cells transfected with individual A3 or control constructs were harvested 24 hrs post-transfection and resuspended in ice-cold 1X phosphate-buffered saline (PBS, Ca2+ and Mg2+ free) at a density of 10^5 cells/mL. As a positive control, mock-transfected cells were treated with 2 μM camptothecin (CPT, Millipore Sigma) for 2 hrs prior to harvesting as above. The CometAssay ESII kit was used for all alkaline Comet assays, following the manufacturer's protocol (BioTechne). Cells were resuspended in low-melt agarose and spread at low density on a glass slide to secure the cells in place for lysis. DNA unwinding and electrophoresis were performed according to the manufacturer's alkaline protocol. Comet tail moments were measured for at least 50 cells per condition using the OpenComet plugin for ImageJ [109]. Statistical analyses were done using GraphPad Prism 9.

Whole genome sequencing (WGS) and analyses
Genomic DNA was prepared from cell pellets (1 million cells) using Allprep DNA/RNA mini kit (Qiagen). Whole genome libraries were sequenced 150x2 bp on a NovaSeq 6000 (Illumina) to a target read depth of 30X coverage for all granddaughter clones as well as the parental HAP1-TK-M9 mother clone. Resulting sequences were aligned to the human genome (hg38) using SpeedSeq [110], which relies on the Burrows-Wheeler Aligner, BWA (version 0.7.17). PCR duplicates were removed using Picard (version 2.18.16). Reads were locally realigned around INDELs using GATK3 (version 3.6.0) tools RealignerTargetCreator to create intervals, followed by IndelRealigner on the aligned bam files. Single base substitutions and small INDELs were called in each clone relative to the bam file generated from the HAP1-TK-M9 mother clone using Mutect2 from GATK3 (version 3.6.0). SBSs that passed the internal GATK3 filter with minimum 4 reads supporting each variant, minimum 20 total reads at each variant site and a variant allele frequency over 0.05 were used for downstream analysis. SBSs were analyzed in R (version 4.0.5) using the MutationalPatterns [111] and deconstructSigs R packages (version 1.8.0 [112]). All visualizations were generated using the ggplot2 package (version 3.3.5). The indel landscapes were generated using the MutationalPatterns R package [111] following PCAWG definitions [1]. All individual clone data from each condition were pooled for presentation. S2 Table (maf format) provides a tabular list of all single base substitution mutations in the WGS described here.
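The post-calling SBS filter described above (caller filter passed, at least 4 supporting reads, at least 20 total reads, and a variant allele frequency over 0.05) can be sketched as a simple predicate. The record type and field names below are hypothetical and do not correspond to any particular VCF or MAF parser; only the thresholds are taken from the text.

```python
from dataclasses import dataclass

MIN_ALT_READS = 4    # minimum reads supporting the variant
MIN_TOTAL_READS = 20  # minimum total reads at the variant site
MIN_VAF = 0.05       # variant allele frequency must exceed this value


@dataclass(frozen=True)
class SbsCall:
    """Hypothetical minimal record for one single base substitution call."""
    alt_reads: int
    total_reads: int
    passed_caller_filter: bool = True


def keep_sbs(call: SbsCall) -> bool:
    """Return True if the call meets all thresholds stated in the text."""
    if not call.passed_caller_filter:
        return False
    if call.total_reads < MIN_TOTAL_READS or call.alt_reads < MIN_ALT_READS:
        return False
    return call.alt_reads / call.total_reads > MIN_VAF
```

Note that the depth check runs before the division, so a record with zero total reads is rejected rather than raising an error.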
COSMIC single base substitution mutation signatures (v3, May 2019; https://cancer.sanger.ac.uk/cosmic/signatures/SBS/) were obtained from https://www.synapse.org/#!Synapse:syn11738319. De-novo non-negative matrix factorization of mutational signatures was performed with the "extract_signatures" command from the MutationalPatterns package, with a rank of 2 and 100 iterations. TCW mutation enrichment scores were calculated as described [50,65]. Sequence logos of -2 to +2 sequence surrounding C-to-T mutations were created using the ggseqlogo (version 0.1) package. Putative hairpin structures were predicted using the nBMST tool [113] and human genome GRCh38 with a minimum stem length of 6 bases. The loop criteria established by Buisson et al. with ssDNA loop lengths of 3 to 11 nucleotides were used to search for APOBEC3 signature mutations in these regions [52].
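The hairpin criteria above (minimum stem length of 6 bases, ssDNA loop length of 3 to 11 nucleotides) amount to a filter on predicted structures followed by an interval lookup for each mutation. The sketch below shows one way this could be expressed; the `Hairpin` record, its 1-based inclusive coordinates, and the function names are assumptions for illustration and do not reflect the output format of the nBMST tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hairpin:
    """Hypothetical predicted hairpin; loop coordinates are 1-based, inclusive."""
    chrom: str
    loop_start: int
    loop_end: int
    stem_length: int

    @property
    def loop_length(self) -> int:
        return self.loop_end - self.loop_start + 1


def qualifies(h: Hairpin, min_stem: int = 6, min_loop: int = 3, max_loop: int = 11) -> bool:
    """Apply the stem and loop length criteria given in the text."""
    return h.stem_length >= min_stem and min_loop <= h.loop_length <= max_loop


def in_hairpin_loop(chrom: str, pos: int, hairpins) -> bool:
    """True if a mutation position lies in the loop of any qualifying hairpin."""
    return any(
        h.chrom == chrom and h.loop_start <= pos <= h.loop_end
        for h in hairpins
        if qualifies(h)
    )
```

The linear scan is adequate for illustration; a genome-wide analysis would more likely use an interval index or a sorted sweep per chromosome.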
Mutation clusters (kataegis) were identified using katdetectr [114]. An APOBEC3 kataegis event is so defined if it is comprised of 2 or more APOBEC3 signature mutations within 10 kbp of each other (examples in the context of an intermutation distance plot in Fig 4C). In addition, SigProfilerClusters was used to quantify clusters of single base substitutions [115]. This approach for mutation cluster analysis includes all base substitution mutations and, as such, may also include sequencing and bioinformatic errors, DNA polymerase mistakes, and/or other mutagenic processes in addition to APOBEC [1,116,117].
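The working definition above (2 or more APOBEC3 signature mutations within 10 kbp of each other) can be illustrated with a simple distance-based grouping. This sketch is not the algorithm implemented by katdetectr or SigProfilerClusters; it assumes positions from a single chromosome and chains together consecutive mutations whose intermutation distance is at most 10,000 bp.

```python
def kataegis_events(positions, max_gap: int = 10_000, min_size: int = 2):
    """Group positions on one chromosome into clusters by intermutation distance.

    Consecutive mutations separated by at most `max_gap` bp are chained into
    the same cluster; clusters with fewer than `min_size` members are dropped.
    """
    events, current = [], []
    for pos in sorted(positions):
        if current and pos - current[-1] > max_gap:
            if len(current) >= min_size:
                events.append(current)
            current = []
        current.append(pos)
    if len(current) >= min_size:
        events.append(current)
    return events
```

Because clusters are chained, an event may span more than 10 kbp in total as long as each neighboring pair of mutations is within the gap threshold; whether that matches a given tool's behavior would need to be checked against its documentation.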

APOBEC expression and mutation signature analyses in TCGA and ICGC data sets
TCGA primary breast tumors represented by both RNA-seq and whole exome sequencing were downloaded from the Firehose GDAC resource through the Broad Institute pipeline (http://gdac.broadinstitute.org/) for multiple tumor tissue types. APOBEC3 mutation signatures were determined as described [5,65] using the deconstructSigs R package [112]. APOBEC3 mutation enrichment scores were calculated using the hg19 reference genome and published methods [50]. Enrichment score significance was assessed using a Fisher exact test with Benjamini-Hochberg false discovery rate (FDR) correction. All data analyses and visualizations were conducted using R and the ggplot2 package (https://www.R-project.org/).
clones. Total numbers of single T deletions at homopolymers of 6 or more are too numerous to plot on the same axis and are therefore listed here (A3A, 39.5%; A3B, 44.2%; A3A-E72A, 40.6%; A3B-E255A, 40.0%; A3H, 31.8%; and eGFP, 27.4%). The cosine similarity of the indel landscape across all conditions is over 0.96 indicating no significant differences.
8.625 mbp). For A3A data (red boxes) and A3B data (green boxes), the non-hairpin versus hairpin APOBEC3 TCW mutation frequencies are not significantly different (P = 0.25 and P = 0.19, respectively, by Welch's t-test). APOBEC3 TCW mutation frequencies are also not significantly different between the A3A and A3B data sets (red versus green boxes) for either non-hairpin regions (P = 0.087) or predicted ssDNA loop regions of hairpins (P = 0.149). However, in comparisons of A3A (red) and A3B (green) APOBEC3 TCW mutation frequencies in ssDNA loop regions of hairpins and equivalent data sets from aggregate controls (blue boxes: the catalytic mutant of each protein and eGFP), the A3A data set approaches statistical significance (P = 0.0655) and the A3B data are significantly different (P = 0.0367) by Welch's t-test.
support, and Kyle Richards for helping to set up spectral karyotype analysis. We also thank the UMN Cytogenetics Core for spectral karyotyping and Scott McIvor for sharing a plasmid construct with the MND promoter.
Fig 1F, quantification in Fig 1G and 1H, and additional images in S3 Fig). Expression of an enzymatically inactive mutant, A3A-E72A, exhibited vector control levels of γ-H2AX staining. In comparison, neither expression of wildtype A3B nor A3H (haplotype I or II) was able to trigger statistically significant increases in γ-H2AX levels in HAP1-TK-M9 cells (representative images in Fig 1F, quantification in Fig 1G and 1H, and additional images in S3 Fig). These results were unexpected given that A3B localization is predominantly nuclear, and A3H (haplotypes I and II) can also access the nuclear compartment with some accumulation in nucleoli (Fig 1F and additional images in S3 Fig).
Fig 2B, respectively; individual sequence schematics in S5A Fig). TK sequences derived from A3A-expressing granddaughter clones harbored a greater number of APOBEC3 signature mutations [12/20 clones contained at least 1 APOBEC mutation, 22 T(C>T/G)W mutations total, range of 0 to 3 SBS per sequence] relative to catalytic mutant control clones [3/19 clones contained 1 APOBEC mutation, 3 T(C>T/G)W mutations total, range of 0 to 1 SBS per sequence]. Similarly, TK sequences derived from A3B-expressing granddaughter clones also harbored a greater number of APOBEC3 signature mutations [11/20 clones contained at least 1 APOBEC mutation, 19 T(C>T/G)W mutations total, range of 0 to 3 SBS per sequence]

Fig 2. Characterization of TK mutations in ganciclovir-resistant clones. (A) A dot plot of GanR colonies generated under the indicated A3 expression or control conditions. Each data point represents the number of GanR mutants in a single clonal culture (mean +/- SD shown with p-values determined using Welch's t-test). (B) Schematics representing all TK mutations observed under the indicated A3 expression conditions (APOBEC3 signature T(C>T/G)W mutations in red, other SBSs in black, and INDELs in blue; see S5 Fig for schematics of individual TK mutants for these and vector control conditions). Q8X and R212K mutation hotspots are labeled. https://doi.org/10.1371/journal.pgen.1011043.g002
(n = 6, median = 3129, mean = 3494, SD = 1358; S9A Fig). The total number of SBSs in A3B-expressing clones was lower, ranging from 1920 to 2652 (n = 5, median = 2346, mean = 2334, SD = 278). In comparison, the total number of SBSs in catalytic mutant and eGFP control granddaughter clones ranged from 1230 to 2204 (n = 6; median = 1748, mean = 1729, SD = 388). SBS mutations are further broken down into those occurring within NCN and TCW motifs in dot plots in S9B-S9E Fig. Most importantly, analyses of the trinucleotide contexts of all unique SBS mutations revealed strong C-to-T and C-to-G mutation biases in 5'-TCA and 5'-TCT motifs in A3A-expressing clones and weaker, but still significant, mutation biases in the same motifs in A3B-expressing clones (Figs 3A, S9C, S9E and S10). In other words, only A3A- and A3B-expressing granddaughter clones exhibited significant accumulations of APOBEC3 signature single base substitution mutations.
Fig and also evident in trinucleotide breakdowns in Fig 3A). In comparison, genome-wide patterns of insertion/deletion mutations (INDELs) appeared largely unaffected by A3A or A3B (S12 Fig), in agreement with aforementioned TK mutation data where the majority of mutations are single base substitutions. We next analyzed the broader contexts of the 5'-TC-focused C-to-T and C-to-G single base substitution mutations that accumulated in A3A- and A3B-expressing clones (n = 7172 and n = 2033, respectively) in comparison to those that accumulated in aggregate control clones (n = 680) as well as the overall distribution of 5'-TC in the human genome (n = 339619283) (Fig 3B and S1 Table

Fig 3. Single base substitution mutation signatures in granddaughter clone genomes. (A) Trinucleotide profiles of pooled SBSs across all clones sequenced for each listed experimental condition (A3A n = 6; A3A-E72A n = 2; A3B n = 5; A3B-E255A n = 2). See S6 and S7 Figs for mRNA and protein level expression confirmation, respectively, and S10 Fig for SBS profiles from each WGS. (B) Pentanucleotide logos depicting -2, +1 and +2 sequence preferences flanking all C-to-T and C-to-G mutated TC motifs in WGS from HAP1-TK-M9 cells expressing A3A or A3B in comparison to aggregate controls (catalytic mutants and eGFP only conditions, which do not show evidence for APOBEC3 signature mutations; S10 Fig). The distribution of nucleobases flanking all TC motifs in the human genome is shown for comparison. https://doi.org/10.1371/journal.pgen.1011043.g003

Fig 4. Mesoscale properties of A3A and A3B in vitro and in the HAP1-TK-M9 system. (A-B) Deamination kinetics of A3A and A3B using SDHB and NUP93 DNA hairpin substrates in comparison to corresponding linear controls made by scrambling the 5' or 3' portion of the stem, respectively. See text for full description and S13 Fig for a genome-wide analysis and S14 Fig for a gel-based confirmation of DNA oligonucleotide integrity. (C) Rainfall plots of genome-wide intermutation distances for APOBEC3 signature single base substitution mutations in representative granddaughter clones (C>T mutations are red, C>G black, other SBS gray). See S15 Fig for an additional analysis of dispersed versus clustered single base substitution mutations. https://doi.org/10.1371/journal.pgen.1011043.g004

Fig 5. A composite origin of APOBEC3 signature mutations in breast cancer. An unsupervised clustering analysis of similarity between the pentanucleotide SBS profiles from WGSs of the A3A, A3B, and control granddaughter clones described here versus those from primary breast tumor whole-genome sequencing data sets (ICGC, n = 784). The APOBEC3 mutation signature is represented by both enrichment score and SBS2+13 (red), HRD signature as SBS3 (blue), and ageing signature as SBS1 (gray). https://doi.org/10.1371/journal.pgen.1011043.g005
mutation analysis. A list of the total number of SBS mutations in each HAP1-TK-M9 clone and numbers and frequencies of mutations predicted to occur in non-hairpin regions or hairpin loop regions of the genome (see Methods for additional information). Columns 1 and 2: The A3 or control condition and clone number as listed in S5 Fig. Column 3: The total number of SBS mutations in each clone by WGS (human genome: 3000 mbp). Columns 4 and 5: The total number and frequency of non-hairpin APOBEC3 signature TCW mutations per clone (estimated non-hairpin genomic DNA: 2991.375 mbp). Columns 6 and 7: Total number and frequency of APOBEC3 signature TCW mutations in 3-11 nucleotide loop regions of predicted chromosomal DNA hairpin structures (estimated ssDNA loop region genomic DNA:
of NUP93 and SDHB DNA hairpins and linear structures. Native PAGE (left) and denaturing PAGE (right) analysis of the indicated oligonucleotide substrates. The hairpin substrates migrate faster under native conditions, and their mobility is similar to the linear derivatives under denaturing conditions. The only oligonucleotide not used in biochemical experiments in Fig 4 is the NUP93-noHP (no hairpin), which has half of the stem replaced by adenines (5'-6-carboxyfluorescein-GCAAGCTGTTCAAAAAAATGA) and is included here as an additional control. (TIF)
S15 Fig. Single base substitution mutation signatures and distributions in A3A, A3B, and eGFP expressing clones. (A) Trinucleotide profiles of total, clustered, and non-clustered single base substitution mutations in representative A3A (top), A3B (middle), and eGFP (bottom) clones. The intermutation distance (IMD) is indicated, together with the expected number of kataegis events and the actual range of kataegis events observed for clones of each condition determined using SigProfilerClusters (Materials and Methods). (B) IMDs of A3A (top), A3B (middle), and eGFP (bottom) clones. Green lines are based on actual intermutation distances, and red lines are simulated distributions of intermutation distances. 95% confidence intervals are shown in pink for the simulated IMD distributions. (TIF)