Gain- and Loss-of-Function Mutations in the Breast Cancer Gene GATA3 Result in Differential Drug Sensitivity

Patterns of somatic mutations in cancer genes provide information about their functional role in tumourigenesis, and thus indicate their potential for therapeutic exploitation. Yet, the classical distinction between oncogene and tumour suppressor may not always apply. For instance, TP53 has been simultaneously associated with tumour suppressing and promoting activities. Here, we uncover a similar phenomenon for GATA3, a frequently mutated, yet poorly understood, breast cancer gene. We identify two functional classes of frameshift mutations that are associated with distinct expression profiles in tumours, differential disease-free patient survival and gain- and loss-of-function activities in a cell line model. Furthermore, we find an estrogen receptor-independent synthetic lethal interaction between a GATA3 frameshift mutant with an extended C-terminus and the histone methyltransferases G9A and GLP, indicating perturbed epigenetic regulation. Our findings reveal important insights into mutant GATA3 function and breast cancer, provide the first potential therapeutic strategy and suggest that dual tumour suppressive and oncogenic activities are more widespread than previously appreciated.


Introduction
High-throughput genome sequencing has allowed the systematic analysis of the complex mutational landscape of tumours and has provided key insights into tumour evolution and cancer etiology [1][2][3]. Mutation patterns in individual genes also reveal important insights into their role in tumourigenesis and can assist in distinguishing driver from passenger mutations [1][2][3][4].
Mutation rates are elevated in protein domains or regulatory sites, indicating their functional importance for cancer development [5,6]. It is typically assumed that all mutations within an individual gene have the same downstream consequences for tumourigenesis. However, at least one notable example challenges this paradigm. Distinct mutations in the TP53 gene (encoding p53) lead to both loss-of-function and gain-of-function, impinging on multiple different pathways [7][8][9][10]. Yet, it is unclear if this type of dual activity of mutant p53 represents an exceptional case or is more common. We hypothesised that mutations in different positions in a cancer gene may result in different downstream consequences. To investigate this, we developed an unbiased computational approach and applied it to breast cancer, as large public data sets exist for this cancer type.
Breast cancer has been studied extensively in terms of its molecular and genetic markers. Its classification into subtypes according to expression of receptors and gene expression profiles is used for diagnostic and prognostic purposes and forms the basis for treatment decisions [11][12][13][14][15][16][17]. Breast cancer is genetically heterogeneous and only four driver genes are mutated in more than 10% of patients [18][19][20][21][22][23][24][25]: PIK3CA (encoding the catalytic subunit of PI3K), CDH1 (encoding E-cadherin), TP53, and GATA3 (encoding GATA-binding protein 3). While the roles of the pro-survival PI3K pathway, cell adhesion, and p53 as the guardian of the genome in tumourigenesis are well studied, comparatively little is known about the role of the equally commonly mutated gene GATA3. To some extent this is due to the relatively recent discovery of the high prevalence of GATA3 mutations [19][20][21][22][26]. In addition, model systems (e.g., cell lines, animal models) to study GATA3 in breast cancer are lacking, hampering functional studies.
GATA3 is a member of the GATA family of transcription factors and forms homodimers that bind conserved hexanucleotide sequences containing the central GATA motif [27][28][29]. It is a master regulator of helper T cell specification [30] and plays a critical role in development and differentiation of various tissues, including the mammary gland [31][32][33]. During normal mammary development, GATA3, together with the estrogen receptor (ER) [34][35][36][37], controls differentiation of the luminal epithelium in the terminal end buds in the breast. In adult tissues, GATA3 helps to maintain the luminal identity [38][39][40][41].
The contribution of GATA3 to cancer is, in contrast, poorly understood. Most of our current knowledge regarding GATA3's potential function in breast cancer has been revealed from genomic studies highlighting an ER/FOXA1/GATA3 co-operating network of transcription factors in luminal tumours [14] and ER-positive cell line models [34,35,37,42,43]. Yet, the observation of GATA3 downregulation during tumour progression and predominant frameshift mutations have led to the view that GATA3 acts primarily as a tumour suppressor [44,45].
In this study, we identify differential functional consequences of mutation types in GATA3. We present evidence that the most common mutation type results in a protein with elongated C-terminus that displays effects consistent with gain-of-function activity in a cell line model. This is highly surprising, as frameshift mutations are generally believed to yield inactive proteins due to premature termination of translation. In addition, we describe a synthetic lethal interaction between this GATA3 mutant and drugs targeting the histone methyltransferases G9A and GLP, providing a first putative therapeutic opportunity for patients carrying GATA3 mutations. Together, our findings demonstrate that different mutations in the same gene can result in differential drug sensitivities and contest the view that GATA3 acts only as a tumour suppressor.

Mutation positions in breast cancer genes are associated with differentially expressed genes
To study mutation patterns in breast cancer, we used publicly available data from The Cancer Genome Atlas (TCGA) [23] and from the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC) [25]. Fig 1A shows the most commonly mutated genes in breast cancer. Somatic mutations in these recurrently mutated breast cancer genes are often mutually exclusive [46,47] (Fig 1A, S1 Table) and distributed in a non-uniform fashion along the gene body (Fig 1B). The observed patterns are largely consistent between the TCGA and METABRIC datasets. For instance, PIK3CA mutations chiefly occur at just two positions corresponding to different protein domains: E545 in a helical regulatory domain and H1047 in the kinase domain [48]. Clear hotspot mutations at single amino acid residues or within narrow regions are also present in TP53, and to some extent in GATA3 [49]. Mutations in GATA3 (Entrez Gene ID: 2625) have not yet been extensively characterised, but the non-uniform distribution and mutual exclusivity with mutations in other cancer genes are strong indicators that GATA3 is a cancer driver gene [25,50] (Fig 1B, S1 Table).
In order to assess potential functional consequences of regional mutation patterns, we devised an unbiased, systematic approach for linking the position of a mutation within a cancer gene with gene expression data. We reasoned that such an analysis could highlight domains in cancer genes that, when mutated, would result in differential downstream effects. First, we extracted from TCGA the genomic position for each mutation found in a patient in the seven selected driver genes, and the non-driver control gene TTN [51], along with the gene expression profiles from the same patients. Next, we used a segmentation approach to identify regions within a driver gene that led to a change in expression levels of another gene (see Materials and Methods). The identification of such patterns would suggest that the mutations in a particular region of the gene are functionally distinct. We termed genes that displayed altered expression along distinguishable segments of driver mutations "response genes" and refer to the border between two segments as a "segmentation breakpoint" (Fig 1C). Strikingly, we found that the highest number of response genes was associated with GATA3 mutations (Fig 1D, S2 Table), where more than 900 genes displayed a segmented pattern. In comparison, around 200 response genes were linked with PIK3CA mutations, and fewer than 100 response genes were identified for the non-driver control gene TTN. The observation that the majority of response genes displayed a single breakpoint (Fig 1D) suggests that patient-derived GATA3 mutations can be divided into two functionally distinct regions and that mutations in these regions are associated with differential gene expression in tumours.

Different GATA3 frameshift mutation types are functionally distinct and affect disease-free survival
Most GATA3 mutations (66/99; 67%) in the TCGA dataset are heterozygous frameshift mutations in exon 5 and exon 6 (Fig 2A).
Frameshifts in general lead to premature stop codons, which can substantially disrupt protein function. Indeed, approximately 41% (27/66) of the frameshift mutations in GATA3 are predicted to result in an early stop codon (Fig 2B). These truncated proteins (hereafter referred to as GATA3-trunc) are stable and expressed in tumours [52], and, as GATA3 likely forms a homodimer [29], they probably act in a dominant negative manner [53,54]. These mutations would thus be consistent with a haplo-insufficient tumour suppressor function of GATA3.
Interestingly, most (39/66, 59%) frameshift mutations in GATA3 result in a protein with an extended C-terminus. These extension mutations occur predominantly (33/39, 85%) in exon 6 and affect the resulting mutant proteins starting from different residues between alanine 395 and glycine 444, with a hotspot (11/39) at proline 409 (Fig 2A and 2B) [49]. The mutations are strongly biased toward the +1 frame (Fig 2A, bottom). This is surprising, as -1 frameshifts in this position would result in a shortened and aberrant C-terminus. The alternative +1 frame alters up to 49 amino acids of the original C-terminus and extends the protein by 63 novel amino acids (hereafter GATA3-ext, Fig 2B). Because frameshift mutations in the TCGA dataset as a whole do not display a frame preference, the bias toward the +1 frame in GATA3 is suggestive of positive selection. One potential explanation for this could be that the GATA3-ext mutation is functionally distinct from other (truncating) mutations, for instance by providing a gain-of-function. Together, this demonstrates that our analysis can identify functional distributions of mutations as well as novel candidate tumourigenic mechanisms.
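To make the frame terminology concrete, the following minimal sketch classifies an insertion or deletion by the reading-frame offset it produces. It assumes the common convention that a net gain of one nucleotide (or a net loss of two) is called the +1 frame and a net loss of one (or a net gain of two) the -1 frame; the function names are illustrative and not part of the original analysis.

```python
def frame_offset(inserted: int, deleted: int) -> int:
    """Return the reading-frame offset (0, 1 or 2) caused by an indel.

    The offset is the net change in sequence length modulo 3.
    """
    return (inserted - deleted) % 3


def frame_label(inserted: int, deleted: int) -> str:
    """Label an indel as in-frame, +1 frame or -1 frame.

    Assumed convention: offset 1 is the "+1" frame, offset 2 is the
    "-1" frame (a net shift of +2 equals -1 modulo 3).
    """
    return {0: "in-frame", 1: "+1", 2: "-1"}[frame_offset(inserted, deleted)]


# A single-nucleotide insertion such as cDNA:1224insG shifts to the +1 frame.
single_insertion = frame_label(inserted=1, deleted=0)
# A two-nucleotide deletion lands in the same downstream frame.
double_deletion = frame_label(inserted=0, deleted=2)
# A single-nucleotide deletion shifts to the -1 frame.
single_deletion = frame_label(inserted=0, deleted=1)
```

Note that the frame alone does not determine whether a mutation truncates or extends the protein; the position of the first stop codon in the shifted frame does.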
We next revisited our segmentation analysis to investigate the positional distributions of mutations and segmentation breakpoints (S1A-S1H Fig). In GATA3, the distribution of breakpoints present in single-breakpoint response genes was distinct from the distribution of all mutations (Fig 2C). It sharply peaked at a position that separated GATA3-ext from GATA3-trunc mutations. This is illustrated with BCAS1, the strongest GATA3 response gene (Fig 2D, S3 Table). This indicates that genes like BCAS1 are differentially expressed in tumours that contain the GATA3-ext mutation but not in tumours harbouring the GATA3-trunc mutation.
Other response genes such as SYT17 displayed more complex patterns, but one of the breakpoints often tended to separate the extension and truncation mutants (Fig 2D). Thus, the differential expression of response genes can functionally define the type of mutation. Together with the observation that the +1 frameshift mutations are under positive selection, this suggests that these GATA3-ext mutations are mechanistically distinct.
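The idea of a single segmentation breakpoint can be illustrated with a simple least-squares change-point search. This is only a sketch under stated assumptions: the actual segmentation procedure is described in Materials and Methods and is not reproduced here, and the mutation positions and expression values below are invented toy data.

```python
from statistics import fmean


def sum_squared_error(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = fmean(values)
    return sum((v - mean) ** 2 for v in values)


def best_single_breakpoint(positions, expression, min_size=2):
    """Find the split of mutation positions that best separates expression.

    Samples are ordered by mutation position; every split that leaves at
    least `min_size` samples per segment is scored by the within-segment
    squared error. Returns (breakpoint_position, error_reduction) or None.
    """
    pairs = sorted(zip(positions, expression))
    xs = [p for p, _ in pairs]
    ys = [e for _, e in pairs]
    total = sum_squared_error(ys)
    best = None
    for i in range(min_size, len(ys) - min_size + 1):
        if xs[i] == xs[i - 1]:
            continue  # cannot place a breakpoint between identical positions
        cost = sum_squared_error(ys[:i]) + sum_squared_error(ys[i:])
        if best is None or cost < best[1]:
            best = (i, cost)
    if best is None:
        return None
    i, cost = best
    return (xs[i - 1] + xs[i]) / 2, total - cost


# Hypothetical data: residue of each patient's mutation and the expression
# of one candidate response gene in the same patient.
positions = [310, 320, 336, 350, 400, 409, 409, 420, 430]
expression = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1, 3.0]
result = best_single_breakpoint(positions, expression)
```

In this toy example the breakpoint falls midway between the last low-expression and the first high-expression sample, mirroring how a breakpoint can separate two classes of mutations along the gene.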
To investigate what functional aspects of cancer cells specifically correlate with GATA3-ext mutations, we calculated the association of this mutation class with gene expression levels without performing segmentation. We divided patients carrying GATA3 mutations into two groups: those with GATA3-ext mutations and those with all other types. Corroborating the segmentation analysis, we obtained differentially expressed genes between these groups (S4 Table), which matched many of the previously identified response genes. This confirms that cellular processes are indeed differentially affected in GATA3-ext tumours in comparison to the other GATA3 mutant tumours.
Differential gene expression in GATA3-ext tumours may indicate distinct tumour characteristics and this could affect disease progression and therapeutic response. To address this, we used the previously defined patient groups and performed survival analysis of the TCGA patients. During the follow-up period of the cohort, only patients with GATA3-ext mutations progressed; no patient with another GATA3 mutation did. Accordingly, patients with GATA3-ext mutations displayed significantly (p = 0.0029) shortened disease-free survival in the TCGA cohort (Fig 2E). This indicates that GATA3-ext is a putative biomarker for disease progression and is consistent with the notion that extension mutants have important mechanistic properties that are distinct from other GATA3 mutations.
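Disease-free survival curves of this kind are commonly estimated with the Kaplan-Meier method. The sketch below is a minimal illustration of that estimator, assuming right-censored follow-up data; the follow-up times and progression flags are invented, and the significance test behind the reported p value is not reproduced.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times  : follow-up time of each patient
    events : True if progression was observed, False if censored
    Returns a list of (time, survival_probability) at each event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group all patients who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            n_events += int(data[i][1])
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


# Hypothetical follow-up (months) for a group with progression events...
group_with_events = kaplan_meier(
    [5, 8, 12, 12, 20, 30], [True, False, True, True, False, False]
)
# ...and for a group in which no patient progressed (all censored).
group_without_events = kaplan_meier([10, 20, 30], [False, False, False])
```

A group without any progression events yields an empty list of steps, that is, the estimated disease-free survival stays at 1.0 throughout follow-up.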
To assess whether GATA3-ext mutations are associated with similar expression changes in an independent patient cohort, we analysed the METABRIC dataset. The METABRIC cohort carries relatively fewer GATA3-ext mutations and displays a moderately different mutation distribution in GATA3 (compare lower panels of S1D with S1I Fig). Interestingly, substantial differences in GATA3 mutation patterns have also been noted in a Chinese cohort [46] (see Discussion), suggesting that genetic background and environmental factors play an important role in GATA3-driven breast cancer. We further noted that in the METABRIC cohort disease-free survival was similar for patients harbouring GATA3-ext or other GATA3 mutations, further indicating considerable differences in cohort composition (S1J Fig).
Despite these dissimilarities, we repeated the segmentation analysis in both the METABRIC and TCGA datasets using the GATA3-ext gene signature derived from TCGA (S4 Table). The analysis was limited to 46/50 genes, as expression data for four genes were not present in the METABRIC cohort. As expected, all 46 genes showed segmentation for GATA3 in the TCGA dataset (S1K Fig). Notably, 34/46 signature genes qualified as GATA3-specific response genes in the independent METABRIC dataset (Fig 2F, S1I Fig). This implies that these genes are differentially expressed in GATA3-ext tumours. Next, we calculated the fold change of the 46 genes in GATA3-ext samples relative to all other GATA3 mutant tumours. Strikingly, the changes in both datasets occurred in consistent directions for all 46 genes (Fig 2G), indicating qualitative agreement between TCGA and METABRIC for the GATA3-ext gene signature. Hence, the distinctive effects of GATA3-ext are recapitulated in an independent breast cancer patient cohort.
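The comparison of fold-change directions between two cohorts can be sketched as follows. The gene names and values are placeholders rather than data from S4 Table, and the sketch assumes fold changes are computed from group means of positive, linear-scale expression values.

```python
from math import log2
from statistics import fmean


def log2_fold_change(group_a, group_b):
    """log2 ratio of mean expression in group_a over group_b."""
    return log2(fmean(group_a) / fmean(group_b))


def direction_agreement(lfc_cohort_1, lfc_cohort_2):
    """Count genes whose log2 fold change has the same sign in both cohorts.

    Returns (number_of_concordant_genes, number_of_shared_genes).
    """
    shared = sorted(lfc_cohort_1.keys() & lfc_cohort_2.keys())
    concordant = [g for g in shared if lfc_cohort_1[g] * lfc_cohort_2[g] > 0]
    return len(concordant), len(shared)


# Hypothetical log2 fold changes (GATA3-ext vs. other GATA3 mutants).
cohort_1 = {"GENE_A": 1.8, "GENE_B": -0.9, "GENE_C": 0.4, "GENE_D": 1.1}
cohort_2 = {"GENE_A": 0.7, "GENE_B": -1.5, "GENE_C": 0.2}
agreement = direction_agreement(cohort_1, cohort_2)
```

Genes missing from one cohort (here the placeholder GENE_D) are simply excluded, analogous to restricting the signature to genes measured in both datasets.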

GATA3 mutant expression in cell lines
Following investigations of GATA3 mutation types in human patient data, we wished to study these mutation types in vitro in order to understand their functional implications in more detail. An endogenous heterozygous truncating mutation in exon 5 (cDNA:1006insG) in the commonly used MCF7 breast cancer cell line has been previously reported and was shown to decrease DNA binding and increase protein half-life [53,55]. However, we did not find cancer cell lines with GATA3-ext mutations by mining the Cancer Cell Line Encyclopaedia (CCLE) [56] and the Catalogue of Somatic Mutations in Cancer (COSMIC) [57] databases or by analysing 45 breast cancer cell lines by Sanger sequencing and Western blot. The high frequency of GATA3 mutations in breast tumours is thus not well represented in cell lines. We also tested tumour tissue from 10 luminal patient-derived xenograft (PDX) mouse models and did not detect any GATA3-ext mutations either. Although the small sample numbers preclude a strong conclusion, together these results suggest that GATA3-ext mutant cells may not adapt well to ex vivo culture conditions.
The lack of cell lines with naturally occurring GATA3-ext mutations impelled us to search for an alternative model system. To distinguish putative gain-of-function from dominant negative effects, we wished to study GATA3-ext in the absence of wild-type GATA3. We attempted to inactivate the endogenous locus by CRISPR/Cas9 gene editing in several ER-positive breast cancer cell lines (MCF7, T47D and CAMA1), but this did not yield viable homozygous null clones (~150 clones analysed). CRISPR/Cas9-directed replacement of endogenous GATA3 by GATA3-ext was equally unsuccessful (>100 clones analysed). This suggests that at least one copy of wild-type GATA3 is required for viability in these cell lines, which is in accordance with the findings from human cancer samples but complicates the introduction of a mutated version for in vitro models.
To establish an alternative model, we used the non-tumourigenic MCF10A breast epithelial cell line that naturally expresses very low protein levels of endogenous GATA3 (Fig 3A and 3B). We stably expressed wild-type GATA3 (GATA3-wt), GATA3-ext (cDNA:1224insG; p:P409fs) and GATA3-trunc (cDNA:1006insG; p:G336fs) through retroviral transduction and puromycin selection (Fig 3A). The GATA3-ext protein was stable, albeit expressed at slightly lower levels than GATA3-wt. Importantly, the expression levels of the GATA3 proteins were in the physiological range of endogenous GATA3 observed in various other breast cancer cell lines (Fig 3B). Furthermore, confocal microscopy showed nuclear localisation for both mutants as well as for the wild-type protein (S2A Fig).
We noted a slight but significant decrease in growth rate of MCF10A GATA3-ext cells as compared to GATA3-wt (Fig 3C). This was consistent between independent infections of the parental MCF10A cells with titrated virus, excluding an effect of the viral transduction itself. The specific effect of GATA3-ext shows that this mutation affects MCF10A cells' ability to proliferate in standard tissue culture medium conditions. We next performed RNA sequencing on MCF10A GATA3-wt, GATA3-ext, GATA3-trunc or control vector expressing cells to characterise the effects of GATA3 mutations at the cellular level. The expression of GATA3-wt and GATA3-ext resulted in up- or downregulation of 725 and 853 genes, respectively, with respect to the control (p < 0.05, FC > 1.5), indicating widespread transcriptional changes. In contrast, expression of GATA3-trunc yielded a considerably smaller signature (134 genes), which could be indicative of loss-of-function (Fig 3D, S5 Table). The majority of the GATA3-wt and GATA3-ext signatures consisted of uniquely regulated, non-overlapping genes (Fig 3D, S2B Fig). Accordingly, gene ontology (GO) analysis revealed significant enrichment of gene sets relating to unique terms for each of the GATA3 constructs (S5 Table). For instance, the GATA3-wt gene set is enriched for cytokine-linked processes, whereas the GATA3-ext signature shows a significant enrichment for peptidyl-tyrosine modification processes. These results indicate that expression of GATA3-ext and of GATA3-trunc evokes starkly distinct changes in gene expression, and the large number of uniquely regulated genes in GATA3-ext cells supports a gain-of-function of this mutant.
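The signature thresholds quoted above (p < 0.05, FC > 1.5) can be expressed as a simple filter. The sketch assumes the fold-change cutoff is applied symmetrically to up- and downregulated genes, that is, to the absolute log2 fold change; whether the p values were adjusted for multiple testing is not stated here, and the example genes are placeholders.

```python
from math import log2


def in_signature(fold_change, p_value, fc_cutoff=1.5, p_cutoff=0.05):
    """Return True if a gene passes the significance and fold-change cutoffs.

    fold_change is the linear ratio (construct / control); values below 1
    denote downregulation and are treated symmetrically via |log2 FC|.
    """
    return p_value < p_cutoff and abs(log2(fold_change)) > log2(fc_cutoff)


# Hypothetical results: gene -> (fold change vs. control, p value)
results = {
    "GENE_UP": (2.0, 0.01),
    "GENE_DOWN": (0.5, 0.01),
    "GENE_SMALL_EFFECT": (1.2, 0.001),
    "GENE_NOT_SIGNIFICANT": (3.0, 0.2),
}
signature = sorted(g for g, (fc, p) in results.items() if in_signature(fc, p))
```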
We found a small 4-gene overlap between the TCGA and MCF10A GATA3-ext signatures (S2B and S2C Fig). We validated one of these, the triglyceride metabolism gene PNPLA3, in an independent set of experiments by qRT-PCR and observed consistent downregulation in cells expressing GATA3-ext (S2D and S2E Fig). This is in agreement with patient tumour data (S2F Fig). Yet, the observation that most signature genes derived from the ER-negative MCF10A cell line model do not overlap with the patient data reflects the well-known biological differences between patient tumour samples and cell culture model systems.
Together, these data indicate that GATA3-ext is functionally active upon overexpression and that GATA3-ext and GATA3-trunc mutants are mechanistically different from each other and from the wild-type protein.

GATA3-ext cells are sensitive to G9A/GLP inhibition
Chemical genetic interactions can reveal therapeutic vulnerabilities and pinpoint cellular processes that are affected by mutant proteins [58,59]. Therefore, we performed a chemical genetic screen to identify compounds that specifically affect GATA3-ext cells. We assembled a small-molecule library containing ~100 approved and experimental anti-cancer drugs, and a number of tool compounds. We used MCF10A cells expressing the GATA3-ext protein or a control vector and exposed them to compounds for 6 days before measuring viability. To mimic limited nutrient supply in a tumour and to render the cells more responsive to drugs, cells were treated under reduced media supplement conditions (S3A and S3B Fig, S6 Table).
We validated this unexpected interaction with short- and long-term treatment and in both full and reduced media supplement conditions (Fig 3G and 3H and S3C Fig). The effective concentration resulting in a 50% inhibition of viability (EC50) for cells expressing the GATA3-ext mutant was consistently 5- to 10-fold lower than in control cells. To determine if this sensitivity was specific for GATA3-ext, we tested the compound on MCF10A cells expressing GATA3-wt or GATA3-trunc. The sensitivity of these cells to the compound was identical to control cells infected with an empty vector (Fig 3I and 3J). Accordingly, MCF7 cells, which heterozygously express GATA3-trunc, display average sensitivity to BIX01294 when compared to a panel of 25 other breast (cancer) cell lines (S3D Fig). Next, we wished to rule out that the observed effects of GATA3-ext overexpression were due to a dominant negative effect. To address this, we depleted endogenous GATA3 by shRNAs and tested if this could phenocopy GATA3-ext expression. The knockdown did not result in enhanced sensitivity to BIX01294 (S4A Fig). We thus conclude that the sensitivity arises from a specific interaction between the drug and the extended GATA3 protein.
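An EC50 and the fold difference between two cell lines can be approximated from a dose-response series as sketched below. This is an illustration only: it uses log-linear interpolation between the two doses flanking 50% viability rather than a fitted curve, and the concentrations and viabilities are invented, not measured values.

```python
from math import log10


def ec50_by_interpolation(concentrations, viabilities):
    """Estimate the concentration at which viability drops to 50%.

    concentrations : ascending doses (any consistent unit)
    viabilities    : viability at each dose, as a fraction of untreated cells
    Interpolates linearly in log10(concentration); returns None if the
    series never crosses 50%.
    """
    points = list(zip(concentrations, viabilities))
    for (c_lo, v_lo), (c_hi, v_hi) in zip(points, points[1:]):
        if v_lo >= 0.5 > v_hi:
            fraction = (v_lo - 0.5) / (v_lo - v_hi)
            log_ec50 = log10(c_lo) + fraction * (log10(c_hi) - log10(c_lo))
            return 10 ** log_ec50
    return None


# Hypothetical dose-response data for control and mutant-expressing cells.
doses = [0.1, 1.0, 10.0, 100.0]
control_viability = [1.0, 0.9, 0.6, 0.2]
mutant_viability = [0.95, 0.7, 0.3, 0.05]

ec50_control = ec50_by_interpolation(doses, control_viability)
ec50_mutant = ec50_by_interpolation(doses, mutant_viability)
fold_shift = ec50_control / ec50_mutant  # > 1 means the mutant is more sensitive
```

In practice a sigmoidal (Hill-type) curve fit would usually be preferred over interpolation when enough dose points are available.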
GATA3-ext mutations in patients are predominantly heterozygous, and as endogenous GATA3 protein levels in MCF10A cells are very low, we co-expressed GATA3-ext and GATA3-wt to assess whether the presence of GATA3-wt alters the differential toxicity of BIX01294. GATA3-ext+wt cells were as sensitive as GATA3-ext cells (S4B Fig). This experiment suggests that the GATA3-ext-induced BIX01294 sensitivity is independent of the presence of a wild-type GATA3 allele.
Together, these data further highlight functional differences between GATA3 truncation and extension mutants and imply that extension mutants act by a mechanism that is different from typical loss-of-function or dominant negative effects.

GATA3-ext cell sensitivity is due to on-target inhibition of G9A/GLP
G9A and GLP are histone methyltransferases (HMTs) that form a heterodimer and catalyse specific mono- and di-methylation at histone 3 lysine 9 (H3K9) [62]. Di-methylation of this residue is associated with transcriptional repression and has been demonstrated to occur aberrantly at tumour suppressor genes, often coinciding with upregulation of G9A [63]. In the TCGA dataset, EHMT1 and EHMT2 are not differentially expressed in GATA3-ext tumours and do not show a segmentation pattern (S5 Fig).
To assess the specificity of the synthetic interaction between GATA3-ext and G9A/GLP, we tested a second G9A/GLP inhibitor (UNC0638 [64]). Although this compound did not score as a hit in the screen, possibly due to a suboptimal screening concentration, repeated validation showed a similar degree of hypersensitivity of GATA3-ext cells (Fig 4A and 4B and S3C Fig, S4 Fig). Next, we tested a set of inhibitors of various other HMTs and did not detect differential sensitivity (Fig 4C-4F), suggesting that the interaction with GATA3-ext does not occur with histone methyltransferase activity in general. Further, GATA3-ext and control cells were equally responsive to other structurally similar quinazoline compounds not targeting G9A/GLP, consistent with a specific and on-target effect of BIX01294 and UNC0638 (Fig 4G, S6A Fig). In order to verify the involvement of G9A and GLP more directly, we depleted them by shRNA in GATA3-wt, GATA3-ext and control cells. Only the viability of GATA3-ext cells was significantly affected (Fig 4H, S6B Fig), suggesting that both enzymes contribute to the sensitivity to BIX01294 and UNC0638.

G9A/GLP inhibitor sensitivity is due to increased apoptosis and is independent of estrogen receptor signalling
To characterise the mechanisms underlying the sensitivity of GATA3-ext cells to G9A/GLP inhibition, we first analysed potential cell cycle effects upon BIX01294 treatment. We did not observe a difference in cell cycle progression between GATA3-ext and control cells as assessed by BrdU incorporation or DNA content (S7 Fig). However, GATA3-ext cells were more prone to undergo apoptosis upon drug treatment than control cells (Fig 5A).
As GATA3 is functionally linked with ER expression and activity [34][35][36][37], we also assessed the impact of ER signalling on sensitivity to G9A/GLP inhibition in GATA3-ext cells. We expressed ERα in our MCF10A model and confirmed that ER target genes were induced upon ER expression and/or treatment with the ER agonist β-estradiol (E2) (Fig 5B and 5C). The sensitivity of GATA3-ext cells to G9A/GLP inhibition was not significantly influenced by the level of ER signalling (Fig 5D), suggesting a mechanism that is independent from previously described functional interactions of GATA3.
Protein-extending mutations in cancer genes are unusual but not unprecedented. Recently, frameshift extension mutations in CALR (encoding calreticulin) were identified in myeloproliferative neoplasms [77] and WT1 extension mutants have been described in Wilms kidney tumours [78].
Cancer driver mutations are often divided into gain-of-function and loss-of-function mutations. Loss-of-function mutations result in an inactive or less active protein, whereas gain-of-function mutations lead to a more active protein or acquisition of a different function. Several observations in our study indicate that GATA3-ext proteins are mechanistically distinct from other GATA3 mutants and GATA3 wild-type, hinting toward a gain-of-function: First, GATA3-trunc mutants lack a larger part of the normal GATA3 protein sequence than GATA3-ext. This makes it rather unlikely that GATA3-ext is more perturbed in its normal physiological function than other GATA3 mutants.
Second, in patients, GATA3-ext is associated with the differential expression of a distinct group of response genes that is not affected by other GATA3 mutants. Differential effects on gene expression were also observed in the MCF10A cell line model expressing GATA3-ext and GATA3-trunc.
Third, we have found differences in outcome for patients harbouring GATA3-ext mutations, at least in the TCGA cohort. There, GATA3-ext is associated with reduced disease-free survival compared to other GATA3 mutations, suggesting that these tumours display a different pathology with respect to recurrence. Of note, all GATA3 mutations together correlated with improved disease-free and overall survival in a Chinese patient cohort [46]. GATA3 mutations as a whole displayed a marginally significant trend to improved overall survival only in ER-positive patients in the TCGA and METABRIC cohorts [25,46] but not in a smaller Dutch study [76]. Interestingly, GATA3 frameshift mutations were strongly underrepresented in the Chinese cohort (22% vs. 78% missense mutations) as compared to TCGA (93% vs. 7%). The authors suggest different mutational evolution of luminal breast cancer in different populations as an explanation for these discrepancies, with few Asian patients being included in the TCGA cohort [46]. However, these studies do not discriminate between GATA3-ext, GATA3-trunc or other GATA3 mutations. Our survival analysis indicates that indeed this separation is important, as only GATA3-ext mutations are associated with reduced disease-free survival.
Fourth, we observe strong genetic selection for +1 frameshift mutations, leading to one specific C-terminal extension.
Fifth, GATA3-ext is stable in cells and displays functional characteristics (e.g., drug sensitivities) that are not observed in cells expressing other GATA3 proteins or cells in which GATA3 is depleted.
Taken together, these lines of evidence provide substantial support for the hypothesis that GATA3-ext adopts certain neomorphic functions that might replace or act in addition to its wild-type properties. Importantly, our findings challenge the view that GATA3 only acts as a tumour suppressor that is downregulated or inactivated in breast cancer [14][15][16][19][20][21][22][23][24][25][26]79]. This GATA3-ext gain-of-function hypothesis parallels TP53 mutations in certain aspects, including gain- and loss-of-function in the same gene. For this reason, we have adopted the gain-of-function terminology in analogy to p53 and propose to label GATA3-truncation mutations as primarily loss-of-function and GATA3-extension mutations as gain-of-function. Like GATA3, p53 is a transcription factor that acts as a homo-oligomer, and hence, gain-of-function mutations do not necessarily imply a constitutively active form of the protein, as is observed for many kinase gain-of-function mutants. Instead, a plethora of different functions for oncogenic p53 have been described, including altered subcellular localization, changed DNA-binding affinities and a different spectrum of binding partners and target genes. Ultimately, these activities can lead to enhanced proliferation, inhibition of apoptosis, chemoresistance, or increased invasiveness [8][9][10].
It remains unclear how GATA3-ext exerts its specific activity. It has been postulated [80] that the GATA3 C-terminus is essential for maintaining protein stability but we did not observe strong differences upon ectopic expression in MCF10A cells. Therefore, an alternative mechanism is likely to underpin GATA3-ext function. For instance, GATA3-ext may display differential binding partners or altered DNA binding sites.
The GATA3-ext protein rendered cells sensitive to inhibition of the G9A and GLP histone methyltransferases. G9A and GLP are upregulated in a number of cancers, correlating with higher H3K9me2 levels and silencing of tumour suppressor genes [63]. Intriguingly, wild-type GATA3 and G9A have been recently found to physically interact [65]. The biochemical and functional interaction of GATA3 with histone methyltransferases may explain the changes of active histone modifications and altered enhancer accessibility in breast cancer cells depleted of GATA3 [37]. Yet, if and how this relates to drug sensitivity specifically in GATA3-ext expressing cells remains unclear.
Our MCF10A cell line model does not fully recapitulate the context of GATA3 mutations in tumours in several ways, among them the heterozygous mutation state and the ER status. Due to lack of a more appropriate model system, we addressed these concerns separately by co-expression and knock-down experiments of mutant and wild-type GATA3 and ESR1 (encoding ER). Even though RNA sequencing data from the MCF10A model only show a marginal overlap with the TCGA patient-derived GATA3-ext signature on an individual gene level, we believe that the MCF10A cell line model provides a valid context to study basic mechanistic differences between GATA3-wt, GATA3-ext and GATA3-trunc. The patient data and MCF10A model agree in that GATA3-ext and GATA3-trunc mutants act in a mechanistically different manner from each other and from the wild-type protein. We believe that this finding is biologically and potentially clinically relevant despite the exact mechanisms not yet being understood. In this regard, the identified synthetic lethal interaction between GATA3-ext and G9A/GLP inhibition provides the first clinically testable hypothesis for application of these drugs and the first lead for a treatment of this major subgroup of breast cancer patients. Thus, further preclinical study of the uncovered gene-drug interaction is warranted.
Together, our study provides important insights into the function and potential druggability of one of the most frequent breast cancer mutants and a striking example of how different mutations in the same cancer driver can result in distinct downstream consequences.

Plasmids, shRNAs and cloning
The GATA3-ext cDNA sequence was synthesised by Epoch Life Science and shuttled into Gateway-compatible pBABE-puro or -neo vectors. GATA3-wt and GATA3-trunc were generated using the QuikChange II Site-Directed Mutagenesis kit in the same plasmid. The ESR1 (ER) sequence was obtained from pEGFP-C1-ER (gift from Michael Mancini, Addgene plasmid #28230) and Gateway-cloned into pBABE-neo. Most shRNA sequences were obtained from The RNAi Consortium (TRC); shGATA3_1 was in pSicoR; shGATA3_2 and shGATA3_3 were cloned into pLKO.1-puro; shG9A_1, shG9A_2, shGLP_1, and shGLP_2 were cloned into pLKO.1-hygro, which was derived from pLKO.1-puro by replacing the puromycin cassette with a hygromycin cassette (BamHI/NsiI). If not indicated otherwise, corresponding empty vectors were used as controls. The control cDNA for the small-molecule screen (pBABE-puro TBX3) was a gift from Thijn Brummelkamp. shRNA sequences are provided in S6 Table.

Cell lines
MCF10A cells were purchased from ATCC and grown in DMEM/F12 medium (Gibco) +5% horse serum (Gibco) +0.02μg/ml EGF (Sigma) +0.5μg/ml hydrocortisone (Sigma) +0.1μg/ml cholera toxin (Sigma) +10μg/ml insulin (Gibco) +1% penicillin/streptomycin (Gibco) (= full supplements). In "reduced supplement conditions", all ingredients except antibiotics were used at 20% of their original concentration. MCF7 cells were obtained from ATCC and grown in DMEM medium +10% FCS (Gibco or Sigma) +1% penicillin/streptomycin (Gibco). All other cell lines (except HMEC, which were a gift from Christoph Gebeshuber) were obtained from ATCC and cultured in recommended conditions (S6 Table). Cells were cultured at 37°C and 5% CO2 (except for cells in Leibovitz's L-15 medium, which were cultured without CO2) and regularly tested for mycoplasma infection.
Stable cell lines were generated by retro- or lentiviral infection and subsequent selection with puromycin (2μg/ml, Sigma) or neomycin (geneticin, 500μg/ml, Gibco) for at least 3 days. Viral particles were produced by transfecting the retro- or lentiviral vector and corresponding packaging plasmids (encoding polymerase and envelope proteins) into HEK293-T cells. The supernatant was harvested 48-72 hours post transfection and its virus titer was determined on MCF10A cells. Target cells were then infected at an approximate MOI of 1 in the presence of 7-10μg/ml polybrene (Sigma, Millipore).
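As a rough illustration of the infection step, the volume of viral supernatant needed for a given MOI follows from the titer and the number of target cells. The sketch below assumes that the titer is expressed in transducing units (TU) per ml; the function name and the example numbers are hypothetical and are not taken from this study.

```python
def supernatant_volume_ml(n_cells, titer_tu_per_ml, moi=1.0):
    """Volume of supernatant (ml) delivering `moi` transducing units per cell.

    Assumes the titer is given in transducing units (TU) per ml.
    """
    if titer_tu_per_ml <= 0:
        raise ValueError("titer must be positive")
    return moi * n_cells / titer_tu_per_ml


# Hypothetical example: 200,000 target cells, titer of 1e6 TU/ml, MOI of 1.
volume = supernatant_volume_ml(200_000, 1e6, moi=1.0)
print(f"{volume:.2f} ml of supernatant")
```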

Viability assays and growth curves
MCF10A cells were seeded in reduced supplement conditions (unless indicated otherwise) and treated with drugs or DMSO for 4 days (dose response curves) or 10 days (colony formation assays) in triplicate. All other cell lines were seeded in their respective full media and treated for 3-11 days until reaching 90% confluency. Alternatively, cells were infected with lentivirus (shGLP/shG9A) on the next day and subsequently selected with hygromycin (50μg/ml, Sigma) for 4 days. Cell viability was measured by luminescent ATP read-out (CellTiterGlo, Promega) and normalised to the control (plotted as lowest concentration point to allow log-calculation), or cells were fixed using 3.7% paraformaldehyde and stained with 0.1% crystal violet in 5% ethanol. Data analysis and area under the curve (AUC) calculations were performed in GraphPad Prism.
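The analysis itself was performed in GraphPad Prism. Purely as an illustration of the two calculations named above, the following sketch normalises replicate luminescence readings to the DMSO control and computes an AUC with the trapezoidal rule on a log10 concentration axis. The choice of the trapezoidal rule, the function names and all example values are assumptions made for this sketch.

```python
import math
from statistics import mean


def normalise_to_control(raw, control):
    """Mean signal per dose as a fraction of the mean DMSO control signal."""
    ctrl = mean(control)
    return [mean(replicates) / ctrl for replicates in raw]


def auc_log_dose(concs, viability):
    """Trapezoidal area under the curve on a log10 concentration axis.

    `concs` must be positive and sorted in ascending order.
    """
    x = [math.log10(c) for c in concs]
    return sum((x[i + 1] - x[i]) * (viability[i] + viability[i + 1]) / 2
               for i in range(len(x) - 1))


# Hypothetical triplicate readings at three drug concentrations.
concs = [0.1, 1.0, 10.0]
raw = [[980, 1010, 1010], [760, 740, 750], [240, 260, 250]]
control = [1000, 990, 1010]

viability = normalise_to_control(raw, control)
print(viability, auc_log_dose(concs, viability))
```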
For measuring doubling time, cells were seeded at defined numbers (CASY Cell Counter, OMNI Life Science; BioRad TC20 Automated Cell Counter), counted every 4 days and reseeded. Cumulative cell numbers and doubling times were calculated in Microsoft Excel and GraphPad Prism.
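The calculation was done in Microsoft Excel and GraphPad Prism. As a minimal sketch of the underlying arithmetic, assuming exponential growth between counts, cumulative cell numbers can be obtained by multiplying the fold expansion of each passage, and the doubling time follows from t × ln 2 / ln(N_end / N_start). Function names and example counts are hypothetical.

```python
import math


def cumulative_numbers(seeded, counted):
    """Cumulative cell numbers across passages, corrected for reseeding.

    seeded[i] is the number of cells plated at passage i and counted[i]
    the number of cells recovered at the end of that passage.
    """
    cumulative = [float(seeded[0])]
    for n_seeded, n_counted in zip(seeded, counted):
        cumulative.append(cumulative[-1] * n_counted / n_seeded)
    return cumulative


def doubling_time(n_start, n_end, elapsed_days):
    """Doubling time in days, assuming exponential growth."""
    if n_end <= n_start:
        raise ValueError("no net growth; doubling time is undefined")
    return elapsed_days * math.log(2) / math.log(n_end / n_start)


# Hypothetical example: two passages of 4 days each, reseeded at 1e5 cells.
cum = cumulative_numbers(seeded=[1e5, 1e5], counted=[8e5, 4e5])
print(cum, doubling_time(cum[0], cum[-1], elapsed_days=8))
```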

Compounds and small-molecule screen
The majority of small molecules used in this study were purchased from SYNthesis med chem, Selleck and Sigma. Erastin, imatinib and erlotinib were gifts from Georg Winter and Giulio Superti-Furga, TH588 from Ulrika Warpman Berglund and Thomas Helleday, and DBZ, JQ-1, IC092605.1 and IC040751.1 from James Bradner. Further information on vendors/sources and concentrations is provided in S6 Table. For the small-molecule screen, cells were seeded in 384-well plates in reduced supplement conditions and compounds were added the next day in quadruplicate at concentrations equalling a previously determined EC20. After 6 days, viability was measured as described.
Drawings of chemical structures were generated using Avogadro [81].

Apoptosis analysis
Cells were treated with BIX01294 or DMSO for 5 days and then stained with propidium iodide (0.02μg/μl; Sigma) and  Table. For the analysis of ER target genes, cells were starved for 16 hours and treated with 10nM β-estradiol (E2) for 6 hours in absence of serum and growth factors before total RNA was isolated.

Western blotting
Cells were counted and lysed in reducing sample buffer by boiling. Proteins were separated by SDS-PAGE on 4-12% Bis-Tris gels (Invitrogen) and then transferred to Amersham Hybond PVDF membranes (GE Healthcare). Blocking and antibody incubations were carried out in 0.2% I-Block (Tropix) in PBST and membranes were washed with PBS +0.1% Tween-20. HRP-coupled secondary antibodies (goat anti-rabbit or anti-mouse, BioRad, 1:10,000) were detected with Western Lightning ECL Plus (Perkin Elmer) and visualised using a BioRad ChemiDoc or MF-ChemiBIS 3.2 (DNR Bio-Imaging Systems) imaging system. Antibody details are provided in S6 Table.

RNA sequencing

Cell cycle analysis
Cells were treated with 1μM BIX01294 for 3 days, followed by a 15-minute BrdU pulse. As a control, an S phase arrest was induced with 1μM Camptothecin (Sigma) for 4 hours prior to BrdU incorporation. Cells were fixed in ice-cold 70% ethanol for 24 hours and resuspended in PBS containing 0.5% Tween-20, 10μg/ml propidium iodide and 500μg/ml RNase A. Incorporated BrdU was detected using a FITC-conjugated anti-BrdU antibody (Becton Dickinson) according to the manufacturer's instructions. 20,000 cells were analysed by FACS (BD FACSCalibur). Data analysis was performed with BD CellQuest Pro, FlowJo and GraphPad Prism.

Immunofluorescence microscopy
For immunofluorescence microscopy, cells were plated onto coverslips (VWR) in a 24-well plate. The next day, cells were washed twice with ice-cold PBS and fixed with 4% PFA +0.1% Triton X-100 in PBS for 20 minutes on ice. Cells were permeabilised with 0.5% Triton X-100 in PBS for 20 minutes and blocked with 10% FCS +0.1% Triton X-100 in PBS for 1 hour, with three washes between individual steps. Primary (GATA3 D13C9, Cell Signaling #5852, 1:100) and secondary (AlexaFluor 546 goat anti-rabbit, Invitrogen, 1:500) antibodies were diluted in blocking solution and incubated for 1 hour at RT. Finally, cells were stained with DAPI (10μg/ml, Sigma) for 10 minutes at RT in the dark. Slides were mounted in 85% glycerol and images were acquired on a Zeiss LSM 710 confocal imaging system.
For the initial segmentation analysis based on TCGA, expression and mutation data of the BRCA cohort were downloaded from the TCGA data portal (https://tcga-data.nci.nih.gov/tcga/). Mutation data were subset to eliminate all mutations marked as "Silent". For each candidate gene with high mutation load, the data were further trimmed to eliminate patients with more than one mutation in that gene. The mutation positions in the gene of interest were ranked and then compared to ranked, normalised expression levels of every expressed gene in the transcriptome. Segmentation was carried out with the R package DNAcopy [88] using a strict cut-off for definition of breakpoints (alpha = 0.005), a correspondingly large number of permutations (nperm = 20000), and a strict minimum for segment length (min.width = 5). Genes whose expression across the cohort showed at least one breakpoint along the target gene were termed response genes. Breakpoints found through the segmentation analysis were visualised along the body of the gene using density plots. Densities were computed using breakpoint positions for genes showing one breakpoint and for those showing two breakpoints or more. For GATA3, details for mutation start site and mutation sequence were jointly used to classify patients into groups: protein truncations in the +1 and -1 frames, protein extensions in the +1 and -1 frames, and all other mutations; patients with multiple mutations were excluded. Gene expression comparisons were then performed between the dominant group (protein extensions in the +1 frame) and all other mutant groups combined. In each comparison, we evaluated fold changes by comparing median expression in each group and statistical significance through Wilcoxon tests.
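To make the idea behind the ranking and segmentation step concrete, the following sketch orders patients by mutation position, ranks the expression of one gene and tests for a single breakpoint by permutation. It is a deliberately simplified stand-in and not the circular binary segmentation implemented in DNAcopy: it finds at most one breakpoint, uses a scaled difference of segment means as the statistic, and defaults to fewer permutations. All function names and the example data are hypothetical.

```python
import math
import random


def rank(values):
    """Ranks starting at 1; tied values receive their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def best_split(series, min_width=5):
    """Split index and score maximising the scaled difference in segment means."""
    n = len(series)
    best_index, best_score = None, -1.0
    for i in range(min_width, n - min_width + 1):
        left = sum(series[:i]) / i
        right = sum(series[i:]) / (n - i)
        score = abs(left - right) * math.sqrt(i * (n - i) / n)
        if score > best_score:
            best_index, best_score = i, score
    return best_index, best_score


def find_breakpoint(positions, expression, alpha=0.005, nperm=2000,
                    min_width=5, seed=0):
    """Return (position, p_value) of a single breakpoint, or None."""
    order = sorted(range(len(positions)), key=lambda i: positions[i])
    series = rank([expression[i] for i in order])
    split, observed = best_split(series, min_width)
    if split is None:  # too few patients for two segments of min_width
        return None
    rng = random.Random(seed)
    shuffled = series[:]
    hits = 0
    for _ in range(nperm):
        rng.shuffle(shuffled)
        if best_split(shuffled, min_width)[1] >= observed:
            hits += 1
    p_value = (hits + 1) / (nperm + 1)
    if p_value >= alpha:
        return None
    # Mutation position of the first patient in the right-hand segment.
    return positions[order[split]], p_value


# Hypothetical gene: low expression for upstream, high for downstream mutations.
positions = [10 * k for k in range(1, 21)]
expression = [float(k) for k in range(1, 11)] + [100.0 + k for k in range(1, 11)]
print(find_breakpoint(positions, expression))
```

In this simplified setting, a gene for which `find_breakpoint` returns a position would correspond to a response gene with one breakpoint.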
For the validation analysis using the METABRIC dataset, mutation profiles on driver genes and expression data for the GATA3-ext signature genes were obtained from the METABRIC consortium (see Acknowledgments). Mutation profiles were subset to remove mutations marked as "Silent" as before; mutations marked as "RNA" (noncoding substitution variants in untranslated regions of the genes) were also removed. All other calculations were carried out using the same procedures as for the TCGA dataset.
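The filtering applied to both cohorts can be sketched as follows: mutations with excluded annotations are dropped first, and patients left with more than one mutation in the gene of interest are then removed. The record layout (keys `patient`, `gene` and `classification`) and the example records are hypothetical and do not reflect the actual TCGA or METABRIC file formats.

```python
from collections import Counter


def filter_mutations(records, gene, excluded=("Silent",)):
    """Keep one-mutation patients for `gene` after dropping excluded classes."""
    kept = [r for r in records
            if r["gene"] == gene and r["classification"] not in excluded]
    per_patient = Counter(r["patient"] for r in kept)
    return [r for r in kept if per_patient[r["patient"]] == 1]


# Hypothetical records; for METABRIC, "RNA" is excluded in addition to "Silent".
records = [
    {"patient": "P1", "gene": "GATA3", "classification": "Frame_Shift_Ins"},
    {"patient": "P2", "gene": "GATA3", "classification": "Silent"},
    {"patient": "P3", "gene": "GATA3", "classification": "Frame_Shift_Del"},
    {"patient": "P3", "gene": "GATA3", "classification": "Missense_Mutation"},
    {"patient": "P4", "gene": "GATA3", "classification": "RNA"},
    {"patient": "P5", "gene": "TP53", "classification": "Missense_Mutation"},
]
print(filter_mutations(records, "GATA3", excluded=("Silent", "RNA")))
```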
To assess consistency between TCGA and METABRIC on the GATA3-ext signature, we compared fold changes between GATA3-ext and other GATA3-mutant patients in the two datasets. The comparison was restricted to the small gene signature obtained from the TCGA analysis.
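One simple way to express such a comparison is sketched below: a log2 fold change of median expression is computed per signature gene in each dataset, and the fraction of shared genes changing in the same direction is reported. Using sign concordance as the summary is an assumption made for this sketch and is not necessarily the measure used in the study; the Wilcoxon test is omitted, and gene names and values are hypothetical.

```python
import math
from statistics import median


def log2_median_fold_change(group_a, group_b):
    """log2 ratio of median expression; assumes strictly positive values."""
    return math.log2(median(group_a) / median(group_b))


def sign_concordance(fc_first, fc_second):
    """Fraction of shared genes whose fold changes have the same sign."""
    shared = [g for g in fc_first if g in fc_second]
    if not shared:
        raise ValueError("no genes shared between the two datasets")
    same = sum(1 for g in shared if fc_first[g] * fc_second[g] > 0)
    return same / len(shared)


# Hypothetical expression values: GATA3-ext patients versus other GATA3 mutants.
tcga = {
    "GENE_A": log2_median_fold_change([8.0, 16.0, 32.0], [2.0, 4.0, 8.0]),
    "GENE_B": log2_median_fold_change([1.0, 2.0, 4.0], [4.0, 8.0, 16.0]),
}
metabric = {
    "GENE_A": log2_median_fold_change([6.0, 12.0, 24.0], [3.0, 6.0, 12.0]),
    "GENE_B": log2_median_fold_change([5.0, 10.0, 20.0], [2.5, 5.0, 10.0]),
}
print(tcga, metabric, sign_concordance(tcga, metabric))
```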