Fig 1.
Possible outcomes of GT insertions in different genetic features.
(A) A gene trap (GT) consists of direct long terminal repeats (LTRs), a strong splice acceptor (SA), a reporter gene (mCherry) and a polyadenylation (pA) sequence. A schematic of the 5’ end of a gene, including the promoter region, is also shown. (B) A GT can disrupt a gene by inserting into an exon in the sense orientation (with respect to the coding sequence of the gene), interrupting the coding sequence and causing premature transcriptional termination at the pA sequence. (C) An antisense GT insertion into an exon interrupts the coding sequence of the gene and typically causes a frameshift mutation that leads to premature translational termination, producing a truncated protein. (D) When a GT integrates into an intron in the sense orientation, the SA causes the reporter gene and pA sequence to be spliced to the preceding exon, inevitably leading to premature transcriptional termination at the pA sequence. (E) An antisense GT insertion in an intron typically does not disrupt the gene due to the directionality of the SA; however, it could interfere with regulatory elements or with transcripts present on the antisense strand. (F, G) GT insertions in the promoter region of a gene in either the sense or antisense orientation generally do not affect the downstream transcript; however, they could potentially disrupt regulatory elements and alter transcription.
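The typical outcomes in panels B-G can be summarized as a lookup keyed by the feature that is hit and the insertion's orientation relative to the gene. This is an illustrative sketch only: the dictionary and function names are hypothetical, and it encodes just the typical outcomes described in this legend, not the possible regulatory effects noted in (E-G).

```python
# Hypothetical lookup of the typical outcome of a GT insertion (Fig 1B-G),
# keyed by (genetic feature, orientation relative to the gene).
TYPICAL_OUTCOME = {
    ("exon", "sense"): "disrupts: coding sequence interrupted, transcription terminated at pA",
    ("exon", "antisense"): "disrupts: coding sequence interrupted, frameshift and truncated protein",
    ("intron", "sense"): "disrupts: SA splices reporter and pA to the preceding exon",
    ("intron", "antisense"): "usually neutral: SA is directional",
    ("promoter", "sense"): "usually neutral: may still alter regulatory elements",
    ("promoter", "antisense"): "usually neutral: may still alter regulatory elements",
}


def typical_outcome(feature: str, orientation: str) -> str:
    """Return the typical outcome; raises KeyError for unlisted combinations."""
    return TYPICAL_OUTCOME[(feature, orientation)]


def typically_disrupts(feature: str, orientation: str) -> bool:
    """True for the insertion types expected to inactivate the gene (B, C, D)."""
    return typical_outcome(feature, orientation).startswith("disrupts")


if __name__ == "__main__":
    for key in TYPICAL_OUTCOME:
        print(key, typically_disrupts(*key))
```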
Fig 2.
Schematic of Bin-based Analysis of Insertional Mutagenesis Screens (BAIMS).
(A) The human genome is computationally divided into “bins” (pictured as rectangles with black dotted lines), each encompassing a contiguous segment of DNA of equal, arbitrarily chosen length. Throughout this study, we used bins 250 bp or 1000 bp in length, depending on the resolution required for the analysis. Bin boundaries are defined without regard to annotated genetic features, including genes and regulatory elements. The depicted fictitious gene is modeled after a RefSeq gene track following the University of California, Santa Cruz (UCSC) genome browser display conventions: coding exons are represented by tall blocks, UTRs by shorter blocks, and introns by horizontal lines connecting the blocks. The arrow indicates the gene’s transcription start site (TSS). (B) Sequencing reads flanking individual GT insertions in the control and selected cell populations from a haploid genetic screen are mapped to the human genome, and each insertion is assigned to the bin that encompasses its location. The orientation of each insertion relative to the chromosome is recorded. Bins are also annotated with any overlapping genetic features: promoter (defined as the 2000 bp upstream of the TSS, indicated by a horizontal dotted line), 5’UTR, CDS, intron, and 3’UTR. The orientation of each feature relative to the chromosome is also recorded. (C) For the bin-based analysis, the number and orientation of GT insertions in consecutive bins along any defined portion of the genome (including, but not limited to, genes) are determined and can be depicted in a histogram (the number of sense GT insertions per bin is arbitrarily shown above the horizontal line labeled “0”, and the number of antisense insertions below it), enabling visualization of insertion patterns at sub-gene resolution. (D) For the gene-based analysis, GT insertions in bins that overlap with genes can be summed to obtain a total insertion count for each gene.
The significance of GT enrichment for every gene is calculated by comparing the total number of insertions per gene found in the selected versus the control cell populations (see Materials and methods for details).
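The binning and gene-level summation in (A-D) can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the BAIMS implementation: insertions are given as hypothetical (chromosome, position, strand) tuples, bins are assumed to start at multiples of the bin length, and the per-gene comparison of selected versus control cells uses a one-sided Fisher's exact test. That test is a common choice for haploid screens but is only an assumption here; the test actually used is described in Materials and methods.

```python
import math
from collections import Counter
from typing import Iterable, Tuple

Insertion = Tuple[str, int, str]  # hypothetical: (chromosome, position, strand '+' or '-')


def bin_index(position: int, bin_size: int = 1000) -> int:
    """Fixed-length bin containing a position (assumes bins start at multiples of bin_size)."""
    return position // bin_size


def bin_insertions(insertions: Iterable[Insertion], bin_size: int = 1000) -> Counter:
    """Count insertions per (chromosome, bin, strand relative to the chromosome)."""
    counts: Counter = Counter()
    for chrom, pos, strand in insertions:
        counts[(chrom, bin_index(pos, bin_size), strand)] += 1
    return counts


def gene_count(counts: Counter, chrom: str, start: int, end: int, bin_size: int = 1000) -> int:
    """Sum insertions (both orientations) in all bins overlapping [start, end]."""
    first, last = bin_index(start, bin_size), bin_index(end, bin_size)
    return sum(n for (c, b, _), n in counts.items() if c == chrom and first <= b <= last)


def _log_comb(n: int, k: int) -> float:
    """Natural log of the binomial coefficient C(n, k)."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def fisher_greater(gene_sel: int, total_sel: int, gene_ctrl: int, total_ctrl: int) -> float:
    """One-sided Fisher's exact p-value for enrichment of a gene in selected cells.

    2x2 table: rows = selected/control, columns = in this gene/elsewhere.
    Computes P(X >= gene_sel) for X ~ Hypergeometric(N, K, n) in log space.
    """
    N = total_sel + total_ctrl  # all insertions
    K = gene_sel + gene_ctrl    # insertions in this gene
    n = total_sel               # insertions in selected cells
    log_denom = _log_comb(N, n)
    p = sum(
        math.exp(_log_comb(K, x) + _log_comb(N - K, n - x) - log_denom)
        for x in range(gene_sel, min(K, n) + 1)
    )
    return min(p, 1.0)  # guard against floating-point overshoot
```

Applied to every gene, the resulting p-values would then be FDR-corrected before ranking.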
Fig 3.
BAIMS identifies atypical GT insertion patterns in screens for regulators of WNT signaling.
(A) Schematic depicting various patterns of GT insertions relative to genetic features in the bins, used for the antisense intronic, upstream, and inactivating insertion enrichment analyses (see text for details). A fictitious gene modeled after a RefSeq gene track is shown, with GT insertions in the sense orientation relative to the gene depicted above the track and those in the antisense orientation depicted below it. The antisense intronic insertion enrichment analysis accounts for antisense GT insertions in bins annotated exclusively as intron (depicted in blue), and the upstream insertion enrichment analysis accounts for both sense and antisense insertions in bins annotated exclusively as promoter (depicted in orange). These two classes of insertions had been ignored in previous gene-based analyses of haploid genetic screens [3]. The inactivating insertion enrichment analysis accounts for both sense and antisense insertions in bins annotated as 5’UTR, CDS, or 3’UTR, as well as sense insertions in bins annotated exclusively as intron; these insertions (depicted in black) include all the gene-inactivating insertions used in previous analyses. (B-G) Circle plots depicting the results of the antisense intronic (B, C), upstream (D, E), and inactivating (F, G) insertion enrichment analyses for the WNT positive regulator high stringency (B, D, and F) and low stringency (C, E, and G) screens. Circles represent individual 1000 bp bins. The y-axis indicates the significance of GT insertion enrichment in the selected versus the control cells, expressed as -log10(FDR-corrected p-value), and the x-axis shows the 5000 bins with the smallest FDR-corrected p-values, arranged in random order. Circles representing bins with an FDR-corrected p-value < 0.01 are colored and labeled with the name of the gene that the bin overlaps. Circles representing bins corresponding to the same gene are depicted in the same color.
The diameter of each circle is proportional to the number of independent GT insertions mapped to the corresponding bin in the selected cells, which is also indicated next to the gene name for enriched bins.
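The three insertion classes in (A), and the way bins are selected and drawn in (B-G), can be expressed as the Python sketch below. It is illustrative only: the class names, dictionary keys and function names are hypothetical; bins with annotation combinations not described in this legend are left unclassified (an assumption); and the FDR correction is assumed to be Benjamini-Hochberg, which the legend does not specify.

```python
import math
import random
from typing import Dict, List, Optional, Sequence, Set

CODING_LIKE = {"5'UTR", "CDS", "3'UTR"}


def insertion_class(annotations: Set[str], orientation: str) -> Optional[str]:
    """Assign an insertion in an annotated bin to one of the three analyses.

    orientation is relative to the annotated feature ('sense' or 'antisense').
    Returns None for bins this sketch does not classify (e.g. intergenic bins).
    """
    if annotations == {"intron"}:
        return "antisense_intronic" if orientation == "antisense" else "inactivating"
    if annotations == {"promoter"}:
        return "upstream"
    if annotations & CODING_LIKE:
        return "inactivating"
    return None


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg FDR-adjusted p-values (assumed method), in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [1.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def circle_plot_points(bins: List[Dict], top_n: int = 5000,
                       threshold: float = 0.01, seed: int = 0) -> List[Dict]:
    """Keep the top_n bins by FDR, shuffle their x order, and label significant ones.

    Each bin is a dict with hypothetical keys 'fdr', 'gene' and 'n_selected'.
    """
    top = sorted(bins, key=lambda b: b["fdr"])[:top_n]
    random.Random(seed).shuffle(top)  # x positions carry no meaning
    return [
        {
            "x": x,
            "y": -math.log10(max(b["fdr"], 1e-300)),  # guard against FDR == 0
            "size": b["n_selected"],  # diameter ~ insertions in selected cells
            "label": b["gene"] if b["fdr"] < threshold else None,
        }
        for x, b in enumerate(top)
    ]
```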
Fig 4.
Antisense GT insertions in the first intron of TFAP4 disrupt a transcriptional enhancer element and impair WNT signaling.
(A) The histogram indicates the number and orientation of GT insertions mapped to TFAP4 in unsorted cells and in the sorted cells from the WNT positive regulator low stringency screen. Values above the horizontal line labeled “0” indicate sense insertions relative to the coding sequence of the gene, and values below it indicate antisense insertions. The x-axis represents contiguous 250 bp bins to which insertions were mapped (Chromosome 16, 4257249–4273000 bp). Insertions mapped for the different cell populations indicated in the legend are depicted by traces of different colors. A RefSeq gene track for TFAP4 (following UCSC genome browser display conventions, described in the legend of Fig 2A) and an ENCODE track for histone H3 lysine 27 acetylation (H3K27ac), a marker of enhancer activity (taken from the UCSC genome browser), are shown underneath the graph. The black rectangle above the gene track indicates the location of the bin identified in the antisense intronic insertion enrichment analyses of both the WNT positive regulator low stringency and high stringency screens. The black star denotes the position of the antisense GT insertion (located at NC_000016.13:g.4271036_4271037insGenetrap (Dec.2013: hg38, GRCh38) [10]; see S2 File) in the TFAP4GT clonal cell line used for further characterization. A scale bar is provided beneath the gene track for reference. (B) Fold-induction of the WNT reporter (median +/- standard error of the median (SEM) EGFP fluorescence from 10,000 cells) following treatment with 50% WNT3A conditioned media (CM). (C) AXIN2 mRNA (average +/- standard deviation (SD) of AXIN2 mRNA normalized to HPRT1 mRNA, each measured in triplicate qPCR reactions) relative to untreated cells. Where indicated, cells were treated with 50% WNT3A CM. (D) TFAP4 mRNA (average +/- SD of TFAP4 mRNA normalized to HPRT1 mRNA, each measured in triplicate qPCR reactions) relative to WT HAP1-7TGP cells. (E) Immunoblot of TFAP4.
The middle panel shows a longer exposure of the same blot shown in the top panel, and the bottom panel displays Ponceau S staining of the same blot as a loading control. Molecular weight standards in kilodaltons (kDa) are indicated to the left of each blot. (F) Histogram of GT insertions mapped to TFAP4 as in (A), with blue and orange boxes depicting the regions within the first intron tested in the transcriptional reporter assays shown in (G). GT insertions were enriched in the genomic region marked in blue (Chromosome 16, 4270498–4271890 bp) but not in a nearby control region marked in orange (Chromosome 16, 4264430–4265871 bp). (G) Luciferase reporter activity (ratio of firefly to Renilla luciferase) in extracts of WT HAP1-7TGP cells transfected with a firefly luciferase gene driven either by a minimal promoter alone (vector control, grey bar) or by the same minimal promoter with one of the two TFAP4 regions shown in (F) cloned upstream (blue and orange bars). Renilla luciferase was driven by a constitutive promoter and served as a control to normalize for differences in transfection efficiency. Bars show the average firefly to Renilla luciferase ratio from 4 replicate wells, and circles indicate the ratio for each replicate.
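Two computations behind this figure can be sketched as follows: converting chromosome-relative insertion orientations (recorded as in Fig 2B) into sense and antisense counts relative to the gene for the histograms in (A) and (F), and normalizing the firefly signal to Renilla per well for (G). The inputs are hypothetical (position, strand) pairs and all names are illustrative; the sketch keeps bins in ascending chromosomal order and does not reverse the axis for genes on the minus strand.

```python
from statistics import mean
from typing import Iterable, List, Sequence, Tuple


def relative_orientation(insertion_strand: str, gene_strand: str) -> str:
    """'sense' if the GT lies on the same chromosomal strand as the gene."""
    return "sense" if insertion_strand == gene_strand else "antisense"


def gene_histogram(insertions: Iterable[Tuple[int, str]], start: int, end: int,
                   gene_strand: str, bin_size: int = 250) -> List[Tuple[int, int, int]]:
    """(bin_start, n_sense, -n_antisense) for consecutive bins covering [start, end].

    insertions are hypothetical (position, chromosomal strand) pairs on one chromosome;
    antisense counts are negated so they plot below the line labeled "0".
    """
    first, last = start // bin_size, end // bin_size
    sense = [0] * (last - first + 1)
    antisense = [0] * (last - first + 1)
    for pos, strand in insertions:
        if start <= pos <= end:
            i = pos // bin_size - first
            if relative_orientation(strand, gene_strand) == "sense":
                sense[i] += 1
            else:
                antisense[i] += 1
    return [((first + i) * bin_size, s, -a) for i, (s, a) in enumerate(zip(sense, antisense))]


def luciferase_ratios(firefly: Sequence[float], renilla: Sequence[float]) -> List[float]:
    """Per-well firefly/Renilla ratio, normalizing for transfection efficiency."""
    if len(firefly) != len(renilla):
        raise ValueError("firefly and Renilla readings must come from the same wells")
    return [f / r for f, r in zip(firefly, renilla)]


ratios = luciferase_ratios([200.0, 300.0, 250.0, 250.0], [100.0, 100.0, 100.0, 100.0])
print(ratios, mean(ratios))  # bar height = mean of the replicate ratios
```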
Fig 5.
Antisense GT insertions upstream of LRP6 reduce LRP6 protein expression and impair WNT signaling.
(A) The histogram indicates the number and orientation of GT insertions mapped to LRP6 and to the region ~12.5 kbp upstream of its TSS in unsorted cells and in the sorted cells from the WNT positive regulator low stringency screen. See the legend to Fig 4A for details. The x-axis represents contiguous 250 bp bins to which insertions were mapped (Chromosome 12, 12116000–12279249 bp). (B) The histogram shows an expanded view of the 5’ end of LRP6 and the region ~12.5 kbp upstream of the TSS (left of the vertical dotted line), with traces for GT insertions mapped in unsorted cells and in the sorted cells from the WNT positive regulator low stringency screen. The x-axis represents contiguous 250 bp bins to which insertions were mapped (Chromosome 12, 12262500–12279249 bp). The green rectangle above the gene track indicates the location of the LRP6 promoter according to Ensembl, and the black rectangle indicates the location of the bin identified in the upstream insertion enrichment analyses of the WNT positive regulator low stringency and high stringency screens. The black and orange stars denote the positions of the antisense GT insertions (located at NC_000012.13:g.12268371_12268372insGenetrap (Dec.2013: hg38, GRCh38) and NC_000012.13:g.12269383_12269384insGenetrap (Dec.2013: hg38, GRCh38) [10]; see S2 File) in the LRP6GT-1(Up) and LRP6GT-2(Up) clonal cell lines, respectively, and the blue star denotes the position of the sense GT insertion (located at NC_000012.13:g.12266072_12266073insGenetrap (Dec.2013: hg38, GRCh38); see S2 File) in the LRP6GT-3(Int) cell line. The inverted histogram below the RefSeq gene track for LRP6 indicates the maximum CAGE read count found in any tissue sample from the FANTOM5 database [11]. (C) Fold-induction of the WNT reporter (median +/- SEM EGFP fluorescence from 20,000 cells) following treatment with 50% WNT3A CM.
(D) Fold-induction of AXIN2 mRNA (average +/- SD of AXIN2 mRNA normalized to HPRT1 mRNA, each measured in triplicate qPCR reactions) following treatment with 50% WNT3A CM. (E) Quantification of immunoblot analysis of total LRP6 protein (average +/- SD of LRP6 intensity normalized to ACTIN intensity from samples run in duplicate) shown as a percentage of WT HAP1-7TGP. The blot used for quantification is shown in panel C of S3 Fig. (F) Cell surface LRP6 protein (median +/- SEM cell surface LRP6 immunofluorescence from 20,000 cells) shown as a percentage of WT HAP1-7TGP. (G) LRP6 mRNA (average +/- SD of LRP6 mRNA, measured using two different primer pairs, normalized to HPRT1 mRNA, each measured in triplicate qPCR reactions) shown relative to WT HAP1-7TGP cells.
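The normalizations described in (C-G) reduce to simple ratios. The sketch below assumes that relative mRNA levels are computed with the comparative Ct (2^-ΔΔCt) method at roughly 100% amplification efficiency; the legend states only that values were normalized to HPRT1 mRNA, so this method is an assumption, and all function names are hypothetical.

```python
from statistics import mean
from typing import Sequence


def fold_induction(treated: float, untreated: float) -> float:
    """Fold-induction of a readout (e.g. median reporter fluorescence) by WNT3A CM."""
    return treated / untreated


def percent_of_wt(target: float, loading: float, wt_target: float, wt_loading: float) -> float:
    """Band intensity normalized to a loading control (e.g. ACTIN), as % of the WT ratio."""
    return 100.0 * (target / loading) / (wt_target / wt_loading)


def relative_expression_ddct(ct_target: Sequence[float], ct_ref: Sequence[float],
                             cal_ct_target: Sequence[float], cal_ct_ref: Sequence[float]) -> float:
    """Assumed 2^-ddCt expression of a target (e.g. LRP6) vs a reference (e.g. HPRT1) mRNA,
    relative to a calibrator sample (e.g. WT cells), assuming ~100% PCR efficiency.

    Each argument holds the Ct values of replicate (e.g. triplicate) qPCR reactions.
    """
    d_ct_sample = mean(ct_target) - mean(ct_ref)
    d_ct_calibrator = mean(cal_ct_target) - mean(cal_ct_ref)
    return 2.0 ** -(d_ct_sample - d_ct_calibrator)
```

A sample whose target Ct is one cycle later than the calibrator's, with an unchanged reference Ct, comes out at half the calibrator's expression.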