Fig 1.
Overview of human active transposon sequencing (HAT-seq).
(A) Fluorescence-activated cell sorting (FACS) of prefrontal cortex (PFC) nuclei labeled with NeuN. Two populations (NeuN+ and NeuN−) were sorted. (B) Schematic of the nucleotide-shifting design of the HAT-seq method. By adding two, four, or six random nucleotides upstream of the L1Hs-specific primer (L1Hs-AC-28), we transformed the library from a uniform phase-0 amplicon library to a mixed library with phase-2, phase-4, and phase-6 amplicons, which remarkably improved the base calling accuracy in Read 2. (C) HAT-seq libraries were sequenced with paired-end 150-bp reads. After merging paired reads into contigs that fully spanned the L1Hs-genome 3’ junction, the genomic location of each L1Hs insertion was determined by aligning its 3’ flanking genomic sequence. NeuN-PE, PE-conjugated anti-NeuN antibody.
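The nucleotide-shifting idea in panel (B) can be illustrated with a toy sketch. The primer string below is a hypothetical placeholder, not the actual L1Hs-AC-28 sequence, and the helper names are illustrative assumptions. The sketch only shows why mixing phase-2, phase-4, and phase-6 amplicons increases the number of distinct bases read at a given sequencing cycle compared with a uniform library.

```python
# Toy illustration of phased (nucleotide-shifted) primers.
# TOY_PRIMER is a hypothetical placeholder, NOT the real L1Hs-AC-28 sequence.
TOY_PRIMER = "GATTACAGGCT"


def phased_primers(primer, shifts=(2, 4, 6)):
    """Prefix the primer with 2, 4, or 6 random nucleotides (written as N)."""
    return ["N" * s + primer for s in shifts]


def distinct_bases(templates, cycle):
    """Count distinct fixed (non-N) bases read at a given 0-based cycle."""
    return len({t[cycle] for t in templates if t[cycle] != "N"})


uniform_library = [TOY_PRIMER]               # every amplicon in phase 0
mixed_library = phased_primers(TOY_PRIMER)   # phase-2, phase-4, phase-6

# At cycle 6 the uniform library shows one base across all clusters,
# while the mixed library reads three different primer positions.
print(distinct_bases(uniform_library, 6), distinct_bases(mixed_library, 6))
```

In this toy case, the phased templates read different primer positions at the same cycle, so clusters on the flow cell no longer all show an identical base.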
Table 1.
Error filters used in the computational pipeline.
Fig 2.
HAT-seq performance evaluation using a positive control.
(A) Representative gel image used for the identification of ACC1-specific insertions based on 3’ PCR analysis. For each site, genomic DNA from ACC1 and ACC2 was amplified using the same protocol and the PCR products were run on the gel side-by-side (left: ACC1; right: ACC2). NTC: negative control. (B) Representative gel image used for the zygosity analysis of ACC1-specific insertions based on full-length PCR. The four sites on the left were homozygous L1Hs insertions and the others were heterozygous L1Hs insertions. (C) The distributions of signal counts (reads with unique start positions) per ACC1-specific insertion closely followed Poisson distributions (chi-squared goodness-of-fit tests). (D) Representative ACC1-specific insertion (ACC1_132 at chr21:29069173) in 1%, 0.1%, and 0.01% spike-in libraries. Read coverage and supporting signal counts (unique start positions are indicated by black arrows) were positively correlated with the spike-in concentration. (E) The effectiveness of error filters. Sixty-four ACC1-specific germline insertions in 1%, 0.1%, and 0.01% spike-in libraries were considered as “true positives”; all other signals were considered as “false positives”, which might include both background noise and some true somatic insertions present in the blood gDNA.
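As a minimal sketch of the quantities in panel (C), the code below counts signals as reads with unique start positions per insertion and computes a chi-squared goodness-of-fit statistic against a Poisson distribution whose mean is estimated from the data. The input layout, the function names, and the pooling of the upper tail into a single bin are assumptions made for illustration; the pipeline used in the study may differ.

```python
import math
from collections import Counter


def signal_counts(reads):
    """Count unique read start positions per insertion.

    reads: iterable of (insertion_id, start_position) pairs (assumed layout).
    """
    starts = {}
    for insertion_id, start in reads:
        starts.setdefault(insertion_id, set()).add(start)
    return {ins: len(pos) for ins, pos in starts.items()}


def poisson_chi2(counts, max_bin):
    """Chi-squared statistic for a Poisson fit with the mean estimated from data.

    Bins are 0, 1, ..., max_bin - 1, plus one pooled bin for values >= max_bin.
    Assumes the sample mean is positive. Returns (statistic, degrees_of_freedom).
    """
    n = len(counts)
    lam = sum(counts) / n
    observed = Counter(min(c, max_bin) for c in counts)
    stat, cumulative = 0.0, 0.0
    for k in range(max_bin + 1):
        if k < max_bin:
            p = math.exp(-lam) * lam ** k / math.factorial(k)
            cumulative += p
        else:
            p = 1.0 - cumulative  # pooled upper tail
        expected = n * p
        stat += (observed.get(k, 0) - expected) ** 2 / expected
    dof = (max_bin + 1) - 1 - 1  # bins minus one, minus one estimated parameter
    return stat, dof


reads = [("ins_a", 10), ("ins_a", 10), ("ins_a", 12), ("ins_b", 5)]
print(signal_counts(reads))
print(poisson_chi2([0, 1, 1, 2, 2, 3], max_bin=3))
```

The statistic can then be compared with a chi-squared distribution with the returned degrees of freedom to obtain a p-value.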
Fig 3.
Profiling of somatic L1Hs insertions in multiple human tissues.
(A) The density distributions of L1 EN motifs around L1Hs integration sites. L1 EN motifs included seven specific motifs (TTAAAA, TTAAGA, TTAGAA, TTGAAA, TTAAAG, CTAAAA, TCAAAA). “Evrony KR” and “Evrony KNR” are germline L1Hs insertions identified in Evrony et al. 2012. (B) The density distributions of poly-A tail length for each category of L1Hs insertion. (C) The PCR validation scheme and locations of primers used. (D)–(H) Representative gel images of 3’ nested PCR validation for putative clonal somatic insertions. The Integrative Genomics Viewer screenshots for (D)–(F) showed the coverage track (gray) and the alignment track (blue for read strand [–]; red for read strand [+]) from HAT-seq data. Black arrows indicated bands with target size. 1Kb +: 1 Kb Plus DNA ladder. (I) Polymorphic poly-A tail sizes of clonal somatic insertions measured by capillary electrophoresis. Top: fibroblast-specific somatic L1Hs insertion at chr4:89253789 from Rett patient UMB#4516. Bottom: heart-specific somatic L1Hs insertion at chr10:545758 from Rett patient UMB#1420.
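The seven L1 EN motifs listed for panel (A) can be located in a sequence window with a simple scan. This is a minimal sketch: the function name is illustrative, and it scans only the strand it is given, so handling of the reverse complement and the choice of window around the integration site are left as assumptions.

```python
# The seven L1 EN motifs named in the legend for panel (A).
EN_MOTIFS = frozenset(
    ["TTAAAA", "TTAAGA", "TTAGAA", "TTGAAA", "TTAAAG", "CTAAAA", "TCAAAA"]
)


def find_en_motifs(seq):
    """Return (offset, motif) for every EN motif occurrence on the given strand."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):  # every 6-mer start position
        kmer = seq[i:i + 6]
        if kmer in EN_MOTIFS:
            hits.append((i, kmer))
    return hits


print(find_en_motifs("GGTTAAAAGG"))
```

Collecting the offsets relative to each integration site across many insertions would give the kind of positional density shown in the panel.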
Table 2.
Overview of postmortem human tissues.
Fig 4.
A full-length heart-specific L1Hs insertion (1571_chr3:2944507) in a healthy individual.
(A) The agarose gel image of 5’ junction nested PCR validation for the heart-specific L1Hs insertion in the healthy individual (UMB#1571; upper panel). The locations of primers used in 5’ junction PCR assays were labeled on the top of each lane, where primers with the prime symbol denoted semi-nested PCR assays. The distances between adjacent 5’ step-wise primers were labeled on the top (dark blue). The lower panel represented a heterozygous, full-length L1Hs insertion (ACC1_16; S11 Table and S4 Appendix) in 1% ACC1 spike-in gDNA as the positive control. The yellow line highlighted the expected stair-step bands in 5’ junction PCR. 1Kb +: 1 Kb Plus DNA ladder. (B) The Sanger sequencing chromatograms of the 3’ and 5’ junctions of the somatic insertion 1571_chr3:2944507. The L1 EN motif and TSD were indicated by purple and blue lines, respectively. (C) Multiple sequence alignment of the 5’ end between the identified somatic insertion and three L1Hs consensus sequences (L1Hs Repbase consensus and two hot L1s in human [L1.3 and L1.4]). (D) The schematic structure of 1571_chr3:2944507. (E) The agarose gel image of “full-length PCR + 5’ junction PCR” assays for 1571_chr3:2944507 and ACC1_16 positive control.
Fig 5.
Abnormal L1Hs mobilization in patients with Rett syndrome.
(A) Percentages of somatic L1Hs insertions in exons and introns. (B) Percentages of somatic L1Hs insertions in exons of long (> 100 kb) and short (< 100 kb) genes. (C) Percentages of clonal somatic L1Hs insertions in introns. (D) Percentages of sense-oriented clonal somatic L1Hs insertions. The gray lines in (A) and (C) denoted the expected proportion determined by the exact base-pair count of that specific region relative to the human genome. The gray line in (D) represented the expected proportion if the insertions occurred randomly in both directions. Error bars in (A)–(D) indicated the 95% confidence intervals.
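The expected proportions and 95% confidence intervals described above can be sketched as follows. The legend does not state which interval method was used, so the Wilson score interval here is an assumption chosen for illustration, and the function names are hypothetical.

```python
import math


def expected_fraction(region_bp, genome_bp):
    """Expected share of insertions in a region if they landed uniformly by base pair."""
    return region_bp / genome_bp


def wilson_interval(k, n, z=1.959963984540054):
    """Wilson score interval for k successes in n trials (z defaults to 95%).

    The interval method is an assumption; the study may have used another one.
    """
    p = k / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Example with made-up counts: 50 of 100 insertions in the sense orientation.
low, high = wilson_interval(50, 100)
print(round(low, 3), round(high, 3))
```

An observed proportion whose interval excludes the gray-line expectation (for example 0.5 for random orientation in panel D) would suggest a departure from random insertion.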
Fig 6.
Genome-wide patterns of somatic and germline L1Hs insertions.
(A) Relative somatic L1Hs content in PFC neurons and non-brain tissue from the same donor. The read count ratio of somatic insertions to germline KNR was calculated and then normalized relative to the average value of non-brain samples. The linked dots represented pairs of brain and non-brain samples obtained from the same individual. (B) Histogram of the estimated rate of somatic L1Hs insertions in each tissue sample from the same donor, based on the germline KNR copy number of each individual. (C) Estimated rate of somatic L1Hs insertions for different tissue types and cohorts. Circles with error bars denoted the estimate and the standard error of the mean (S.E.M.) of all brain and non-brain samples. (D) Hierarchical clustering of all samples sequenced in this study. Each row represented a sample, and each column represented an L1Hs germline insertion. Black and white squares indicated the presence or absence of an insertion, respectively. Column annotations showed categories for known reference (KR; blue), known non-reference (KNR; green), and unknown (UNK; red) insertions. (E) Percentages of sense-oriented germline and somatic L1Hs insertions in transcripts. The gray line represented the expected proportion if the insertions occurred randomly in both directions. Error bars indicated the 95% confidence intervals.
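The normalization described for panel (A) can be written out as a short sketch: compute the somatic-to-KNR read count ratio per sample, then divide by the mean ratio of the non-brain samples. The record layout, the field names, and the example numbers are assumptions for illustration only.

```python
def relative_somatic_content(samples):
    """Normalize somatic/KNR read ratios to the mean of non-brain samples.

    samples: list of dicts with keys 'name', 'tissue' ('brain' or 'non-brain'),
    'somatic_reads', and 'knr_reads' (an assumed layout).
    """
    ratios = {s["name"]: s["somatic_reads"] / s["knr_reads"] for s in samples}
    non_brain = [ratios[s["name"]] for s in samples if s["tissue"] == "non-brain"]
    baseline = sum(non_brain) / len(non_brain)
    return {name: ratio / baseline for name, ratio in ratios.items()}


# Made-up read counts, for illustration only.
example = [
    {"name": "donor1_PFC", "tissue": "brain", "somatic_reads": 40, "knr_reads": 1000},
    {"name": "donor1_heart", "tissue": "non-brain", "somatic_reads": 10, "knr_reads": 1000},
    {"name": "donor2_heart", "tissue": "non-brain", "somatic_reads": 30, "knr_reads": 1000},
]
print(relative_somatic_content(example))
```

With this normalization, the non-brain samples average to 1, so a value above 1 for a brain sample indicates relatively higher somatic L1Hs content than the non-brain baseline.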