The Evidence for Increased L1 Activity in the Site of Human Adult Brain Neurogenesis

Retroelement activity is a common source of polymorphisms in the human genome. The mechanism whereby retroelements contribute to intraindividual genetic heterogeneity by inserting into the DNA of somatic cells is gaining increasing attention. Brain tissues are suspected to accumulate genetic heterogeneity as a result of somatic retroelement activity. This study aims to expand our understanding of the role retroelements play in generating somatic mosaicism of neural tissues. Whole-genome Alu and L1 profiling of genomic DNA extracted from the cerebellum, frontal cortex, subventricular zone, dentate gyrus, and the myocardium revealed hundreds of somatic insertions in each of the analyzed tissues. Interestingly, the highest concentration of such insertions was detected in the dentate gyrus, the hotspot of adult neurogenesis. Insertions of retroelements and their activity could produce genetically diverse neuronal subsets, which may be involved in hippocampal-dependent learning and memory.

DNA amplification for library preparation was performed in two subsequent suppression PCR steps. See Table 3 for structures of all oligonucleotides used. The first-step 25 µl PCR reaction contained 1/50 of the total amount of ligation products, 0.4 µM retroelement-specific primer (AY107 for Alu and 3-L1HS for L1 library preparation), 0.16 µM Na15 primer, 0.02 µM Na15Na21 primer, dNTP (0.125 µM each), 1 U of Encyclo polymerase (Evrogen, Russia) in the reaction buffer. The amplification profile for Alu libraries was as follows: initial end extension for 4 min at 72 °C, followed by 13 or 16 cycles (for Alu and L1 libraries respectively) of 20 sec at reactions were performed for library preparation. PCR products were combined, purified with the QIAquick PCR Purification Kit and concentrated to the volume of 120 µl.
The second-step 25 µl PCR reaction contained 1/1200 of the combined first-step products, 0.4 µM retroelement-specific primer (AY24 or AY18 for Alu and 3-end-L1 for L1 library preparation), 0.16 µM st19 primer, dNTP (0.125 µM each), 1 U of Encyclo polymerase (Evrogen, Russia) in the reaction buffer. The amplification profile for Alu library preparation was as follows: 10 cycles of 20 sec at 94 °C, 15 sec at 68 °C, 1 min at 72 °C and a final extension for 2 min at 72 °C. The amplification profile for L1 library preparation was: 12 cycles of 20 sec at 94 °C, 15 sec at 65 °C, 1 min at 72 °C and a final extension for 2 min at 72 °C. Thirty identical reactions were performed for library preparation. PCR products were combined, purified with the QIAquick PCR Purification Kit and concentrated to the volume of 60 µl.
DNA concentration was measured by Qubit 2.0. Five hundred ng of DNA from each library obtained with one of the restriction enzymes (which comprises 1/5 of the total amount of DNA in a produced sample) was taken for sequencing.

Sequence mapping and analysis
The analysis of Alu library reads consisted of the following steps:
1. Extraction of reads which contained an Alu fragment.
4. Trimming of the Alu fragment from the 5'-end of the reads, followed by paired-end mapping to the reference human genome with Bowtie2. Settings different from default were: -p 8 -X 600 -k 2 --no-mixed --no-discordant.
5. Extraction of the unambiguously mapped reads from the Bowtie2 output files.
6. Building tables of coordinates and merging coordinates into peaks.
7. Anti-chimeric filter 2: removing coordinates which have a restriction site located within 50 bp towards the flanking region of the insertion.
8. Matching the obtained coordinates with the coordinates of the known Alu and L1 elements present in hg19 and in databases of polymorphic retroelements (dbRIP and PRED [1,2]) using the Galaxy tool "Join".
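Steps 6 and 7 above can be illustrated with a short Python sketch. It is not the pipeline used in the study: the merging distance `max_gap`, the tuple layout of the coordinates, and the `direction` argument that encodes which side of the insertion is the flank are all assumptions made for illustration. Only the 50 bp window comes from the text.

```python
from collections import defaultdict


def merge_into_peaks(coords, max_gap=100):
    """Group (chrom, strand, pos) read coordinates into peaks.

    max_gap is a hypothetical merging distance, not a value from the text.
    Returns a list of (chrom, strand, start, end, n_reads) tuples.
    """
    by_key = defaultdict(list)
    for chrom, strand, pos in coords:
        by_key[(chrom, strand)].append(pos)

    peaks = []
    for (chrom, strand), positions in sorted(by_key.items()):
        positions.sort()
        start = end = positions[0]
        n_reads = 1
        for pos in positions[1:]:
            if pos - end <= max_gap:
                # Close enough to the current peak: extend it.
                end = pos
                n_reads += 1
            else:
                # Too far away: close the current peak and start a new one.
                peaks.append((chrom, strand, start, end, n_reads))
                start = end = pos
                n_reads = 1
        peaks.append((chrom, strand, start, end, n_reads))
    return peaks


def has_site_in_flank(pos, direction, site_positions, window=50):
    """Return True if a restriction site lies within `window` bp of `pos`
    on the flank side. direction is +1 if the flank extends towards higher
    coordinates and -1 otherwise (an assumed convention)."""
    for site in site_positions:
        offset = (site - pos) * direction
        if 0 <= offset <= window:
            return True
    return False
```

A coordinate for which `has_site_in_flank` returns True would be removed as a likely chimera under this reading of the filter.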
The analysis of L1 library reads consisted of the following steps:
1. Extraction of reads which contained an L1 fragment.
4. Each of the mate-paired L1 library files was split into 3. Reads which had a fragment of an informative genomic sequence in mate 1 (a fragment was considered informative if it represented a non-LINE, non-polyA sequence and was at least 25 nt long) were extracted to the first pair of files (type 1 reads). Reads which had no informative fragment in mate 1, but had a stretch of at least 4 thymine nucleotides at the 3'-end of mate 2 (which could represent the 3'-end nucleotides of the LINE polyA tail) were extracted to the second pair of files (type 2 reads). Reads which had neither an informative fragment in mate 1 nor a LINE polyA tail at the 3'-end of mate 2 were excluded from further analysis. Type 1 reads were further processed as follows: the LINE fragment and the adjacent polyA tail were trimmed from the 5'-end of mate 1. Paired-end mapping to the UCSC hg19 reference genome was performed with Bowtie2. Settings different from default were: -p 8 -X 600 -k 2 --no-mixed --no-discordant. Type 2 reads were further processed as follows: the LINE polyA tail was trimmed from the 3'-end of mate 2. Mate 2 single-end was mapped to UCSC hg19 by
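The splitting of L1 read pairs into type 1, type 2 and excluded reads can be sketched as follows. This is a simplified illustration, not the authors' script: it assumes that the genomic flank of mate 1 has already been extracted, that LINE content is flagged by the caller through the hypothetical `flank_is_line` argument, and that "non-polyA" can be approximated by rejecting single-base runs. The thresholds of 25 nt and 4 thymines are taken from the text.

```python
MIN_INFORMATIVE_LEN = 25  # minimum informative fragment length (from the text)
MIN_POLY_T = 4            # minimum thymine stretch at the 3'-end of mate 2 (from the text)


def is_informative(fragment, is_line=False):
    """Crude test for an informative genomic fragment in mate 1.

    A single-base run stands in for a polyA-like sequence here; this is an
    assumption, and a real pipeline would use a more careful definition.
    """
    frag = fragment.upper()
    is_homopolymer = len(set(frag)) <= 1
    return len(frag) >= MIN_INFORMATIVE_LEN and not is_line and not is_homopolymer


def classify_pair(mate1_flank, mate2, flank_is_line=False):
    """Return 'type1', 'type2' or 'discard' for one read pair."""
    if is_informative(mate1_flank, flank_is_line):
        return "type1"
    if mate2.upper().endswith("T" * MIN_POLY_T):
        return "type2"
    return "discard"
```

For example, a pair whose mate 1 flank is 28 nt of mixed sequence is assigned to type 1 regardless of mate 2, while a pair with a 4 nt flank and a mate 2 ending in TTTT is assigned to type 2.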

Statistical data analysis
For the analysis of Alu and L1 distributions in different brain areas, we employed an overdispersion test based on the binomial distribution to assess whether all samples were equal. A Poisson test was used to compare the distribution of Alu and L1 in the dentate gyrus with all other samples combined. Subsequently, we used Poisson tests for pair-wise comparisons between the samples.
For the analysis of the genomic distribution of the somatic L1 and Alu insertions, i.e. in genes, in 5 kb regions upstream of the genes or in all other regions, we used an overdispersion test, similarly to the analysis described above.
For the analysis of somatic L1 and Alu orientation relative to nearby genes (only for those retroelements integrated into introns or 5 kb regions upstream of genes), we used binomial tests to check the null hypothesis that co- and counter-oriented Alu and L1 are equifrequent. To study differences in retroelement orientation relative to nearby genes across the brain regions and myocardium, mean values for co- and counter-oriented L1 or Alu were used and analyzed by an overdispersion test.
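A pair-wise Poisson comparison of two insertion counts can be sketched with the exact conditional test, in which one count is treated as binomial given the total. This is one common way to compare two Poisson counts and is assumed here for illustration; the text does not state which variant or software was used. The exposures `t1` and `t2` (for example, sequencing depth per sample) are hypothetical parameters.

```python
from math import exp, lgamma, log


def log_binom_pmf(k, n, p):
    """Log of the binomial probability mass function, for 0 < p < 1."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))


def poisson_two_sample_test(x1, t1, x2, t2):
    """Two-sided exact test that two Poisson counts share the same rate.

    Under the null hypothesis, x1 given x1 + x2 is binomial with success
    probability t1 / (t1 + t2). The p-value sums the probabilities of all
    outcomes that are no more likely than the observed one.
    """
    n = x1 + x2
    p = t1 / (t1 + t2)
    observed = exp(log_binom_pmf(x1, n, p))
    total = 0.0
    for k in range(n + 1):
        prob = exp(log_binom_pmf(k, n, p))
        # Small relative tolerance guards against floating-point ties.
        if prob <= observed * (1 + 1e-7):
            total += prob
    return min(1.0, total)
```

With equal exposures, two identical counts give a p-value of 1, while counts of 20 and 0 give a p-value of 2 × 0.5^20.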
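The orientation test can be illustrated with an exact two-sided binomial test against a probability of 0.5. This is a minimal sketch under the stated null hypothesis; the function name and the doubling of the smaller tail (valid because the null distribution is symmetric) are choices made here, not details given in the text.

```python
from math import comb


def orientation_binomial_test(n_co, n_counter):
    """Two-sided exact binomial test of H0: P(co-oriented) = 0.5.

    Because the null distribution is symmetric, the p-value is twice the
    probability of the smaller tail, capped at 1.
    """
    n = n_co + n_counter
    m = min(n_co, n_counter)
    # Exact integer arithmetic for the tail; the division yields a float.
    tail = sum(comb(n, i) for i in range(m + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, 3 co-oriented and 7 counter-oriented insertions give a p-value of 0.34375, so such a split would not contradict equal frequencies.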
For analyzing the randomness of the Alu and L1 distributions in promoters and genes, Monte Carlo simulations of random retroelement distributions throughout the genome were performed 1,000 times and compared to the values obtained from sequencing analysis. For each sample, a number of random coordinates equal to the number of somatic insertions in the sample was generated using the hg19 genome assembly (excluding telomeric, centromeric and other N-base regions) and intersected with a list of gene and promoter RefSeq coordinates. P-values were then produced by comparing the number of retroelements inserted in genes or promoters to the quantiles of the Monte Carlo distribution. To calculate the power of the analysis, we performed another 1,000 simulations, but now the distribution of Alu and L1 (i.e. the probability for LINE or Alu to be inserted into a gene or promoter region) was adjusted to the data obtained from the sequencing results. We then calculated the percentage of the adjusted simulations that were significantly different from the initial 1,000 random simulations. We found that a difference of 15-30 and 30-70 retroelements (depending on the sample size) was required to achieve a power of 80% for promoter and gene regions, respectively, which was well within the range of our observed values. The data are summarized in Table S2.

Validation of the somatic insertions
Nested PCR was performed for the validation of the chosen somatic retroelement insertions.
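The Monte Carlo comparison described under Statistical data analysis can be sketched as below. This is a toy illustration under several assumptions: the genome is flattened to a single coordinate axis of length `genome_length`, N-base regions are not excluded, the gene or promoter intervals are non-overlapping and half-open, and the two-sided empirical p-value with a +1 correction is one possible way to compare an observed count to the simulated distribution. All function names are hypothetical.

```python
import random
from bisect import bisect_right


def build_index(intervals):
    """Sort non-overlapping half-open (start, end) intervals for binary search."""
    ivs = sorted(intervals)
    starts = [s for s, _ in ivs]
    return starts, ivs


def in_intervals(pos, index):
    """Return True if pos falls inside any indexed interval."""
    starts, ivs = index
    i = bisect_right(starts, pos) - 1
    return i >= 0 and ivs[i][0] <= pos < ivs[i][1]


def simulate_counts(n_insertions, genome_length, index, n_sim=1000, seed=0):
    """For each simulation, place n_insertions uniformly at random and
    count how many land inside the indexed intervals."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_sim):
        hits = sum(in_intervals(rng.randrange(genome_length), index)
                   for _ in range(n_insertions))
        counts.append(hits)
    return counts


def empirical_p_value(observed, simulated):
    """Two-sided empirical p-value: twice the smaller tail fraction,
    with a +1 correction so the result is never exactly zero."""
    n = len(simulated)
    ge = sum(c >= observed for c in simulated)
    le = sum(c <= observed for c in simulated)
    return min(1.0, 2 * (min(ge, le) + 1) / (n + 1))
```

In use, `simulate_counts` would be run once per sample with that sample's number of somatic insertions, and the observed number of insertions in genes or promoters would then be passed to `empirical_p_value`.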