Investigation of epigenetic regulatory networks associated with autism spectrum disorder (ASD) by integrated global LINE-1 methylation and gene expression profiling analyses

Background The exact cause and mechanisms underlying the pathobiology of autism spectrum disorder (ASD) remain unclear. Dysregulation of long interspersed element-1 (LINE-1) has been reported in the brains of ASD-like mutant mice and ASD brain tissues. However, the role and methylation of LINE-1 in individuals with ASD remain unclear. In this study, we aimed to investigate whether LINE-1 insertion is associated with differentially expressed genes (DEGs) and to assess LINE-1 methylation in ASD. Methods To identify DEGs associated with LINE-1 in ASD, we reanalyzed previously published transcriptome profiles and overlapped them with the list of LINE-1-containing genes from the TranspoGene database. An Ingenuity Pathway Analysis (IPA) of DEGs associated with LINE-1 insertion was conducted. DNA methylation of LINE-1 was assessed via combined bisulfite restriction analysis (COBRA) of lymphoblastoid cell lines from ASD individuals and unaffected individuals, and the methylation levels were correlated with the expression levels of LINE-1 and two LINE-1-inserted DEGs, C1orf27 and ARMC8. Results We found that LINE-1 insertion was significantly associated with DEGs in ASD. The IPA showed that LINE-1-inserted DEGs were associated with ASD-related mechanisms, including sex hormone receptor signaling and axon guidance signaling. Moreover, we observed that the LINE-1 methylation level was significantly reduced in lymphoblastoid cell lines from ASD individuals with severe language impairment and was inversely correlated with the transcript level. The methylation level of LINE-1 was also correlated with the expression of the LINE-1-inserted DEG C1orf27 but not ARMC8. Conclusions In ASD individuals with severe language impairment, LINE-1 methylation was reduced and correlated with the expression levels of LINE-1 and the LINE-1-inserted DEG C1orf27. Our findings highlight the association of LINE-1 with DEGs in ASD blood samples and warrant further investigation. The molecular mechanisms of LINE-1 and the effects of its methylation in ASD pathobiology deserve further study.


Methods
To identify DEGs associated with LINE-1 in ASD, we reanalyzed previously published transcriptome profiles and overlapped them with the list of LINE-1-containing genes from the TranspoGene database. An Ingenuity Pathway Analysis (IPA) of DEGs associated with LINE-1 insertion was conducted. DNA methylation of LINE-1 was assessed via combined bisulfite restriction analysis (COBRA) of lymphoblastoid cell lines from ASD individuals and unaffected individuals, and the methylation levels were correlated with the expression levels of LINE-1 and two LINE-1-inserted DEGs, C1orf27 and ARMC8. PLOS

Introduction
Autism spectrum disorder (ASD) is a group of neurodevelopmental disorders that are characterized by behavioral impairments, including deficits in social interaction and communication, and by restricted interests and repetitive behaviors. The Centers for Disease Control and Prevention (CDC) estimated that the prevalence of ASD is approximately 1 in 68 children in the United States and is likely to increase in the future [1]. Accumulating evidence indicates that ASD is caused by multiple genetic risk factors [2,3], but abnormalities in the coding region of the genome are estimated to account for only 10-20% of all ASD individuals [4]. Other mechanisms, including epigenetic mechanisms, transcriptome profiles, and environmental factors, are therefore thought to contribute to the etiology and susceptibility to ASD. Epigenetic mechanisms can induce changes in gene expression patterns while the genomic DNA sequence remains unaltered. These changes can be inherited and may be modulated by environmental factors, including diet, chemical exposures, and lifestyle [5]. DNA methylation is a common and important mechanism of epigenetic modification in mammalian genomes that can regulate gene expression. DNA methylation is the process of moving the methyl group from S-adenosyl-methionine (SAM) to cytosine by DNA methyltransferases (DNMT), usually at the fifth position of the cytosine residue in CpG dinucleotides, which are commonly found in promoter regions. Several studies have investigated DNA methylation profiles in ASD using tissues from various sources, including lymphoblastoid cell lines (LCLs) [6], peripheral blood [7], and post-mortem brain tissues [8,9]. Ladd et al. (2014) and Nardone et al. (2014) identified differentially methylated regions and CpG sites in multiple brain regions of ASD and found that several differentially methylated genes were associated with ASD-related biological pathways [8,9]. In addition, Nguyen et al. (2010) conducted a largescale methylation profiling analysis of LCLs derived from monozygotic twins as well as sibling pairs discordant for a diagnosis of autism using CpG island microarrays [6]. Interestingly, they found that several genes, including RORA and BCL2, which are involved in biological processes associated with ASD, were differentially methylated in LCLs derived from ASD subjects. Moreover, they also found that RORA and BCL2 proteins were decreased in brain tissues of ASD individuals. These findings suggested that molecular changes observed in the peripheral blood or blood-derived cells may reflect some pathobiological conditions in the brain, further supporting the use of peripheral tissues as a surrogate to identify molecular marker candidates for ASD. However, these studies have focused mainly on the CpG islands of protein-coding regions, while DNA methylation in non-coding regions of the genome, which are thought to be involved in gene regulatory processes, has yet to be investigated.
Long interspersed element-1 (LINE-1) is a transposable element belonging to a group of non-long terminal repeat (non-LTR) retrotransposons. LINE-1 is an autonomous mobile element that comprises approximately 17% of the human genome where it remains active [10]. The full intact LINE-1 sequence is approximately 6,000 bp long and contains a 5' untranslated region (UTR) with an internal promoter, two non-overlapping open reading frames (ORF1 and ORF2), and a 3' UTR with a poly (A) tail [see review in [11]]. More than 500,000 copies of LINE-1 are present in the human genome [12] and among these, more than 10,000 LINE-1 copies are longer than 4.5 kb. These LINE-1 elements consist of a 5 0 UTR, two open reading frames, and a 3 0 UTR containing a poly (A) signal [13]. Retrotransposition of LINE-1 into a new genomic location may lead to duplication, deletion, or insertion of its nucleotide sequence at the target site, causing genomic instability and alteration of gene expression [14]. Evidence indicates that LINE-1 is involved in several processes, including double-strand break DNA repair [15], exon shuffling [16], gene silencing, and transgenesis processes [17]. Although regulation of LINE-1 transcription and retrotransposition remains unclear, accumulating evidence suggests that epigenetic mechanisms, including DNA methylation and histone modifications, are involved in the retrotransposition of LINE-1 and may impact the expression of target genes [18]. Aporntewan et al. (2011) found that hypomethylation of LINE-1 could repress the transcription of several genes related to cancer [19]. Moreover, altered DNA methylation levels and patterns of CpG residues in LINE-1 promoter regions have been reported in many diseases [20][21][22][23].
Accumulating evidence indicates that LINE-1 plays an important role in neurological functions [24]. Recent studies have demonstrated that the promoter of LINE-1 is transcriptionally activated during neuronal differentiation through the canonical WNT pathway, which is associated with the transition of neuronal stem cells, neurogenesis, and survival of neuronal progenitors [25][26][27]. LINE-1 contains CpG residues in the promoter region that can be regulated by Methyl-CpG binding Protein 2 (MeCP2), mutations of which are found in Rett syndrome, a severe developmental disorder with autistic phenotypes [28]. MeCP2 binds to methylated CpG residues and represses LINE-1 transcription in neural stem cells, and the transcription and retrotransposition of LINE-1 are increased in the absence of MeCP2 [28], suggesting that LINE-1 methylation plays an important role in the regulation of LINE-1 expression. Moreover, neural progenitor cells derived from iPSCs of individuals with Rett syndrome who carry MeCP2 mutations exhibit increased susceptibility to LINE-1 retrotransposition [28]. In addition to Rett syndrome, notably, ASD individuals who carry 15q11-q13 duplication have been reported to present lower levels of LINE-1 methylation in brain tissues [29]. Recently, Shpyleva et al. (2017) reported that LINE-1 is overexpressed and the binding of MeCP2 to LINE-1 sequences is significantly lower in the cerebellum of ASD individuals [30]. Through methylated DNA immunoprecipitation (MeDIP) analysis of cytosine-5 methylation in LINE-1, these authors also found that methylation of the LINE-1 5'UTR, ORF1, and ORF2 tended to be reduced in ASD, although this difference was not statistically significant [30], possibly due to heterogeneity within the ASD population.
To reduce the heterogeneity of ASD samples for transcriptomics analyses, Hu and Steinberg (2009) applied multiple clustering analyses to Autism Diagnostic Interview-Revised (ADI-R) scores for 123 items of nearly 1,954 individuals with ASD [31]. These multivariate clustering analyses showed that the ASD population could be divided into four phenotypic subgroups based on similarity of symptom severity across the 123 selected items. One of the clusters was characterized by severe language deficits, while the other three subgroups could be characterized by mild, intermediate, or savant phenotypes. To test the proof-of-principle that the phenotypic differences could be translated into differences in gene expression, ASD individuals from three subgroups were selected for transcriptome profiling analysis [32]. Interestingly, transcriptome profiling of LCLs derived from ASD individuals in each subgroup and from sex/age-matched unaffected individuals revealed differentially expressed genes unique to each phenotypic subgroup, as well as sets of overlapping differentially expressed genes among the subgroups [32]. More than 4,000 genes were reported to be significantly altered in the ASD subgroup with severe language deficits. The dysregulated genes included RORA, a gene encoding the nuclear receptor and transcription factor protein retinoic acid receptor (RAR)-related orphan receptor alpha, which exhibits the potential to transcriptionally regulate more than 2,500 ASD-associated genes and may play an important role in the sex bias of ASD [6,[32][33][34][35][36]. In addition to transcriptomic profiling, this same phenotype definition strategy has been employed in several genomic studies to identify novel ASD subtype-specific genetic variants and linkage regions as well as potential molecular markers for ASD [37][38][39][40]. These findings strongly suggest that using the phenotypic and/or clinical information of individuals to reduce the heterogeneity of ASD among subjects for 'omics analyses may lead to a better understanding of ASD pathobiology and help identify biomarker candidates or potential therapeutic targets specific to the specific ASD sub-populations. Although LINE-1 methylation has been recently investigated in ASD as mentioned above, whether LINE-1 methylation is dysregulated in peripheral cells or blood-derived cells from all ASD individuals as opposed to specific phenotypic subgroups remains unclear. Moreover, whether LINE-1 insertion is associated with differentially expressed genes reported in ASD is still unknown.
In this study, we sought to determine whether LINE-1 insertion is associated with differentially expressed genes reported in ASD. Moreover, we sought to measure LINE-1 methylation in lymphoblastoid cell lines from individuals with ASD compared to that in sex/age-matched unaffected individuals. To identify differentially expressed genes, we obtained and reanalyzed ASD transcriptome profiles from five previously published studies using data deposited in the NCBI GEO DataSets database. The list of differentially expressed genes from each study was then overlapped with the list of LINE-1-inserted genes from the TranspoGene database. Fisher's exact test was applied to determine the probability of an association between LINE-1inserted genes and the lists of differentially expressed genes. The list of differentially expressed genes with LINE-1 insertion that showed reproducibility in an independent study was then uploaded for Ingenuity Pathway Analysis to predict biological functions and pathways that may be impacted by LINE-1. We subsequently measured the methylation of LINE-1 in lymphoblasts from a combined group of ASD individuals, and also in a phenotypic subgroup of ASD individuals who exhibited severe language deficits as determined by the ADI-R clustering analysis approach, both of which were compared to the LINE-1 methylation state in lymphoblasts of sex/age-matched unaffected individuals. The level of LINE-1 methylation was also correlated with the expression of LINE-1 and the transcript levels of two differentially expressed genes with LINE-1 insertion. The experimental workflow of this study is illustrated in Fig 1.

Collection of transcriptome profile data
To identify differentially expressed genes in ASD, the transcriptome profiles of ASD individuals and unaffected individuals were obtained from the NCBI Gene Expression Omnibus (GEO) DataSets database (http://www.ncbi.nlm.nih.gov/gds) [41,42]. The transcriptomic studies selected for subsequent analyses had to meet the following criteria: i) the number of samples used in transcriptome profiling analysis corresponded to more than 40 individuals; ii) the analyzed samples were peripheral blood cells or cell lines derived from peripheral blood; and iii) transcriptome data and all supplementary information, including series matrix files and related platforms, must have been deposited in NCBI GEO DataSets and were available for reanalysis. Based on these criteria, five transcriptomic studies (i.e., GSE15402, GSE25507, GSE6575, GSE18123, and GSE42133) were selected. The details of each study are shown in Table 1.

Collection of LINE-1-inserted gene lists
The list of all LINE-1-inserted genes in Human Genome 18 (UCSC hg18, NCBI build 36.1) was downloaded from the TranspoGene database (http://transpogene.tau.ac.il) [47]. The list of LINE-1-inserted genes was also divided into different types of LINE-1 insertions (i.e., exonic, exonized, intronic, and promoter types). Multiple LINE-1 elements can insert into a single gene with different insertion types. For the LINE-1 inserted gene lists used in this study, we obtained all genes with at least one instance of one of the four insertion types.
The LINE-1-inserted genes used in our study were predicted by the TranspoGene database which may contain a full-length LINE-1 sequence and/or a conserved region of LINE-1. We obtained all genes that contained LINE-1 insertion in at least one element from the Transpo-Gene database, which covers transposable elements located inside human protein-coding genes. According to Asaf Levy et al. (2008), a LINE-1 sequence was retrieved from Human NCBI 36.1 (hg18) and overlapped with expressed sequence tags (ESTs) or cDNA alignments. Transposable elements with significant overlap were then identified and classified according to alignment location within the gene: proximal promoter, exonized (insertion within an intron that led to exon creation), exonic (insertion into an existing exon) or intronic transposable elements [47].

Identification of differentially expressed genes and their association with LINE-1-inserted genes
To identify genes that are differentially expressed in ASD patient whole blood or blood-derived cells, transcriptome profiling data from each study obtained from NCBI GEO DataSets were analyzed separately using the Multiple Experiment Viewer (MeV) program (microarray software suit; http://mev.tm4.org) [48]. All transcriptome data were filtered with a 70% cut-off filter that removes transcripts for which data were missing in > 30% of samples. The available transcripts were then used for the identification of differentially expressed genes in ASD individuals compared with unaffected sex/age-matched controls using the two-tailed t-test with adjusted Bonferroni multiple test correction.
The list of differentially expressed genes from each study was then overlapped with the list of all human LINE-1-inserted genes, as well as the lists of human genes with different types of LINE-1 insertion. The overlapping genes were subsequently divided into 4 groups (i.e., dysregulated genes with LINE-1 insertion, dysregulated genes without LINE-1 insertion, non-dysregulated genes with LINE-1 insertion, and non-dysregulated genes without LINE-1 insertion) in a 2x2 table. Fisher's exact tests were performed to determine whether LINE-1-inserted genes were significantly enriched in the lists of differentially expressed genes in ASD. Moreover, the differentially expressed genes were classified as downregulated or upregulated and used for association analysis with the human LINE-1-inserted gene lists. These procedures were repeated for each type of human LINE-1 insertion. A Fisher's exact test P-value less than 0.05 after Benjamini Hochberg correction (FDR 0.05) was considered statistically significant. In addition, to further confirm that LINE-1 insertion was significantly associated with ASD-related DEGs, randomly chosen genes that were equal in number to that of DEGs from each study were used as a controls. Hypergeometric distribution analyses were then used to compare the statistical probability of LINE-1 insertion within the DEGs and randomly selected genes from each study. Moreover, the lists of differentially expressed genes with LINE-1 insertion from five transcriptome studies were then overlapped with one another using the Venn Diagram program (http://bioinformatics.psb.ugent.be/webtools/Venn). Differentially expressed genes with LINE-1 insertion reproducibly observed in at least two studies were selected for subsequent biological function and pathway analyses.

Prediction of biological functions and networks associated with LINE-1-inserted differentially expressed genes in ASD
IPA software (Ingenuity System, Redwood City, CA, USA) was employed to identify biological functions, gene regulatory networks, diseases, and canonical pathways that are significantly associated with the dysregulated genes with LINE-1 insertion identified in at least two independent transcriptome studies. The significance of the association for each predicted biological function or canonical pathway was determined with Fisher's exact test (P-value < 0.05).

Phenotypic subgrouping of ASD
A total of 56 lymphoblastoid cell lines (LCLs) derived from peripheral mononuclear cells of ASD subjects (n = 36) and sex-/age-matched unaffected individuals (n = 20) were used for this study. These individuals were previously assigned to several phenotypic subgroups within ASD based on cluster analyses of ADI-R scores, as previously described in detail [31]. The exclusion criteria for this study included cognitive impairment (Raven's scores < 70), the presence of known genetic abnormalities (i.e., fragile X, Rett syndrome, tuberous sclerosis, or chromosome 15q11-q13 duplication), premature birth (< 35 weeks gestation), and diagnosis of comorbid psychiatric disorders (i.e., bipolar disorder, obsessive compulsive disorder, or severe anxiety). Impairment in spoken language was also confirmed based on low standard scores (< 80) on the Peabody Picture Vocabulary Test. These exclusion criteria are expected to reduce the heterogeneity of subjects to study idiopathic ASD. The LCLs used in this study represented individuals from three phenotypic groups based on clustering analysis: severe language impairment (subgroup L); milder symptoms (subgroup M); and savant skills (subgroup S) based on multivariate cluster analyses of ADI-R scores. These LCLs have been previously used in a transcriptome profiling study by   [32], which is one of the studies that we employed for the identification of differentially expressed genes in ASD (GEO accession number GSE15402). The demographic information of the individuals whose LCLs were used in this study is shown in S1 Table. Cell culture All LCLs were obtained from the Autism Genetics Resource Exchange Repository (AGRE, Los Angeles, CA, USA) and cultured according to the protocol of the Rutgers University Cell and DNA Repository, which produces and maintains cells in the AGRE collection as previously described [32]. Briefly, all LCLs were cultured in RPMI 1640 medium supplemented with 15% fetal bovine serum and 1% penicillin/streptomycin under optimum conditions, at 37˚C with 5% carbon dioxide. The cultured cell lines were split 1:2 every three to four days, and cells were harvested for analysis on day three after splitting, when the cultures were in the logarithmic growth phase. After harvesting, all cell lines were stored in RNAlater (Applied Biosystem) at -80˚C to maintain the genetic material until further analysis. Genomic DNA and total RNA were isolated from the LCLs of ASD individuals and unaffected sex/age-matched controls as previously described [49,50]. The concentrations of DNA and RNA were measured using a NanoDrop 1000 spectrophotometer (Thermo Scientific, USA).

Combined bisulfite restriction analysis (COBRA) of LINE-1 methylation
The methylation level of LINE-1 was measured using COBRA as described in previous reports [21,22,51,52]. Briefly, genomic DNA was treated with sodium bisulfite using the EZ DNA Methylation-Gold™ Kit (Zymo Research, Irving, CA, USA) according to the manufacturer's protocol. The bisulfite-converted DNA was then employed as a template in PCR using Hot-StarTaq DNA polymerase (QIAGEN, USA), the forward primer (5'-CCGTAAGGGGTTAGGG AGTTTTT-3'), and the reverse primer (5'-RTAAAACCCTCCRAACCAAATATAAA-3'), which are specific to the regulatory region of LINE-1 (based on GenBank: M80343), at a denaturing temperature of 95˚C, annealing temperature of 50˚C, and extension temperature of 72˚C. After amplification, the LINE-1 amplicons (160 bp) were digested at 65˚C overnight in NEB buffer 3 with the TaqI and TasI restriction enzymes (Thermo Scientific, USA), which can specifically cut at methylated cytosines and unmethylated cytosines, respectively. The digested LINE-1 amplicons were separated in an 8% non-denaturing polyacrylamide gel. Band intensities were measured and analyzed as described previously [20,49,50]. Based on the DNA methylation patterns of the 2 CpG dinucleotides, the LINE-1 amplicons from COBRA produced 4 bands corresponding to hypermethylated loci (C m C m ), hypomethylated loci (C u C u ), and 2 partially methylated loci (C m C u and C u C m ) in which the sizes of each product were 160 bp, 98 bp, 80 bp and 62 bp, respectively. To determine the accurate percentage of CpG dinucleotides, we calculated the LINE-1 methylation level of each pattern as described in a previous study [20]. Briefly, the band intensity of each product was determined as follows: % intensity of the 160-bp product/160 = A, % intensity of the 98-bp product/98 = B, % intensity of the 80-bp product/80 = C, and % intensity of the 62-63-bp product/62 = D. The percentage of the overall methylation levels of two CpG sites within LINE-1 was then calculated using the following formula: % C m = 100x(C+A)/(C+A+A+B+D), and the LINE-1 methylation levels of each pattern were estimated by following these principles: % C m C u = 100x

Quantitative reverse transcription-PCR analysis
Total RNA was treated with DNase prior to qPCR analysis using RQ1 RNase-Free DNase (Promega) according to the manufacturer's protocol. DNase-treated RNA was then subjected to qRT-PCR analysis. Briefly, 1 μg of DNase-treated RNA was converted to cDNA using 0.5 μg of primer (oligo dT 18 ) and AccuPower 1 RT PreMix (Bioneer) according to the manufacturer's recommended protocol. Real-time PCR analysis was performed with AccuPower 1 2X Green-Star™ qPCR MasterMix (Bioneer) using 1 μl of cDNA according to the manufacturer's protocol. The primer sequences employed in this study were as follows: LINE-1 forward primer, 5'-CACTAGGGAGTGCCAGACA-3' and reverse primer, 5'-GACCGTCACCCCTTTCTT-3'; C1orf27 forward primer, 5'-CAAGCTTCATTGGACAACACA-3' and reverse primer, 5'-AGATACCCGCAGCAAGGAC-3'; and ARMC8 forward primer, 5'-CCTCATTGCCATGC TTGCTG-3' and reverse primer, 5'-CGGAGTTCGTGAGCATGTTTT-3'. The levels of LINE-1 and selected gene transcripts were normalized to the housekeeping gene GAPDH (forward primer, 5'-ATGTTCGTCATGGGTGTGAA-3' and reverse primer, 5'-ACAGTCTTCTG GGTGGCAGT-3'). The qPCR conditions were set as follows: for LINE1 and C1orf27, predenaturation step at 95˚C for 15 minutes, 40 cycles of denaturation at 95˚C for 15 seconds, annealing at 60˚C for 30 seconds, and extension at 72˚C for 30 seconds; for ARMC8, pre-denaturation at 95˚C for 15 minutes, 45 cycles of denaturation at 95˚C for 15 seconds, annealing at 58˚C for 15 seconds, and extension at 72˚C for 30 seconds. A melting curve analysis was performed to confirm product formation. The expression levels of LINE-1 and the other genes were calculated using the 2 -ΔΔ Ct method.

Statistical analyses
The two-tailed t-test with adjusted Bonferroni correction was employed to determine differentially expressed genes in each transcriptomic analysis. Adjusted P-values less than 0.05 were considered statistically significant. The associations between the lists of dysregulated genes in ASD and the lists of human LINE-1-inserted genes were determined using Fisher's exact test with Benjamini-Hochberg correction (FDR = 0.05). Adjusted P-values less than 0.05 were considered statistically significant. Pathway and function analyses were performed with IPA using Fisher's exact test, and P-values less than 0.05 were considered significant. Two-tailed t-tests were used to analyze differences in LINE-1 methylation levels. Benjamini-Hochberg correction (FDR = 0.05) was applied to adjust P-values in multiple ASD phenotypic subgroup comparisons. Correlations between LINE-1 methylation levels and the expression levels of LINE-1, C1orf27, and ARMC8 were assessed using Spearman's correlation and P-values less than 0.05 were considered significant.

Ethics statement
The use of lymphoblastoid cell lines in this study was reviewed by the GWU Office of Human Research and determined to be "Exempt" from full IRB review for human subjects research because all cell lines and phenotypic data used in this study were de-identified with respect to donor, and there were never any direct interactions with any humans whose cells were used in this study.

Genes containing LINE-1 insertion are differentially expressed in whole blood and blood-derived cells of ASD individuals
To determine whether LINE-1 insertion is associated with dysregulated gene expression in peripheral blood and blood-derived cells from ASD individuals, we assessed the associations between the lists of LINE-1-inserted genes and differentially expressed genes in ASD. The transcriptome profiles of ASD individuals and unaffected sex/age-matched controls were obtained from five independent ASD studies ( Table 1) and reanalyzed. The differentially expressed genes in ASD detected in each study were identified. The list of human genes with LINE-1 insertions was obtained from the TranspoGene database and then overlapped with the list of dysregulated genes in ASD from each study. Interestingly, an association analysis using Fisher's exact test revealed that genes containing LINE-1, regardless of the insertion type (i.e., "all insertion"), were overrepresented among dysregulated genes in ASD whole blood or blood cell samples in four of the five transcriptomic studies (P-value < 0.05; Table 2). The lists of differentially expressed genes with LINE-1 insertion consisted of 1,013, 836, 530, and 312 genes from the GSE18123, GSE25507, GSE42133, and GSE6575 studies, respectively. Next, we categorized the dysregulated genes into up-and downregulated genes and subsequently identified the overlap with LINE-1-inserted genes to determine whether the overrepresentation of LINE-1-inserted genes was biased towards up-or downregulated genes in ASD. The results showed that LINE-1-inserted genes were significantly enriched in both upregulated and downregulated genes (Table 2). Moreover, to predict the types of LINE-1 insertion that are associated with differentially expressed genes in ASD, the lists of genes with specific LINE-1 insertion types (i.e., intronic, exonized, exonic, and promoter types) were produced and then overlapped with the dysregulated genes identified from each transcriptomic study. Interestingly, only LINE-1 intronic insertion was found to be strongly associated with dysregulated genes in ASD (Table 2). To further determine whether LINE-1-inserted genes were significantly over-represented within the lists of DEGs from each study, we also performed hypergeometric distribution analyses using randomly chosen genes from each study that were equal in number to the up-and down-regulated DEGs from each study. P-values were then calculated and adjusted using the Benjamini-Hochberg correction method for multiple tests (FDR < 0.05). Interestingly, almost all DEG lists which showed enrichment in known LINE-1-inserted genes had a P-value much lower than that of the list of randomly chosen genes, suggesting that there are significantly more LINE-1 insertions in ASD-associated DEGs than in randomly chosen genes. The results of hypergeometric distribution analysis are shown in S2 and S3 Tables. Table 2. Association analyses between the differentially expressed genes in ASD and the human LINE-1-inserted gene lists. Previous studies have demonstrated that phenotypic subgrouping of individuals with idiopathic ASD allowed for identification of genes associated with specific ASD sub-populations [31,[37][38][39][40]. To determine whether LINE-1-inserted genes are overrepresented in specific phenotypic subgroups of ASD, we reanalyzed the transcriptome dataset GSE15402 by dividing ASD individuals into subgroups based on multivariate cluster analyses of ADI-R scores of ASD individuals as conducted by Hu and Steinberg [31]. This method of subphenotyping of individuals with ASD revealed that the ASD population could be represented by four phenotypic subgroups: i) the L subgroup was characterized by severe language deficits in addition to an overall higher severity in most ADI-R items; ii) the M subgroup, despite the ASD diagnosis, exhibited milder symptoms across various behavioral impairments; iii) the S subgroup possessed a noticeably higher frequency of savant skills; and iv) the Intermediate (I) group exhibited intermediate severity across all domains [31]. Here, we conducted an association analysis of the DEGs identified in transcriptomic analyses of three of the subtypes (L, M, and S) and LINE-1-inserted genes (Table 3). Interestingly, LINE-1-inserted genes were found to be overrepresented only in down-regulated DEGs in the L subgroup with severe language deficits.
Moreover, to determine whether the dysregulated genes containing LINE-1 insertion are shared common among independent ASD studies, overlap analysis of the dysregulated genes containing LINE-1 insertion from each transcriptomic study was performed (Fig 2). We found that as many as 351 LINE-1-inserted genes were differentially expressed in ASD in at least two independent transcriptomic studies. Genes in each section of the Venn diagram are listed in S4 Table.   Table 3

Differentially expressed LINE-1-inserted genes in ASD are associated with neurological functions and disorders
IPA software was employed to predict biological functions, disorders, networks, and pathways significantly associated with LINE-1-inserted genes that are dysregulated in ASD. IPA biological function and disease analyses revealed that differentially expressed genes with LINE-1 insertion (351 genes) were significantly associated with neurological functions and disorders (P-value < 0.05, Table 4). As many as 25 genes were significantly associated with autism or intellectual disability (P-value = 7.05E-06). Interestingly, neurodevelopmental disorders comorbid with ASD, including mental retardation, cognitive impairment, and neuromuscular disease, were also found to be significantly associated with this gene set (P-value < 0.05). Moreover, an IPA canonical pathway analysis of the differentially expressed genes containing LINE-1 insertion revealed a significant association with several biological functions, including ERK/MAPK signaling [53], axonal guidance signaling [54,55], neurotrophin/TRK signaling LINE-1 methylation and gene expression profiling analyses of LCLs from ASD individuals [56], synaptic long-term potentiation, CREB signaling in neurons [see review in [57]], estrogen receptor signaling [33,58,59], androgen signaling [33], PTEN signaling [60], mTOR signaling [61], circadian rhythm signaling [ [62] and see review in [63]], and neuroinflammation signaling [64] (P-value < 0.05, Table 5), all of which have been implicated in ASD. The gene regulatory network analysis also revealed complex interactions between the genes and neurological functions including neuritogenesis, hyperplasia of the cerebral cortex, neurodegeneration, and movement disorder (Fig 3).

Hypomethylation of LINE-1 in ASD with severe language deficits
The results of our association analyses and biological function/network analyses strongly suggested that LINE-1 insertion may be associated with ASD, providing a rationale for further investigation of LINE-1. Recent studies have shown that LINE-1 methylation is dysregulated in several diseases and may serve as a promising biomarker candidate [21,49]; however, methylation of LINE-1 in ASD peripheral cells has not been investigated. In the current study, COBRA assays were performed to measure LINE-1 methylation levels and patterns in LCLs derived from ASD and unaffected sex-/age-matched individuals. The percentage of overall LINE-1 methylation (%C m ) was not significantly altered in ASD individuals compared with that in the controls (Fig 4). Because the association analysis revealed that LINE-1 insertion is significantly associated with decreased gene expression only in LCLs from ASD individuals with severe language deficits, we further classified ASD into phenotypic subgroups for methylation analyses. COBRA analysis showed that the percentage of overall LINE-1 methylation (% C m ) was significantly decreased (33.48% ± 1.92%, P-value = 0.039) in ASD with severe language deficits (Subgroup L), but not in the other two subgroups, compared to that in age/sexmatched unaffected controls (36.37% ± 1.16%, Fig 4).  05E-06  25  ADNP, ADRA1A, ARID1A, CAMTA1, CDC42, CREBBP, CSNK2A1, DYRK1A, ELOVL4, EP300, GNB1, GNB5,  GRIK2, HERC1, NRP2, PTEN, RNF125, SMARCA2, SON, ST3GAL3, SYNE1, TRIO,

Global LINE-1 methylation levels are correlated with the expression of LINE-1 and LINE-1-inserted genes in ASD with severe language deficits
To determine whether the methylation level of LINE-1 is correlated with LINE-1 expression in ASD with severe language deficits, a qRT-PCR analysis of LINE-1 was conducted using 5 ASD and 5 age/sex-matched control LCL samples and an analysis of the correlations between the level of LINE-1 methylation and the expression of LINE-1 was performed. Interestingly, the results showed that LINE-1 expression was modestly reduced in ASD individuals with severe language deficits compared with that in sex/age-matched controls (fold change = 0.148, Pvalue = 0.049). We subsequently correlated the levels of LINE-1 methylation and expression and found that the expression of LINE-1 in ASD with severe language deficits is inversely correlated with the percentage of overall methylation (%C m ) of the LINE-1 promoter. No such correlation was observed in the unaffected controls (Fig 5). Moreover, to determine whether LINE-1 methylation might be associated with differentially expressed genes in ASD with severe language deficits, correlation analysis of global LINE-1 methylation levels and the expression levels of two LINE-1-inserted genes (i.e. C1orf27 LINE-1 methylation and gene expression profiling analyses of LCLs from ASD individuals and ARMC8) was performed. We selected these two representative genes from the list of downregulated genes in ASD with severe language impairment (subgroup L), which was significantly associated with the LINE-1 insertion types "All insertion" and "Intronic insertion" (P-value < 0.05; Table 3). According to the Human Protein Atlas database (https://www. proteinatlas.org), both genes are highly expressed in the human brain. C1orf27, also called ODR4, is enriched in the cerebral cortex. In addition to ASD subgroup L, C1orf27 is also found among the list of overlapping LINE-1-inserted DEGs between the two transcriptomic datasets GSE18123 and GSE42133 (Fig 2 and S4 Table). The function of C1orf27 protein in human is still unclear but is thought to be involved in the trafficking of a diverse set of G-protein coupled receptors, specifically olfactory receptors, allowing for functional expression of the receptors [65,66]. We are interested in this gene because impairments in olfactory function has been associated with many neurological disorders, including ASD (see review in [67]). ARMC8, a gene which is involved in the innate immune system, is highly expressed in the cerebellum, a brain region that has long been implicated in ASD [68]. The role of ARMC8 in the brain is unclear, but a recent study has shown that ARMC8 promotes cell proliferation by upregulating cyclin D1 and MMP7 expression through the canonical Wnt-signaling pathway [69], which is  a pathway that have been associated with ASD (see reviews in [70,71]). Interestingly, similar to the findings regarding LINE-1 expression, the expression of C1orf27 in ASD individuals with language deficits was inversely correlated with the percentage of overall methylation (% C m ), but this was not the case in the matched controls (Fig 6). However, the expression of ARMC8 was not correlated with LINE-1 methylation.

Discussion
DNA methylation is a major mechanism of epigenetic modification that represses LINE-1 retrotransposons in the human genome. To date, DNA methylation in non-coding regions, including LINE-1 elements, which comprise most of the genome, has yet to be investigated thoroughly in the context of ASD. Aberrations in global DNA methylation of LINE-1 may lead to many alterations in the human genome, including gene expression and LINE-1 retrotransposition [14,72,73]. Several studies have demonstrated that LINE-1 is involved in gene expression regulation as LINE-1 retrotransposons contain many transcription binding sites, splicing sites, and polyadenylation signal sequences [74][75][76][77]. If a LINE-1 element integrates into or near host genes, its promoter and other signal sequences can drive host gene expression. Moreover, LINE-1 methylation may initiate heterochromatin formation that can spread into host genes [see review in [78]]. However, the association between LINE-1 retrotransposition and altered gene expression in ASD has not been previously studied.

Enrichment of LINE-1 retrotransposition in differentially expressed genes in ASD
In this study, we first performed an association analysis between dysregulated genes in ASD from five independent transcriptomic studies and LINE-1-inserted genes. We found a significant overrepresentation of genes containing LINE-1 insertion in the lists of differentially expressed genes in whole blood and blood-derived cells (but not LCLs) from individuals with ASD who were subjects of several independent studies (Table 2). Moreover, dysregulated genes showed significant associations with LINE-1 intronic insertion rather than other insertion types. This finding suggests that LINE-1 insertion within an intron of a specific gene may affect gene regulation, which is consistent with a previous study showing that insertion of active LINE-1 into the intron of a host gene could disrupt gene expression in vitro. Moreover, unique methylated introns are frequently found in highly expressed genes [14,79]. We also found that the selected dysregulated genes that were associated with LINE-1 insertion in at least two studies are implicated in autism or comorbid disorders, including intellectual disability, mental retardation, and cognitive impairment (Table 4).
Although as many as 351 DEGs with LINE-1 insertions were found in two or more independent cohorts, it is interesting to note that no DEGs were shared among all five transcriptomic studies (Fig 2). This may be the result of the well-recognized heterogeneity within the ASD population and/or the differences in the sample types used as a model in these studies. Other than the study by Hu et al. [32], the transcriptomic analyses used in our study did not classify ASD individuals into subgroups, or performed a classification using different subtyping strategy. Moreover, there were no clinical phenotypes reported in these studies (specifically ADI-R scores) available for us to perform subgrouping of their respective cohorts. For future studies, classification of ASD individuals into specific subgroups should be performed using a consistent protocol to reduce the heterogeneity of ASD to allow for the overlap analysis between a specific subgroup from multiple studies. For instance, using the subtyping strategy described here (see Table 3), might allow for identification of more DEGs with LINE-1 insertions in individuals with severe language impairment across multiple cohorts.

Pathways associated with differentially expressed genes containing LINE-1 insertion
In addition to disorders, the IPA also revealed interesting canonical pathways associated with the dysregulated genes containing LINE-1 insertion, such as estrogen receptor signaling and axonal guidance signaling. Regarding estrogen receptor signaling, estrogen is a sex hormone that plays important roles not only in the reproductive organs but also in the brain and neural systems that are known to be implicated in the sex bias of ASD. Estrogen acts through the estrogen receptor, which has two principal forms, estrogen receptor alpha (ERα) and estrogen receptor beta (ERβ) [80]. ERα can affect neurotransmitter systems and is an indicator of neuropsychiatric disorders [81]. ERβ is a major estrogen receptor that is expressed in the brain and acts as an intermediary of estrogen in some processes, such as learning behavior, which is impaired in ASD [59]. Crider et al. studied the regulation of ERβ in the middle frontal gyrus of ASD subjects and suggested that the ERβ signaling network may be impaired in subjects with ASD [58]. Altun et al. further reported that both ER mRNA and protein were reduced in the brain tissues and serum of ASD individuals, suggesting that some pathobiological changes in the brain may be reflected in peripheral tissues [82]. Similarly, RORA, which was found to be differentially methylated in LCLs of individuals with ASD, exhibited reduced expression and protein levels in both LCLs and brain tissues of ASD subjects [6], thus supporting the use of peripheral tissues as surrogates for examining biological processes associated with ASD.
Axonal guidance signaling is also highly associated with LINE-1-containing differentially expressed genes. Axon-guidance proteins play an important role in growing axons and neuronal connections during brain development. Alterations of axon-guidance proteins may induce abnormalities in neuronal systems, resulting in neurological disorders [83]. A previous study reported that the expression and protein levels of some axon-guidance receptors, such as roundabout guidance receptor 2 (ROBO2) and plexin A4 (PLXNA4), were significantly decreased in the brains of ASD subjects [54], and loss of function of these receptors affects the production of some neurotransmitters. In addition, interference with axon-guidance receptor activation may result in abnormalities of the corpus callosum, which is involved in problem solving and creativity, and failure of neuronal projections may contribute to neurological disorders [55]. Other canonical pathways that are known to be associated with ASD include ERK/ MAPK signaling [53], neurotrophin/TRK signaling [56], synaptic long-term potentiation, CREB signaling in neurons [see review in [57]], PTEN signaling [60], mTOR signaling [61], circadian rhythm signaling [ [62] and see review in [63]], and neuroinflammation signaling [64].

ASD subtype-dependent association of LINE-1 insertion, methylation, and expression
Interestingly, while LINE-1-containing genes were overrepresented among the differentially expressed genes from the four studies involving peripheral blood cells, LINE-1 insertion was not significantly associated with dysregulated genes in LCLs of ASD individuals (from GSE15402). However, the latter study involving LCLs was performed on subphenotypes of ASD to reduce the heterogeneity inherent within the ASD population [32].
Our recent study investigated the methylation of the repetitive element Alu and found that Alu methylation was not significantly different between ASD and sex-and age-matched control groups when all ASD individuals were combined. However, significantly altered Alu methylation patterns were observed in ASD cases sub-grouped based on ADI-R scores compared with those in matched controls, suggesting that classification of ASD individuals into subgroups based on phenotypes may be beneficial and provide insights into the still unknown etiology and underlying mechanisms of ASD [84]. We therefore reanalyzed the transcriptome profile by subtypes of ASD and discovered that only downregulated genes in the severely language-impaired subgroup (subgroup L) were associated with LINE-1 insertion (Table 3). To address our hypothesis that LINE-1 retrotransposition is involved in altered gene expression in ASD, we selected representative LCL samples from ASD subgroup L for determination of LINE-1 methylation levels and patterns that could disrupt gene regulation. Our findings indicated that compared to the matched unaffected control group, ASD subgroup L showed significantly reduced overall LINE-1 methylation (%C m ), while no differences in overall methylation were found in subgroups M and S or in the combined ASD samples. Moreover, LINE-1 expression was inversely correlated with LINE-1 methylation in subgroup L (Fig 5). We further examined the association of LINE-1 methylation with the expression of two differentially expressed genes in subgroup L. The correlation analysis showed that one of the two selected genes (C1orf27) correlated with LINE-1 methylation levels (%C m ) in LCLs of subgroup L but not in those of sex-and age-matched unaffected controls (Fig 6). These findings suggest that some but not all dysregulated genes containing LINE-1 may be affected by altered LINE-1 methylation, which is dependent on the subphenotype of ASD. Interestingly, a previous study has also demonstrated that the hypomethylation of intragenic LINE-1 repressed gene expression [19]. Although our previous results of LINE-1 methylation levels in ASD subgroup L showed that they were significantly reduced and should result in overexpression of LINE-1 because DNA methylation is responsible for repressing gene expression, our findings demonstrated that the LINE-1 expression level was significantly reduced in ASD subgroup L. The discrepancies between global LINE-1 methylation levels and LINE-1 expression may result from differences in the regulatory mechanisms of LINE-1 expression in ASD individuals and unaffected controls. Our correlation analysis showed that the overall methylation levels of the LINE-1 promoter were correlated with LINE-1 expression only in ASD subgroup L but not in the control group (Fig 5), suggesting that the methylation of these CpG residues may regulate LINE-1 expression only in ASD subgroup L and not in controls. LINE-1 expression in unaffected individuals may be regulated by other transcriptional regulatory mechanisms independent of the methylation of these CpG residues. Alterations of these differential regulatory mechanisms may be involved in the reduction of LINE-1 expression in ASD individuals compared to that in controls regardless of LINE-1 methylation. LINE-1 regulatory mechanisms and the differences in such mechanisms in ASD children and typically developing children deserves further comprehensive investigation. However, Shpyleva et al. (2017) reported that LINE-1 is overexpressed and the binding of MeCP2 to LINE-1 sequences was significantly lower in the ASD cerebellum [30]. Similar to our findings, DNA methylation of the LINE-1 5'UTR, ORF1, and ORF2 tended to be reduced in ASD cerebellum although was not statistically significant, possibly due to the heterogeneity within the ASD population. Interestingly, we found that LINE-1 methylation within 5'UTR showed hypomethylated status only in LCLs of ASD subgroup L but not in all ASD individuals. This finding suggests that classification ASD individuals into subphenotypic groups may provide additional insights into the still unknown etiology of ASD. However, tissue samples used in our study are different from the previous study by Shpyleva et al. (2017). Although our preliminary findings revealed significantly lower LINE-1 methylation in the L subgroup of ASD, further confirmation in a larger cohort of the same clinical subphenotype is needed. In addition, similar studies need to be carried out using brain tissues and/or primary blood cells in order to conclude that LINE-1 methylation is dysregulated and is associated with ASD etiology and/or susceptibility. However, as shown here, the individuals included in any such study should be subphenotyped in order to reduce the heterogeneity for the LINE-1 methylation analyses.

Limitations and future directions
A possible limitation with respect to detecting significant LINE-1 methylation differences between LCLs from individuals with ASD and that from controls is passage number of LCLs. In this study, we cultured all LCLs in the same batch and employed multiple biological replicates for each group to minimize the effects of passage number on changes in the methylation pattern in LCLs. However, all LCLs used in this study were obtained from the Autism Genetic Resource Exchange (AGRE) repository and the exact passage number of each LCL was not available. Given the possibility that cell passage number might affect the DNA methylation [85], future studies should strive to exclude this confounding factor. Moreover, preferential amplification of unmethylated alleles (i.e. PCR-bias) might occur. In this study, the exact same procedure was therefore used for all DNA samples using the same protocol that has been used in several studies for LINE-1 methylation analysis [21,22,51,52] and has been demonstrated that the results were well correlated with pyrosequencing method [86]. Future studies may include data adjustments using correction methods, such as the method proposed by Moskalev et al. (2011), to reduce the bias [87].
Another limitation of this study is that the global DNA methylation analysis of LINE-1 promoters used in this study does not measure methylation of LINE-1 at specific dysregulated genes that contain LINE-1. Thus, it is possible that signals from significant methylation sites were dampened by noise from other non-significant sites in the genome. LINE-1 methylation within dysregulated genes or at specific genomic locations should be further studied. In addition, we analyzed LINE-1 methylation and expression using specific primers for the promoter of LINE-1 elements. Although these primers were designed to target highly conserved regions of intragenic LINE-1 elements containing a 5'UTR, two open reading frames, and 3'UTR rather than intergenic LINE-1 elements [13,19] and these techniques have been used in several recent studies in the context of several diseases/disorders [21,22,51,52], we could not demonstrate that the genome of LCLs or the genes of interest (C1orf27 and ARMC8) indeed contain full-length LINE-1 sequences due to the limitations of COBRA and qRT-PCR analyses of LINE-1 in which primers were generally designed to amplify only short and restricted regions. However, it is well-established that full-length LINE-1 sequences are present in human genomes [88,89], thus likely that LCLs derived from ASD and unaffected individuals also contain full-length LINE-1 sequences from their genomic DNA. Future studies may include whole LINE-1 genomic sequencing to demonstrate that the whole LINE-1 elements are indeed present in the genome and dysregulated in the genes of interest. Moreover, there might be differences in the copy number of LINE-1 elements between individuals within ASD and control groups as reported in a recent study [90]. Our study was therefore focused on conserved LINE-1 regions in the genome, included multiple biological replicates in each group, and used sex-and age-matched controls, all of which can help minimize the effect of such variations. The number of LINE-1 insertions in each individual should be determined in future experiments to assess such variation.

Conclusion
In conclusion, our findings suggest that LINE-1 insertion is associated with dysregulated genes in whole blood and blood-derived cells from individuals with ASD. These genes are highly associated with ASD as well as comorbid disorders and are involved in ASD-related canonical pathways. Moreover, altered global LINE-1 methylation and expression was found in LCLs from a specific subtype of ASD characterized by severe language impairment and may influence dysregulation of some but not all genes within this subtype. The findings from this pilot study provide additional evidence implicating noncoding LINE-1 retrotransposons and epigenetic modification in gene disruption in ASD. Our study will be useful for designing further experiments with different sample types, such as whole blood, somatic tissue, or brain tissue.
Supporting information S1 Table. Demographic information of the LCLs used in this study. (DOC) S2 Table. Hypergeometric distribution analysis of the overlap between the list of LINE-1-inserted genes and the list of DEGs from each transcriptomic dataset compared to the list of randomly selected genes equal in number to the respective list of DEGs. P-values were adjusted using Benjamini-Hochberg's multiple test correction method (FDR < 0.05). The DEG lists with P-values indicating significantly more LINE-1 insertions in the DEG datasets than in the list of randomly selected genes are highlighted in yellow. (DOCX) S3 Table. Hypergeometric distribution analysis of the overlap between the list of LINE-1-inserted genes and the list of DEGs from the transcriptomic dataset GSE15402 phenotypically subgrouped based on ADI-R scores compared to the list of randomly selected genes equal in number to the respective list of DEGs. P-values were adjusted using Benjamini-Hochberg's multiple test correction method (FDR < 0.05). The DEG lists with P-values indicating significantly more LINE-1 insertions in the DEG datasets than in the list of randomly selected genes are highlighted in yellow. L = ASD with severe language impairment; M = mild ASD; S = ASD with savant skills.