Landscape of epigenetically regulated lncRNAs and DNA methylation in smokers with lung adenocarcinoma

In this study, we identified long non-coding RNAs (lncRNAs) associated with DNA methylation in lung adenocarcinoma (LUAD) using clinical and methylation/expression data from 184 qualified LUAD tissue samples and 21 normal lung-tissue samples from The Cancer Genome Atlas (TCGA). We identified 1865 differentially expressed genes that correlated negatively with the methylation profiles of normal lung tissues, never-smoker LUAD tissues and smoker LUAD tissues, while 1079 differentially expressed lncRNAs were identified using the same criteria. These transcripts were integrated using ingenuity pathway analysis to determine significant pathways directly related to cancer, suggesting that lncRNAs play a crucial role in carcinogenesis. When comparing normal lung tissues and smoker LUAD tissues, 86 candidate genes were identified, including six lncRNAs. Of the 43 candidate genes revealed by comparing never-smoker LUAD tissues and smoker LUAD tissues, 13 were also different when compared to normal lung tissues. We then investigated the expression of these genes using the Gene Expression of Normal and Tumor Tissues (GENT) and Methylation and Expression Database of Normal and Tumor Tissues (MENT) databases. We observed an inverse correlation between the expression of 13 genes in normal lung tissues and smoker LUAD tissues, and the expression of five genes between the never-smoker and smoker LUAD tissues. These findings were further validated in clinical specimens using bisulfite sequencing, revealing that AGR2, AURKB, FOXP3, and HMGA1 displayed borderline differences in methylation. Finally, we explored the functional connections between DNA methylation, lncRNAs, and gene expression to identify possible targets that may contribute toward the pathogenesis of cigarette smoking-associated LUAD. Together, our findings suggested that differentially expressed lncRNAs and their target transcripts could serve as potential biomarkers for LUAD.

In this study, we aimed to investigate the relationship between DNA methylation and lncRNA expression in lung adenocarcinoma (LUAD) and thereby elucidate the landscape of lncRNAs associated with DNA methylation-mediated regulation in smokers.

Datasets
Level 3 expression and matched DNA methylation data for LUAD were downloaded from The Cancer Genome Atlas (TCGA) data portal (https://portal.gdc.cancer.gov/) in January 2017. Only patients whose clinical information included a smoking history were included, yielding 184 LUAD and 21 normal lung tissue samples with fully characterized expression data and matched DNA methylation data assayed using the Illumina Infinium Human Methylation 450K array.
For the validation cohort, 76 samples were collected from patients with LUAD who had undergone surgery at the Korea University Medical Center (Seoul, Korea) between 2010 and 2013. Samples were fixed and processed according to clinical standard operating procedures. The specimens and data used in this study were provided by Korea University Anam Hospital and approved by the appropriate Institutional Review Board (2014AN0393).

Differential lncRNA expression and DNA methylation
Our analysis strategy is depicted in Fig 1. Differentially expressed genes (DEGs), differentially methylated regions (DMRs), and differentially expressed lncRNAs (DE-lncRNAs) were identified between normal lung, smoker LUAD, and never-smoker LUAD tissues. Ingenuity pathway analysis (IPA) was used to map candidate lncRNAs from the DE-lncRNAs. The matched DEGs and DMRs included those whose change in DNA methylation was inversely correlated with DEG expression (p < 0.05).
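The matching criterion above (methylation inversely correlated with expression, p < 0.05) can be illustrated with a minimal sketch. The text does not state which correlation statistic or test was used, so this sketch assumes Pearson's r with a one-sided permutation p-value; the function names and the toy values are hypothetical and are not taken from the TCGA data.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def inverse_correlation_test(methylation, expression, n_perm=2000, seed=0):
    """Return (r, p), where p is a one-sided permutation p-value for r < 0.

    Assumption: a permutation test stands in for whatever test the
    original analysis applied.
    """
    observed = pearson_r(methylation, expression)
    rng = random.Random(seed)
    shuffled = list(expression)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pearson_r(methylation, shuffled) <= observed:
            hits += 1
    # The add-one correction keeps the estimated p-value above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical toy values for one gene across eight samples.
beta = [0.10, 0.15, 0.22, 0.35, 0.48, 0.61, 0.70, 0.82]  # methylation
expr = [9.1, 8.7, 8.9, 7.2, 6.5, 5.9, 5.1, 4.4]          # expression
r, p = inverse_correlation_test(beta, expr)
is_candidate = r < 0 and p < 0.05
```

In this sketch a gene would be retained as a matched DEG/DMR candidate only when `is_candidate` is true, that is, when higher methylation accompanies lower expression and the association passes the 0.05 threshold.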

Integrated analysis of DE-lncRNAs associated with DMRs
Lists of significant DEGs generated from TCGA data were subjected to IPA using web-based software from Ingenuity Systems (Qiagen, Redwood City, CA, USA) to produce a gene interaction network. DE-lncRNAs were subjected to biological process enrichment analyses. Functional enrichment analysis was also performed on these networks to understand the significance of the biological functions and/or disease phenotypes of the genes.

Validation analysis using bisulfite sequencing
Putative genes were validated using bisulfite sequencing in a validation cohort consisting of 76 samples. DNA was quantified using Picogreen (Invitrogen, California, USA) according to the manufacturer's protocol. Briefly, 1 μg of genomic DNA was bisulfite-converted using the EZ DNA Methylation kit (Zymo Research, California, USA) according to the manufacturer's protocol. The regions of interest were amplified by PCR using KOD-Multi & EPi (Toyobo, Osaka, Japan), purified using QIAquick PCR columns (Qiagen, Venlo, Netherlands), quantified using Picogreen (Invitrogen), and verified using agarose gel electrophoresis. Libraries were prepared using an Illumina TruSeq Nano DNA sample prep kit (Illumina) according to the manufacturer's instructions and then quantified by qPCR using a CFX96 Real-Time System (Bio-Rad, California, USA). After normalization, the prepared library was sequenced using a MiSeq system (Illumina) with 300 bp paired-end reads.
Potential sequencing adapters and low-quality bases in the raw reads were trimmed using Skewer [33] and the remaining high-quality reads were mapped to the reference genome using BS-seeker2 software [34] with a 10% mis-mapping rate. To compare the CpG methylation profiles of different sample groups, only the CpG site values were selected and the Kruskal-Wallis test was performed.
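The group comparison described above can be sketched as a Kruskal-Wallis test on the methylation values of a single CpG site. This is a minimal illustration, not the pipeline used in the study: the function name and the toy values are hypothetical, and the p-value relies on the chi-square approximation, which for three groups (two degrees of freedom) reduces to exp(-H/2).

```python
import math


def kruskal_wallis_three_groups(group_a, group_b, group_c):
    """Return (H, p) for three independent groups.

    H includes the standard tie correction. The p-value uses the
    chi-square approximation with 2 degrees of freedom, whose survival
    function is exp(-H / 2). All values being identical would make the
    tie correction zero, so that case is not handled here.
    """
    groups = [group_a, group_b, group_c]
    pooled = sorted((v, gi) for gi, g in enumerate(groups) for v in g)
    n = len(pooled)
    rank_sums = [0.0, 0.0, 0.0]
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12 / (n * (n + 1)) * sum(
        rs * rs / len(g) for rs, g in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)
    return h, math.exp(-h / 2)


# Hypothetical methylation levels (fraction methylated) at one CpG site.
normal = [0.10, 0.12, 0.15, 0.11]
never_smoker = [0.30, 0.28, 0.35, 0.33]
smoker = [0.55, 0.60, 0.52, 0.58]
h_stat, p_value = kruskal_wallis_three_groups(normal, never_smoker, smoker)
```

With group sizes as small as in this toy example the chi-square approximation is rough; an established statistics library would normally be preferred for real data.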

Statistical analysis
To identify methylation markers for detecting CS-associated LUAD, we evaluated the distribution of mRNA expression and DNA methylation levels for each CpG site in normal lung, never-smoker LUAD, and smoker LUAD tissues. For candidate DMRs, pairwise comparisons were conducted to identify the genes that best distinguished each group.

Identification of DMRs and DE-lncRNAs
To investigate the DNA methylation patterns in LUAD related to CS history, we analyzed publicly available Human Methylation 450k TCGA data that measured methylation levels in normal lung and LUAD tissues. The data sets used in this study are summarized in Table 1. Three comparisons were made: 1) normal lung vs. smoker LUAD tissues, 2) normal lung vs. never-smoker LUAD tissues, and 3) never-smoker LUAD vs. smoker LUAD tissues, identifying 8,513 DEGs, 24,783 DMRs, and 2,798 DE-lncRNAs (Fig 2A-2C). Among the 2,798 DE-lncRNAs, 1,079 were mapped by IPA (Fig 2C), while 1,865 differentially methylated candidate genes with negative correlation were identified (Fig 2D) and annotated (S1 Fig).

Pathway analysis and epigenetically regulated lncRNAs
A total of 1,865 DMRs were selected as candidate targets for the DE-lncRNAs. To determine the functions of these target genes and their potential network connections, we used IPA to identify the gene networks that may have been affected by these DE-lncRNA target genes (Fig 3). The top ten significant canonical pathways based on the DMRs and DE-lncRNAs are shown in S2 Fig and S1-S3 Tables. Interactions between the DMRs and DE-lncRNAs were predicted using molecular networks based on the IPA molecular database. The most noticeable functional category between never-smoker and smoker LUAD tissues was the lipopolysaccharide (LPS)/IL-1-mediated inhibition of retinoid X receptor (RXR) function.
A total of 86 candidate genes including six lncRNAs were identified by comparing smoker LUAD and normal tissues. Of the 43 candidate genes identified by comparing never-smoker LUAD and smoker LUAD tissues, 13 also displayed differences when compared to normal tissues. Although the majority of top functional pathways and related molecules overlapped when comparing 1) normal lung vs. smoker LUAD tissues and 2) normal lung vs. never-smoker LUAD tissues, notable differences were observed when comparing smoker LUAD and never-smoker LUAD tissues, including the LPS/IL-1-mediated inhibition of RXR function and nicotine degradation III.
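The overlap between candidate lists from two comparisons can be computed with a simple set intersection. The sketch below is illustrative only: the function name is hypothetical and the identifiers are placeholders, not genes reported in this study.

```python
def shared_candidates(list_a, list_b):
    """Return identifiers present in both lists, sorted for stable output."""
    return sorted(set(list_a) & set(list_b))


# Hypothetical placeholder identifiers, not genes reported in the study.
smoker_vs_normal = ["GENE_A", "GENE_B", "GENE_C", "LNC_1"]
smoker_vs_never = ["GENE_B", "LNC_1", "GENE_D"]
overlap = shared_candidates(smoker_vs_normal, smoker_vs_never)
print(overlap)
```

Applied to the actual candidate lists, the length of the returned list would correspond to the number of genes shared between the two comparisons.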

Validation of gene expression profiles using MENT and GENT
First, we investigated the expression and methylation levels of 86 genes in normal lung vs. smoker LUAD tissues and 13 genes in never-smoker LUAD vs. smoker LUAD tissues using the GENT and MENT databases. When comparing the 86 genes in smoker LUAD and normal lung tissues, seven up-regulated and six down-regulated genes were inversely correlated with methylation (Table 4), while five of the 13 genes were inversely correlated with methylation in smoker LUAD compared to never-smoker LUAD tissues (Table 3).
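One way to read the inverse relationship described above is as a direction check: a gene qualifies when its expression change and its methylation change point in opposite directions. The sketch below assumes that interpretation. The function name, the use of log2 fold change and a methylation difference as inputs, and the toy values are all hypothetical and do not describe the GENT or MENT interfaces.

```python
def inverse_pattern(log2_fold_change, delta_methylation):
    """Label a gene whose expression and methylation changes are opposite.

    Returns None when both changes share a direction or either is zero.
    """
    if log2_fold_change > 0 and delta_methylation < 0:
        return "up-regulated, hypomethylated"
    if log2_fold_change < 0 and delta_methylation > 0:
        return "down-regulated, hypermethylated"
    return None


# Hypothetical (log2 fold change, methylation difference) pairs, tumor vs. normal.
records = {
    "GENE_A": (1.8, -0.25),
    "GENE_B": (-2.1, 0.30),
    "GENE_C": (1.2, 0.10),  # same direction, so not an inverse pattern
}
patterns = {
    gene: inverse_pattern(fc, dm) for gene, (fc, dm) in records.items()
}
inverse_genes = [gene for gene, label in patterns.items() if label is not None]
```

Counting the two labels separately would give tallies analogous to the up-regulated and down-regulated groups reported above.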

Discussion
In this study, we integrated DNA methylation, lncRNA expression, and mRNA expression profiles from TCGA, identified biomarker candidates, and validated our findings using public datasets from the GENT and MENT databases as well as an external cohort. Together, our findings contribute toward our understanding of the interplay between lncRNAs and DNA methylation and provide a map of the epigenetic landscape of lung cancer. In addition, this study is the first to reveal the potential role of lncRNAs in CS-associated epigenetic regulation in LUAD.
Based on the findings of previous reports, we expected to find a significant difference in epigenetic alterations between smoker and never-smoker LUAD tissues. Consistent with this expectation, we found differences in regulatory genes and identified ten lncRNAs: HOTAIR, SYN2, MALAT1, and H19 in smoker LUAD vs. normal lung tissues; CYP4A22-AS1 and Lnc-MUC2-1 in both smoker and never-smoker LUAD vs. normal lung tissues; and RP11-474D1.3, ATP11AUN, ADAM6, and CTC-518B2.9 in smoker vs. never-smoker LUAD tissues. The main biochemical functions revealed by our analyses also differed between the groups. For the differentially expressed transcripts in smoker LUAD tissues, the major enriched pathways were the coagulation system and granulocyte adhesion and diapedesis, whereas the primary pathways for transcripts in never-smoker LUAD tissues were axonal guidance signaling and atherosclerosis signaling. GENT and MENT analysis in these two tissue types revealed five genes, including ADAM6 and SBSN, that displayed an inverse correlation between gene expression and methylation levels.
Until now, only a small number of lncRNAs have been identified in CS-associated lung cancer, several of which have been suggested as possible diagnostic and prognostic biomarkers. For instance, the novel lncRNA SCAL1 was reported to be overexpressed in lung cancer cell lines as a result of CS-induced oxidative stress [28]. Moreover, other studies have suggested that SCAL1 expression may be regulated by Nuclear Factor Erythroid 2-Related Factor 2 (NRF2) and that it may mediate cytoprotective functions against CS-induced toxicity [28,35,36]. HOTAIR expression is also significantly up-regulated in lung cancer and correlates with metastasis and poor prognosis [37-42]; furthermore, Liu et al. found that HOTAIR up-regulation contributes toward CS-induced malignant transformation mediated by STAT3 signaling [27]. Elevated H19 expression has also been detected in lung cancer [43-45], and its overexpression has been observed in smokers compared to never-smokers [46]. One in vitro study investigated CS-induced increases in H19 expression and attributed the increase to the monoallelic up-regulation of normally expressed alleles [29]. In addition, high MALAT1 expression has been identified in metastatic lung cancer and was shown to be an independent prognostic indicator of early-stage tumors [47], and further studies have reported MALAT1 to be involved in CS-induced epithelial-mesenchymal transition and malignant transformation via Enhancer of Zeste Homolog 2 (EZH2), a well-known epigenetic regulator [30,48-50]. The majority of previous studies have investigated possible molecular mechanisms and novel biomarkers associated with epigenetic changes using wet laboratory experiments. Integrated analyses based on bioinformatics methods and prediction may be more efficient for translational research, but such studies are currently lacking.
Since a single lncRNA targets numerous transcripts and a single transcript is also regulated by numerous lncRNAs, lncRNAs can induce various functional pathways and have complicated regulatory networks. Consequently, it is difficult to rank candidate lncRNAs during the experimental design and validation processes when exploring the functions of lncRNAs. Considering this complexity, integrating datasets could be an effective and promising approach to infer functional networks and verify potential targets. Indeed, utilizing datasets and developing computational models to predict lncRNA associations and functional annotations are currently emerging fields [51,52].
In this study, we used bioinformatics methods to identify potential targets and their functions that may play critical roles in the control of lung cancer. The most significantly different functional category between never-smoker and smoker LUAD tissues was the LPS/IL-1 mediated inhibition of RXR function. RXRs are retinoid receptors that play a crucial role in regulating the growth and differentiation of normal and tumor cells [53], while retinoids are known for their role as epigenetic modifiers [54]. Su Man et al. previously observed that the effect of RXR gene methylation on prognosis differed significantly between never-smokers and smokers, and suggested that methylation-associated RXR gene down-regulation may play different roles in lung carcinogenesis depending on smoking status [55,56]. To some extent, our findings are consistent with those of this previous study and emphasize the importance of the identified molecules.
Besides the well-known lncRNAs mentioned earlier, we identified other significant DE-lncRNAs in this study, including SYN2, RP11-474D1.3, ATP11AUN, ADAM6, and CTC-518B2.9; however, their molecular mechanisms in CS-induced LUAD remain largely unknown. Since our findings suggest possible associations between these lncRNAs and lung cancer, we believe that their specific functions should be characterized experimentally.
Despite the important findings we have described, this study had several limitations. Firstly, the patients included in TCGA database were mostly white, whereas the samples used for bisulfite sequencing validation were derived from Korean patients. Since genomic alterations such as epigenetic changes can differ between races [57-59], these racial disparities may have affected our results. Secondly, the mechanisms of epigenetic regulation by lncRNAs in CS-induced lung cancer development were not confirmed as this can be challenging; however, experimental strategies such as genetically manipulating the lncRNA locus or deleting the full-length lncRNA locus or its promoter sequence in vivo could provide further functional information. Thirdly, we assumed a negative correlation between DEGs and DMRs when searching for candidate genes since methylation levels are generally negatively correlated with the expression levels of nearby genes [60]. However, when gene expression is tightly regulated [...] Since we excluded the possibility of this effect in this study, more comprehensive algorithms will be required to determine the diversity of crosstalk between methylation, expression, and regulatory elements. Lastly, we did not analyze any environmental factor other than smoking due to a lack of information regarding the occupation or dwelling of the patients.
Fig 3. Network of epigenetically regulated genes identified using ingenuity pathway analysis. Each network is displayed graphically with genes or gene products as nodes (different shapes represent different functional classes of gene products) and lines indicating the biological relationships between nodes. The molecular network in normal lung vs. smoker LUAD tissues (A), normal lung vs. never-smoker LUAD tissues (B), and never-smoker LUAD vs. smoker LUAD tissues (C). https://doi.org/10.1371/journal.pone.0247928.g003
In summary, we identified dysregulated lncRNAs that mediate DNA methylation in CS-associated LUAD using integrated analyses. Although the roles of these lncRNAs in LUAD are currently unclear, our findings suggest that their molecular mechanisms warrant further investigation. Therefore, the continued investigation of the lncRNAs identified in this study will aid the development of guidelines to assess individual risk for lung cancer and its prevention.