snoRNA and piRNA expression levels modified by tobacco use in women with lung adenocarcinoma

  • Natasha Andressa Nogueira Jorge,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Writing – original draft, Writing – review & editing

    Affiliation Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz, Rio de Janeiro, RJ, Brazil

  • Gabriel Wajnberg,

    Roles Formal analysis, Methodology, Software, Writing – original draft, Writing – review & editing

    Affiliation Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz, Rio de Janeiro, RJ, Brazil

  • Carlos Gil Ferreira,

    Roles Investigation, Supervision, Writing – original draft, Writing – review & editing

    Affiliation D’Or Institute for Research and Education, Rio de Janeiro, RJ, Brazil

  • Benilton de Sa Carvalho,

    Roles Formal analysis, Funding acquisition, Methodology, Software, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Department of Statistics, State University of Campinas, Campinas, SP, Brazil

  • Fabio Passetti

    Roles Conceptualization, Data curation, Funding acquisition, Investigation, Methodology, Supervision, Writing – original draft, Writing – review & editing

    Affiliation Laboratory of Functional Genomics and Bioinformatics, Oswaldo Cruz Institute, Fundação Oswaldo Cruz, Rio de Janeiro, RJ, Brazil


Abstract

Lung cancer is one of the most frequent types of cancer worldwide. Most patients are diagnosed at an advanced stage and thus have a poor prognosis. Smoking is a risk factor for lung cancer; however, most smokers do not develop lung cancer, whereas 20% of women with lung adenocarcinoma are non-smokers. Therefore, these two groups may differ in ways other than smoking status, including in their gene expression signatures. Because non-coding RNAs show altered expression patterns in complex diseases, they are potential biomarkers for diagnosis and treatment. We searched publicly available small RNA high-throughput sequencing data for differentially and constitutively expressed PIWI-interacting RNAs and small nucleolar RNAs whose expression patterns could distinguish these two groups. Here, we report two sets of differentially expressed small non-coding RNAs, identified in normal and tumor tissues of women with lung adenocarcinoma, that discriminate between smokers and non-smokers. Our findings may offer new insights into the metabolic alterations caused by tobacco and may be useful for the early diagnosis of lung cancer.


Introduction

Lung cancer is one of the leading causes of cancer death in both men and women worldwide [1]. The most common type of lung cancer is non-small-cell lung cancer (NSCLC), which accounts for 85% of cases [2]. Lung cancer is often detected at an advanced clinical stage and therefore has a poor prognosis [3–5] and a high mortality rate.

Smoking is one of the risk factors for this disease [6]. Tobacco contains many well-known carcinogens that bind to DNA and induce somatic mutations, such as those observed in the KRAS gene in lung cancer and in TP53 in several cancer types [7]. However, most smokers do not develop lung cancer, whereas 20% of women with lung adenocarcinoma are non-smokers [8]. Therefore, factors other than smoking status may contribute to the development of lung cancer [9].

Cigarette smoke is known to change the transcriptome of the MSK-Leuk1 cell line [6], epithelial cells [10] and buccal mucosa [11], and it has been reported to alter biological pathways related to signal transduction, asthma and cell proliferation [6,12,13]. Thus, many studies have searched for gene expression differences between smokers and non-smokers [6,12], but none have evaluated the expression patterns of small nucleolar RNAs (snoRNAs) or PIWI-interacting RNAs (piRNAs).

piRNAs are 26 to 31 nucleotides long and form one of the least investigated classes of small non-coding RNAs (sncRNAs); many aspects of their biogenesis and mechanism of action are still unknown. The major role of piRNAs is to silence transposable elements [14,15] in germline cells (reviewed in [12, 13]), and they are thus involved in stem cell division, apoptosis, epigenetic control of transposons and telomeres, and translational control [16]. However, piRNAs are also expressed in somatic tissues [17–19], where they regulate gene expression by inducing histone modifications and DNA methylation [20,21]. Importantly, these molecules are altered in several types of cancer [22,19,23,24]. Thus, a better understanding of their expression patterns could advance our knowledge of lung cancer biology and contribute to early detection and improved survival.

snoRNAs, in turn, are an abundant sncRNA class comprising molecules 60 to 300 nucleotides long that associate with nucleolar enzymes, a methylase or a pseudouridine synthase, to form the small ribonucleoproteins responsible for rRNA methylation and pseudouridylation, respectively [25]. However, some snoRNAs have no identified target rRNA. Several groups have reported small RNAs 20 to 24 nucleotides long that derive from further processing of snoRNAs. These RNAs seem to act as microRNAs (miRNAs) [26], suggesting other functions for snoRNAs [27]. Several snoRNAs have been found to be altered under hypoxia [28] and oxidative stress [29]. Additionally, SNORA42 is amplified and up-regulated in NSCLC, and its levels inversely correlate with survival in NSCLC patients [30].

One of the most common approaches to studying sncRNAs is to produce a large-scale profile with techniques such as microarrays and then validate the findings with strategies such as RT-qPCR [31]. However, these approaches require probes or primers, which means that part of the sequence must be known a priori, making the identification of truly novel genes difficult. More recently, high-throughput sequencing (HTS) technology has been used to study sncRNAs. This technology is highly precise [32] and sensitive, allowing the quantification of gene expression and the detection of chromosome rearrangements, mutations, novel transcripts, isoforms and lowly expressed genes, as well as the identification of novel genes [33,34]. For instance, Müller and collaborators [24] used HTS to evaluate the coding and non-coding transcriptome of six pancreatic cancer samples. Besides several deregulated miRNAs, the authors found that the snoRNAs HBII-296B and U104, as well as the piRNA piR-017061, were differentially expressed in tumors compared with normal cells. However, the roles of piRNAs and snoRNAs in cancer remain largely unknown and warrant further investigation.

There are many commercially available HTS platforms, each with its own specifications, such as throughput, read length, error rate, cost and number of sequenced reads [32]. The vast amount of data produced can be stored in public databases and made available to the global scientific community. After quality control, normalization and evaluation of the information on each experiment, different datasets can be combined to improve the detection of weaker signals and generate new knowledge, without the burden of a single research center generating all the data. Rung and Brazma [35] surveyed how often public data from ArrayExpress were mentioned in papers published in 2011 and found that almost one in four papers used data from that database to answer biological questions different from those raised in the original studies that collected the data. For instance, Kröger and colleagues (2016) used several public microarray experiments to identify newly altered genetic pathways in blood mononuclear cells of systemic lupus erythematosus patients [36], while Gonzalez-Porta and colleagues (2012) used public RNA-Seq data from HapMap to evaluate different splicing patterns in Caucasians and Yorubas [37]. Also, Cao and colleagues (2015) [11] used public microarray data to identify more than 300 genes differentially expressed between smokers and non-smokers.

In this article, we report, for the first time, differentially and constitutively expressed piRNAs and snoRNAs in smoker versus non-smoker women with lung adenocarcinoma. Our data were obtained from publicly available datasets of patient samples and may offer new insights into sncRNA biology and into the effects of tobacco use on these molecules and on cancer biology.

Material and methods

We obtained two small RNA-sequencing datasets of lung adenocarcinoma samples from female patients: one from The Cancer Genome Atlas (TCGA) [38] and one from the work published by Kim and collaborators [39]. Throughout this study, we use the term ‘normal tissue’ to describe and discuss data obtained from the normal tissue adjacent to tumors.

Kim and collaborators [39] used the Illumina Genome Analyzer IIx to sequence the small RNAs (22 to 30 nucleotides long) present in six pairs of matched primary lung adenocarcinoma tumors and normal tissues from never-smoker women [NCBI GEO:GSE37764]. Although the authors only assessed miRNA expression, they mentioned that the raw data might contain sncRNAs other than miRNAs.

The 36-nucleotide reads from the data published by Kim and collaborators were inspected for quality and adaptor content. Adaptors were removed with Cutadapt [40], and only reads longer than 15 nucleotides were kept. In addition, any read with more than 10% of its bases with a quality score lower than 20 was removed using FASTQ Quality Filter from the FASTX-Toolkit (Gordon and Hannon, unpublished). The remaining reads were aligned to the human genome using Novoalign (version 3.02.13), keeping only reads that aligned to a single location in the human reference genome.
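The three read-level filters described above can be sketched in Python as follows. This is a minimal illustration, not the Cutadapt or FASTX-Toolkit implementation: adaptor removal here uses exact matching only (Cutadapt tolerates mismatches), the adaptor sequence `EXAMPLE_ADAPTER` and all function names are hypothetical, and quality characters are assumed to be Phred+33 encoded (older Illumina Genome Analyzer output may use Phred+64, so the `offset` parameter must match the data).

```python
"""Minimal sketch of the read-level filters: adaptor trimming, length and quality."""
from typing import Optional, Tuple

# Hypothetical 3' adaptor used only for illustration; use the adaptor of the actual library kit.
EXAMPLE_ADAPTER = "TGGAATTCTCGG"


def trim_adapter(seq: str, qual: str, adapter: str, min_overlap: int = 3) -> Tuple[str, str]:
    """Remove a 3' adaptor by exact matching (a simplification of Cutadapt)."""
    # A complete adaptor copy anywhere in the read: cut at its start.
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # A partial adaptor (a prefix of it) running off the 3' end of the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


def passes_quality(qual: str, min_q: int = 20, max_low_pct: int = 10, offset: int = 33) -> bool:
    """True unless more than max_low_pct % of bases have quality below min_q."""
    low = sum(1 for c in qual if ord(c) - offset < min_q)
    # Integer comparison avoids floating-point rounding at the 10% boundary.
    return low * 100 <= max_low_pct * len(qual)


def filter_read(seq: str, qual: str, adapter: str, min_len: int = 16,
                offset: int = 33) -> Optional[Tuple[str, str]]:
    """Apply the filters in order; return the trimmed read, or None if it is discarded."""
    seq, qual = trim_adapter(seq, qual, adapter)
    if len(seq) < min_len:  # keep only reads longer than 15 nt
        return None
    if not passes_quality(qual, offset=offset):
        return None
    return seq, qual


if __name__ == "__main__":
    read = "ACGTACGTACGTACGTACGT" + "TGGAATTCTC"  # 20-nt insert plus a partial adaptor
    print(filter_read(read, "I" * len(read), EXAMPLE_ADAPTER))
```

Trimming runs before the length check so that reads consisting mostly of adaptor are discarded, matching the order in which the steps are described.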

The TCGA project is a collaboration between the National Cancer Institute (NCI) and the National Human Genome Research Institute (NHGRI) that generated and made publicly available genomic, transcriptomic and methylomic data from 33 types of cancer. In January 2016, we downloaded miRNA-Seq data from paired normal tissue and tumor samples belonging to the same patients, obtaining a total of 6 paired samples from 6 women (dbGaP Data Access Committee project ID 43224–3).

TCGA data were generated using either the Illumina Genome Analyzer or the Illumina HiSeq. Files downloaded in BAM format were converted to FASTQ using BEDtools [41]. The converted files were aligned to the human genome using Novoalign with the same configuration described above, to allow the detection of sncRNAs other than miRNAs.

The piRNA annotation file was obtained from the piRNABank database [42], and overlapping annotations in the same orientation were grouped into clusters (S1 File). This modified GTF file contains 2,049 piRNA clusters and 11,710 non-overlapping piRNAs. The snoRNA annotation file was obtained from UCSC [43]. The piRNA and snoRNA annotation files were merged into a single file (S2 File). Any piRNA region that overlapped a snoRNA annotation was considered a snoRNA, and piRNA regions that overlapped miRNA annotations were discarded as annotation errors.
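A minimal Python sketch of the two annotation steps above follows. It assumes BED-like 0-based, half-open coordinates and records of the form (chrom, start, end, strand, name); the function names are hypothetical, the snoRNA and miRNA overlap test ignores strand (the text does not say), and a region overlapping both a snoRNA and a miRNA is assumed to be kept as snoRNA. It illustrates the rules, not the script used to build the S1 and S2 Files.

```python
"""Sketch of same-strand clustering of piRNA annotations and the overlap precedence rule."""
from itertools import groupby
from typing import List, Optional, Tuple

# (chrom, start, end, strand, name), 0-based half-open coordinates as in BED (assumption).
Interval = Tuple[str, int, int, str, str]


def cluster_same_strand(intervals: List[Interval]) -> List[Tuple[str, int, int, str, List[str]]]:
    """Merge overlapping annotations that lie on the same chromosome and strand."""
    ordered = sorted(intervals, key=lambda iv: (iv[0], iv[3], iv[1], iv[2]))
    clusters = []
    for (chrom, strand), group in groupby(ordered, key=lambda iv: (iv[0], iv[3])):
        current = None
        for _, start, end, _, name in group:
            if current is not None and start < current[2]:
                # Overlaps the open cluster: extend it and record the member.
                current[2] = max(current[2], end)
                current[4].append(name)
            else:
                if current is not None:
                    clusters.append(tuple(current))
                current = [chrom, start, end, strand, [name]]
        if current is not None:
            clusters.append(tuple(current))
    return clusters


def overlaps(a, b) -> bool:
    """Strand-agnostic overlap test on (chrom, start, end, ...) records (assumption)."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_pirna_region(region, snornas, mirnas) -> Optional[str]:
    """Precedence rule: snoRNA overlap relabels the region, miRNA overlap discards it."""
    if any(overlaps(region, s) for s in snornas):
        return "snoRNA"
    if any(overlaps(region, m) for m in mirnas):
        return None  # treated as an annotation error
    return "piRNA"
```

Clusters with a single member correspond to piRNAs that do not overlap any other annotation on the same strand.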

Raw counts for each annotated piRNA and snoRNA in samples sequenced with the same technology were obtained using BEDtools [41] (S3 File). Differentially expressed piRNAs and snoRNAs were identified using R version 3.4.0, following the manual of the Bioconductor package edgeR version 3.18.1 [44]. In short, we normalized the raw counts with the trimmed mean of M-values (TMM) method [45], expressed them as counts per million (CPM), and used negative binomial tests to identify differentially expressed genes. In this step, only small RNAs with at least 50 mapped reads in at least half of the samples and with counts greater than 0 in all but two samples were considered. We considered differentially expressed all snoRNAs and piRNAs with a false discovery rate (FDR) lower than 0.01 and a log2 fold change (logFC) greater than 2 or lower than -2. To validate our strategy, we first used it to identify differentially expressed miRNAs in the data from Kim and collaborators and from TCGA, using an annotation file obtained from miRBase version 20 [46].
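The count filter, normalization and calling thresholds above can be sketched in Python as follows. This does not reimplement edgeR: the TMM normalization factors and the negative binomial p-values are assumed to come from edgeR (for example, `calcNormFactors` and `exactTest`), library sizes are taken as column sums of the count matrix, and the two count criteria are read as applying jointly. Function names are hypothetical.

```python
"""Sketch of the expression filter, CPM normalization and DE thresholds (Python, not edgeR)."""
import numpy as np


def expression_filter(counts, min_reads=50, max_zero_samples=2):
    """Keep features with >= min_reads in at least half of the samples AND
    nonzero counts in all but at most max_zero_samples samples."""
    counts = np.asarray(counts)
    n_samples = counts.shape[1]
    enough_reads = (counts >= min_reads).sum(axis=1) >= n_samples / 2
    few_zeros = (counts == 0).sum(axis=1) <= max_zero_samples
    return enough_reads & few_zeros


def cpm(counts, norm_factors):
    """Counts per million using effective library sizes (column sum x TMM factor)."""
    counts = np.asarray(counts, dtype=float)
    effective_lib = counts.sum(axis=0) * np.asarray(norm_factors, dtype=float)
    return counts / effective_lib * 1e6


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (the default FDR reported by edgeR's topTags)."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    adjusted = p[order] * m / np.arange(1, m + 1)
    # Step-up monotonicity: running minimum from the largest p-value downwards.
    adjusted = np.minimum.accumulate(adjusted[::-1])[::-1]
    fdr = np.empty(m)
    fdr[order] = np.minimum(adjusted, 1.0)
    return fdr


def call_differential(logfc, pvalues, max_fdr=0.01, min_abs_logfc=2.0):
    """DE call as described in the text: FDR < 0.01 and logFC > 2 or < -2."""
    fdr = benjamini_hochberg(pvalues)
    return (fdr < max_fdr) & (np.abs(np.asarray(logfc, dtype=float)) > min_abs_logfc)
```

Both thresholds are strict inequalities, mirroring "lower than 0.01" and "greater than 2 or lower than -2".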

According to Eisenberg and Levanon, housekeeping genes are constitutively expressed in all cell types under normal conditions [47]. Therefore, to identify only the snoRNAs and piRNAs altered by tobacco use and not by tumor status, we performed a data dispersion analysis on samples from non-smokers and from smokers, following the procedures described in [47]. Briefly, we kept only sncRNAs with more than 1 CPM in all samples, whose log2 normalized counts did not deviate from their average by more than 2, and whose standard deviation was lower than 1. These sncRNAs were considered putative housekeeping genes.
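The three dispersion criteria can be expressed as a short Python filter. This sketch reads the deviation criterion as at most 2 units on the log2 scale (a fourfold change), as in the criteria of Eisenberg and Levanon [47], and takes the standard deviation as the sample standard deviation of log2 CPM; both readings are assumptions, and the function name is hypothetical.

```python
"""Sketch of the constitutive-expression (dispersion) filter after Eisenberg and Levanon."""
import numpy as np


def putative_housekeeping(cpm, min_cpm=1.0, max_log2_dev=2.0, max_sd=1.0):
    """Boolean mask of features passing all three criteria.

    cpm: normalized CPM matrix of shape (features, samples) for one smoking group,
    with normal and tumor samples analyzed together.
    """
    cpm = np.asarray(cpm, dtype=float)
    expressed = (cpm > min_cpm).all(axis=1)  # more than 1 CPM in every sample
    # Zeros give -inf/NaN on the log scale; such rows already fail `expressed`.
    with np.errstate(divide="ignore", invalid="ignore"):
        log_cpm = np.log2(cpm)
        mean = log_cpm.mean(axis=1, keepdims=True)
        # No sample deviates from the feature's average log2 CPM by more than max_log2_dev.
        no_outlier = (np.abs(log_cpm - mean) <= max_log2_dev).all(axis=1)
        # Sample standard deviation (ddof=1, like R's sd()), assumed here.
        low_sd = log_cpm.std(axis=1, ddof=1) < max_sd
    return expressed & no_outlier & low_sd
```

Running the filter separately on the non-smoker matrix (NNonS plus TNonS) and the smoker matrix (NS plus TS) yields the two lists of putative housekeeping sncRNAs that are then compared.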


Results

Methodology validation

Kim et al. [39] reported 40 differentially expressed miRNAs. Using our approach, we found 23 miRNAs differentially expressed between normal tissue and tumor samples from non-smokers. Twenty of these had also been reported by the original authors. Four other miRNAs reported by Kim et al. [39] were also detected by us but were removed by either the false discovery rate (FDR) or the logFC filter (S4 File).

The same approach was applied to normal tissue and tumor samples from smokers. We found 23 differentially expressed miRNAs (S5 File). Only 5 miRNAs were found in both analyses: 2 were up-regulated in tumors (hsa-miR-183-5p and hsa-miR-210-3p), while 3 were down-regulated in tumors (hsa-miR-144-5p, hsa-miR-451a and hsa-miR-30c-2-3p). The comparison between normal tissue samples from non-smokers and smokers revealed very distinct expression profiles, with 130 differentially expressed miRNAs (S6 File). Although this distinction was less clear when tumor samples were compared, we still found 135 differentially expressed miRNAs (S7 File).

Comparison between lung tissues of smokers and non-smokers

To detect differences in piRNA and snoRNA expression levels between lung samples from smoker and non-smoker women, we performed two differential expression analyses involving four groups of samples: normal non-smoker (NNonS) versus normal smoker (NS) samples (Fig 1A), and tumor non-smoker (TNonS) versus tumor smoker (TS) samples (Fig 1B).

Fig 1. Differential expression comparisons performed in this study and heatmaps.

A) Non-smoker normal versus smoker normal samples. B) Non-smoker tumor versus smoker tumor samples. Heatmaps of log2 normalized CPM counts for differentially expressed genes. Data below the blue bar are from non-smokers, and data below the gray bar are from smokers. The figure shows four sets of genes whose expression differs markedly according to smoking status.

First, we analyzed the genes differentially expressed in normal tissue samples from non-smokers compared with smokers. After applying our FDR and logFC criteria, we detected 49 snoRNAs differentially expressed between NNonS and NS samples (S8 File). According to our analysis, 29 snoRNAs are up-regulated in NNonS and 20 are down-regulated (Fig 1A). U60 (SNORD60) is the most up-regulated snoRNA (logFC = 6.22) and HBII-420 (SNORD99) is the most down-regulated (logFC = -5.25). The magnitudes of fold change are similar between up- and down-regulated genes. Table 1 shows the logFC and logCPM of the 10 greatest changes found in this analysis. No piRNA was differentially expressed between NNonS and NS.

Table 1. Top 10 differentially expressed snoRNAs between normal tissue samples from non-smokers and smokers.

The differential expression analysis of non-smoker tumor (TNonS) and smoker tumor (TS) samples identified 55 piRNAs or snoRNAs whose expression levels differed according to smoking status (S9 File and Fig 1B). In this analysis, 34 genes were up-regulated in TNonS and 21 were down-regulated. Again, U60 is the most up-regulated gene (logFC = 5.63), while U30 is the most down-regulated (logFC = -6.72). The magnitudes of fold change are also similar between the two groups. Table 2 shows the 10 greatest logFC changes. In this analysis, two piRNAs were up-regulated in TNonS: hsa-piR-010894-3 (logFC = 3.43) and hsa-piR-001168-4 (logFC = 2.95).

Table 2. Top 10 differentially expressed snoRNAs between tumor samples from smokers and non-smokers.

Discriminative expression profile between groups

Next, we evaluated whether the sncRNAs differentially expressed in each comparison could distinguish non-smokers from smokers by performing principal component analysis (PCA). This analysis separated non-smokers from smokers for both normal samples (Fig 2A) and tumor samples (Fig 2B). The greatest distinction was found in the comparison between tumors from non-smokers and smokers, in which the non-smoker samples showed a similar pattern while the smoker samples were dispersed.

Fig 2. PCA analysis of the differentially expressed piRNAs/snoRNAs.

A) Non-smoker normal (NNonS) versus smoker normal (NS) samples. B) Non-smoker tumor (TNonS) versus smoker tumor (TS) samples. Dots represent normal samples, and stars represent tumor samples. Light blue indicates non-smoker normal samples, dark blue non-smoker tumor samples, light gray smoker normal samples and dark gray smoker tumor samples. Both analyses show distinct expression patterns between non-smokers and smokers.
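A PCA such as the one in Fig 2 can be computed in outline from a singular value decomposition of the centered sample-by-feature matrix. The software used for the PCA is not stated, so this Python sketch is an assumption throughout: it takes log2 normalized CPM as input, centers each feature without scaling, and uses a hypothetical function name and toy data.

```python
"""Sketch of PCA on a log2-CPM matrix via singular value decomposition."""
import numpy as np


def pca_scores(log_cpm, n_components=2):
    """Project samples onto their first principal components.

    log_cpm: array of shape (features, samples), e.g. log2 normalized CPM of the
    differentially expressed sncRNAs. Returns (scores, explained_variance_ratio),
    where scores has shape (samples, n_components).
    """
    x = np.asarray(log_cpm, dtype=float).T  # rows = samples, columns = features
    x = x - x.mean(axis=0)                  # center each feature across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]  # sample coordinates on each PC
    variance = s ** 2
    return scores, variance[:n_components] / variance.sum()


if __name__ == "__main__":
    # Hypothetical toy data: 3 features x 4 samples, two samples per group.
    toy = [[5.0, 5.1, 1.0, 1.1],
           [5.0, 4.9, 1.0, 0.9],
           [1.0, 1.0, 5.0, 5.0]]
    scores, ratio = pca_scores(toy)
    print(np.round(scores, 2), np.round(ratio, 3))
```

The sign of each component is arbitrary, so group separation should be judged by whether samples of the same group fall on the same side of PC1, not by the sign itself.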

According to our analysis, U60 (SNORD60) is up-regulated in NNonS compared with NS, and the same gene is also up-regulated in TNonS compared with TS. The same trend is found for U30 (SNORD30), which is down-regulated in NNonS compared with NS and in TNonS compared with TS. In fact, more than half of the differentially expressed snoRNAs were detected in both analyses (28 of 49 in the normal comparison and 28 of 55 in the tumor comparison) (Table 3). The complete lists of differentially expressed sncRNAs are provided as supplementary material (S8 and S9 Files). Additional comparisons between normal and tumor samples from non-smokers and between normal and tumor samples from smokers are provided in S10 and S11 Files, respectively.

Table 3. snoRNAs differentially expressed in both normal tissue and tumor comparisons of non-smokers’ and smokers’ samples.

Based on this result, we investigated whether the expression patterns of these snoRNAs showed the same trend regardless of the pathological status of the samples. In this analysis, only snoRNAs and piRNAs that were differentially expressed in at least one set of samples were considered. No sncRNA showed opposite directions of change between the normal and tumor analyses (Pearson correlation coefficient = 0.80) (Fig 3). PCA with all differentially expressed piRNAs and snoRNAs also shows clearly distinct expression patterns between non-smokers and smokers (Fig 4). Normal and tumor samples from smokers also show distinct patterns, whereas normal and tumor samples from non-smokers show a similar pattern.
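The concordance check above, a Pearson correlation of the two sets of log2 fold changes plus a count of sign disagreements, can be sketched as follows. The function name and inputs are hypothetical; each pair is assumed to hold one sncRNA's logFC from the normal (NNonS versus NS) and tumor (TNonS versus TS) comparisons.

```python
"""Sketch: agreement between the logFCs of the normal and tumor comparisons."""
import math
from typing import Sequence, Tuple


def logfc_concordance(normal: Sequence[float], tumor: Sequence[float]) -> Tuple[float, int]:
    """Return (Pearson r, number of sncRNAs with opposite directions of change)."""
    if len(normal) != len(tumor) or len(normal) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    n = len(normal)
    mean_n = sum(normal) / n
    mean_t = sum(tumor) / n
    cov = sum((a - mean_n) * (b - mean_t) for a, b in zip(normal, tumor))
    var_n = sum((a - mean_n) ** 2 for a in normal)
    var_t = sum((b - mean_t) ** 2 for b in tumor)
    r = cov / math.sqrt(var_n * var_t)
    # Opposite direction: the logFC has different signs in the two comparisons.
    opposite = sum(1 for a, b in zip(normal, tumor) if a * b < 0)
    return r, opposite
```

A high correlation alone does not rule out a few sign flips, which is why the sketch reports the count of opposite directions separately.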

Fig 3. Scatterplot of the log2 fold change for the 28 snoRNAs shared between analyses.

Fig 4. PCA analysis of the differentially expressed piRNAs/snoRNAs.

The dots represent the normal samples, and the stars the tumor samples. Light blue indicates Non-smokers normal samples, dark blue the Non-smokers tumor samples, light gray the Smokers normal samples and dark gray the Smokers tumor samples. This PCA shows 3 distinct groups that correspond to Smokers normal samples, Smokers tumor samples and Non-smokers samples.

sncRNA expression dispersion analysis between non-smokers and smokers

We investigated patterns of data dispersion in samples from non-smokers and from smokers, regardless of their pathological status. We performed two data dispersion analyses: one with NNonS and TNonS samples and one with NS and TS samples.

After applying our constitutive expression criteria, we found 179 snoRNAs or piRNAs to be constitutively expressed in samples from non-smokers (S12 File) and 33 in samples from smokers (S13 File). The log2 normalized CPM of each sample, as well as the variance and standard deviation of the 10 snoRNAs with the lowest standard deviation in non-smokers and in smokers, are shown in Tables 4 and 5, respectively. All constitutive snoRNAs found in smokers were also found in non-smokers. However, the lowest variances and standard deviations were found in the non-smoker samples. The only exception is U25, whose expression is more uniform in smokers (Table 6). A total of 11 snoRNAs found to be constitutively expressed in both analyses were also differentially expressed in the non-smoker versus smoker comparisons: 2 are up-regulated in non-smokers and 9 are up-regulated in smokers (boldface in Table 3).

Table 4. Top 10 constitutively expressed snoRNA in samples from non-smokers.

Table 5. Top 10 constitutively expressed snoRNA in samples from smokers.

Table 6. snoRNAs found constitutively expressed in samples from both smoker and non-smoker patients.


Discussion

Small non-coding RNAs, such as snoRNAs and piRNAs, are involved in fundamental biological pathways and have been considered potential lung cancer biomarkers [25,30,48]. In this study, we compared the expression patterns of piRNAs, one of the least studied sncRNA classes, and snoRNAs, one of the most studied and best-known sncRNA classes, using publicly available data from matched normal and lung adenocarcinoma samples of smoker and non-smoker women.

We obtained 5 matched normal and tumor tissue samples from non-smoker women from the work of Kim and colleagues [39]. Additionally, we used TCGA data from 6 matched normal tissue and lung adenocarcinoma samples from smoker women. To validate our methodology, we first evaluated the profile of differentially expressed miRNAs and compared our results with the original publication. We found 20 miRNAs shared with the work of Kim and colleagues [39], while 4 other miRNAs reported by the authors were removed by our FDR and logFC filters. For smokers from TCGA, 23 miRNAs were found differentially expressed. Several studies support these findings [49–53]. Principal component analysis and multidimensional analysis showed clear differences between normal tissue and tumor samples from smokers and non-smokers (S6 and S7 Files).

In total, we identified 49 snoRNAs or piRNAs differentially expressed between NNonS and NS samples, 20 down-regulated and 29 up-regulated (Fig 1A). A total of 55 snoRNAs or piRNAs with altered expression were also identified between TNonS and TS samples, 34 down-regulated and 21 up-regulated (Fig 1B). These changes in expression profile are also supported by principal component analysis and multidimensional analysis (S8 and S9 Files). Twenty-eight snoRNAs were found in both analyses and showed the same expression pattern in normal tissue and tumor samples (Fig 3), further suggesting that this alteration in expression profile is related to smoking. One of these is the snoRNA U15A (SNORD15A), which is up-regulated in samples from non-smokers. Interestingly, this snoRNA has been reported to have a ‘miRNA-like’ function, being capable of silencing a reporter gene [54], and to be involved in the regulation of chromatin structure in fibroblasts [55]. SNORD15A may therefore have roles beyond the modification of ribosomal RNAs, and our results suggest that smoking may alter its expression profile.

Of the 28 sncRNAs altered between non-smokers and smokers, the data dispersion analysis showed that 11 snoRNAs had constant expression across normal tissue and tumor samples from individuals in the same smoking status group. Only two of them were up-regulated in non-smoker samples: U60 (SNORD60) and U63 (SNORD63). U60 has been reported as an attenuator of pulmonary vasoconstriction in rats [56] and as a key factor in the regulation of plasma membrane cholesterol [57], while U63 (SNORD63) is located in a chromosomal region frequently deleted in myelodysplastic syndromes [58]. Thus, we hypothesize that smoking may inhibit the expression of snoRNAs that are important for cell maintenance.

Interestingly, many of the remaining 9 snoRNAs up-regulated in smokers have been reported as altered in different cancer types. One example is HBII-142 (SNORD66), located in a chromosomal region frequently amplified in tumors [25]. SNORD66 has already been suggested as a biomarker candidate for lung cancer because it is up-regulated in the plasma of lung cancer patients [59]. Likewise, U30 (SNORD30) was reported as up-regulated in pediatric gliomas [60] and seems to correlate with shorter time to progression in multiple myeloma [61]. Another differentially expressed snoRNA was HBI-100 (SCARNA3), which was found to be up-regulated in breast cancer samples [62]. In peripheral T-cell lymphoma, U59B (SNORD59B) correlated with long-term survival [63]. Taken together, our findings suggest that tobacco use shifts the expression profile of several snoRNAs toward a pattern similar to that observed in the malignant phenotype.

Many snoRNAs still have no identified target, and their expression patterns remain unknown. Lan and collaborators [64] assessed the expression of HBII-420 (SNORD99) and ACA44 (SNORA44). Both genes are located in introns of the long non-coding RNA SNHG12, and knocking down the host gene does not alter snoRNA expression [64]. Further studies on snoRNA targets may reveal new biological pathways affected by smoking. Such findings could help to better understand different tobacco-related pathologies and improve their treatment.

We found that U44 was up-regulated in samples from non-smokers and constitutively expressed in samples from non-smokers. Notably, this snoRNA is frequently used for normalization in qRT-PCR experiments [65]. Our findings suggest that using it as a reference may bias comparisons between non-smokers and smokers. Similar alterations in the expression of this snoRNA have already been reported in colorectal and breast cancer [65,66].

Of the 33 snoRNAs found to be constitutively expressed, 16 were not significantly different between smokers and non-smokers, suggesting that they are not modified by tobacco use. Among those, U43 (SNORD43) is frequently used as a reference for normalization in miRNA expression profiling experiments because of its stable levels, which reinforces our findings [67]. Further studies on the expression levels of these snoRNAs may reveal other candidate references for normalization.

As for piRNAs, PIWI proteins and piRNAs are known to protect genome integrity and stability by regulating transposable elements. In addition, PIWI proteins have been reported to be differentially expressed in NSCLC and to be associated with patient survival [68–70]. It has been speculated that aberrant transposable element activity could increase the number of deletions, rearrangements and duplications frequently observed in the genomes of cancer cells [71].

Although snoRNAs and piRNAs are receiving more attention from research groups, there are still few reports assessing changes in their expression under different biological conditions, and, at the time of submission of this paper, we could not find any report evaluating the expression patterns of this many snoRNAs and piRNAs in samples from smokers and non-smokers.

To our knowledge, many lung cancer studies searching for genes that play a role in the disease did not separate data from non-smoker and smoker samples [30,48–50,53]. However, here we show that these two groups have distinct gene expression profiles, in line with data generated elsewhere [12,13]. Therefore, data from these two groups of individuals should be considered separately in future studies to avoid introducing errors and confounding factors that may lead to misleading results.

Our results show that smoking modifies the expression of many sncRNAs, shifting their expression profile toward one that is more similar to that reported in different types of cancer. Here we report distinct sets of sncRNAs that can distinguish smokers from non-smokers and that should be considered when analyzing data from these two groups. Further studies on sncRNA targets should reveal new affected biological pathways.


Conclusion

The identification of a molecular signature in lung cancer that permits discrimination between tumors from smokers and non-smokers is still a challenge. In this article, we report several snoRNAs and piRNAs that are differentially expressed in lung adenocarcinoma samples from smoker and non-smoker women. We believe that these sets of constitutively and differentially expressed snoRNAs and piRNAs could be used in the future to improve the molecular diagnosis and treatment of lung cancer patients. Our findings highlight the importance of studying sncRNAs in cancer biology and their potential application as biomarkers in the era of precision medicine.

Supporting information

S2 File. snoRNA piRNA annotation file in BED format.

S4 File. Non-smoker Normal vs Tumor miRNA analysis.

S5 File. Smoker Normal vs Tumor miRNA analysis.

S6 File. Normal Non-smoker vs Smoker miRNA analysis.

S7 File. Tumor Non-smoker vs Smoker miRNA analysis.

S8 File. Normal Non-smoker vs Smoker snoRNA piRNA analysis.

S9 File. Tumor Non-smoker vs Smoker snoRNA piRNA analysis.

S10 File. Non-smoker Normal vs Tumor snoRNA piRNA analysis.

S11 File. Smoker Normal vs Tumor snoRNA piRNA analysis.



Acknowledgments

The authors acknowledge Plataforma de Bioinformática da Fiocruz RPT04A/RJ and Novocraft support, the University of Pittsburgh, the Christiana Healthcare, the Johns Hopkins, and the ABS–Indiana University–Purdue University Indiana tissue source sites and Canada’s Michael Smith Genome Sciences Centre sequencing. All authors acknowledge Otacilio da Cruz Moreira, Mariana Caldas Waghabi, Teca Calcagno Galvão, Rui Manuel Reis, Renato José da Silva-Oliveira, and Adriana Cruvinel-Carloni for comments and assistance.


References

  1. Saalberg Y, Wolff M. VOC breath biomarkers in lung cancer. Clin Chim Acta. 2016;459: 5–9. pmid:27221203
  2. Mendes R, Carreira B, Baptista PV, Fernandes AR. Non-small cell lung cancer biomarkers and targeted therapy—two faces of the same coin fostered by nanotechnology. Expert Rev Precis Med Drug Dev. 2016;1: 155–168.
  3. Herbst RS, Heymach JV, Lippman SM. Lung cancer. N Engl J Med. 2008;359: 1367–80. pmid:18815398
  4. Li L, Zhu T, Gao Y-F, Zheng W, Wang C-J, Xiao L, et al. Targeting DNA Damage Response in the Radio(Chemo)therapy of Non-Small Cell Lung Cancer. Int J Mol Sci. 2016;17: 839. pmid:27258253
  5. Gyoba J, Shan S, Roa W, Bédard ELR. Diagnosing Lung Cancers through Examination of Micro-RNA Biomarkers in Blood, Plasma, Serum and Sputum: A Review and Summary of Current Literature. Int J Mol Sci. 2016;17: 494. doi:10.3390/ijms17040494
  6. Gumus ZH, Du B, Kacker A, Boyle JO, Bocker JM, Mukherjee P, et al. Effects of Tobacco Smoke on Gene Expression and Cellular Pathways in a Cellular Model of Oral Leukoplakia. Cancer Prev Res. 2008;1: 100–111. pmid:19138943
  7. Bialous SA, Sarna L. Lung Cancer and Tobacco. Nurs Clin North Am. 2017;52: 53–63. pmid:28189166
  8. Vavalà T, Mariniello A, Reale ML, Novello S. Gender differences in lung cancer. Ital J Gender-Specific Med. 2016;2: 99–109.
  9. Huang J-Y, Jian Z-H, Nfor ON, Ku W-Y, Ko P-C, Lung C-C, et al. The effects of pulmonary diseases on histologic types of lung cancer in both sexes: a population-based study in Taiwan. BMC Cancer. 2015;15: 834. pmid:26526071
  10. Spira A, Beane J, Shah V, Liu G, Schembri F, Yang X, et al. Effects of cigarette smoke on the human airway epithelial cell transcriptome. Proc Natl Acad Sci. 2004;101: 10143–10148. pmid:15210990
  11. Cao C, Chen J, Lyu C, Yu J, Zhao W, Wang Y, et al. Bioinformatics Analysis of the Effects of Tobacco Smoke on Gene Expression. PLoS One. 2015;10: e0143377. pmid:26629988
  12. Wang J, Li MD. Common and Unique Biological Pathways Associated with Smoking Initiation/Progression, Nicotine Dependence, and Smoking Cessation. Neuropsychopharmacology. 2010;35: 702–719. pmid:19890259
  13. Minică CC, Mbarek H, Pool R, Dolan CV, Boomsma DI, Vink JM. Pathways to smoking behaviours: biological insights from the Tobacco and Genetics Consortium meta-analysis. Mol Psychiatry. 2017;22: 82–88. pmid:27021816
  14. Le Thomas A, Tóth KF, Aravin AA. To be or not to be a piRNA: genomic origin and processing of piRNAs. Genome Biol. 2014;15: 204. pmid:24467990
  15. Siomi MC, Sato K, Pezic D, Aravin AA. PIWI-interacting small RNAs: the vanguard of genome defence. Nat Rev Mol Cell Biol. 2011;12: 246–58. pmid:21427766
  16. Klattenhoff C, Theurkauf W. Biogenesis and germline functions of piRNAs. Development. 2008;135: 3–9. pmid:18032451
  17. Yan Z, Hu HYH, Jiang X, Maierhofer V, Neb E, He L, et al. Widespread expression of piRNA-like molecules in somatic tissues. Nucleic Acids Res. 2011;39: 6596–607. pmid:21546553
  18. Yang Q, Hua J, Wang L, Xu B, Zhang H, Ye N, et al. MicroRNA and piRNA profiles in normal human testis detected by next generation sequencing. PLoS One. 2013;8: e66809. pmid:23826142
  19. Fu A, Jacobs DI, Hoffman AE, Zheng T, Zhu Y. PIWI-interacting RNA 021285 is involved in breast tumorigenesis possibly by remodeling the cancer epigenome. Carcinogenesis. 2015;36: 1094–102. pmid:26210741
  20. Rajasethupathy P, Antonov I, Sheridan R, Frey S, Sander C, Tuschl T, et al. A role for neuronal piRNAs in the epigenetic control of memory-related synaptic plasticity. Cell. 2012;149: 693–707. pmid:22541438
  21. Ross RJ, Weiner MM, Lin H. PIWI proteins and PIWI-interacting RNAs in the soma. Nature. 2014;505: 353–9. pmid:24429634
  22. Busch J, Ralla B, Jung M, Wotschofsky Z, Trujillo-Arribas E, Schwabe P, et al. Piwi-interacting RNAs as novel prognostic markers in clear cell renal cell carcinomas. J Exp Clin Cancer Res. 2015;34: 61. pmid:26071182
  23. Cheng J, Guo J-M, Xiao B-X, Miao Y, Jiang Z, Zhou H, et al. piRNA, the new non-coding RNA, is aberrantly expressed in human cancer cells. Clin Chim Acta. 2011;412: 1621–5. pmid:21616063
  24. Müller S, Raulefs S, Bruns P, Afonso-Grunz F, Plötner A, Thermann R, et al. Next-generation sequencing reveals novel differentially regulated mRNAs, lncRNAs, miRNAs, sdRNAs and a piRNA in pancreatic cancer. Mol Cancer. 2015;14: 94. pmid:25910082
  25. Thorenoor N, Slaby O. Small nucleolar RNAs functioning and potential roles in cancer. Tumor Biol. 2015;36: 41–53. pmid:25420907
  26. Taft RJ, Glazov EA, Lassmann T, Hayashizaki Y, Carninci P, Mattick JS. Small RNAs derived from snoRNAs. RNA. 2009;15: 1233–1240. pmid:19474147
  27. Stepanov GA, Filippova JA, Komissarov AB, Kuligina EV, Richter VA, Semenov DV. Regulatory role of small nucleolar RNAs in human diseases. Biomed Res Int. 2015;2015: 206849. pmid:26060813
  28. Liu Z, Yang G, Zhao T, Cao G, Xiong L, Xia W, et al. Small ncRNA expression and regulation under hypoxia in neural progenitor cells. Cell Mol Neurobiol. 2011;31: 1–5. pmid:20886369
  29. Michel CI, Holley CL, Scruggs BS, Sidhu R, Brookheart RT, Listenberger LL, et al. Small nucleolar RNAs U32a, U33, and U35a are critical mediators of metabolic stress. Cell Metab. 2011;14: 33–44. pmid:21723502
  30. Mei Y-P, Liao J-P, Shen J, Yu L, Liu B-L, Liu L, et al. Small nucleolar RNA 42 acts as an oncogene in lung tumorigenesis. Oncogene. 2012;31: 2794–804. pmid:21986946
  31. Tainsky M. Genomic and proteomic biomarkers for cancer: a multitude of opportunities. … Biophys Acta (BBA)-Reviews Cancer. 2009;1796: 176–193.
  32. Zhou L, Li X, Liu Q, Zhao F, Wu J. Small RNA transcriptome investigation based on next-generation sequencing technology. J Genet Genomics. 2011;38: 505–513. pmid:22133681
  33. Meyerson M, Gabriel S, Getz G. Advances in understanding cancer genomes through second-generation sequencing. Nat Rev Genet. 2010;11: 685–696. pmid:20847746
  34. Jorge NAN, Ferreira CG, Passetti F. Bioinformatics of Cancer ncRNA in High Throughput Sequencing: Present State and Challenges. Front Genet. 2012;3: 287. pmid:23251139
  35. Rung J, Brazma A. Reuse of public genome-wide gene expression data. Nat Rev Genet. 2012;14: 89–99. pmid:23269463
  36. Kröger W, Mapiye D, Entfellner J-BD, Tiffin N. A meta-analysis of public microarray data identifies gene regulatory pathways deregulated in peripheral blood mononuclear cells from individuals with Systemic Lupus Erythematosus compared to those without. BMC Med Genomics. 2016;9: 66. pmid:27846842
  37. Gonzalez-Porta M, Calvo M, Sammeth M, Guigo R. Estimation of alternative splicing variability in human populations. Genome Res. 2012;22: 528–538. pmid:22113879
  38. The Cancer Genome Atlas [Internet]. Available from:
  39. Kim SCS, Jung YY, Park JJ, Cho S, Seo C, Kim JJJ, et al. A high-dimensional, deep-sequencing study of lung adenocarcinoma in female never-smokers. PLoS One. 2013;8: e55596. pmid:23405175
  40. Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011;17: 10–12.
  41. Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26: 841–2. pmid:20110278
  42. Sai Lakshmi S, Agrawal S. piRNABank: a web resource on classified and clustered Piwi-interacting RNAs. Nucleic Acids Res. 2008;36: D173–7. pmid:17881367
  43. Rosenbloom KR, Armstrong J, Barber GP, Casper J, Clawson H, Diekhans M, et al. The UCSC Genome Browser database: 2015 update. Nucleic Acids Res. 2015;43: D670–81. pmid:25428374
  44. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26: 139–40. pmid:19910308
  45. Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNA-seq data. Genome Biol. 2010;11: R25. pmid:20196867
  46. Kozomara A, Griffiths-Jones S. miRBase: integrating microRNA annotation and deep-sequencing data. Nucleic Acids Res. 2011;39: D152–7. pmid:21037258
  47. Eisenberg E, Levanon EY. Human housekeeping genes, revisited. Trends Genet. 2013;29: 569–574. pmid:23810203
  48. Zheng D, Zhang J, Ni J, Luo J, Wang J, Tang L, et al. Small nucleolar RNA 78 promotes the tumorigenesis in non-small cell lung cancer. J Exp Clin Cancer Res. 2015;34: 49. pmid:25975345
  49. Zhu W, Zhou K, Zha Y, Chen D, He J, Ma H, et al. Diagnostic Value of Serum miR-182, miR-183, miR-210, and miR-126 Levels in Patients with Early-Stage Non-Small Cell Lung Cancer. PLoS One. 2016;11: e0153046. pmid:27093275
  50. Chen S, Li P, Li J, Wang Y, Du Y, Chen X, et al. MiR-144 Inhibits Proliferation and Induces Apoptosis and Autophagy in Lung Cancer Cells by Targeting TIGAR. Cell Physiol Biochem. 2015;35: 997–1007. pmid:25660220
  51. Iwaya T, Yokobori T, Nishida N, Kogo R, Sudo T, Tanaka F, et al. Downregulation of miR-144 is associated with colorectal cancer progression via activation of mTOR signaling pathway. Carcinogenesis. 2012;33: 2391–2397. pmid:22983984
  52. Pan X, Wang R, Wang Z-X. The Potential Role of miR-451 in Cancer Diagnosis, Prognosis, and Therapy. Mol Cancer Ther. 2013;12: 1153–1162. pmid:23814177
  53. Xia Y, Chen Q, Zhong Z, Xu C, Wu C, Liu B, et al. Down-Regulation of MiR-30c Promotes the Invasion of Non-Small Cell Lung Cancer by Targeting MTA1. Cell Physiol Biochem. 2013;32: 476–485. pmid:23988701
  54. Brameier M, Herwig A, Reinhardt R, Walter L, Gruber J. Human box C/D snoRNAs with miRNA like functions: expanding the range of regulatory RNAs. Nucleic Acids Res. 2011;39: 675–686. pmid:20846955
  55. Dupuis-Sandoval F, Poirier M, Scott MS. The emerging landscape of small nucleolar RNAs in cell biology. Wiley Interdiscip Rev RNA. 2015;6: 381–397. pmid:25879954
  56. Burghuber OC, Strife R, Zirolli J, Mathias MM, Murphy RC, Reeves JT, et al. Hydrogen peroxide induced pulmonary vasoconstriction in isolated rat lungs is attenuated by U60,257, a leucotriene synthesis blocker. Wien Klin Wochenschr. 1986;98: 117–9. pmid:3518246
  57. Brandis KA, Gale S, Jinn S, Langmade SJ, Dudley-Rucker N, Jiang H, et al. Box C/D Small Nucleolar RNA (snoRNA) U60 Regulates Intracellular Cholesterol Trafficking. J Biol Chem. 2013;288: 35703–35713. pmid:24174535
  58. Graubert TA, Payton MA, Shao J, Walgren RA, Monahan RS, Frater JL, et al. Integrated Genomic Analysis Implicates Haploinsufficiency of Multiple Chromosome 5q31.2 Genes in De Novo Myelodysplastic Syndromes Pathogenesis. PLoS One. 2009;4: e4583. pmid:19240791
  59. Liao J, Yu L, Mei Y, Guarnera M, Shen J, Li R, et al. Small nucleolar RNA signatures as biomarkers for non-small-cell lung cancer. Mol Cancer. 2010;9: 198. pmid:20663213
  60. Jha P, Agrawal R, Pathak P, Kumar A, Purkait S, Mallik S, et al. Genome-wide small noncoding RNA profiling of pediatric high-grade gliomas reveals deregulation of several miRNAs, identifies downregulation of snoRNA cluster HBII-52 and delineates H3F3A and TP53 mutant-specific miRNAs and snoRNAs. Int J Cancer. 2015;137: 2343–2353. pmid:25994230
  61. Lopez-Corral L, Mateos MV, Corchete LA, Sarasquete ME, de la Rubia J, de Arriba F, et al. Genomic analysis of high-risk smoldering multiple myeloma. Haematologica. 2012;97: 1439–1443. pmid:22331267
  62. Liang F, Qu H, Lin Q, Yang Y, Ruan X, Zhang B, et al. Molecular biomarkers screened by next-generation RNA sequencing for non-sentinel lymph node status prediction in breast cancer patients with metastatic sentinel lymph nodes. World J Surg Oncol. 2015;13: 258. pmid:26311227
  63. Valleron W, Ysebaert L, Berquet L, Fataccioli V, Quelen C, Martin A, et al. Small nucleolar RNA expression profiling identifies potential prognostic markers in peripheral T-cell lymphoma. Blood. 2012;120: 3997–4005. pmid:22990019
  64. Lan T, Ma W, Hong Z, Wu L, Chen X, Yuan Y. Long non-coding RNA small nucleolar RNA host gene 12 (SNHG12) promotes tumorigenesis and metastasis by targeting miR-199a/b-5p in hepatocellular carcinoma. J Exp Clin Cancer Res. 2017;36: 11. pmid:28073380
  65. Krell J, Frampton AE, Mirnezami R, Harding V, De Giorgio A, Roca Alonso L, et al. Growth Arrest-Specific Transcript 5 Associated snoRNA Levels Are Related to p53 Expression and DNA Damage in Colorectal Cancer. PLoS One. 2014;9: e98561. pmid:24926850
  66. Gee HE, Buffa FM, Camps C, Ramachandran A, Leek R, Taylor M, et al. The small-nucleolar RNAs commonly used for microRNA normalisation correlate with tumour pathology and prognosis. Br J Cancer. 2011;104: 1168–77. pmid:21407217
  67. Patnaik SK, Kannisto E, Knudsen S, Yendamuri S. Evaluation of MicroRNA Expression Profiles That May Predict Recurrence of Localized Stage I Non-Small Cell Lung Cancer after Surgical Resection. Cancer Res. 2010;70: 36–45. pmid:20028859
  68. Qu X, Liu J, Zhong X, Li X, Zhang Q. PIWIL2 promotes progression of non-small cell lung cancer by inducing CDK2 and Cyclin A expression. J Transl Med. 2015;13: 301. pmid:26373553
  68. 68. Qu X, Liu J, Zhong X, Li X, Zhang Q. PIWIL2 promotes progression of non-small cell lung cancer by inducing CDK2 and Cyclin A expression. J Transl Med. 2015;13: 301. pmid:26373553
  69. Navarro A, Tejero R, Viñolas N, Cordeiro A, Marrades RM, Fuster D, et al. The significance of PIWI family expression in human lung embryogenesis and non-small cell lung cancer. Oncotarget. 2015;6: 31544–56. pmid:25742785
  70. Moisés J, Navarro A, Tejero R, Viñolas N, Cordeiro A, Marrades RM, et al. PIWI proteins as prognostic markers in non small cell lung cancer. Eur Respir J. 2015;46.
  71. 71. Qu X, Liu J, Zhong X, Li X, Zhang Q, Siegel R, et al. PIWIL2 promotes progression of non-small cell lung cancer by inducing CDK2 and Cyclin A expression. J Transl Med. BioMed Central; 2015;13: 301. pmid:26373553