Figure 1.
DNA feature distributions in the promoters of lncRNA genes and protein-coding genes.
DNA feature distributions in the promoters of protein-coding and lncRNA genes. Blue line corresponds to promoters of protein-coding genes; red line corresponds to promoters of lncRNA genes. Figure 1a–d shows the distribution of each feature in a sliding window of 100 bp with a step of 50 bp, resulting in 39 windows on the plot. Figure 1e–f shows the percentage of promoters in which the features were found. Transparent regions correspond to the 5–95% bootstrap confidence interval of the statistic. WC: word commonality; PALIN: palindromes; CGI: CpG islands; RE: repetitive elements (all types of repeats except “simple repeats”, “low complexity regions” and “satellite repeats”). The enrichment score was calculated using a right-sided Fisher's exact test (Table S3).
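As a rough illustration of the windowing and the bootstrap interval described in this caption, the following minimal Python sketch may help. It is not the authors' code. The function names, the 2,000 bp region length (the caption does not state it; this is simply a length for which a 100 bp window with a 50 bp step gives 39 windows), the number of resamples, and the toy values are all assumptions made for the example.

```python
import random

def window_starts(length, size=100, step=50):
    """Start offsets of all full windows of `size` bp taken every `step` bp."""
    return list(range(0, length - size + 1, step))

def bootstrap_interval(values, n_boot=1000, lo=5, hi=95, seed=0):
    """Percentile bootstrap interval (lo..hi percent) for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement and record the mean of each resample.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    # Nearest-rank percentiles of the resampled means.
    lo_idx = int(n_boot * lo / 100)
    hi_idx = min(n_boot - 1, int(n_boot * hi / 100))
    return means[lo_idx], means[hi_idx]

# Assumed region length of 2,000 bp (hypothetical, see the note above).
starts = window_starts(2000)
print(len(starts))  # 39

# Toy data: 1 if a feature was found in a given promoter's window, else 0.
found = [0, 1, 1, 0, 1, 0, 0, 1]
print(bootstrap_interval(found))
```

Here the interval is taken over the fraction of promoters carrying a feature in one window; repeating the calculation for each of the 39 windows would give one band per window position.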
Figure 2.
Distribution of histone modification marks in the GM12878 cell line across lncRNA and protein-coding gene promoters.
The figure shows the fraction of all promoters covered by a particular chromatin mark. Blue line corresponds to promoters of protein-coding genes; red line corresponds to lncRNA gene promoters. Transparent regions correspond to the 5–95% bootstrap confidence interval of the statistic.
Figure 3.
Performance of the prediction model.
Quality of the models based on the complete feature set and on several combinations of features. RE: repetitive elements; PALIN: palindromes; SKEW: A/T and C/G skews; CGI: CpG islands; TFBS: transcription factor binding sites; WC: word commonality; CS: chromatin states; k-mer: mono-, di-, and trinucleotide frequencies; COMBINE: combination of all feature types for the complete promoter set (CPS).
Table 1.
Summary of the results for separating promoters of protein-coding genes from promoters of lncRNA genes using different combinations of features.
Table 2.
Summary of the results for separating promoters of protein-coding and lncRNA genes that have similar expression patterns in different cell lines.