Fig 1.
Analytical workflow for the transcriptome-wide identification of PASR (promoter-associated small RNA) and TASR (terminus-associated small RNA) peaks on protein-coding genes of Arabidopsis.
Four small RNA (sRNA) high-throughput sequencing data sets, GSM707678 (flowers of wild-type plants), GSM707679 (leaves of wild-type plants), GSM707680 (roots of wild-type plants) and GSM707681 (seedlings of wild-type plants), were used for this analysis. Each step is shown, including the parameters used for PASR and TASR identification. TSS: transcription start site. Terminus: transcription terminus. RPM: reads per million. See “Materials and Methods” for a detailed description.
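As an illustration of the RPM (reads per million) unit used in the workflow, the following minimal sketch scales raw read counts by library size. The function names and the choice of denominator (the sum of the supplied counts unless a total is given) are assumptions made for illustration; the normalization actually used in this study is described in “Materials and Methods”.

```python
def reads_per_million(raw_count, total_reads):
    """Scale a raw read count to reads per million (RPM)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return raw_count * 1_000_000 / total_reads


def normalize_library(counts, total_reads=None):
    """Return {sequence: RPM} for one sRNA library.

    counts: dict mapping an sRNA sequence to its raw read count.
    total_reads: library size; assumed here to default to the sum of counts.
    """
    if total_reads is None:
        total_reads = sum(counts.values())
    return {seq: reads_per_million(n, total_reads) for seq, n in counts.items()}
```

Normalizing each library in this way allows abundances from data sets of different sequencing depths (for example, the four organ-specific libraries) to be compared on a common scale.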
Fig 2.
Examples of chloroplast genes generating PASRs (promoter-associated small RNAs) or TASRs (terminus-associated small RNAs) predominantly in the green organ (leaves) of Arabidopsis.
(A) A PASR peak was identified on the sense strand of ATCG00540 (encoding photosynthetic electron transfer A). The x axis shows the genomic positions surrounding the TSS (transcription start site, marked by a black vertical bar) of this gene. The y axis shows the abundance of the small RNAs perfectly mapped onto the genomic region surrounding the TSS; the same applies to the other figure panels. (B) PASR peaks were identified on both strands of ATCG01120 (chloroplast ribosomal protein S15). The x axis shows the genomic positions surrounding the TSS of this gene. (C) A TASR peak was identified on the sense strand of ATCG00270 (photosystem II reaction center protein D). The x axis shows the genomic positions surrounding the transcription terminus (marked by a black vertical bar) of this gene.
Fig 3.
Sequence characteristics of the PASRs (promoter-associated small RNAs) and the TASRs (terminus-associated small RNAs).
(A) Sequence length distribution of the PASRs identified on the sense strands of the protein-coding genes of Arabidopsis. (B) Sequence length distribution of the PASRs identified on the antisense strands of the protein-coding genes of Arabidopsis. (C) Sequence length distribution of the TASRs identified on the sense strands of the protein-coding genes of Arabidopsis. (D) Sequence length distribution of the TASRs identified on the antisense strands of the protein-coding genes of Arabidopsis. (E) 5’ terminal nucleotide composition of the PASRs identified on the sense strands of the protein-coding genes of Arabidopsis. (F) 5’ terminal nucleotide composition of the PASRs identified on the antisense strands of the protein-coding genes of Arabidopsis. (G) 5’ terminal nucleotide composition of the TASRs identified on the sense strands of the protein-coding genes of Arabidopsis. (H) 5’ terminal nucleotide composition of the TASRs identified on the antisense strands of the protein-coding genes of Arabidopsis.
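The two sequence characteristics summarized in these panels can be computed along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, each sequence is counted once (whether the published panels weight sequences by read abundance is not specified here), and T is reported as U.

```python
from collections import Counter


def length_distribution(seqs):
    """Return {length: fraction of sequences}, sorted by length."""
    if not seqs:
        raise ValueError("seqs must not be empty")
    counts = Counter(len(s) for s in seqs)
    total = len(seqs)
    return {length: n / total for length, n in sorted(counts.items())}


def five_prime_composition(seqs):
    """Return the fraction of sequences starting with A, C, G or U.

    Any other 5' character (e.g. N) counts toward the total but is
    not reported as a separate category.
    """
    counts = Counter(s[0].upper().replace("T", "U") for s in seqs if s)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no non-empty sequences supplied")
    return {nt: counts.get(nt, 0) / total for nt in "ACGU"}
```

Applying both functions separately to the sense-strand and antisense-strand PASR and TASR sets would yield the eight summaries corresponding to panels (A) to (H).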
Fig 4.
PASRs (promoter-associated small RNAs) identified on both strands of AT5G48000, and small RNA (sRNA) and double-stranded RNA (dsRNA) high-throughput sequencing (HTS)-based evidence showing the potential Argonaute (AGO) loading preference and biogenesis pathways of PASRs.
(A) Total: Initially, four sRNA HTS data sets belonging to GEO (Gene Expression Omnibus; www.ncbi.nlm.nih.gov/geo) accession ID GSE28591 were used to identify the PASR peak near the TSS (transcription start site, marked by a red vertical bar) of the gene. Refer to Fig 1 for the analytical workflow. (B) AGO: Eight sRNA HTS data sets belonging to GSE28591 were divided into an AGO1 data group (GSM707682, GSM707683, GSM707684 and GSM707685) and an AGO4 data group (GSM707686, GSM707687, GSM707688 and GSM707689). To analyze the AGO loading preference of the PASRs, their accumulation levels were compared between the two AGO-associated sRNA HTS data groups. Higher accumulation levels of the PASRs were detected in the AGO4 data group, which is denoted by black lines. (C) RDR- and DCL-dependence: sRNA HTS data sets from different mutants (including dcl and rdr mutants) involved in sRNA biogenesis pathways were used for this analysis. Markedly reduced PASR abundances observed in specific mutants are denoted by black lines. (D) Pol IV-dependence: sRNA HTS data sets from two RNA polymerase (Pol) IV mutants (nrpd1a and nrpd1b, both denoted by black lines) were used to analyze the dependence of PASR biogenesis on Pol IV. In all figure panels, the regions covered by dsRNA sequencing reads (positions are given at the top right) are highlighted with a semitransparent red (sense strand) or green (antisense strand) background. For a detailed explanation of the HTS data sets, please refer to “Data sources” in the “Materials and Methods” section.
Fig 5.
TASRs (terminus-associated small RNAs) identified on both strands of AT3G41762, and small RNA (sRNA) and double-stranded RNA (dsRNA) high-throughput sequencing (HTS)-based evidence showing the potential Argonaute (AGO) loading preference and biogenesis pathways of TASRs.
(A) Total: Initially, four sRNA HTS data sets belonging to GEO (Gene Expression Omnibus; www.ncbi.nlm.nih.gov/geo) accession ID GSE28591 were used to identify the TASR peak near the transcription terminus (marked by a red vertical bar) of the gene. Refer to Fig 1 for the analytical workflow. (B) AGO: Eight sRNA HTS data sets belonging to GSE28591 were divided into an AGO1 data group (GSM707682, GSM707683, GSM707684 and GSM707685) and an AGO4 data group (GSM707686, GSM707687, GSM707688 and GSM707689). To analyze the AGO loading preference of the TASRs, their accumulation levels were compared between the two AGO-associated sRNA HTS data groups. Higher accumulation levels of the TASRs were detected in the AGO4 data group, which is denoted by black lines. (C) RDR- and DCL-dependence: sRNA HTS data sets from different mutants (including dcl and rdr mutants) involved in sRNA biogenesis pathways were used for this analysis. Markedly reduced TASR abundances observed in specific mutants are denoted by black lines. (D) Pol IV-dependence: sRNA HTS data sets from two RNA polymerase (Pol) IV mutants (nrpd1a and nrpd1b; nrpd1a is denoted by black lines) were used to analyze the dependence of TASR biogenesis on Pol IV. In all figure panels, the regions covered by dsRNA sequencing reads (positions are given at the top right) are highlighted with a semitransparent red (sense strand) or green (antisense strand) background. For a detailed explanation of the HTS data sets, please refer to “Data sources” in the “Materials and Methods” section.
Fig 6.
Examples of site-specific DNA methylation potentially mediated by PASRs (promoter-associated small RNAs) and TASRs (terminus-associated small RNAs) in Arabidopsis.
(A) Site-specific DNA methylation signals were observed within the genomic region surrounding the TSS (transcription start site; marked by a vertical dashed line) of AT1G53265. Correspondingly, abundant small RNAs (i.e., PASRs) mapped onto this region. (B) Site-specific DNA methylation signals were detected within the genomic region surrounding the transcription terminus of AT5G54700. Correspondingly, abundant small RNAs (i.e., TASRs) mapped onto this region. The Arabidopsis epigenome maps (neomorph.salk.edu/epigenome/epigenome.html) [32] were used for this analysis.
Table 1.
Examples of paired PASR (promoter-associated small RNA) and TASR (terminus-associated small RNA) peaks potentially mediating site-specific DNA methylation, and analysis of their biogenesis and action pathways.
Fig 7.
Proposed biogenesis pathways and action modes of the PASRs (promoter-associated small RNAs) and the TASRs (terminus-associated small RNAs) identified in Arabidopsis.
For a protein-coding gene possessing a bidirectional promoter, the precursors of the PASRs could be transcribed either upstream or downstream of the TSS (transcription start site); this requires experimental investigation. However, the mechanism underlying the transcription of the TASR precursors could not be deduced. Although several pieces of high-throughput sequencing (HTS) data-based evidence pointed to the dependence of PASR and TASR biogenesis on RNA polymerase (Pol) IV, RDR2 (RNA-dependent RNA polymerase 2), RDR6, DCL2 (Dicer-like 2), DCL3 and DCL4, other key factors implicated in the accumulation of the two small RNA (sRNA) species might not have been uncovered owing to the limited sRNA HTS data used in this study. Based on the HTS data from the Argonaute (AGO)-associated sRNAs, many Pol IV-, RDR- and DCL-dependent PASRs and TASRs were preferentially loaded into AGO4 silencing complexes. In addition, site-specific DNA methylation was observed in genomic regions corresponding closely to the origins of the PASRs and the TASRs. Thus, the AGO4-associated PASRs and TASRs were proposed to mediate cis-methylation. The AGO1-associated PASRs and TASRs, in turn, were proposed to mediate target cleavage in a manner similar to plant microRNAs.