Transcripts with systematic nucleotide deletion of 1-12 nucleotide in human mitochondrion suggest potential non-canonical transcription

  • Ganesh Warthi ,

    Contributed equally to this work with: Ganesh Warthi, Hervé Seligmann

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – original draft, Writing – review & editing

    g.warthi6791@gmail.com

    Affiliation Aix-Marseille Université, IRD, VITROME, Institut Hospitalo-Universitaire Méditerranée-Infection, Marseille, France

  • Hervé Seligmann

    Contributed equally to this work with: Ganesh Warthi, Hervé Seligmann

    Roles Conceptualization, Project administration, Supervision, Writing – review & editing

    Affiliations Aix-Marseille Université, IRD, MEPHI, Institut Hospitalo-Universitaire (IHU) Méditerranée Infection, Marseille, France, The National Natural History Collections, The Hebrew University of Jerusalem, Jerusalem, Israel

Abstract

Raw transcriptomic data contain numerous RNA reads whose homology with template DNA does not match canonical transcription. Transcriptome analyses usually ignore such noncanonical RNA reads. Here, we search for noncanonical mitochondrial RNAs produced by systematically deleting 1 to 12 nucleotides after each transcribed nucleotide triplet, yielding deletion-RNAs (delRNAs). We detected delRNAs corresponding to these systematic deletions in human whole cell and purified mitochondrial transcriptomes, and in GenBank's human EST database. DelRNAs detected in both transcriptomes mapped along with 55.63% of the EST delRNAs. A bias exists for delRNAs covering identical mitogenomic regions in both transcriptomic and EST datasets. Among 227 delRNAs detected in these 3 datasets, 81.1% and 8.4% mapped on mitochondrial coding regions and on hypervariable region 2 of the dloop, respectively. Del-transcription analyses of GenBank's EST database confirm observations from whole cell and purified mitochondrial transcriptomes, eliminating the possibility that detected delRNAs are false-positive matches, cytosolic DNA/RNA nuclear contamination or sequencing artefacts. These detected delRNAs are enriched in frameshift-inducing homopolymers and are poor in frameshift-preventing circular code codons (a set of 20 codons which regulate reading frame detection, over- and underrepresented in coding and other frames of genes, respectively), suggesting a motif-based regulation of non-canonical transcription. These findings show that rare non-canonical transcripts exist. Such non-canonical del-transcription increases mitochondrial coding potential and non-coding regulation of intracellular mechanisms, and could explain the dark DNA conundrum.

Introduction

Raw transcriptomic data include RNA reads that do not correspond to canonical transcription of the genome [1]. The DNA template of most known noncanonical RNAs is easily recovered because their sequence usually differs from the template DNA by a few single-nucleotide edits. Around 80–90% of the human genome is transcribed at some point during development [2,3] and has biochemical functions [4]. Most noncanonical transcripts are noncoding RNAs (ncRNAs). Small ncRNAs are <200bp long and contribute to transcription activation [5], transcription maintenance [6], translation inhibition [7], mRNA degradation [8], gene regulation [9], epigenetic modification [10,11], and RNA polymerase II backtracking [9]. Other unknown roles might exist. These coding and non-coding RNAs in the transcriptome were detected assuming canonical transcription, and transcripts not matching template DNA were ignored in further analyses. Hence this genetic information is lost because non-canonical transcription is not considered in these analyses. This lost genetic information could hold answers to many questions related to evolution and genetic diseases, and could explain the dark DNA conundrum.

Dark DNA refers to hidden genes in an organism's genome that are essential for its survival and whose translational products have been detected in that organism. The term was first coined in a genome study of the sand rat, in which 87 genes essential for its survival appeared to be missing. However, functional translational products of the missing genes were detected in tissue samples, implying possible hidden genes in the sand rat’s genome [12]. Similarly, 274 genes essential for survival were reported missing in bird genomes [13]. The high GC content of these hidden genes is thought to hinder their detection by current DNA sequencing technology.

The detection of hidden genes is mainly based on the sequence similarity of RNA sequencing reads, assuming canonical transcription. Similarly, with recent advances in sequencing and bioinformatic technologies, numerous sequences are annotated as hypothetical genes/proteins whose existence is predicted but lacks experimental evidence. These hypothetical proteins are predicted assuming canonical transcription followed by translation, based on the identification of open reading frames in the sequenced genome. However, it is quite possible that many of these gene products are not canonical RNAs or peptides.

RNA-DNA differences (RDDs) in the human transcriptome [14–22] due to post-transcriptional editing [23–28] and post-transcriptional hyper edited RNAs [29] explain some noncanonical transcripts. Notably, RDDs deleting some nucleotides in mitochondrial rRNAs recover more stable ancestral rRNA structures [19]. Some noncanonical transcripts result from RNA fusion [30]. In this study we report short non-canonical mitochondrial RNAs (called delRNAs), detected in more than one independent sequencing dataset, under the assumption of a non-canonical phenomenon of systematic nucleotide deletion.

Two additional types of non-canonical mitochondrial RNAs have been detected in human whole cell transcriptomes. These RNAs differ from template DNA along systematic rules. Some noncanonical RNAs result from transcription systematically exchanging nucleotides along one among 23 bijective transformations. These produce 'swinger RNA'. Nine symmetric exchanges are of type X↔Y (for example A↔C) [31–34] and fourteen asymmetric exchanges of type X→Y→Z→X (for example A→C→G→A) [34–37]. Swinger DNA has been reported until now only for one symmetric exchange, A↔T+C↔G, in organelles [38,39] and for eukaryote nucleus-encoded rRNAs [40]. Swinger RNAs were confirmed by two NGS methods (454 and SOLID) in the amoeban giant virus Mimivirus [1]. Some RNAs are chimeric: they partly correspond to regular transcription, and an adjacent stretch is transformed along a non-identical swinger transformation [41]. Peptides matching translation of such chimeric swinger RNAs (chimeric peptides) also exist [42].

Other noncanonical RNAs result from systematic deletions of nucleotide(s) after each transcribed nucleotide triplet (NNNx), where N corresponds to transcribed nucleotides, and ‘x’ is a nucleotide missing in the RNA. Human mitochondrial RNAs with mono- (NNNxNNNxN …) and dinucleotide (NNNxxNNNxxN …) (Fig 1) deletions have been detected, and corresponding peptides were detected in mitochondrial mass spectrometry datasets by different search strategies [43,44]. These RNAs are termed delRNAs. RNAs following the pattern NNNx are denoted delRNA3-1, those following the pattern NNNxx are denoted delRNA3-2. The human mitogenome has more inverted palindromes than expected, potentially forming secondary structures after each of these transformations [45]. Further analyses show that homopolymer triplets (AAA, CCC, GGG, TTT) that promote transcriptional frameshift [46] are overrepresented in mitogenomic regions covered by detected delRNAs [47]. DelRNAs have fewer circular code codons than expected; these codons presumably regulate transcriptional [47] and ribosomal translation frames [48–51].

Fig 1. Principle of systematic deletion transcription.

This principle is used for the construction of different deletion transformed versions of the mitogenome. To construct the delRNA3-1 transformation of the mitogenome, the 4th nucleotide following each transcribed trinucleotide is deleted. Similarly, every 4th and 5th nucleotides are deleted in delRNA3-2 transformations of the mitogenome. The highlighted nucleotide(s) are those missing in del-transcribed delRNAs. These principles work for any deletion size k (1 to 12), where k is the number of nucleotides deleted after each transcribed nucleotide triplet.

https://doi.org/10.1371/journal.pone.0217356.g001

Here we use del-transformed versions of the human mitochondrial genome to mimic and detect actual noncanonical RNAs in two transcriptome datasets, one originating from the whole cell and one from purified mitochondrial lines. RNAs aligning with the transformed versions of the mitogenome are considered to be produced by del-transcription. To study noncanonical delRNAs, we focus on the mitochondrial genome because the entire mitogenome is transcribed, i.e., coding and non-coding sequences on both mitogenome strands are transcribed [52], and the short mitogenome is compatible with current computational limitations for analyses that consider transformations such as systematic deletions.

Here the principle of systematic deletions after each transcribed triplet is examined beyond deletion of mono- and dinucleotides [43], up to deletions of twelve nucleotides. Systematic deletions can start at different positions on a sequence, defining the transcriptional deletion 'frame', each producing different delRNAs. The number of potential deletion frames increases with deletion size (Fig 2): there are 3+k frames for deletion size k. We also search for potential punctuation motifs signaling del-transcription, following the previous study on homopolymers and circular code codons in delRNA3-1 and delRNA3-2 [47]. DelRNAs could result from enzyme slippage caused by homopolymers (AAA, CCC, GGG, TTT). On the other hand, delRNAs are expected to contain fewer codons that usually conserve the transcription and/or translation frame, such as the natural circular code identified in prokaryotes and eukaryotes [48–51], a set of 20 comparatively conserved codons (AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC) [53] that enables ribosomal translation frame retrieval [48,54,55] and apparently conserves transcriptional frames [47]. Punctuation by homopolymers and by circular code codons has opposite roles; they are therefore expected to be over- and under-represented in detected delRNAs, respectively. These patterns were previously found for delRNA3-1 and delRNA3-2 [47]. We expect similar patterns for k = 3 to k = 12: over- and under-representation of homopolymers and circular code codons, respectively.

Fig 2. Mitogenomic versions of delRNA3-3 and delRNA3-4.

Nucleotides highlighted yellow are deleted during transcription. (A) Possible mitogenomic versions for del-transcription window size six, k = 3 (delRNA3-3.n). (B) Possible mitogenomic versions for del-transcription window size seven, k = 4 (delRNA3-4.n). ‘n’: number of nucleotides deleted before del-transcription initiation (highlighted red), varied to cover all possible frames.

https://doi.org/10.1371/journal.pone.0217356.g002

Analyses of whole cell transcriptome data [56] are replicated on the transcriptome data extracted from purified mitochondrial lines [57] and on GenBank's human EST database, to test del-transcription reproducibility across independent datasets and sequencing techniques, and verify that results are not due to confounding effects from cytosolic RNAs matching del-transformations of the mitogenome by chance. To test the hypothesis of transcriptional frame retrieval, analyses of the universal circular code codons (X) identified in the genes of bacteria, eukaryotes, plasmids, and viruses [48–51] are compared with the circular code codons identified in the protein-coding genes of mitochondria (ACA, ACC, ATA, ATC, CTA, CTC, GAA, GAC, GAT, GCA, GCC, GCT, GGA, GGC, GGT, GTA, GTC, GTT, TTA, TTC) [58]. This candidate mitochondrial circular code includes 13 among 20 codons in common with the universal circular code but differs from the latter because it is not self-complementary, and its circular permutations do not produce circular codes (notated as the "C3" property occurring in the universal circular code). This mitochondrial circular code is derived from a much smaller sample than the universal circular code and hence could result from sampling biases. Nevertheless, if it is relevant to mitochondrial frame detection, the mitochondrial circular code should show stronger negative associations with detected delRNAs than the universal circular code.
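The overlap between the two codon sets and the self-complementarity property can be checked directly from the codon lists given in the text. The Python sketch below is illustrative only; the function and variable names are hypothetical and not part of the original analysis. A code is taken to be self-complementary when the reverse complement of every codon in the set also belongs to the set.

```python
# Codon lists copied from the text; all names below are hypothetical helpers.
UNIVERSAL_X = set(
    "AAC AAT ACC ATC ATT CAG CTC CTG GAA GAC "
    "GAG GAT GCC GGC GGT GTA GTC GTT TAC TTC".split()
)
MITO_X = set(
    "ACA ACC ATA ATC CTA CTC GAA GAC GAT GCA "
    "GCC GCT GGA GGC GGT GTA GTC GTT TTA TTC".split()
)

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(codon):
    """Reverse complement of a DNA codon."""
    return codon.translate(COMPLEMENT)[::-1]


def is_self_complementary(code):
    """True if the reverse complement of every codon is also in the set."""
    return all(reverse_complement(codon) in code for codon in code)


shared = UNIVERSAL_X & MITO_X
print(len(shared))                         # 13 codons in common
print(is_self_complementary(UNIVERSAL_X))  # True
print(is_self_complementary(MITO_X))       # False (e.g. ACA -> TGT is absent)
```

The sketch does not test the C3 property, which additionally requires checking the circular permutations of each code.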

Methods

In silico del-transformations of the human mitogenome

The human mitogenome (NC_012920.1) is transformed according to systematic deletions of 1 to 12 nucleotides after each transcribed nucleotide triplet as previously presented for deletion sizes 'k = 1' and 'k = 2' (Fig 1 in [43], Fig 1 in [59], Fig 1 in [60], Fig 2 in [47], Fig 2 in [61]). For example, to construct delRNAs with k = 3 (delRNA3-3) (Fig 2), after each transcribed trinucleotide, the next trinucleotide is deleted/not transcribed. The same principle produces delRNA3-4 (3 nucleotides transcribed followed by 4 deleted nucleotides), delRNA3-5 (3 nucleotides transcribed followed by 5 deleted nucleotides), delRNA3-6 (3 nucleotides transcribed followed by 6 deleted nucleotides) and so on until k = 12.

We produce in silico 3+k del-transformed versions of the human mitogenome for each deletion size k, where each version corresponds to a different deletion frame (Fig 2). For example, a del-transformed version of the mitogenome with three transcribed nucleotides followed by deletion of the next three nucleotides (k = 3) is constructed and noted as “delRNA3-3.0” (Fig 2A.i). A second mitogenome version follows the same systematic transcription/deletion pattern, except that it first deletes the first nucleotide of the transformed sequence. This transformation is noted delRNA3-3.1 (Fig 2A.ii). A third mitogenome version excludes the first 2 nucleotides of the genome, then the rest of the genome is assumed transcribed along the same systematic transcription/deletion pattern, noted delRNA3-3.2 (Fig 2A.iii). A fourth mitogenome version deletes the first 3 nucleotides before initiating the systematic transcription/deletion pattern, noted delRNA3-3.3 (Fig 2A.iv). The fifth mitogenome version excludes the first 4 nucleotides, then applies the above systematic transcription/deletion pattern, noted delRNA3-3.4 (Fig 2A.v). A sixth mitogenome version excludes the first 5 nucleotides, then applies the above systematic transcription/deletion pattern, noted delRNA3-3.5 (Fig 2A.vi). Similarly, Fig 2B shows all possible del-transformed mitogenome versions for delRNA3-4 (i.e., k = 4), producing seven del-transformed versions of the mitogenome. For delRNA3-3, the six del-transformation frames include all possible theoretical delRNA transformations for transcription of three nucleotides followed by the deletion of the next nucleotide triplet (k = 3). There are 3+k del-transformed versions of any sequence for deletion size k. In total 114 del-transformed mitogenome versions were created for deletion sizes 1 to 12 (S1 File).
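The construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function names are hypothetical, the sequence is treated as linear (the circularity of the mitogenome is ignored), and labels follow the delRNA3-k.n convention of Fig 2.

```python
def del_transform(seq, k, frame=0):
    """Keep 3 nucleotides, skip the next k, and repeat.

    `frame` is the number of leading nucleotides dropped before the
    keep-3/skip-k pattern starts (0 <= frame < 3 + k).
    """
    if k < 1 or not 0 <= frame < 3 + k:
        raise ValueError("need k >= 1 and 0 <= frame < 3 + k")
    period = 3 + k
    body = seq[frame:]
    return "".join(body[i:i + 3] for i in range(0, len(body), period))


def all_versions(seq, k_max=12):
    """Map labels such as 'delRNA3-3.1' to the transformed sequences."""
    return {
        f"delRNA3-{k}.{frame}": del_transform(seq, k, frame)
        for k in range(1, k_max + 1)
        for frame in range(3 + k)
    }


toy = "ACGTACGTACGTACGTACGTACGT"  # toy sequence, not the mitogenome
print(del_transform(toy, k=1))    # ACGACGACGACGACGACG (every 4th base removed)
print(len(all_versions(toy)))     # 114 = sum of (3 + k) for k = 1..12
```

The count of 114 versions follows from summing 3+k over k = 1 to 12, i.e. 36 + 78.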

Whole cell transcriptome analyses

Analyses below follow methods previously described for the transcriptome analyses of delRNA3-1 and delRNA3-2 [43]. We analyse seventy-one samples (SRX768406-SRX768476) [56] of human transcriptomic datasets available in the Sequence Read Archive (SRA) of GenBank. Twenty SRA entries were analysed simultaneously by BLASTN detecting RNA reads aligning with in silico constructed delRNA versions of the human mitogenome. We used default alignment criteria for BLASTN searches.

Transcriptome analyses of purified mitochondrial lines

The same BLASTN analysis method was applied to transcriptome data (SRX084350-SRX084355 and SRX087285) extracted from purified mitochondrial lines [57]. BLASTN results of both whole cell and purified mitochondrial transcriptomes were compared to test the detection reproducibility of delRNA coverages.

EST database search

The same 114 del-transformed mitochondrial sequences as used to search SRA data are used to search for ESTs matching delRNAs in GenBank's human EST database. MegaBLAST parameters (word size of 16 and gap cost (Existence: 5, Extension: 2)) were tailored to ensure inclusion of short delRNAs matching ESTs. DelRNAs detected by MegaBLAST with more than 90% identity with input del-transformed mitogenome versions were scrutinized for further analyses.

Since the transcriptome and EST datasets originate from independent studies, delRNAs detected in these datasets were mapped on the mitogenome. If delRNAs were random sequencing artefacts, or random BLAST hits due to nuclear DNA/RNA contamination, overlaps between mitogenome sequences covered by delRNAs originating from the EST and SRA datasets should be rare and follow random predictions. A positive bias for overlapping sequences would be considered evidence for reproducibility in delRNA detection across independent experiments and independent sequencing methods, and strong confirmation that del-transcription exists.
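The text does not specify how overlaps expected by chance were computed. As one possible illustration, the Python sketch below counts how many intervals from one dataset overlap intervals from another, and estimates a chance expectation by placing the same intervals uniformly at random. Everything here is an assumption for illustration: the interval representation (half-open start/end coordinates), the random-placement null model, the linear treatment of the genome and all function names.

```python
import random

GENOME_LEN = 16569  # length of the human mitogenome NC_012920.1


def overlaps(a, b):
    """True if half-open intervals a=(start, end) and b share a position."""
    return a[0] < b[1] and b[0] < a[1]


def count_overlapping(query, reference):
    """Number of query intervals overlapping at least one reference interval."""
    return sum(any(overlaps(q, r) for r in reference) for q in query)


def chance_expectation(query, reference, genome_len=GENOME_LEN,
                       trials=1000, seed=0):
    """Mean overlap count when query intervals are placed at random.

    Assumed null model: each query interval keeps its length and receives
    a uniformly random start on a linear genome.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        placed = []
        for start, end in query:
            length = end - start
            new_start = rng.randrange(0, genome_len - length + 1)
            placed.append((new_start, new_start + length))
        total += count_overlapping(placed, reference)
    return total / trials


# Hypothetical coordinates, for illustration only.
est_hits = [(100, 124), (5000, 5030)]
sra_hits = [(110, 140), (9000, 9030)]
print(count_overlapping(est_hits, sra_hits))  # 1
```

Comparing the observed count with the value returned by `chance_expectation` gives a ratio of the kind reported as "times more frequent than expected by chance".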

Control sequence

A randomized mitogenomic sequence with the same nucleotide composition and size as the natural human mitogenome was created as a negative control. The randomized mitogenome had no significant sequence similarity with the actual mitogenome. This randomized mitogenome was used to produce 114 del-transformed (k = 1 to 12) randomized mitogenomic versions as explained in Fig 2. These sequences were separately used as a control for BLASTN and MegaBLAST searches and analyses.
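The text does not state how the randomized sequence was generated. A simple way to obtain a control with exactly the same nucleotide composition and length is to shuffle the positions of the original sequence, as in the following Python sketch (the function name and the fixed seed are illustrative assumptions).

```python
import random
from collections import Counter


def shuffled_control(seq, seed=0):
    """Return a random permutation of `seq` (same length and composition)."""
    rng = random.Random(seed)  # fixed seed so the control is reproducible
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


toy = "AAACCCGGGTTTACGT"  # toy sequence, not the mitogenome
control = shuffled_control(toy)
print(Counter(control) == Counter(toy))  # True: composition is preserved
```

Shuffling single nucleotides preserves base composition but not higher-order features such as dinucleotide frequencies.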

To eliminate false-positive alignments, delRNAs that aligned in the SRA/EST databases were also mapped on the natural mitogenome sequence (NC_012920.1). DelRNAs that mapped on it with significant identity were removed from further analysis.

Results and discussion

BLASTN detects delRNAs with k = 1 to 12 in the human whole cell and purified mitochondrial transcriptomes

BLASTN yields alignment of 11869 reads (Sheet A in S2 File) with the 114 del-transformed versions of the human mitogenome, as detected within 71 publicly available human whole cell transcriptome datasets (SRX768406-SRX768476). Contig alignment of these reads gave 968 delRNA contigs (Sheet B in S2 File) with a mean length of ~36 bp and average identity >90%. Similarly, the 114 del-transformed versions of the human mitogenome aligned with 5084 SRA reads (Sheet A in S3 File) in 7 publicly available transcriptome datasets from purified human mitochondrial lines (SRX084350-SRX084355 and SRX087285). The mean length of the 1767 detected delRNA contigs is 29.354 bp, with average identity >93% (Sheet B in S3 File).

DelRNAs in both transcriptomic datasets were mapped on the mitogenome to find numbers of delRNAs covering the same mitogenomic regions. Details about delRNAs detected in the whole cell and purified mitochondrial transcriptome, i.e. the number of reads in BLASTN search for each delRNA, their position in the transformed mitogenome, percentage identity and the length of each delRNA for all deletion sizes are enclosed in S2 and S3 Files. A total of 367 delRNAs in the whole cell transcriptome (S2 File) and 390 delRNAs in the purified mitochondrial transcriptome (S3 File) had more than 2 reads.

Sequencing errors producing RNAs that artificially differ from the original natural sequences typically affect one specific strand, as cDNA libraries consist of double-stranded DNA produced on the template of natural RNA. DelRNAs could result from such sequencing artefacts. This possibility would be supported if delRNAs covering the same genome region overwhelmingly originated from the same DNA strand. Conversely, if delRNAs originate from both strands, sequencing artefacts are far less likely, supporting the view that delRNAs are natural phenomena. This method has previously been used to confirm that A→I hyper-edited reads are not sequencing artifacts [29]. Indeed, 75.38% of delRNAs having more than 2 reads originate from both the + and − strands (Sheet B in S2 File and Sheet B in S3 File). This result suggests that most delRNAs detected in this study are not sequencing artifacts.

Whole cell vs purified mitochondrial transcriptomes

Surprisingly, coverages of del-transformed mitogenomes by detected delRNAs are greater for transcriptome data extracted from purified mitochondrial lines than those extracted from whole cells for 97 among 114 del-transformations of the human mitogenome (85.1%) (S4 File). This might be an effect of internal cutoffs automatically set by BLASTN considering the size of the analysed dataset (the purified mitochondrial line dataset includes fewer reads than the whole cell data). This could also reflect technical and/or biological effects specific to each experiment. Independently of this, this result shows that a wide majority of detected delRNAs are not false positives due to confounding effects of large populations of cytosolic RNAs. At most, few isolated delRNAs might be false positives due to cytosolic contaminations.

Mitogenome sequences covered by delRNAs detected in both whole cell and purified mitochondrial data are more frequent than expected by chance for 109 among 114 del-transformed versions of the mitogenome (95.6%) (S4 File). Fig 3 gives an overview of the number of delRNAs detected in both datasets, along with the number of overlapping delRNAs in both datasets compared to the number of overlaps estimated by chance for each systematic nucleotide deletion (k = 1 to 12). All 12 del-transformations (k = 1 to 12) across both transcriptomes had more overlapping delRNAs than expected by chance (P = 0.000122 according to a one-tailed sign test using the binomial distribution). Overall, 436 delRNAs detected in both transcriptomes overlapped the same mitogenomic regions, with an average length of ~24 bp. Such overlaps are four times more frequent than expected by chance (S4 File).

Fig 3. Number of delRNAs detected in whole cell and mitochondrial transcriptomes for each systematic nucleotide deletion (k).

https://doi.org/10.1371/journal.pone.0217356.g003

Del-RNAs detected for a del-transformed randomized version of the human mitogenome in the mitochondrial transcriptome were also mapped on their respective del-transformed genome. Coverages of delRNAs detected in the purified mitochondrial transcriptome were higher for 11 among 12 deletion sizes (k) than for the randomized mitogenome sequence with the same deletion size. Obtaining this result across 12 del-transformations has P = 0.00158 (one tailed sign test, binomial distribution). Hence delRNA detections are reproducible across independent SRA transcriptome datasets and sequencing methods, and are not confounded by random matches.
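For reference, a textbook one-tailed sign test treats each of the n comparisons as a fair coin toss and sums the upper tail of the binomial distribution. The Python sketch below is an assumption about the form of the test, not the authors' calculation, and the function name is hypothetical. Note that it returns 1/4096 ≈ 0.000244 for 12 of 12 comparisons and 13/4096 ≈ 0.00317 for 11 of 12, which is twice the values quoted in the text, so the exact convention used for the reported P values may differ.

```python
from math import comb


def sign_test_one_tailed(successes, n):
    """P(X >= successes) for X ~ Binomial(n, 0.5)."""
    return sum(comb(n, i) for i in range(successes, n + 1)) / 2 ** n


print(sign_test_one_tailed(12, 12))  # 0.000244140625 (= 1/4096)
print(sign_test_one_tailed(11, 12))  # 0.003173828125 (= 13/4096)
```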

MegaBLAST detects delRNAs in EST database

The GenBank ESTs aligned with the 114 del-transformed versions of the natural mitogenome and of the randomized mitogenome were mapped on their respective del-transformed mitogenomic versions. The total mean coverage by delRNAs for each deletion size (i.e., k = 1 to 12) is higher for del-transformed versions of the natural mitogenome than for del-transformed versions of the randomized genome (S1 Fig). Obtaining this result across all 12 deletion sizes (k) has P = 0.000122 (one tailed sign test according to the binomial distribution). Fewer randomized del-transformed mitogenome versions obtained hits with the EST database than del-transformed versions of the natural mitogenome (one tailed P = 0.0027, Fisher exact test) (Sheet A and B in S5 File). EST-delRNAs having more than 90% identity were further considered for analyses (S6 File). In total, 1395 ESTs were detected for 111 among 114 del-transformed mitogenomic versions in GenBank's human EST database (S6 File). The mean length of these delRNAs is 24 base pairs with average identity of 97.76%. The del-transformed randomized mitogenome sequence matches 602 ESTs across 98 among 114 del-transformed randomized versions (Sheet B in S5 File). The mean length of EST-delRNAs is smaller than that of the delRNAs detected in the mitochondrial and whole cell transcriptomes. This difference could be due to the MegaBLAST search used for GenBank’s human EST database being more stringent than the BLASTN search used for the transcriptomes. ESTs aligned with delRNA3-1 were longer than those aligned with the other del-transformed versions (k>1) (S6 File). We believe that this difference in size of EST-delRNAs is due to the limited ability of the BLAST search algorithm to align sequences with more than one gap after each trinucleotide, which affects the alignment score.

DelRNAs detected in Genbank's human EST database were mapped with the delRNAs detected in purified mitochondrial transcriptome and whole cell transcriptomic data to test for overlaps between delRNA coverages from these independent experimental datasets and sequencing methods.

DelRNAs in Genbank's human EST database overlapping delRNAs in purified mitochondrial transcriptome.

Fig 4 shows delRNAs detected in the purified mitochondrial transcriptome, the EST database and numbers of delRNAs overlapping the same mitogenomic region and overlaps expected by chance for each systematic nucleotide deletion (k). Among a total of 1395 delRNAs from the EST database (S6 File), 615 (44.02%) EST-delRNAs overlapped with delRNAs detected in the mitochondrial transcriptome. Overall, these overlaps are 8 times more frequent than expected by chance (Fig 4, S7 File). DelRNAs in the EST database were detected in 105 del-transformed mitogenomic versions, among which 72 (63.15%) del-transformed versions had at least one overlapping delRNA with the delRNAs detected in the mitochondrial transcriptome. Among those 72 del-transformed versions, 71 (62.28%) had more delRNA overlaps than expected by chance with average overlap length of 22 nucleotides (S7 File).

Fig 4. Number of delRNAs detected in GenBank’s human EST database and purified mitochondrial transcriptome for each systematic nucleotide deletion (k).

https://doi.org/10.1371/journal.pone.0217356.g004

DelRNAs in Genbank's human EST database overlapping delRNAs from the whole cell transcriptome dataset.

Fig 5 shows numbers of EST-delRNAs overlapping with delRNAs detected in the whole cell transcriptome database for each systematic nucleotide deletion (k). A total of 388 of the 1395 EST-delRNAs (27.81%) overlapped delRNAs in the whole cell transcriptome, with an average overlap of 23 nucleotides (S8 File). Overlapping delRNAs were observed in 63 among 114 (55%) del-transformed versions of the mitogenome, among which 61 (53.51%) del-transformed mitogenome versions had more overlaps than expected by chance.

Fig 5. Number of delRNAs detected in GenBank's human EST database and whole cell transcriptome for each systematic nucleotide deletion (k).

https://doi.org/10.1371/journal.pone.0217356.g005

Fig 6 gives an overall view of the delRNAs detected in all three datasets. Among 1395 delRNAs detected in Genbank's EST database, 777 (55.70%) EST-delRNAs overlap with delRNAs detected in either whole cell and/or mitochondrial transcriptomes. Among 436 delRNAs detected in both transcriptomes, 227 delRNAs overlapped EST-delRNAs (S9 File), confirming that there are real short delRNAs non-canonically transcribed from the human mitogenome. Overlapping delRNAs in all three independent datasets are strong evidence that deletion transcription is a true phenomenon and not a sequencing artefact or contamination. Analyses below also underline the mitogenomic regions that are hotspots for deletion-transcriptions.

Fig 6. Numbers of overlapping delRNAs in three different datasets.

The Venn diagram shows the number of overlapping delRNAs across the three independent datasets and numbers of delRNAs not overlapping with any of the other two datasets. Total numbers of delRNAs detected in the whole cell transcriptome, the mitochondrial transcriptome and the human EST datasets are 968, 1767 and 1395, respectively.

https://doi.org/10.1371/journal.pone.0217356.g006

Association of delRNAs with protein coding genes.

The 227 EST-delRNAs (S9 File) that were also detected in the whole cell and purified mitochondrial transcriptomes were mapped on the mitochondrial genome (NC_012920.1) to determine whether del-transformation is associated with coding or non-coding regions. Because the mitogenomic span covered by a delRNA increases with the number of systematic deletions (k), most of these delRNAs were expected to overlap both coding and non-coding regions, making it difficult to link them to any particular mitogenomic region. Surprisingly, among 227 EST-delRNAs, 222 (97.8%) mapped on either a coding or a non-coding region. Fig 7 gives an overview of the number of delRNAs mapped on mitogenomic regions. Among 227 EST-delRNAs, 184 (81.05%) mapped on protein coding genes, 17 (7.5%) on the rRNA region, 19 (8.4%) on the dloop and 2 (0.88%) on the tRNA region, while 5 delRNAs mapped on more than one mitogenomic region. Interestingly, all 19 dloop delRNAs mapped on hypervariable regions 2 and 3 (HVR2 and HVR3; coverage 3.47% of the mitogenome), suggesting a possible association of delRNAs with HVR2. Results indicate that del-transformation is rare in the conserved tRNA and rRNA regions, whereas it is associated with the dloop hypervariable and protein coding regions of the mitogenome. Because the coverage of delRNAs on the natural mitogenome increases with systematic deletion size, we could not find an association between delRNAs and disease-causing mutations. DelRNAs that mapped on the dloop hypervariable region had a homopolymer frequency 2.85 times that of the remaining del-transformed versions (S9 File). Similarly, the 184 delRNAs mapping on protein coding genes had an overall homopolymer frequency 1.98 times that of the remaining del-transformed mitogenome.

Fig 7. DelRNAs detected in three independent datasets mapped on coding and dloop hypervariable region.

Mitogenome composition (Protein coding genes: 68.44%, Dloop: 6.7%, rRNA region: 15.17%, tRNA region: 9.35%).

https://doi.org/10.1371/journal.pone.0217356.g007

Homopolymers in delRNAs.

Homopolymer nucleotide triplets (AAA, CCC, GGG, TTT) cause frameshifts during DNA replication and transcription [46,62] and ribosomal slippages during translation [46,63–66]. Frequencies of these homopolymers in detected delRNAs are compared with their frequencies in the remaining transformed mitogenome not covered by detected delRNAs. Table 1 shows frequencies of homopolymers in detected delRNAs for each del-transformation (k = 1 to k = 12) in the whole cell and purified mitochondrial line transcriptomes. In both datasets, across all del-transformations (k = 1 to k = 12), frequencies of homopolymers in detected delRNAs are higher than homopolymer frequencies in the remaining del-transformed mitogenome (Table 1). Obtaining this result across all 12 del-transformations has P = 0.000122 according to a one-tailed sign test using the binomial distribution. Chi-square tests show that the difference in homopolymer frequencies between regions covered by delRNAs and those not covered by delRNAs is statistically significant at P < 0.05 for each del-transformation, in both the whole cell and purified mitochondrial transcriptomes. The only exception is delRNA3-6 in the whole cell transcriptome data (Table 1).
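The comparison described above can be illustrated with a short Python sketch. It is a sketch under stated assumptions, not the authors' procedure: homopolymers are counted over consecutive non-overlapping triplets starting at the first nucleotide of a sequence, the test is a Pearson chi-square on a 2×2 table without continuity correction, and the function names and example counts are hypothetical.

```python
from math import erfc, sqrt

HOMOPOLYMERS = ("AAA", "CCC", "GGG", "TTT")


def homopolymer_counts(seq):
    """Return (homopolymer triplets, all triplets) over non-overlapping triplets."""
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    hits = sum(t in HOMOPOLYMERS for t in triplets)
    return hits, len(triplets)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square and P value (1 df) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column of the table sums to zero")
    chi2 = n * (a * d - b * c) ** 2 / denom
    p_value = erfc(sqrt(chi2 / 2))  # survival function of chi-square, 1 df
    return chi2, p_value


print(homopolymer_counts("AAACCCACGTTT"))  # (3, 4)

# Hypothetical counts: homopolymer vs other triplets, in regions covered
# by delRNAs (first row) and in the remaining sequence (second row).
chi2, p = chi_square_2x2(20, 80, 10, 90)
print(round(chi2, 3), p < 0.05)  # 3.922 True
```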

Table 1. Homopolymer frequencies of delRNAs detected in whole cell and purified mitochondrial transcriptome.

https://doi.org/10.1371/journal.pone.0217356.t001

Overall, homopolymer frequencies were 1.5 times higher in detected delRNAs than in the remaining del-transformed mitochondrial sequences for analyses of both transcriptome datasets (Table 1). This shows that polymerase slippage at homopolymers contributes to non-canonical del-transcription. These patterns are incompatible with spurious detections of delRNAs by alignment searches among massive sequence data.
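
The comparison of homopolymer frequencies between covered and uncovered regions can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that triplets are read in a single frame from the start of each sequence and that the chi-square test is a Pearson test on a 2×2 table without continuity correction. The function names and the two toy sequences are hypothetical.

```python
import math

HOMOPOLYMERS = {"AAA", "CCC", "GGG", "TTT"}


def triplet_counts(seq):
    """Return (homopolymer triplets, total triplets), reading seq in frame 0."""
    triplets = [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]
    hits = sum(1 for t in triplets if t in HOMOPOLYMERS)
    return hits, len(triplets)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic and P value for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("chi-square is undefined when a margin is zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Toy sequences standing in for regions covered / not covered by delRNAs.
covered = "AAACCCGATTTT"
uncovered = "GATCGTACGGCATTAGCA"

hits_cov, total_cov = triplet_counts(covered)
hits_unc, total_unc = triplet_counts(uncovered)
stat, p_value = chi_square_2x2(
    hits_cov, total_cov - hits_cov, hits_unc, total_unc - hits_unc
)
print(hits_cov / total_cov, hits_unc / total_unc, stat, p_value)
```

With real data, the toy sequences would be replaced by the concatenated regions covered by delRNAs and by the remaining del-transformed mitogenome for one value of k.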

To further strengthen our findings, we scrutinized homopolymer frequencies in overlapping delRNAs detected independently in the transcriptomes of the whole cell and purified mitochondrial cell lines. Among 12 del-transformations (k = 1 to 12), the percentage of homopolymers in overlapping delRNAs is greater than in delRNAs detected only once, either in the whole cell or the purified mitochondrial cell line transcriptomes in 11 among 12 comparisons (Table 1). This result has P = 0.00158 according to a one-tailed sign test using the binomial distribution.
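
A one-tailed sign test of this kind asks how likely it is to see at least a given number of comparisons in the same direction when each direction is equally probable. The sketch below computes that upper tail of the binomial distribution with the standard library only. The function name is hypothetical, and because the exact variant of the test behind the values reported here is not specified, the output of this sketch is not guaranteed to reproduce them.

```python
from math import comb


def sign_test_upper_tail(successes, trials):
    """Return P(X >= successes) for X ~ Binomial(trials, 0.5)."""
    if not 0 <= successes <= trials:
        raise ValueError("successes must lie between 0 and trials")
    favourable = sum(comb(trials, i) for i in range(successes, trials + 1))
    return favourable / 2 ** trials


# Example: 11 of 12 comparisons pointing in the same direction.
print(sign_test_upper_tail(11, 12))
```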

Homopolymer frequencies of delRNAs detected in Genbank’s human EST database unsurprisingly give similar results. DelRNAs detected for all 12 systematic nucleotide deletions (k) have overall higher homopolymer frequencies (P = 0.000122 according to a one-tailed sign test using the binomial distribution). Similarly, homopolymer frequencies of EST-delRNAs mapping with delRNAs detected in the whole cell and mitochondrial transcriptomes (S10 File) are higher for all delRNAs from k = 1 to 12, with P = 0.000122 for each according to one-tailed sign tests using the binomial distribution.

Detection of overlapping delRNAs from two independent transcriptomes and in Genbank’s human EST database in itself supports our hypothesis of del-transcription, confirming result reproducibility. The higher homopolymer frequencies in overlapping delRNAs, as compared to those detected only in one dataset, indicate that high homopolymer densities increase the frequency of del-transcription, and further confirm that del-transcription exists.

Circular code and delRNAs.

The natural circular code (X) is a specific set of 20 codons universally overrepresented in the coding frame of protein-coding genes as compared to their remaining, non-coding frames [48]. As a group, codons of X enable recognizing the coding frame [55,67,68]. Though mechanisms by which this occurs are still unknown, it is believed that stretches of codons belonging to X enable ribosomes to recognize the translation frame [48,68–70]. Distributions of triplets belonging to X in the structures formed by ribosomes [70] and tRNAs [69,71] suggest that X motifs in these molecules central to translation play roles in recognizing protein-coding frames. Previous analyses of delRNA3-1 and delRNA3-2 showed that X also regulates the frame of deletion transcriptions [47].
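
To make the notion of circular code content concrete, the sketch below counts the fraction of in-frame triplets of a sequence that belong to X, using the 20 codons listed in the Table 2 legend. It is an illustration under stated assumptions rather than the procedure used for the tables: the function names are hypothetical, sequences are assumed to contain only A, C, G and T, and triplets are read in one chosen frame. The helper `reverse_complement` is included only to check the self-complementarity of X described in [50,51].

```python
X_UNIVERSAL = frozenset({
    "AAC", "AAT", "ACC", "ATC", "ATT", "CAG", "CTC", "CTG", "GAA", "GAC",
    "GAG", "GAT", "GCC", "GGC", "GGT", "GTA", "GTC", "GTT", "TAC", "TTC",
})

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(codon):
    """Return the reverse complement of a DNA triplet."""
    return "".join(COMPLEMENT[base] for base in reversed(codon))


def circular_code_fraction(seq, code=X_UNIVERSAL, frame=0):
    """Fraction of triplets, read from position `frame`, that belong to `code`."""
    triplets = [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]
    if not triplets:
        return 0.0
    return sum(1 for t in triplets if t in code) / len(triplets)


# X is self-complementary: the reverse complement of each codon is also in X.
assert all(reverse_complement(c) in X_UNIVERSAL for c in X_UNIVERSAL)

# Toy sequence: GAT and GTC belong to X, AAA does not.
print(circular_code_fraction("GATGTCAAA"))
```

Passing a different codon set as `code` would give the corresponding fraction for another proposed circular code.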

Here, frequencies of motifs corresponding to the universal circular code X proposed for the protein-coding genes of prokaryotes and eukaryotes [48,50,51] were calculated in detected delRNAs, considering only nucleotide triplets that are not separated by deletions. These frequencies are lower in 9 among 12 del-transformations than in the rest of the mitogenome not covered by delRNAs, for both the whole cell and purified mitochondrial transcriptomic data (Table 2). This tendency to avoid the universal circular code in mitochondrial delRNAs has P = 0.0365 for each of the whole cell and mitochondrial transcriptomes (one-tailed sign tests using the binomial distribution, Table 2) and a combined P = 0.004 using Fisher's method for combining P values [72], based on a chi-square distribution with 2×k degrees of freedom, where k here is the number of independent P values that are combined, not the deletion size.

Table 2. Frequencies of codons belonging to the universal circular code X {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} in delRNAs detected in transcriptomic data.

https://doi.org/10.1371/journal.pone.0217356.t002
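
Fisher's method for combining independent P values, referred to above, sums −2 ln(P) over the P values and compares the result with a chi-square distribution whose degrees of freedom equal twice the number of P values. The sketch below is a generic, standard-library implementation that uses the closed form of the chi-square survival function for even degrees of freedom. The function name is hypothetical and the two input values are illustrative only; they are not taken from the tables.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P values; return (chi-square statistic, combined P)."""
    if not p_values:
        raise ValueError("at least one P value is required")
    if any(not 0 < p <= 1 for p in p_values):
        raise ValueError("P values must lie in the interval (0, 1]")
    # Half of the statistic -2 * sum(ln p); degrees of freedom are 2 * count.
    half_stat = -sum(math.log(p) for p in p_values)
    count = len(p_values)
    # For even degrees of freedom the chi-square survival function is
    # exp(-x/2) * sum_{i=0}^{count-1} (x/2)^i / i!
    term = 1.0
    series = 1.0
    for i in range(1, count):
        term *= half_stat / i
        series += term
    return 2.0 * half_stat, math.exp(-half_stat) * series


# Illustrative inputs only.
statistic, combined = fisher_combined_p([0.05, 0.05])
print(statistic, combined)
```

With a single P value the function returns that value unchanged, which is a quick check that the survival function is implemented correctly.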

We also analysed frequencies of codons belonging to the proposed mitochondrial circular code X0(MIT) [58]. Among 12 del-transformations, 11 in the whole cell transcriptome and 9 in the purified mitochondrial transcriptome (Table 3) had lower frequencies of X0(MIT) in detected delRNAs, with P values of 0.00158 and 0.0365, respectively (one-tailed sign test using the binomial distribution for each transcriptome dataset). Combining these P values by Fisher’s method yields P = 0.00028.

Table 3. Frequencies of codons belonging to the mitochondrial circular code X0(MIT) in delRNAs and in the remaining deletion-transformed mitogenome.

https://doi.org/10.1371/journal.pone.0217356.t003

Frequencies of codons belonging to the universal circular code were also calculated for delRNAs detected in the human EST database (S11 File). Among 12 del-transformations, 6 had higher percentages of universal X than the remaining del-transformed mitogenome (P = 0.3063, one-tailed sign test using the binomial distribution). Eight among 12 del-transformations had higher percentages of universal X in EST-delRNAs overlapping delRNAs detected in the whole cell transcriptome than in the remaining non-overlapping delRNAs detected in only a single database (P = 0.0969). Similarly, percentages of codons belonging to the circular code in EST-delRNAs overlapping delRNAs from the mito-transcriptome are higher for 9 among 12 del-transformed versions (P = 0.0365, one-tailed sign test, binomial distribution). Combining these P values by Fisher’s method yields P = 0.0126.

Results for the proposed mitochondrial circular code X0(MIT) are slightly stronger than those for the universal circular code in EST-delRNAs. Among 12 del-transformations, 7 del-transformations in the EST database, 8 del-transformations for EST-delRNAs overlapping delRNAs detected in the whole cell transcriptome and 8 del-transformations for EST-delRNAs overlapping delRNAs detected in the mito-transcriptome had higher percentages of X0(MIT) (S12 File), with P = 0.0969 according to one-tailed sign tests using the binomial distribution. Fisher’s method to combine P values yields combined P = 0.011.

Positive results were stronger for avoidance of the proposed mitochondrial circular code in delRNAs as compared to the universal circular code X identified for prokaryote and eukaryote protein-coding genes (Tables 2 and 3). These results cautiously support that a different circular code exists in mitochondria [58]. We have no explanation for the higher frequency of X (Table 2, Table 3, S11 File, S12 File) in delRNAs detected in the mitochondrial transcriptome for high k (delRNA3-10, delRNA3-11 and delRNA3-12).

General discussion and future prospects

In this study we tested a phenomenon of systematic nucleotide deletion during transcription in three independent datasets. Results confirm the working hypothesis that transcription sometimes systematically deletes nucleotides. Analyses detect delRNAs corresponding to systematic deletion of 1 to 12 nucleotides after every transcribed tri-nucleotide in the human mitochondrial transcriptome, for three independent transcriptome datasets, two sequenced by NGS methodologies and one by Sanger methodology. Numerous delRNAs were detected in all three datasets, indicating high reproducibility of the human mitochondrial del-transcriptome. DelRNAs detected in this study are around 25–30 bp long. A BLASTN search of the delRNA3-1 transformed mitogenome aligned with much longer ESTs having a systematic single gap in the subject sequence and continuous alignment at the terminals of the subject sequence (not shown here). We believe these delRNAs could be part of much longer chimeric RNAs due to the RNA polymerase switching from canonical transcription to non-canonical del-transcription.
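
The systematic deletion described above can be sketched in a few lines. The function below drops the first n nucleotides, then repeatedly keeps three nucleotides and deletes the next k. It is a minimal illustration, not the script used to build S1 File: the function names are hypothetical, and the sequence is treated as linear, whereas the mitogenome is circular. Enumerating every k from 1 to 12 with every possible start offset n (0 to k + 2) gives 114 combinations, which matches the number of del-transformed versions listed in the supporting information.

```python
def del_transform(seq, k, n=0, kept=3):
    """Drop the first n nucleotides, then repeatedly keep `kept` and delete k."""
    if k < 1 or n < 0 or kept < 1:
        raise ValueError("k and kept must be >= 1 and n must be >= 0")
    period = kept + k
    return "".join(seq[i:i + kept] for i in range(n, len(seq), period))


def all_versions(max_k=12, kept=3):
    """List every (k, n) pair: deletion size k and start offset n."""
    return [(k, n) for k in range(1, max_k + 1) for n in range(kept + k)]


# Letters are used instead of nucleotides to make the kept positions visible.
print(del_transform("ABCDEFGHIJ", k=1))       # keeps ABC, EFG, IJ
print(del_transform("ABCDEFGHIJ", k=1, n=1))  # keeps BCD, FGH, J
print(len(all_versions()))
```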

Homopolymers AAA, CCC, GGG and TTT, which cause polymerase slippage [46] and transcriptional frameshift [46], have higher frequencies in detected delRNAs than in the rest of the mitogenome. This suggests that these nucleotide triplets signal del-transcription. Their function seems opposite to that of codons belonging to the circular codes, which apparently prevent del-transcription. DelRNAs might result from systematic deletions occurring during transcription, but a second possibility, RNA editing by an elusive mitochondrial spliceosome, could also produce delRNAs, resembling observations from eukaryotic cytosols [73]. Circular code codons (X) enable recognizing the coding frame [55,67,68] and transcriptional frame [47]. Higher circular code frequencies in higher del-versions (k = 10 to 12) of delRNAs might indicate a role of circular codes in post-transcriptional editing by the mitochondrial spliceosome.

DelRNAs detected in the purified mitochondrial line transcriptome have particularly high homopolymer frequencies (Table 1). Notably, detected delRNAs are poor in codons belonging to the hypothetical mitochondrial circular code, strengthening the still weak evidence indicating the existence of a different circular code in mitochondria as opposed to the universal circular code. This observation strengthens the discussion about mitochondria representing an independent branch of the tree of life [74], perhaps together with giant viruses [75].

Observations here support the hypothesis that mitochondrial genomes have greater coding potential than believed, also supported by the observations of two lncRNAs in human mitochondrial DNA [76]. An association between detected delRNAs (k = 1 and 2) and peptides matching human proteome data [43] suggests translation of delRNAs. We recommend similar mass spectrometry human proteome analyses for peptides translated from delRNAs with k = 3 to 12. Noncoding RNAs associated with ribosomes are translated into peptides [77], suggesting possible dual roles of delRNAs. Further in-depth analyses of delRNAs may help to explain various missing/hidden genes called “Dark DNA”, whose translational products are detected in mass spectrometry data [12].

Supporting information

S1 Fig. Percentage coverage by delRNAs detected in human EST database for del-transformed mitogenomes and del-transformed randomized sequences.

https://doi.org/10.1371/journal.pone.0217356.s001

(TIF)

S1 File. Del-transformed 114 mitogenomic versions assuming k = 1 to 12 systematic nucleotide deletion.

https://doi.org/10.1371/journal.pone.0217356.s002

(TXT)

S2 File. BLASTN search results for 114 del-transformed mitogenome in whole cell transcriptome.

‘n’ is nucleotides skipped before del-transcription starts. For delRNA3-1.n, ‘n’ corresponds to delRNA3-1.0, delRNA3-1.1, delRNA3-1.2, and delRNA3-1.3. 5’ and 3’ are the position of delRNAs on the transformed mitochondrial region detected in the whole cell transcriptome. Sheet2: shows the number of reads detected for each delRNA, percent identical nucleotides in the alignment (Id), the length of each delRNA, and the strand of reads detected.

https://doi.org/10.1371/journal.pone.0217356.s003

(XLSX)

S3 File. BLASTN search results for 114 del-transformed mitogenome in purified mitochondrial transcriptome.

Description same as S2 File.

https://doi.org/10.1371/journal.pone.0217356.s004

(XLSX)

S4 File. Number of delRNAs detected in human whole cell and purified mitochondrial transcriptomes by BLASTN for each del-transformed mitogenomic version.

Columns are: 1. systematic deletion lengths (k = 1 to 12 deleted nucleotides); 2. ‘n’: nucleotides deleted before del-transcription starts; 3–5. Numbers of delRNAs detected with their coverage on del-transformed mitogenome and average lengths for each del-transformation in whole cell transcriptome data (SRX768406-SRX768476); 6–8. Numbers of delRNAs detected, coverage on del-transformed mitogenome and their average lengths in purified mitochondrial transcriptome data (SRX084350-SRX084355 and SRX087285); 9 and 10. Number of delRNAs with their average lengths detected in both whole cell and purified mitochondrial transcriptomes and covering the same mitogenomic regions; and 11. Numbers of delRNAs covering the same mitogenomic region expected by chance.

https://doi.org/10.1371/journal.pone.0217356.s005

(XLSX)

S5 File. MegaBLAST search results for 114 del-transformed mitogenome in Genbank’s human EST database.

Sheet2: MegaBLAST results for randomized mitogenomic versions.

https://doi.org/10.1371/journal.pone.0217356.s006

(XLSX)

S6 File. DelRNAs detected in Genbank’s human EST database.

The EST-delRNAs mapping with delRNAs detected in the whole cell transcriptome are highlighted green, and EST-delRNAs mapping with delRNAs detected in the purified mito-transcriptome are highlighted yellow. EST-delRNAs mapping along with delRNAs of both transcriptomes are highlighted blue.

https://doi.org/10.1371/journal.pone.0217356.s007

(XLSX)

S7 File. Number of delRNAs detected in Genbank's human EST database and in the purified mitochondrial transcriptome.

Columns are: 1. systematic deletion lengths (k = 1 to 12 deleted nucleotides); 2. ‘n’: nucleotides deleted before del-transcription starts; 3–5. Numbers of delRNAs detected, coverage on del-transformed mitogenome with their average lengths for each del-transformation in the human EST database; 6–8. Numbers of delRNAs detected, coverage and their average lengths in the purified mitochondrial transcriptome data (SRX084350-SRX084355 and SRX087285); 9–10. Numbers of overlapping delRNAs and their average lengths; and 11. Numbers of overlapping delRNAs expected by chance.

https://doi.org/10.1371/journal.pone.0217356.s008

(XLSX)

S8 File. DelRNAs detected in Genbank's human EST database and in the human whole cell mitochondrial transcriptome.

Description as in S4 and S7 Files.

https://doi.org/10.1371/journal.pone.0217356.s009

(XLSX)

S9 File. EST-delRNAs detected in both transcriptome and Genbank’s human EST database along with actual position (column 14–15) and genomic region of mapping on natural reference human mitogenome.

https://doi.org/10.1371/journal.pone.0217356.s010

(XLSX)

S10 File. Homopolymer frequencies of delRNAs detected in Genbank’s Human EST database.

The table shows the percentage of homopolymers in delRNAs detected for each del-transformation in Genbank’s Human EST database. It also shows the homopolymer percentage in mitogenomic regions covered by overlapping delRNAs detected in the EST database and the whole cell transcriptome, and delRNAs detected in the EST database and the purified mitochondrial transcriptome.

https://doi.org/10.1371/journal.pone.0217356.s011

(XLSX)

S11 File. Frequencies of codons belonging to the universal circular code X {AAC, AAT, ACC, ATC, ATT, CAG, CTC, CTG, GAA, GAC, GAG, GAT, GCC, GGC, GGT, GTA, GTC, GTT, TAC, TTC} in delRNAs detected in Genbank’s EST database.

Columns show their percentage in the delRNAs detected in Genbank’s human EST database and in the remaining transformed mitogenome for each del-transformed version, and the chi-square P value testing for difference in circular codon frequencies. The last two columns show the percentages of X in EST-delRNAs overlapping delRNAs detected in the whole cell, and in the purified mitochondrial transcriptome. Note: Total number of trinucleotides in the delRNAs and in the remaining del-transformed mitogenome for whole cell transcriptome, mito-transcriptome and in overlapping delRNAs is given in S10 File.

https://doi.org/10.1371/journal.pone.0217356.s012

(XLSX)

S12 File. Frequencies of codons belonging to the proposed mitochondrial circular code X0(MIT) {ACA, ACC, ATA, ATC, CTA, CTC, GAA, GAC, GAT, GCA, GCC, GCT, GGA, GGC, GGT, GTA, GTC, GTT, TTA, TTC} in delRNAs detected in Genbank’s EST database.

Description and columns as for S11 File.

https://doi.org/10.1371/journal.pone.0217356.s013

(XLSX)

Acknowledgments

We thank two anonymous reviewers for helpful and constructive comments on an earlier draft of the manuscript.

References

  1. Seligmann H, Raoult D. Stem-loop RNA hairpins in giant viruses: Invading rRNA-like repeats and a template free RNA. Front Microbiol. 2018; pmid:29449833
  2. Alexander RP, Fang G, Rozowsky J, Snyder M, Gerstein MB. Annotating non-coding regions of the genome. Nature Reviews Genetics. 2010. pmid:20628352
  3. Ariel F, Romero-Barrios N, Jégu T, Benhamed M, Crespi M. Battles and hijacks: Noncoding transcription in plants. Trends in Plant Science. 2015. pmid:25850611
  4. Bernstein B, Birney E, Dunham I, Green E, Gunter C, Snyder M. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012; pmid:22955616
  5. Preker P, Nielsen J, Kammler S, Lykke-Andersen S, Christensen MS, Mapendano CK, et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science. 2008; pmid:19056938
  6. Seila AC, Calabrese JM, Levine SS, Yeo GW, Rahl PB, Flynn RA, et al. Divergent Transcription from Active Promoters. Science. 2008; pmid:19056940
  7. Calin GA, Dumitru CD, Shimizu M, Bichi R, Zupo S, Noch E, et al. Frequent deletions and down-regulation of micro-RNA genes miR15 and miR16 at 13q14 in chronic lymphocytic leukemia. Proc Natl Acad Sci. 2002; pmid:12434020
  8. Si ML, Zhu S, Wu H, Lu Z, Wu F, Mo YY. miR-21-mediated tumor growth. Oncogene. 2007; pmid:17072344
  9. Taft RJ, Glazov EA, Cloonan N, Simons C, Stephen S, Faulkner GJ, et al. Tiny RNAs associated with transcription start sites in animals. Nat Genet. 2009; pmid:19377478
  10. Watanabe T, Tomizawa SI, Mitsuya K, Totoki Y, Yamamoto Y, Kuramochi-Miyagawa S, et al. Role for piRNAs and noncoding RNA in de novo DNA methylation of the imprinted mouse Rasgrf1 locus. Science. 2011; pmid:21566194
  11. DiGiacomo M, Comazzetto S, Saini H, DeFazio S, Carrieri C, Morgan M, et al. Multiple Epigenetic Mechanisms and the piRNA Pathway Enforce LINE1 Silencing during Adult Spermatogenesis. Mol Cell. 2013; pmid:23706823
  12. Hargreaves AD, Zhou L, Christensen J, Marlétaz F, Liu S, Li F, et al. Genome sequence of a diabetes-prone rodent reveals a mutation hotspot around the ParaHox gene cluster. Proc Natl Acad Sci. 2017; pmid:28674003
  13. Lovell PV, Wirthlin M, Wilhelm L, Minx P, Lazar NH, Carbone L, et al. Conserved syntenic clusters of protein coding genes are missing in birds. Genome Biol. 2014; pmid:25518852
  14. Blank A, Gallant JA, Burgess RR, Loeb LA. An RNA Polymerase Mutant with Reduced Accuracy of Chain Elongation. Biochemistry. 1986;
  15. Ninio J. Connections between translation, transcription and replication error-rates. Biochimie. 1991;
  16. Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, et al. Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011; pmid:21596952
  17. Bahn JH, Lee JH, Li G, Greer C, Peng G, Xiao X. Accurate identification of A-to-I RNA editing in human by transcriptome sequencing. Genome Res. 2012; pmid:21960545
  18. Strathern JN, Jin DJ, Court DL, Kashlev M. Isolation and characterization of transcription fidelity mutants. Biochimica et Biophysica Acta - Gene Regulatory Mechanisms. 2012. pmid:22366339
  19. Bar-Yaacov D, Avital G, Levin L, Richards AL, Hachen N, Rebolledo Jaramillo B, et al. RNA-DNA differences in human mitochondria restore ancestral form of 16S ribosomal RNA. Genome Res. 2013; pmid:23913925
  20. Knippa K, Peterson DO. Fidelity of RNA polymerase II transcription: Role of Rbp9 in error detection and proofreading. Biochemistry. 2013; pmid:24099331
  21. Zhou YN, Lubkowska L, Hui M, Court C, Chen S, Court DL, et al. Isolation and characterization of RNA polymerase rpoB mutations that alter transcription slippage during elongation in Escherichia coli. J Biol Chem. 2013; pmid:23223236
  22. Wang IX, Grunseich C, Chung YG, Kwak H, Ramrattan G, Zhu Z, et al. RNA-DNA sequence differences in Saccharomyces cerevisiae. Genome Res. 2016; pmid:27638543
  23. Bass BL. RNA Editing by Adenosine Deaminases That Act on RNA. Annu Rev Biochem. 2002; pmid:12045112
  24. Schaub M, Keller W. RNA editing by adenosine deaminases generates RNA and protein diversity. Biochimie. 2002;
  25. Chen C, Bundschuh R. Systematic investigation of insertional and deletional RNA-DNA differences in the human transcriptome. BMC Genomics. 2012; pmid:23148664
  26. Park E, Williams B, Wold BJ, Mortazavi A. RNA editing in the human ENCODE RNA-seq data. Genome Res. 2012; pmid:22955975
  27. Wang IX, Core LJ, Kwak H, Brady L, Bruzel A, McDaniel L, et al. RNA-DNA differences are generated in human cells within seconds after RNA exits polymerase II. Cell Rep. 2014; pmid:24561252
  28. Lee SY, Joung JG, Park CH, Park JH, Kim JH. RCARE: RNA Sequence Comparison and Annotation for RNA Editing. BMC Med Genomics. 2015; pmid:26043858
  29. Porath HT, Carmi S, Levanon EY. A genome-wide map of hyper-edited RNA reveals numerous new sites. Nat Commun. 2014; pmid:25158696
  30. Kumar S, Vo AD, Qin F, Li H. Comparative assessment of methods for the fusion transcripts detection from RNA-Seq data. Sci Rep. 2016; pmid:26862001
  31. Seligmann H. Overlapping genes coded in the 3’-to-5’-direction in mitochondrial genes and 3’-to-5’ polymerization of non-complementary RNA by an ‘invertase’. J Theor Biol. 2012; pmid:22995821
  32. Seligmann H. Polymerization of non-complementary RNA: Systematic symmetric nucleotide exchanges mainly involving uracil produce mitochondrial RNA transcripts coding for cryptic overlapping genes. BioSystems. 2013; pmid:23410796
  33. Seligmann H. Triplex DNA:RNA, 3′-to-5′ Inverted RNA and Protein Coding in Mitochondrial Genomes. J Comput Biol. 2013; pmid:23841652
  34. Warthi G, Seligmann H. Swinger RNAs in the Human Mitochondrial Transcriptome. In: Seligmann H, Warthi G, editors. Mitochondrial DNA-new insights. Chapter 4, 79–92.
  35. Seligmann H. Systematic asymmetric nucleotide exchanges produce human mitochondrial RNAs cryptically encoding for overlapping protein coding genes. J Theor Biol. 2013; pmid:23416187
  36. Seligmann H. Translation of mitochondrial swinger RNAs according to tri-, tetra- and pentacodons. BioSystems. 2016; pmid:26723232
  37. Seligmann H. Swinger RNA self-hybridization and mitochondrial non-canonical swinger transcription, transcription systematically exchanging nucleotides. J Theor Biol. 2016; pmid:27079465
  38. Seligmann H. Mitochondrial swinger replication: DNA replication systematically exchanging nucleotides and short 16S ribosomal DNA swinger inserts. BioSystems. 2014; pmid:25283331
  39. Seligmann H. Sharp switches between regular and swinger mitochondrial replication: 16S rDNA systematically exchanging nucleotides A<->T+C<->G in the mitogenome of Kamimuria wangi. Mitochondrial DNA. 2016; pmid:25865623
  40. Seligmann H. Species radiation by DNA replication that systematically exchanges nucleotides? J Theor Biol. 2014; pmid:25192628
  41. Seligmann H. Swinger RNAs with sharp switches between regular transcription and transcription systematically exchanging ribonucleotides: Case studies. BioSystems. 2015; pmid:26163926
  42. Seligmann H. Chimeric mitochondrial peptides from contiguous regular and swinger RNA. Comput Struct Biotechnol J. 2016; pmid:27453772
  43. Seligmann H. Codon expansion and systematic transcriptional deletions produce tetra-, pentacoded mitochondrial peptides. J Theor Biol. 2015; pmid:26456204
  44. Seligmann H. Natural mitochondrial proteolysis confirms transcription systematically exchanging/deleting nucleotides, peptides coded by expanded codons. J Theor Biol. 2017; pmid:27899286
  45. Seligmann H. Systematically frameshifting by deletion of every 4th or 4th and 5th nucleotides during mitochondrial transcription: RNA self-hybridization regulates delRNA expression. BioSystems. 2016; pmid:27018206
  46. Atkins JF, Loughran G, Bhatt PR, Firth AE, Baranov PV. Ribosomal frameshifting and transcriptional slippage: From genetic steganography and cryptography to adventitious use. Nucleic Acids Res. 2016; pmid:27436286
  47. El Houmami N, Seligmann H. Evolution of nucleotide punctuation marks: From structural to linear signals. Front Genet. 2017; pmid:28396681
  48. Arquès DG, Michel CJ. A complementary circular code in the protein coding genes. J Theor Biol. 1996;
  49. Michel CJ, Ngoune V, Poch O, Ripp R, Thompson JD. Enrichment of Circular Code Motifs in the Genes of the Yeast Saccharomyces cerevisiae. Life. 2017; pmid:29207500
  50. Michel CJ. The maximal C3 self-complementary trinucleotide circular code X in genes of bacteria, eukaryotes, plasmids and viruses. J Theor Biol. 2015;
  51. Michel C. The Maximal C3 Self-Complementary Trinucleotide Circular Code X in Genes of Bacteria, Archaea, Eukaryotes, Plasmids and Viruses. Life. 2017; pmid:28420220
  52. Kung JTY, Colognori D, Lee JT. Long noncoding RNAs: past, present, and future. Genetics. 2013; pmid:23463798
  53. Dila G, Michel CJ, Poch O, Ripp R, Thompson JD. Evolutionary conservation and functional implications of circular code motifs in eukaryotic genomes. BioSystems. 2019; pmid:30367916
  54. Gonzalez DL, Giannerini S, Rosa R. Circular codes revisited: A statistical approach. J Theor Biol. 2011; pmid:21277862
  55. Fimmel E, Strüngmann L. Codon Distribution in Error-Detecting Circular Codes. Life. 2016; pmid:26999215
  56. Garzon R, Volinia S, Papaioannou D, Nicolet D, Kohlschmidt J, Yan PS, et al. Expression and prognostic impact of lncRNAs in acute myeloid leukemia. Proc Natl Acad Sci. 2014; pmid:25512507
  57. Mercer TR, Neph S, Dinger ME, Crawford J, Smith MA, Shearwood AMJ, et al. The human mitochondrial transcriptome. Cell. 2011; pmid:21854988
  58. Arquès DG, Michel CJ. A circular code in the protein coding genes of mitochondria. J Theor Biol. 1997; pmid:9441820
  59. Seligmann H. Natural chymotrypsin-like-cleaved human mitochondrial peptides confirm tetra-, pentacodon, non-canonical RNA translations. BioSystems. 2016; pmid:27477600
  60. Seligmann H. Unbiased Mitoproteome Analyses Confirm Non-canonical RNA, Expanded Codon Translations. Comput Struct Biotechnol J. 2016; pmid:27830053
  61. Seligmann H. Reviewing evidence for systematic transcriptional deletions, nucleotide exchanges, and expanded codons, and peptide clusters in human mitochondria. BioSystems. 2017. pmid:28807694
  62. Jestin JL, Kempf A. Chain termination codons and polymerase-induced frameshift mutations. FEBS Letters. 1997.
  63. Crick FH, Griffith JS, Orgel LE. CODES WITHOUT COMMAS. PNAS. 1957. pmid:16590032
  64. Klobutcher LA, Farabaugh PJ. Shifty ciliates: Frequent programmed translational frameshifting in euplotids. Cell. 2002.
  65. Ketteler R. On programmed ribosomal frameshifting: The alternative proteomes. Frontiers in Genetics. 2012. pmid:23181069
  66. Advani VM, Dinman JD. Reprogramming the genetic code: The emerging role of ribosomal frameshifting in regulating cellular gene expression. BioEssays. 2016. pmid:26661048
  67. Lacan J, Michel CJ. Analysis of a Circular Code Model. J Theor Biol. 2001. pmid:11894988
  68. Ahmed A, Frey G, Michel CJ. Frameshift signals in genes associated with the circular code. In Silico Biol. 2007;7(2): 155–68. pmid:17688441
  69. Michel CJ. Circular code motifs in transfer and 16S ribosomal RNAs: A possible translation code in genes. Comput Biol Chem. 2012;
  70. El Soufi K, Michel CJ. Circular code motifs near the ribosome decoding center. Comput Biol Chem. 2015; pmid:26547036
  71. Michel CJ. Circular code motifs in transfer RNAs. Comput Biol Chem. 2013; pmid:23727957
  72. Fisher RA. Questions and answers #14. The American Statistician. 1948; 2 (5): 30–31.
  73. Khanna M, Van Bakel H, Tang X, Calarco JA, Babak T, Guo G, et al. A systematic characterization of Cwc21, the yeast ortholog of the human spliceosomal protein SRm300. RNA. 2009; pmid:19789211
  74. Harish A, Kurland CG. Mitochondria are not captive bacteria. J Theor Biol. 2017; pmid:28754286
  75. Seligmann H. Giant viruses as protein-coated amoeban mitochondria? Virus Res. 2018; pmid:29913250
  76. Gao S, Tian X, Chang H, Sun Y, Wu Z, Cheng Z, et al. Two novel lncRNAs discovered in human mitochondrial DNA using PacBio full-length transcriptome data. Mitochondrion. 2018; pmid:28802668
  77. Bazin J, Baerenfaller K, Gosai SJ, Gregory BD, Crespi M, Bailey-Serres J. Global analysis of ribosome-associated noncoding RNAs unveils new modes of translational regulation. Proc Natl Acad Sci. 2017; pmid:29087317