Inverse folding based pre-training for the reliable identification of intrinsic transcription terminators
Fig 2
Impact of sequence and structure on terminator and tRNA recognition.
(A) Intrinsic terminators comprise five sections: the central hairpin structure, consisting of a stem and a loop, is framed by an A-rich zone (A-tail) at the 5’-end and a longer U-rich zone (U-tail) at the 3’-end. The terminator data used in this study additionally contain the genomic sequences adjacent to the terminator (left pad and right pad). (B) Impact of the terminator sections, shown as relative activation impact on the CNN models (left) and relative detection impact on ARNold (right). Random mutations were introduced in each of the 7 sections of the transcription terminators (the five terminator sections plus the left and right pads). The relative activation impact on the models is calculated from the difference between the model output for the original sequences and the model output for sequences in which half of the nucleotides of a section were randomly mutated. The relative detection impact for ARNold is calculated on the same mutated sequences and is estimated by averaging the binary outputs across the mutation data set. (C) Impact of the base pairings in the terminator stem for a growing number of mutated base pairs, shown as relative activation impact on the CNN models and relative detection impact on ARNold. The relative activation impact is calculated from the difference between the model outputs for mutations that retain the pairing state in the stem structure and for mutations that disrupt it. The relative detection impact for ARNold is calculated on the same mutated sequences and is estimated by averaging the binary outputs across the mutation data set. (D) Relative activation impact of the base pairings in the stems of tRNAs on the CNN models for a growing number of mutated base pairs. The relative activation impact is calculated from the difference between the model outputs for mutations that retain the pairing state in the stem structure and for mutations that disrupt it. (B), (C): For k = 1, …, 10 and n ∈ {93, 84, 102, 91, 94, 92, 93, 113, 99, 92}. (D): For k = 1, …, 10 and n ∈ {198, 203, 194, 202, 201, 201, 199, 201, 194, 201}.
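The section-wise mutation analysis described for panel (B) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `mutate_section` and `relative_impact`, the RNA alphabet, the per-sequence section coordinates, and the normalization by the mean original score are all assumptions made for illustration; the caption only states that the impact is derived from the difference between outputs on original and mutated sequences. The `score` argument stands in for any model that maps a sequence to a number.

```python
import random

ALPHABET = "ACGU"  # assumption: RNA alphabet; swap in "ACGT" for DNA input


def mutate_section(seq, start, end, fraction=0.5, rng=None):
    """Replace `fraction` of the positions in seq[start:end] with a different nucleotide."""
    rng = rng or random.Random()
    n_mut = int((end - start) * fraction)
    positions = rng.sample(range(start, end), n_mut)
    chars = list(seq)
    for pos in positions:
        # Always pick a nucleotide that differs from the original one.
        chars[pos] = rng.choice([n for n in ALPHABET if n != chars[pos]])
    return "".join(chars)


def relative_impact(score, examples, fraction=0.5, rng=None):
    """Mean score drop after mutating one section, relative to the mean original score.

    `examples` is a list of (sequence, (start, end)) pairs, where (start, end)
    marks the section to mutate in that sequence (hypothetical data layout).
    """
    rng = rng or random.Random()
    original = [score(seq) for seq, _ in examples]
    mutated = [
        score(mutate_section(seq, start, end, fraction, rng))
        for seq, (start, end) in examples
    ]
    mean_original = sum(original) / len(original)
    mean_mutated = sum(mutated) / len(mutated)
    if mean_original == 0:
        return 0.0  # assumption: define the impact as zero when there is no signal
    return (mean_original - mean_mutated) / mean_original
```

For a detector with binary output, such as the ARNold case in the caption, the same function would average 0/1 calls instead of activations, so the result reads as a relative loss in detection rate under this assumed normalization.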