TFOFinder: Python program for identifying purine-only double-stranded stretches in the predicted secondary structure(s) of RNA targets

Nucleic acid probes are valuable tools in biology and chemistry and are indispensable for PCR amplification of DNA, RNA quantification and visualization, and downregulation of gene expression. Recently, triplex-forming oligonucleotides (TFOs) have received increased attention because backbone and base modifications have improved their selectivity and sensitivity in recognizing purine-rich double-stranded RNA regions at physiological pH. For example, triplex-forming peptide nucleic acid (PNA) oligomers have been used to image a structured RNA in cells and to inhibit influenza A replication. Although a handful of programs are available to identify triplex target sites (TTS) in DNA, none are available that find such regions in structured RNAs. Here, we describe TFOFinder, a Python program that facilitates the identification of intramolecular purine-only RNA duplexes that are amenable to forming parallel triple helices (pyrimidine/purine/pyrimidine) and the design of the corresponding TFO(s). We performed genome- and transcriptome-wide analyses of TTS in Drosophila melanogaster and found that only 0.3% (123) of the total unique transcripts (35,642) have the potential to form 12-purine-long triplex-forming sites that contain at least one guanine. Using free-energy minimization algorithms, we predicted the secondary structure(s) of these transcripts, and using TFOFinder, we found that 97 (79%) of the 123 identified transcripts are predicted to fold to form at least one TTS for parallel triple helix formation. The number of transcripts with potential purine TTS increases when the strict search conditions are relaxed by decreasing the length of the probe or by allowing up to two pyrimidine inversions or a 1-nucleotide bulge in the target site. These results are encouraging for the use of modified triplex-forming probes for live imaging of endogenous structured RNA targets, such as pre-miRNAs, and for inhibition of target-specific translation and viral replication.
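The strict search summarized above (a run of consecutive, base-paired purines containing at least one G, whose partners form one uninterrupted helix, followed by design of a parallel pyrimidine TFO) can be illustrated with a short Python sketch. This is not TFOFinder's code. The names `find_tts`, `pair_table` and `TTS` are hypothetical. The sketch assumes a single structure supplied in dot-bracket notation rather than the mfold output files. It also implements only the strict mode, so pyrimidine inversions and bulges are rejected.

```python
from dataclasses import dataclass

PURINES = frozenset("AG")
# Third-strand base for each purine in a parallel (pyrimidine/purine/pyrimidine)
# triplex: U recognizes A-U pairs, C (protonated, C+) recognizes G-C pairs.
PARALLEL_TFO_BASE = {"A": "U", "G": "C"}


@dataclass
class TTS:
    start: int        # 0-based start of the purine strand
    purine: str       # purine strand, 5'->3'
    pyrimidine: str   # base-paired strand, 5'->3'
    tfo: str          # parallel TFO, 5'->3'


def pair_table(structure: str) -> list[int]:
    """Map each position of a dot-bracket string to its partner (-1 if unpaired)."""
    partner = [-1] * len(structure)
    stack: list[int] = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced dot-bracket structure")
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced dot-bracket structure")
    return partner


def find_tts(seq: str, structure: str, length: int = 12) -> list[TTS]:
    """Strict search (hypothetical helper): `length` consecutive paired purines,
    at least one G, partners forming one uninterrupted helix (no bulges/mispairs)."""
    if not 4 <= length <= 30:
        raise ValueError("length must be between 4 and 30 nt")
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    seq = seq.upper().replace("T", "U")
    partner = pair_table(structure)
    hits = []
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if not set(window) <= PURINES or "G" not in window:
            continue
        j = partner[i]
        # j >= length - 1 keeps every expected partner index non-negative,
        # so an unpaired base (-1) can never satisfy the helix check below.
        if j < length - 1 or any(partner[i + k] != j - k for k in range(length)):
            continue
        pyrimidine = seq[j - length + 1:j + 1]
        # The TFO runs parallel to the purine strand, so it is built base by base.
        tfo = "".join(PARALLEL_TFO_BASE[b] for b in window)
        hits.append(TTS(i, window, pyrimidine, tfo))
    return hits


# Toy hairpin with a 6-bp purine/pyrimidine stem.
print(find_tts("GAAGAGUUCGCUCUUC", "((((((....))))))", length=6))
# [TTS(start=0, purine='GAAGAG', pyrimidine='CUCUUC', tfo='CUUCUC')]
```

Requiring `partner[i + k] == j - k` for every base is the simplest way to reject bulges and interior loops. The relaxed modes described above would loosen exactly that check and the purine-only test.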

In addition, we are including one PDF copy and one DOCX source file of the manuscript. We would also like to thank the reviewers for their time and valuable assessment.
Sincerely, Atara Neugroschl and Irina Catrina
>>The RNAmotif and TFOFinder programs are not redundant; RNAmotif can be used to identify potential RNA targets in a large dataset, which can then be analyzed using TFOFinder. However, while RNAmotif can be used to identify any regions that may contain a purine duplex in RNA targets of interest, it does not take the target structure into consideration.
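To make this division of labor concrete, here is a structure-agnostic prefilter sketched in Python. It is only loosely analogous to the RNAmotif search described above. It is neither RNAmotif's descriptor language nor TFOFinder code, and the function names are hypothetical. The sketch assumes that only Watson-Crick complements count, and it allows the pyrimidine partner to lie anywhere in the same transcript. Because the predicted fold is ignored, every hit still has to be checked against the secondary structure(s), which is the step TFOFinder addresses.

```python
# Complement table for RNA Watson-Crick pairs (A-U, G-C); G-U wobble pairs are ignored here.
COMPLEMENT = str.maketrans("AGCU", "UCGA")


def reverse_complement(rna: str) -> str:
    """Return the Watson-Crick reverse complement of an RNA string."""
    return rna.translate(COMPLEMENT)[::-1]


def purine_stretch_candidates(seq: str, length: int = 12) -> list[tuple[int, str]]:
    """Sequence-only prefilter (hypothetical; structure ignored): return (start, stretch)
    for every purine-only stretch with at least one G whose pyrimidine complement also
    occurs in the same transcript, so that an intramolecular duplex is at least possible."""
    seq = seq.upper().replace("T", "U")
    hits = []
    for i in range(len(seq) - length + 1):
        window = seq[i:i + length]
        if set(window) <= {"A", "G"} and "G" in window and reverse_complement(window) in seq:
            hits.append((i, window))
    return hits


print(purine_stretch_candidates("GAAGAGUUCGCUCUUC", length=6))  # [(0, 'GAAGAG')]
```

A transcript that passes this filter may still fold so that the purine stretch pairs elsewhere or stays single-stranded. That is why the structure-aware step is needed.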
This was briefly discussed in the manuscript, and at Reviewer #1's recommendation we expanded this section to include more information in the paragraph first introducing TFOFinder, as shown below:
>> Line clean file 94 & tracking file 77: Yes, "frame" here means backbone, and we changed it, as the reviewer implied, to make it clearer.<<
• Line 81, "in a greater mismatch discrimination" should be "with a greater mismatch discrimination."
>> Line clean file 98 & tracking file 81: This was corrected as indicated.<<
• Line 100, "duple-formation" should be "duplex-formation."
>> Line clean file 117 & tracking file 100: This was corrected as indicated.<<
• Line 134, FRET should be defined.
>> Line clean file 144 & tracking file 127: This was defined in the text only, but it was not included in the list of abbreviations, as it appears only once.
"FRET (Fluorescence/Förster Resonance Energy Transfer)" <<
• Line 133, this paragraph seems to be out of place. The authors provided introductory material and then introduced their new tool. This paragraph of introductory material sits between two paragraphs that discuss the tool. Is there a better location in the introduction to move this paragraph?
>> We added a reference for this number, which we defined in a previous publication (reference [47]), and we included the following information immediately after the sentence first mentioning "ss-count": "The ss-count fraction indicates the extent to which a sequence is predicted to be single-stranded in the predicted MFE and/or SO structures. The larger the value of the ss-count fraction, the more likely it is that the sequence will have a single-stranded character, where 1 = fully single-stranded and 0 = fully double-stranded. The ss-count fraction was calculated by dividing the sum of the ss-count numbers of the individual bases in the TTS by the product of the probe length and the total number of structures (MFE and SO structures) in the input file. The ss-count number represents the number of structures, out of the total, in which a base is predicted to be single-stranded, and the ss-count file is one of the output files obtained when predicting RNA structure using mfold."<<
• Line 309, "one highlighted in the red box for the of the 12th 310 SO structure" does not make sense and needs editing.
>> At the reviewer's request, we checked the above reference and found that it was imported into EndNote using the citation file provided by the journal, and we believe it is complete. If the reviewer has a specific observation regarding this reference, we would be happy to check it again.<<

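The ss-count fraction quoted in the response above reduces to a single formula: fraction = (sum of the ss-count numbers over the TTS bases) / (probe length × total number of structures). A minimal Python sketch follows. It assumes the per-base ss-count numbers have already been read from the mfold ss-count output into a 0-based list; parsing that file is not shown. The function name `ss_count_fraction` is hypothetical.

```python
def ss_count_fraction(ss_counts: list[int], start: int, length: int, n_structures: int) -> float:
    """ss-count fraction of a TTS (hypothetical helper): sum of per-base ss-count
    numbers over the TTS divided by (probe length x total number of structures).
    1 = single-stranded in every structure, 0 = double-stranded in every structure."""
    if n_structures < 1:
        raise ValueError("need at least one structure")
    window = ss_counts[start:start + length]
    if length < 1 or start < 0 or len(window) != length:
        raise ValueError("TTS window falls outside the sequence")
    if any(not 0 <= c <= n_structures for c in window):
        raise ValueError("ss-count numbers must lie between 0 and n_structures")
    return sum(window) / (length * n_structures)


# Hypothetical per-base ss-count numbers for an RNA folded into 3 structures
# (1 MFE + 2 SO); a 4-nt window starting at index 2 is scored: 3 / (4 * 3).
print(ss_count_fraction([0, 0, 1, 2, 0, 0, 3, 3], start=2, length=4, n_structures=3))  # 0.25
```

Normalizing by both probe length and structure count keeps the value between 0 and 1. This makes TTS of different lengths, predicted from input files with different numbers of SO structures, directly comparable.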
Reviewer #2: General Comment
The authors have presented a potentially valuable and innovative tool for designing TFOs targeting RNA in the model species D. melanogaster (genome and transcriptome) and the vRNA8 of influenza A. They have searched for double-stranded fragments of a user-defined length (4-30 nt) composed of consecutive purines within predicted secondary structures of the RNA target of interest.
The literature review and description of the methods employed by the authors are clear and concise, and the rationale for the study is evident. We appreciate the authors providing the link to the GitHub repository containing the TFOFinder Python code.
While we believe that the wider scientific and bioinformatics community can benefit from this work, we suggest the authors consider applying the FAIR (Findable, Accessible, Interoperable, and Reusable) principles to the manuscript to ensure reproducibility and reusability of the codebase. It would be helpful if the authors could provide test data to demonstrate the usage of the provided scripts and how they integrate with other tools used in the complete study.
>>We apologize for not including examples with our scripts. At Reviewer #2's recommendation, we updated the GitHub repository to include examples of TFOFinder input and output files for the 67th RNA identified by our D. melanogaster transcriptome search (ove-RE mRNA, S1 Table). We also wrote and uploaded to GitHub a tutorial file describing the TFOFinder example mentioned above in detail. Any of the files mentioned in our manuscript will be made available upon request.
We updated the GitHub repository by creating a new folder for all TFOFinder-related files, updated the GitHub link accordingly, and included the following text in the revised manuscript:
Line clean file 463 & tracking file 468: The link to the GitHub repository was updated: "(https://github.com/icatrina/TFOFinder)"
Lines clean file 472-475 and tracking file 477-480: "A tutorial file can be found in the abovementioned GitHub repository. This tutorial provides details for the download and installation

• PCOMBIOL-D-23-00670_with_tracking_071223.pdf reflects all changes we made to our original submission as follows:
  o Insertions
  o Deletions
  o Changed lines = grey outside border
  o Moved from
  o Moved to
  o Change format = side bubble
• PCOMBIOL-D-23-00670_clean_071223reftxt.docx is the clean version of the revised manuscript.

>>
Lines clean file 340-345 & tracking file 343-349: The legend of Fig 5 was edited as follows: "Fig 5. Secondary structures for two ncRNAs, predicted with mfold. (A) Full MFE structure (mfold) of the shortest transcript (CR44598-RA) identified to contain one TTS, which is highlighted in the red box and also shown magnified (right). (B) A longer ncRNA (CR44619-RA, 1,023 nt) containing several TTS; one TTS containing a mispair and a 1-nt bulge in the MFE structure (left, red arrows) is highlighted in the red boxes for the MFE and 12th SO structures (mfold; right)."<<
• Line 377, define NCBI.
>> Line clean file 404 & tracking file 409: This was defined in the text at the first mention, but it was not included in the list of abbreviations, as it appears only three times in the manuscript. "NCBI (National Center for Biotechnology Information)" <<
• Ref 14 doesn't look complete.
>> Lines clean file 540-543 & tracking file 560-563: "14. Gupta P, Zengeya T, Rozners E. Triple helical recognition of pyrimidine inversions in polypurine tracts of RNA by nucleobase-modified PNA. Chem Commun (Camb). 2011;47(39):11125-7. doi: 10.1039/c1cc14706d. PubMed PMID: 21909545; PubMed Central PMCID: PMCPMC3757498."

Text. Example of descriptors used for the RNAmotif searches." <<
>> Line clean file 84 & tracking file 67: This was corrected as indicated.<<
Lines clean file 170-176 & tracking file 153-159: "The TFOFinder program takes into consideration the predicted secondary structure(s) of an RNA target of interest and designs the corresponding TFO probes, features that are not implemented in RNAmotif. However, when large-scale
• Line 77, what is meant by "frame"? Backbone?