Generation of full-length circular RNA libraries for Oxford Nanopore long-read sequencing

Circular RNA (circRNA) is a class of noncoding RNA with important implications for the regulation of gene expression, mostly through interactions with other RNA species or RNA-binding proteins. While the commonly applied short-read Illumina RNA-sequencing techniques can be used to detect circRNAs, they do not reveal their full sequence. However, the complete sequence is needed to analyze potential interactions and thus the mechanism of action of circRNAs. Here, we present an improved protocol to enrich and sequence full-length circRNAs using the Oxford Nanopore long-read sequencing platform. The protocol enriches low-abundance circRNAs by exonuclease treatment and negative selection of linear RNAs. A cDNA library is then created and amplified by PCR, which provides enough material for several sequencing runs. The library is used as input for ligation-based sequencing together with native barcoding. Stringent quality control of the libraries is ensured by a combination of Qubit, Fragment Analyzer and qRT-PCR. Multiplexing up to 4 libraries yields 1–2 million reads per library, of which 1–2% are circRNA-specific reads, >99% of them full-length. The protocol works well with human cancer cell lines. We further provide suggestions for the bioinformatic analysis of the resulting data and describe the limitations of our approach, together with recommendations for troubleshooting and interpretation. Taken together, this protocol enables reliable full-length analysis of circRNAs, a noncoding RNA type involved in a growing number of physiologic and pathologic conditions. Associated content: https://dx.doi.org/10.17504/protocols.io.rm7vzy8r4lx1/v2.

Introduction
Circular RNAs (circRNAs) are a class of noncoding RNA generated by a form of alternative splicing termed back-splicing. Their ring-like structure and lack of free 5' and 3' ends render them resistant to exonucleases and more stable than linear RNAs. For the same reason they lack a poly(A) tail and therefore escape detection by the widely used poly(A)-selected mRNA sequencing [1,2]. CircRNAs regulate gene expression, e.g. by binding to microRNAs or RNA-binding proteins, or by interacting directly with the transcriptional machinery [3][4][5]. The commonly applied Illumina-based short-read sequencing techniques (total RNA-seq) can readily identify circRNAs by their characteristic back-splice junction [6]. However, apart from this junction, circRNAs, which have an average size of 200 to 800 nt [7,8], share their sequence with the cognate linear RNAs from the same gene, so the full-length information cannot be confidently retrieved with these methods. Recently developed long-read RNA-sequencing techniques, such as the Oxford Nanopore sequencing platform, have great potential to obtain this missing information because they sequence transcripts in full length.
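To illustrate why the back-splice junction is diagnostic, the following minimal Python sketch builds the junction sequence from the exons of a hypothetical circRNA and checks whether a read spans it. The exon sequences, the `flank` length and the exact string matching are illustrative assumptions; real Nanopore reads contain sequencing errors, so dedicated tools rely on error-tolerant alignment rather than exact matching.

```python
from typing import Sequence


def backsplice_junction(exons: Sequence[str], flank: int = 5) -> str:
    """Return the sequence across the back-splice junction (BSJ).

    In a circRNA the 3' end of the last (downstream) exon is joined to the
    5' end of the first (upstream) exon, an exon order that the colinear
    (linear) transcript of the same gene does not contain.
    """
    downstream, upstream = exons[-1], exons[0]
    return downstream[-flank:] + upstream[:flank]


def spans_bsj(read: str, exons: Sequence[str], flank: int = 5) -> bool:
    """Exact-match check whether a read covers the BSJ (no error tolerance)."""
    return backsplice_junction(exons, flank) in read


# Hypothetical exon sequences of a three-exon circRNA
exons = ["ATGGCCTTAG", "CCGATTACAG", "GTTCAAGGCT"]
linear_transcript = "".join(exons)   # exon1-exon2-exon3, colinear order
circ_read = exons[2] + exons[0]      # read running from exon3 into exon1

print(backsplice_junction(exons))           # AGGCTATGGC
print(spans_bsj(linear_transcript, exons))  # False
print(spans_bsj(circ_read, exons))          # True
```

Because a full-length read can start anywhere on the circle, the junction may lie at any position within a read, which is why a simple substring search over the whole read is used here.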
Sequencing circRNAs with a direct RNA-sequencing protocol on the Oxford Nanopore platform is an attractive option to analyze circRNAs, including their epigenetic modifications, without introducing PCR bias. However, such an approach requires linearization of the circRNA molecules and has reduced sensitivity: low-abundance circRNAs are difficult to detect, and the sequencing coverage is lower than with Illumina-based methods. Wang et al. followed this approach to analyze plant RNA [9], which they fragmented in order to sequence it. While several circRNAs were detected, the large amount of input RNA required remains a critical limitation, in particular for human and especially patient samples, which are often degraded.
Recently, a workflow was published to sequence circRNAs by creating a cDNA library that is amplified by PCR, without the need for fragmentation [8]. The approach of Zhang et al. uses ribodepletion followed by enrichment of circRNAs by exonuclease treatment and size selection of transcripts longer than 1 kb. The resulting library is then used for ligation-based sequencing with Oxford Nanopore. With this approach, the team obtained between 0.8 and 4 million reads per library, of which 1–6% were circRNA-specific reads that mapped to the back-splice junction. Here, we present a modified version of this workflow that we adapted to retrieve full-length sequence information also for shorter circRNAs, so that the whole size spectrum of circRNAs is covered. The workflow produces enough library to sequence several times if needed, which can help to detect low-abundance circRNAs. In detail, we replaced the commercial ribodepletion kit with the published method of Baldwin et al. [10], which worked more efficiently in our hands. This ribodepletion method is based on a pool of DNA oligonucleotides that hybridize to ribosomal RNA, followed by digestion of the RNA in the resulting DNA:RNA hybrids by RNase H. For further circRNA enrichment, we added a negative selection of poly(A) transcripts using oligo(dT)-conjugated magnetic beads. The final size selection of the amplified circRNA library was adapted to include shorter circRNAs. Finally, we introduced a quality control by qRT-PCR to confirm the enrichment of circRNAs and the depletion of unwanted transcripts such as ribosomal, mitochondrial and small nucleolar RNAs. We also provide recommendations for Nanopore library preparation and sequencing, as well as for the bioinformatic analysis.
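The principle of the oligo-based ribodepletion can be sketched in Python: antisense DNA oligos are derived by tiling the rRNA sequence and taking the reverse complement of each window, so that every oligo can hybridize to its target region and mark it for RNase H digestion. This is a hypothetical illustration only; the oligo length of 50 nt, the gap-free tiling and the function names are assumptions and do not reproduce the published oligo pool of Baldwin et al. [10].

```python
from typing import List, Tuple

# Complement table for DNA or RNA input; U pairs with A
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def antisense_dna(seq: str) -> str:
    """Reverse complement of an RNA/DNA sequence, written as DNA (5'->3')."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def tile_antisense_oligos(target: str, oligo_len: int = 50,
                          step: int = 50) -> List[Tuple[int, str]]:
    """Return (start, oligo) pairs whose oligos tile `target` end to end.

    `oligo_len` and `step` are illustrative defaults (hypothetical), not the
    parameters of any published depletion pool.
    """
    if len(target) < oligo_len:
        return [(0, antisense_dna(target))]  # short target: a single oligo
    starts = list(range(0, len(target) - oligo_len + 1, step))
    last = len(target) - oligo_len
    if starts[-1] != last:
        starts.append(last)  # ensure the 3' end of the target is covered
    return [(s, antisense_dna(target[s:s + oligo_len])) for s in starts]


# Toy 120-nt "rRNA": oligos start at positions 0, 50 and 70
for start, oligo in tile_antisense_oligos("A" * 120):
    print(start, oligo[:10] + "...")
```

The final, overlapping oligo is a design choice that keeps the 3' end of the target covered when its length is not a multiple of the step size.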

Materials and methods
The protocol described in this peer-reviewed article is published on protocols.io, https://dx.doi.org/10.17504/protocols.io.rm7vzy8r4lx1/v2 (version 2), and is included for printing as S1 File with this article.

Expected results
Using the described protocol, we prepared sequencing libraries from 4 different anaplastic large-cell lymphoma (ALCL) cell lines that served as models to test the workflow (SU-DHL-1, Karpas-299, COST [11], SUP-M2). The libraries had an average length of 606.8 nt and an average concentration of 4.8 ng/μl (117–133 ng of library in total) (Fig 1, Table 1). The library length corresponds to the published average size of circRNAs, 200–800 nt [7,8], suggesting that our workflow preserves the size of circRNAs and does not degrade them. As part of our quality control workflow, a qRT-PCR was performed to detect the enrichment of circRNAs and the depletion of unwanted RNA transcripts (Fig 2). As indicators of enrichment we used 3 circRNAs that previous Illumina-based RNA-sequencing experiments had shown to be well expressed in our cell line models. On average, these circRNAs were enriched 4- to 17-fold compared with the non-enriched control. Ribosomal RNA was depleted more than 30,000-fold, and other unwanted transcripts were also depleted (mitochondrial RNA, small nucleolar RNA, signal recognition particle RNA, linear RNAs and mRNAs).
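The fold changes reported for the qRT-PCR quality control follow from Ct differences. The minimal Python sketch below assumes equal RNA input into the reverse transcription of the enriched and the non-enriched sample and an amplification efficiency close to 100%; the normalization described in the protocol (S1 File) may differ, and the Ct values in the example are hypothetical. With perfect doubling, a 4- to 17-fold enrichment corresponds to a Ct shift of about 2 to 4 cycles, and a 30,000-fold depletion to about 15 cycles.

```python
import math


def fold_change(ct_control: float, ct_enriched: float,
                efficiency: float = 1.0) -> float:
    """Abundance in the enriched sample relative to the non-enriched control.

    Assumes equal RNA input and an amplification efficiency `efficiency`
    (1.0 = perfect doubling per cycle). Values > 1 indicate enrichment,
    values < 1 indicate depletion.
    """
    return (1.0 + efficiency) ** (ct_control - ct_enriched)


def ct_shift_for_fold(fold: float, efficiency: float = 1.0) -> float:
    """Number of cycles by which Ct changes for a given fold change."""
    return math.log(fold) / math.log(1.0 + efficiency)


# Hypothetical Ct values for a circRNA and for an rRNA target
print(fold_change(ct_control=25.0, ct_enriched=23.0))      # 4.0 (enriched)
print(1 / fold_change(ct_control=10.0, ct_enriched=25.0))  # 32768.0 (depleted)
print(round(ct_shift_for_fold(30_000), 1))                 # 14.9
```

Lowering `efficiency` below 1.0 makes each cycle worth less than a doubling, so the same Ct shift then corresponds to a smaller fold change.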
The resulting library pool was used as input for Oxford Nanopore sequencing with the ligation-based sequencing kit (SQK-LSK109) together with the native barcoding kit (EXP-NBD104) according to the manufacturer's instructions and sequenced on one flow cell on a MinION Mk1C. The sequencing output was on average 1,536,229 reads per library, and the reads were of high quality (Table 2, mean Q-score 15).
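Nanopore library preparation protocols commonly specify input amounts in molar units rather than by mass. The conversion can be sketched in Python, assuming a double-stranded library, treating the reported 606.8 nt as base pairs, and using an average mass of about 660 g/mol per base pair (a standard approximation); the function names are illustrative. With these assumptions, a hypothetical 125 ng of library (within the 117–133 ng range above) corresponds to roughly 312 fmol, and a concentration of 4.8 ng/μl to roughly 12 nM.

```python
# Approximate average molar mass of one double-stranded DNA base pair (g/mol)
BP_MASS_G_PER_MOL = 660.0


def ng_to_fmol(mass_ng: float, length_bp: float) -> float:
    """Convert a dsDNA mass (ng) of a given mean fragment length to fmol."""
    moles = (mass_ng * 1e-9) / (length_bp * BP_MASS_G_PER_MOL)
    return moles * 1e15


def fmol_to_ng(amount_fmol: float, length_bp: float) -> float:
    """Inverse of ng_to_fmol: mass (ng) needed for a target amount (fmol)."""
    return amount_fmol * 1e-15 * length_bp * BP_MASS_G_PER_MOL * 1e9


def ng_per_ul_to_nM(conc_ng_ul: float, length_bp: float) -> float:
    """Convert a dsDNA concentration from ng/μl to nM."""
    # ng/μl equals mg/L; divided by g/mol this gives mmol/L; x1e6 gives nM
    return conc_ng_ul * 1e6 / (length_bp * BP_MASS_G_PER_MOL)


print(round(ng_to_fmol(125, 606.8)))          # 312
print(round(ng_per_ul_to_nM(4.8, 606.8), 1))  # 12.0
```

Because the molar amount scales inversely with fragment length, the same mass of a shorter library contains more molecules, which matters when pooling libraries of different sizes.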

Importantly, the pores were not completely saturated, so a longer sequencing run with more material would probably have been possible. Base calling of the raw sequencing data was performed with Guppy as integrated in the MinKNOW software (v22.05.8, part of the operating system of the MinION sequencing device), and fastq files were generated. For the bioinformatic analysis, adapter sequences were first trimmed from the reads with cutadapt (v3.4, https://doi.org/10.14806/ej.17.1.200). The CIRI-Long software (v1.0.3, [8]) was then used with default settings to detect circRNAs (detailed recommendations are included in our protocol, S1 File). Following the CIRI-Long analysis workflow described in our protocol, we identified on average 15,767 circRNA-specific reads, i.e. 1.0% of the total reads, of which 99% covered the full length of the circRNA (Figs 3 and 4), similar to the study of Zhang et al. [8]. On average, 2,945 different circRNAs were identified. Of note, more distinct full-length circRNA isoforms were detected in libraries that yielded more reads, which could be another argument for deeper sequencing. The results were comparable among the 4 human cancer cell lines, which demonstrates the robustness of the workflow.
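Table 2 summarizes read quality as a mean Q-score. A minimal Python sketch for computing a per-read mean Q-score from the base-called fastq files is shown below. It assumes well-formed four-line records with Phred+33 encoded qualities and averages per-base error probabilities before converting back to the Phred scale, a convention used by several Nanopore QC tools; averaging the Phred values directly gives higher, overly optimistic numbers. The function names and the read header are illustrative.

```python
import math
from typing import Iterable, Iterator, Tuple


def read_fastq(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from well-formed 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # tolerate blank lines between records
        seq = next(it).rstrip("\n")
        next(it)  # '+' separator line
        qual = next(it).rstrip("\n")
        yield header[1:].split()[0], seq, qual


def mean_qscore(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a read, computed via the mean error probability."""
    error_probs = [10 ** (-(ord(c) - offset) / 10) for c in qual]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


# Hypothetical record with alternating Q20 ('5') and Q10 ('+') bases
records = ["@read1 runid=example\n", "ACGT\n", "+\n", "5+5+\n"]
for read_id, seq, qual in read_fastq(records):
    print(read_id, round(mean_qscore(qual), 1))  # read1 12.6
```

In this example the arithmetic mean of the Phred values would be 15, whereas the probability-based mean is about 12.6, because a few low-quality bases dominate the expected error rate.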
We then randomly selected 5 circRNAs detected by Nanopore sequencing for validation (Fig 5 and S1 Fig). RNA was isolated from COST ALCL cells and transcribed into cDNA. Primers were designed to specifically amplify the circRNAs, and the PCR products were submitted for Sanger sequencing. The characteristic back-splice junction, which is essential for the formation of the RNA circle, was detected for all of the circRNAs, which validates our sequencing workflow.
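For circRNA validation, a common design is a pair of divergent (outward-facing) primers: on the linear exon sequence the primers point away from each other, so a PCR product can only form across the back-splice junction of a circular template. The Python sketch below, with hypothetical exon and primer sequences, checks this orientation and returns the expected amplicon that Sanger sequencing should read through. It illustrates the principle only and is not the primer design used for Fig 5.

```python
from typing import Optional

_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_DNA_COMPLEMENT)[::-1]


def divergent_amplicon(exon_seq: str, fwd: str, rev: str) -> Optional[str]:
    """Expected PCR product on a circular template, or None.

    exon_seq: circRNA exons concatenated in linear (genomic) order.
    fwd: forward primer (sense, 5'->3'); rev: reverse primer (antisense, 5'->3').
    Returns None if a primer site is missing or the primers are not
    divergent on the linear sequence (such a product is not circle-specific).
    """
    f = exon_seq.find(fwd)
    rev_site = revcomp(rev)  # reverse primer binding site on the sense strand
    r = exon_seq.find(rev_site)
    if f < 0 or r < 0:
        return None
    r_end = r + len(rev_site)
    if r_end > f:
        return None  # convergent or overlapping: would amplify linear RNA too
    # Divergent: the product runs from the forward primer to the 3' end,
    # across the back-splice junction, and on to the reverse primer site.
    return exon_seq[f:] + exon_seq[:r_end]


exons = "ATGGCCTTAG" + "CCGATTACAG" + "GTTCAAGGCT"  # hypothetical 3-exon circRNA
print(divergent_amplicon(exons, fwd="GTTCAAG", rev="CTAAGGCC"))
# GTTCAAGGCTATGGCCTTAG  (contains the junction ...AGGCT|ATGGC...)
```

A product from such a primer pair on its own does not exclude other non-colinear templates, which is why the junction sequence is additionally confirmed by Sanger sequencing.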
A limitation of this protocol is the generally low abundance of circRNAs. Since Nanopore sequencing generates fewer reads than Illumina-based techniques, low-abundance circRNAs in particular might not be detected by our protocol. Further, the recommended RNA input of 7 μg, which provides enough material for several rounds of sequencing and for the detection of less abundant circRNAs, might be too high when material is limited. However, we successfully tested our protocol with as little as 3 μg without impairing the sequencing output. If the circRNA enrichment workflow yields too little DNA, the PCR amplification of the cDNA libraries can be adapted by increasing the volume of the PCR reaction and the number of cycles (step 12 of the protocol). Further, if the expected size of circRNAs in the cell model of interest is smaller or larger than 200–800 nt, the size selection (step 13 of the protocol) can easily be adapted by changing the ratio of beads to DNA (a lower ratio selects for larger fragments, a higher ratio also retains shorter fragments). A limitation of the CIRI-Long tool used to detect circRNAs is that it does not detect circRNAs derived from fusion genes. Further, its alignment parameters cannot be modified, and a BAM file containing the aligned reads is not retained. Therefore, we routinely perform a separate alignment using minimap2 (v2.19, [12]) and visualize it with the Integrative Genomics Viewer (IGV, v2.9.4, [13]). In this way, chimeric alignments, in which segments of the same read align to distant genes, can be visualized and help to identify circRNAs derived from fusion genes.
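Chimeric alignments can also be flagged programmatically. minimap2 reports the additional segments of a chimeric read in the SAM `SA:Z` tag (entries of the form `rname,pos,strand,CIGAR,mapQ,NM;`, as defined in the SAM specification). The Python sketch below flags reads whose segments map to different chromosomes or lie farther apart than a distance threshold. The 1 Mb threshold is a hypothetical choice meant to separate back-splice alignments within one gene from segments in distant genes; fusion partners that lie close together would be missed, so this complements rather than replaces visual inspection in IGV.

```python
from typing import List, Tuple


def sa_segments(sam_line: str) -> List[Tuple[str, int, str]]:
    """Parse (rname, pos, strand) of supplementary segments from the SA:Z tag."""
    fields = sam_line.rstrip("\n").split("\t")
    for tag in fields[11:]:  # optional fields follow the 11 mandatory columns
        if tag.startswith("SA:Z:"):
            segments = []
            for entry in tag[5:].split(";"):
                if entry:
                    rname, pos, strand, _cigar, _mapq, _nm = entry.split(",")
                    segments.append((rname, int(pos), strand))
            return segments
    return []


def is_distant_chimera(sam_line: str, min_distance: int = 1_000_000) -> bool:
    """True if any segment maps to another chromosome or >= min_distance away.

    min_distance is a hypothetical threshold, not a validated cut-off.
    """
    fields = sam_line.split("\t")
    rname, pos = fields[2], int(fields[3])
    return any(
        s_rname != rname or abs(s_pos - pos) >= min_distance
        for s_rname, s_pos, _ in sa_segments(sam_line)
    )


# Hypothetical records: a back-splice read within one gene and a read
# whose segments lie on two different chromosomes
bsj_read = ("r1\t0\tchr1\t5000\t60\t150M150S\t*\t0\t0\t*\t*\t"
            "SA:Z:chr1,4500,+,150S150M,60,0;")
fusion_read = ("r2\t0\tchr2\t29000000\t60\t100M200S\t*\t0\t0\t*\t*\t"
               "SA:Z:chr5,170800000,+,100S200M,60,3;")
print(is_distant_chimera(bsj_read), is_distant_chimera(fusion_read))  # False True
```

Reads flagged this way can then be loaded into IGV for the visual confirmation described above.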
In summary, this protocol facilitates consistent full-length sequencing of circRNAs, which will help to study this noncoding RNA type in a variety of physiologic and pathologic contexts.
Fig 5. CircRNAs detected by Nanopore-seq were amplified by RT-PCR using cDNA from COST ALCL cells and analyzed by Sanger sequencing. For each panel, the upper part shows the alignment of long reads against the exons that form the circRNA, as visualized by IGV (tracks as in Fig 3). The lower part shows a scheme of the circRNA together with the positions of the primers used and the BSJ sequence obtained by Sanger sequencing. The circbase.org ID is indicated [14]. BSJ, back-splice junction. https://doi.org/10.1371/journal.pone.0273253.g005
Supporting information
S1 Fig. Validation of circRNAs by Sanger sequencing. Two circRNAs detected by Nanopore-seq were validated by Sanger sequencing as in Fig 5. Representative alignments and the BSJ sequence obtained by Sanger sequencing are shown for A) circNFATC3 and B) circARID1A (alignments in Fig 4). The circbase.org ID is indicated [14]. BSJ, back-splice junction. (TIF)
S1 File. Step-by-step protocol, also available on protocols.io. (PDF)