Core non-coding RNAs of Piscirickettsia salmonis

Piscirickettsia salmonis, a fastidious Gram-negative intracellular facultative bacterium, is the causative agent o Piscirickettsiosis. P. salmonis has broad host range with a nearly worldwide distribution, causing significant mortality. The molecular regulatory mechanisms of P. salmonis pathogenesis are relatively unknown, mainly due to its difficult in vitro culture and genomic differences between genogroups. Bacterial non-coding RNAs (ncRNAs) are important post-transcriptional regulators of bacterial physiology and virulence that are predominantly transcribed from intergenic regions (trans-acting) or antisense strand of open reading frames (cis-acting). The repertoire of ncRNAs present in the genome of P. salmonis and its possible role in bacterial physiology and pathogenesis are unknown. Here, we predicted and analyzed the core ncRNAs of P. salmonis base on structure and correlate this prediction to RNA sequencing data. We identified a total of 69 ncRNA classes related to tRNAs, rRNA, thermoregulators, antitoxins, ribozymes, riboswitches, miRNAs and antisense-RNAs. Among these ncRNAs, 29 classes of ncRNAs are shared between all P. salmonis genomes, constituting the core ncRNAs of P. salmonis. The ncRNA core of P. salmonis could serve to develop diagnostic tools and explore the role of ncRNA in fish pathogenesis.


Introduction
The genus Piscirickettsia includes two species, the recently described P. litoralis [1] and P. salmonis. P. salmonis is the etiological agent of salmonid rickettsial septicemia (SRS) or Piscirickettsiosis [2]. SRS has a high impact on the Atlantic salmon (Salmo salar) aquaculture in Chile, with up to~100% of losses associated to P. salmonis infection in seawater [3]. This Gram-negative, intracellular facultative pathogen was first isolated from Coho salmon (Oncorhynchus kisutch) in Chile [4] and since then, it has been reported in different geographic locations (e.g. Canada, USA, Norway, UK, Greece), and isolated from different salmonid and non-salmonid species [5,6]. a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 The P. salmonis strain LF-89 isolated in Chile is the reference strain [7,8] but many others have been isolated and characterized [9,10]. The knowledge about P. salmonis regulatory mechanisms of pathogenesis and physiology are limited due to its fastidious nature [9,11,12]. P. salmonis causes a systemic infection associated with the Dot/Icm type IV secretion system (SSTIV), which is required for cell invasion, immune evasion, and intracellular replication [13]. Also, it has been reported that P. salmonis macrophage internalization is mediated by clathrin [14]. Additionally, it has been shown that P. salmonis secretes outer membrane vesicles (OMVs) that could deliver or translocate effectors and other virulence factors into the fish cell [15]. Recently, pathogenic genomic islands have been identified in P. salmonis [16]. However, the repertoire and the potential roles of non-coding RNAs (ncRNAs) in P. salmonis gene regulation and pathogenesis have not been described.
ncRNAs are functional molecules of RNAs that are not translated into protein [17]. Genomic regions transcribed into ncRNAs, beside tRNAs and rRNAs, were not considered relevant for biological roles. The discovery of the first functional microRNA (miRNA) in Caenorhabditis elegans [18], claimed the scientific attention back to ncRNAs. Today, it is known that ncRNAs play important biological roles in all kingdoms of life [19,20].
The advancement of high-throughput expression technologies over the last years boosted the prediction, characterization, and functional classification of different novel types of sRNAs [33]. This was followed by the development of several computational biology approaches, based on secondary structure predictions, sequence similarity searches, covariance analysis models, and minimum free energy models, which together allowed the identification of thousands of different RNA classes from different evolutionary branches [34,35].
Complexity of organisms along the evolution has been associated with the expansion of genomic elements [36,37]. Comparison between the increasing number of protein-coding genes and non-protein coding genes reveals that the expansion of ncDNA is much higher than the expansion of protein coding genes [38]. This correlates with the increasing number of sRNAs described in bacteria genomes [39].
Here we predicted the sRNAs of several P. salmonis genomes and identified the core ncRNA repertoire of P. salmonis. The ncRNAs repertoire of P. salmonis and the possible role in gene regulation and pathogenesis will contribute to understanding P. salmonis physiology and host-pathogen interaction, opening new venues for the control of this pathogen.

ncRNAs predictions in P. salmonis
The genome sequences of eleven P. salmonis strains (Table 1) were downloaded from National Center for Biotechnology Information (NCBI) [40]. The prediction was performed by comparing the secondary structures in covariance models from all RNA families available in the RNA families database (Rfam; version 12.0) [41] against the P. salmonis genome sequences ( Table 1). The comparisons were performed using an in-house developed tool called StructR-NAfinder [42]. This software automatically integrates different tools for ncRNAs prediction and secondary structure identification, including Infernal [43], RNAFOLD [44] and Rfam database. StructRNAfinder utilizes Infernal to generate covariance models and sequence comparisons, and RNAfold for secondary structure prediction. The functional annotation for the predicted ncRNAs is obtained from Rfam. Predicted ncRNAs overlapping the genomic coordinates of coding genes were detected using intersectBED v2.26.0 [45] and manually discarded. Also, ncRNAs predicted more than once in each P. salmonis genome were manually eliminated to reduce redundancy. Finally, ncRNAs detected in intergenic regions were considered as part of the P. salmonis ncRNA repertoire.

Determination of the P. salmonis core ncRNAs
We clustered the P. salmonis genomes based on the ncRNAs repertoire of each genome. ncRNA classes were hierarchically clustered using the "complete method" and Euclidean distance through hclust function from R environment. The final heatmap representation was built using gplots R package. The ncRNAs shared by all P. salmonis genomes were considered as part of the P. salmonis ncRNA core.

Determination of P. salmonis codon usage
The P. salmonis codon usage was determined mediated the web suite SMS (Sequence Manipulation Suite) [46]. Known functionally annotated and unique hypothetical P. salmonis proteins, based on NCBI annotation [40], were used to determine the codon usage (S1 Table).

P. salmonis growth conditions for transcriptome analysis
The reference strain LF-89 strain was maintained and cultivated in CHSE-214 cells at 18˚C [47]. From infected cells, the bacterium was streaked onto CHAB agar plates (Brain heart infusion supplemented with L-cysteine 1 gL -1 and 5% ovine blood) and incubated for 10 days at 18˚C, until the formation of slightly convex and grey-white shiny bacterial colonies [47]. Finally, 10 single colonies were inoculated in 50 ml of Austral-SRS broth [48] and incubated for 5 days at 18˚C with gentle shaking (100 rpm).

RNA extraction and cDNA synthesis
P. salmonis grown in Austral-SRS medium was used for RNA extraction. 50 ml of bacterial culture were centrifuged (6,000 x g) during 10 min and resuspended in 1 ml of Trizol (Invitrogen, Madison, USA). The mixture was vortexed and treated with 700 μl of chloroform. The aqueous phase was extracted and mixed 1:1 with isopropanol. Total RNA was concentrated by RNeasy cleanup QIAgene kit. The total RNA extracted was treated with Turbo-DNAase I during 30 min at 37˚C (Ambion). The absence of DNA was checked by PCR using the ITS primers RTS1 (5'-TGATTTTATTGTTTAGTGAGAATGA-3') and RTS4 (5'-ATGCACTTATTCACTTGAT CATA-3') [49]. The purity was determined (ratio A260/A280) with a Nanodrop ND1000 spectrophotometer (Thermo Fisher Scientific, Copenhagen, USA), and the integrity was determined by agarose gel under denaturing conditions.

RNA sequencing of P. salmonis LF-89
Double-stranded cDNA libraries were constructed using the TruSeq RNA Sample Preparation Kit v2 (Illumina1, San Diego, CA, USA). Two biological replicates were sequenced using the MiSeq (Illumina1) platform, at the Center for Genomics and Bioinformatics, Faculty of Sciences, Universidad Mayor, Huerchuraba, Chile. The raw sequencing reads were analyzed using CLC Genomics Workbench software, version 10.0.1 (Qiagen). The reads were trimmed using the quality score limit of 0.08 and maximum limit of 2 ambiguous nucleotides. Trimmed reads were mapped to the genome and the protein-coding genes of P. salmonis LF-89 (ATCC VR-1361; genome AMFF02000000). The expression levels were normalized and evaluated by RPKM method, as described by Mortazavi et al [50]. The raw data was made available at the NCBI SRA database [51], under the Accession number PRJNA383157.

ncRNAs identification and expression confirmation using RNA sequencing (RNA-seq)
We used the PRJNA383157 RNA sequencing data to validate the ncRNAs predicted in the genome P. salmonis LF-89 using covariance models searches. Also, the public P. salmonis RNA-seq, PRJNA413076, PRJNA413086, PRJNA413085 and PRJNA413083 available at NCBI were utilized. The software sRNA-Detect, which was designed to identify ncRNAs from RNAseq data [52] was utilized. sRNA-Detect search for reads that have a minimum depth coverage, with a length range corresponding to a ncRNA (< 250 bp), and a low coverage variation rate through their sequence. The input files in sequence alignment map (SAM) format were generated using Bowtie2 [53]. Predicted ncRNAs within coding regions were detected using inter-sectBED [45] and manually discarded as described previously. Also, we cross-referenced the genomic coordinates of the ncRNAs predicted by covariance models, against those predicted based on P. salmonis transcriptional activity through intersectBED. This step allowed us to validate the set of ncRNAs classes predicted in P. salmonis LF-89 strain using StructRNAfinder tool. Finally, the Bowtie2 alignment files were converted from sam to bam format, sorted, and indexed using SamTools [54]. These files from each RNA-seq data were visualized and compared with the Integrative Genomics Viewer (IGV) version 2.3.92 [55].

RNA-RNA interaction
In order to identify potential target coding genes regulated by a set of selected ncRNAs predicted in P. salmonis, we used IntaRNA tool [56]. Similarly to the RNA-seq assays, we used the protein coding genes from the reference strain LF-89 (accession number: NZ_AMFF00000000) to identify the set of candidate genes potentially regulated by four selected ncRNAs (CsrC, PrrB_RsmZ, MicX and Sx4) present in the repertoire of P. salmonis. These ncRNAs were selected because they were predicted by the StructureRNAfinder and detected by the sRNA-Detect tool. Additionally these ncRNA have found in other bacterial species. We set a value of minimum energy cutoff of ΔG < -15 to be considered as potential interaction. RNA-RNA binding specificity parameters used have been previously validated in other Gram-negative bacteria such as E. coli and Salmonella [56][57][58].

General prediction of Piscirickettsia salmonis ncRNAs using covariance models
Sixteen RNA families were found in the eleven analyzed P. salmonis genomes. Based on covariance models, we predict 2239 ncRNAs (Fig 1). As expected, the most abundant ncRNAs families were tRNAs (40.38%) and rRNAs (21.42%). sRNAs corresponded to the 21.42%, suggesting that sRNAs play an important role in P. salmonis gene regulation. We found around 3% of miRNA-like, 1.5% of ribozymes, 1.4% of antisense RNAs and long ncRNAs, and 1% of riboswitches. The remaining ncRNAs were distributed among thirteen families, including snoRNAs, cis regulatory elements, catalytic intron RNAs, snRNAs, antitoxin RNAs, and thermoregulators.

P. salmonis ncRNAs repertory
After manual depuration of the predicted ncRNA, we identified 1813 ncRNs predictions in the analyzed P. salmonis genomes (S2 Table). The most abundant classes were tRNAs, rRNAs and sRNAs. Within the P. salmonis sRNA repertory, we identify several types of sRNAs with known function. For instance, the CsrC sRNA related to carbon storage regulation in E. coli [59] and Salmonella Typhimurium [60], and the PrrB_RsmZ, which modulates the expression of genes related to secondary metabolism, swarming and lipase synthesis in Pseudomonas [61]. Also, we identified several sRNAs with unknown function, like the IsrK of S. Typhimurium expressed during stationary phase, and under low oxygen and Mg +2 conditions [62], the T44 sRNA induced during the early intracellular infection stage in S. Typhimurium [63], and the MicX outer membrane protein repressor of Vibrio cholerae [64]. Additionally, we identified sRNAs related to Gram-positive bacteria physiology, like the Sau-5971 associated to small-colony variants, and the RsaA that serves as repressor in Staphylococcus aureus [32,65]. We also found the ubiquitous sRNA 6S RNA that regulates the expression of sigma70-dependent genes [66] and the RimP-leader, a highly conserved motif terminator related to the maturation of the 30S ribosomal subunit [67].
Another sRNAs present in P. salmonis genomes are the Sok that is part of the toxin-antitoxin type I hok/sok system [68], the TPP riboswitch, also known as THI element [69], and the YybP-YkoY a riboswitch that directly binds Mn 2+ [70].

P. salmonis clusterization based on ncRNAs
The presence and absence of ncRNAs classes in the P. salmonis genomes were used to generate a heatmap representation of a hierarchical cluster through g-plots R package. The clustering was applied to both sides, one side where similar ncRNAs classes in all P. salmonis strains are clustered together, and the other side where P. salmonis strains with similar ncRNA classes are clustered together. We found that similar ncRNA clusters correlates with P. salmonis genome clusters (Fig 2). The ncRNA and the P. salmonis genomes were divided into two clusters (Fig  2). Suggesting that some ncRNAs could be strain related (Fig 3).

ncRNAs core of P. salmonis
Using the ncRNA repertoire we search for the ncRNAs present in all eleven genomes. We found 29 classes of ncRNAs present in all genomes analyzed (Fig 4A), where the most abundant classes were tRNA, rRNA and sRNA with 901, 475 and 7 predictions respectively ( Table 3). The sRNAs classes are reduced, in comparison with the tRNAs and rRNA, because most of these sRNAs were present only once in each genome. The T44, PrrB_RsmZ and RpsB (Rfam-RF01815) were present in a single copy per genome. Sx4 was the only one sRNA with more than one prediction per genome.
Also, the ribozymes RNase P class A and B [73], the riboswitches TPP and YybP-YkoY, the transcription attenuator RimP-leader, and the 6S RNA are present in all P. salmonis genomes.

RNA-RNA interaction
Using the IntaRNA tool, a total of 10821 possible interactions for the selected 4 ncRNAs (CsrC, PrrB_RsmZ, MicX and Sx4), with the P. salmonis coding genes were predicted without cutoff. After the cutoff (ΔG -15) was applied a total of 55 possible interactions were predicted (S4 Table, Fig 5). Forty-three percent of the 55 possible targets genes, encode for hypothetical proteins. The C200_RS14095 pseudogene is a common target for CsrC, PrrB_RsmZ, MicX and Sx4 (S4 Table). Also, we found that the gene that encode for the hypothetical protein WP_033923871 is the common target of CsrC, PrrB_RsmZ and MicX ncRNAs. CsrC and PrrB_RsmZ have 6 targets in common (S4 Table). CsrC and PrrB_RsmZ targets the genes that encode for glycine dehydrogenase (WP_016209900) and phosphopentomutase (WP_016211224; also known in E. coli as deoB [74]). Likely, CsrC and PrrB_RsmZ are involved in the control of metabolic pathways, related to glycine hydrogen-cyanide [75]. Another target of CsrC is purM gene involved in the synthesis of purine nucleotides [76]. Also, we found that CsrC targets the murJ gene, which is involved in the biogenesis of cell wall [77]. The proton channel proteins MotA/TolQ/ExbB that energize TonB as well flagellar rotation also are targeted by CsrC [78]. We found that PrrB_RsmZ targets the central regulator of chemotaxis CheA and biofilm [79] and the long-chain-fatty-acid-CoA ligase also known as fadD in E. coli [80].
Additionally our analysis showed that ncRNA MicX targets the thiC gene, related to methionine synthesis [81], and the gene that encode for SecA protein that is an essential component of the Type II secretion system, which has also been found in P. salmonis [82,83]. Another predicted target of MicX was the gene that encodes for the outer membrane efflux protein TolC, which is an essential functional component of the Type I secretion system [84]. Among the targets predicted for the sRNA Sx4, we found the gene that encode for the arginine decarboxylase, related to acid stress [85], the purT gene that encode GAR transformylase T enzyme, involved in the purine biosynthetic pathway [86], and the encoding gene of ParB protein, responsible to avoid random segregation of the plasmids prior to cell division [87].

Discussion
Based on covariance models, we predicted 2239 ncRNAs in the eleven P. salmonis analyzed genomes. After manual depuration, 1813 ncRNAs were detected in non-coding regions and  Table). From this repertoire, 1383 ncRNAs (29 Rfam classes) were present in all P. salmonis genomes analyzed. These ncRNAs were considered as the P. salmonis "ncRNA core" (Fig 4A). Here we focus our discussion on the P. salmonis ncRNA core that correlates with our transcriptomic data analysis.
We found several ncRNAs that could be relevant to P. salmonis physiology, including Yyb-P-YkoY, related to Mn 2+ sensing response [68], and the sRNA IsrK, present in Salmonella enterica and E. coli, which regulates the expression of the transcriptional regulator AntQ that arrest bacterial growth [88]. Another sRNA present in the P. salmonis ncRNA core is MicX, which has been described as a regulator of genes that encoded for ABC transporters in Vibrio cholerae [62]. The RNA-RNA interaction analysis within the P. salmonis genome showed that MicX targets the gene that encodes for the ABC transporter substrate binding protein (WP_016210907), an orthologous of the Vibrio sp. ABC transporter (WP_099610902), suggesting a possible regulatory role of MicX in P. salmonis membrane transport. Additionally, we found that MicX targets the gene that encoded for the TolC protein, an essential component the Type I secretion system that plays a role in pathogenesis [89]. MicX also targets the coding gene for SecA, a Type II secretion component that is present in P. salmonis outer membrane vesicles [83,90]. Also, RNA-seq data analysis showed that MicX is transcribed in all P. salmonis transcriptomes analyzed (S2 Fig).
The RNA-RNA interaction analysis showed that the ncRNA Sx4 could regulate the expression of the enzyme arginine decarboxylase, which plays an essential role in the tissue colonization and acid resistance during pathogenesis in enterohemorrhagic E. coli and Shigella flexneri [91].
Additionally, CsrA has a high identity to RsmA, a post-transcriptional regulatory protein present in Pseudomonas aeruginosa, P. fluorescens CHA0, and Erwinia carotovora [96,97]. RsmA have global regulatory effects in P. aeruginosa, modulating pvdS (Iron-regulated sigma factor), vfr (transcriptional regulator) and pilM (type 4 fimbrial biogenesis protein) transcription levels [98,99]. RmsA is regulated by the two-component system GacS/GacA, also present in P. salmonis. It has shown that the GacS/GacA regulates RsmA/RsmB in E. carotovora, and CsrA/CsrB/CsrC in E. coli and S. enterica [59,96,100,101]. CsrC is part of the CsrB-CsrC sRNAs regulatory system of E. coli [59,102]. CsrB has similar functions to CsrC but it differs in the number of imperfect repeat sequences that serve as a binding site to CsrA [59]. Both CsrA and CsrB indirectly activate CsrA via the response regulator UvrY9 [59]. We did not found a CsrB orthologue in P. salmonis, however, we identified the PrrB_RsmZ sRNA, a P. aurigenosa orthologue that has similar structure and function than CsrB [59,61]. The CsrB/ CsrC system is also involved in pathogenesis, for instance, Salmonella enterica mutants of CsrC have a reduced cell invasion ability and expression of SPI1 (Salmonella pathogenicity island 1) related genes, and the double mutant of CsrB/CsrC is deficient for cell invasion [103]. These results suggest that the P. salmonis GacS/GacA-CsrA/CsrB/CsrC regulatory system ( Fig   Fig 6. Predicted P. salmonis GacS/GacA-CsrA/CsrB/CsrC regulatory system. https://doi.org/10.1371/journal.pone.0197206.g006 6) could have an important role in P. salmonis physiology and pathogenesis. However, despite the presence of this system and its possible target genes in P. salmonis genome, CsrC and PrrB_RsmZ did not show a strong interaction with the csrA P. salmonis, having a value under the defined ΔG< -15 cutoff for a strong interaction. Nevertheless, we found a strong interaction between CsrC and the proton channel MotA/TolQ/ExbB encoding gene. MotA/TolQ/ ExbB energizes TonB and flagellar rotation motor, both relevant for pathogenesis, especially TonB that is required for iron acquisition [104]. Furthermore, we found that CsrC is present in all transcriptomes analyzed and shows a high transcriptional activity suggesting an important role in P. salmonis (S3 Fig). It has been described that most of the P. salmonis isolates harbour 3-4 cryptic plasmids [105]. These results correlate with the strong predicted interaction between Sx4 sRNA and the ParB encoding gene. Also, Sx4 is the only P. salmonis core sRNA present in more than copy. Perhaps, Sx4 sRNA play a role during cell division, regulating the expression of ParB, responsible to avoid random segregation of the plasmids prior to cell division. This is the first description of the ncRNA present in P. salmonis genome. The different ncRNA families present in different P. salmonis isolates could be utilized to determine the geographic origin, the virulence of a specific isolate or as targets for novel antibacterial treatments. The abundant number of ncRNAs predicted in the genome of P. salmonis suggest that these genetic elements play an important role in physiology and pathogenesis. However, all those predicted ncRNA targets and regulatory circuits in P. salmonis need experimental validation. Unfortunately, the genetic tools for P. salmonis are not developed yet to generate the mutant to test the effects on physiology and pathogenicity. However, despite the lack of specific genetics tools for P. salmonis, it has been reported functional validation of predicted genes through heterologous expression [106]. These assays could be a good approach to test our predictions especially to test the function by conserved secondary structure in P. salmonis ncRNAs.
Supporting information S1