Computational search for UV radiation resistance strategies in Deinococcus swuensis isolated from Paramo ecosystems

Ultraviolet radiation (UVR) is widely known as deleterious for many organisms since it can cause damage to biomolecules either directly or indirectly via the formation of reactive oxygen species. The goal of this study was to analyze the capacity of high-mountain Espeletia hartwegiana plant phyllosphere microorganisms to survive UVR and to identify genes related to resistance strategies. A strain of Deinococcus swuensis showed a high survival rate of up to 60% after UVR treatment at 800J/m2 and was used for differential expression analysis using RNA-seq after exposing cells to 400J/m2 of UVR (with >95% survival rate). Differentially expressed genes were identified using the R-Bioconductor package NOISeq and compared with other reported resistance strategies reported for this genus. Genes identified as being overexpressed included transcriptional regulators and genes involved in protection against damage by UVR. Non-coding (nc)RNAs were also differentially expressed, some of which have not been previously implicated. This study characterized the immediate radiation response of D. swuensis and indicates the involvement of ncRNAs in the adaptation to extreme environmental conditions.


Introduction 1
Diverse natural and artificial environments exposed to extreme temperature, pressure 2 and/or radiation conditions are attractive sources of microorganisms with exceptional 3 phenotypic and genotypic properties. The high-mountain Paramo biome, similar to the 4 tundra biome of high latitudes, consists of high-elevation areas subject to harsh 5 environmental conditions. The Paramo biome has a high solar incidence that can induce 6 damage by ultraviolet radiation (UVR) that represents a survival challenge for 7 organisms [50]. Ionizing radiation and UVR affect organisms by damaging cellular 8 components such as nucleic acids, proteins, and lipids [30]. The deleterious effect on 9 cells is caused by direct damage to DNA, such as chromosomal lesions that introduce 10 both double-strand breaks (DSBs) and single-strand breaks (SSBs), and damage due to 11 pyrimidine dimerization and photoproducts that inhibit DNA replication and work [57], also indicate differential expression of a subset of small and noncoding RNAs 48 (sRNAs/ncRNAs), molecules that do not encode functional proteins but can play 49 important roles in regulation of transcription and translation [61]. 50 The differential expression of ncRNAs, upon UVR treatment suggests that these 51 molecules could be important in triggering protective mechanisms, even though their 52 precise role during the stress response to high doses of radiation still remains to be 53 determined. A new hypothesis suggests that sRNAs could contribute to cellular 54 post-exposure recovery because they would remain largely undamaged due to their 55 small size [57]. Experimental evidence places these sRNAs into different metabolic 56 pathways, such as response to changes in temperature, pH and other lethal 57 stressors [59]. Recently reported sRNAs identified to be involved in radiation resistance 58 are Y-RNAs, molecules that adopt specific secondary structures and bind to proteins 59 known as Ro that are conserved in several organisms [29]. In D. radiodurans Y-RNAs 60 were found to bind the Ro orthologue Rsr to form a ribonucleoprotein (Ro-RNP) 61 complex that functions as an effective machinery for bacterial RNA degradation [8]. D. 62 radiodurans was found to upregulate and accumulate Ro-RNPs in response to UVR and 63 cells lacking the Ro protein had decreased survival following UV exposure [9].
In this study we hypothesized that microorganisms capable of resisting UVR should 65 be present in locations exposed to high solar incidence, such as the Andean mountain 66 high-altitude Paramo biome. Previous results indicated that the phyllosphere 67 microbiota associated with Espeletia sp., a plant endemic to the Paramo, contained 68 diverse microbial communities and genes involved in resistance to UV and other stress 69 conditions [50], and could thus provide insight into microbial resistance strategies. The 70 main goal of this work was to isolate UV resistant microorganisms from this plant 71 phyllosphere and study their resistance mechanisms through gene expression analysis. 72 One bacterial strain identified as D. swuensis showed high resistance to UV exposure in 73 laboratory settings and differential regulation of genes and sRNAs that provide clues to 74 the early adaptation of D. swuensis to extreme environmental conditions, such as those 75 found in high Andean ecosystems. Microorganisms were isolated from Espeletia hartwegiana leaf surfaces by first 80 dislodging microbes, as previously reported [5], and then plating serial dilutions on R2A 81 Agar (BD Difco, Franklin Lakes, NJ) and Tryptone soy agar (TSA, Oxoid), 82 supplemented with 50 mg/ml Nystatin (Sigma-Aldrich, St. Louis, MO) to avoid fungal 83 growth, when necessary. Plates were incubated at 25 • C for 15 days and checked daily 84 for growth. Colonies with distinct morphologies were re-streaked in the same growth 85 media until pure colonies were obtained. Strains were characterized microscopically 86 using Gram staining and taxonomic identification was done by analysis of the 16S 87 rRNA gene or the ITS region for fungi. DNA was obtained by resuspending colonies in 88 1ml Tris 10mM (pH 8.0), adding 25µl proteinase K (10mg/ml) and incubating at 55 • C 89 for 25 min. DNA was purified from 500µl of this cell suspension using the MO BIO 97 GGAAGTAAAAGTCGTAACAAGG 3') and ITS4 (5' TCCTCCGCTTATTGATATGC 98 3') were used to amplify fungi as described above but using 0.3 µM primers and PCR 99 reactions of 2 min at 94 • C, followed by 35 cycles of 60 s at 94 • C, 60 s at 55 • C, 1 min a 100 72 • C, and a final extension of 5 min at 72 • C. Sequencing was performed in an Applied 101 Biosystems 3500 Genetic Analyzer. Forward and reverse reads were assembled and swuensis was measured at various points along the growth curve using three replicate 112 cultures that were first grown overnight and then diluted 1:100 in 100 ml fresh TSB 113 medium, and incubated at 30 • C, with continuous agitation at 150rpm. Samples were 114 taken at 15,24,40,48, and 72 hours and exposed to 800, 1600 and 2400J/m 2 to 115 determine survival (CFU/ml), as mentioned above.

116
RNA extraction and sequencing 117 Triplicate 48-hour D. swuensis cultures were grown in 3ml TSB, diluted 1:100 into 100 118 ml fresh medium and re-grown for 24h. Ten ml of each 24h culture were transferred to a 119 sterile Petri dish and submitted to 400J/m 2 irradiation. After exposure, bacterial cells 120 were immediately placed on ice, and centrifuged at 4600 x g for 15 minutes (4 • C).

121
Control aliquots from the same culture were not submitted to irradiation. After 122 centrifugation, pellets were re-suspended in 1ml TriZol (Promega), lysed with Matrix B 123 lysing beads (MP Biomedicals) in a FastPrep (MP Biomedicals) using 6.5 m/s for 40 124 seconds, and then centrifuged at 15,000 x g for 1 minute at 4 • C. RNA in supernatants 125 was recovered with the DirectZol RNA extraction kit (Zymo Research  [47]. Sequences 135 were mapped against the D. swuensis NCBI reference genome DY59 (accession number: 136 GCF 000800395.1) [32] with Subread (V1.5.0-p3) [33], parameters included an insert size 137 of 250 bp, a maximum of 3 mismatches and 5 indels. Features present in the reference 138 annotation were extracted from the gff file and relative abundances were calculated with 139 featureCounts, a tool included in the R package subread (v.1.5.0-p3) [34], and an 140 in-house script. Remaining (unmapped) sequences were randomly subsampled (10%) 141 and searched against the nt (nucleotide collection) database with Blastn and processed 142 with MEGAN (v.6.9.4) at a threshold of 1X10 −5 [24].  Bayesian statistics). Counts were normalized to reads per kilobase of feature length per 152 million mapped reads (RPKM) and by trimmed mean of M-values (TMM). Filtering of 153 features with low counts was applied in order to remove those features that had an 154 average expression of less than 5 CPM (counts per million) per condition and a 155 variation coefficient higher than 100 in all conditions, which introduces noise and can 156 lead to unreliable results for differential expression analysis [52]. Genes with a cut-off were considered as differentially expressed genes. Sequences coding for annotated 159 hypothetical proteins were queried against the NCBI nr database using BLASTp and 160 used for domain search with HMMER (V.3.1) [40] against the PFAM database [17]. ncRNAs. Filters for the regions selected were based on the number of hits (read counts) 165 and the region length (>50pb). Candidate regions were compared against the Rfam and 166 NCBI nucleotide-nr databases [25] [48] using covariance models implemented in Infernal 167 (V.1.1) [41]. All ncRNA candidates were processed for differential expression analysis 168 using the workflow described above.  [46]. Primer 183 efficiencies were determined using 1:10 serial dilutions of genomic D. swuensis DNA and 184 the same PCR program described above.

186
Strain isolation and radiation resistance 187 Microorganisms were isolated from the phyllosphere of Espeletia plants located in the 188 National Park Los Nevados in Colombia [50]. Isolates with distinct colony morphologies 189 were obtained by plating dilutions of the material dislodged from leaf surfaces on 190 various media. Taxonomic identification using both 16S rRNA gene and ITS sequence 191 analyses showed that this collection of isolates consisted of 10 fungi, 11 Gram-positive 192 and 29 Gram-negative bacteria. To determine if any of these strains were resistant to 193 UV radiation, as predicted for organisms living at these high-altitude ecosystems [50], 194 all isolates were subjected to irradiation with UVC. A screen using varying levels of 195 exposure, up to 800J/m 2 , showed that very few strains were capable of surviving these 196 conditions. The most resistant strain was a bacterium identified as D. swuensis (strain 197 CG1225), followed by the fungi Cryptococcus flavescens and Rhodotorula mucilaginosa 198 Fig 1A. Other isolates showed reduced levels of resistance. Given that D. swuensis 199 CG1225 showed the highest resistance to UVC exposure, with >60% survival at the 200 highest dose tested (800J/m 2 ), this strain was selected to further study its response to 201 irradiation using RNAseq analysis.

202
In order to determine the best conditions for RNA extraction, a survival curve was 203 first performed by harvesting D. swuensis cells at five different times along the growth 204   Table 1. After quality processing and adapter removal, 215 rRNA filtering was performed using the SILVA database, which on average removed 216 90% of the reads, with the exception of samples C2 and IR2 for which 95% and 43% of 217 the reads were retained, respectively, likely due to variation in the efficiency of 218 experimental rRNA depletion [7]. Given the high number of reads retained after 219 filtering for samples C2 and IR2 Table 1, a Cochran C test was performed to estimate 220 significant differences in variance for any sample with respect to the entire group 221 variance. The sample variance for C2 (0.9226) was significantly higher than the variance 222 for the other samples (p-value of 2.2e- 16)  To identify genes potentially involved in resistance to UV exposure, differential   Only CDS with probability value over 0.8 and log 2 fold-change ≥ 1 are shown. 2 Irradiated (I) and non-irradiated (NI) expression means. 3 Theta: differential expression statistic calculated as (M + D)/2, where M is the log 2 -ratio of the two conditions and D is the difference in expression between conditions (including a correction for the biological variability of the corresponding feature). 4 Prob: probability of mixed distribution (mixed because it is calculated from features changing between conditions and invariant features). 5 Log 2 fold-change.
Given that some of the genes previously reported for Deinococcus strains as being 278 involved in UVR resistance [63] [37], such as DNA repair mechanisms, pigment 279 production and efflux pumps (for M n +2 mainly), were not present among the most 280 differentially expressed genes, a search for orthologues of radiation-resistance genes 281 reported from D. radiodurans and D. gobiensis was performed. All twenty-seven 282 reported genes (e.g., citB, ddrI, phoR, phrB and mutT) were recovered with significant 283 e-values (<0.01) but with low identity values (between 30-50% at the DNA level) and a 284 log 2 fold-change value not significant between the conditions tested (maximum log 2 285 fold-change 0.6). In consequence, those genes were not used for further analyses.  qRT-PCR validation 294 In order to validate the differential expression found with RNASeq, RT-qPCR was 295 performed on three genes that had the highest log 2 fold-change expression ratios: an 296 RNA helicase (QR90 RS09640), a GntR family transcriptional regulator 297 (QR90 RS11755) and the gene for proline dehydrogenase (QR90 RS11750). All three 298 genes tested showed over 2-fold increase in expression (2.31, 2.14 and 2.05, respectively), 299 thus confirming the observed RNA-seq data.

Identification of ncRNAs 301
To identify additional differentially expressed features in the transcriptomes of 302 UVC-exposed D. swuensis cells, and according to recently-proposed roles of ncRNAs in 303 the rapid recovery after cellular stress, a de novo search for these regulatory entities was 304 implemented [57]. Analysis of 3,355 intergenic regions from the D. swuensis reference 305 genome retrieved 1,979 candidate ncRNA sequences. The criteria included a minimum 306 cut-off for intergenic regions of 50bp and a minimum sequencing depth threshold of 6X, 307 based on the mapping distribution, to eliminate regions with low coverage Fig S2. 308 These candidates were compared against covariance models (CMs) built from the Rfam 309 database. CMs are statistical models of structurally annotated RNA multiple sequence 310 alignments that allow a flexible search for both primary and secondary RNA structures 311 against a known dataset [4].

312
The CM search reported a total of 1,598 matches, but only 290 were below a search 313 threshold of 0.1 (parameter that describes the number of hits one can "expect" to see by 314 chance when searching a database). These significant matches were composed by 109  Table 4 . Although this 324 probability is not equivalent to a p-value, the higher it is, the more likely that the 325 difference in expression is due to the change in the experimental condition and not to 326 chance.

328
Natural ecosystems, and the organisms that inhabit them vary in their exposure to 329 UVR. UVR determines the distribution and survival of microorganisms and 330 consequently influences ecosystem dynamics and biogeochemical cycles [3]. From an 331 evolutionary point of view, sensitivity to radiation indicates that UVR is an effective 332 promoter of mutations, stimulating genomic variation, and could explain why high 333 resistance is not widespread [20].

334
In this study, various isolates obtained from the Espeletia plant phyllosphere showed 335 differences in resistance to UVR, indicating variability in their adaptability to UV 336 exposure, despite being isolated from the same habitat. Experimentally observed 337 differences in sensitivity to UVR have been reported for strains coming from different 338 environments (related with their natural origin) [20], but similar levels of UVR 339 resistance across different groups have been reported for a phyllosphere microbial 340 community [54]. Our results indicate that habitat of origin is not necessarily indicative 341 of resistance to extreme radiation, in this case UV-C, which seems to be affected by 342 multiple factors in natural environments such as variable time and energy doses that are 343 not necessarily reproduced in controlled laboratory experiments.

344
D. swuensis CG1225, with >60% survival at the highest dose tested (800J/m 2 ), was 345 the most resistant isolate recovered based on our radiation resistance experiments. The 346 UV resistance of Deinococcus spp., a group described as being "among the most 347 radiation-resistant microorganisms that have been discovered", has been widely 348 recognized since 1956 [21]. Some Deinococcus members have shown tolerance to UV 349 radiation (100 to 295 nm), and tolerance to ionizing radiation (5000 -30000 Gy) [60]. 350 As reference, 5Gy can destroy a human cell, 200-800 Gy will kill Escherichia coli and 351 more than 4000 Gy breaks the resistance of tardigrades [60]. Deinococcus strains can 352 resist ionizing radiation that is damaging to other organisms due to multiple 353 mechanisms that can work synergistically to guarantee genomic integrity before the next 354 cell division cycle [53]. Current reported strategies include efficient DNA repair (such as 355 RecA, Pprl, Ppr), high antioxidant activities (CAT, SOD, POD, M n +2 ), a unique cell 356 structure (tetrad configuration for compartmentalization of DNA) [60], and protection 357 of proteins [13]. Recent reports also indicate that ncRNAs may play a role in this 358 resistance phenotype [57]. These organisms also have high genomic plasticity (i.e. high 359 variability in genome content among closely related strains) and interspecies genomic 360 diversity, when compared to the Escherichia genus, which can contribute to unique 361 responses to environmental stress.

362
In this study, we used differential gene expression analysis to identify genes involved 363 in the cellular response of D. swuensis CG1225 to UVR, a strategy which has been used 364 to study other Deinococcus isolates [57] [44] [37] [63]. In contrast to previous studies in 365 which treated cells were recovered after varying lengths of time, even up to three hours 366 post treatment [63], here the D. swuensis CG1225 cells were harvested right after 367 exposure and thus provide insight regarding the immediate response to UVR exposure 368 and irradiation stress. This might explain why relatively few differentially expressed 369 genes were identified, 14 CDS with log 2 fold-change values >1 and a probability >0. 8. 370 Although functional domains with diverse biological functions were identified in these 371 hypothetical proteins, none of them seemed to be associated with any known UV 372 stress-responses Table 3. It is therefore unclear what the function of several of these 373 proteins might be and how they may contribute to the UVR stress response. The 374 predicted genes, however, were involved in global responses to stress, such as 375 transcription regulation and transporters involved in cellular detoxification.

376
The overexpressed genes support the hypothesis of an organism that turns on its 377 transcriptional machinery, in this case as an immediate response to prepare itself for the 378 recovery of homeostasis as response to an environmental stressor. The highest log 2 379 fold-change in expression was registered for a transcription factor belonging to the GntR 380 family, which regulates several biological processes in diverse bacterial groups, however, 381 details regarding its specific mechanism of action remain largely uncharacterized [15]. A 382 few studies with mutants have shown that loss of function of this regulator protein is 383 associated with a decrease in resistance to stress in D. radiodurans [15] and Bacillus 384 subtilis [36]. However, the target genes for this transcriptional regulator remain 385 elusive [1]. Other genes overexpressed under radiation exposure were an ABC 386 transporter-system-related protein (ATP-binding protein), a M n +2 transporter that has 387 been shown to be key for ROS elimination [35] [63], and a hydrolase of the alpha/beta 388 family. These last two genes can be potentially associated with cellular systems involved 389 in cleaning toxic compounds produced during DNA repair. The activity of hydrolases 390 modulating cellular redox processes has been described for many organisms [56] and in 391 D. radiodurans it prevents incorporation of damaged nucleotides into DNA [1] [38].

397
In this work we observed differences with respect to previous studies with D. 398 radiodurans [37] and D. gobiensis [63]. In particular, we did not detect genes previously 399 identified to be involved in resistance of Deinococcus strains, such as ddrA/ddrB genes 400 for repair proteins, ddrC/ddrE/ddrP genes for damage response proteins and the fliY 401 transporter. Neither these genes nor their orthologues were present among the 402 differentially expressed genes. This discrepancy might indicate that different strategies 403 may be involved regarding tolerance to radiation for D. swuensis CG1225 compared to 404 both the widely studied D. radiodurans and D. gobiensis [42]. It could also reflect 405 differences in experimental conditions, such as exposure to different levels of 406 radiation [22] [14], the culture growth conditions and the amount of time allowed for cell 407 recovery after UV exposure (from minutes to hours). In our work, the cells were 408 exposed to a comparatively low level of radiation (considering the maximum level of 409 resistance expressed by D. swuensis) and were harvested right after UVR treatment, 410 thus providing an immediate snapshot, a "first quick response", of the cellular response. 411 In other studies, D. radiodurans transcriptome analysis has been conducted using longer 412 recovery times, and even several hours, after exposure [63]. Finally, difference in results 413 could also be due to genome variability. The high variability in genomic organization 414 (genome size, number of chromosomes, plasmids, etc.) in Deinococcus sequenced isolates 415 has been proposed as a potential source of interesting adaptations [21]. A comparison 416 among D. geothermalis, D gobiensis and D. proteolyticus showed, for example, a core 417 genome of 1369 genes and ∼600-1700 accessory species-specific genes [21], which could 418 harbor potential functional differences even among related species.

419
When analyzing the data obtained from the RNA-seq experiments, several reads 420 could not be mapped to the reference genome used (D. swuensis DY59). The 421 percentage of unmapped reads across the samples (∼10%) falls within the expected for 422 RNA-seq experiments in which reads are mapped to a reference genome different from 423 the evaluated isolate [63] [57], and is also consistent with the reported variability among 424 Deinococcus genomes [21]. However, 30% of these unmapped reads showed identity 425 values over 85% against the reference genome through a blast alignment, which reasserts 426 the idea of intraspecific diversity for Deinococcus sp. Furthermore, an average of ∼25% 427 of the mapped reads failed to map within annotated features. Most of these fell near 428 (50-100pb) to the start/end positions of annotated features, suggesting that they 429 correspond to either transcribed but un-translated regions or miss-annotated features in 430 the genome, a reasonable explanation due to the draft version of the available reference 431 sequence.

432
Given the recent reports regarding the identification of differentially expressed 433 ncRNAs in Deinococcus strains, we looked for these elements in our RNA-seq data.

434
Even though the libraries were not experimentally enriched for short ncRNAs, an 435 exploration of reads mapping to intergenic regions allowed the recovery of some 436 well-represented families, which can be potentially involved in the irradiation response 437 and have not been previously reported for the reference D. swuensis isolate [26]. The 438 mechanism of action for ncRNAs in radiation response is a topic of current active 439 research, and some recent studies suggest that ncRNAs, due to their small size, might 440 remain largely undamaged by radiation and hence be the first responders, inducing and 441 regulating cellular function recovery [57]. The authors declare that the research was conducted in the absence of any commercial 481 or financial relationships that could be construed as a potential conflict of interest.