A Large and Phylogenetically Diverse Class of Type 1 Opsins Lacking a Canonical Retinal Binding Site

Opsins are photosensitive proteins catalyzing light-dependent processes across the tree of life. For both microbial (type 1) and metazoan (type 2) opsins, photosensing depends upon covalent interaction between a retinal chromophore and a conserved lysine residue. Despite recent discoveries of potential opsin homologs lacking this residue, the phylogenetic dispersal and functional significance of these abnormal sequences have not yet been investigated. We report discovery of a large group of putatively non-retinal binding opsins, present in a number of fungal and microbial genomes and comprising nearly 30% of opsins in the Halobacteriaceae, a model clade for opsin photobiology. We use phylogenetic analyses, structural modeling, genomic context analysis and biochemistry to describe the evolutionary relationship of these recently described proteins with other opsins, and to show that they are expressed and do not bind retinal in a canonical manner. Given these data, we propose a hypothesis that these abnormal opsin homologs may represent a novel family of sensory opsins which may be involved in taxis response to one or more non-light stimuli. If true, this finding would challenge our current understanding of microbial opsins as a light-specific sensory family, and would provide a potential analogy with the highly diverse signaling capabilities of the eukaryotic G-protein coupled receptors (GPCRs), of which metazoan type 2 opsins are a light-specific sub-clade.


Introduction
Opsin proteins catalyze light-dependent processes in all three domains of life, including vision and circadian cycling in animals [1], as well as chlorophyll-independent phototrophy, osmoregulation and phototaxis in bacteria, archaea, microbial eukaryotes, and multi-cellular fungi [2]. The opsin proteins have been classified into two main categories, the type 1 and type 2 opsins. Type 1 opsins are found in diverse species and in all three domains of life. They have been shown to function as light-driven ion transporters and phototaxis receptors. Type 2 opsins, by contrast, are found in metazoan species and serve primarily as light-dependent photoreceptors in animal eyes and various other tissues of higher eukaryotes [3]. The evolutionary relationships between these two classes of opsins remain unresolved [4][5][6]; however, in both groups, photosensing depends upon a covalent interaction between a conserved lysine residue in the seventh transmembrane helix (bovine visual rhodopsin K296 / bacteriorhodopsin K216) and a retinal chromophore [7,8]. Recent studies have reported discovery of genes encoding opsin homologs lacking this residue in fungal, haloarchaeal and placozoan genomes [3,9,10]. However, these have been treated as isolated instances, and the phylogenetic dispersal and functional significance of these abnormal sequences have not yet been investigated.
Here we report discovery of a large group of putatively non-retinal binding opsins comprising nearly 30% of opsin homologs in the archaeal family Halobacteriaceae, a historically important model clade for study of opsin photobiology. This family of extremely halophilic archaea possesses a diverse range of opsins, which have classically been divided into four groups: the ion pumps halorhodopsin (HR) and bacteriorhodopsin (BR), which respectively regulate cytoplasmic osmolarity and create electrochemical gradients used in ATP production; and two classes of sensory rhodopsins (SRI and SRII), which serve as histidine kinase response regulators for phototactic and photophobic behaviors [11]. Recent studies have expanded our view of haloarchaeal opsin diversity by revealing a third sensory rhodopsin (SR3), a second group of bacteriorhodopsins (BR2), and a proposed intermediate between bacteriorhodopsin and the sensory rhodopsins (MR) [11][12][13][14]. Studies of these diverse haloarchaeal opsins have led to major advances in our understanding of the kinetics and structural intermediates of opsin photocycles [15], spectral tuning [16], and signal transduction pathways [17]. Haloarchaeal opsins have also served as models for protein crystallization [17], membrane protein folding [18] and development of optogenetic toolkits [19].
Based on genomic context, phylogenetic analyses, structural modeling and biochemistry, we propose that these abnormal opsin homologs, which are also present in some fungal, cyanobacterial, and chlorophytal genomes, may represent a novel family of sensory opsins potentially involved in taxis response to one or more non-light stimuli. If true, these findings challenge the current understanding of microbial type 1 opsins as a light-specific sensory family, and provide a potential analogy with the highly diverse signaling capabilities of the eukaryotic G-protein coupled receptors (GPCRs), of which type 2 opsins are a light-specific sub-clade. These results call for more work on this novel protein family and a renewed perspective on the roles of type 1 opsins in microbial physiological responses to diverse environmental inputs.

Results and Discussion
A large-scale survey of 80 complete and high-quality draft haloarchaeal genomes [20] revealed a novel, large opsin class: of 170 total haloarchaeal opsins, 48 homologs lack the normally conserved lysine residue (K216) required for binding retinal (Fig 1, sequences available at the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.963hr)). Given their close evolutionary relationship with opsin proteins, we term these opsin-related proteins (ORPs); they thus comprise nearly 30% of all known haloarchaeal opsins. We deliberately use the term opsin to describe the ORPs, even though the term is traditionally used to describe the apoprotein of the retinal-bound rhodopsin, due to their proposed constitutively retinal-free nature and phylogenetically close relationship to sensory opsins. The ORPs are broadly distributed across 11 genera in all three major haloarchaeal clades [20], in species with and without canonical opsin homologs (S1 Table). All eight species whose genomes encode ORPs and lack canonical opsin homologs were also found to lack crtY and brp, genes encoding enzymes which catalyze the terminal steps in retinal biosynthesis [21]. Together with additional biochemical and structural modeling, these data suggest the hypothesis that ORP genes encode type 1 opsins that have a non-retinal dependent function, providing a potential functional analogy with eukaryotic GPCRs.
Figure caption (fragment): ... [14]. Phylogenetic distribution of crtY and brp is identical except for one species (Natrinema versiforme), which has brp but no ...
Phylogenetic analysis revealed that the ORPs form a monophyletic clade most closely related to sensory opsins (Fig 2A) and themselves sub-divide into two distinct groups, one consisting of sequences primarily from Halorubrum species (group A) and the other of sequences from Natrialba species and other Clade 1 haloarchaea (group B) (Fig 2B). In group A ORPs, the Schiff base lysine is replaced with arginine, while in group B ORPs this position contains a leucine or other hydrophobic residue. All but one of the 16 group A ORPs were located adjacent to a predicted methyl-accepting chemotaxis (HAMP/MCP) signal transducer, providing evidence for the hypothesis that the ORPs represent a novel form of sensory opsins, which are often co-operonic with their cognate signal transducers [22] (S1A Fig). Many (17/32) group B ORPs were similarly linked to HAMP/MCP family signal transducers, with nine also located in the proximity of chemotaxis and flagellar biosynthesis operons (S1B Fig). This close genomic association suggests a functional role for at least a sub-set of ORPs in modulating the flagellar apparatus in response to one or more as yet unidentified signals. The remaining 15 group B ORPs, not located adjacent to signal transducer genes, belonged to 10 species, three of which have been described as non-motile [23,24]. We therefore propose that a number of group B ORPs may have signal-response functions unrelated to motility, as is the case for a large number of GPCR homologs [25] and has been suggested for Anabaena sensory rhodopsins (ASRs) [26]. Several of the ORPs not linked to signal transducers were located near genes implicated in various stress responses, including heat shock proteins, metal chaperones, and carbon starvation proteins, suggesting future lines of research for deciphering the functions of these group B ORPs.
Both haloarchaeal bacteriorhodopsin and bovine visual rhodopsin have been shown to function at reduced efficiencies in the absence of K216/K296, when aminated retinylidene compounds are provided in lieu of retinal [27][28][29]. To investigate the possibility that the ORPs may function in a canonical light-sensitive manner by retaining interaction with retinal, or a retinal-like chromophore, without Schiff base formation at K216, we conducted residue conservation analysis and structural modeling. Of three residues experimentally characterized as providing a hydrophobic cavity for the retinal ring in Natronomonas pharaonis SRII (V108, F127, W178) [30], only one is conserved as a hydrophobic residue of similar size in group A ORPs (M108) and none in group B ORPs (Fig 3A). W178, which is universally conserved as an aromatic residue in canonical haloarchaeal opsins, has been converted to the much smaller alanine and glycine residues in group A ORPs and group B ORPs, respectively. Similarly, the highly conserved aromatic residue at position 127 has been converted to a polar amino acid (T/S) in most ORPs. In addition, group A ORPs are missing two (W76 → D, Y174 → L) and group B ORPs one (Y174 → L) of the aromatic residues involved in steric constraint of the retinal polyene chain [30] (Fig 3A). These results strongly suggest that, in addition to having lost the Schiff base lysine for covalent binding of retinal, ORPs lack the canonical binding pocket to accommodate retinal-like chromophores. Conversely, residues involved in sensory signaling are conserved in one or both ORP clades. Y51 and R72, which together form a water-mediated hydrogen-bond complex important in propagation of signal to the linked signal transducer [31], are highly conserved in group B, but not group A ORPs. Y199, which forms a hydrogen bond with the signal transducer in Natronomonas pharaonis SRII [32], is universally conserved across the ORP clade. Similarly, D189, which in Nmn. pharaonis SRII also hydrogen bonds with the cognate signal transducer [32], is highly conserved in both group A and group B ORPs; however, this residue is poorly conserved in other canonical SRII homologs. The high level of conservation of residues involved in signal transduction, combined with lack of conservation of the retinal binding pocket, strongly suggests that ORPs are a non-retinal utilizing family of sensory opsins.
Figure caption (fragment): Maximum-likelihood phylogeny of all microbial (type 1) opsins obtained by BLASTp search of NCBI's nr and env_nr databases. Tree inferred using FastTree [71] and CompareToBootstrap.pl [74] using 500 bootstrap replicates generated with SeqBoot [73]. Branches are colored by phylogenetic affiliation and bootstrap support values above 0.30 are shown for major clades. Colors: grey = unannotated/unassigned, brown = bacterial, salmon = dinoflagellates, dark green = viridiplantae, purple = haloarchaeal bacteriorhodopsin (BR), light green = haloarchaeal sensory rhodopsin (SR), orange = haloarchaeal halorhodopsin (HR), red = haloarchaeal opsin-related protein (ORP), light blue = fungal ORP, dark blue = other fungal opsin. Abbreviations as in ...
Structural models of three ORP homologs, using Natronomonas pharaonis SRII (PDB ID: 3QAP) as a template, were consistent with residue analysis in showing lack of a canonical retinal binding pocket. Residues homologous to those lining the canonical binding pocket were shown to overlap the space normally occupied by retinal (Fig 3B). Additionally, a novel extracellularly accessible binding pocket was identified which was missing or highly diminished in canonical haloarchaeal opsins (Fig 3C, S2 Fig). To identify candidate ligands for these novel binding pockets, we used the program idock [33] to screen 229,358 natural product ligands against the three structurally modeled ORP homologs. An overlap analysis of the results showed that, despite their structural similarity, the three opsins likely have affinities for very different compounds. Nevertheless, most candidates came from the same ligand classes with similar scaffolds, including oxygen and nitrogen heterocycles and sesquiterpenes (see SI Discussion). Many examples of extracellularly accessible ligand binding pockets exist for Class A (rhodopsin-like) GPCRs [34][35][36][37], further suggesting a non-light related signal response function for the ORPs. Although both structural modeling and residue analysis suggest that the ORPs lack retinal binding capabilities, the possibility that they may utilize an alternate chromophore remains to be tested.
As type 1 opsin homologs lacking the Schiff base lysine have previously been reported for fungi [3], we were interested in determining whether haloarchaeal and fungal ORPs are monophyletic or represent independent losses of retinal-binding ability. We collected type 1 opsin homologs from the NCBI's nr and env_nr databases and performed maximum-likelihood phylogenetic analysis. A total of 1,077 opsin sequences were included. Interrogation of alignments for sequences lacking the Schiff base lysine revealed 45 non-haloarchaeal ORPs, none of which branched within the haloarchaeal ORP clade (Fig 2C). Thus, the haloarchaeal ORPs are evolutionarily distinct from other potentially non-light sensitive microbial opsins. The 45 non-haloarchaeal ORPs included two groups of fungal homologs, one containing 31 sequences from 10 Ascomycota genera and the other eight sequences from seven genera spanning the Ascomycota and Basidiomycota. The remaining six sequences were singletons scattered across the phylogeny (S3 Fig). Thus, although loss of the Schiff base lysine, and therefore probable loss of retinal-binding ability, has occurred at least nine times in the evolutionary history of type 1 microbial opsins, the expansion and diversification of both haloarchaeal and fungal ORPs make these exciting targets for learning novel signal transduction strategies used by type 1 opsins.
We also performed a targeted screen for haloarchaeal-type ORPs in the NCBI nr and env_nr databases, as well as the CAMERA metagenomic databases. We detected only one new non-haloarchaeal ORP belonging to the basidiomycete Trametes versicolor (EIW60452), which also possesses a fungal-type ORP (EIW51701). The haloarchaeal-type Trametes versicolor ORP branched with haloarchaeal ion pumps (BR/HR) rather than the haloarchaeal ORPs and SRs (S4 Fig). We therefore propose that this sequence represents horizontal gene transfer of either BR or HR from the Haloarchaea, followed by loss-of-function, rather than fungal acquisition of a haloarchaeal ORP homolog.
As all previous mentions of K216/K296-lacking opsins in the literature have been based on genomic data [3,9,10], and these sequences have not yet been shown to be expressed, we verified transcription of several ORP homologs in four haloarchaeal species (Halorubrum litoreum JCM 13561, Halorubrum distributum JCM 9910, Natrialba magadii DSM 3394 and Natronobacterium gregoryi SP2). Eight of the nine investigated ORP homologs were shown to be transcribed under standard laboratory conditions (Fig 4A). For one species (Hrr. distributum), transcription was also interrogated under conditions of maximal salt tolerance (5.0 M NaCl), reduced dissolved oxygen (shaking at 50 rpm vs 350 rpm), and high cell density (stationary phase). ORP transcript was detected under all conditions. Despite robust transcriptional response, we were unable to detect ORP protein in native hosts by LC-MS/MS, suggesting ORP expression may be very low or under post-transcriptional control (see SI Methods). Additional work will be required to determine biologically relevant expression conditions for ORP proteins. To experimentally verify the proposed inability of ORP proteins to bind retinal, we heterologously expressed His-tagged ORPs from two species (Hrr. distributum and Nab. magadii) (Fig 4B), and incubated purified ORPs with free all-trans retinal. Neither ORP showed absorption in the 480-580 nm range characteristic of canonical opsins [11]. This evidence further suggests that ORP proteins do not bind the retinal chromophore (Fig 4C).

Conclusions
In summary, evidence from genomic context, phylogenetic analysis, structural modeling, and biochemistry provides strong support for the existence of a large family of non-retinal binding opsins derived from a haloarchaeal-specific duplication and divergence of canonical sensory opsins, many of which are possibly involved in chemotaxis and/or transcriptional response to environmental stresses. This new family comprises nearly 30% of all known haloarchaeal opsins, providing a rich set of models through which to explore evolutionary diversification of signaling protein inputs as well as an enriched understanding of the roles played by microbial opsins in integrating diverse environmental inputs into a coordinated physiological response.

Methods and Materials
Haloarchaeal opsin sequence acquisition, alignment, and phylogenetic inference
During automated annotation of 80 haloarchaeal genomes with the Rapid Annotation Using Subsystem Technology (RAST) server [38], five sequences were annotated as opsin homologs which did not show significant sequence similarity to canonical haloarchaeal opsins by BLASTp search (e-value cutoff of 10⁻⁵) and which were missing the Schiff base lysine required for binding of the retinal chromophore in all experimentally characterized opsins (K216 in the haloarchaeal model opsin Halobacterium sp. NRC-1 bacteriorhodopsin). These sequences were also annotated as opsins in the NCBI genomic database. A BLASTp search (e-value cutoff of 10⁻⁵) against the 80 haloarchaeal genomes, using these five opsin sequences as queries, recovered 43 additional K216-lacking, putative opsin homologs for a total of 48 homologs in 28 genomes. These were combined with 122 canonical haloarchaeal opsins recovered by BLASTp search (e-value cutoff of 10⁻⁵) of the 80 haloarchaeal genomes using all Haloarcula marismortui ATCC 43049 opsins as query sequences. This set of queries was chosen because Har. marismortui possesses homologs for each of the six previously categorized classes of haloarchaeal opsins [11,12]. A total of 170 haloarchaeal opsin homologs were included in subsequent analyses (sequences available at the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.963hr)).
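The e-value filtering step described above can be illustrated with a minimal Python sketch. It assumes BLAST+ tabular output (`-outfmt 6`) with the default twelve columns; the function name `hits_below_cutoff` and the example rows are hypothetical and are not taken from the study's data or scripts.

```python
# Column order of BLAST+ default tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def hits_below_cutoff(lines, cutoff=1e-5):
    """Return the set of subject IDs with at least one hit at e-value <= cutoff."""
    kept = set()
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        if float(row["evalue"]) <= cutoff:
            kept.add(row["sseqid"])
    return kept


# Hypothetical example rows; IDs and scores are invented for illustration only.
example = [
    "query1\tsubjA\t45.0\t220\t100\t5\t1\t220\t3\t222\t2e-40\t150",
    "query1\tsubjB\t28.0\t90\t60\t4\t10\t99\t12\t101\t0.003\t35.1",
    "query2\tsubjA\t44.0\t215\t101\t5\t2\t216\t4\t218\t1e-38\t144",
]
print(sorted(hits_below_cutoff(example)))
```

Collecting subject IDs in a set means a homolog recovered by several queries is counted once, which matters when several query sequences are searched against the same genomes.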
Multiple sequence alignments were created separately for canonical and non-canonical haloarchaeal opsins using MUSCLE 3.8 [39,40], and checked for accuracy against a previously published haloarchaeal opsin alignment [30]. Miscalled start sites for 56 sequences were manually corrected based upon alignment with sequences of experimentally characterized opsins. Individual alignments were combined using the profile-profile alignment option in MUSCLE 3.8 and manually trimmed in the alignment editor Jalview [41]. A total of 204 positions were used to infer phylogeny using the Bayesian tree-building software MrBayes [42,43]

Distribution of opsin classes and retinal biosynthesis genes in the Haloarchaea
The presence/absence pattern of haloarchaeal opsin subclasses and the retinal biosynthesis genes crtY, crtE, crtB, crtI, and brp was superimposed on a previously published multi-marker phylogeny of the haloarchaea [20] using iTOL [45,46]. Subclass membership for each opsin homolog was determined based on clade affiliation in the haloarchaeal opsin tree. Sequences for crtY, crtE, crtB, crtI, and brp homologs were retrieved via BLASTp searches against the local haloarchaeal database, using query sequences from six haloarchaeal species (sequences available at the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.963hr)) and an e-value cutoff of 10⁻²⁰. As all species with crtY and brp also had a full complement of crtE, crtB, and crtI, presence or absence of crtY and brp was used to represent retinal biosynthesis ability.

Genome context of ORPs
Genomic context of haloarchaeal ORPs was investigated using JContextExplorer [47]. Haloarchaeal genomes can be loaded using the "Retrieve Popular Genome Set" function. Annotations associated with this genome set were done using the RAST annotation service [38] (see [20]). To view ORP contexts, search by Cluster Number for "4077;15722" (group A) or "2453;13537" (group B). Cluster numbers represent homology families as defined in [20].

Structural modeling of ORP proteins
We used the X-ray structure of Natronomonas pharaonis SRII (PDB ID: 3QAP) [48] as a template for generating structural models for three ORP proteins (Nab. magadii WP_004267173, Hrr. distributum ELZ45759, and Nbt. gregoryi WP_005575895) using the Rosetta-Membrane method [49][50][51]. We used Nmn. pharaonis SRII because it was identified by the HHpred server [52,53] as the closest structural homolog to the ORPs (21-25% sequence identity). Due to a two-residue deletion in the loop between TM6 and TM7 in the Hrr. distributum ORP compared with Nmn. pharaonis SRII, we predicted the structure of this region de novo using Rosetta cyclic coordinate descent (CCD) and kinematic (KIC) closure loop modeling, developed to model loop structures with sub-angstrom accuracy [54,55]. Several rounds of CCD and KIC loop modeling were performed with at least 10,000 models generated during each round. Models were ranked based on total Rosetta energy after each loop modeling round [49,51,56]. Ten percent of the lowest energy models were clustered [57] using a root mean square deviation (RMSD) threshold that placed 1-2% of all models in at least one of the largest clusters. Models representing centers of the top 20 clusters (early rounds) and/or the best 10 models by total energy (later rounds) were used as input for the next round of loop modeling. Rosetta's full atom relaxation protocol [51,56] was used to explore potential differences in backbone and side chain conformations of ORPs compared with the SRII template. Selection of the best ORP models was guided by clustering of the lowest energy models to generate the most frequently sampled conformations.
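As an illustration of the ranking step, the following Python sketch selects the lowest-energy tenth of a model set before clustering. It is not the Rosetta workflow itself: the function name `lowest_energy_fraction`, the model names, and the energy values are all hypothetical, and the clustering step is not reproduced.

```python
def lowest_energy_fraction(energies, fraction=0.10):
    """Return model names for the lowest-energy `fraction` of models.

    `energies` maps a model name to its total energy (lower is better).
    At least one model is always returned.
    """
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(energies, key=energies.get)  # ascending energy
    n_keep = max(1, round(len(ranked) * fraction))
    return ranked[:n_keep]


# Hypothetical energies for 20 models (invented values, arbitrary units).
energies = {f"model_{i:02d}": -500.0 + 3.0 * i for i in range(20)}
selected = lowest_energy_fraction(energies)
print(selected)
```

With at least 10,000 models per round, as stated above, the same call would pass on roughly 1,000 models to the clustering step.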

Intramolecular pathway and ligand-binding pocket predictions
Comparative structural models described above were used for prediction of ligand binding sites, internal cavities and tunnels using CAVER 3.0 [58]. Analysis was performed individually for each of the top five models generated by Rosetta for each ORP, which had a mean pair-wise RMSD across all models ranging from 0.610 to 0.694 Å. CAVER settings used were minimum probe radius of 0.9 Å, shell depth of 4 Å, shell radius of 3 Å, clustering threshold of 3.5 Å, 12 approximating balls, and maximum distance and desired radius for starting point optimization of 3 Å and 5 Å, respectively. Starting point coordinates were optimized to enable use of homologous starting positions for the majority of models. The starting point for all but two of the 15 ORP models was G193 (Nab. magadii) / A185 (Hrr. distributum) / G196 (Nbt. gregoryi). For two Hrr. distributum models, this starting position resulted in no predicted cavities; however, cavities similar to those predicted in other models were discovered with a starting position of Q175-I176. This starting position also resulted in no predicted cavities for SRII, as expected. For comparative visualization purposes, the starting position of P183 was used for SRII, resulting in prediction of a small, likely artifactual ligand binding pocket. For comparison, cavities were also predicted for bacteriorhodopsin (PDB ID: 4MD2) [59] and halorhodopsin (PDB ID: 1E12) [60] homologs. Starting points for these predictions were P4 and N3, respectively. Binding pockets and tunnels were visualized with MacPyMOL [61]. For comparison of cavity predictions between ORPs and canonical opsins, see Fig 3C.
Because the scoring functions of VINA and idock are non-deterministic, we repeated the scoring for the best candidates in order to screen for outliers. For the creation of a small decoy library [70], known ligands from the canonical opsins were submitted to http://dude.docking.org/generate. The decoy library with 750 compounds was then used to generate energy cutoff values for the idock-based screening.
The hardware for virtual screening included a Dual Xeon High Performance Workstation with two Intel E5-2687W processors (16 cores, 32 threads), 8 TB RAID10 disks, 2 TB SSDs and 196 GB RAM under Windows 7 64-bit. All modeling was performed on an 80 GB SoftPerfect RamDisk allowing sequential read/write speeds of up to 4 GB/second. For additional discussion of ligand screening, see S1 File.

Type 1 opsin sequence acquisition, alignment, and phylogenetic inference
Sequences for bacterial, archaeal, and microbial eukaryotic type 1 opsins were obtained by BLASTp search of the NCBI nr and env_nr databases (as of October 30th, 2012) with an e-value cutoff of 10⁻⁵ and maximum target sequences set to 100,000, using all Har. marismortui opsins as query sequences as described above. A total of 1,430 sequences were recovered. After removing mutants, synthetic constructs, highly fragmentary sequences and sequences lacking a predicted Bac_rhodopsin domain (Pfam clan CL0192), 907 opsin homologs remained. Sequences acquired from each database were independently aligned using MUSCLE 3.8.31 [39,40] and manually trimmed to remove poorly aligned regions and to account for the fragmentary nature of metagenomic sequencing, and the alignments were combined using the profile-profile alignment option. After adding the 170 haloarchaeal sequences from our dataset, a total of 147 positions for 1,077 opsin sequences were used in tree inference. For these sequences, an initial guide tree was constructed using FastTree [71,72], then 1000 re-sampled alignments were generated using the Phylip package SEQBOOT [73] and used to determine bootstrap support values for clades in the initial tree with CompareToBootstrap.pl [74]. The resulting tree was visualized with FigTree [44]. The tree file is available from the Dryad Digital Repository.
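The re-sampling step can be sketched in Python as drawing alignment columns with replacement, which is the standard non-parametric bootstrap for alignments. This is an illustrative stand-in, not SEQBOOT itself; the function name `bootstrap_alignment` and the toy alignment are hypothetical.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Build one bootstrap replicate by resampling columns with replacement.

    `alignment` maps sequence names to equal-length aligned strings.
    The same sampled column indices are applied to every sequence.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    cols = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[c] for c in cols)
            for name, seq in alignment.items()}


# Toy alignment, invented for illustration only.
toy = {"seqA": "MKT-LLA", "seqB": "MRTALLA", "seqC": "MKSALIA"}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
replicates = [bootstrap_alignment(toy, rng) for _ in range(1000)]
print(len(replicates), len(replicates[0]["seqA"]))
```

Each replicate keeps the original number of positions (147 in the analysis above), so a tree inferred from it is directly comparable with the initial tree when counting clade support.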

Phylogenetic distribution of haloarchaeal-type ORPs
All protein and peptide databases in CAMERA and the NCBI nr and env_nr databases (as of December 18th, 2012) were interrogated for non-canonical opsin homologs using BLASTp searches with all 48 haloarchaeal ORPs as queries. BLASTp parameters and database details are listed in S2 Table. Redundant sequences were removed using Jalview [41] and unique sequences searched against the Pfam database [75] for presence of a Bac_rhodopsin domain (CL0192). Sequences with a Bac_rhodopsin domain were aligned using MUSCLE 3.8 [39,40] and presence or absence of K216 was recorded. To assess the evolutionary origin of the only non-haloarchaeal ORP recovered from this search (Trametes versicolor EIW60452), a phylogeny was inferred for all 170 haloarchaeal opsins and the T. versicolor ORP. Sequences were aligned using MUSCLE 3.8.31 [39,40] and the alignment was manually trimmed in Jalview [41]. After trimming, 220 positions were used in phylogenetic inference using FastTree [71,72] and the resulting tree was visualized in FigTree [44]. For tree see S4 Fig.
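Recording presence or absence of the Schiff base lysine amounts to reading one alignment column across all sequences. The Python sketch below shows one way to do this under stated assumptions: the reference sequence is part of the alignment, residue numbers are 1-based, and gaps are written as "-". The helper names and the toy alignment are hypothetical; in the toy example the lysine sits at reference residue 5, standing in for K216.

```python
def column_of_residue(aligned_ref, residue_number):
    """Map a 1-based residue number of the ungapped reference sequence
    to the corresponding 0-based column of the alignment."""
    seen = 0
    for col, char in enumerate(aligned_ref):
        if char != "-":
            seen += 1
            if seen == residue_number:
                return col
    raise ValueError("residue number exceeds reference length")


def residues_at_site(alignment, ref_name, residue_number):
    """Return the character each sequence carries at the reference site."""
    col = column_of_residue(alignment[ref_name], residue_number)
    return {name: seq[col] for name, seq in alignment.items()}


# Toy alignment, invented for illustration; reference residue 5 is the lysine.
toy = {
    "reference": "MA-GDKLV",
    "canonical": "MASGDKLV",
    "orp_A":     "MA-GDRLV",  # lysine replaced by arginine
    "orp_B":     "MA-GDLIV",  # lysine replaced by a hydrophobic residue
}
site = residues_at_site(toy, "reference", 5)
lacking = sorted(name for name, res in site.items() if res != "K")
print(site, lacking)
```

Mapping through the gapped reference avoids off-by-gap errors, since insertions in other sequences shift the alignment column away from the raw residue number.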

Confirmation of native transcription of ORPs
Liquid cultures of Hrr. litoreum JCM 13561, Hrr. distributum JCM 9910, Nab. magadii DSM 3394 and Nbt. gregoryi SP2 were grown to mid-log phase in JCM 168 (Hrr. litoreum and Hrr. distributum) or DSM 371 (Nab. magadii and Nbt. gregoryi) media. Cell pellets were collected from 2 mL of mid-log phase cultures and RNA harvested using 1.0 mL TRIzol® Reagent with standard extraction protocol followed by a DNase digestion step and a second TRIzol® extraction with 0.5 mL TRIzol® Reagent. DNase digestion was with NEB DNase I (M0303S) using standard protocol. Following the second TRIzol® extraction, RNA was tested for gDNA contamination using PCR with haloarchaeal 16S primers (S3 Table) prior to reverse transcription. Reverse transcription was carried out with SuperScript® III Reverse Transcriptase from Life Technologies™ using standard protocols, and random hexamers (Qiagen, 79236). Presence of transcript was confirmed via PCR with primers listed in S3 Table. Cross-reactivity of primers for species with multiple ORP homologs was tested and primer sets were found to be gene-specific. For Hrr. distributum, additional cultures were grown under the following conditions, and transcript presence verified using methods described above: a) 350 rpm, 3.42 M NaCl, b) 350 rpm, 5.0 M NaCl, c) 50 rpm, 3.42 M NaCl. Conditions (a) and (b) were incubated using a G-53 gyratory tier shaker (New Brunswick Scientific, M1074), condition (c) in an Innova 44R incubator (New Brunswick Scientific, M1282). For each condition, samples were collected at mid-log and stationary phase.

Protein-level expression of ORPs in heterologous hosts
To express ORP proteins in E. coli, all clones were grown in Luria Broth (LB) with 50 μg/mL kanamycin (kan) in an Innova 44R Shaking Incubator (New Brunswick) with shaking at 175 rpm. The pET29b+/ORP, pET29b+/BR2 and pET29b+/empty BLR(DE3) clones were revived from freezer stock overnight in 4 mL of LB + kan at 37°C. Revived cells were recultured in 25 mL of LB + kan with a starting OD600 of 0.1 and grown at 37°C to mid-log (OD600 = 0.4). The mid-log subcultures were then used to inoculate 1 L LB + kan with a starting OD600 of 0.01 and placed in the 37°C shaking incubator. Once the cultures returned to mid-log (OD600 = 0.4), all cultures were induced with 0.5 mM IPTG (Fisher Scientific, BP1755), supplemented with 10 μM all-trans retinal (Sigma-Aldrich, R2500), and incubated at 18°C for 18 hours with shaking. Cells were harvested by centrifuging at 8000 rpm for 10 min and stored at -80°C until use.
For purification, cell pellets were thawed and re-suspended in 15 mL of solubilization buffer (100 mM Na/K phosphate pH 7.4, 2% Triton-X100, 10 μM all-trans retinal) with one cOmplete® Mini EDTA-free protease inhibitor cocktail tablet (Roche, 04693159001). Samples were sonicated three times with a Model 120 Sonic Dismembrator (Fisher Scientific, FB120) fitted with a Model CL-18 probe at 65% power, alternating 2 seconds ON and 2 seconds OFF, for a total of 2 minutes ON. Sonicated samples were incubated on a rotisserie for 3 hours at 4°C. Cell debris was spun down by centrifugation at 8000 rpm for 10 minutes, and the supernatant was mixed with an equal volume of equilibration/wash buffer (50 mM sodium phosphate, 300 mM NaCl, 10 mM imidazole; pH 7.4, 0.5% Triton-X100, 10 μM all-trans retinal). 250 μL of HisPur™ Cobalt Resin (Thermo Scientific, 89964) was washed with 500 μL of equilibration/wash buffer and mixed with equilibrated lysate for 45 minutes on a rotisserie at 4°C. Conjugated resin was washed three times with 500 μL equilibration/wash buffer and eluted twice with 250 μL of elution buffer (50 mM sodium phosphate, 300 mM NaCl, 150 mM imidazole; pH 7.4, 0.5% Triton-X100, 10 μM all-trans retinal). 250 μL of the first eluent was concentrated to ~100 μL using a Microcon 30 kDa MWCO centrifugal filter column (Millipore, 42410) and aliquoted into a Greiner Half Area UV-Star® microplate (Greiner Bio-One, 675801). Absorbance was measured from 250-800 nm in 2 nm intervals with an Infinite M200 plate reader (Tecan, 30016056). Background (elution buffer) was subtracted from all spectra and spectra were normalized to the 280 nm absorbance value of the Har. marismortui BR2 positive control. For spectra see Fig 4C.
To verify expression of ORP protein, we performed a Western blot on HisPur™ purified lysate from cloned ORPs in the E. coli BLR(DE3) background using the following protocol: SDS-PAGE was run as described in SI Methods and the gel was soaked in 15 mL of transfer buffer (25 mM Tris base, 192 mM glycine, 10% methanol, pH 8.4) for 15 min. A 0.2 μm pore diameter nitrocellulose membrane (Bio-Rad, 162-0112), foam pads, and 3 mm Whatman filters were soaked in transfer buffer before assembling the sandwich for transfer. Proteins were transferred from the PAGE gel to the nitrocellulose membrane at 30 V for 1 hr on ice. His-tagged proteins were probed using the SuperSignal® West HisProbe™ Kit (Thermo Scientific™, 15168) and signal was detected using Amersham ECL Prime Western Blotting Detection Reagent (GE Healthcare Life Sciences, RPN2232). For Western blot, see Fig 4B.
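The processing of the absorbance spectra described above (subtraction of the elution buffer background, then normalization to the 280 nm absorbance of the BR2 positive control) can be sketched as follows. This is an illustrative sketch, not the authors' analysis script. It assumes one interpretation of the normalization step, namely that every background-subtracted spectrum is divided by the background-subtracted 280 nm absorbance of the BR2 control; scaling each spectrum to its own 280 nm value would be a different, equally plausible reading. All function names are hypothetical.

```python
def wavelength_grid(start_nm: int = 250, stop_nm: int = 800, step_nm: int = 2) -> list:
    """Wavelengths (nm) sampled by the plate reader: 250-800 nm in 2 nm steps."""
    return list(range(start_nm, stop_nm + 1, step_nm))


def subtract_background(sample: list, blank: list) -> list:
    """Subtract the elution-buffer (blank) reading from the sample, point by point."""
    if len(sample) != len(blank):
        raise ValueError("sample and blank must cover the same wavelengths")
    return [s - b for s, b in zip(sample, blank)]


def normalize_to_reference_a280(spectrum: list, reference: list, grid: list) -> list:
    """Divide a background-subtracted spectrum by the reference's absorbance at 280 nm.

    ASSUMPTION: 'reference' is the background-subtracted BR2 control spectrum and
    the same scalar is applied to every sample spectrum.
    """
    a280 = reference[grid.index(280)]
    if a280 == 0:
        raise ValueError("reference absorbance at 280 nm is zero")
    return [a / a280 for a in spectrum]


if __name__ == "__main__":
    grid = wavelength_grid()
    # Flat synthetic readings, for illustration only (not measured data).
    blank = [0.05] * len(grid)
    br2 = subtract_background([0.55] * len(grid), blank)
    orp = subtract_background([0.30] * len(grid), blank)
    normalized = normalize_to_reference_a280(orp, br2, grid)
    print(len(grid), normalized[grid.index(280)])
```

Keeping the wavelength grid explicit makes the 280 nm lookup independent of the reader's export order, which is an assumption about the data layout rather than a documented property of the instrument software.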

Protein-level expression of ORPs in native hosts
Several methods were used to investigate protein-level expression of ORP homologs in both native and heterologous hosts. First, we attempted identification of ORPs by LC-MS/MS in the native host Nbt. gregoryi SP2, which has four unique ORP genes. Cell pellets collected from mid-log phase cultures were lysed with 200 μL lysis buffer containing 100 mg SDS, 10 mL diH2O, 60 μL DNase I (GoldBio D-300-1), 0.25 mg RNase A (Roche, 10109169001), and 1 cOmplete® Mini EDTA-free protease inhibitor cocktail tablet (Roche, 04693159001) per 15 mL buffer. Lysate was sonicated for 20 minutes (30 s on/off) on a Bioruptor® UCD-200 (Diagenode). Cellular debris was removed by centrifugation, and the supernatant was denatured for 10 min at 100°C with 6x SDS buffer. Approximately 40 μg protein was loaded into a pre-poured 4-20% SDS-PAGE gel (Bionexus Inc., 2BNPC420) and run approximately 1 cm into the gel. The total protein band was excised and subjected to in-gel trypsin digest according to the following protocol: the gel was cut into 1 mm³ pieces, washed with 50 mM ammonium bicarbonate (AmBic), shrunk with acetonitrile (ACN), reduced with 10 mM DTT/50 mM AmBic, shrunk again with ACN, incubated in 55 mM iodoacetamide/50 mM AmBic for 20 min in the dark, washed with 50 mM AmBic, shrunk with ACN and partially dried in a vacuum concentrator (Labconco). Overnight digestion was carried out at 37°C with 250 ng of trypsin (Promega, V5117) in 50 mM AmBic (pH 8). The supernatant was sonicated in 60% ACN and 0.1% trifluoroacetic acid for 10 min, then dried in the vacuum concentrator. Digested peptides were analyzed by LC-MS/MS on a Thermo Q-Exactive mass spectrometer with Michrom Paradigm LC and CTC Pal autosampler.
Peptides were directly loaded onto an Agilent ZORBAX 300SB C18 reversed phase trap cartridge, which, after loading, was switched in-line with a Michrom Magic C18AQ 200 μm x 150 mm C18 column connected to a Thermo-Finnigan LTQ iontrap mass spectrometer through a Michrom Advance Plug and Play nano-spray source. The nano-LC column (Michrom 3 μm 200 Å MAGIC C18AQ 200 μm x 150 mm) was used with a 90 min-long gradient (1-10% buffer B in 5 min, 10-35% buffer B in 65 min, 35-70% buffer B in 5 min, 70% buffer B in 1 min, 1% buffer B in 14 min) at a flow rate of 2 μL min⁻¹ for the maximum separation of tryptic peptides. A top 15 method was used with Xcalibur software to collect Q-Exactive data, with a scan range of 300-1600 m/z. Results were searched against a Nbt. gregoryi database with cRAP proteins and reversed sequences (5402 proteins total) in X!Tandem [76] with a fragment ion mass tolerance of 20 ppm and a parent ion tolerance of 20 ppm. Carbamidomethyl of cysteine was specified in X!Tandem as a fixed modification. Glu->pyro-Glu of the N-terminus, ammonia-loss of the N-terminus, Gln->pyro-Glu of the N-terminus, deamidation of asparagine and glutamine, oxidation of methionine and tryptophan, dioxidation of methionine and tryptophan and acetylation of the N-terminus were specified in X!Tandem as variable modifications. Scaffold v.4.0.7 [77] was used to validate peptide and protein identifications. Peptide identifications were accepted if they could be identified with a confidence of at least 95%, and protein identifications were accepted if they could be identified with a confidence of at least 95% and contained at least two identified peptides. Protein probabilities were assigned by the Protein Prophet algorithm [78]. Proteins that contained similar peptides and could not be differentiated were grouped. Proteins sharing significant peptide evidence were grouped into clusters. No ORP proteins were identified from Nbt. gregoryi SP2 lysate.
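The acceptance criteria above (peptide confidence of at least 95%, protein confidence of at least 95%, and at least two identified peptides per protein) can be expressed as a simple filter. The sketch below is illustrative and does not reproduce Scaffold's internal logic. It assumes that "identified peptides" means peptides that themselves pass the 95% threshold, and the `ProteinHit` record and the accession strings are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProteinHit:
    """Hypothetical record for one protein identification."""
    accession: str
    protein_probability: float          # e.g. a Protein Prophet probability, 0-1
    peptide_probabilities: List[float]  # one probability per matched peptide, 0-1


def accepted_peptides(hit: ProteinHit, min_peptide_prob: float = 0.95) -> List[float]:
    """Peptide probabilities that meet the peptide-level threshold."""
    return [p for p in hit.peptide_probabilities if p >= min_peptide_prob]


def is_accepted(hit: ProteinHit,
                min_protein_prob: float = 0.95,
                min_peptide_prob: float = 0.95,
                min_peptides: int = 2) -> bool:
    """Apply the protein-level threshold plus the two-peptide rule.

    ASSUMPTION: only peptides passing min_peptide_prob count toward min_peptides.
    """
    enough_peptides = len(accepted_peptides(hit, min_peptide_prob)) >= min_peptides
    return hit.protein_probability >= min_protein_prob and enough_peptides


if __name__ == "__main__":
    # Synthetic examples, not results from the study.
    hits = [
        ProteinHit("example_A", 0.99, [0.98, 0.97, 0.60]),
        ProteinHit("example_B", 0.99, [0.98, 0.80]),
        ProteinHit("example_C", 0.90, [0.99, 0.99]),
    ]
    print([h.accession for h in hits if is_accepted(h)])
```

Under this reading, a protein supported by a single high-confidence peptide is rejected even when its protein-level probability is high, which is the practical effect of the two-peptide requirement.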
After failing to detect ORP protein in a native host, we purified His-tagged cloned proteins from a Hbt. sp. NRC-1 SD23 bop(-) background using HisPur® Cobalt resin (Thermo Scientific™, 89965). Exudate was loaded onto a 4-20% SDS-PAGE gel as described above, and size separated. Based upon the presence of a band in the three experimental samples and its absence in the Hbt. sp. NRC-1 SD23 control lacking the expression vector, the region of the gel corresponding to 50 kDa MW proteins was excised and prepared for proteomics as described above. No ORP proteins were detected.
We next performed a Western blot on lysate from cloned ORPs in the Hbt. sp. NRC-1 SD23 bop(-) background using the following protocol. SDS-PAGE was run as described above and the gel was soaked in 15 mL of transfer buffer (25 mM Tris base, 192 mM glycine, 10% methanol, pH 8.4) for 15 min. PVDF membrane (Novex, LC2002) was successively soaked in 100% methanol, MilliQ™ H2O, and transfer buffer. Foam pads and 3 mm Whatman filters were also soaked in transfer buffer. The Western transfer was run at 30 V for 1 hr. His-tagged proteins were probed using the SuperSignal® West HisProbe™ Kit (Thermo Scientific™, 15168). Based upon the presence of two bands at ~15 and ~20 kDa in the three experimental samples, the region of the gel corresponding to 13-25 kDa was extracted and prepared for proteomics as described above. No ORPs were detected; however, the three opsin proteins present in the host (HR, SRI, and SRII) were observed, indicating that the region of the gel used was the correct MW range for opsins. Over 900 proteins were identified with a protein false discovery rate of 1.7% and peptide false discovery rate of 0.36%, using a Hbt. sp. NRC-1 database with ORPs added.

Supporting Information
[58] showing predicted novel extracellularly accessible binding pocket and internal tunnel network for top five structural models predicted for Hrr. distributum ELZ45759, Nab. magadii WP_004267173, Nbt. gregoryi WP_005575895. (PDF)
S3 Fig. Phylogeny of Microbial type 1 opsins (expanded). Unrooted maximum-likelihood phylogeny of all microbial (type 1) opsins obtained by BLASTp searches of NCBI's nr and env_nr databases using canonical haloarchaeal opsins as queries. Tree inferred using FastTree [71] and ComparetoBootstrap.pl [74] using 500 bootstrap replicates generated with SeqBoot [73]. Sequences lacking the Schiff base lysine (K216) are colored red. Tree file can be accessed at the Dryad Digital Repository (http://dx.doi.org/10.5061/dryad.963hr). (PDF)
for compounds with binding energies less than -9 kcal/mol. (B) idock binding energies for 229,358 ligands from natural product space. Compounds with binding energies less than -9 kcal/mol were selected for further analysis. (C) Representative compounds identified during virtual compound screening. These compounds include naphthoquinones, nitrogen-containing heterocycles and a number of sesquiterpenes. The name or compound class is given, as well as the docking energy for each single compound with one of the three modeled ORPs. (PDF)
S1 File. Supplementary discussion of Virtual Ligand Screening. Additional discussion of the results of the virtual ligand screening performed herein. (PDF)
S1 Table. Haloarchaeal species with ORPs. Strain names and number of ORP homologs detected for each haloarchaeal species possessing putative retinal-free opsins. (PDF)
S2 Table. Databases searched for haloarchaeal-type ORPs. Search statistics and results of the search for haloarchaeal-type ORPs. For details on filtering parameters, see Methods and Materials. Briefly, databases were searched using all haloarchaeal ORPs as queries, and results were filtered for unique hits containing a Bac_rhodopsin domain (Pfam clan CL0192). Results matching these criteria were aligned and checked for presence or absence of the Schiff base lysine (K216). (PDF)
S3 Table. Primer sequences. PCR and cloning primers for ORPs from four species used in experimental confirmation of ORP expression. Expected product lengths and the primer set used to screen for gDNA contamination are also provided. (PDF)