Predicting peak spectral sensitivities of vertebrate cone visual pigments using atomistic molecular simulations

Vision is the dominant sensory modality in many organisms for foraging, predator avoidance, and social behaviors including mate selection. Vertebrate visual perception is initiated when light strikes rod and cone photoreceptors within the neural retina of the eye. Sensitivity to individual colors, i.e., peak spectral sensitivities (λmax) of visual pigments, are a function of the type of chromophore and the amino acid sequence of the associated opsin protein in the photoreceptors. Large differences in peak spectral sensitivities can result from minor differences in amino acid sequence of cone opsins. To determine how minor sequence differences could result in large spectral shifts we selected a spectrally-diverse group of 14 teleost Rh2 cone opsins for which sequences and λmax are experimentally known. Classical molecular dynamics simulations were carried out after embedding chromophore-associated homology structures within explicit bilayers and water. These simulations revealed structural features of visual pigments, particularly within the chromophore, that contributed to diverged spectral sensitivities. Statistical tests performed on all the observed structural parameters associated with the chromophore revealed that a two-term, first-order regression model was sufficient to accurately predict λmax over a range of 452–528 nm. The approach was accurate, efficient and simple in that site-by-site molecular modifications or complex quantum mechanics models were not required to predict λmax. These studies identify structural features associated with the chromophore that may explain diverged spectral sensitivities, and provide a platform for future, functionally predictive opsin modeling.


Introduction
Many living organisms rely upon vision as the dominant sensory modality for foraging, predator avoidance, and social behaviors including mate selection. Vertebrate visual perception is initiated when light strikes rod and cone photoreceptors within the neural retina of the eye. Within photoreceptors, the visual pigments absorb light and interact with downstream intracellular signaling pathways. Visual pigments consist of seven-transmembrane, G-protein-coupled receptor proteins (GPCR) called opsins, together with a chromophore covalently bound through a Schiff base attachment at a lysine residue. The spectral sensitivities of visual pigments are a function of the type of chromophore (11-cis retinal or 11-cis-3,4-didehydro retinal) and the amino acid sequence of the associated opsin protein [1][2][3][4].
Color vision is possible when different cone opsins with distinct peak spectral sensitivities are expressed in separate cone photoreceptor populations, providing differential input to downstream retinal neurons. Cone opsins are under strong natural selection [5][6][7][8], and minor changes in their amino acid sequences can result in large changes in spectral sensitivities of their corresponding pigments [4]. For example, the human green cone opsin is 96% identical at the amino acid level to the human red cone opsin, but their corresponding pigments show peak spectral sensitivities that are 28 nm different [9]. Vertebrate cone opsins are grouped into four families: SWS1, SWS2, RH2, and LWS, which typically produce pigments sensitive to very short wavelengths (UV-violet, 360-450 nm), short wavelengths (blue, 450-495 nm), medium wavelengths (green, 495-560 nm), and long wavelengths (yellow-red, 560-700 nm), respectively [10]. However, there is a large amount of spectral variation within each cone opsin family. For example, some SWS1 pigments are maximally sensitive to blue wavelengths (e.g. human blue cone opsin), and some LWS pigments are maximally sensitive to green (e.g. human green cone opsin) [11].
Fueling this variability within both primate and fish genomes is the presence of numerous tandemly-replicated cone opsin genes [12]. The phylogenies in Fig 1 show several examples of tandem replications in the rh2 opsin genes of selected teleosts. The zebrafish, Danio rerio, has four rh2 genes, which arose by multiple duplication events after the divergence of otomorpha (like D. rerio) from euteleosteomorpha (like medaka, Oryzias latipes). Tandem duplications also occurred in the common ancestor of O. latipes, Poecilia reticulata and Metriaclima zebra, and again in M. zebra (rh2Aα and rh2Aβ). Experimentally measured peak spectral sensitivities (λ max ) of these opsins reconstituted with chromophore are also indicated on the phylogenies, along with λ max for inferred ancestral sequences for the ancestors of the extant Rh2 opsins (Fig  1) [13]. Mutation studies have shown that position 122 in teleost Rh2 opsins predicts greenshifted λ max (> 495 nm) when occupied by a Glu (E), and blue-shifted λ max (< 495 nm) when occupied by a Gln (Q); this substitution alone has been demonstrated to account for~15 nm of spectral shift [13] (see SI S1 Fig for the E122Q substitution). There are two equally parsimonious explanations for the evolutionary timing of substitutions at position 122 that resulted in the λ max of ancestral and extant opsins (Fig 1A and 1B). However, given that the opsin genes are under strong selection, the most parsimonious explanation may not reflect the true evolution of these proteins.
Replicated opsin genes therefore provided the raw genetic material for tremendous diversity in spectral sensitivities through mutation and neofunctionalization. This diversity Evolutionary relationships of selected teleost Rh2 opsin proteins inferred using the neighbor joining algorithm. The Gonnet weight matrix was used and positions with gaps were excluded to determine distances. One thousand bootstrap replicates indicate that each branch is well supported with confidence values ! 78%. Peak spectral sensitivities (λ max ) (nm) of opsins reconstituted with chromophore are shown next to protein names, and are color-coded for being blue-(<495 nm) vs green-sensitive (>495 nm). The two trees depict equally parsimonious explanations for evolutionary timing of E122Q substitutions.
https://doi.org/10.1371/journal.pcbi.1005974.g001 contributes to organismal colonization of, and persistence within, novel environments. The high rate of mutation and unequal recombination within cone opsin genes in humans can also have deleterious consequences, including numerous types of color blindness and some cone degenerative diseases that drastically reduce visual function [14][15][16].
The ability of a pigment to absorb light at a specific λ max is determined by the conformation adopted by the chromophore; this conformation depends on the shape and composition of the binding pocket and the counter-ions that stabilize the Schiff base in the dark or ground state [17][18][19]. Laborious residue-by-residue substitution approaches, followed by reconstitution of opsin with chromophore and subsequent measurement of absorbance, have identified a small number of key residues that contribute to spectral shift within minor subsets of each cone opsin family. For example, the "five sites" rule states that the identities of five specific residues within the binding pocket of some mammalian LWS opsins can predict peak spectra [20], and the E122Q substitution described above predicts green vs. blue λ max in teleost Rh2 visual pigments [13]. However, the "five sites" rule does not extend beyond LWS opsins [21], and is not predictive beyond selected mammals [22]. In teleost Rh2 pigments, E122Q predicts green vs. blue, but further spectral differences cannot be explained by specific contributions of identified amino acid replacements [13]. Despite the functional significance of the evolution of color vision, there is currently no simple strategy for predicting λ max of a chromophore-bound cone pigment [4].
We report here an alternative, efficient and more accurate approach to predicting spectral peak sensitivities of cone opsins, using a spectrally-diverse group of 14 teleost Rh2 opsins for which sequences and λ max are known (Fig 1). Through the generation of homology structures and atomistic molecular dynamics (MD) simulations [23], we identified two parameters of chromophore conformation and fluctuation that together accurately predict peak spectral sensitivities. Furthermore, these studies identify structural features associated with the chromophore that explain diverged spectral sensitivities, and provide a platform for future, functionally predictive opsin modeling.

Results
To develop a model that predicts peak spectral sensitivities from amino acid sequence, we selected a spectrally diverse set of Rh2-type teleost cone opsins. The Rh2 opsin proteins were chosen because they are most closely related phylogenetically to Rh1 opsin proteins. RH1 opsins are present in vertebrate rods, and form the rhodopsin visual pigments [10], and the mammalian RH1 opsins are the only vertebrate opsins for which experimental protein structures are available [24,25]. Moreover, amino acid sequences and corresponding spectral sensitivities of pigments reconstituted with 11-cis retinal are known for many teleost Rh2 opsins and show a wide range of λ max (452-528 nm) (Fig 1; Table 1).
With this sequence information we built three-dimensional homology structures using the bovine rhodopsin (RH1 opsin + 11-cis retinal chromophore) structure as a template (PDB ID:1U19) [24]. These structures were embedded in explicit membrane bilayers and water with the chromophore bound covalently to the lysine residue in the binding pocket (Fig 2), and were then subjected to 100 ns classical MD [23] simulations using the protocol described in the methods section (SI S1 Movie).
We analyzed the MD simulations for all 14 pigments and identified structural features associated with the chromophore and attached lysine that could potentially be used to explain spectral sensitivity differences. For each pigment we examined a total of 19 angles (15 torsion angles and four geometric angles) formed by the heavy atoms of the lysine attached to 11-cis retinal (LYS+RET) ( This two-peak distribution was previously documented for the rhodopsin (RH1) visual pigment, using combined quantum mechanics/molecular mechanics (QM/MM) simulations [29].
To understand the dynamics of the chromophore within the opsin binding pocket we visualized the chromophore conformations seen in blue-vs green-sensitive pigments ( Fig 3B). A compact cluster of conformations was observed for blue-sensitive pigments, in contrast to a more broadly distributed cluster in green-sensitive pigments. We calculated root mean square fluctuations (RMSF) of all the heavy atoms of the chromophore and the attached lysine residue (LYS+RET) (Fig 3D) for each pigment as follows: where T is total number of molecular dynamics trajectory frames (V). The atoms within the blue-sensitive pigments clearly show lower RMSF values than the green-sensitive pigments ( Fig 3D).
The results describing the conformations and dynamics of the chromophore suggested that these parameters may be used to distinguish green-from blue-sensitive Rh2 pigments and potentially to accurately predict λ max . We used a standard model selection procedure to  of the removed pigment was predicted based upon the new linear model. This test showed that there were no individual Rh2 opsins disproportionately influencing our results, and that the correlation of the individual predictions based upon only 13 pigments was also high (R 2 = 0.91) (Fig 4A). These results demonstrate that our approach can be used to predict spectral peak sensitivities for a wide range of Rh2 pigments with high accuracy. We next wished to further test the approach itself, rather than the specific regression model described above. Therefore, we generated a linear model based upon only a subset of the Rh2 pigments: 11 pigments of medaka, guppy, zebrafish, and zebrafish (cyprinid) ancestors. This resulted in a linear model utilizing Torsions 4 and 14 (1190.6208+(-4.9097 Ã Torsion 4) +(2.9445 Ã Torsion 14), that predicted λ max of these 11 Rh2 pigments with high accuracy (R 2 = 0.94) (Fig 4B). A leave-one-out approach to test this model also showed good predictive ability (R 2 = 0.89) (Fig 4B). This model was then used to predict spectral peaks for the three cichlid Rh2 pigments (which were not used to generate this particular model) using Torsion 4 and Torsion 14 data obtained from their MD simulations. The predictive accuracy was again rather  (Fig 4B; cyan symbols); particularly surprising given that the λ max of two of these cichlid Rh2 pigments (519 nm; Rh2-Aβ; 528 nm, Rh2-Aα) reside outside of the wavelength range of the 11 Rh2 pigments used to generate this model (467-516 nm). Indeed, the predicted value for Rh2-Aβ was 519 nm, matching its experimental value. These results demonstrate the utility and generality of the overall approach: statistical models derived from MD simulations of predicted visual pigment structures have the power to predict their λ max .

Discussion
We have developed a new approach for the prediction of cone pigment peak spectral sensitivity with a high degree of accuracy over a large range of λ max (452-528 nm). The approach required only template pigment structure and opsin protein sequence data as input. MD simulations were performed on the protein structures, and parameters describing the conformation of the opsin were then used in a statistical model. This in silico process revealed structural features of visual pigments, particularly within the chromophore and attached lysine residue that predict λ max . The approach is accurate, efficient and simple in that site-by-site molecular modifications or complex quantum mechanics models were not required. Instead, a two-term, first-order regression model was sufficient to achieve high correlations with empirical data. Although cone pigment homology models have been built using a rhodopsin template [30], to our knowledge this is the first report of a molecular modeling approach that predicts peak spectral sensitivities of vertebrate cone pigments.
Previous strategies to predict visual pigment λ max include site-by-site amino acid substitutions followed by measurement of pigment spectra to identify potential contributions of specific amino acid residues to spectral shift. These approaches are informative but not able to provide accurate predictions over a wide range of spectra or opsins. For example, the "five sites" rule established for some LWS opsins, states that the amino acid changes H197Y, Y277F, T285A, and A308S shift λ max of these pigments toward lower wavelengths, with each change contributing in an additive manner [20,31,32]. However, this rule largely fails to predict relative λ max beyond selected mammals, even in other LWS opsins [21,22]. Similarly, the amino acid change E122Q in teleost Rh2 opsins predicts a change from green-to blue-sensitive λ max [13], but provides no further insights into specific λ max (Fig 1), and it is not known how broadly this rule applies to other vertebrate opsin classes. The approach described here does not require site-by-site manipulations and does not rely on identifying key residues. Rather, it considers each opsin sequence in its entirety, and provides highly accurate predictions of both green vs blue and specific λ max values across a wide range of Rh2 pigments.
One of the key elements of our predictive model is the AUC of RMSF (LYS+RET) ; larger values correspond to greater fluctuation of heavy atoms of LYS+RET, and green-shifting of the λ max . We believe these findings provide new insight into the mechanisms of how E122Q contributes to spectral shift. The chromophore's β-ionone ring resides in close proximity to the amino acid residue at position 122 [33]. The negative charge on E122 may encourage the hydrophobic β-ionone ring to explore other space in the binding pocket and increase its fluctuation, resulting in a green-shifted λ max , whereas Q122 discourages such fluctuations. If true, this increased motion of chromophore in green-shifted pigments in the dark state may raise the energy of this (ground) state, thereby decreasing the energy difference between the ground and excited states. We speculate that this decreased energy difference may in part underlie the higher λ max [4].
Another strategy previously described for the prediction of λ max focused on the shift in spectral sensitivity that takes place due to chromophore association with an opsin protein; this has been referred to as "opsin shift" [4,34]. This approach has only been applied to RH1 opsins (rhodopsins). The chromophore 11-cis retinal, in a Schiff base-bound state, absorbs at 360 nm, but shifts to 440 nm when the Schiff base is protonated, as in the environment of an opsin binding pocket [29,35]. Motto et al. [36] further explored mechanisms of spectral shift in bovine rhodopsins with specific mutations affecting amino acids lining the binding pocket, using combined quantum mechanical and molecular mechanical (QM/MM) methods, and MD simulations. They suggested that rotation along the chromophore's single bond C6 -C7 (Torsion 1 of the present study) blue-shifts the rhodopsin by providing a reduced degree of conjugation. The MD simulations of the present study also identified Torsion 1 as appearing predictive of blue-vs green-sensitive λ max (SI S2 Fig), but this parameter did not emerge as a key element of our predictive model.
We identified two distribution peaks identified for several torsions, including Torsion 1 (C5-C6-C7-C8) within cichlid Rh2 pigments, but not the other teleost Rh2 pigments. Several previous studies have estimated the torsional angle of C5 -C6 -C7 -C8 in bovine rhodopsin (RH1). Spooner et al. [37,38] estimated this value to be −28˚± 7˚based on solid state NMR data derived from 13 C labeled 11-Z-retinal substrate. Sugihara et al. [38] carried out geometry optimization and constrained MD simulation residues within 4.5 Å of the chromophore (27 amino acids) using the self-consistent-charge density-functional-based-tight-binding (SCC-DFTB) method to identify the preferred conformation of the chromophore in the active site. They obtained a value of −35˚. Rajamani et al. [29] performed combined QM/MM simulations using the chromophore in the rhodopsin-membrane-water configuration and tracked the instantaneous value of C5 -C6 -C7 -C8 torsion (Torsion 1). Interestingly, they also obtained two distribution peaks; a large fraction (86%) of the structures had a negative torsional angle of −68˚± 55˚and a smaller fraction had +68˚± 25˚with statistical weighted average of −49˚. In our studies Torsion 1 was predictive of blue-vs green-sensitive spectra (SI S2  Fig), even when the two peaks for cichlid pigments were included. However, this angle was not identified as a key element of our statistical model for predicting λ max .
The success of the present study in predicting λ max for Rh2 opsins points to elements of interest-Torsion 15 of the chromophore, and chromophore fluctuation (RMSF)-for future examination in determining mechanisms of color tuning in vertebrate visual pigments. While our correlational analyses provide predictive power, quantum mechanical studies are needed to mechanistically explain differences in λ max . Accuracy of the present approach also suggests numerous potential applications of this and similar approaches for biology, biophysics, and bio-engineering. The development of atomistic MD models for the other vertebrate visual pigment classes could potentially be used to predict λ max of any rod or cone pigment for which opsin sequence information is available. It is important to note, however, that in the current approach we restricted the analysis to Rh2 pigments, a class of pigments with high sequence similarity to each other and to the RH1 pigment template, and representing less than the entire range of vertebrate visual pigment λ max . A next logical step will be to build structural homology models for more divergent classes of vertebrate visual pigments. If successful, such models would lead to a more accurate understanding of mechanisms underlying spectral shift, and the range of evolutionary trajectories that lead to these shifts. Models could also be used to understand destabilizing effects of mutations associated with disease, and to design novel vertebrate opsins with specific spectral sensitivities for optogenetic applications as alternatives to channelrhodopsin [4].

Alignments and model building
The amino acid sequences of Rh2 cone opsins from zebrafish (Danio rerio) [13,26], medaka (Oryzias latipes) [27], guppy (Poecilia reticulate) [21] and cichlid (Metriaclima zebra) [28] were downloaded from UniProt (http://www.uniprot.org/) (14 total sequences; see Table 1 for accession numbers). A template structure search was first carried out using MODELLER v9.15 [39]. The bovine rhodopsin (RH1) structure (PDB ID: 1U19) [24] was chosen as a template for all of the teleost Rh2 opsins studied here because it satisfied the following criteria: i) sequence identity >60% (Table 1); ii) >95% sequence coverage with the target Rh2 sequence; iii) presence of 11-cis retinal bound to the binding pocket and occupied palmitoylation sites; iv) high X-ray crystal resolution (2.2 Å); and v) no mutations in the crystal structure protein. MODEL-LER v9.15 was then used to perform the sequence alignments and generate three-dimensional structures of Rh2 cone opsins. For each opsin sequence we generated five homology structures. Stereochemical checks were performed using the SWISS-MODEL structure assessment tool (https://swissmodel.expasy.org/) on all five structures and the best was chosen based on minimal stereochemical deviation and high QMEAN score.

System setup and molecular dynamics simulation
The final selected structure of each Rh2 opsin was first uploaded on the web server, Prediction of Proteins in Membranes (http://opm.phar.umich.edu/server.php). Membrane boundaries provided by this server along with the protein model were then uploaded onto the CHARMM-GUI server (http://charmm-gui.org/) for further processing. To obtain the 11-cis retinal chromophore within the binding pocket of each structure, we changed the three letter amino acid code of the lysine residue that binds covalently with the chromophore and forms the Schiff base, from LYS to LYR in the PDB file. This modification allowed CHARMM-GUI to recognize and build the Schiff base and use appropriate forcefield parameters available on the server. We chose to include the palmitate moiety only in zebrafish Rh2 opsins because, among all Rh2 opsin sequences used in this study, only zebrafish Rh2 sequences shared a conserved palmitoylation site (C323) (SI S1 Fig) with the template bovine rhodopsin sequence. The C323 residue in all seven zebrafish opsin structures was linked to a palmitate molecule using the "add palmitoylation sites" option in CHARMM-GUI. Protonation states of amino acid residues were assigned at the physiological pH of 7.4. The protein was embedded in an unsaturated homogeneous bilayer consisting 1-steroyl-2-docosahexaenoyl-sn-glycero-3-phosphocholine lipids to provide a realistic representation of the phospholipids found in the cone outer segment. The replacement method [40] was used to pack the opsin model with lipid bilayer. Lipid layer thickness was chosen to be 1.6 (~70 lipids in top leaflet and~70 lipids in bottom leaflet). Each system was placed in a rectangular solvent box, and a 10 Å TIP3P water layer (15 Å in the case of cichlid Rh2 opsins to prevent boundary effects) was added to solvate intra-and extracellular space. Charge neutrality of the system was achieved by adding Na+ and Cl− ions at a concentration of 0.15 mol/L to the water layers. CHARMM-GUI (incorrectly) assumed the retinal was in 11-trans conformation and thus after preparing the system we replaced the coordinates of the retinal with 11-cis conformation obtained from template bovine rhodopsin structure.
The CHARMM36 forcefield [41] parameters were used for the protein and lipids. Each system was first minimized using steepest descent for 5,000 steps. To allow equilibration of the water each system was then simulated for a total of 550 ps with the positions of all heavy atoms in the protein, phosphorus atoms in the lipid head group and all dihedral angles in the lipid carbon chains harmonically restrained. Each restrained simulation was divided in six steps where the restraints were gradually relaxed for each step. During the restrained simulations, the temperature of the system was set to 300 K and the pressure was maintained at 1 atm using the Berendsen algorithm. Production NPT simulations for each system were then carried out for 100 ns using Parrinello-Rahman barostat [42] with semi-isotropic pressure coupling and Nosѐ-Hoover thermostat [43] for maintaining the temperature. For all simulations, the LINCS algorithm was used to constrain all bonds involving hydrogen atoms to their ideal lengths. Particle mesh Ewald [44] was used for electrostatics with a real-space cutoff of 1.2 nm. Van der Waals interactions were cut off at 1.2 nm with the Force-switch method for smoothing interactions. Each trajectory was 100 ns long with time step of 2 fs and updated neighbor lists every 20 steps. Trajectory snapshots were saved every 10 ps.
All systems were prepared using the CHARMM-GUI (http://www.charmm-gui.org) web server. Molecular dynamics (MD) simulations were carried out using GROMACS v5.1.2 [45]. Analysis of the internal degrees of freedom i.e. torsion and bond angles was carried out using plumed-driver tool from PLUMED v2.2. [46]. Molecular visualization of MD simulations was done in VMD [47].

Quantification and statistical analysis
The best linear regression model to predict spectral peak for the samples of teleost Rh2 opsin proteins was determined using seven parameters obtained from the MD simulations: the medians for Torsion 1, Torsion 4, Torsion 14, Torsion 15, and Angle 3 (Fig 2C; SI S2 Fig), and the areas under the curves (AUC) for root mean square fluctuation RMSF (ring) and RMSF (LYS +RET) . A best subsets procedure was used in which regression models with number of covariates from one to seven were fitted using the regsubsets option in the "leaps" R library (https:// cran.r-project.org/web/packages/leaps/leaps.pdf), and the seven best models for each number of covariates was retained. Each of these 49 regression models was ranked based upon their Bayesian Information Criterion (BIC) value [48]. The best fitting statistical model based upon the BIC was examined for influential data points using a leave-one-out test, as follows. For each of the molecular Rh2 structures under consideration, one protein was removed from the regression analysis for the best fitting statistical model, and then the peak spectral sensitivity of the removed protein was predicted based upon the new linear model.