Assessment and Challenges of Ligand Docking into Comparative Models of G-Protein Coupled Receptors

The rapidly increasing number of high-resolution X-ray structures of G-protein coupled receptors (GPCRs) creates a unique opportunity to employ comparative modeling and docking to provide valuable insight into the function and ligand binding determinants of novel receptors, to assist in virtual screening and to design and optimize drug candidates. However, low sequence identity between receptors, conformational flexibility, and chemical diversity of ligands present an enormous challenge to molecular modeling approaches. It is our hypothesis that rapid Monte-Carlo sampling of protein backbone and side-chain conformational space with Rosetta can be leveraged to meet this challenge. This study performs unbiased comparative modeling and docking methodologies using 14 distinct high-resolution GPCRs and proposes knowledge-based filtering methods for improvement of sampling performance and identification of correct ligand-receptor interactions. On average, top ranked receptor models built on template structures over 50% sequence identity are within 2.9 Å of the experimental structure, with an average root mean square deviation (RMSD) of 2.2 Å for the transmembrane region and 5 Å for the second extracellular loop. Furthermore, these models are consistently correlated with low Rosetta energy score. To predict their binding modes, ligand conformers of the 14 ligands co-crystalized with the GPCRs were docked against the top ranked comparative models. In contrast to the comparative models themselves, however, it remains difficult to unambiguously identify correct binding modes by score alone. On average, sampling performance was improved by 103 fold over random using knowledge-based and energy-based filters. In assessing the applicability of experimental constraints, we found that sampling performance is increased by one order of magnitude for every 10 residues known to contact the ligand. Additionally, in the case of DOR, knowledge of a single specific ligand-protein contact improved sampling efficiency 7 fold. These findings offer specific guidelines which may lead to increased success in determining receptor-ligand complexes.


Introduction
Being able to model the complex interactions between receptors and small molecule ligands offers immense opportunities for the basic biochemical understanding of signaling processes and for the development of pharmacological tool compounds and drugs that modulate receptor function. The human genome encodes for approximately 800 G-protein coupled receptors (GPCRs) that orchestrate the communication between a cell and its surroundings -an obvious place for small molecule drugs to interfere [1]. While more than 26% of our current small molecule drugs target Class A GPCRs alone [2], structure-based drug discovery has played a limited role in developing these molecules. GPCRs have been the subject of many structural, comparative modeling and docking studies. However in many cases, models are affiliated with high uncertainty and inaccuracy. Primary reasons include a lack of adequate template structures, the existence of multi-conformational states which require intense conformational sampling of not only the protein side chain but also backbone conformational space, in combination with the large variety of ligands that interact with GPCRs, including very flexible molecules which are notoriously challenging subjects for accurate docking [3,4]. Nevertheless, increasing the availability of reliable GPCR models for structure-based drug discovery would be beneficial in the development of novel, potent and subtype-selective molecules. Since the landmark publication by Rasmussen et al. in 2007 [5], the number of experimentally determined GPCR structures has been increasing rapidly and now totals to 18 distinct GPCR structures that are recorded in the Protein Data Bank (PDB). While this is still just a small subset of the GPCR space, it provides a more substantial basis for comparative modeling and docking simulations.
Despite the increase in experimental structural information, it remains difficult to predict ligand-binding conformations in comparative models of GPCRs for all except the receptors most similar to those which have been solved experimentally [3,4,6]. This difficulty originates in part from the necessity of sampling both receptor and ligand flexibility which, due to the necessarily approximate nature of the force fields and protein-less/ligand-less sampling methods, results in the sampling of biologically irrelevant conformations. This complicates discrimination between the global minimum energy conformation (GMEC) and the local minimum energy conformations (LMEC) of the binding complex, as deeper sampling reveals many different energy-equivalent binding modes. The reason for the difficulty in GMEC and LMEC discrimination is, as discussed by Fleishman and Baker [7], related to the small energy gap in ligand binding, which moreover is challenging to measure as it is often mediated by polar contacts and water molecules.
For GPCRs, the ligand docking problem is even more difficult for three reasons. Firstly, the alignment is not trivial, as the transmembrane helices occasionally contain bulges and kinks and the length of the transmembrane helix is not conserved. Secondly, GPCRs are able to assume multiple different conformations with approximately the same energy, as demonstrated by studies on the beta-adrenergic receptors [8,9]. Thirdly, three extracellular loops must be modeled, as they often contact the ligand and are involved in ensemble stabilization in some receptors [10,11].
At the same time, improved algorithms and high-performance computing revolutionize our ability to sample protein conformational space swiftly, enhancing the possibility to accurately dock ligands into comparative models [12]. This, combined with the increasing number of available templates, lets us assess the applicability of rapid Monte Carlo Metropolis (MCM) sampling as implemented in the Rosetta suite of programs for GPCR comparative modeling and docking [13]. Specifically, we address the accuracy of backbone placement in transmembrane and extracellular loops, sampling of ligand binding modes and sidechain conformations in the binding site, and strategies to select accurate models from the large conformational space sampled.

Sequence Alignment and Threading
The first step in constructing the models was performing a sequence alignment of the target sequence with a structural alignment of the other 13 GPCRs to be used as templates. A structure-based sequence alignment was generated of all 14 GPCR templates using MUSTANG [28] as seen in Figure S2 (Figure 1, Step 1). The sequence of the target GPCR was then aligned with the profile of structurally aligned templates using CLUSTALW [29] (Figure 1, Step 2). The sequence of the target GPCR was then placed onto the helical backbone coordinates of each template structure ( Figure 1, Step 3). Any missing density and variable loop regions were constructed using the ab initio cyclic coordinate descent protocol in Rosetta [30,31] (Figure 1, Step 4).

Building in Missing Density and Extracellular Loop Regions in the Comparative Models
Missing density in the threaded models due to gaps or insertions in the sequence alignment were built in Rosetta using Monte Carlo Metropolis (MCM) fragment replacement combined with cyclic coordinate descent loop closure [30,31] (Figure 1, Step 4). Cyclic coordinate descent (CCD) was inspired by inverse kinematic applications in robotics and closes loops by minimizing the sum of the squared distances between three backbone atoms of the moving N-terminal anchor and the three backbone atoms of the fixed C-terminal anchor through the adjustment of dihedral angles. Its speed and its ability to close a loop over 99% of the time gives CCD an advantage over other loop closure methods. In brief, loop regions defined by the user are chosen in a random order and for each loop, w-y angles of backbone segments from homologous sequence fragments from the PDB, excluding those from the target experimental structure, are introduced into the loop regions. After the fragment substitution, small movements in the w-y angles are performed to close breaks in the protein chain. After each defined loop has been closed, resulting full sequence models were subjected to eight iterative cycles of side chain repacking and gradient minimization of w, y, and x angles using the Rosetta scoring function with an implicit membrane potential [32]. A total of 200 models were constructed with each threaded model and the lowest energy model was chosen for a full remodeling of the extracellular loops ( Figure 1, Step 5). Extracellular loops, as shown in Figure S3, were extensively rebuilt using both the cyclic coordinate descent loop closure method described above and the kinematic loop closure method described below. Approximately 1000 models were built for each target-template pair, resulting in a minimum of 13,000 comparative models per target structure.
A limited benchmark over the comparative modeling of six GPCRs was performed to compare the results of the kinematic loop closure (KIC) method in Rosetta [33] with CCD. KIC analytically determines all mechanically accessible conformations for six pivot torsion angles of a peptide chain using polynomial resultants. During kinematic loop closure, all mechanically accessible conformations for w and y dihedral angle torsions from the first, middle and last residues in a loop segment, designated as pivot torsions, are sampled. The remaining torsion angles are randomly sampled using Monte Carlo minimization from Ramachandran probabilities of each amino acid. The six pivot torsions are solved analytically to close the loop. The protocol is performed for 720 rounds of high resolution loop closure and models accepted by the Metropolis criterion are subjected to side chain repacking and gradient minimization as described above. The data from the benchmark set comparing the two methods indicated that overall, CCD produced comparative models with an average root mean square deviation (RMSD) of 2.0 Å over extracellular loops (ECL) 1 and 3, which was significantly lower than the average RMSD over ECL1 and 3 for KIC at 2.6 Å (Table S1). The difference between CCD and KIC for the average RMSD over the full receptor was even more striking at 3.8 Å and 6.9 Å respectively. Results from CCD loop modeling were consequently used for further analysis.

Selection Methods of Comparative Models for Docking
Comparative models were filtered for ligand docking using two different methods ( Figure 1, Step 6). Both methods partially build on the observation that receptor accuracy is correlated with the Rosetta energy function ( Figure S4). The first method was based on clustering of the 10% best scoring structures. Clusters were determined based on pairwise RMSD of all C-alpha atoms using bcl::Cluster [34] and a cluster radius of 3.0 Å . The best scoring models in each of the clusters were used for further analysis. The second method was created to avoid sampling of non-native ligand binding pocket conformations. Pocket residue positions were defined across all GPCRs as positions in the sequence alignment where C-alpha atoms of the residues had a distance of less than 4.0 Å to the ligand in at least one experimental structure. This yielded a list of 29 residue positions, which was reduced to 25 residue positions when the four residue positions at the top of transmembrane helices (TM) two and five were removed to avoid bias from structural alignment of the proteins. Pocket residues are shown in the alignment in Figure S2. Comparative models passed the filter only if C-alpha atoms of all pocket residues had an alignment equivalent pocket residue in another GPCR within a distance of a residue position specific cutoff. The cutoffs were chosen to be residue specific to represent varying flexibility in different parts of the receptor. The maximum distance between a specific pocket residue in any receptor and an equivalent pocket residue in any other GPCR, according to the alignment shown in Figure S2, was chosen as a distance cutoff for that particular residue position. When applying the knowledge-based filter, the self-experimental structure was not considered to avoid circular bias.

Generation of Ligand Conformers
In preparation for docking, ligand conformers were generated by MOE (Molecular Operating Environment, Chemical Computing Group, Ontario, Canada) using the MMFF94x force field and Generalized Born implicit solvent model ( Figure 1, Step 7). Conformers were generated using 10,000 iterations of the Low Mode MD method [35] with a redundancy cutoff of 0.25 Å . Energy cutoffs for ligand conformers were dependent on the number of rotatable bonds: 3 kcal/mol for 1-6 rotatable bonds, 5 kcal/mol for 7-9 rotatable bonds and 7 kcal/mol for 10-12 rotatable bonds [36]. The RMSD distribution for the generated ligand conformers compared to the bioactive ligand conformation is shown in Figure S10.
The ligand conformers were protonated as shown in Figure S1. These protonation states were determined based on the local environment in the individual experimental structures. In the case of ligand C-24, the protonation state is not what would be predicted without information from the experimental structure.   We note that this adds some bias to the method. Likewise, the stereochemistry of the ligand (E)-IDT in CXCR4 was taken directly from the experimental structure. Of note, the experimental structure was solved with a mixture of (E) and (Z)-form, which cannot be clearly distinguished from the electron density (Raymond Stevens, personal communication).

Docking Ligands into a Chosen Ensemble of Comparative Models
Ligand docking into the comparative models was performed with Rosetta Scripts [37][38][39][40] (Figure 1, Step 8). Each ligand was allowed to sample binding modes in a 5.0 Å radius from the coordinate representing the center of the ligand binding mode as given in the experimental structure. This adds some bias to the sampling, as the smallest unbiased docking sphere enclosing all ligand binding conformations has a radius greater than 5.0 Å . During the low-resolution phase of docking, rigid body orientation of the ligand centroid is performed through translation until the geometric center of the ligand is in a position not occupied by atoms in the receptor. High-resolution docking then begins with 1000 cycles of full rotational freedom until the attractive and repulsive forces fall below a threshold value. Six cycles of sidechain rotamer and ligand conformer sampling are then coupled with 0.1 Å , 0.05 radian ligand movements simultaneously in a Monte Carlo simulated annealing algorithm. All rotatable bonds within the ligand, except for planar conjugated bonds, were allowed full flexibility as indicated within the ligand parameters file. Ligand conformers are randomly chosen until the Monte Carlo criterion has been satisfied. A final minimization combines side-chain rotamer sampling with backbone torsion angle minimization with harmonic constraints on the C-alpha atoms.
The energy function used during the docking procedure contains terms for van der Walls attractive and repulsive forces, statistical energy derived from the probability of observing a particular side-chain conformation in the PDB, hydrogen bonding, electrostatic interactions between pairs of amino acids, and solvation assessing the effects of both side-chain/side-chain interactions and side-chain/ligand interactions. For each ligand, over 2,000 docked complexes were generated and evaluated in comparison to the experimental ligand binding mode using RMSD to the heavy atoms.

Assessing the Size of the Ligand Conformational Space
We propose a new measure to enable comparison of docking benchmark studies across targets and to test how the methods compare to random sampling -the uniform sampling efficiency (USE2.0). The proposed measure is equivalent to the sampling frequency of better-than-2.0-Å -RMSD-binding-modes that would occur by random sampling in a 5.0 Å docking sphere with no occluding protein, given a set of ligand rotamers and full rotational and translational freedom. To calculate USE2.0, each i rotamer of the N rotamers in the generated ligand ensemble was aligned to the experimental structure and rotated along its principal axes (w,h,y) using M (40) uniform spacings. For sampling to be uniform, a correction factor, Cw, is needed to account for the fact that the number of ways of choosing h, given w, is proportional to the circumference of the circle that h draws on the w,h sphere [41]. The translation distance that increased the RMSD to 2.0 Å was determined for each rotamer-rotation set r(i,w,h,y). USE2.0 was then determined as the fraction between the volume of space containing binding modes below 2.0 Å and sampled volume of the 5.0 Å (R) docking sphere.
The distribution of RMSDs that arise from uniform sampling of ligand conformations is nontrivial, dependent on the ligand size and on the generated conformers. For the ligands considered in this dataset, USE2.0 varies from 10 25 for DOR to 10 27 for S1P1. The algorithm to determine USE2.0 is available i (File S2).

Enrichment of Native-like Binding Modes using known Contacts between the Ligand and GPCR
When a mutation of a residue strongly affects ligand binding, this residue is often interpreted as having a direct contact to the ligand. To assess how this type of constraints enriches for the correct binding mode in our ligand-protein ensembles, we determined the average enrichment through 10,000 iterations of n randomly chosen known contacts with n running from 0 to all known contacts for a particular receptor-ligand interaction.

Ligand-protein Evaluation through RMSD-based Clustering and Binding Energy
Results from the ligand docking study were evaluated using clustering on pairwise RMSD values calculated over the ligand heavy-atoms using bcl::Cluster with a 2.0 Å cutoff ( Figure 1, Step 9). The lowest energy binding modes of the five largest clusters were chosen for further analysis. The coverage and accuracy of correct ligand-receptor contacts compared to the experimental structure was calculated on the top ranked models using SimiCon [42]. Contact coverage is calculated as the number of correct ligand contacts from the model divided by the total number of ligand contacts made in the experimental structure. Accuracy of correct contacts is calculated as the number of correct contacts divided by the total number of ligand contacts made in the model.

Results and Discussion
The results are presented in four parts. In the first part, we discuss the accuracy of comparative models generated by sequential sampling algorithms of the transmembrane and loop regions. Secondly, we discuss methods to select the most accurate models for ligand docking. Thirdly, we assess the equivalency of the local minimum interaction energy conformation (LMIEC) with the lowest energy that we sample and the experimental ligand binding mode and ask how much receptor flexibility can be sampled before the lowest energy LMIEC deviates from the experimental ligand-protein complex. In the fourth part, we assess the sampling efficiency of native-like ligand binding modes by docking the ligand ensemble into the ensemble of comparative models. Various methods used to identify native-like ligand binding modes in the resulting ensembles are explored.

Templates of Higher Sequence Identity Produce More Accurate Comparative Models
We generated 13,000 comparative models of each receptor through minimization and loop building in a sequential fashion as n described in the Methods section. To assess the parameters which determine comparative model accuracy in the initial receptor ensemble, we considered the total energy of the models and the sequence identity of the template. Sequence identity was calculated on the aligned GPCR sequences as seen in Figure S2. As shown in Figure 2A, the average root mean square deviation (RMSD) of comparative models built with templates having greater than 50% sequence identity are consistently below 5.0 Å compared to the experimental structure. For residues in the ligand binding pocket (the pocket residues), the average RMSD of comparative models built with templates above 70% sequence identity within the pocket residues are frequently within 2.0 Å of the experimental structure ( Figure 2B). In fact, only those targets with templates above 50% sequence identity were able to sample ligand binding pockets within 1.0 Å of the experimental structure ( Figure S5). As can be expected, this includes all target-template pairs within the same sub-family; for example, B1Ar and B2Ar serve as the best templates for each other at 74% sequence identity, as does M2R and M3R at 75% sequence identity. The b-adrenergic receptors also produced accurate models when used as templates for the muscarinic receptors at 48% sequence identity. The opioid receptors produced the most accurate comparative models when used as templates for each other at 65% to 75% sequence identity. In most cases, templates with high sequence identity also generate the lowest energy models in comparison to models based on other templates ( Figure S5). Where there were exceptions, the lowest energy models were produced with templates with at least 45% sequence identity to the target receptor. Without a template having sequence identity above 50%, it continues to be difficult to get accurate models of the ligand binding pocket. While it was demonstrated that building multiple models based on different templates provides a better opportunity to sample the correct conformation and this is leveraged here [12], the generation of a smaller but improved conformational receptor ensemble could benefit from using multiple template structures in a single model [45]. Recently, Worth et al. 2011 demonstrated that similar or improved comparative models could be generated using a multitemplate approach, where rotameric states as well as specific sequence and structural features could be modeled in light of the entire set of available experimental structures which otherwise might be absent when using a single template [46].

Correct Helical Conformations are Recovered in Regions of Aligned Secondary Structure
C-alpha RMSD in comparison to the experimental structure was measured for the full receptor, transmembrane region and second extracellular loop (ECL2) region in the lowest energy models and the top ranked models by clustering ( Table 2). Among the top ranked models for all 14 receptors, the transmembrane region was modeled with an average RMSD of 2.5 Å compared to their corresponding experimental structures. This average drops to 2.2 Å when considering only those models with template sequence identities above 50%.
The maximum transmembrane region RMSD among top ranked models was seen for CXCR4 at 3.2 Å . In this case, helical placement of TM7 was shifted by six residues between the target sequence and the sequence of the templates, resulting in a gap in the alignment ( Figure S2). Without reliable backbone coordinates to model the top of TM7, the resulting models rely on Rosetta to de novo fold the region using the CCD loop closure algorithm. The helical structure is recovered, but the top two helical turns of TM7 in the models are displaced from that of the experimental structure by distance of 13.3 Å ( Figure 3A).
However, the conformation of the transmembrane helices is reasonably accurate throughout regions where transmembrane helices are aligned, specifically in terms of helical kinks. Deviations from ideal helical conformation are typically caused by proline or glycine residues and are important for both function and folding [47]. Major helical kinks occur in regions where proline residues are highly conserved between the GPCR sequences, particularly in TM 5, 6 and 7. In the two cases where templates had a proline or glycine-induced kink that was not present in the target, Rosetta was able to remove the kink and recover the correct conformation. The template of the top ranked S1P1R model, D3R, contains a proline at P84 which causes a kink in TM2 that was resolved by Rosetta ( Figure 3B). The same is seen for TM4 in KOR, where the Figure 2. Template sequence identity versus comparative model RMSD. Each point represents the average RMSD over all comparative models of a target GPCR built using a particular template. For each target-template pair, percent sequence identity was calculated on the sequence alignment shown in Figure S2. Sequence identity is shown here to correlate with low average A) receptor RMSD, calculated over the C-alpha atoms of the full receptor and B) pocket residue RMSD, calculated over the C-alpha atoms of residues within the ligand binding pocket. doi:10.1371/journal.pone.0067302.g002 glycine-induced induced kink at G178 in DOR was removed during Rosetta energy minimization to recover the correct conformation ( Figure 3C).

Native-like Loop Conformations are Sampled but are Difficult to Identify by Score
Rebuilding the three extracellular loops was a point of focus during the modeling protocol because of their role in ligand binding. The first and third extracellular loops range in length from five to ten residues, which is within the range of successful loop prediction for Rosetta when applied on experimental structures [30,31]. Here we find that the first and third extracellular loops are built with an average RMSD of 2.0 Å to the loop conformation from the experimental structures (Table  S1). In several cases, identification of the correct loop conformation was possible using the energy of the loop ( Figure S6, Figure  S8).
The second extracellular loop (ECL2) ranges in length from 16 to 31 residues. While the length of ECL2 is beyond the capability of prediction for loops excised from experimental structures, restriction of the sampling space was provided by requiring formation of the conserved disulfide bonds. The results demonstrate that ranking the most accurate ECL2 is difficult based on energy and clustering, since no top ranked models contained ECL2 RMSDs under 2.0 Å (Table 2 and Figure S7). However, it is possible to sample these native-like loop conformations, which is needed during docking to generate the correct ligand binding mode as observed in the experimental structure. Specifically, loop conformations were sampled below 2.0 Å for MOR, M2R and D3R (Table 2 and Figure 4). When focusing on the C-terminal region of ECL2, which is most often involved in ligand binding, we find native-like sampling for all models, with 0.4% to 85% of sampled ECL2 conformations below 2.0 Å ( Table 2).

Accurate ECL2 Conformations Often Recover Secondary Structure Elements
On average, the ECL2 RMSD for top ranked models by clustering was 5.3 Å , with the most accurate ECL2 conformations given for D3R at 3.6 Å , M2R at 4.2 Å and B1Ar at 4.3 Å (Table 2). Compared to the length of other ECL2s, which are about 21 residues long, D3R was relatively the easiest ECL2 to model with only 16 residues ( Figure 4A). Other comparative models where ECL2 regions were most accurately sampled were those with secondary structure within the loop. Applying a fragment-based approach to de novo loop modeling allows for the insertion of secondary structure into loop regions where it is predicted from the sequence. In four of six cases, Rosetta was able recover helical elements found in ECL2 of experimental structures and one of five cases where b-sheets were found in ECL2 ( Figure S9). MOR was the one case where b-sheets were conserved in the model, and the most accurately sampled ECL2 had an RMSD of 1.6 Å ( Figure 4B). B1Ar ( Figure 4C), B2Ar, M2R ( Figure 4D) and M3R were the models in which helical elements were correctly sampled. In cases where predicted secondary structure in the target agrees with that of the template, such as with B1Ar and B2Ar, it would be beneficial to keep the loop conformation of the template and enforce the helical element [48]. The most difficult loop conformations to model were in S1P1R ( Figure 4E) and bRh ( Figure 4F), where the top ranked models only came within 5.4 Å and 7.5 Å of the experimental structure respectively. Both receptors have ECL2s longer than twenty residues with little secondary structure to stabilize the conformation. Additionally, ECL2 in both receptors packs against the N-terminal region, which was removed prior to comparative modeling. Therefore, inclusion of the N-terminal region into comparative modeling might be beneficial in these cases.

Comparison with Previous Studies on GPCR Loop Modeling
Other studies have likewise addressed GPCR loop modeling. They include the protein local optimization program (PLOP) [48,49], which samples amino acid rotamers in loop regions and evaluates models using a physics-based energy function while explicitly modeling membrane molecules. Modeller uses the CHARMM-22 force field and knowledge-based energy terms to optimize the loop conformation [50]. The algorithm employed by Nikiforovich et al. [51] performs geometric sampling of the loops using all possible conformations of the peptide backbone. In comparison to their study, Rosetta was able to rank loop conformations in comparative models more accurately than the loop conformations built de novo in experimental structures by Nikiforovich et al. [51]: bRh was modeled to 7.5 Å RMSD compared to 8.4 Å , B1Ar was modeled to 4.3 Å compared to 6.4 Å , and B2Ar was modeled to 4.4 Å compared to 7.4 Å . In their most recent study, Goldfeld et al. reported top ranked loop conformations built de novo in experimental structures as 2.7 Å for B1Ar and 2.2 Å for B2Ar [48]. However, the algorithm they used enforced the helical bounds within ECL2 for these structures. When considering the results from true de novo constructed loop conformations without the helical constraints, top ranked loop conformations from Rosetta are also more accurate than PLOP, whose top ranked ECL2 conformations were 9.1 Å for bRh, 5.6 Å for B1Ar and 13.8 Å for B2Ar.
These results indicate that even current state-of-the-art methods for loop modeling continue to have difficulty determining loop conformations, especially within comparative models. However, the experimental structures which we attempt to reproduce still only represent one of many possible loop conformations for these flexible regions and it is possible that more of the sampled conformations are in fact realistic [52,53].

Rosetta Captures Native-like Ligand Binding Pocket Conformations and Samples Beyond the Flexibility Evident from Experimental Structures
To assess the sampling density of residues lining the ligandbinding pocket, we aligned all the models to the experimental structures and measured the collapse of the pocket as the change in distance for each residue C-beta atom (C-alpha for glycine) to the closest ligand atom as determined from the experimental structures ( Figure 5). The models display increased flexibility at the top of the transmembrane (TM) helices, as would be expected due to the variability represented by the crystallographic templates. With an average collapse of -0.1 Å and a standard deviation of 3.6 Å within all the comparative models generated, Rosetta samples beyond the flexibility that is represented by the experimental structures, which have an average collapse of -0.1 Å with a standard deviation of 1.0 Å ( Figure 5). As the experimental structures are still a small and biased representation of the GPCR space, it is unclear if Rosetta is introducing too much flexibility in these regions. However, for the present study, many comparative models within our ensemble will not make constructive interactions with the ligand due to non-native placement of the residues involved in ligand binding.

Knowledge-based Filters Improve the Accuracy of the Ligand-binding Pocket
Because Rosetta samples the flexibility of the transmembrane region beyond the variability that is represented in the experimental structures, a knowledge-based filter was created which focused on the pocket residues alone to identify models that would be suitable for ligand docking. Models with structural deviation beyond the maximum flexibility observed within the binding pocket in existing experimental structures were removed, as described in the Methods section. The filter accepted between 0.2% and 10% of the models from the initial receptor ensemble and the overall RMSD of these models were comparable to those identified by traditional clustering methods ( Figure 6A). For several receptors, there was a correlation between pocket RMSD and receptor energy ( Figure S5) and based on this correlation, energy was used to reduce the filtered ensemble to a maximum of 100 structures. In this filtered ensemble, the receptor collapse was 0.3 Å with a standard deviation of 0.8 Å , which is slightly greater compared to what is seen in the experimental structures. This is possibly due to favorable energy when collapsing the pocket. The average RMSD of residues constituting the common ligand binding pocket is significantly improved compared to a receptor ensemble selected by clustering of the initial receptor ensemble ( Figure 6B).

Accuracy of the Ligand Conformer Ensemble is Highly Dependent on Ligand Flexibility
The generation of ligand conformations is not a trivial process, as the bioactive ligand conformation need not occupy its aqueous GMEC [36,54,55]. Our approach was to create ensembles of low energy ligand conformations (LMECs) and allow Rosetta to minimize these conformations in the context of a fully flexible receptor. Nevertheless, ligand ensembles will inevitably contain irrelevant conformations which results in the search of irrelevant binding modes. Low energy ligand conformations were generated with MOE and the energy cutoff was determined by the number of rotatable bonds within the ligand [36], as discussed in the Methods section.
To evaluate how the use of such ligand ensembles would affect sampling efficiency of ligand binding modes with RMSD below 2.0 Å , the Uniform Sampling Efficiency of binding modes below  Table 1 and Figure S10). These ligands are characterized by many degrees of freedom, which contribute to the difficulty of sampling the bound conformation accurately.
While a benchmark of ligand ensemble generation methods was beyond the scope of this study, we noted some reduction in the number of non-native ligand conformers by using the Generalized Born implicit solvent instead of the distance-dependent dielectric constant. Further improvement might be possible in some cases by using MD-simulations to generate a canonical ligand ensemble weighted according to the Boltzmann distribution to identify the most populated and thereby most probable ligand conformations. For large, flexible ligands, a fragment-based docking approach Figure 5. Ligand binding pocket flexibility within comparative models compared to experimental structures. For each of the 26 pocket residues in the ligand binding pocket of the receptor, pocket collapse is defined as the change in distance from each pocket residue to the ligand, measured between the model and the experimental structure. A positive pocket collapse value indicates that the pocket residue moves closer to the ligand in the model compared to the experimental structure, while a negative pocket collapse value indicates movement towards the receptor. The width of the beanplot area represents the number of models having a pocket collapse of a certain value for the threaded models (blue) and for all generated comparative models after loop rebuilding and energy minimization (orange), with corresponding blue and orange horizontal lines representing the average pocket collapse over the given set of models. doi:10.1371/journal.pone.0067302.g005 Figure 6. Comparison of two comparative models analysis methods: filtering by ligand binding pocket residues and clustering on RMSD. For each receptor, the ten lowest energy models of the largest five clusters are represented in green and the one hundred lowest energy models from the knowledge-based filter on residues in the ligand binding pocket are represented in blue. The width of the beanplot area represents the number of models having a particular A) receptor RMSD and B) pocket residue RMSD, with the corresponding horizontal lines representing the average RMSD for all models clustered by RMSD (green) and models from the knowledge-based filter (blue). doi:10.1371/journal.pone.0067302.g006 might be more suitable and has already been applied in many drug design studies [56][57][58].

Interaction Energy is not Reliable for Identification of the Experimental Ligand Binding Mode when Docking into Flexible GPCR Comparative Models
The bioactive ligand conformation from the experimental structure as well as ligand conformers generated by MOE were docked into the top ranked comparative models as evaluated by clustering and the knowledge-based filter. While Rosetta considers the energy of the receptor while sampling ligand binding poses, noise generated by the multiple loop conformations makes it difficult to identify low energy binding modes using the total Rosetta energy for the receptor-ligand complex. As a result, we choose to make the assumption that all structures from the comparative model ensemble have equal energy and accuracy when docking the ligands. Thus the local minimum interaction energy conformation (LMIEC) with the lowest energy that we sample needs to equate the energy of the experimental ligand binding mode in order to be useful for its identification. To test the extent to which this occurs, the bioactive ligand conformation was re-docked into the experimental structure it came from with no relaxation of the complex after docking ( Figure S11). In 13 out of 14 cases we find that the lowest energy LMIEC was within 2.0 Å of the experimental ligand binding mode and that other LMIECs were significantly separated from the lowest energy LMIEC by 4.663.0 Rosetta Energy Units (REU). The one case where the lowest energy LMIEC deviated from the experimental ligand binding mode is IT1t in CXCR4, which has exceptionally few interactions to the protein, 7 water molecules within 4 Å , and contacts a residue in the N-terminal region, which is not represented in our models.
When repeating the protocol with the addition of flexibility within the receptor through a minimization step, the lowest energy LMIEC deviated from the experimental ligand binding mode in 6 out of 14 cases and with an insignificant energy gap of 0.362.1 REU, showing that lowest energy LMIEC is not suitable for identification of the experimental ligand binding mode in flexible models. However, while the ligand binding mode within a RMSD of 2.0 Å to the experimental structure could not be identified consistently by interaction energy in this highly biased analysis, it was possible to sample the correct binding mode in all 14 cases ( Table 3).

Sampling of Native-like Ligand Binding Modes is on Average 10 3 Fold Increased over Random Sampling
Having generated receptor-ligand complexes through docking, we asked how frequently the experimental ligand binding mode was sampled within an RMSD of 2.0 Å and compared this to the sampling efficiency that can be achieved using uniform sampling in a 5.0 Å docking sphere with no protein (USE2.0). In the receptor ensemble that was selected based on the knowledge-based filter, we found that the experimental ligand binding mode was sampled in all cases with an average of 10 3 fold increase over USE2.0 (Figure 7 and Table 3). The correct binding mode for S1P1R and CXCR4 was sampled correctly least often with only one correct binding mode out of approximately 10 4 generated models. The reason for the difficulty of sampling the correct S1P1R ligand binding mode is most likely related to its flexibility and its contacts to the N-terminus, which is lacking in our models. The low number of ligand-protein contacts in the model seems to be the main reason for the poor sampling efficiency of IT1t in CXCR4, which as discussed above, was not in an interaction energy minimum when its bioactive conformation was docked into a backbone static receptor. Even so, sampling the experimental ligand binding mode within 2.0 Å RMSD was increased 300 fold over USE2.0 for both S1P1R and CXCR4, demonstrating preference of biologically relevant ligand-protein interactions during the docking procedure.
On the other end of the spectrum of sampling efficiency is DOR, which sampled the correct binding mode in 266 out of approximately 10,000 cases -10 3 times better than USE2.0. The ligand in DOR, natrindole, has only 2 degrees of freedom, with all conformers below 1.0 Å of the bioactive ligand conformation (Table 1), and binds to the receptor mainly through hydrophobic contacts and one salt bridge. For all other cases, docking multiple ligand conformations into the comparative models sampled binding modes within 2.0 Å of the experimental binding mode less than 1% of the time (Figure 7).

Sampling of Native-like Ligand Binding Modes Improve within the Knowledge-based Filtered Comparative Model Ensemble
To assess the effect of the knowledge-based filter we compared the sampling efficiency in models selected with the knowledgebased filter receptor ensemble with those chosen by traditional clustering methods and found that sampling efficiency is improved in 10 out of the 14 cases (Table 3 and Figure 7).
Additionally, as an attempt to identify which parameters of the receptor models yielded native-like ligand binding modes, we examined the importance of the pocket residue RMSD in Figure 8. Accuracy between the C-alpha atoms of the pocket residues does not guarantee accurate ligand placement, as side chain placement varies greatly and creates many non-native binding pockets. Also, given the flexibility of the ligand conformations, it is expected that the docking algorithm detects alternate binding modes within a particular binding pocket conformation. Despite this, we show that more accurate placement of the residues within the ligand binding pocket leads to more binding modes sampled within 2 Å of the experimental ligand binding mode using the knowledge-based filters and templates of high sequence identity (Figure 7, Figure  S12). Importantly, while we show a correlation between pocket RMSD and ligand RMSD, the same effect cannot be shown when selecting receptor models using full receptor RMSD based clustering. This is likely due to the irrelevant noise that arises from non-pocket residues.

Interaction Energy Enriches for Experimental Ligand Binding Modes
Despite the lack of robustness in the use of interaction energy to identify the correct binding modes in relaxed experimental structures, we expected that it might be useful for enrichment of correct binding modes in our docking ensembles by removing obvious non-fit ligand-protein interactions. We found that an enrichment of approximately three fold can be achieved in most cases by taking the 10% best scored structures, as shown in Figure 9. However, when taking the top 10% of structures for bRh, CXCR4 and M3, sampling efficiency dropped. There was no correlation between optimal cutoff value and the overall sampling efficiency or the size of the largest cluster (data not shown).

Clustering Aids in Selecting Native-like Models
In spite of the low sampling efficiency of the experimental ligand binding modes, we hypothesized that clustering mediated through total energy optimization during docking might occur, and thus offer a method to identify native-like ligand binding modes. Notably, however, the Rosetta ligand docking algorithm does not in principle generate a Boltzmann distribution, but instead emphasizes sampling of rare binding modes, in hope of identifying a rare native-like global minimum interaction energy conformation [39]. This might blur any tendencies for clustering around an experimental ligand binding mode. Clustering was performed on the heavy-atom ligand RMSD with a cutoff of 2.0 Å and the lowest energy binding modes of the largest five clusters were examined further. Other cutoffs of 2.5, 3.0 and 3.5 Å , were also considered, but did not provide any improvement of clustering performance. The percentage of models in the largest clusters was below 1% for most receptors (Figure 10). For the receptors in which convergence occurred, however, there was some correlation between cluster size and ligand RMSD ( Figure 10A).
Within the largest clusters for each receptor, there was on average 12% coverage and 10% accuracy of the correct contacts between the ligand and receptor (Table 4 and Figure 10B). For CXCR4, KOR and NOP, alternate modes are preferred over the experimental binding mode. In examining cases where the experimental binding mode is not preferred, several problems are identified which keeps the ligand from binding in the correct mode. For CXCR4 and NOP, less than 30% of the ligand conformers came within 2.0 Å of the bioactive ligand conformation, resulting in inaccurate docking results. For ligands binding high within the receptor binding sites such as A2Ar, incorrect loop placement in the models blocks the ligand from docking in the correct mode ( Figure 11A). Incorrect loop placement can also induce hydrogen bonds favorable to the ligand which move it into an incorrect binding mode, as shown in KOR ( Figure 11B). For many ligands, Rosetta places the ligand in the correct position but is unable to discriminate the correct interactions and flips the ligand orientation as seen in H1R ( Figure 11C), indicating possible inaccuracies within the force field and improper treatment of polar interactions. There were two cases, DOR and M2R, in which Rosetta was able to identify the correct binding mode within 2.0 Å in the top ranked clusters (Table 4), shown in Figure 11D and 11E. Docking naltrindole in DOR and 3-quinuclidinyl-benzilate in M2R was simplified by the limited number of rotatable bonds in the ligand and high sequence identities of the templates.

Comparison with Previous Studies on Ligand Docking into GPCR Comparative Models
Using Glide [59] and Induced Fit Docking [60] to dock ligands within biased comparative models of GPCRs, Beuming and Sherman [6] ranked ligand binding modes within 2.5 Å of the experimental ligand binding mode in six out of the ten receptors they modeled. In these six cases, success was likely due to the structural similarity of the templates, which always came from receptors of the same sub-family: b-adrenergic receptors were used as templates for each other and for H1R and the muscarinic receptors were used as templates for each other. Alignments were manually refined to ensure correct alignment of loop regions and the disulfide bridge within ECL2. Only regions with missing density according to the alignment were rebuilt using PLOP [48]. Additionally, the ligand from the template structure remained within the model during the comparative modeling process, which may have assisted in the preservation of the ligand binding pocket. While remaining relatively unbiased in sequence alignment and de novo loop rebuilding, Rosetta was able to sample binding modes within 2.5 Å of the experimental ligand binding mode in all cases. However, inaccuracies in the energy function and flexibility Reported is the negative log of the sampling efficiency of ligand binding modes within 2.0 Å RMSD of the bioactive ligand conformation within the experimental structure as measured over the ligand heavy-atoms. *denotes where sampling efficiency of Rosetta is worse than the worst-case uniform sampling scenario. # ND denotes not defined. No binding modes within 2.0 Å were sampled for this case. 1 fold improvement over USE2.0 of bioactive ligand is given in parentheses. introduced within the pocket residues made it difficult to identify native-like binding modes as top ranked. As discussed above, Rosetta had success in ranking the correct binding mode only in the cases of M2R and DOR.

Sampling Efficiency is Increased by One Order of Magnitude for Every 10 known Ligand-protein Contacts
Docking into comparative models guided by mutational data is a widespread and largely non-validated method in the literature. Typically, side-chain alterations that heavily affect ligand binding are interpreted as having direct contacts to the ligand. To assess how such information can be used as experimental constraints in our ligand-protein ensembles, we tested to which extent these constraints would allow us to detect the correct binding mode. Enrichment of the correct binding modes was determined through 10,000 iterations of randomly chosen contacts between 0 and total number of all 4.0 Å contacts between the ligand and receptor. When docking ligand conformers into comparative models, the sampling efficiency for native-like binding modes increased on average by one log scale for every 10 known contacts assumed between the binding mode and the receptor (Figure 12). The greatest improvement was seen for receptors where sampling efficiency of the experimental binding mode was already above 0.1%, particularly for DOR, NOP and B1Ar. Little or no improvement in sampling efficiency was observed for those receptor ensembles already sampling less than 0.1% of the experimental binding mode, including A2Ar, B2Ar, S1P1R and CXCR4. Experimental data with higher information density, such as the ionic interactions used for blinded prediction of the binding mode of eticlopride in the dopamine D3 receptor, can be expected to provide a significantly higher improvement in sampling efficiency -in our ensemble, the sampling efficiency was improved by 7 fold by requiring a distance of less than 3.0 Å between the positively charged hydrogen atom on the tertiary amine and the oxygen atoms in the carboxyl acid group of the aspartic acid.

Concluding Remarks
This study provides an analysis of the sampling performance that can be expected when docking ligands into comparative models of GPCRs. Previous studies of ligand docking into GPCR comparative models have demonstrated that the problem is highly challenging for all but the simplest of cases that require the least   sampling of receptor space [3,4,6]. This is in agreement with recent docking studies for flexible ligand docking into multiple static structures [61][62][63], which consistently report that while the performance in docking and screening accuracy of a 'small' conformational ensemble is superior to that of a single conformer, that performance starts to rapidly decline when the size of the conformational ensemble begins to grow. The study presented here sought to quantify the challenges of docking ensembles of ligand conformers into comparative models through deep and relatively unbiased sampling using full receptor and ligand flexibility.
Comparative models of 14 unique GPCRs were constructed using the other 13 experimental structures as templates. Threading was based on the unbiased alignment between the target and template sequences and loops were constructed de novo with a fragment-based loop closure algorithm in Rosetta. When com- Figure 10. Clustering captures binding modes with lower RMSD and increased contact coverage. Binding modes for each receptor were clustered by ligand heavy-atom RMSD with a cutoff of 2.0 Å . When compared to smaller cluster sizes, the large cluster sizes were more likely to capture A) lower average ligand RMSD to the experimental binding mode and B) a higher percentage of correct ligand contacts. Contact coverage was calculated using SimiCon [42]. doi:10.1371/journal.pone.0067302.g010 pared to corresponding experimental structures, the most accurate comparative models demonstrated a correlation to Rosetta energy. Top ranked structures with templates within 50% sequence identity were modeled with an average RMSD of 2.2 Å in the transmembrane region, with the best models coming within 1.2 Å RMSD. Extracellular loops with lengths ranging between 5 and 7 residues were modeled with an average RMSD of 2.0 Å , while ECL2 was modeled with an average RMSD of 5.3 Å . The most difficult cases to model were those in which helical regions were unable to align to suitable templates and those cases in which Nterminal residues necessary for ECL2 packing were missing.  Despite these challenges, Rosetta was still able to rank more accurate loop conformations than other leading methods. Using the ligands found in the crystallized GPCR structures, docking was performed in the top ranked comparative models. Docking ensembles of ligand conformers into comparative models sampled the correct ligand binding mode for each of the 14 receptors, but often less than 1% of the time. While the lack of energy gap makes discrimination of the correct binding modes difficult, certain techniques for filtering the models and binding modes demonstrated some success in this study. Using templates with a sequence identity above 50% provides a higher chance for correctly modeling of the ligand binding pocket as also observed in previous studies [6]. In cases where such templates do not exist, using a knowledge-based filter to identify models for which the binding pocket is within the variability that is represented in the experimental structures is beneficial for docking, significantly increasing sampling efficiency in 10 of the 14 cases. Inaccuracies in the minimized structures strongly affected the accuracy in the loop regions, which in turn affected the resulting ligand binding modes. Therefore, it may be best to limit the flexibility introduced by sampling when using a highly homologous template, such as the case for B1Ar and B2Ar.
As demonstrated in this study, clustering can provide improvement over energy for identifying correct binding modes, but only if clusters contain at least 1% or more of the total binding modes. Selection of the correct binding mode from an ensemble of models might be further improved using information from structure activity relationship of active ligands, as proposed by Katritch et al., to select the best performing models from an ensemble [64]. However, this requires knowledge about active ligands which is typically limited for novel protein or receptor targets and the approach is based on the assumption that different ligands share a common binding mode.
In addition, many papers are published under the premise that experimental information such as mutational data can aid in finding the correct ligand binding mode within a large ensemble of models [65][66][67][68][69]. Considering the challenges faced in this study, application of mutational data as experimental constraints seems to be an appealing strategy. Sampling efficiency for binding modes within 2.0 Å of the experimental ligand binding mode increased on average by one log scale for every 10 known contacts between the binding mode and the receptor. However, the expected benefit should be evaluated carefully on one or more experimental ligandreceptor complexes to access the true value of such constraints -in particular since indirect effects are known to occur and could blur the identification and the selection of the ''correct'' binding modes.
Through the use of unbiased sequence alignments and sampling algorithms using the Rosetta software suite, the most challenging scenario for GPCR comparative modeling and ligand docking was explored. As with other studies on comparative modeling and docking, however, there were still minor biases introduced in both aspects of this work which may limit the scope of this approach. Bias in the comparative modeling experiments included the addition of constraints on the disulfide connectivity of the loops based on the experimental structures, which influenced the conformations of ECL2. Bias in the ligand docking experiments included the ligand stereochemistry and protonation state of CXCR4, restricting the conformational search space by centering a sphere of 5Å radius at the center of the experimentally determined binding mode. Also, bias to the experimentally determined structures could have been eliminated with a leaveone-out cross-validation of the knowledge based filter. Despite these biases, the findings of this study identified specific avenues for improvement to approach this challenging problem. Knowl-edge-based and energy-based filters are able to improve sampling performance over random by 10 3 fold. Additionally, sampling performance is increased by one order of magnitude for every 10 residues known to contact the ligand. Contacts with high information density, specifically the salt bridge between the oxygen atoms of an aspartic acid in DOR and the positively charged hydrogen atom on the tertiary amine of its ligand, improved sampling efficiency 7 fold. As the number of GPCR experimental structures being solved increases, so does the opportunity to find suitable templates for comparative modeling. With the guidelines suggested by the results from this study, relevant ligand docking studies may be able to generate structural hypotheses to guide experimental designs. Figure S1 Ligand structures used in this study. Ligand structures depicted here were crystallized with the G-protein coupled receptors used in this study and were obtained from the Protein Data Bank. (TIF) Figure S2 Structure-based sequence alignment of Gprotein coupled receptors. This sequence alignment of the fourteen GPCRs used in this study was obtained through a structural alignment of the receptors in MUSTANG [28]. Transmembrane regions are highlighted in blue, cysteine residues forming disulfide bonds are highlighted in yellow and residues in contact with their respective ligands are highlighted in purple. Conserved residues representing Ballesteros-Weinstein x.50 are outlined with a black box. The figure was generated using Aline [44]. (TIF) Figure S3 Structures of G-protein coupled receptors used in this study. Experimental structures of the fourteen Gprotein coupled receptors used in this study were obtained from the Protein Data Bank. Extracellular loop (ECL) 1 is shown in yellow, ECL2 in purple, and ECL3 in orange. (TIF) Figure S4 Energy plot of relaxed experimental structures and comparative models compared with full receptor RMSD. For each structure, full receptor RMSD is plotted against total Rosetta energy. The experimental structure was minimized in the Rosetta force field without the ligand (in green) and with the ligand (in blue). Comparative models are in grey, with models selected through clustering in orange and models selected by the knowledge-based filter in purple. (TIF) Figure S5 Energy plot of comparative models based on templates of varying sequence identity. For each comparative model, pocket residue RMSD is plotted against total Rosetta energy. Each point is colored by the template by which the model was built, with color varying from blue to red with increasing sequence identity. (TIF) Figure S6 Energy plot of ECL1 in comparative models. For each comparative model, ECL1 RMSD is plotted against the Rosetta energy for residues in ECL1. (TIF) Figure S7 Energy plot of ECL2 in comparative models. For each comparative model, ECL2 RMSD is plotted against the Rosetta energy for residues in ECL2. ECL2 for A2Ar and H1R could not be evaluated because of unresolved structure in this region of the experimental structure in the Protein Data Bank. (TIF) Figure S8 Energy plot of ECL3 in comparative models. For each comparative model, ECL3 RMSD is plotted against the Rosetta energy for residues in ECL3. ECL3 for KOR could not be evaluated because of unresolved structure in this region of the experimental structure in the Protein Data Bank. (TIF) Figure S9 Structural representations of extracellular loop two from comparative models compared to experimental structures. For A) bRh, B) B1Ar, C) B2Ar, D) A2Ar, E) CXCR4, F) D3R, G) H1R, H) S1P1R, I) M2R, J) M3R, K) MOR, L) KOR, M) NOP and N) DOR, the experimental structure is represented in gray, the most accurately sampled model is represented in yellow and the top ranked model is represented in blue. The top ranked model is the lowest energy model of the largest cluster, where clustering is performed on pairwise full receptor C-alpha RMSD over the top ten percent of comparative models by energy. (TIF) Figure S10 RMSD of ligand conformations generated by MOE. Ligand conformations generated by MOE using the MMFF94x force field and Generalized Born solvation model were compared to the bioactive conformation found in the experimental structure by RMSD to heavy atoms in the ligand. The average RMSD is represented by a black line. The fold decrease in sampling efficiency is calculated by the uniform sampling efficiency within a 2.0 Å radius (USE2.0) for the bioactive ligand conformation divided by the uniform sampling efficiency within a 2.0 Å radius for ligand conformers. (TIF) Figure S11 Interaction energy plot of binding modes from docking into experimental structures and comparative models. For each structure, ligand heavy-atom RMSD is plotted against Rosetta interaction energy. The bioactive ligand conformation was docked into the static experimental structure (in blue), the energy minimized experimental structure (in orange) and comparative models (in purple). Ligand conformers generated by MOE were docked into comparative models, shown in yellow. (TIF) Figure S12 High sequence identity templates produce models with more accurate binding modes. Each point represents the average ligand RMSD over all binding modes produced by docking the ligand into target GPCR comparative models built using a particular template. For each target-template pair, percent sequence identity was calculated on the sequence alignment shown in Figure S2. Sequence identity is shown here to correlate with low average ligand heavy-atom RMSD. (TIF)

Supporting Information
Table S1 Rosetta loop modeling in comparative models with cyclic coordinate descent compared to kinematic loop closure. Reported is the average RMSD and standard deviation for all comparative models of target receptors, calculated over C-alpha atoms in the loop regions compared to the corresponding experimental structure from the Protein Data Bank. Loop closure with KIC was only performed on a subset of the GPCR dataset. (DOCX) File S1 Protocol capture. This protocol capture contains the steps necessary to obtain the results presented in the manuscript ''Assessment and challenges of ligand docking into comparative models of G-protein coupled receptors''. The input files necessary to carry out the steps outlined in this protocol as well as the output files relating to the results found in the manuscript are provided in the attached folder: File S2. GPCR_model_dock.zip. While the actual protocol was carried on every pairwise combination of GPCRs from Table 1, this protocol capture uses the comparative modeling of bRh onto the template B2Ar as an example for simplification. The Rosetta 3.4 software suite is publically available and the license is free for non-commercial users at http://www. rosettacommons.org/. The supplementary materials are included with Rosetta 3.5 under the directory ''rosetta_demos/protocol_capture/2012/GPCR_model_dock''. (DOCX) File S2 Files for protocol capture. The input files necessary to carry out the steps outlined in the protocol capture in File S1 as well as the output files relating to the results found in the manuscript are provided in this attachment. (ZIP)