An Algorithm to Identify Target-Selective Ligands – A Case Study of 5-HT7/5-HT1A Receptor Selectivity

A computational procedure to search for selective ligands for structurally related protein targets was developed and verified for serotonergic 5-HT7/5-HT1A receptor ligands. Starting from a set of compounds with annotated activity at both targets (grouped into four classes according to their activity: selective toward each target, not-selective and not-selective but active) and with an additional set of decoys (prepared using DUD methodology), the SVM (Support Vector Machines) models were constructed using a selective subset as positive examples and four remaining classes as negative training examples. Based on these four component models, the consensus classifier was then constructed using a data fusion approach. The combination of two approaches of data representation (molecular fingerprints vs. structural interaction fingerprints), different training set sizes and selection of the best SVM component models for consensus model generation, were evaluated to determine the optimal settings for the developed algorithm. The results showed that consensus models with molecular fingerprints, a larger training set and the selection of component models based on MCC maximization provided the best predictive performance.


Introduction
The identification of ligands that display a high affinity for one protein target and that are significantly less active for another or a group of closely related members of a given family is of high relevance for modern drug discovery. Apart from using selective ligands as leads in drug design workflows, they can also be applied as molecular probes for studying, e.g., cellular functions [1]. Because the validation of compound selectivity requires significant experimental efforts and financial resources, fast and accurate computational methods to predict ligand selectivity are highly desirable.
In recent years, diverse computational ligand-and/or structure-based approaches to explain the molecular mechanism of selectivity and/or to predict compound selectivity have been developed. The most prominent example reported on molecular dynamic simulations combined with free energy calculations to study mechanisms underlying the selectivity of tyrosine phosphatases PTP1B/TCPTP/SHP-2 [2], phosphatidylinositol-3-kinases PI3Kα/PI3Kγ [3] and phosphodiesterase PDE5/PDE6 [4]. Other studies have described QSAR modeling to predict the ligand selectivity for serotonin 5-HT 1E /5-HT 1F [5] or dopamine D 2 /D 3 receptors [6] and for a panel of 45 different kinases [7]. Yet other investigations used machine learning (ML) methods to construct selectivity prediction models, e.g., ML based on neural networks to generate structure-selectivity relationship models [8], the binary classification SVM (Support Vector Machines) algorithm to solve multiclass predictions and compound ranking to distinguish between selective, active but non-selective, and inactive compounds [9], and the LiCABEDS (Ligand Classifier of Adaptively Boosting Ensemble Decision Stumps) algorithm to model cannabinoid CB 1 /CB 2 selectivity [10].
Among fourteen 5-HT receptor (5-HTR) subtypes, 5-HT 7 R represents the most recent addition to a subfamily of G-protein-coupled receptors (GPCRs). Distribution studies revealed a correlation between the localization of 5-HT 7 Rs in the CNS (especially in the suprachiasmatic nucleus) and their function, suggesting that they are involved in the regulation of circadian rhythm, learning and memory processes, as well as in pathological processes such as affective disorders, neurodegenerative diseases, and cognitive decline [11]. A large body of evidence has demonstrated that the clinically established antidepressant effects of atypical antipsychotics, i.e., amisulpiride, lurasidone and aripiprazole, are mediated by antagonism at 5-HT 7 Rs [12,13]. Several preclinical studies support the hypothesis that 5-HT 7 R antagonists may produce beneficial pro-cognitive effects and ameliorate negative symptoms of schizophrenia in animal models [14][15][16][17]. On the other hand, potential application for 5-HT 7 R agonists has been proposed for the treatment of dysfunctional memory in agerelated decline and Alzheimer's disease [18], diabetic neuropathy and neuropathic pain [19,20]. Moreover, recent preclinical findings have demonstrated novel therapeutic applications of 5-HT 7 R agonists for the treatment of fragile X syndrome, ADHD and other attention deficit-related diseases [21,22].
Despite a great interest in 5-HT 7 R since the 1990s, its function remains incompletely understood. Apart from fundamental criteria for the classification of receptors, i.e., primary amino acid sequence and signal transduction (G-protein, β-arrestin or MAPK/ERK pathways), 5-HT 7 R displays structural features that are similar to those of 5-HT 1A R [23][24][25][26]. Although this similarity hampers the design of selective ligands of 5-HT 7 R [27,28], the situation appears to be even more complicated when considering the co-localization and functional interplay between 5-HT 7 and 5-HT 1A Rs (i.e., homo/hetero dimerization, receptor desensitization and/or internalization) [23,29].
Considering the aforementioned findings regarding the clinical significance of 5-HT 7 R, the elaboration of new algorithms to support the design of selective 5-HT 7 R agents (as an alternative to those reported in the literature- Fig 1) appears to be critical to obtain a more detailed understanding of the pharmacological function of 5-HT 7 Rs.
Here, we developed and investigated the algorithm (based on SVM [38] classification models of ligands showing different affinity/selectivity relationships for 5-HT 7 /5-HT 1A receptors and a data fusion approach) for its application to predict ligand selectivity between both targets (Fig 2). The performance of this algorithm was compared to a simple ranking strategy and the best in-class component SVM models. Furthermore, ligand-(molecular fingerprints) and structure-based (Structural Interaction Fingerprint, SIFt) data representations, as well as performance metrics (AUC and MCC), were evaluated to select the best SVM models.

Data sets and definition of training classes
The compounds with activity determined for both 5-HT 1A and 5-HT 7 receptors were retrieved from the ChEMBL v17 database [39]. The parameters (K i , IC 50 and pK i ) describing the ligand affinity were used to define four subsets of compounds (Table 1), i.e., selective toward 5-HT 7 R (Selective) or 5-HT 1A R (Revsel), not-selective but active (Nselbact) and not-selective (Notsel). The list of compound ChEMBL IDs belonging to a given subset is provided in the Supporting Information (S2 File).
The pK i and IC 50 values were recalculated to the K i using the following expressions: K i = 10 -pKi and K i = IC 50 /2. The conversion factor of 2 was suggested by Kalliokoski et al. [40] and has been applied successfully in similar studies [41,42]. In addition, for each selective ligand, 50 decoys with similar 1D physicochemical properties to remove bias (e.g., molecular weight, logP) and a dissimilar 2D topology to be likely non-binders, were generated using DUD-e service [43]. Accordingly defined sets were further used to construct component (class-
In the structure-based approach (Fig 3), Structural Interaction Fingerprint profiles (SIFt-p) were used [48,49]. They were obtained by docking all the defined subsets of ligands to different conformations of 5-HT 1A and 5-HT 7 Rs homology models [26,50] with and without extracellular loops (EL). The 3-dimensional structures of the ligands were prepared using LigPrep v3.6 [51], and the appropriate ionization states at pH = 7.4 were assigned using Epik v3.4 [52]. The   Schematic of the algorithm. The ChEMBL database was filtered out to extract the compounds with annotated activity for both 5-HT 7 and 5-HT 1A receptors. Next, the obtained set of compounds was divided into four subsets using defined rules (Table 1). Additionally, using the DUD-e web service, the decoys for the Selective set were generated. The compounds from each subset were next encoded in binary string format using a set of molecular (ligand-based approach) and interaction fingerprints (SIFt-p, structure-based approach). Next, for each representation, the Selective subset was merged with one from the remaining sets to generate four groups for use in independent training and testing of RBF SVM models (10 trials per issue). The best in-class SVM models were selected based on AUC and MCC values. The final ranking was obtained by application of the SUM rule of data fusion, in which the scores of component SVM models were summed. Abbreviations used: Revsel-reversed selective, i.e., at least 5-fold more active for 5-HT 1A R than 5-HT 7 ; Nselbact-not selective but active, i.e., dual ligands; Notsel-not selective, i.e., the remaining compounds; SIFt-p-Structural Interaction Fingerprints profile (calculated by averaging the SIFts obtained for the docking of a given compound to three target conformations); SVM RBF-Support Vector Machines with radial basis function kernel. Protein Preparation Wizard was used to assign the bond orders, appropriate amino acid ionization states and to check for steric clashes. The receptor grid was generated (OPLS_2005 force field) by centering the grid box with a size of 15 Å on Asp3.32. Automated docking was performed using Glide v6.9 at the SP level with the flexible docking option turned on [53]. Five poses per ligand were generated, but only the one with the best Glide Score was used for SIFt generation.
The SIFt encodes the 3D ligand-protein complex in the form of a 1D binary string, in which a nine-bit pattern is used to describe the interaction type: any contact, backbone, side chain, polar, aromatic, hydrophobic interaction, hydrogen bond donor/acceptor and charged [54]. Schematic of the structure-based approach. In the first stage, all subsets of compounds were docked (Glide SP mode) to the three conformations of the 5-HT 1A R and 5-HT 7 R homology models (generated on 5-HT 1B R and D 3 R templates). Next, the interaction fingerprints (SIFt) were calculated for all obtained ligand-receptor complexes. The interaction analysis resulted in the selection of 34 common amino acids that formed any type of interaction with the ligands. For a given compound, the SIFts were recalculated (independently for each receptor and template) and averaged (SIFt-p). Finally, for each compound, the reduced SIFt profile (concatenating dockings to 5-HT 1A and 5-HT 7 receptors to a single vector) was used as an input to SVM. The SIFt-p were created by calculating the mean value for each position in three fingerprint strings obtained for a given compound in three conformations of a given receptor model. Based on the docking of all compounds and sequence alignment of 5-HT 1A and 5-HT 7 receptors, a set of 34 common amino acids (sharing the same sequence position and having any contact with the docked set of ligands) was defined, and the reduced SIFt-p were created. Finally, for each compound, the reduced SIFt profiles from docking to 5-HT 1A and 5-HT 7 receptors were concatenated to a single vector that handles information regarding averaged interactions between a particular ligand and both receptors.

Optimization of SVM learning parameters
The molecular fingerprints and concatenated SIFt-p were used as input to generate classification models using the SVM algorithm. To select the classification model with the best performance for a given training class, a bootstrapping procedure was used. All classes were divided into training and test sets using two ratios: 0.40 and 0.60 (i.e., 40% and 60% of all examples, respectively, were used for training, while the remaining ones constituted the testing set). For each ratio, 10-trials of resampling with replacement of the original sets was performed to optimize the kernel parameters in the SVM classification model. The models were constructed using the SVMlight library and radial base function (RBF, the Gaussian function) [55]. For each run, a grid search was performed for two parameters: a penalty of the error term C 2

CScore model generation
For each set of optimized SVM component models, the threshold (boundary value that separates positive and negative classes) for which MCC was highest was determined by sampling the range of the RBF decision function with a step of 0.1. Next, the best SVM models were identified and used to build the consensus score (CScore) classifiers (Fig 5). Two criteria were used to select the best in-class SVM model-the highest AUC or MCC values. The set of best component models and their thresholds were then used to perform the new classification. The CScore model was created by applying the SUM rule from the data fusion [56] merging the outputs of the individual component classifiers by summing the predictions that were calculated using the thresholds obtained for each component model.

The performance measures
The recall (1), precision (2), Mathews Correlation Coefficient-MCC (3), area under the receiver operator characteristic (ROC) curve (AUC) and the Boltzmann-Enhanced Discrimination of ROC (BEDROC) metrics were used to assess the classification effectiveness of trained SVM models.
where TP denotes the number of true positives (actives labeled as actives), TN-true negatives,  [57] to address the problem of "early recognition" in virtual screening. It can be interpreted as the probability that an active is ranked before a randomly selected compound that is exponentially distributed with parameter α, which controls the earliness of "early recognition" to test whether a ranking method is useful in the context of VS. The BEDROC metric ranges from 0 to 1, and it was calculated for α = 20 in the present study, as previously suggested [58].
The ROC curves and AUC values were calculated using the ROCR [59] package in R [60]. The BEDROC was also calculated in R using the enrichvs package [61].

Results
It should be emphasized that in the present analysis we focused on comparing the performance of the designed algorithm in different settings (i.e., representations, selection of the best component models) in terms of its ability to distinguish Selective from not-selective, inactive, decoy and multimodal (dual) compounds for the virtual screening of molecular databases. Moreover, the classification obtained by combining the SVM and binary molecular fingerprints cannot be interpreted at the level of chemical features that may be responsible for compound selectivity.
Initially, the performance of the CScore, the best component and Classical (trained on the Selective subset as positive class and on the sum of the Revsel, Notsel and Nselbact subsets as negative classes) were compared. The AUC, BEDROC and MCC parameters were calculated ( Table 2) for ligand-based and structure-based approaches.
Interestingly, to address the "early recognition" problem (BEDROC value), the CScore models always demonstrated superior performance for recognizing selective over not-selective compounds in comparison to any single SVM-based ranking strategy (i.e., Classical and best component). However, considering the global performance (MCC, AUC), a single strategy Table 2. Performance of CScore models obtained for the 0.40 training ratio and AUC and MCC strategies compared with the best single models trained in the classical manner and the best in-class component models. The median values for the Classical and best component strategies (ten trials were performed) are presented in parentheses. sometimes provided better results than a consensus approach. In should be noted that use of MCC as the SVM model selection strategy consistently provided better CScores than AUC.

Fingerprint
To evaluate global performance, the CScore was compared with all component models. Fig  6 shows an example panel of heat maps comparing the results obtained for ligand-based ( Fig  6A) and structure-based (Fig 6B) approaches. As expected, in the majority of cases, the CScore models were better than any of the component models. The CScore models optimized for AUC and MCC ranked first in 56.7% and 40% of the analyzed cases, respectively (Fig 7).
Additionally, to study the relationships between CScore and component models, the clustering of rows (represented by vectors containing four performance parameters) using the complete linkage method with Euclidean distance measure was performed (all of the mentioned heat maps are available in S1 File). The global analysis of dendrograms revealed that the  performance of the CScore models was coupled with component models on different levels of performance similarity. For example, considering the highest level of similarity (i.e., the shortest Euclidean distance between two models), the performance of the CScore model was most similar to DUD, Revsel, Notsel and Nselbact in eight, six, eight and one cases, respectively. In the five remaining cases, CScore was coupled on the second level (i.e., with a cluster formed by two component models). Interestingly, eight cases had singleton component models (Nselbact and Notsel in six and two, respectively) and generally displayed poor performance. Comparison of the approaches used to generate the input data (Fig 8) revealed that significantly better CScore models were obtained for ligand-based ( Regarding the homology modeling approach, the results showed that for both 5-HT 1B R and D 3 R templates, slightly better CScore models were obtained when homology models with extracellular loops were used for SIFt-p generation (MCC for both models = 0.50, Fig 8). Additionally, the template also appears to influence the performance of the CScore models-better results were obtained for more homologous 5-HT 1B R template.
The CScore models obtained for component models selected using MCC criteria were slightly more efficient than those constructed using component models with the best AUC values. Because we optimized MCC to identify the threshold enabling the best classification effectiveness for each component model, the approach based on CScore model generation for MCC provides, globally, SVM models with the highest performance on classification tasks.

Identification of Selective Ligands
Finally, increasing the size of the training set (from 40% to 60% actives) improved the performance of both component and CScore models in the majority of cases (Fig 9), which is consistent with our previous findings [62,63].

Selectivity threshold
As demonstrated in the present analysis, machine learning classification models trained on a set of ligands with different selectivity and activity profiles can provide a consensus model, the performance of which is significantly better than the component models. It must be stressed that the ligand was regarded as selective for 5-HT 7 R as long as the ratio of K i (5-HT 1A )/ K i (5-HT 7 ) was greater than 5. The rationale for this criterion was based on a close investigation of the data retrieved from ChEMBL database v17, which showed that there were 69 compounds for selectivity threshold ! 5, whereas there were only 34 compounds for threshold ! 20 that could be used for training and testing of the SVM models (S2 Fig). It should be noted that different thresholds for the selectivity index and rationale for their assessment have been used in similar studies, however, a quantitative definition is lacking. For example, Ma et al. used a selectivity index ! 10, which was selected based on the findings for the selective CB 1 /CB 2 cannabinoid ligand by J.W. Huffman [10]. However, Wang et al. used a threshold for the selectivity index ! 3 for the kNN QSAR Classification model for 5-HT 1E /5-HT 1F receptor selectivity [5]. To minimize errors resulting from, e.g., uncertainty regarding K i values, they tested four selectivity thresholds and demonstrated that a threshold higher than 3 led to unacceptable models, generally due to a number of selective ligands that was too small. However, a higher selectivity index threshold was used by Wassermann [9] and Ning (50-fold) [8].

Performance of the CScore
To test general performance, as well as the "early recognition" preferences of the proposed algorithm, different performance metrics were applied. Although widely used, AUC is not a Heat maps comparing the performance of CScore models obtained for all the studied cases, i.e., representation of the data (three molecular fingerprints and SIFt-p for models with and without loops) and strategy for component models selection (AUC, MCC).
doi:10.1371/journal.pone.0156986.g008 sufficient metric to address the "early recognition" problem specific to VS [57,58]. Additionally, the application of AUC to rank any database necessitates the selection of a decision threshold that enables binary classification (e.g., active/inactive or selective/not-selective). Consequently, different metrics can potentially be used to identify the optimal threshold. For example, Alvarsson et al. used net reclassification improvement (NRI) [41]. Because MCC is a more balanced summary statistic of the confusion matrix when unbalanced classes (see Table 1) are used [57], we decided to apply it in our algorithm.
Our analyses revealed that significantly better CScore models were obtained when the component models with the best MCC compared with the best AUC value were selected. This observation is explained in Fig 5, in which red circles on ROC curves depict the decision thresholds that were determined by maximizing MCC. These circles are localized in the area of the curve in which the number of true positives is ranked before false positives, which in some cases corresponds to the "early recognition" of the BEDROC.

Fingerprint influence
The influence of diverse parameters on the performance of the proposed algorithm was tested. Interestingly, CScore models based on the molecular fingerprints showed better performance than averaged interaction fingerprints (SIFt-p). All of the used fingerprints have different lengths, ranging from 166 (MACCS FP) to 4860 bits (Klekota-Roth FP), whereas concatenated SIFt-p had lengths of 1494 and 1458 bits for receptors with and without EL, respectively. There was no correlation between the length of the representation and the performance of the CScore model. The superior performance of ML models based on molecular rather than on interaction fingerprints in retrieving selectivity patterns may be due to uncertainty in predicting the correct binding mode by docking. It should be noted that because models obtained for receptors with EL showed better performance than those without loops, these additional four amino acids belonging to EL could play a role in the recognition of selective ligands. The method presented herein could be especially useful for the virtual screening of chemical databases and for assessing combinatorial libraries to prioritize compounds for synthesis. It also offers more control capabilities in virtual screening searches for selective ligands because it enables the construction of a CScore model using different classification thresholds and performance parameters, e.g., one can generate a CScore model to optimize its performance for recall or precision. Additionally, the proposed algorithm is flexible, and after redefining the training classes, it can be used to, e.g., predict multimodal ligands.

Conclusions
In this study, a new algorithm is presented to identify new target-selective ligands and is evaluated based on its selectivity prediction for 5-HT 7 receptor ligands over the 5-HT 1A subtype. We adopted data fusion and SVM component models (class-specific) that were trained on four datasets, i.e., selective toward 5-HT 7 R (Selective) or 5-HT 1A R (Revsel), not-selective (Notsel) and not-selective but active (Nselbact), to construct the consensus classifier-CScore. The primary objective of this study was to obtain a virtual screening algorithm, which was evaluated in terms of its "early recognition" performance using the BEDROC metric. The analyses showed that the CScore was a significantly better scoring strategy than the best single models trained in a classical manner and the best in-class component models. The selection of component models to construct the consensus classifier is crucial and is significantly influenced by the molecular representation and performance parameter applied. In all studied cases, selection of the component models with the best MCC versus AUC value improved "early recognition" (measured by BEDROC).
Considering the successful implementation of the proposed algorithm, it will be incorporated into our screening protocol [64] and applied to analyze combinatorial libraries to prioritize the synthesis of selective 5-HT 7 R ligands. Further improvements in the functionality of the algorithm will be conducted to improve its utility for other research groups (S2 File).