
Controlling astrocyte-mediated synaptic pruning signals for schizophrenia drug repurposing with deep graph networks

Abstract


Schizophrenia is a debilitating psychiatric disorder, leading to both physical and social morbidity. Worldwide, about 1% of the population lives with the disease, with 100,000 new cases annually in the United States alone. Despite its importance, finding effective treatments for schizophrenia remains a challenging task, and previous work has relied on expensive large-scale phenotypic screens. This work investigates the benefits of Machine Learning for graphs to optimize drug phenotypic screens and predict compounds that mitigate the abnormal brain volume reduction induced by excessive glial phagocytic activity in schizophrenia subjects. Given a compound and its concentration as input, we propose a method that predicts a score associated with three possible compound effects, i.e., reduce, increase, or not influence phagocytosis. We leverage a high-throughput screen to show experimentally that our method achieves good generalization capabilities. The screen involves 2218 compounds at five different concentrations. Then, we analyze the usability of our approach in a practical setting, i.e., prioritizing the selection of compounds in the SWEETLEAD library. We provide a list of 64 compounds from the library that have the most potential clinical utility for glial phagocytosis mitigation. Lastly, we propose a novel approach to computationally validate their utility as possible therapies for schizophrenia.

Author summary

Phagocytosis is a fundamental biological process that protects organisms from exogenous infectious particles and preserves the equilibrium and efficiency of the host by removing its unwanted cells. A dysregulation of phagocytic activity can lead to severe consequences for the host. In this study, we focus on a recent theory that relates excessive phagocytic activity in brain cells, and the consequent abnormal reduction in brain volume, to the development of schizophrenia. Our working hypothesis is that pharmaceutical compounds that can reduce excessive phagocytic activity might prove effective as a schizophrenia treatment. Rather than attempting to develop such a chemical compound ex novo, we rely on a more cost-effective and efficient approach that seeks candidate therapies in a set of approved chemical compounds. To achieve this, we train a machine learning model capable of predicting, with good accuracy, the ability of a molecular compound to increase or decrease phagocytosis in the target brain cells. Our approach leverages learning models capable of directly processing the molecular graph of the compound, leading to the identification of 64 candidate drugs of potential clinical utility.

This is a PLOS Computational Biology Methods paper.

Introduction


Schizophrenia is a chronic and severe mental disorder that affects how a person thinks, feels, and behaves. It is expressed as a combination of symptoms, such as recurrent psychosis, social withdrawal, anhedonia, and cognitive dysfunctions. Worldwide, about 1% of the population is diagnosed with schizophrenia, with 100,000 new cases annually in the United States alone [1].

A recent study [2] reports that brain volumes, measured on Magnetic Resonance Imaging (MRI) scans, are abnormal in patients with schizophrenia compared to unaffected individuals, with a reduction in both grey and white matter. In particular, MacDonald et al. [3] hypothesized that the decreased density of dendritic spines in schizophrenia subjects results from excessive pruning of synapses. This action is assumed to be performed by glial cells, which are non-neuronal cells with multiple functions in the central nervous system, including supporting and removing neurons. This assumption is supported by the evidence that glial phagocytic activity may be directly associated with the prevalence of various neurodegenerative diseases due to hyperactivation of phagocytic pathways [4, 5]. In addition, a novel PET tracer that binds to synaptic vesicle glycoprotein 2A (SV2A) shows diminished uptake in the frontal and anterior cingulate cortex in individuals with schizophrenia [6].

Towards the goal of discovering novel treatments for schizophrenia, previous work conducted a large-scale phenotypic screen to discover compounds with the ability to alter glial cell phagocytosis. However, understanding structure-activity relationships is a challenge in these screens. Further, generating accurate models is difficult because it is not known which chemical information is most predictive of chemical function, nor how best to represent this information for predictive models. Additionally, further experiments remain the gold standard for validating model predictions. Yet, it is not always possible to conduct additional high-throughput screens, so alternative methods are required for testing the utility of model predictions.

The compound property/activity prediction problem is a task faced by pharmaceutical companies and academia to improve the comprehension of diseases, discover new drugs, or identify new indications for existing drugs. It is standard practice to scan large libraries of compounds to test their biological activity. However, this operation can be costly and time-consuming, and Machine Learning (ML) methods can help reduce the effort needed to run experiments. For that reason, in the last decades, several computational approaches have been proposed to determine compound properties, or as a filter to select the most promising compounds for clinical and biological experiments [7, 8]. A pioneering work is that by Bianucci et al. [9], where the authors employed Cascade Correlation Networks for structures to predict the boiling point of Alkenes and to predict the affinity towards the Benzodiazepine/GABAA receptor by a group of Benzodiazepines. More recently, Banerjee et al. [10] developed an ML model to discriminate between sweet and bitter taste of molecules. Specifically, the model leverages a static fingerprint of the molecule to predict the property through a Random Forest. Similarly, Lind et al. [11] feed a static fingerprint and oncogene mutation status to a Random Forest to predict the activity versus inactivity of drugs against cancer cell lines. These results demonstrate that it is possible to associate chemical information with biological outcomes. Yet, static fingerprints are not sufficient in all applications.

Explicitly for schizophrenia, Zhao et al. [12] explored five different ML approaches to repurpose drugs for schizophrenia, depression, and anxiety disorders. In particular, they considered Deep Neural Networks, Support Vector Machines, Elastic Net regression, Random Forest, and Gradient Boosted Trees. Models were trained to predict whether a drug is a known treatment for the disease or not, using drug expression profiles as inputs. Those profiles capture transcriptomic changes when HL60, PC3, and MCF7 cell lines were treated with a chemical. Xu et al. [13] proposed PhenoPredict, a ranking algorithm for schizophrenia drug repurposing. PhenoPredict infers drug treatments from diseases that are phenotypically related to schizophrenia. These models demonstrate the ability to connect chemical information to biological information, yet they are limited to predicting molecular changes (such as gene expression) and are not suited to predicting phagocytic activity from phenotypic screens.

This work studies the use of ML techniques to predict the effects of compounds on the glial phagocytic activity that causes abnormal brain reduction in schizophrenia subjects. This work has been done in collaboration with SPARK at Stanford University [14, 15], the main node of a partnership network between university and industry experts in chemistry, biology, and medicine to advance academic biomedical research discoveries into promising new treatments for patients. The objective of this work is to propose an ML method apt to optimize drug phenotypic screens. Specifically, our method identifies compounds that reduce glial phagocytic activity for the treatment of schizophrenia, which, to the best of our knowledge, has not been proposed before. Our contribution can be summarized as follows. First, we introduce an ML method based on Deep Graph Networks to predict whether a compound can influence the glial phagocytic activity in brain tissue. Then, we evaluate our method on a real high-throughput screening experiment provided by SPARK. The proposed model achieves a macro Area Under the ROC curve (AUROC) of 0.68 when predicting if a compound inhibits, intensifies, or does not affect the phagocytic activity. Afterwards, we perform an analysis to understand the potential benefits of our approach in a practical scenario. Specifically, we leverage our method to prioritize the selection of a new set of compounds in the SWEETLEAD library, leading to the identification of 64 potential candidates. Lastly, we propose a novel approach to understand the relevance of compounds to a biological use case. That approach allows us to compare our results with more than 287,000 references in the literature. With this analysis we highlight the effectiveness of the model in identifying compounds that are already studied in relation to brain-related diseases.

Results and discussion

Compound dissimilarity analysis

Before constructing the ML model, we measured the dissimilarity between compounds in the dataset to understand how much the molecular structures in our data differ from each other. A library of highly similar compounds would prevent the model from sufficiently discriminating drug effects. For this analysis, we computed the Extended-Connectivity Fingerprint (ECFP) for each compound, using standard parameters (i.e., radius equal to 3 and length equal to 1024); then, we measured the dissimilarity for each pair of fingerprints. Note that the ECFP is a non-adaptive vectorial representation that encodes the structure of a molecule (please refer to the Model section for a detailed description). We leveraged two different metrics: cosine distance and Jaccard dissimilarity. Given two vectors $A$ and $B$, the cosine distance is defined as $1 - \frac{A \cdot B}{\|A\|\,\|B\|}$, while the Jaccard dissimilarity is $1 - \frac{|A \cap B|}{|A \cup B|}$. Consider the square upper triangular matrix $D \in \mathbb{R}^{n \times n}$, with $n$ the number of fingerprints, where the element $d_{i,j}$ represents the dissimilarity between fingerprints $i$ and $j$, measured with one of the two metrics defined above. The final dissimilarity score is computed as $\frac{1}{m} \sum_{i=1}^{n} \sum_{j=i}^{n} d_{i,j}$, where $m = n^2/2$.
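The pairwise analysis can be sketched in a few lines of Python. This is a minimal illustration, assuming the usual definitions of cosine distance and Jaccard dissimilarity on binary fingerprints and the normalizer m = n²/2 stated in the text; the function names are hypothetical and the fingerprints are assumed to be non-zero bit vectors.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cos(a, b) for two equal-length, non-zero numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def jaccard_dissimilarity(a, b):
    """Return 1 - |A and B| / |A or B|, reading each bit set to 1 as a set member."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return 1.0 - both / either


def dissimilarity_score(fingerprints, metric):
    """Sum the metric over the upper triangle and divide by m = n^2 / 2."""
    n = len(fingerprints)
    total = sum(
        metric(fingerprints[i], fingerprints[j])
        for i in range(n)
        for j in range(i + 1, n)  # the diagonal is skipped because d(i, i) = 0
    )
    return total / (n * n / 2)
```

A score close to 1 means that most pairs of fingerprints share few or no bits.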

Table 1 reports the final scores computed with the two metrics. Both scores are close to 1, highlighting that the compounds have distinct fingerprints, and therefore can be considered strongly dissimilar.

Table 1. Cosine distance and Jaccard dissimilarity between dataset’s compounds.

Afterward, we measured the scaffold diversity in our data with the same strategy used for the compound dissimilarity analysis. It is clear from Table 2 that there is a rich diversity even between scaffolds. Fig 1 emphasizes such dissimilarity by showing that only a few molecules in our data share the same scaffold. Indeed, there are 863 scaffolds associated with a single molecule in the data, while only 26 are shared by between 8 and 232 molecules. Such results highlight that the scaffold distribution is long-tailed. Lastly, Fig 2 shows through Principal Component Analysis (PCA) that there is no evidence of any natural cluster in either the molecule or the scaffold fingerprints. Thus, it is reasonable to assume that the uniqueness of the compounds in the dataset is an indicator of the strong complexity of the task.

Table 2. Cosine distance and Jaccard dissimilarity between dataset’s compound scaffolds.

Fig 1. The distribution of the number of shared scaffolds within molecules in the dataset.

Fig 2.

(a) The PCA plot of the molecule ECFPs. (b) The PCA plot of the scaffold ECFPs.

Model selection and risk assessment

The best model used in our experiments was selected through an empirical analysis of SPARK’s high-throughput screening results (please refer to the Dataset section for details). We tested multiple models to understand which one best predicted the effects observed in our phenotypic screen. Each model used different information about the chemical structures in making predictions. These models include: MoRF, MoNN, LinNN, SAGENN, GaNN, ENN, and NeFPNN. The first two exploit static compound fingerprints by leveraging the ECFP technique. LinNN leverages adaptive fingerprints computed by an MLP using only atom information. The remaining models employ several deep graph network based fingerprints, which exploit atom and topological information, with ENN and NeFPNN also including bond features. Please refer to the Model section for details.

Table 3 reports the predictive performance achieved in the 3-fold cross-validation on the development set, using the standard stratification schema, which divides the compounds while maintaining the distribution of the target variable. All configurations outperform the baseline LinNN, whose performance is very close to that of a random guesser. However, ENN and NeFPNN resulted in a validation score that is substantially lower than that of the best performing models. These results suggest that bond information is not helpful in solving this task. The top four models have overlapping validation scores, but highly different training scores. This gap motivates the use of alternative stratification strategies.
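For readers unfamiliar with the metric, the following sketch shows one common way to compute a macro-AUROC for three classes. It assumes a one-vs-rest scheme with an unweighted mean over classes, which the text does not spell out; the function names are hypothetical.

```python
def binary_auroc(scores, labels):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if p > q else 0.5 if p == q else 0.0
        for p in pos
        for q in neg
    )
    return wins / (len(pos) * len(neg))


def macro_auroc(prob_rows, y_true, classes):
    """Average the one-vs-rest AUROC over all classes with equal weight.

    prob_rows[i][k] is the predicted score of class classes[k] for instance i.
    Every class is assumed to appear at least once in y_true.
    """
    per_class = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        labels = [1 if y == cls else 0 for y in y_true]
        per_class.append(binary_auroc(scores, labels))
    return sum(per_class) / len(per_class)
```

Under this definition a random guesser scores about 0.5 on every class, which is the reference point for the LinNN baseline discussed above.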

Table 3. Macro-AUROC scores achieved with simple cross-validation.

Given the clear performance discrepancy between models, we ran a more articulated stratification only for the top four configurations, i.e., MoNN, MoRF, SAGENN, and GaNN. This complex strategy separates compounds into groups while maintaining high compound diversity in each split of the data (details are discussed in the Experimental setting section). Table 4 shows that nearly all models have better (or overlapping) validation performance with respect to the simple cross-validation strategy discussed above. Training scores are 1 to 5 points lower than in Table 3, and therefore less optimistic, while the validation scores have overlapping value ranges. MoRF and GaNN are the configurations with the highest gain in validation with the complex stratification setting, while MoNN shows a reduction of approximately 2 points.

Table 4. Macro-AUROC scores achieved with complex cross-validation.

After the model selection phase, where we selected the best hyper-parameter configuration for each model, we performed the risk assessment step by evaluating the models on the hold-out test set. In this case too, we considered only MoNN, MoRF, GaNN, and SAGENN. Table 5 shows that the configurations selected with the simple cross-validation often obtained a better performance than those selected with the complex stratification strategy. MoRF is the only model achieving a higher score in the latter case. With both methods, each model reached a performance in line with that obtained during model selection.

Table 5. Comparison of macro-AUROC scores computed during risk assessment.

By analyzing the confusion matrices in Fig 3, we observe that a higher performance corresponds to an increased ability to correctly predict both the increase phagocytosis and decrease phagocytosis classes. This result suggests that DGN-based models are better than their Morgan-based counterparts at recognizing the patterns that matter for this central task in our study.

Fig 3. Normalized confusion matrices of GaNN, SAGENN, MoNN, and MoRF on test set.

Upon investigation of the confusion matrices, models have relatively high prediction accuracy for predicting drugs with no effect on phagocytosis (center square on all matrices). All models performed well in predicting drugs that would decrease phagocytosis (bottom right of each matrix) with the GaNN having the highest accuracy for this compound effect. All models had relatively low prediction accuracy for compounds that increase phagocytosis (upper left square of each matrix) and erred on the side of predicting no effect for these drugs (upper middle square of each matrix). For our dataset, we were prioritizing compounds that decreased phagocytosis, and so relatively high prediction for these compounds relative to the increase phagocytosis group was suitable. Ultimately, we selected the GaNN as the best model for both its performance on the hold-out test set and its capacity to distinguish between the three classes.

Problem relaxation

To demonstrate the complexity of the task, we relaxed the problem to predicting whether a compound has a positive or negative impact on phagocytosis. We considered the no change and increase phagocytosis classes as the negative label. Each model takes as input a compound and its concentration to predict the impact on phagocytosis. Thus, we shifted from a multi-class to a binary classification problem.
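The relaxation amounts to a simple relabeling, sketched below. The class names follow the text; encoding the positive label as 1 and the negative label as 0 is an assumption made for illustration.

```python
# decrease phagocytosis is the positive class; the other two classes are merged
# into the negative class, as described in the text. The 0/1 encoding is assumed.
RELAXED_LABEL = {
    "decrease phagocytosis": 1,
    "no change": 0,
    "increase phagocytosis": 0,
}


def relax_labels(labels):
    """Map the three screening classes onto the binary relaxed problem."""
    return [RELAXED_LABEL[label] for label in labels]
```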

We followed the same experimental procedure as in the Model selection and risk assessment section. We first performed a model selection phase by running a 3-fold cross-validation on the development set using the standard stratification schema. Given the strong discrepancy between validation and training scores obtained in this phase, we selected the top four model configurations and ran the more articulated stratification. Table 6 reports the results of both experiments. As it shows, relaxing the problem to binary classification helps improve the final performance. Indeed, the validation scores with the standard stratification are more than 10 points higher on average compared to the three-class problem. When the more sophisticated stratification schema is employed, the difference increases to over 11 points. Particularly interesting is the improvement of LinNN, whose performance is on par with NeFPNN. This result underlines the fact that bond information is not strictly helpful in solving this task.

Table 6. Comparison of macro-AUROC scores achieved with simple and complex cross-validation when only 2 classes are considered.

After the model selection, we proceeded with the risk assessment phase only for the best-performing configurations. Table 7 shows the results obtained during this step. The scores exhibit similar behavior to the validation scores with respect to the original problem. However, they are below the confidence interval found in the model selection phase. We believe that this is the consequence of some labeling noise introduced during class aggregation. Indeed, the no change class is a borderline class that may itself contain noise. For these reasons, we believe that the three-class formulation leads to better performance and is less prone to errors.

Table 7. Comparison of macro-AUROC scores computed during risk assessment when only 2 classes are considered.

These findings show that by relaxing the problem it is possible to achieve higher performances. Therefore, it is reasonable to assume these results as proof of the complexity of the original problem. Indeed, in this case, the computational task of structure-function prediction is really hard because phagocytosis is a multi-protein biological process. This suggests that multiple molecules could bind proteins in this pathway and that there might not be one ideal structure for influencing this process. Moreover, this situation, followed by the high dissimilarity between molecule structures, indicates that our method learns good compound representations and is not overfitting molecular sub-structures for making predictions.

SWEETLEAD library repurposing

The main goal of our analysis is understanding the usability of our approach in a practical setting, i.e., prioritizing the selection of compounds to be tested in a biomedical experiment. We focus on the GaNN model which has emerged as the best configuration according to our model selection and risk assessment analysis. We leveraged the model to predict the impact, at different dosages, of a new set of compounds on astrocyte-mediated synaptic pruning in schizophrenia. We considered the compounds in the SWEETLEAD library [16], an in silico database of approved drugs, regulated chemicals, and herbals designed for drug discovery. The library contains 4314 compounds with 1391 of them marked as FDA approved.

We simulated with the GaNN model the impact of each compound in SWEETLEAD on the phagocytic activity at five different concentrations, i.e., 1.39, 2.78, 5.56, 11.11, 22.22 μM, to parallel the dose ranges used in the initial phenotypic screen and to mirror those typically used in literature.

Table 8 shows how many compounds in SWEETLEAD are predicted to belong to each of the three phagocytosis classes by our GaNN model. The GaNN generates different outcomes at different doses for only 29.49% of the compounds, so the model can be considered highly confident on the remaining predictions.
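The per-compound summary behind these numbers can be sketched as follows. The input layout (a dictionary from compound name to its five per-dose predictions) and the function names are assumptions for illustration; the label mixed mirrors the mixed* label used for Fig 4.

```python
from collections import Counter

# Concentrations used in the screen, in micromolar, as listed in the text.
DOSES_UM = (1.39, 2.78, 5.56, 11.11, 22.22)


def summarize_by_compound(predictions):
    """Collapse per-dose predictions into one label per compound.

    A compound keeps its class when all doses agree and is labeled
    "mixed" when the predicted class changes with the dose.
    """
    return {
        compound: per_dose[0] if len(set(per_dose)) == 1 else "mixed"
        for compound, per_dose in predictions.items()
    }


def class_counts(summary):
    """Count how many compounds fall into each summarized label."""
    return Counter(summary.values())


def mixed_fraction(summary):
    """Fraction of compounds whose prediction depends on the dose."""
    return class_counts(summary)["mixed"] / len(summary)
```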

Table 8. The number of predicted compounds in SWEETLEAD library for each class.

We analyzed predicted compound effects by Anatomical Therapeutic Chemical (ATC) codes to understand the therapeutic use classification of these drugs. Additionally, ATC codes related to the nervous system would provide further evidence that a predicted drug could modify phagocytosis in the brain. ATC codes are part of a classification system, controlled by the World Health Organization, that classifies drugs according to the organ or system on which they act and their therapeutic, pharmacological, and chemical properties. ATC codes are hierarchically organized in five levels, where the first is the most general and refers to the anatomical main group, while the last is the most specific and indicates the chemical substance. Note that a drug can be associated with more than one ATC code.
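As an illustration of the hierarchy, a complete seven-character ATC code can be split into its five levels by taking prefixes of length 1, 3, 4, 5, and 7. The helper names below are hypothetical, and counting each first-level group once per drug is an assumption about how a chart such as Fig 4 could be built.

```python
from collections import Counter

# Prefix lengths of the five ATC levels within a complete seven-character code.
ATC_PREFIX_LENGTHS = (1, 3, 4, 5, 7)


def atc_levels(code):
    """Split a complete ATC code into its five hierarchical levels."""
    code = code.strip().upper()
    if len(code) != 7:
        raise ValueError(f"expected a seven-character ATC code, got {code!r}")
    return [code[:length] for length in ATC_PREFIX_LENGTHS]


def first_level_counts(drug_to_codes):
    """Count drugs per first-level (anatomical main group) letter.

    A drug with several ATC codes contributes once to each distinct letter.
    """
    counts = Counter()
    for codes in drug_to_codes.values():
        counts.update({atc_levels(code)[0] for code in codes})
    return counts
```

For example, `atc_levels("N05AH01")` yields `["N", "N05", "N05A", "N05AH", "N05AH01"]`, whose first element is the nervous system group.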

We focused on FDA-approved compounds that have a match in DrugBank [17]. Fig 4 shows the first level of ATC codes with respect to each class. The decrease phagocytosis class is mostly characterized by category N (nervous system), confirming that the model can identify compounds that are already used to treat nervous system diseases. Intriguingly, the next highest category for the decrease phagocytosis class is C (cardiovascular system). There is some literature evidence supporting the repurposing of cardiovascular system drugs for neurological conditions. Specifically, beta-blockers can reduce severity of migraines, and statins can reduce contrast-induced neuropathy [18]. This preliminary evidence suggests that the model may be predicting viable applications of these drugs to affect phagocytosis. The cardiovascular system category also contained the highest number of drugs predicted to increase phagocytosis, suggesting that we cannot predict new drug effects from ATC codes alone, and that even drugs within the same class can have distinct effects.

Fig 4. First level ATC codes matched with each predicted class.

The multiple vertical bars chart the first level ATC codes with respect to the predicted class for each analysed compound. The letters on the x-axis refer to the ATC codes; A = alimentary tract and metabolism, B = blood and blood forming organs, C = cardiovascular system, D = dermatologicals, G = genito-urinary system and sex hormones, H = systemic hormonal preparations, excluding sex hormones and insulins, J = antiinfectives for systemic use, L = antineoplastic and immunomodulating agents, M = musculo-skeletal system, N = nervous system, P = antiparasitic products, insecticides and repellents, R = respiratory system, S = sensory organs, and V = various. We used the label mixed* to indicate those cases where the model prediction changes with different doses. We recall also that each compound may be associated with multiple ATC codes.

We selected the 64 compounds with the most potential clinical utility. The selection was restricted to compounds that are marked as FDA approved and that have been predicted to decrease phagocytosis with high confidence. The complete list is reported in Table 9.

Table 9. The list of selected compounds ranked by model confidence.

We sought to computationally validate the utility of the selected compounds as possible therapies for schizophrenia. We exploited SciFinder [19], a database of chemical literature, to extract the 15 topics with the highest frequencies per compound. We hypothesized that biological terms co-mentioned with drug names in the literature could represent a reasonable measure of the quality of the candidates. We recall that a topic is a term used by the Chemical Abstracts database to identify the general topic of a reference. We conducted the analysis over more than 287,000 references. Chemical compounds are associated with an average of 914 terms, and the frequencies of chemical-term co-mentions are in the tens to hundreds of mentions. For our 64 chemicals, the frequencies of the 15 retained topics ranged from 6992 to 1. This level of co-mention was sufficient for us to analyze whether these compounds had utility for our repurposing goal.

We clustered the compounds into three main groups based on topics:

  • Brain, which contains all compounds with at least one brain-related topic in the top-15. We considered as brain-related topics the terms: Antidepressants, Antipsychotics, Bipolar disorder, Brain, Cognitive disorders, Depression, Epilepsy, Mental and behavioral disorders, Mood disorder, Multiple sclerosis, Obsessive-compulsive disorder, Parkinson disease, Psychosis, Schizophrenia;
  • Antibiotics, which contains all compounds with the topic Antibiotic in the top-15;
  • Miscellaneous, which contains all compounds with no brain or antibiotic related topics in the top-15.
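The grouping rule above can be sketched as a small function. The topic names come from the list; letting a compound belong to both the Brain and Antibiotics groups when it has topics of both kinds is an assumption, since the text does not state how such overlaps are resolved.

```python
BRAIN_TOPICS = frozenset({
    "Antidepressants", "Antipsychotics", "Bipolar disorder", "Brain",
    "Cognitive disorders", "Depression", "Epilepsy",
    "Mental and behavioral disorders", "Mood disorder", "Multiple sclerosis",
    "Obsessive-compulsive disorder", "Parkinson disease", "Psychosis",
    "Schizophrenia",
})


def topic_clusters(top_topics):
    """Return the set of clusters a compound belongs to, given its top-15 topics."""
    topics = set(top_topics)
    clusters = set()
    if topics & BRAIN_TOPICS:
        clusters.add("Brain")
    if "Antibiotic" in topics:
        clusters.add("Antibiotics")
    # Miscellaneous is reserved for compounds with neither kind of topic.
    return clusters or {"Miscellaneous"}
```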

We selected brain-related topics because they could suggest applications to neurological diseases. We also selected antibiotics because these compounds are generally well-tolerated and have known safety profiles. Moreover, some literature evidence suggests that antibiotics can directly affect schizophrenia [20], and there is mounting evidence that the gut microbiome affects brain health [21, 22]. Lastly, we kept a miscellaneous category to capture drugs without a clear alignment to either of these hypotheses.

We report the associations between compounds and clusters in Table 9. Compounds with brain-related topics tend to be ranked higher by the probability predicted by the GaNN model than the others. Indeed, 20 of the first 30 compounds belong to the Brain cluster. These additional results reinforce the idea that the model is effective in identifying compounds that are already studied in relation to brain-related diseases. Three of the top five compounds ranked by the GaNN model are antibiotics, suggesting that repurposing for schizophrenia could leverage the human gut microbiome-brain connection.

Lastly, we found evidence of the utility of some of our 64 candidates (i.e., Loxapine, Dextromethorphan, Thioridazine, Trifluoperazine, and Cetirizine) in the work of So et al. [23]. In their analysis, they leveraged GWAS data and gene imputation to identify drug repurposing candidates for multiple psychiatric diseases. Their work presents a complementary view that our predicted candidates influence biology relevant to psychiatric disease. We observe that a high recall score is not possible in this case because the authors report only the top-15 compounds.

Materials and methods

Dataset


We performed our experiments on top of SPARK’s high-throughput screening results. The aim of that screen was to find compounds that inhibit or activate MEGF10 [24] to correct aberrant astrocyte-mediated synaptic pruning in schizophrenia. For that purpose, the screen was a phagocytosis assay using astrocytes isolated from fetal human brain samples and synaptosomes prepared from mouse brain samples. Phagocytosis was measured with a pH-sensitive fluorescent dye conjugated to the synaptosomes, which is only activated when the synaptosomes are engulfed and localized to the low pH found in intracellular lysosomes. The screen was conducted in plates containing both positive and negative controls for data normalization (article in preparation).

The screening assessed 2218 different compounds at different concentrations, i.e., 1.39, 2.78, 5.56, 11.11, 22.22 μM. All the analyzed compounds derive from the Library of Pharmacologically Active Compounds (LOPAC) [25] and NIH Clinical Collection (NIHCC) [26], so they include inhibitors, receptor ligands, pharma-developed tools, and approved drugs. Due to the overlap of the two libraries, we removed duplicate results, leading to 10914 unique compound-dose combinations. For this analysis, we used median-normalized fluorescent signal as a proxy for relative phagocytosis.

We classified screening results by assigning a class to each instance. In particular, there are three class types:

  • increase phagocytosis—if the combination of compound and dose intensifies the phagocytic activity;
  • decrease phagocytosis—if the combination of compound and dose inhibits the phagocytic activity;
  • no change—if the combination of compound and dose does not affect the phagocytic activity.

We assigned the classes by applying a threshold on the number of cells with phagocytosis signal. More specifically, the class of an instance depends on β, the number of cells with phagocytosis signal relative to the median of the plate.

Each compound is initially represented through its Simplified Molecular Input Line Entry System (SMILES) string. The SMILES string is a character string that captures the atoms of the compound and the bonds between them. For each compound we also include additional information regarding atoms and bonds, as shown in Table 10.

Table 10. The list of features regarding atoms and bonds.

Finally, given the set of tested concentrations and the set of target values, the final dataset can be described as $\{(c_i, d_i, y_i)\}_{i=1}^{N}$, with $N$ the number of tested instances, where $c_i$ is the $i$-th tested instance (note that a single compound is tested under different concentrations), $d_i$ is the tested concentration for the instance, and $y_i$ is the corresponding outcome.

For the purposes of our work, we represent each compound (originally expressed as a SMILES string) as a molecular graph, which is a consolidated representation of atoms and the bonds between them. We consider a molecular graph as an undirected graph defined as the tuple $g = (\mathcal{V}, \mathcal{E}, \mathbf{X}, \mathbf{E})$. The set $\mathcal{V}$ contains the interacting entities, which in this case correspond to atoms, and $\mathcal{E}$ is the set that contains the links among those entities, i.e., chemical bonds. In turn, $\mathbf{X} \in \mathbb{R}^{|\mathcal{V}| \times |F|}$ is the atom features matrix, where $|F|$ is the number of available features for an atom. Analogously, $\mathbf{E} \in \mathbb{R}^{|\mathcal{E}| \times |E|}$ is the bond features matrix, where $|E|$ is the number of available chemical bond features. We refer to $x_i$ and $e_{ij}$ as the feature vector of atom $i$ and the feature vector of the bond connecting atoms $i$ and $j$, respectively. Also, we denote the neighborhood of a node $i$ as the set $\mathcal{N}_i = \{j \in \mathcal{V} \mid \{i, j\} \in \mathcal{E}\}$, which contains every node linked to node $i$.
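A molecular graph of this kind can be held in a very small data structure, sketched below. The class and field names are hypothetical, and the single-number atom and bond features in the usage note are placeholders rather than the features of Table 10.

```python
from dataclasses import dataclass


@dataclass
class MolecularGraph:
    """Undirected molecular graph with atom and bond feature vectors."""

    atom_features: list  # row i is the feature vector x_i of atom i
    bonds: dict          # maps frozenset({i, j}) to the bond feature vector e_ij

    def neighbors(self, i):
        """Return the sorted indices of the atoms bonded to atom i."""
        return sorted(j for bond in self.bonds if i in bond for j in bond if j != i)

    def bond_features(self, i, j):
        """Return e_ij; the order of i and j does not matter in an undirected graph."""
        return self.bonds[frozenset({i, j})]
```

For a three-atom chain such as `MolecularGraph([[6], [6], [8]], {frozenset({0, 1}): [1], frozenset({1, 2}): [1]})`, calling `neighbors(1)` returns `[0, 2]`.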


Model

We tackle the task of predicting glial phagocytic activity in brain tissue with an approach based on deep learning for graphs [27], given its power in encoding graph-based structures, e.g., molecules. Our model comprises the following two components:

  • a compound embedding module fcomp;
  • an output module fout.

From a high-level point of view, given the set of tested concentrations, a compound $c$ is processed as follows. First, the compound embedding module computes a representation vector $h_c$ from the molecular graph of the compound; this vector is then concatenated with the dose $d$ and passed to the output module. Hence, the final prediction is computed as

$\hat{y} = f_{out}(h_c \,\|\, d)$, with $h_c = f_{comp}(c)$,

where $\|$ denotes concatenation.

This process is summarized visually in Fig 5.

Fig 5. A high level overview of the proposed model.

Given the molecular graph of a compound $c$ and a dose $d$, the model computes a vectorial representation $h_c$ using the compound embedding module $f_{comp}$ and later computes the final result using the output module $f_{out}$ by leveraging the concatenation of $h_c$ and $d$.
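The two-stage structure can be sketched as a composition of two callables. This is only an illustration of the data flow: `f_comp` and `f_out` are passed in as arbitrary functions, and the toy modules in the usage note are placeholders, not the modules evaluated in the paper.

```python
def predict(compound, dose, f_comp, f_out):
    """Embed the compound, append the dose, and apply the output module."""
    h_c = f_comp(compound)              # compound embedding h_c
    features = list(h_c) + [dose]       # concatenation of h_c and the dose d
    return f_out(features)


def sum_atoms(atom_features):
    """Toy embedding module: element-wise sum of the atom feature vectors."""
    return [sum(column) for column in zip(*atom_features)]
```

With the toy embedding and an identity output module, `predict([[1, 2], [3, 4]], 5.56, sum_atoms, lambda v: v)` returns `[4, 6, 5.56]`, which makes the concatenation step visible.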

We considered three different compound embedding modules and two output modules, and we empirically compared their effectiveness. We implemented the output module as either a Multi-Layer Perceptron (MLP) or a Random Forest (RF) [28]. For the compound embedding module, we first leveraged the Extended-Connectivity Fingerprint (ECFP) [29] based on the Morgan algorithm [30]. ECFPs are static molecular fingerprints that exploit atom neighborhoods to represent molecules through a non-adaptive approach that does not take into consideration the predictive task at hand. A visual representation of this process is shown in Fig 6.

Fig 6. Fluorine’s influence in a circular fingerprint computation (radius equal to 3) of 1-Chloro-4-fluorobenzene.
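The neighborhood-growing idea behind circular fingerprints can be sketched as follows. This is a simplified, Morgan-style illustration and not the ECFP algorithm itself: it ignores bond orders, charges, and the duplicate-removal step of real implementations, and the function names, the SHA-256-based hash, and the 64-bit length are assumptions chosen for readability.

```python
import hashlib


def _hash(values):
    # Deterministic 32-bit identifier for a tuple of values.
    digest = hashlib.sha256(repr(values).encode("utf-8")).hexdigest()
    return int(digest[:8], 16)


def circular_fingerprint(atoms, adjacency, radius=2, n_bits=64):
    # Iteration 0: each atom identifier depends only on the atom symbol.
    ids = [_hash(("atom", symbol)) for symbol in atoms]
    seen = set(ids)
    for _ in range(radius):
        # Each atom absorbs the sorted identifiers of its neighbors, so after
        # r iterations an identifier summarizes the r-hop neighborhood.
        ids = [_hash((ids[i], tuple(sorted(ids[j] for j in adjacency[i]))))
               for i in range(len(atoms))]
        seen.update(ids)
    # Fold all collected identifiers into a fixed-length bit vector.
    on_bits = {identifier % n_bits for identifier in seen}
    return [1 if b in on_bits else 0 for b in range(n_bits)]


# 1-Chloro-4-fluorobenzene, heavy atoms only: ring carbons 0..5, F = 6, Cl = 7.
atoms = ["C"] * 6 + ["F", "Cl"]
adjacency = {0: [1, 5, 6], 1: [0, 2], 2: [1, 3], 3: [2, 4, 7],
             4: [3, 5], 5: [4, 0], 6: [0], 7: [3]}
fp = circular_fingerprint(atoms, adjacency, radius=3)

# The same molecule with the two halogens stored in the opposite order.
atoms_perm = ["C"] * 6 + ["Cl", "F"]
adjacency_perm = {0: [1, 5, 7], 1: [0, 2], 2: [1, 3], 3: [2, 4, 6],
                  4: [3, 5], 5: [4, 0], 6: [3], 7: [0]}
fp_perm = circular_fingerprint(atoms_perm, adjacency_perm, radius=3)
print(fp == fp_perm)   # the fingerprint does not depend on atom ordering
```

Nothing in this procedure is learned: the same molecule always yields the same bit vector regardless of the prediction target, which is what makes the fingerprint static.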

Differently from the ECFP method, which is static, the other two approaches considered for the compound embedding module are adaptive and generate compound embeddings that are specialized for the specific predictive problem. To this end, we consider deep learning solutions that can process the compound represented as a molecular graph.

The first adaptive approach that we consider is a linear embedding model, implemented through an MLP that computes a vectorial representation for each atom in the compound. These representations are then aggregated by an element-wise sum or mean, without taking into consideration any spatial information. The second approach leverages Deep Graph Networks (DGNs) [27], which learn a mapping function that compresses the complex relational information captured by a graph into an information-rich feature vector that reflects both the topological and the label information in the original graph. At a high level, given a molecular graph as input, a DGN computes a representation for each node through transformations that combine the previous representation of the node with the representations of its neighbors. Those transformations are often referred to as Graph Convolutional Layers (GCLs). Ultimately, node representations are aggregated to obtain a single embedding for the whole graph. An overview of this method is shown in Fig 7.

Fig 7. An example of a DGN applied to the molecular graph of 1-Chloro-4-fluorobenzene.

Given the molecular graph of a compound, each GCL computes the representation of each node in the graph as a transformation of the representations of its neighbors. In the end, node representations are aggregated to obtain a vectorial representation that reflects the original molecular graph.
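A single graph convolution followed by a graph-level readout can be sketched as below. The layer is a generic sum-aggregation variant chosen for readability; it is not the exact update rule of GraphSAGE, GAT, ECC, or Neural Graph Fingerprint, and the weight matrices `w_self` and `w_neigh` are hypothetical parameters that would normally be learned.

```python
def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def graph_conv_layer(h, adjacency, w_self, w_neigh):
    # h_i' = ReLU(W_self h_i + W_neigh * sum of h_j over the neighbors j of i)
    new_h = []
    for i, h_i in enumerate(h):
        dim = len(h_i)
        aggregated = [sum(h[j][k] for j in adjacency[i]) for k in range(dim)]
        own = matvec(w_self, h_i)
        neigh = matvec(w_neigh, aggregated)
        new_h.append([max(0.0, a + b) for a, b in zip(own, neigh)])
    return new_h


def readout(h):
    # Sum pooling: aggregate node representations into one graph embedding.
    return [sum(node[k] for node in h) for k in range(len(h[0]))]


# Toy three-atom chain 0 - 1 - 2 with two features per atom.
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}
identity = [[1.0, 0.0], [0.0, 1.0]]

h1 = graph_conv_layer(h0, adjacency, identity, identity)
embedding = readout(h1)
print(h1)         # [[1.0, 1.0], [2.0, 1.0], [1.0, 1.0]]
print(embedding)  # [4.0, 3.0]
```

Stacking several such layers lets each node representation depend on atoms further away in the molecule, since every layer extends the receptive field by one bond.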

We investigated different DGN implementations. More specifically, the DGNs that we consider are based on GraphSAGE [31], Graph Attention Network (GAT) [32], Edge-Conditioned Convolution Network (ECC) [33, 34], and Neural Graph Fingerprint [35]. An overview of such methods is reported in S1 Appendix.

For our purposes, we evaluated the performance of seven configurations of compound embedding module and output module:

  • Linear Embedding and Neural Network (LinNN)—used as a baseline for this task;
  • ECFP fingerprint based on Morgan algorithm and Random Forest (MoRF)—commonly used in literature for biomedical-based problems [10, 11, 36, 37];
  • ECFP fingerprint based on Morgan algorithm and Neural Network (MoNN);
  • GraphSAGE and Neural Network (SAGENN);
  • GAT Network and Neural Network (GaNN);
  • ECC Network and Neural Network (ENN);
  • Neural Graph Fingerprint and Neural Network (NeFPNN).

Experimental setting

We split the data into a development set (80%) for model selection and a test set (20%) for risk assessment. We treat the test set as a hold-out set, that is, a set of examples used only to estimate the generalization performance of the model and never used during the training phase. Within the development set, we used 3-fold cross-validation for model selection. We generated each split in a stratified fashion, so that each split maintains the distribution with respect to a target variable. Specifically, we implemented stratification according to two strategies. The former splits the data by maintaining the distribution with respect to the target y, while the latter maintains the distribution with respect to the target y, the number of atoms in the compounds, and the concentrations. The rationale behind the latter strategy is to generate more homogeneous data splits, avoiding unbalanced distributions of molecule size and concentrations in the training and validation splits. We refer to the first strategy as simple cross-validation and to the second as complex cross-validation.
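The stratified 80/20 split can be illustrated with a short standard-library sketch. The function name `stratified_split` and the toy label counts are assumptions; the stratum key is simply the target for the simple strategy, whereas the complex strategy would use a tuple such as (target, atom-count bin, concentration), where the binning of atom counts is a choice not specified here.

```python
import random
from collections import defaultdict


def stratified_split(keys, test_fraction=0.2, seed=0):
    # Split indices so that every stratum keeps roughly the same proportion
    # in the development and test parts.
    rng = random.Random(seed)
    by_key = defaultdict(list)
    for index, key in enumerate(keys):
        by_key[key].append(index)
    development, test = [], []
    for key in sorted(by_key):                 # sorted for reproducibility
        members = by_key[key]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test.extend(members[:n_test])
        development.extend(members[n_test:])
    return sorted(development), sorted(test)


# Toy targets that mimic an imbalanced three-class problem.
labels = ["no change"] * 80 + ["decrease"] * 10 + ["increase"] * 10
dev_idx, test_idx = stratified_split(labels)
print(len(dev_idx), len(test_idx))             # 80 20
```

The same function can be applied again inside the development set to build the cross-validation folds, so that every fold preserves the class proportions.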

The three classes are not balanced, with the no change category representing roughly 81% of samples. To prevent the majority class from dominating training, we undersampled it: at each epoch, we randomly sampled from the no change instances to generate a subset with the same size as the other classes. The rationale behind this approach is to keep the classes balanced at all times while still leveraging all the available data, since a different sub-sample is extracted at each epoch. We recall that Random Forests are more resistant to data imbalance; therefore, we did not implement undersampling for RF-based models.
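The per-epoch undersampling can be sketched as follows. The function name `undersample_epoch` is hypothetical, and because the two minority classes need not have the same size, the sketch assumes that the majority subset is drawn to match the larger of the two; that sizing rule is an assumption, not a detail stated in the text.

```python
import random
from collections import defaultdict


def undersample_epoch(labels, majority="no change", rng=None):
    # Return the training indices for one epoch: all minority instances plus
    # a fresh random subset of the majority class.
    rng = rng or random.Random()
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    minority_sizes = [len(v) for k, v in by_class.items() if k != majority]
    # Assumption: match the size of the largest minority class.
    target = min(max(minority_sizes), len(by_class[majority]))
    chosen = rng.sample(by_class[majority], target)
    for label, members in by_class.items():
        if label != majority:
            chosen.extend(members)
    rng.shuffle(chosen)
    return chosen


labels = ["no change"] * 81 + ["decrease"] * 10 + ["increase"] * 9
rng = random.Random(0)
for epoch in range(3):
    epoch_indices = undersample_epoch(labels, rng=rng)
    n_majority = sum(1 for i in epoch_indices if labels[i] == "no change")
    print(epoch, len(epoch_indices), n_majority)   # 29 indices, 10 from the majority
```

Because a new subset is drawn on every call, the majority instances seen by the model change from epoch to epoch, while each individual epoch stays approximately balanced.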

Our experiments can be summarized as follows. At first, we conduct the model selection and the risk assessment phases. The rationale behind these steps is, first, to select the best hyper-parameters for the models among a set of candidates, and then to evaluate their generalization capability on a different set of data. Lastly, we use the best model, selected from the previous stage, in a real-world drug repurposing scenario with the aim of understanding its potential in prioritizing the selection of compounds to be tested in a specific biomedical experiment.

We performed hyper-parameter tuning via grid search, optimizing the macro-averaged AUROC, which is a suitable estimate of classification performance since the classes are imbalanced. We recall that the macro-AUROC is defined as $\text{macro-AUROC} = \frac{1}{c} \sum_{i=1}^{c} \text{AUROC}_i$, where c is the number of classes (in this work c = 3) and $\text{AUROC}_i$ is the metric computed for class i. The grids used in our experiments are reported in S2 Appendix. We trained models with $f_{out}$ = MLP to minimize the Cross-Entropy loss accumulated across all the instances in the dataset.
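The macro-averaged AUROC can be computed with a few lines of plain Python, as in the sketch below. It assumes a one-vs-rest reading of the per-class AUROC and uses the rank interpretation of the metric (the probability that a random positive is scored above a random negative, with ties counting one half); the score matrix and labels are made-up toy values.

```python
def auroc(scores, positives):
    # Probability that a random positive outranks a random negative;
    # ties contribute one half.
    pos = [s for s, p in zip(scores, positives) if p]
    neg = [s for s, p in zip(scores, positives) if not p]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(1.0 if sp > sn else 0.5 if sp == sn else 0.0
               for sp in pos for sn in neg)
    return wins / (len(pos) * len(neg))


def macro_auroc(score_matrix, labels, n_classes=3):
    # Unweighted mean of the one-vs-rest AUROC of each class.
    per_class = [auroc([row[k] for row in score_matrix],
                       [y == k for y in labels])
                 for k in range(n_classes)]
    return sum(per_class) / n_classes


# Toy predictions: one row of class scores per instance.
score_matrix = [[0.7, 0.2, 0.1],
                [0.2, 0.6, 0.2],
                [0.1, 0.3, 0.6],
                [0.5, 0.3, 0.2]]
labels = [0, 1, 2, 1]
print(macro_auroc(score_matrix, labels))   # (1.0 + 0.875 + 1.0) / 3
```

Since every class contributes equally to the average, a model cannot obtain a high score simply by performing well on the most frequent class.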

The experiments were carried out on a Dell server with 4 NVIDIA Tesla P100 GPUs. We openly release the code implementing our methodology and reproducing our empirical analysis at:


This work investigates the benefits of ML for graphs to predict compounds that mitigate abnormal brain reduction induced by excessive glial phagocytic activity in people affected by schizophrenia. In this context, we designed a model that is able to recognize whether a compound can reduce, increase, or not influence phagocytosis. More precisely, the model takes as input a compound and a concentration to predict a score associated with the three possible compound effects. This allows us to anticipate compounds with potentially desirable clinical effects for patients with schizophrenia. Internally, the model leverages a static fingerprint (i.e., Morgan-based ECFP) or an adaptive fingerprint (i.e., DGNs) to represent compounds. We have shown experimentally that our approach is effective and has good generalization capabilities. Indeed, we have found that the model can generalize its predictions when employed on an unseen library, identifying as potentially beneficial compounds those already used to treat brain-related diseases. Lastly, we have presented a list of compounds that we believe have the most potential clinical utility against glial-mediated brain reduction in schizophrenia patients.

We tested multiple chemical representations and found that an adaptive approach was sufficient for describing phenotypic screen effects. A static fingerprint was insufficient, yet including full bond information decreased model performance. This suggests that, in some cases, structure-function information requires knowledge of the atoms and their arrangements, but not the full detail of their connections. It may be advantageous for other rapid drug development programs to leverage this information to predict compounds with desired effects more efficiently. We were eager to test our model on new drug information and validate its utility; however, experimental validation was infeasible for this work. Instead, we tested a novel validation approach that uses SciFinder to assess the relevance of compounds to biological use cases. Indeed, SciFinder recovered drug-phenotype associations that mapped to neurological terms, suggesting that emerging literature evidence supports the potential utility of model-predicted compounds. We anticipate that this approach can be used to further prioritize predicted compounds and minimize the need for very large follow-up validation screens, and that other structure-function studies using machine learning will benefit from our in silico validation approach.

Supporting information

S1 Appendix. Overview of the employed dynamic compound embedding modules based on DGNs.



This work has been partially supported by SPARK at Stanford University. The authors would like to thank SPARK members and Francesco Landolfi, University of Pisa, for the insightful discussions throughout the development of this work.


  1. Schizophrenia Symptoms, Patterns and Statistics and Patterns. Available from:
  2. Haijma S, Haren NEM, Cahn W, Koolschijn PC, Pol H, Kahn R. Brain Volumes in Schizophrenia: A Meta-Analysis in Over 18 000 Subjects. Schizophrenia bulletin. 2012;39. pmid:23042112
  3. MacDonald M, Alhassan J, Newman J, Richard M, Gu H, Kelly R, et al. Selective Loss of Smaller Spines in Schizophrenia. The American journal of psychiatry. 2017;174:appiajp201716070814.
  4. Sellgren C, Gracias J, Watmuff B, Biag J, Thanos J, Whittredge P, et al. Increased synapse elimination by microglia in schizophrenia patient-derived models of synaptic pruning. Nature Neuroscience. 2019. pmid:30718903
  5. Lee E, Chung WS. Glial Control of Synapse Number in Healthy and Diseased Brain. Frontiers in Cellular Neuroscience. 2019;13:42. pmid:30814931
  6. Onwordi EC, Halff EF, Whitehurst T, Mansur A, Cotel MC, Wells L, et al. Synaptic density marker SV2A is reduced in schizophrenia patients and unaffected by antipsychotics in rats. Nature Communications. 2020;11(1). pmid:31937764
  7. Abramenko N, Kustov L, Metelytsia L, Kovalishyn V, Tetko I, Peijnenburg W. A review of recent advances towards the development of QSAR models for toxicity assessment of ionic liquids. Journal of Hazardous Materials. 2020;384:121429. pmid:31732345
  8. Bianucci AM, Micheli A, Sperduti A, Starita A. In: Cartwright HM, Sztandera LM, editors. A Novel Approach to QSPR/QSAR Based on Neural Networks for Structures. Berlin, Heidelberg: Springer Berlin Heidelberg; 2003. p. 265–296. Available from:
  9. Bianucci AM, Micheli A, Sperduti A, Starita A. Application of Cascade Correlation Networks for Structures to Chemistry. Applied Intelligence. 2000;12(1):117–147.
  10. Banerjee P, Preissner R. BitterSweetForest: A Random Forest Based Binary Classifier to Predict Bitterness and Sweetness of Chemical Compounds. Frontiers in Chemistry. 2018;6. pmid:29696137
  11. Lind AP, Anderson PC. Predicting drug activity against cancer cells by random forest models based on minimal genomic information and chemical properties. PLOS ONE. 2019;14(7):1–20. pmid:31295321
  12. Zhao K, So HC. Drug Repositioning for Schizophrenia and Depression/Anxiety Disorders: A Machine Learning Approach Leveraging Expression Data. IEEE Journal of Biomedical and Health Informatics. 2019;23(3):1304–1315. pmid:30010603
  13. Xu R, Wang Q. PhenoPredict: A disease phenome-wide drug repositioning approach towards schizophrenia drug discovery. Journal of Biomedical Informatics. 2015;56:348–355. pmid:26151312
  14. SPARK at Stanford.
  15. Kim E, Omura P, Lo A. Accelerating biomedical innovation: A case study of the SPARK program at Stanford University, School of Medicine. Drug Discovery Today. 2017. pmid:28456750
  16. Novick PA, Ortiz OF, Poelman J, Abdulhay AY, Pande VS. SWEETLEAD: an In Silico Database of Approved Drugs, Regulated Chemicals, and Herbal Isolates for Computer-Aided Drug Discovery. PLOS ONE. 2013;8(11). pmid:24223973
  17. Wishart DS, Knox C, Guo AC, Cheng D, Shrivastava S, Tzur D, et al. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Research. 2007;36(suppl_1):D901–D906. pmid:18048412
  18. Ishida J, Konishi M, Ebner N, Springer J. Repurposing of approved cardiovascular drugs. Journal of Translational Medicine. 2016;14(1):269. pmid:27646033
  19. CAS SciFinder. Available from:
  20. Liu F, Guo X, Wu R, Ou J, Zheng Y, Zhang B, et al. Minocycline supplementation for treatment of negative symptoms in early-phase schizophrenia: A double blind, randomized, controlled trial. Schizophrenia Research. 2014;153(1-3):169–176. pmid:24503176
  21. Galland L. The gut microbiome and the brain. Journal of medicinal food. 2014;17(12):1261–1272. pmid:25402818
  22. Hill-Burns EM, Debelius JW, Morton JT, Wissemann WT, Lewis MR, Wallen ZD, et al. Parkinson’s disease and Parkinson’s disease medications have distinct signatures of the gut microbiome. Movement Disorders. 2017;32(5):739–749. pmid:28195358
  23. So HC, Chau CKL, Chiu WT, Ho KS, Lo CP, Yim SHY, et al. Analysis of genome-wide association data highlights candidates for drug repositioning in psychiatry. Nature Neuroscience. 2017;20(10):1342–1349. pmid:28805813
  24. Chung WS, Clarke L, Wang G, Stafford B, Sher A, Chakraborty C, et al. Astrocytes mediate synapse elimination through MEGF10 and MERTK pathways. Nature. 2013;504. pmid:24270812
  25. LOPAC1280—The Library of Pharmacologically Active Compounds. Available from:
  26. NIH Clinical Collection. Available from:
  27. Bacciu D, Errica F, Micheli A, Podda M. A gentle introduction to deep learning for graphs. Neural Networks. 2020;129:203–221. pmid:32559609
  28. Breiman L. Random Forests. Machine Learning. 2001;45(1):5–32.
  29. Rogers D, Hahn M. Extended-Connectivity Fingerprints. Journal of chemical information and modeling. 2010;50:742–54. pmid:20426451
  30. Morgan HL. The Generation of a Unique Machine Description for Chemical Structures-A Technique Developed at Chemical Abstracts Service. Journal of Chemical Documentation. 1965;5(2):107–113.
  31. Hamilton WL, Ying R, Leskovec J. Inductive Representation Learning on Large Graphs. In: NIPS; 2017.
  32. Veličković P, Cucurull G, Casanova A, Romero A, Liò P, Bengio Y. Graph Attention Networks. International Conference on Learning Representations. 2018.
  33. Simonovsky M, Komodakis N. Dynamic Edge-Conditioned Filters in Convolutional Neural Networks on Graphs; 2017. p. 29–38.
  34. Gilmer J, Schoenholz SS, Riley PF, Vinyals O, Dahl GE. Neural Message Passing for Quantum Chemistry. In: Precup D, Teh YW, editors. Proceedings of the 34th International Conference on Machine Learning. vol. 70 of Proceedings of Machine Learning Research. PMLR; 2017. p. 1263–1272. Available from:
  35. Duvenaud DK, Maclaurin D, Iparraguirre J, Bombarell R, Hirzel T, Aspuru-Guzik A, et al. Convolutional Networks on Graphs for Learning Molecular Fingerprints. In: Cortes C, Lawrence ND, Lee DD, Sugiyama M, Garnett R, editors. Advances in Neural Information Processing Systems 28. Curran Associates, Inc.; 2015. p. 2224–2232.
  36. Liu S, Alnammi M, Ericksen SS, Voter AF, Ananiev GE, Keck JL, et al. Practical Model Selection for Prospective Virtual Screening. Journal of Chemical Information and Modeling. 2019;59(1):282–293. pmid:30500183
  37. Kapsiani S, Howlin BJ. Random forest classification for predicting lifespan-extending chemical compounds. Scientific Reports. 2021;11(1):13812. pmid:34226569