Identification of STAT1 and STAT3 Specific Inhibitors Using Comparative Virtual Screening and Docking Validation

Signal transducers and activators of transcription (STATs) facilitate action of cytokines, growth factors and pathogens. STAT activation is mediated by a highly conserved SH2 domain, which interacts with phosphotyrosine motifs for specific STAT-receptor contacts and STAT dimerization. The active dimers induce gene transcription in the nucleus by binding to a specific DNA-response element in the promoter of target genes. Abnormal activation of STAT signaling pathways is implicated in many human diseases, like cancer, inflammation and auto-immunity. Searches for STAT-targeting compounds, exploring the phosphotyrosine (pTyr)-SH2 interaction site, yielded many small molecules for STAT3 but sparsely for other STATs. However, many of these inhibitors seem not STAT3-specific, thereby questioning the present modeling and selection strategies of SH2 domain-based STAT inhibitors. We generated new 3D structure models for all human (h)STATs and developed a comparative in silico docking strategy to obtain further insight into STAT-SH2 cross-binding specificity of a selection of previously identified STAT3 inhibitors. Indeed, by primarily targeting the highly conserved pTyr-SH2 binding pocket the majority of these compounds exhibited similar binding affinity and tendency scores for all STATs. By comparative screening of a natural product library we provided initial proof for the possibility to identify STAT1 as well as STAT3-specific inhibitors, introducing the ‘STAT-comparative binding affinity value’ and ‘ligand binding pose variation’ as selection criteria. In silico screening of a multi-million clean leads (CL) compound library for binding of all STATs, likewise identified potential specific inhibitors for STAT1 and STAT3 after docking validation. Based on comparative virtual screening and docking validation, we developed a novel STAT inhibitor screening tool that allows identification of specific STAT1 and STAT3 inhibitory compounds. This could increase our understanding of the functional role of these STATs in different diseases and benefit the clinical need for more drugable STAT inhibitors with high specificity, potency and excellent bioavailability.


Introduction
Cytokines and growth factors are the main tool of the organism to battle any kind of immune challenge like inflammation or cancer. Signal transducers and activators of transcription (STATs) are targets for activation by many of these signals, including interferons (IFNs), interleukins (ILs) and growth factors like EGF (Epidermal Growth Factor) and PDGF (Platelet-Derived Growth Factor). Also oncoproteins ABL (Abelson murine leukemia viral oncogene homolog) and Src are STAT activators. The STAT family is composed of seven members: STAT1, STAT2, STAT3, STAT4, STAT5A, STAT5B and STAT6. Structurally they share five domains, which are an amino-terminal domain, a coiled-coil domain, a DNA-binding domain, a SH2 (Src Homology 2) domain and a carboxyl-terminal transactivation domain [1]. STAT activation is mediated by a highly conserved SH2 domain, which interacts with phosphotyrosine (pTyr) motifs for specific STAT-receptor contacts and STAT dimerization. The active dimers induce gene transcription in the nucleus by binding to a specific DNA-response element in the promoter of target genes [2]. STAT proteins promote fundamental cellular processes, including cell growth and differentiation, development, apoptosis, immune responses and inflammation. STATs are convergence points of many oncogenic and inflammatory pathways, therefore, the abnormal activation of STAT signaling pathways is also implicated in many human diseases. Especially STAT1 and STAT3 display prominent roles in cancer, inflammation and auto-immunity. STAT1 is responsible for cell growth and apoptosis, T H 1 cell-specific cytokine production and antimicrobial defense. It plays tumor-suppresive function and has pro-atherogenic properties. Atypical STAT1 activation leads to cardiovascular diseases like atherosclerosis, whereas STAT1 deficiency is responsible for causing infections and immune disorders. STAT3 function is essential for early embryonic development, cell proliferation and survival, inflammation and immune response, as well as cell motility. STAT3 function is often aberrant in the context of cancer. Constitutively active STAT3 is detected in numerous malignancies, including breast, melanoma, prostate, head and neck squamous cell carcinoma (HNSCC), multiple myeloma, pancreatic, ovarian, and brain tumours. There is growing evidence that preternatural functioning of other STATs also leads to immune disorders and infections (STAT2), autoimmune diseases like lupus (STAT4), chronic myelogenous leucaemia (STAT5A and STAT5B), as well as astma and allergy (STAT6). STAT inhibitors therefore could be valuable in treatment of these diseases [3][4][5][6].
Various STAT inhibitory strategies are being pursued, particularly for STAT3, including disruption of dimerization, tyrosine kinase STAT-competitive inhibitors, decoy deoxyrybonucleotides blocking STAT-DNA binding, induction of protein tyrosine phosphatases which dephosphorylate STATs and antisense oligonucleotides targeting STAT-mRNAs. Amid these approaches most studies focus on inhibiting STAT dimerization using small molecules identified by molecular modeling, virtual screening, computer-aided drug design, organometallic compounds or natural products [7][8][9][10]. According to the crystal structure of murine STAT3β, pTyr705, localized at the border of SH2 and transactivation domain, in one STAT3 monomer binds to the SH2 domain of the other [11]. Moreover, the SH2 domain comprises of several sub-pockets that can be targeted by small-molecule inhibitors, including: (1) pTyr705-binding pocket or pY+0, and (2) a hydrophobic side-pocket or pY-X [12]. Since dimerization via reciprocal phosphotyrosine-SH2 interactions is a key event in the activation of STATs, manipulations disrupting the dimer formation, such as use of small molecules, render the protein incapable of forming dimers, binding DNA and inducing gene transcription [13]. Disruption of e.g. STAT3 dimer formation therefore provides an effective therapeutic approach in cancer by blocking its aberrant signaling hyperactivity and pro-oncogenic effects [14].
Searches for STAT3-targeting compounds, exploring the pTyr-SH2 interaction area of STAT3, are numerous and yielded many small molecules. For example, STA-21 discovered by structure-based virtual screening was one of the first reported small inhibitors. It inhibits STAT3 dimerization, DNA binding, and STAT3-dependent transcription in breast cancer cells [15]. Another small molecule, stattic, was discovered by high-throughput screening and has been shown to selectively inhibit activation, dimerization, nuclear translocation of STAT3, and to increase apoptosis in STAT3-dependent cancer cell lines [16]. Among all the reported nonpeptidomimetic small inhibitors, 5-hydroxy-9,10-dioxo-9,10-dihydroanthracene-1-sulfonamide (LLL12) has the lowest IC50 (0.16−3.09 μM), inhibiting STAT3 phosphorylation and the growth of human cancer cells [17]. Natural products have been an important resource in STAT3 inhibitor discovery and these efforts have yielded several lead candidates, including curcumin and resveratrol [18,19]. In many of these cases, however, the mechanism of action of these candidates with regard to STAT3 activity is unclear. It is possible that they inhibit STAT3 indirectly and are likely to block several targets [10].
However, it becomes clear that many of these inhibitors are not STAT3-specific, thereby questioning the present selection strategies of SH2 domain-based competitive small inhibitors for STAT3 and other STATs. The virtual screening approaches are mostly based on the limited available crystallographic data from STAT1 and STAT3 dimers. Therefore, no valid comparative information exists about differences and commonalities between STAT-SH2 domains and their detailed interactions with small compound inhibitors. Indeed, comparative information concerning pY+0 and pY-X in the SH2 domain of all STATs is lacking and cross-binding specificity of previously identified STAT3 inhibitors has not been properly checked. Together, this illustrates the need for better models and screening and docking validation tools that allow the identification of STAT-specific inhibitors.
Here, we generated 3D structure models for all human (h)STATs (1, 2, 3, 4, 5A, 5B and 6) and developed a STAT inhibitor screening method, based on comparative in silico virtual screening and docking validation, to obtain further insight into STAT-SH2 cross-binding specificity of a selection of previously identified STAT3 inhibitors. The standard selection criteria of these compounds were, confirmed in vitro, disruption of the phosphotyrosine-SH2 interactions and proven STAT cross-binding. By comparative virtual screening of a natural compound and clean leads library for binding of all STATs and introducing the 'STATcomparative binding affinity value' (STAT-CBAV) and 'ligand binding pose variation' (LBPV) parameter as selection criteria, we provide initial proof that this novel in silico screening method enables selection of STAT3 as well as STAT1-specific inhibitors.

Sequence analysis
First, searches of the reference version of current sequence database (refseq_protein) for vertebrates were carried out at the NCBI using BLASTp [20] with default parameters and e-value threshold of 1e-3. Full-length sequences of H. sapiens STAT1 isoform alpha, STAT2 isoform 1, STAT3 isoform 1, STAT4, STAT5A, STAT5B and STAT6 isoform 1 (NCBI gene identification numbers 6274552, 4885615, 21618340, 4507255, 21618342, 21618344 and 23397678, respectively) were used as queries. The multiple sequence alignment (MSA) of human (h)STATs and homologous proteins identified in the database was calculated using MUSCLE [21], with default parameters, and refined manually to ensure that no unwarranted gaps had been introduced within α-helices and β-strands. On the basis of the alignment, phylogenetic tree was calculated with MEGA 6 [22] employing the Neighbor Joining method with JTT model of substitutions and pairwise deletions. The stability of individual nodes was calculated by the bootstrap test (1000 replicates).
In case of STAT1 protein we used comparative analysis to build a 3D structure of maximal length with a flexible linker and selected two templates for modeling-1YVL [28] (crystal structure of unphosphorylated STAT1 monomer) and 1BF5 [29] (crystal structure of a tyrosine phosphorylated STAT1 dimer). As suggested by these fold-recognition methods we used the same crystal structures for modeling of STAT2 protein. For STAT3 protein we took information from the crystal structure of the STAT3β homodimer bound to DNA [11] (PDB Id: 1BG1) and for missing N-terminal domain, 1YVL and 1BF5 crystal structures of STAT1. Similar procedure was applied for modeling of STAT4, where we exerted the crystal structure of the amino-terminal protein interaction domain of STAT4 [29] (PDB Id: 1BGF) and for missing regions the crystal structures of 1YVL and 1BF5 were used. To obtain full-length models of STAT5A, 5B and 6 we employed a crystal structure of the unphosphorylated STAT5A dimer [30] (PDB Id: 1Y1U) together with 1YVL.
hSTAT models refinement All preliminary models were assessed with MetaMQAP [34] to predict their accuracy at the level of individual residues. Hybrid models were then refined in poorly scored regions (mainly loop regions) with REFINER (with restraints on remaining predicted secondary structure) [35] and SuperLooper [36] programs. After refinement, homology models of hSTATs combined with their linkers underwent a two-step energy minimization process in AMBER force field in HyperChem software. First, steepest descent algorithm to the RMS gradient value 1.0 kcal/ (mol x Å) was used and secondly Polak-Ribiere conjugate gradient algorithm to the RMS gradient value 0.1 kcal/(mol x Å). Finally Chiron Server [37] was used to perform a rapid equilibration of all human STAT models using discrete molecular dynamics with an all-atom representation for each residue in the protein.
Final models were again evaluated with MetaMQAP and PROQ [38] methods to assess quality improvements. Thus, our structural predictions are highly accurate and can be used as receptors in docking simulations [39][40][41][42]. Models and their features were visualized with PyMOL [43]. Mapping of the electrostatic potential on protein surfaces was calculated with adaptive Poisson-Boltzmann solver (APBS) [44]. Moreover, employing results from MSA and phylogenetic relations the evolutionary conservation of amino acid positions in hSTATs was estimated with ConSurf server [45].

Comparative docking of STAT3 inhibitors
First, the AMBER ff99SB charges were applied to all models of human STAT proteins and STAT3 inhibitory compounds optimized by M05-2X/6-31G(d,p). In comparative docking procedure on the level of protein structures we selected the highly conserved pTyr binding pocket (pY+0) and hydrophobic side-pocket (pY-X) on the surface of SH2 domain. Then ligandbased approach was used to generate a 'protomol'-molecular probe which is a 3D representation of the active site to which ligands are matched. In case of STAT proteins the ligand used to generate a 'protomol' was fragment of STAT-SH2 specific pTyr-linker matching to the selected sub-pockets: for STAT1 (GpY 701 IK), STAT2 (KpY 690 LK), STAT3 (PpY 705 LK), STAT4 (GpY 693 VP), STAT5A (GpY 694 VK), STAT5B (GpY 699 VK) and STAT6 (GpY 641 VP). Docking simulations for all STATs were carried out with Surflex-Dock 2.6 software [57] based on the pgeomx algorithm recommended for detailed studies of relative alignments. Exhaustive and time-consuming docking accuracy parameter set for optimal pose prediction of the compounds was used (density of search for specific spin alignment method set to 0.9, pre-dock and post-dock minimization, six additional starting conformations, minimum RMSD of 0.05 between poses, max 20 poses/ligand). As a result we obtained twenty binding poses of each structure in the predefined area of STAT-SH2 domain. Then, the best of twenty binding poses for each compound were filtered out for further analysis. By using the 'STAT3-comparative binding affinity value' (STAT3-CBAV) the binding quality between hSTAT3 and all the other STATs was compared for each compound. Finally, graphical presentation, measured by 'ligand binding pose variation' (LBPV) parameter of the results for STAT3-specific inhibitors was also validated. It was used to investigate the docking accuracy. LBPV described the ratio of conformers that had the binding position with RMSD < 0.5Å to the top scored one to all 20 output conformations obtained from docking. This parameter was adapted to the variety of inhibitory binding possibilities: LBPV 0 -conformational tendency towards pY+0, LBPV X -preference to fit in pY-X, LBPV 0+X -binding to both cavities simultaneously.

Comparative virtual screening of small compound libraries
First, the AMBER ff99SB charges were applied to all models of human STAT proteins. Small compounds from ZINC libraries were downloaded with ready-to-dock parameters of protonation state and partial atomic charges [58]. In virtual screening the same ligand-based approach, as in comparative docking, was used to generate a 'protomol'. A five-step virtual screening procedure was employed to select the top STAT1 and STAT3 inhibitors. It was divided in the following steps: 1. Pre-screen Docking simulations were carried out with Surflex-Dock 2.6 software [57] based on the pscreen algorithm recommended for large databases with fast screening parameter set (predock minimization, post-dock minimization, max 3 poses/ligand). As a result we obtained three binding poses of each compound in the predefined area of STAT-SH2 domain. Additionally, each binding pose was supplied with the total score value representing the binding affinity of the compound to the SH2 domain.
2. Primary filtering of inhibitors At first, the best of three binding poses for each compound were filtered out for further analysis. Then by using the STAT-CBAV the binding quality between different STATs was compared for each compound. Compounds with CBAV (for the STAT protein of interest) 3.0 were selected to re-screen (approx. 2000-3000 structures). Compounds with STAT-CBAV < 3.0 were removed from further analysis.
3. Re-screen Repeated docking simulations were carried out with Surflex-Dock 2.6 software based on the pgeomx algorithm recommended for detailed studies of relative alignments. More exhaustive and time-consuming docking accuracy parameter set for optimal pose prediction of the compounds from primary filtering was used (density of search for specific spin alignment method set to 0.9, pre-dock and post-dock minimization, six additional starting conformations, minimum RMSD of 0.05 between poses, max 20 poses/ligand). As a result we obtained twenty binding poses of each compound in the predefined area of STAT-SH2 domain.
4. Secondary filtering of inhibitors At first, the best of twenty binding poses for each compound were filtered out for further analysis. Then by using STAT-CBAV the binding quality between different STATs was compared for each compound. Compounds with CBAV (for the STAT protein of interest) 3.0 were selected to graphical validation (approx. 50-100 structures).

Binding diversity of conformers
Finally, graphical presentation, measured by LBPV parameter of the results for top 50-100 compounds was also validated. LBPV in range [0.8;1.0] represented low conformer diversity and very good binding specificity of the compound to STAT-SH2, whereas in range [0.0;0.2] denoted high conformer diversity and poor binding specificity.
Our tool combines CBAV ( 3.0) and LBPV ( 0.8 for STAT-specific and 0.2 STATnon-specific) criteria to select the most specific STAT-SH2 targeting compounds.

Homology modeling of human STAT monomers with their specific pTyrlinkers
To generate new 3D structure models for all hSTATs, we applied several methodologies. First, the most reliable structure and position of the hSTAT1, hSTAT2, hSTAT3, hSTAT4, hSTAT5A, hSTAT5B and hSTAT6 SH2 domain and pTyr-linker interaction site was determined, with respect to their complete protein structure. We performed multiple sequence alignment (MSA) of STAT sequences for vertebrates and phylogenetic analysis, presented in Fig. 1A in form of a simplified phylogenetic tree. Fig. 1A revealed that the STAT family could be subdivided into two groups. A large group of vertebrate sequences was represented by Although crystal structures of human STAT1, murine STAT3 and STAT5A are available in the literature [11,[28][29][30], the derived amino acid sequences of these crystal structures are not complete. Therefore, we decided to build our own models of all human STATs (see Methods and S1 Fig.). We built complete models of hSTAT1, by creating a hybrid structure of the 1YVL [28] and 1BF5 [29]; of hSTAT3-based on M. musculus 1BG1 [11] and H. sapiens 1YVL [28] and of hSTAT5A-based on M. musculus 1Y1U [30] and H. sapiens 1YVL [28] (Fig. 1B and  1C). Because the crystal structures of hSTAT2, hSTAT4, hSTAT5B and hSTAT6 have not been solved to date, a homology model of the H. sapiens counterpart of these STAT proteins as well as their corresponding pTyr-peptide fragment were built (see Methods) and we are the first to present them ( Fig. 1B and 1C). S1 Table demonstrates scores of the chosen template structures for hSTATs, calculated by methods of protein homology detection, which are implemented at GeneSilico Metaserver. Based on the MetaMQAP and PROQ, assessment scores varied between 2.6Å and 3.0Å for RMSD and from 3.292 to 4.522 for LGscore ( Table 1).
Graphical presentation of the SH2 domain for individual STATs, predicted the presence of the pY+0, pTyr-binding pocket and the hydrophobic side-pocket, pY-X [59] (Fig. 1C and S2  Fig.). Active residues ('hot-spots') identified by Park and Li [12] in hSTAT3-SH2, including Lys, Arg, Ser, Glu, Ser, appeared also present in the SH2 domain of other hSTATs (S2 Table). Structural superimposition of the pY+0 cavities for all hSTAT models [59] revealed that these 'hot spots' contain mostly conserved amino acids, with some group substitutions present in hSTAT2 (Arg instead of Lys) and STAT5A, STAT5B and STAT6 (Asp instead of Glu). Structural superimposition of the pY-X cavities for all hSTAT models [59] also showed high conservation of these residues (S2 Table), with group substitutions in hSTAT2 (Val instead of Ile), hSTAT4 (Val instead of Ile), hSTAT5A (Leu instead of Met, Val instead of Ile, Asn instead of Ser), hSTAT5B (Leu instead of Met, Val instead of Ile, Asn instead of Ser) and hSTAT6 (Ile instead of Met).
We also compared the arrangement of electrostatic potential of the protein surface among all hSTAT-SH2 models ( Fig. 1C and S2 Fig.). Red color indicates negatively charged regions, determined by the presence of amino acids with negatively charged side chains, like Glu or Asp in pY+0. Blue color indicates positively charged regions with presence of positive amino acids e.g. Arg or Lys in pY+0. White color determines neutral regions, where amino acids have no electrostatic charge (e.g. hydrophobic Met, Phe, Gly and Val in pY-X). Comparing the electrostatic potential of the hSTAT-SH2 domains, combined with MSA [59], implies that pY+0 is positively charged in all hSTATs ( Fig. 1C and S2 Fig.), and predicts that minor differences exist in amino acid composition. The pY-X region on the other hand is more divergent between different STATs. It is mostly neutral in case of hSTAT1; positively charged in case of hSTAT2, hSTAT4, hSTAT5A, hSTAT6; positive-negative in hSTAT3; positive-neutral in STAT5B. Together, differences in electrostatic potential of hSTAT-SH2 domains reflect minor differences in both pY+0 and pY-X cavities, which correspond to the evolutionary divergence of pTyrlinkers [59]. According to this homology analysis we also predicted the interaction sites between SH2 and pTyr linkers for all STATs (Fig. 1C and S2 Fig.). Based on this, we decided to use both binding pockets (pY+0 and pY-X) of all hSTATs as 'protomols' for subsequent virtual screening methods.
Model validation by comparative docking of stattic to the SH2 domain of all hSTATs In order to verify our hSTAT models we decided to first examine the questionable binding specificity of the known hSTAT3 inhibitor stattic for the hSTAT3-SH2 domain [16], by using a comparative docking strategy (chemical structure of stattic is displayed in S3 Fig.). First, docking simulation of stattic in all hSTAT-SH2 domains, using pgeomx algorithm, resulted in a list of 20 optimized conformations (see Methods), with corresponding binding affinity score values for each individual hSTAT (not shown). Table 2 shows the top binding affinity scores of stattic for each STAT, ranging between 2.90 (for STAT4) and 4.65 (for STAT2). As becomes clear from the calculated 'STAT3-comparative binding affinity value' (STAT3-CBAV), stattic exhibits a similar binding affinity to the SH2 domain of all STATs. STAT3-CBAVs for each STAT were lower than one, suggesting hSTAT-SH2 cross-binding (Table 2). Strikingly the STAT3-CBAV for hSTAT1 is close to zero, reflecting high conservation between these two STATsthey share 50% of global amino acid sequence homology, according to pairwise sequence identity analysis [60], see also S4 Fig. This was also confirmed by evolutionary conservation analysis of amino acid positions in hSTAT1 and hSTAT3 monomers, based on the MSA and phylogenetic relations using ConSurf [45]. Especially high conservation can be noticed in SH2 domain and DNA binding domain (S5 Fig.), indicated in dark purple. In case of SH2 domain the most conserved place is the interaction site between phosphotyrosine and pY+0 binding pocket, also indicated in dark purple.
In addition, we determined the 'ligand binding pose variation' (LBPV, see Methods) of stattic towards the hSTAT-SH2 pY+0 and pY-X cavities (LBPV 0 indicates conformational tendency towards pY+0, LBPV X towards pY-X, whereas LBPV 0+X towards both cavities simultaneously). We were able to calculate the conformational tendency of stattic to the hSTAT3-SH2. According to Table 2, from the top 20 optimized binding conformations of stattic to hSTAT3-SH2, 14 (70%) favor pY+0 and 6 (30%) fit to pY-X. LBPV analyses for other hSTAT-SH2 revealed that stattic also shares partial affinity between pY+0 and pY-X in case of hSTAT1, hSTAT2, hSTAT5B and hSTAT6 (Table 2) similar to hSTAT3. In contrast, for hSTAT4 and hSTAT5A stattic only fits in pY+0. These calculations were supported by graphical presentation of the docking results (Fig. 2) in which the top scored conformation of stattic for each individual STAT competes with pTyr in binding to the hSTAT-SH2 domain. Together, this supports the affirmation that due to its small size and low molecular weight stattic lacks STAT-SH2 binding specificity.

hSTAT-comparative docking of selected STAT3-specific inhibitors
This comparative docking strategy was subsequently applied to examine the binding specificity of a pre-selection of fourteen hSTAT3 inhibitors (S3 Fig.) for the hSTAT3-SH2 domain, as compared to that of all other hSTATs. As for stattic, docking simulations of these compounds in the hSTAT-SH2 domains resulted in a list of 20 optimized conformations (see Methods), with corresponding binding affinity score values for each individual hSTAT (not shown). S3 Table shows the top binding affinity scores of the individual STAT3 inhibitors for each STAT. STAT3 binding affinity values were the highest for natural compounds: cucurbitacin Q > curcumin > cucurbitacin E (9.08; 7.89; 7.49 respectively) (S3 Table). For the synthetic compounds, STAT3 binding affinities were lower than those for natural products with BP- . To obtain further insight into hSTAT-SH2 cross-binding specificity of these known STAT3 inhibitors, STAT3-CBAVs were calculated ( Fig. 3 and Table 3). According to Fig. 3, 74% of STAT3-CBAVs are between -1.0 and 1.0, while the majority of these compounds (94%) demonstrate STAT3-CBAVs between -2.0 and 2.0. This reflects a similar binding affinity of all of these compounds to the SH2 domain of individual STATs, as was seen with stattic. Only the natural compounds cucurbitacin Q (for STAT2 and STAT5A) and curcumin (for STAT5A) display STAT3-CBAVs higher than 3.0, which could point to a certain degree of STAT3 specificity. We also calculated LBPVs for these fourteen compounds in hSTAT3 as compared to other hSTATs ( Table 3, with stattic also included). For this purpose, the compounds were divided into two groups (see Table 3). The first group, labeled in italic, consists of complex and expanded compounds with high molecular weight and predominant affinity to both pY-0 and pY-X sub-pockets simultaneously (LBPV 0+X ). For example, cucurbitacin Q simultaneously binds to pY+0 and pY-X in STAT1, STAT2, STAT3, STAT5B and STAT6, with similar LBPV 0+X of 0.75, 0.85, 0.75, 0.55 and 0.6, respectively. In STAT4 and STAT5A, however, it only fits in pY+0 with LBPV 0 of 0.8 and 0.65, respectively. For each individual compound in this group the LBPV for STAT3 is similar to LBPV for other STATs, correlating with hSTAT-SH2 crossbinding. Moreover, average LBPVs are lower than 0.8 and higher than 0.3, reflecting sub-optimal binding. The second group, labeled in bold, consists of small molecules like stattic with low molecular weight and predominant affinity to one of the pY+0 or pY-X sub-pockets (LBPV 0 or LBPV X ). For example, HJC0123 shares partial affinity between pY+0 and pY-X in case of hSTAT2, hSTAT3 and STAT4. In case of hSTAT1 and hSTAT5A, it only fits in pY+0. For hSTAT5B and hSTAT6, on the other hand, HJC0123 simultaneously binds pY+0 and pY-X ( Table 3). When comparing the LBPV of STAT3 to that of other STATs, more variation is visible for this group of compounds, which could be due to their small size and low molecular weight and predicts lack of target specificity. Based on the graphical presentation (S6 Fig.) we confirmed that in hSTAT3-SH2 these compounds primarily target the highly conserved pTyr-SH2 binding pocket (pY+0) and also highly conserved hydrophobic side-pocket (pY-X). From these results we conclude that none of these compounds are STAT3-specific, As such we propose that STAT3-CBAV 3.0 + LBPV 0+X 0.2 indicates STAT-cross-binding and STAT3-CBAV 3.0 + LBPV 0+X 0.8 predicts STAT-specificity, which is especially important for STAT1 and STAT3 cross-binding in aspect of their high SH2 domain homology (approximately 49% identity, see S4 Fig.).
hSTAT comparative virtual screening of a small ligand library of natural products to identify STAT1 or STAT3-specific inhibitors Based on the results obtained from the comparative docking of STAT3-specific inhibitors, we decided to apply a 5-step in silico hSTAT-SH2 comparative virtual screening strategy (for detailed description see Methods) for commercially available small compound libraries to identify specific STAT1 or STAT3 inhibitors. A 'proof of principle' test was performed on a 'natural product' library containing approx. 130 000 compounds. Virtual screening simulation of these compounds in the hSTAT-SH2 domain, using the pscreen algorithm, resulted in a list of 3 optimized conformations (see Methods) with supporting binding affinity score values (not shown) and successively STAT1-CBAVs and STAT3-CBAVs were calculated (not shown). After applying a threshold CBAV of 3.0, we obtained 215 top hits for hSTAT1 and 297 top hits for hSTAT3, which were used for the re-screen step (not shown), applying the pgeomx algorithm. This resulted in a list of 20 optimized conformations (see Methods) for each hSTAT, with supporting binding affinity values (not shown) and STAT1-CBAVs and STAT3-CBAVs. Accordingly, the top 5 potential specific inhibitors for hSTAT1 ( Table 4) and hSTAT3 ( Table 5) are presented, ordered by descending (STAT1-STAT3)-CBAVs and (STAT3-STAT1)-CBAVs, respectively. All these compounds exhibit high STAT1 or STAT3 binding affinity values (ranging between 11.0 and 15.0), whereas CBAV between STAT1 and STAT3 are in general above 3.5. Additionally, LBPV 0+X for these compounds in hSTAT1-SH2 and hSTAT3-SH2 were calculated (Tables 4 and 5). As can be observed, the top 5 hSTAT1-specific compounds (NP_1_1-NP_1_5) display STAT1-LBPV 0+X 0.9 and STAT3-LBPV 0+X 0.35. In contrast, the top 5 hSTAT3-specific compounds (NP_3_1-NP_3_5 in Table 5) have STAT3-LBPV 0+X 0.75 while STAT1-LBPV 0+X is 0.3. This is further illustrated in Fig. 4, in which the top 20 optimized binding conformations for NP_1_1 (Fig. 4A) and NP_3_1 (Fig. 4B) are depicted in the SH2 domains of STAT1 and STAT3, as a graphical representation of LBPV 0+X . As a representative hSTAT1 specific compound, with (STAT1-STAT3)-CBAV of 4.62, NP_1_1 has STAT1-LBPV 0+X of 0.9 and subsequent high conformational conservation within pY+0 and pY-X STAT1 sub-pockets. In STAT3-SH2, however, it also has predominant affinity to both pY+0 and pY-X but its STAT3-LBPV 0+X is only 0, which corresponds to no conformational conservation. Likewise, NP_3_1 displays high conformational conservation towards hSTAT3-SH2, but low conservation with respect to hSTAT1-SH2 (Fig. 4B).
Application of comparative virtual screening pipeline approach in search of STAT1 or STAT3 specific inhibitors from multi-million ligand library of clean leads Finally, a more advanced test of our in silico hSTAT-SH2 comparative virtual screening strategy was performed on a multi-million small compound clean leads library to identify specific STAT1 or STAT3 inhibitors. Docking of nearly 6 million compounds to the SH2 domain of each individual STAT was performed and a STAT1-CBAV and STAT3-CBAV threshold of 3.0 for all STATs was applied. Virtual screening simulation in the pscreen mode initially resulted in 1680 top hits for hSTAT1 and 1078 top hits for hSTAT3, which were used for the rescreen step (not shown). Based on a combination of 20 optimized conformations of each compound for individual hSTATs with supporting binding affinity values and STAT1-CBAVs and STAT3-CBAVs, the most promising compounds were selected. Thus, the top 5 potential specific inhibitors for hSTAT1 (Table 6) and hSTAT3 (Table 7) are presented, ordered by descending (STAT1-STAT3)-CBAVs and (STAT3-STAT1)-CBAVs, respectively. All these compounds exhibit high STAT1 or STAT3 binding affinity values (ranging between 9.0 and

Discussion
Searches for STAT-targeting compounds, exploring the pTyr-SH2 interaction area, yielded many small molecules for STAT3 but sparsely for other STATs. Many of these inhibitors seem not STAT-specific, thereby questioning the present selection strategies of SH2 domain-based STAT inhibitors. This illustrates the need for better models, and screening and validation tools for more drugable STAT inhibitors with high specificity, potency and excellent bioavailability. Therefore, our aim was to develop a novel bioinformatic selection strategy for STAT-specific inhibitory compounds using STAT-SH2 models in combination with comparative in silico virtual screening and docking validation.
First, we generated new 3D structure models for all human (h)STATs (1, 2, 3, 4, 5A, 5B and 6) based on multiple sequence alignment (MSA) of STAT vertebrate sequences and the limited crystal structures of STAT1, STAT3 and STAT5A [11,29,31]. Phylogenetic analysis confirmed close sequence and structural homology between all members of the STAT family, being subdivided into two groups (Fig. 1A); one consisting of STAT1, STAT2, STAT3 and STAT4, and the other of STAT5A, STAT5B and STAT6. In agreement with chromosomal localization of STAT genes (e.g. in H. sapiens STAT1 and STAT4-ch. 2, STAT3, STAT5A, STAT5B-ch. 17, STAT2 and STAT6-ch. 12) our analysis reflects the evolution of STATs through whole genome duplications and gene duplications by unequal chromosomal crossing-over [61]. Subsequently, by applying the homology modeling procedure, structural models of the H. sapiens counterparts of all STAT proteins as well as their corresponding pTyr-peptide fragment were built, with highly accurate final structural predictions (Table 1).
By focusing on the STAT-SH2 pTyr linker interaction, we studied in more detail the previously identified active residues in STAT3-SH2 [12]. We observed that the pY+0, pTyr-binding pocket and the hydrophobic side-pocket, pY-X, are highly conserved among all STATs [59] ( Fig. 1C and S2 Fig.). Active residues ('hot-spots') identified by Park and Li [12] in hSTAT3-SH2, appeared also present in the SH2 domain of other hSTATs (S2 Table). Structural superimposition of the pY+0 and pY-X cavities for all hSTAT models [59] revealed that these 'hot spots' contain mostly conserved amino acids, and that their positions in the SH2 domain are fixed. This implicated high structural conservation of the pTyr-binding and hydrophobic pocket. However, structural superimposition of hSTAT-SH2 domains revealed the existence of additional lesser conserved regions in these cavities [59]. This observed divergence could be explained by the arrangement of electrostatic potential of the hSTAT-SH2 protein surface, which is clearly different among all STAT models ( Fig. 1C and S2 Fig.). The superficial differences between hSTAT-SH2 of monomer I correspond to the evolutionary divergence of pTyrlinker from the monomer II (between all STATs). Based on the homology analysis we predicted the interactions between SH2 and pTyr linkers for all STATs (Fig. 1C and S2 Fig.). The mutual position of hSTAT-SH2 domain cavities and pTyr-peptide determines the specificity of dimer formation ( Fig. 1C and S2 Fig.). This creates a possibility that by targeting the combination of pY+0 and pY-X sub-pockets in comparative virtual screening, involving all hSTAT models, will enable selection of STAT3 as well as STAT1-specific inhibitors.
Subsequently, a comparative in silico docking strategy was applied to obtain further insight into STAT-SH2 cross-binding specificity of the previously identified STAT3 inhibitor stattic. Stattic was reported to be a STAT3-specific (not STAT1 and STAT5) inhibitor [62]. It was proposed that stattic is a competitor of the phosphopeptide binding, which disrupts the STAT3dimer formation. However, in 2012 Sanseverino et al. have shown, that in human monocytederived dendritic cells (MDDCs), stattic was able to reduce the level of IFNβ-induced STAT1 phosporylation and, to a lesser extent, of STAT2 phosphorylation [63]. Recently, we provided additional evidence that by primarily targeting the highly conserved pTyr-SH2 binding pocket stattic is not a specific hSTAT3 inhibitor, but is equally effective towards hSTAT1 and hSTAT2 [59]. This was confirmed in Human Micro-vascular Endothelial Cells (HMECs) in vitro, in which stattic inhibited interferon-α-induced phosphorylation of all three STATs. Here, by docking simulation of stattic in the SH2 domain of all hSTATs, combined with the STAT3-CBAV and LBPV parameters, we now further prove that stattic exhibits a similar binding affinity to the SH2 domain of all STATs by either targeting the pY+0 or pY-X cavity. Together, this supports the affirmation that due to its small size and low molecular weight stattic lacks STAT-SH2 binding specificity.
Docking simulation of the pre-selected hSTAT3 inhibitors in the SH2 domain of all hSTATs combined with proven cross-binding characteristics from in vitro experiments, also enabled us to identify STAT-CBAV and LBPV criteria correlating with 'STAT cross-binding' and 'STATspecificity'. Thus, we proposed that STAT3-CBAV 3.0 + LBPV 0+X 0.2 indicated STATcross-binding and STAT3-CBAV 3.0 + LBPV 0+X 0.8 predicted STAT-specificity. Based on these criteria we developed a novel in silico hSTAT-SH2 comparative virtual screening and docking validation strategy for commercially available small compound libraries to identify specific STAT inhibitors. Indeed, screening of a natural product library as well as a multi-million clean leads compound library successfully identified STAT1 as well as STAT3-specific inhibitors. For example, clean lead compound CL_1_1 was identified as hSTAT1-specific based on the combined criteria of (STAT1-STAT3)-CBAV of 5.47, STAT1-LBPV 0+X of 0.8 and STAT3-LBPV 0+X of 0.1. Consequently, CL_1_1 displayed high conformational conservation towards hSTAT1-SH2, but low conservation with respect to hSTAT3-SH2, which was confirmed by graphical representation in the SH2 domains of STAT1 and STAT3 (Fig. 5A).
Likewise, CL_3_1 displayed high conformational conservation towards hSTAT3-SH2, but low conservation with respect to hSTAT1-SH2 (Fig. 5B), and was identified as hSTAT3-specific. Our novel screening approach highlights the great potential of the use of in silico hSTAT-SH2 comparative virtual screening and docking validation to identify specific STAT inhibitors. However, we realize that in vitro validation of STAT phosphorylation or STAT DNA-binding affinity is required to provide final proof of the value of our in silico screening tool. In this respect, following a simplified screening protocol (using only the pY+0 binding pocket of hSTAT1-SH2 as the protomol, combined with the pgeom algorithm in Surflex-Dock 2.6) we recently identified several new STAT1 inhibitors that after in vitro validation inhibited STAT1 phosphorylation but were also shown to inhibit STAT3 activity. This is in agreement with the fact that identifying STAT-specific inhibitors meets many obstacles, from insufficient structural data to undefined mechanism of action, and lack of STATspecificity of selected compounds. In-between there are also challenges with virtual and experimental validation and selection methods, like no uniformed screening protocols describing STAT-inhibitor binding affinities, not using the same STAT inducer for comparative in vitro assays, or lack of known inhibitors used as a reference point [7,64].
Therefore, based on our newly developed 3D structure models for all human STATs, we propose a standardized approach that combines comparative in silico virtual screening and docking validation of STAT-SH2 models with an in vitro multiple STAT phosphorylation assay, as a novel tool to screen multi-million clean lead-like and drug-like compound libraries and identify specific inhibitors for different STATs. 'Specific' STAT inhibitory compounds are subsequently selected based on the highest 'comparative STAT binding affinity value' combined with the highest conformational conservation. In an additional step a low throughput in vitro cell-based multiple STAT activation assay should be performed to test the effect of pre-selected inhibitory compounds on cytokine-induced and/or constitutive STAT phosphorylation in different cell types, following each in silico comparative screen [69]. Furthermore, by building the full-length hSTAT models, comparative in silico virtual screening can be combined with the comparative search for alternative STAT-specific inhibitor-binding cavities on the surface of STAT proteins (other than STAT-SH2 domain) and yield novel STAT-selective inhibitory strategies.
Development of an effective STAT inhibitor screening tool benefits the clinical need for more drugable STAT inhibitors. Identification of specific and effective STAT inhibitory compounds could provide a tool to increase our understanding of the functional role of STATs in different diseases, and could serve as therapeutic strategies in cancer, inflammation and autoimmunity. Although STATs represent highly attractive therapeutical targets for these diseases, so far no FDA approved treatment exists involving direct targeting of STAT proteins [10]. Therefore, the search for new STAT inhibitors with high specificity, potency and excellent bioavailability remains extremely important.  Table. Scores of the optimal template structures for hSTATs. Values were calculated by selected methods of protein homology detection, which are implemented at GeneSilico Metaserver; n/a-no result. (DOCX) S2 Table. Amino acid 'Hot spot' analysis for the pY+0 and pY-X binding sub-pockets in the hSTAT-SH2 domains. Based on the multiple sequence alignment and hSTAT models superimposition. Bold-group substitutions of the amino acids in comparison to hSTAT1. (DOCX) S3 Table. Binding affinities for hSTAT3-specific inhibitors, represented by top-scored conformers. Results were obtained using Surflex-Dock 2.6 program.