Identification of lead anti-human cytomegalovirus compounds targeting MAP4K4 via machine learning analysis of kinase inhibitor screening data

Chemogenomic approaches involving highly annotated compound sets and cell-based high-throughput screening are emerging as a means to identify novel drug targets. We have previously screened a collection of highly characterized kinase inhibitors (Khan et al., Journal of General Virology, 2016) to identify compounds that increase or decrease expression of a human cytomegalovirus (HCMV) protein in infected cells. To identify potential novel anti-HCMV drug targets we used a machine learning approach to relate our phenotypic data from the aforementioned screen to kinase inhibition profiling of compounds used in this screen. Several of the potential targets had no previously reported role in HCMV replication. We focused on one potential anti-HCMV target, MAP4K4, and identified lead compounds inhibiting MAP4K4 that have anti-HCMV activity with little cellular cytotoxicity. We found that treatment of HCMV infected cells with inhibitors of MAP4K4, or an siRNA that inhibited MAP4K4 production, reduced HCMV replication and impaired detection of IE2-60, a viral protein necessary for efficient HCMV replication. Our findings demonstrate the potential of this machine learning approach to identify novel anti-viral drug targets, which can inform the discovery of novel anti-viral lead compounds.


Introduction
Identification of viral and cellular proteins required for virus replication can be a critical step in the discovery of novel anti-viral targets. A number of genetic methods are available to screen infected cells to identify proteins required for virus replication. These include the screening of infected cells using siRNA [1][2][3][4][5][6][7] or CRISPR/Cas9 [8][9][10] and analysis of infected haploid cells [...]

[...] foreskin fibroblast (HFF) cells (clone Hs29) were obtained from the American Type Culture Collection.

Western blotting
HFF cells were infected at the MOI indicated in each Figure or prepared for analysis at the time of infection. After washing with PBS, cells were resuspended in Laemmli buffer containing 5% β-mercaptoethanol. Proteins were separated on 8% polyacrylamide gels. Membranes were probed with antibodies recognizing IE1/2 (Virusys, 1:1000 dilution), IE2 proteins (clone 5A8.2, Millipore, 1:1000 dilution), MAP4K4 (ab155583, Abcam, 1:500 dilution) and β-actin (SIGMA, 1:5000 dilution). All primary antibodies were detected using anti-mouse- or anti-rabbit-horseradish peroxidase (HRP) conjugated antibodies (Millipore and Cell Signaling Technology, respectively). Chemiluminescence solution (GE Healthcare) was used to detect secondary antibodies on film. Where necessary, blots were stripped and re-probed. Relative band intensity (band intensity relative to the β-actin signal in the same lane) was analyzed using ImageJ software obtained from the National Institutes of Health (USA). In lanes where relative band intensity was analyzed, densitometry was used to calculate the percentage difference in band intensity between β-actin bands in those lanes. The percentage difference in band intensity for specific proteins in those lanes was then calculated. Specific protein band intensity was divided by β-actin intensity to calculate relative band intensity.
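The normalization described above can be illustrated with a minimal sketch. The function names and the densitometry readings below are hypothetical and are not taken from this study; the sketch only assumes that each band has already been quantified (for example in ImageJ) as a single intensity value.

```python
def relative_band_intensity(protein_intensity, actin_intensity):
    """Divide a protein band intensity by the beta-actin intensity of the same lane."""
    if actin_intensity <= 0:
        raise ValueError("beta-actin intensity must be positive")
    return protein_intensity / actin_intensity


def fold_decrease(control_relative, treated_relative):
    """Fold decrease of a treated lane relative to a control lane."""
    if treated_relative <= 0:
        raise ValueError("treated relative intensity must be positive")
    return control_relative / treated_relative


# Hypothetical densitometry readings (arbitrary units), for illustration only.
control_lane = relative_band_intensity(protein_intensity=2000.0, actin_intensity=1000.0)
treated_lane = relative_band_intensity(protein_intensity=500.0, actin_intensity=1000.0)
fold = fold_decrease(control_lane, treated_lane)
print(f"relative intensity: control={control_lane}, treated={treated_lane}, fold decrease={fold}")
```

Normalizing to β-actin in the same lane corrects for differences in the amount of lysate loaded before lanes are compared.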

Treatment of cells with siRNA and infection of transfected cells
Twenty-four hours before transfection, 1 × 10^5 HFF cells per well were seeded in 12-well plates in media with no antibiotics. siControl Non-targeting siRNA #3 (D-001810-03-05) or ON-TARGETplus SMARTpool MAP4K4 siRNA (L-003971-00-0005) (both Dharmacon/GE) were used. Per well, 113 μl of 1 μM siRNA and 2 μl Dharmafect2 (Dharmacon/GE) were diluted in 93 μl and 146 μl OptiMEM (Invitrogen), respectively. After 5 min at room temperature, both solutions were combined. After 20 min, media was removed from each well and replaced with the siRNA/Dharmafect mixture, then 500 μl of media with no antibiotics was added to each well. Transfected cells were incubated at 37˚C for 72 hours then used as indicated in the text.

Viral yield reduction assays
HFF cells (5 × 10^4 per well) were incubated overnight and infected at an MOI of 1. Virus was adsorbed to cells for 1 hour at 37˚C and then infected cells were incubated with 0.5 ml of media containing DMSO or compound at a range of concentrations in duplicate. Plates were incubated for 72 hours at 37˚C. The final concentration of DMSO in all samples was maintained at <1% (v/v). Viral titre (plaque forming units (p.f.u.) per ml) was determined by titration of viral supernatants on HFF monolayers. The mean value of duplicate plaque counts was determined and the percentage of viral titre in the presence of compound compared to control was calculated. To determine ED50 values, percentage inhibition versus compound concentration was plotted using Microsoft Excel and a linear fit model was used to determine the concentration at which virus yield was reduced by 50%.
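The ED50 calculation can be sketched as follows. This is an illustrative reimplementation in Python rather than the Excel workflow used in the study. It assumes an ordinary least-squares line fitted to percentage inhibition against untransformed concentration, which the text does not specify, and the titres shown are hypothetical.

```python
def percent_inhibition(titre_with_compound, titre_control):
    """Percentage reduction in viral titre relative to the DMSO control."""
    return 100.0 * (1.0 - titre_with_compound / titre_control)


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("concentrations must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def ed50_from_linear_fit(concentrations, inhibitions):
    """Concentration at which the fitted line crosses 50% inhibition."""
    slope, intercept = linear_fit(concentrations, inhibitions)
    if slope == 0:
        raise ValueError("fitted line is flat; ED50 is undefined")
    return (50.0 - intercept) / slope


# Hypothetical mean titres (p.f.u. per ml), for illustration only.
control_titre = 1_000_000
concentrations_um = [2.5, 5.0, 10.0, 15.0]
titres = [875_000, 750_000, 500_000, 250_000]
inhibitions = [percent_inhibition(t, control_titre) for t in titres]
ed50 = ed50_from_linear_fit(concentrations_um, inhibitions)
print(f"ED50 is approximately {ed50:.1f} uM")
```

A linear model is only a reasonable approximation close to the 50% point; dose-response curves are often sigmoidal over a wider concentration range.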

MTT assays
HFF cells (1 × 10^4 per well) were incubated overnight and then treated for 72 hours with either DMSO or compound at a range of concentrations (2-fold dilution series starting at 50 μM) in duplicate. Relative cell number was then determined with an MTT assay according to the manufacturer's instructions (GE Healthcare). The mean value of duplicate readings was determined and the percentage of assay output in the presence of compound compared to DMSO was calculated. The final concentration of DMSO in all samples was maintained at <1% (v/v). As a positive control, in all experiments a 2-fold dilution series of HFF cells starting at 1 × 10^4 cells per well was included. In each experiment we found a linear relationship between the number of cells per well and output from the MTT assay (data not shown).

Collection and organization of kinase inhibitor screening data for machine learning analysis
Previously, we devised a cell-based high-throughput methodology [33] to screen the GSK PKIS collection of kinase inhibitors [35,36] for their ability to increase or decrease the expression of a viral protein, pp28, in cells infected with HCMV high passage strain AD169. After excluding screened compounds for toxicity effects [33], we interpreted the results of our screen [33] as a z-score [41,42], where a positive or negative z-score represented an increase or decrease, respectively, in the number of pp28 positive cells in the presence of each compound. The z-score for each compound is shown in Fig 1A, where each bar represents the z-score of a single compound.
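As an illustration of the z-score transformation, the sketch below standardizes a per-compound readout against the mean and standard deviation of all screened compounds. The exact normalization used in the screen is described in [33,41,42]. The use of the sample standard deviation, the function name and the readouts are assumptions made for this example.

```python
import statistics


def z_scores(readouts):
    """Return {compound: z-score} for a {compound: readout} mapping."""
    values = list(readouts.values())
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (an assumption)
    if sd == 0:
        raise ValueError("all readouts are identical; z-scores are undefined")
    return {name: (value - mean) / sd for name, value in readouts.items()}


# Hypothetical percentages of pp28 positive cells, for illustration only.
readouts = {"compound_1": 40.0, "compound_2": 50.0, "compound_3": 60.0}
scores = z_scores(readouts)
for name, score in scores.items():
    direction = "decrease" if score < 0 else "increase or no change"
    print(f"{name}: z = {score:+.2f} ({direction} in pp28 positive cells)")
```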
The GSK PKIS collection has been extensively characterized [35], including each compound's kinase inhibition profile: a biochemical analysis of each compound's ability to inhibit 224 human kinases using in vitro assays. The kinase inhibition profiles of PKIS compounds demonstrated that nearly every compound in this collection displayed some degree of promiscuity [33,35]. To deconvolute the kinase inhibition profiles of compounds and identify kinases whose inhibition affects HCMV replication, we subjected our GSK PKIS screen data to a machine learning algorithm that was previously developed and validated in a mammalian screening system [37].
The kinase inhibition profiles of compounds with z-scores below -0.75 ("hit class") or above 0.25 ("anti-hit class") were selected. These cut-offs were chosen to ensure a separation of at least 1 between the hit and anti-hit classes and to ensure that the profiles that potentially had the most information were analyzed. The selected kinase inhibition profiles are shown in Fig 1B, where the heatmap indicates the potency of kinase inhibition (each row represents a kinase tested and each column represents a compound). The selected kinase profiles in Fig 1B were analyzed using a maximum relevance (MR) algorithm [37] to identify kinases whose inhibition in both classes had the highest information content (Fig 1C). Thus, the MR analysis produced a list of kinase proteins most likely related to either inhibition or promotion of HCMV protein production.
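The class assignment and relevance ranking can be sketched as below. Only the z-score cut-offs come from the text. The sketch assumes that relevance is measured as the mutual information between a binarized inhibition profile and the class label, and the 50% inhibition cut-off, kinase names and data are hypothetical. The MR algorithm of [37] may differ in its details.

```python
import math
from collections import Counter


def assign_class(z, hit_cutoff=-0.75, anti_hit_cutoff=0.25):
    """Return 'hit', 'anti-hit', or None for compounds between the cut-offs."""
    if z < hit_cutoff:
        return "hit"
    if z > anti_hit_cutoff:
        return "anti-hit"
    return None


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    count_x, count_y, count_xy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c * n) / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )


def rank_kinases(profiles, z_by_compound, inhibition_cutoff=50.0):
    """Rank kinases by how informative their inhibition is about class membership."""
    labelled = [(c, assign_class(z)) for c, z in z_by_compound.items()]
    labelled = [(c, label) for c, label in labelled if label is not None]
    labels = [label for _, label in labelled]
    kinases = sorted({k for c, _ in labelled for k in profiles[c]})
    scores = {}
    for kinase in kinases:
        # Binarize: is the kinase inhibited above the (assumed) cut-off?
        inhibited = [profiles[c].get(kinase, 0.0) > inhibition_cutoff for c, _ in labelled]
        scores[kinase] = mutual_information(inhibited, labels)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical z-scores and percentage inhibition values, for illustration only.
z_by_compound = {"c1": -1.2, "c2": -0.9, "c3": 0.0, "c4": 0.5, "c5": 0.8}
profiles = {
    "c1": {"KINASE_A": 90.0, "KINASE_B": 90.0},
    "c2": {"KINASE_A": 80.0, "KINASE_B": 10.0},
    "c3": {"KINASE_A": 50.0, "KINASE_B": 50.0},
    "c4": {"KINASE_A": 10.0, "KINASE_B": 85.0},
    "c5": {"KINASE_A": 5.0, "KINASE_B": 5.0},
}
ranking = rank_kinases(profiles, z_by_compound)
print(ranking)
```

In this toy data set, inhibition of KINASE_A separates the two classes perfectly, whereas inhibition of KINASE_B carries no information about class membership.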

Identification of potential drug targets within pharmacologically linked kinase groups
From the kinases selected by MR analysis, a greedy backwards feature selection algorithm using support vector machines (SVM) [37] was then used to identify the minimum number of kinases whose inhibition was highly predictive of HCMV protein production inhibition (Fig 1C). These kinases were referred to as the Maximum Information Set (MAXIS). Closely related kinases can have similar inhibition profiles, termed "pharmacological linkage". Therefore, the MAXIS kinase proteins were grouped as pharmacologically linked kinases (Fig 1D). (Analysis of sequence homology and pharmacological similarity that identified the pharmacological relationship between kinases has been previously described [37].) Each group was given a MAXIS score to indicate the number of times kinase proteins within each group had been selected by SVM [37] (Fig 1C and 1D). The more often a kinase is selected by the selection algorithm, the higher the MAXIS score. To determine whether kinase groups with MAXIS scores were acting as targets (inhibition resulted in suppression of HCMV protein production) or anti-targets (inhibition resulted in promotion of HCMV protein production), we used a previously developed inhibition bias metric, Bk [37]. A positive Bk score indicated that the MAXIS kinase was a candidate target, while a negative Bk score indicated that a kinase was a candidate anti-target (Fig 1D). Therefore, the analysis of our GSK PKIS screening data yielded 15 groups of pharmacologically related kinases with positive Bk scores, indicating one or more members of each group was a potential target for inhibiting HCMV protein production (Fig 1D).
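The idea of greedy backwards feature selection can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier scored by leave-one-out accuracy stands in for the SVM used in [37]. This substitution, the function names, the kinase names and the data are all assumptions made for illustration and do not reproduce the published method.

```python
def loo_accuracy(samples, labels, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (stand-in for an SVM)."""
    correct = 0
    for i, held_out in enumerate(samples):
        centroids = {}
        for label in sorted(set(labels)):
            members = [s for j, s in enumerate(samples) if j != i and labels[j] == label]
            if members:
                centroids[label] = [sum(m[f] for m in members) / len(members) for f in features]
        point = [held_out[f] for f in features]
        predicted = min(
            centroids,
            key=lambda lab: sum((p - c) ** 2 for p, c in zip(point, centroids[lab])),
        )
        if predicted == labels[i]:
            correct += 1
    return correct / len(samples)


def greedy_backward_selection(samples, labels, features):
    """Repeatedly drop the kinase whose removal costs the least accuracy."""
    selected = list(features)
    best = loo_accuracy(samples, labels, selected)
    while len(selected) > 1:
        candidates = [
            (loo_accuracy(samples, labels, [f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        score, drop = max(candidates, key=lambda candidate: candidate[0])
        if score < best:  # every removal reduces accuracy, so stop
            break
        best = score
        selected.remove(drop)
    return selected, best


# Hypothetical percentage inhibition profiles, for illustration only.
samples = [
    {"KINASE_A": 90.0, "KINASE_B": 20.0, "KINASE_C": 60.0},
    {"KINASE_A": 85.0, "KINASE_B": 70.0, "KINASE_C": 10.0},
    {"KINASE_A": 95.0, "KINASE_B": 40.0, "KINASE_C": 80.0},
    {"KINASE_A": 10.0, "KINASE_B": 30.0, "KINASE_C": 70.0},
    {"KINASE_A": 5.0, "KINASE_B": 75.0, "KINASE_C": 15.0},
    {"KINASE_A": 15.0, "KINASE_B": 45.0, "KINASE_C": 75.0},
]
labels = ["hit", "hit", "hit", "anti-hit", "anti-hit", "anti-hit"]
selected, accuracy = greedy_backward_selection(
    samples, labels, ["KINASE_A", "KINASE_B", "KINASE_C"]
)
print(selected, accuracy)
```

In this toy example the procedure discards the two uninformative kinases and retains the single kinase whose inhibition predicts class membership, which is the sense in which the retained set is minimal.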
Many of the kinase proteins shown in Fig 1D had no known role in HCMV replication. To elucidate which members of each pharmacologically linked group were relevant to HCMV replication and, therefore, potential drug targets, we sought to understand which proteins were present in HCMV infected cells and which facilitated HCMV replication.
We compared proteins in each group (Fig 1D) to a proteomics dataset listing proteins that have previously been found in human fibroblasts infected with HCMV [43] (Table 1). Nearly every group contained at least one kinase protein found in this proteomic dataset. We then compared the proteins in each group to datasets in which collections of siRNAs had been used to understand the requirement for kinase proteins in HCMV replication [30] or HCMV protein production [28] (Table 1). Many of the siRNAs had no obvious effect in the siRNA screens, or were toxic to infected cells in the screen in which they were used.
DYRK1A, CHEK2 and CSNK1G1 were present in the proteomic analysis and were found to be necessary for HCMV protein production or HCMV replication in siRNA screens (Table 1). It has been demonstrated that inhibitors of DYRK1A prevent HCMV replication [44]. Although CHEK2 (also known as Chk2) was found to be required for HCMV replication in one siRNA based study [30], it has also been reported that signaling involving CHEK2 is inhibited in HCMV infected cells [45]. Thus, the requirement for CHEK2 in HCMV replication was unclear. There was no other information on the requirement of CSNK1G1 in HCMV infected cells. Although inhibitors that specifically inhibit other casein kinase isozymes have been reported [46], there is no selective and potent inhibitor of CSNK1G1. Therefore, our analysis of proteins from the pharmacologically linked groups showed that a known anti-HCMV drug target, DYRK1A, could be identified. However, it was unclear if CHEK2 and CSNK1G1 could be considered as anti-HCMV targets.

Fig 1 (caption partially recovered). [...] [33], where each bar represents a single compound. (B) Heatmaps of kinase inhibition profiling of compounds grouped from Hit and Anti-Hit classes. The potency of each compound at 1 μM concentration against a particular kinase is represented in colour (less than 0% inhibition: blue; 0-50% inhibition: green; 51-75% inhibition: yellow; 76-90% inhibition: orange; greater than 91% inhibition: red). Each row represents a kinase tested and each column represents a compound. (C) Schematic of machine learning analysis of Hit and Anti-hit kinase inhibition profiles. (D) [...]

MAP4K4 was present in HCMV infected cells and was required for efficient HCMV replication and protein production
We noted that one group of kinase proteins, including MAP4K4, TNIK and MINK1, had a high MAXIS score (Fig 1C). MAP4K4 and MINK1 were thought to be present in HCMV infected cells (Table 1). However, our analysis of siRNA did not indicate a role for any of these proteins in HCMV replication or identify a lead compound for any of these proteins (Table 1). Given the high MAXIS score of this group, we decided to investigate if one or more of the aforementioned proteins were necessary for HCMV replication. TNIK was not reported to be found in HCMV infected cells (Table 1) and the functional role of MINK1 is unclear and may be restricted to T cells [47]. It has been reported that MAP4K4 is required for production of the IE proteins of another herpesvirus, Kaposi's sarcoma herpesvirus (KSHV) [48,49]. Therefore, we focused on investigation of MAP4K4.
Using western blotting, we confirmed the presence of MAP4K4 in HCMV infected cells. In this assay MAP4K4 was found in HFF cells infected with AD169 at 48-72 h.p.i. (Fig 2A, lanes 3 and 4). In this and subsequent western blotting, the presence of β-actin was assayed to determine the amount of cell lysate in each sample. We noted that detection of MAP4K4 was coincident with the production of the late viral protein pp28 (Fig 2A, lanes 3 and 4). Production of late viral proteins, including pp28, requires DNA replication [19]. However, in the presence of the HCMV DNA replication inhibitor ganciclovir we found a decrease in pp28 production, but no obvious defect in production of MAP4K4 using western blotting (Fig 2B), indicating MAP4K4 production was not dependent on HCMV DNA synthesis.
To investigate if MAP4K4 was necessary for HCMV replication, we treated HFF cells with siRNA targeting production of MAP4K4 or a control siRNA, then challenged those cells with high passage HCMV strain AD169. Virus released into the supernatant of infected cells was quantified (Fig 2C). In parallel, AD169 infected cells treated with siRNA were prepared for western blotting to analyze the presence of viral and cellular proteins (Fig 2D). We hypothesized that in our previous siRNA screening experiments (Table 1, [28]), the concentration of siRNA targeting MAP4K4 was too low to produce a detectable effect. Therefore, in this study we increased the concentration of siRNA used in transfections by approximately 4-fold and observed no obviously harmful effects to transfected cells. Assays were carried out at 72 hours post infection, as at this time point HCMV production from infected cells should be underway and all HCMV proteins should be produced. Compared to production of HCMV from cells treated with control siRNA, treatment of cells with siRNA targeting MAP4K4 production resulted in a more than 3-fold decrease in HCMV production (Fig 2C), indicating that MAP4K4 was required for efficient HCMV replication. The production of MAP4K4 was examined using western blotting. MAP4K4 was robustly detected in HCMV infected cells treated with control siRNA (Fig 2D, lane 2), but not in any other sample. Relative band intensity (compared to β-actin in the same lane) indicated an approximately 5-fold decrease in MAP4K4 detection in cells treated with siRNA targeting MAP4K4 production. Further analysis of infected cells by western blotting was carried out to understand HCMV protein production. HCMV replication is dependent upon the production of Immediate Early (IE) proteins IE1 and IE2, which antagonize innate immunity and promote viral transcription, respectively. IE1 and IE2-86 are produced by alternative splicing of the same RNA (Fig 2E).
At late time points, two other IE2 proteins, IE2-60 and IE2-40, are produced from translation initiation start codons in RNA from exon 5 (Fig 2E). IE2-60 and IE2-40 are essential for efficient HCMV replication [50].
Western blotting for IE proteins revealed that treatment of cells with siRNA targeting MAP4K4 production resulted in an approximately 2-fold decrease in IE2-60 and IE2-40 detection (Fig 2D), but no obvious defect in detection of either IE1 or IE2-86. Thus, a reduction in the presence of MAP4K4 in HCMV infected cells was associated with impaired detection of IE2-60 and IE2-40 and a corresponding inhibition of HCMV replication.

Lead compounds targeting MAP4K4 inhibited HCMV replication
There has been little development of compounds targeting MAP4K4. However, we identified two structurally unrelated lead compounds, PF06260933 and CA409, reported to inhibit MAP4K4. PF06260933 (Fig 3A) strongly inhibited MAP4K4 and a number of other kinase proteins including MINK1 and TNIK [39]. CA409 (Fig 3B) was a potent and selective inhibitor of MAP4K4 and MINK1 [40].
We investigated the ability of PF06260933 and CA409 to inhibit HCMV replication in virus replication assays using the high passage HCMV strain AD169. Both PF06260933 and CA409 inhibited HCMV replication with a 50% effective dose (ED50) value of approximately 10 μM (Table 2). To exclude the possibility that cellular cytotoxicity was responsible for the anti-HCMV effects of PF06260933 and CA409, we tested uninfected cell viability in the presence of PF06260933 and CA409 using an MTT assay to measure the activity of the mitochondrial NAD(P)H-dependent cellular oxidoreductase enzymes. We found no defect in cell viability at concentrations below 50 μM (Table 2), which was well above the ED50 value of 10 μM that we had observed. This result indicated the anti-HCMV effects of PF06260933 and CA409 were unlikely to be due to cytotoxicity in the presence of these compounds.
Low passage strains of HCMV have a genomic content comparable to primary HCMV strains [51]. Therefore, we also tested the ability of PF06260933 and CA409 to inhibit replication of the low passage HCMV virus Merlin(R1111) [38] (Table 2). Similar results to those found when using AD169 were observed. Therefore, protein kinases inhibited by PF06260933 and CA409 were required for replication of both high and low passage HCMV viruses.

Lead compounds targeting MAP4K4 inhibited HCMV protein production
Next, we investigated how PF06260933 and CA409 inhibited HCMV replication. Based on experiments using siRNA shown in Fig 2, we hypothesized that treatment of infected cells with inhibitors of MAP4K4 would inhibit production of IE2 proteins. Using western blotting, we assayed the production of IE2 proteins at 72 hours post infection in cells infected with AD169 and treated with the ED50 dose of either PF06260933 or CA409 (10 μM). Compared to infected cells treated with DMSO (Fig 3A, lane 2), treatment of HCMV infected cells with PF06260933 resulted in a decrease in production of all three IE2 proteins (Fig 3A, lane 3). Reduction in IE2-86 and IE2-40 production was less than 2-fold, but the reduction in IE2-60 production was approximately 2-fold. Compared to infected cells treated with DMSO (Fig 3B, lane 2), when HCMV infected cells were exposed to CA409 we observed an approximately 2-fold decrease (data not shown) in detection of IE2-86 and IE2-60 and a defect in IE2-40 production that was less than 2-fold (Fig 3B, lane 3). We found no obvious decrease in IE1 production in the presence of either PF06260933 or CA409 (data not shown). Therefore, treatment of infected cells with the ED50 dose of either PF06260933 or CA409 resulted in an approximately 2-fold decrease in production of IE2-60.

Discussion
We demonstrate how a machine learning approach can be applied to reveal new insights into data from high throughput compound kinase inhibitor screening. Examination of machine learning results identified potential anti-HCMV drug targets. Many of these potential targets had no previously reported roles in HCMV replication or pathogenesis. Thus, the methodology used here also has the potential to uncover hitherto unappreciated aspects of HCMV biology. Further analysis of the machine learning results led to the identification of lead compounds targeting MAP4K4 that had anti-HCMV activity. Given the benefits of the methods we use here, we propose that our study and others will stimulate renewed interest in screening of kinase inhibitors for anti-viral effects and support the production of highly characterized kinase inhibitor collections for screening. There are, however, several points surrounding validation of machine learning results that should be addressed. We attempted to validate machine learning results by comparing our data to previously reported siRNA datasets. There was no overlap in the two datasets of siRNAs that had either positive or negative effects. However, it has been reported that there is only limited overlap in the effects of orthologous siRNAs [52]. Many of the siRNAs examined in our analysis had no obvious effect in siRNA screens, or were toxic to infected cells in the screen in which they were used. The paucity of data from siRNA screens meant that several kinases could not be directly validated as drug targets. However, it is possible that the lack of effect in an siRNA screen was the result of inefficient knockdown of protein, or that the statistical method of analysis used in the siRNA screening process scored an siRNA as a false negative [41,52]. Therefore, many of the kinase proteins identified in our machine learning results could be required for HCMV replication and could be anti-HCMV drug targets.
Indeed, we went on to demonstrate using siRNA that MAP4K4 had effects on HCMV replication and protein production even though it had been previously reported that siRNA targeting MAP4K4 had no obvious effects in two different siRNA screens [28,30]. Thus, in future, siRNA screening data should be cautiously interpreted during validation of machine learning results.
In this study we examined the role of MAP4K4 in HCMV replication and sought to identify lead compounds targeting MAP4K4 that had anti-HCMV activity. The use of siRNA or compounds inhibiting MAP4K4 all resulted in a reduction in the detection of IE2-60. Thus, we propose that there is an association between the function of MAP4K4 and production of IE2-60. It is interesting to note that use of either siRNA or different compounds had different effects on production of IE2-86 and IE2-40. We propose that this may be due to different off-target effects or lack of potency of the siRNA and compounds we have used. Our observation using siRNA that knockdown of MAP4K4 leads to a reduction in IE2-60 and IE2-40 production is consistent with the somewhat limited reduction in HCMV replication in the presence of siRNA targeting MAP4K4 production. IE2-60 and IE2-40 are not essential for HCMV replication, but their expression is required for optimal HCMV replication [50]. Therefore, it is perhaps to be expected that loss of either IE2-60 or IE2-40 did not lead to a drastic reduction in HCMV replication. This raises the question of whether factors that are non-essential for virus replication should be targeted in anti-viral strategies. We would argue that this should be considered, as there has been previous success in targeting proteins non-essential for HCMV replication. For example, the HCMV kinase protein UL97 is non-essential for HCMV replication [53], but an inhibitor of UL97, maribavir, has been used in phase III clinical trials in humans [54].
Our screen of GSK compounds was based upon inhibition of HCMV pp28 production [33]. However, it has been noted that deletion of IE2-60 or IE2-40 from the HCMV genome had no effect on pp28 production [50]. Thus, we propose that the machine learning approach used here is able to identify factors required for virus replication that are not directly related to production of pp28. We suggest that in our screen compounds that were assigned negative z-scores had inhibition profiles that included both MAP4K4 and kinase proteins whose inhibition reduced pp28 production.
It remains unknown how inhibition of MAP4K4 leads to a reduction in production of IE2 proteins, as there is little understanding of MAP4K4 function. A canonical view of MAP4K4 signaling in human cells involves activation of a phosphorylation cascade that includes MAP4K4 and leads to activation of the kinase JNK1 and transcriptional activation [55,56]. This may involve upstream regulation of MAP4K4 by TRAF2 [55]. We have observed an increase in TRAF2 production late in HCMV replication, similar to that which we observed with MAP4K4 in this study (data not shown). It has been reported that activation of JNK1 is inhibited in HCMV infected cells [57]. However, JNK proteins JNK1-3 (MAPK8-10) were identified in our machine learning analysis (Fig 1C). We treated HCMV infected cells with a potent inhibitor of JNK1-3, JNK-IN-8 [58], and found that this compound had very little or no effect on production of infectious HCMV (data not shown). Thus, activation of signaling that leads to JNK1 function, including that involving MAP4K4, was unlikely to be required for HCMV replication.
Other intracellular signaling pathways involving MAP4K4 have been reported [55,56], but are less well characterized. These data suggest STAT3 and NF-κB proteins are substrates of MAP4K4 [55,56]. However, we have previously demonstrated that neither canonical nor noncanonical NF-κB signaling was active in HCMV infected cells [59]. While inhibition of STAT3 can influence HCMV replication [60], we found that treatment of HCMV infected cells with either PF06260933 or CA409 had no obvious effect on STAT3 phosphorylation (data not shown). Thus, it was unlikely that inhibition of MAP4K4 in our experiments was related to the function of either STAT3 or NF-κB proteins.
Emerging evidence places MAP4K4 directly or indirectly in a number of other intracellular signaling pathways in a number of human pathologies [55,56]. Thus, it is possible that further study of MAP4K4 will uncover poorly understood, or as yet unrecognized, intracellular signaling pathways required for HCMV replication. Alternatively, we hypothesize that IE2-60 could have been a substrate of MAP4K4 in HCMV infected cells and lack of phosphorylation could have resulted in lack of protein production or detection during western blotting. As it is unclear what dictates how a protein serves as a substrate for MAP4K4, further study of HCMV infected cells could reveal novel insights into a protein that appears to be widely used in a number of contexts [55,56].
We identify PF06260933 and CA409 as lead compounds that could be developed to become highly active anti-HCMV compounds. This will be necessary as the ED50 values for both compounds were in the high micromolar concentration range despite high on-target affinity in in vitro binding assays [39,40]. It is possible that the weak ED50 of both PF06260933 and CA409 could be attributed to poor solubility, poor cell permeability and the dynamic environment in HCMV infected cells. We observed that MAP4K4 production increased over time. However, we found no obvious decrease in MAP4K4 production in HCMV infected cells treated with either PF06260933 or CA409 (data not shown). This suggested that inhibition of MAP4K4 had no effect on MAP4K4 production. Regardless, these observations imply that production of proteins thought to be novel drug targets in HCMV infected cells should be assayed to investigate a potential relationship between production of protein and anti-viral effects of a compound. We argue that increased production of a protein thought to be a drug target in infected cells should not preclude development of compounds against that target, as many effective anti-viral compounds target viral proteins whose production increases over time.
Furthermore, we argue that the seemingly high ED50 concentrations recorded here for PF06260933 and CA409 should not preclude the development of these compounds. It is not unusual that lead compounds have somewhat high ED50 values before development using medicinal chemistry approaches. Medicinal chemistry approaches to modifying CA409 have been reported [61] and may have potential to produce a novel anti-HCMV compound with a more potent ED50 value. Also, maribavir, an HCMV inhibitor that has been used in human clinical trials [54], can display ED50 values in excess of 10 μM in virus yield reduction assays [62]. Thus, there is precedent for continued study of compounds that otherwise might be discarded due to somewhat limited performance in anti-viral assays.