Identification of potential new treatment response markers and therapeutic targets using a Gaussian process-based method in lapatinib insensitive breast cancer models

Molecularly targeted therapeutics hold promise of revolutionizing treatments of advanced malignancies. However, a large number of patients do not respond to these treatments. Here, we take a systems biology approach to understand the molecular mechanisms that prevent breast cancer (BC) cells from responding to lapatinib, a dual kinase inhibitor that targets human epidermal growth factor receptor 2 (HER2) and epidermal growth factor receptor (EGFR). To this end, we analysed temporal gene expression profiles of four BC cell lines, two of which respond and the remaining two do not respond to lapatinib. For this analysis, we developed a Gaussian process based algorithm which can accurately find differentially expressed genes by analysing time course gene expression profiles at a fraction of the computational cost of other state-of-the-art algorithms. Our analysis identified 519 potential genes which are characteristic of lapatinib non-responsiveness in the tested cell lines. Data from the Genomics of Drug Sensitivity in Cancer (GDSC) database suggested that the basal expressions 120 of the above genes correlate with the response of BC cells to HER2 and/or EGFR targeted therapies. We selected 27 genes from the larger panel of 519 genes for experimental verification and 16 of these were successfully validated. Further bioinformatics analysis identified vitamin D receptor (VDR) as a potential target of interest for lapatinib non-responsive BC cells. Experimentally, calcitriol, a commonly used reagent for VDR targeted therapy, in combination with lapatinib additively inhibited proliferation in two HER2 positive cell lines, lapatinib insensitive MDA-MB-453 and lapatinib resistant HCC 1954-L cells.


Imputing missing values in the microarray data
In the microarray dataset most experiments were performed in quadruplicates, except in a few cases where only three replicates were available. In these cases, we calculated the sample mean (μ s ) and standard deviations (σ s ) using the three available replicates and then generated a random number by sampling a normal distribution (Nðm s ; s 2 s Þ) with the same mean and standard deviation. This random number was then used to represent the missing fourth replicate value [23].

Empirical estimation of the GP parameters μ i (t), Σ i
We assumed that the expression profile of an mRNA is a GP with mean function and covariance matrix μ i (t), S i respectively, i.e. m i (t)*N(μ i (t), S i ). We further assumed that the mean function itself is a GP, i.e. it has the following prior distribution: μ i (t)*N(μ 0 (t), S 0 ) where μ 0 (t), S 0 are hyperparameters. Since the distribution of the mean function is unknown a priori, μ 0 (t) was assumed to be zero, and the S 0 was assumed to be defined by a Ornstein-Uhlenbeck function (S t;t 0 0 ¼ c i exp À jtÀ t 0 j d i ) [24,25] which is an exponential functional form that describes the correlation between values of μ i (t) at different time points (μ, μ i ). The parameters δ i , ψ i , d i , v i , and s 2 i of the covariance matrices S 0 and S i can only take positive values. Therefore, they were assumed to have gamma prior distributions with location and scale parameters α = 0.1 and b = 100 respectively. The choices of a, b ensures that the corresponding gamma distributions are flat, thereby allowing a wide range of values for the above parameters to be considered during inference/estimation of μ i (t), S i . Multiplying the likelihood (see Eq 2 in the "Results" section) by the prior distributions of μ i (t), δ i , ψ i , d i , v i , and s 2 i , and integrating the resulting joint distributions with respect to μ i (t) one arrives at the following marginal posterior.
We used active set algorithm implemented in MATLAB's fmincon function (see http://uk. mathworks.com/help/optim/ug/constrained-nonlinear-optimization-algorithms.html for details) to find the values of δ i , ψ i , d i , v i , and s 2 i which maximizes the above posterior probability. The likelihood of an observed mRNA expression profile was calculated by replacing the maximum a posteriori estimates of δ i , ψ i in Eq 1, replacing μ i (t) in the likelihood function (see Eq 2 in the "Results" section) by m i ðtÞ in Eq 1, replacing d i , v i , and s 2 i in Eq 2 by their maximum a posteriori values.

Simulating time course gene expression profiles
The temporal profiles m ij (t) of the i th gene in the j th condition were generated by sampling a hierarchical GP with mean and covariance matrix μ ij (t), S ij respectively. The covariance matrix (S ij ) has two components S t;t 0 ij , S N ij to account for systematic and random variabilities, respectively. For simulation, we used a Gaussian kernel function (S t;t 0 ij ¼ exp À ðtÀ t 0 Þ 2 l ij ) to model the systematic variabilities ðS t;t 0 i Þ where l ij is the smoothing parameter. The random noise is Gaussian with 0 mean and variance s 2 ij , i.e. S N ij ¼ s 2 ij I, I is an identity matrix. The smoothing parameter l ij was sampled from a gamma distribution with shape and scale parameters a ij and b ij , respectively. The mean function μ ij (t) itself was randomly generated by sampling a GP with mean 0 and covariance matrix S t;t 0 ij . If a gene (i) is not differentially expressed between two conditions (j = 1, 2), then μ i1 (t) = μ i2 (t) and a i1 = a i2 = 15, b ij = 1. In the opposite case, i.e. when a gene (i) is differentially expressed between two conditions (j = 1, 2), we assumed μ i1 (t) 6 ¼ μ i2 (t) and these were generated by sampling two corresponding GPs. The smoothing parameters l i1 ,l i2 were generated by sampling Gamma distributions with the parameters a i1 = 15, b i2 = 1; a i2 = 25, b i2 = 1 respectively.

Gene expression data
The gene expression dataset, which has been described previously, was obtained from GSK in the form of raw data files (.cel files) [12]. Hegde et al., treated four cell lines (BT-474, SKBR-3, T47D and MDA-MB-468) with 0.1% DMSO (control), 0.1 μM lapatinib and 1 μM lapatinib at 0, 2, 6, 12, 24 hours. Not all samples that were taken had data for all time-points (see Table 1). All measurements were performed using the U133A Affymetrix human 22,000-element microarray. There were 36 arrays for each cell line, so a total of 144 arrays were used in our analysis. We performed a probe centric analysis, i.e. the intensity values of each Affymetrix probe was modelled using the aforementioned GP model. Once a probe was identified as a potential marker, the corresponding gene was identified using BioMart (http://www.biomart. org/) mapping service.

Cell-lines and reagents
SKBR-3, BT-474, MDA-MB-453, HCC 1954, MDA-MB-468, and T47D were obtained from ATCC and the JIMT-1 cells were obtained from the German Tissue Repository DSMZ. HCC 1954-L was developed in-house through continuous exposure to lapatinib [26]. BC cell lines were maintained in RPMI 1640 medium supplemented with 10% fetal bovine serum (Gibco). HCC 1954-L were maintained in culture in 1μM lapatinib, however, cells were removed from drug 7 days prior to all assays. All cell lines were maintained at 37˚C in a 5% CO 2 incubator.
Calcitriol and lapatinib were obtained from Sequoia Research Products (Pangbourne UK). Cell culture media, DMSO, ethanol were obtained from Sigma-Aldrich (Dublin, Ireland).
Lapatinib stock solution (10 mM) was prepared in DMSO. Calcitriol stock solution (10 mM) was prepared in ethanol and stored at -20˚C.
cDNA was prepared from 2 μg of total RNA using a high capacity RNA to cDNA kit (Applied Biosystems, Foster City, CA, USA). The following thermal cycling specifications were performed on the ABI 7900 Fast Real-Time PCR system (Applied Biosystems, Foster City, CA, USA); 20 s at 95˚C and 40 cycles each for 3 s at 95˚C and 30 s at 60˚C. Expression values were calculated using the comparative cycle threshold (Ct) method [27]. 18S ribosomal RNA was selected as the endogenous control. The cycle threshold (Ct) indicates the cycle number by which the amount of amplified target reaches a fixed threshold. The Ct data for 18S was used to create ΔCt values [ΔCt = Ct (target gene)-Ct (18S)]. ΔΔCt values were calculated by subtracting ΔCt of the calibrator (DMSO controls) from the ΔCt value of each target. Relative quantification (RQ) values were calculated using the equation 2-ΔΔCt. Differences in the mRNA expression level between untreated and drug treated samples were assessed using the Students t test.

Proliferation assay in vitro
Proliferation was measured using an acid phosphatase assay [28]. 2.5 to 5 x 10 3 cells per well were seeded in 96 plates and incubated for 24 hours prior to the addition of drug. After 5 days of drug treatment cells were washed with PBS. 10 mM paranitrophenol phosphate substrate (Sigma-Aldrich) in 0.1 M sodium acetate buffer with 0.1% Triton X (Sigma Aldrich) was added to each well and incubated at 37˚C for 2 hours. The reaction as stopped with 50 μL of 1 M NaOH and the absorbance was read at 405 nM (reference-620 nM). Growth of drug treated cells was calculated relative to control untreated cells in biological triplicate.

Comparing time course gene expression profiles using GP
Time course gene expression data consists of noisy measurements of mRNA levels at different time points. As discussed before, these measurements contain both systematic and random variabilities. It is necessary to take both types of variabilities into account when comparing two sets of time course measurements of mRNA expression. We developed an empirical GP model of time course mRNA profiles that captures (a) average mRNA levels at different points in time, (b) the extent of random uncertainty caused by cell to cell variability and experimental measurement noise; and (c) the temporal variability in mRNA profiles caused by internal mechanisms of gene regulation. In our formulation, a time course mRNA profile (m i (t)) is assumed to have a Gaussian distribution with a mean (μ i (t)) which represents the average mRNA levels at different time points (t), and a covariance matrix (S i ) which captures the overall variability in the mRNA profile. The covariance matrix (S i ) has two components , one (S t;t 0 i ) accounts for systematic variability between mRNA expressions at any two instants of time (t, t') and the other (S N i ) accounts for random noise. We used a modified Ornstein-Uhlenbeck function (S t;t 0 ) [24,25,29], to model the systematic variabilities ðS t;t 0 i Þ in mRNA expressions between two instants in time (t, t'), where v i and d i are two parameters [30]. The random noise is assumed to be Gaussian with 0 mean and variance s 2 i (i.e. S N i ¼ s 2 i I, I is an identity matrix). Under these assumptions, the likelihood of a set of observed mRNA profiles (m r i ðtÞ) is given by where N r is the number of replicates and m r m ðtÞ ¼ m r i ðtÞ À m i ðtÞ. μ i (t) and S i are estimated from observed data as shown in Materials and Methods (M&M).
We designed a statistical hypothesis test to see whether two sets of (m ). The comparisons were performed using a likelihood ratio test. The test was repeated for each mRNA in the dataset, producing a set of p-values. Subsequently, the Benjamini-Hochberg method was used to correct for multiple testing, and differentially expressed mRNAs were selected within a 5% false discovery rate (FDR). These mRNAs are the ones whose temporal expression patterns are significantly different between the two experimental conditions under investigation. An implementation of the above pipeline, named GEAGP, in MATLAB is freely available for download from https:// github.com/SBIUCD/GEAGP.git.

Comparing the performance of GEAGP with other methods
To evaluate and compare the performance of the GEAGP framework with other methods we generated 100 simulated datasets. Each simulated dataset consists of the expression profiles of 1000 genes measured at 10 equal intervals over a period of 100 hours in two different conditions. Four replicates were generated for each expression profile. Approximately half of the genes in each dataset were randomly chosen to have differential expression patterns between the two conditions. For the simulation, we assumed that the genes which are not differentially expressed have the same underlying pattern (average expression at each time point) with the same level of added systematic and random variabilities in each replicate across the two conditions. On the other hand, the genes which are differentially expressed were assumed to have different underlying patterns with different levels of systematic variabilities across the two conditions. The random variability, which represents biological variability and measurement errors, was kept at the same level in both conditions. Details of the data simulation process are described in the methods section. Random noise was kept at a moderate level (standard deviation = 0.2). Then, we used four different methods to find differentially expressed genes in each simulated dataset. Two of these methods, ttest & ranksum test, are widely used to analyse static gene expression data. The other two methods, GEAGP and GPREGE [17], use GP to analyse temporal expression profiles. The main differences between GEAGP and GPREGE [17] are the following: GEAGP directly estimates the mean functions of the GP from data, whereas GPREGE assumes that the mean function is a linear combination of a set of basis functions, and attempts to infer the coefficients of these linear equations. The two methods also use different mathematical formulations for parameter estimation and optimization process (see M&M and [17] for details). Prediction accuracy of each method was evaluated by calculating the area under the receiver operating characteristic (AUROC) curve (for details see [31]) based on the results produced by these methods on each dataset. AUROCs can have values between 0 and 1, and the closer these values are to 1 the better is the accuracy of the inferred networks, with AUROC = 1 being the ideal case. The average AUROCs across all 100 datasets represents the overall prediction accuracy of a method and the corresponding standard deviation represents the confidence intervals surrounding these estimates. The results of the above analysis suggest that GEAGP and GPREGE have comparable accuracies (AUROCs = 0.946 ± 0.0074 and 0.952 ± 0.0046 respectively), whereas both t-test and ranksum test were less accurate than the GP based methods (average AUROCs = 0.7538± 0.0087 and 0.7432 ± 0.0082 respectively). Although the accuracies of GPREGE and GEAGP are comparable, GEAGP has the distinct advantage of speed. In the above dataset GEAGP and GPREGE took on an average % 0.5 and % 7.5 seconds, respectively, to execute each differential expression test on a Intel Core i7 (3rd Gen) 3720QM based laptop with 20Gb RAM, i.e. GEAGP is on an average about 15 times faster than the GPREGE. To illustrate what it means for a medium scale bioinformatic analysis, consider a dataset consisting 20,000 genes and 10 experimental conditions. To find differentially expressed genes between each pair of conditions one needs to perform 10 2 ! ¼ 45 differential expression tests for each gene. Using a single core of the Intel Core i7 (3rd Gen) 3720QM processor, GEAGP will take approximately 5 days to do this analysis, whereas GPREGE will take more than 78 days. Using all four cores of the processor GEAGP will finish the task in about 1 days and 8 hours, whereas GPREGE will take more than 19 days. Thus, GEAGP provides similar accuracy as GPREGE at a fraction of the computational cost, making it more suitable for the analysis of medium to large scale time course data sets. Encouraged by the above results, we used GEAGP to analyse real gene expression datasets.

Discovering markers of lapatinib insensitivity in BC cells using GEAGP
GEAGP was used to analyse time course mRNA profiles in BC cell lines which were treated with different doses of lapatinib and DMSO [12]. In this experiment the mRNA expression profiles were measured in two lapatinib responsive (BT-474, SKBR-3) and two lapatinib nonresponsive (MDA-MB-468, T47D) BC cell lines. HER2 and EGFR, the two receptor kinases targeted by lapatinib, have different expression patterns in these cell lines ( Table 2).
HER2 is over-expressed in both lapatinib responsive cell lines SKBR-3 and BT-474 [32]. EGFR is overexpressed in SKBR3 [33], but has relatively low expression in BT-474 (~62% of 1018 cancer cell lines in the GDSC database had higher EGFR expression than BT-474) [34]. The non-responsive cell lines, T47D and MDA-MB-468 are classified as luminal A and basallike BC cells in literature [32]. Both types of BC cells have low HER2 expression. However, T47D has high HER2 expression (more than 77.55% of GDSC cell lines [34]) and moderate low EGFR expression (lower than~56% of GDSC cell lines [34]). MDA-MB-468 cells have low HER2 expression (lower than~61% of GDSC cell lines [34])) and very high level of EGFR expression (higher than~99.6% of all GDSC cell lines [34]). We also looked at the mutation spectrum of these cell lines using the COSMIC database (cancer.sanger.ac.uk). Data on some of the mutations which were previously implicated in lapatinib resistance are shown in Table 2. Among the responsive cell lines, SKBR3 has a p53 mutation, whereas, BT-474 has PI3K, ERBB2 and p53 mutations. Among the non-responsive cell lines, T47D has a p53 and PI3KCA mutation, whereas MDA-MB-468 has p53, PTEN and RB1 mutations. None of the cell lines have KRAS or BRAF mutation. Therefore, there is no particular set of mutations that separate the above responsive and non-responsive cell lines. This suggest that the expressions of EGFR and HER2 receptors and/or the mutation statuses do not explain the responsiveness of the above cell lines to lapatinib treatment. Hence, these cell lines are reasonable choices for studying lapatinib response in BC cells.
The cells were treated with 0.1 μM DMSO, 0.1 μM or 1 μM lapatinib [12]. The measurements were taken at 0, 2, 6, 12 and 24 hours for DMSO treated cells; at 2, 6, 12, 24 hours for cells treated with low doses (0.1μM) of lapatinib (except for SKBR-3 cells for which no measurements were available at 2 hours); and at 2, 6 and 12 hours for cells treated with high doses (1.0 μM) of lapatinib (except for BT-474 and SKBR-3 cell lines for which measurements were not available at 2 hours). Each experiment was replicated 3-4 times. Missing replicates (where only 3 replicates were available instead of 4) were imputed based on the intensities observed in the other replicate experiments (see M&M for details). The resulting dataset was then used to identify potential markers of lapatinib insensitivity in BC cells.
We assumed that genes which satisfy all of the following criteria are potential markers for lapatinib insensitivity: • Have similar expression profiles within the responsive or non-responsive groups of cell lines, but are differentially expressed between these two groups.
• Have altered expression in response to both doses of lapatinib treatment in responsive cells but not in non-responsive cells.
To find genes which satisfy the above criteria we performed a two stage analysis. In the first stage each mRNA expression profile was subjected to the following hypothesis tests (Fig 1): • Responsive vs non-responsive groups (Test 1): The objective of this test was to identify genes which have differential expression patterns between DMSO treated responsive (SKBR-3, BT-474) and non-responsive cell lines (MDA468, T47D). Here we tested the hypothesis  The genes which passed the first test and at least one of the second and third tests, but fail both of the last two tests within a 5% false discovery rate were assumed to satisfy the criteria of genes selectively regulated by lapatinib in responsive, but not in non-esponsive cells. This analysis selected 519 genes out of 22000 genes as potential markers of lapatinib response (Table A in S1 File). In a second step, these genes were further analysed using established bioinformatics methods as described below.

Bioinformatics analysis of markers of lapatinib insensitivity in BC cells
First, we performed gene ontology enrichment analysis on the 519 selected genes using DAVID [35,36]. The enriched gene ontology terms were then summarized and visualized by REVIGO (Fig 2A) [37]. The most enriched gene ontology terms fall into seven main categories, i.e. (i) cell cycle, (ii) apoptosis or programmed cell death, (iii) DNA damage, (iv) RNA splicing, (v) intracellular transport, (vi) signal transduction and (vii) chromosome assembly. These biological processes are usually found altered in cancer cells, and therefore it is not surprising to see genes which participate in these process responding differentially to lapatinib treatment in responsive and non-responsive BC cells. We also performed pathway enrichment analysis using DAVID [35,36] and employing KEGG [38] and PANTHER pathway databases [39]. Several signalling pathways that play key roles in cancer, e.g. EGFR pathway, FGF pathway, spliceosome, proteasome, FcγR mediated signalling, were also found to be enriched in the selected genes (Fig 2B). FGF and FcγR share several components with EGFR pathway which is activated by the EGFR family of receptors, two of which are targets of lapatinib. Ten genes, RAC1, STAT1, YWHAZ, PPP2CB, MAP2K1, MAPK1, PPP2R5B, PLCG1, PPP2CA, STAT5B, belonging to the EGFR pathway were in the list of potential markers of lapatinib insensitivity.
We then looked for known transcription regulatory interactions (TRIs) among the selected genes. Human genome wide known TRIs were retrieved from two sources, the ENCODE database containing ChIP (chromatin immunoprecipitation) on ChIP data [40][41][42] and the HTRIB database [43] which contains experimentally validated TRIs. All TRIs, where both the transcription regulator and its target gene were identified as potential markers for lapatinib insensitivity in the aforementioned GEAGP analysis, were retrieved. The ensemble of these TRIs (Fig 2C) represents part of the known human transcription regulatory network which is dysregulated in lapatinib non-responsive BC cells. Interestingly, this dysregulated network consists of a transcriptional module formed by eight transcription factors STAT1, CTCF, EP300, FOXA1, VDR, MAPK1, TFAP2C, E2F1 (Fig 2C) which regulate each other and a large number of the genes selected by the GEAGP analysis. Among these transcription factors, STAT1, MAPK1, FOXA1, TFAP2C, E2F1 are associated with resistance or insensitivity to several anticancer therapies including tyrosine kinase inhibitors such as lapatinib in BC cells [44][45][46][47][48][49][50][51]. CTCF and EP300 also have been linked to BC. The putative tumour suppressor gene EP300 is frequently mutated in BC [52,53], while CTCF overexpression is correlated with resistance to apoptosis in BC [52,53]. Polymorphism of the VDR (Vitamin D Receptor) was previously associated with BC [54], but its expression has no known association either with BC or with lapatinib insensitivity. Since it was found to be part of the same transcriptional module whose constituents are associated with lapatinib insensitivity, VDR could be a potential target for modulating lapatinib response in BC cells which do not respond to lapatinib.
We further investigated the protein-protein interaction (PPI) network formed by the protein products of the selected genes. The STRING database [55] was used to find PPIs between the products of the selected genes ( Fig 2E). Interestingly, many of the above transcription factors (MAPK1, TFAP2C, EP300) act as large hubs in the PPI network. They were also found to interact with each other, not only at the transcriptional level as found in the TRI network, but also at the protein level (Fig 2E-2G). In the PPI networks, VDR interacts with relatively few proteins ( Fig 2F). However, its interactors include STAT1, MAPK1 and PPP2CA, all of which have been associated with lapatinib response [44][45][46]56]. This strengthens the case for VDR as a potential target for influencing lapatinib insensitivity. Another interesting protein is EIF2S2. It interacts with several large hubs such as MAP2K1, ELAVL1 and PAPOLA1 in the PPI network (Fig 2G), and therefore may play important roles in insensitivity of BC cells to cancer drugs.
The above analysis suggests that many of the genes which were identified by the GEAGP algorithm have association with BC and lapatinib insensitivity. Additionally, it also demonstrates that some of these genes play important role in the transcriptional and protein interaction networks that are dysregulated in lapatinib insensitive BC cells, and therefore can be potential targets for treating these cells.

Validating the markers of lapatinib insensitivity using GDSC data
We analysed the GDSC database [34] to see if the expression of the genes selected by the GEAGP algorithm correlates with responses of BC cells to HER2 and/or EGFR targeted therapies. Any such correlation will strengthen their potential as markers for BC cell response to HER2 and/or EGFR targeted therapies. GDSC database [34] contains basal gene expression and drug response data for fifty BC cell lines belonging to different histological subtypes and five drugs (lapatinib, erlotinib, EKB-569, afatinib and CP724714) that target HER2 and/or EGFR tyrosine kinases. Expression data was available for 458 of 519 genes selected by the GEAGP algorithm. We calculated pearson correlation between the mRNA expressions of each gene and the area under the dose response curves of each cell-line for each drug. mRNA expressions of 120 genes out of 458 had statistically significant (p-value<0.05) correlation with the responses of BC cells to at least one of the five drugs ( Table B in S1 File). When corrected for false discovery rate (FDR), 32 of the 120 genes were selected at 10% FDR ( Table B in S1 File). Most of these genes' expressions correlated with response to afatinib which, like lapatinib, targets HER2 and EGFR. There were relatively little correlation between the expressions of these genes and lapatinib response in GDSC dataset [34]. This is most likely due to lack of lapatinib response data which is available for only 13 out of 50 BC cell lines. Nevertheless, the above analysis suggests that many of the genes that were identified as markers of lapatinib insensitivity in T47D and MDA-MB-468 cells, can indeed be generally used as markers of BC cell response to a variety of HER2 and/or EGFR targeted therapies.

Experimental validation of selected genes
It is not feasible to experimentally validate all 519 genes which were identified in the above analysis due to time and cost factors. To select a smaller subset for experimental validation, we first compared our list of potential lapatinib insensitivity markers to that determined in a previous study by O'Neill et al. [57]. O'Neill et al. used multiple parallel t-tests to analyse part of the same microarray dataset used here for identifying potential lapatinib biomarkers. 25 genes were found to be common between our list and that of the O'Neill et al.'s study. These common genes include • VDR, EIF2S2 and SOCS3 (regulator of STAT1) which were found to play important roles in the PPI and transcriptional networks in the bioinformatics analysis • ANKRD10, PTP4A1, PTP4A2, VDR, ATP2B4 which had statistically significant correlation with BC cell responses to HER2 and/or EGFR targeted therapies Therefore, this list includes a mix of genes identified in the bioinformatics analysis, GDSC data [34] analysis, and a set of new lapatinib insensitivity markers. We further added two more genes to this list, phosphatase PPP2CA and transcription factor TFAP2C, both of which was identified to be important in the bioinformatics analysis and the later also has statistically significant correlation with BC cell responses to HER2 and/or EGFR targeted therapies (Table B in S1 File).
The differential gene expression levels of the resulting 27 genes (see Table 1) and an endogenous control gene (18S ribosomal RNA) were measured in mRNA from SKBR-3, BT-474, T47D and MDA-MB-468 cell lines treated with DMSO or lapatinib (1 μM) for 16 hours, using real time quantitative polymerase chain reaction (qRT-PCR). The data obtained from the above experiment were then statistically analysed to examine whether expression of these genes was differentially altered in response to lapatinib in responsive and non-responsive cell lines. Relative expression values were calculated using the comparative threshold cycle (C T ) method [27]. First, we subtracted the cycle threshold (C T ) values of the housekeeping gene (18S) from those of the selected genes [58]. The resulting values (ΔC T ) represent the responses of each gene to DMSO/ lapatinib and were analysed using ANCOVA (Analysis of Covariance) analysis [59]. For the ANCOVA analysis, the ΔC T values of each gene were used as response variables, the dose of lapatinib was used as predictor variable (1 μM lapatinib for treated cell, 0 μM for DMSO treated control cells), and the responsive or non-responsive (to lapatinib) status of the cell lines were used as covariates. The p-values calculated by the ANCOVA analysis were then subjected to correction for multiple testing. For this purpose, a false discovery rate (FDR) corresponding to each p-value was calculated using the Benjamini Hochberg method. Genes with FDR lower than 0.1 (representing 10% false discovery rate) were considered to pass the validation test. We used MATLAB functions aoctools and mafdr for the ANCOVA and FDR analysis, respectively. 16 out of 28 genes passed the test and their changes (in terms of average log-fold change with respect to DMSO treated cells) to lapatinib treatment in all four cells lines are shown in Fig 3.

Identifying potential therapeutic targets for treating lapatinib insensitive BC cells
To find potential targets, we focussed on the genes which satisfy the following criteria: • identified by GEAGP as potential lapatinib insensitivity markers • have statistically significant correlation (at <10% FDR) with BC cell response to HER2 and/ or EGFR targeted therapy.
32 genes (Table B in S1 File) satisfy these conditions. To further narrow down the list of potential targets we investigated whether the expressions of any of these genes correlate with BC patient survival. Ideally, we would like to analyse data from BC patients who was treated with HER2 and/or EGFR targeted therapies. But, due to lack of sufficient details on treatment history of patients, we analysed data from HER2 negative, and basal type BC patients who typically do not respond to lapatinib therapies [60,61]. In some cases, HER2 positive patients do not respond to these types of therapies due to lack of EGFR expression or acquired resistance [62]. Therefore, we also included HER2 positive patients in our survival data analysis. We used Kaplan-Meier plot to find association between expression of each gene in the list (Table B in S1 File) and recurrence free survivals of HER2 negative, basal-like and HER2 positive patient cohorts. We used kmplot [63] web tool which integrate data from several public sources to perform this analysis. 26, 21 and 23 genes (out of total 32) were associated at 10% FDR with the survival of HER2 negative, basal and HER2 positive patient cohorts (Table C in S1 File). 14 genes, SDS, UCP2, KPNB1, REEP5, ALAS3, ZNF10, SDC1, GPAA1, RHOB, EXOSC4, FOXA1, VDR, GLANT7, MLPH, had significant association with the survival of all three patient cohorts. We did not find any reference to REEP5, ALAS3, ZNF10, EXOSC4 and GLANT7 in existing BC literature. Therefore association of these genes to BC, especially lapatinib insensitivity of BC cells, may be novel finds of this study. SDS, UCP2, KPNB1, SDC1, GPAA1, and MLPH had been sporadically studied in the context of BC [64][65][66][67][68][69]. But, to the best of our knowledge, their role in lapatinib insensitivity of BC cells are not known. VDR, FOXA1 and RHOB are well studied in the context of BC. However, to the best of our knowledge, their potential as a therapeutic target for treating lapatinib insensitive BC cells was not previously investigated. We selected VDR for further experimental testing.

Investigation of VDR as a therapeutic target
Vitamin D Receptor (VDR), a member of the nuclear receptor family of proteins, acts as a receptor for active vitamin D as well as a DNA-binding transcription factor. In that respect, it is similar to more familiar cancer targets such as oestrogen receptor, androgen receptor and progesterone receptor [70]. Patient survival analysis using data from kmplot database [63] suggests that VDR expression does not correlate with recurrence free survival of BC patients in general (Fig 4A). However, HER2 negative BC patients (typically do not respond to lapatinib) with high VDR expressions are more likely to survive longer than those with low VDR expressions ( Fig 4B). Interestingly, basal-like BC patients, most of whom have low or no HER2 expression show the opposite trend, that is, patients with low VDR expression are more likely  to survive longer than patients with high VDR expression (Fig 4C). HER2 positive patients exhibit similar trend as the basal-like patients (Fig 4D). Furthermore, using Regulome explorer (http://explorer.cancerregulome.org/) we investigated whether VDR expression has any significant association with other genomic features of BC patients. It was found to have significant correlations with protein expression of Estrogen receptor and the expressions of twenty two miRNAs (Fig 4E), most of which had been previously implicated in proliferation, apoptosis, senescence, resistance to drugs including HER2 targeted therapies [71,72]. This further strengthens VDR's potential as therapeutic target for lapatinib insensitive BC patients. One of the most commonly used commercially available reagents for VDR targeted therapy is Calcitriol, the active hormonal form of Vitamin D 3 [73]. It primarily functions by binding to and activating VDR leading to signalling and transcriptomic changes [73]. However, the effect of calcitriol depends on dose, frequency and time lapsed following treatment. For instance, low levels of calcitriol have been shown to possess pro-tumorigenic effects, whereas high doses of calcitriol, especially when applied in pulses, were shown to have anti-proliferative and antitumorigenic effects in several types of cancer cells [74][75][76]. Although at the protein level calcitriol activates VDR signalling by binding to it, it was also shown to reduce the mRNA expression level of VDR [77][78][79]. Recently, combinations of calcitriol and antiestrogen or gefitinib (a tyrosine kinase inhibitor), respectively, were found to reduce proliferation and increase apoptosis of BC cells more effectively than any of these compounds alone [80,81].
We investigated whether the combination of calcitriol and lapatinib can be used as an effective treatment strategy for lapatinib insensitive BC patients. We selected four cell lines for our experiment. Two of these, MDA-MB-453 and JIMT-1 are HER2 positive but innately insensitive to lapatinib. Whereas HCC 1954-L cell lines, which was developed by exposing HER2 positive HCC 1954 cells to lapatinib for a long period of time, has acquired resistance to lapatinib. The clinico-pathological subtypes and the expression levels of HER2, EGFR and VDR which are targets of lapatinib and calcitriol are shown in Table 3.
Among the innately lapatinib insensitive cell lines, clinico-pathological subtype of MDA-MB-453 is fiercely debated in literature, while some suggest it is a triple negative BC cell with no or low HER2 expression [82], others reports it to be HER2 positive with overexpressed HER2 [83]. We compared HER2 expression in MDA-MB-453 cells with other cancer cell lines in GDSC database [34] and found that HER2 has higher mRNA expression in MDA-MB-453 than~97.5% of all (1018) cell lines in this database. JIMT-1 is known to be HER2 positive [84] which is corroborated by the GDSC data [34] (Table 3). While EGFR is over expressed in MDA-MB-453, it is significantly under-expressed in JIMT-1 ( The receptor expression data were collected from the GDSC database [34]. The expression data are shown in percentile, here X percentile means X percent of all cell lines in the GDSC database [34] had equal or less expression than the above cell lines. https://doi.org/10.1371/journal.pone.0177058.t003 expressions in MDA-MB-453 and JIMT-1 cell lines and have higher expression in JIMT1 than in MDA-MB-453 according to the GDSC database. However, in our qRT-PCR analysis VDR has higher expression in MDA-MB-453 than in JIMT-1 cells ( Figure A in S1 File). Despite overexpressing at least one of HER2 or EGFR neither of these cell lines respond to lapatinib. Following 16 hours of treatment with 1 μM lapatinib neither cell-line showed a significant change in VDR gene expression (Fig 4F), reflecting the same behaviour as the lapatinib insensitive T47D and MDA-MB-468 cell lines, which were used to generate the time course gene expression data. HCC1954, which was used to develop HCC 1954-L with acquired lapatinib resistace, is known to be HER2 positive and has overexpressed HER2, EGFR, and VDR ( Table 3).
On the other hand, among the isogenic HCC1954 and HCC1954-L cells, only the latter displayed an additive effect of the lapatinib and calcitriol combination (Fig 4I and 4J). Based on qRT-PCR analysis, VDR mRNA expression is increased in the HCC1954-L cells (1.7 fold) ( Figure B in S1 File), which may contribute to benefit observed with calcitriol treatment.

Discussion
The rapid advent of omics technology made it possible to investigate the molecular networks of living cells in unprecedented detail. Consequently, it is now possible to track the behaviours of thousands of genes simultaneously over a period of time. However, identifying genes which show different temporal behaviour under different experimental conditions is not straightforward. Here, we developed a statistical tool that addresses this question and applied it to study the gene expression changes associated with lapatinib resistance in BC cells. Using existing datasets, our study identified several potential biomarkers of lapatinib resistance in BC, some of which were experimentally validated.
We identified more than 500 genes as potential lapatinib resistance markers, many of which can also be potential targets for treating lapatinib and, more generally, HER2-targeted TKI resistant BC cells. One such target identified in this study is VDR. We experimentally verified whether VDR can be a viable target for combination therapy for treating HER2 positive lapatinib resistant (innate and acquired) BC patients. We selected two cell lines for the investigation of innate insensitivity, MDM-MB-453 where both HER2 and VDR are relatively highly expressed, and JIMT-1 where HER2 is highly expressed but VDR has significantly lower mRNA expression compared to MDA-MB-453. The combination of lapatinib and calcitriol inhibited proliferation more effectively than individual treatment by either reagents in MDA-MB-453, but not in JIMT-1. In the acquired resistance setting we used the HCC1954-L cells which were developed by exposing HCC1954 cells to lapatinib for 6 months. The combination of calcitriol and lapatinib inhibited proliferation of the lapatinib resistant HCC1954-L more effectively than either agent alone. These results suggest that the combination of calcitriol and lapatinib can be a potential combination therapy regimen for patients who express HER2 and VDR but do not respond to lapatinib treatment. However, these are preliminary results and need further experimental validation in a larger panel of cell lines and animal models to assess the clinical viability of this drug combination.
Drug resistance is a complex process which involves dynamic adaptations of the interplay between signalling and transcriptional networks of cancer cells induced by drug treatments. A system level understanding of transcriptional and signalling networks is required to design therapeutic strategies targeted at overcoming such resistance. Such insight can be gained by reconstructing quantitative models of transcriptional and signalling network from experimental data using network reconstruction methods [50,[85][86][87][88][89][90]. The reconstructed models can then be used to simulate the effects of drug treatments on signalling and transcriptional networks of drug resistant cancer cells. We are currently pursuing this avenue to identify potent combination therapy regimens for treating BC patients.