Intermittent Androgen Suppression: Estimating Parameters for Individual Patients Based on Initial PSA Data in Response to Androgen Deprivation Therapy

When a physician decides on a treatment and its schedule for a specific patient, information gained from prior patients and experience in the past is taken into account. A more objective way to make such treatment decisions based on actual data would be useful to the clinician. Although there are many mathematical models proposed for various diseases, so far there is no mathematical method that accomplishes optimization of the treatment schedule using the information gained from past patients or “rapid learning” technology. In an attempt to use this approach, we integrate the information gained from patients previously treated with intermittent androgen suppression (IAS) with that from a current patient by first fitting the time courses of clinical data observed from the previously treated patients, then constructing the prior information of the parameter values of the mathematical model, and finally, maximizing the posterior probability for the parameters of the current patient using the prior information. Although we used data from prostate cancer patients, the proposed method is general, and thus can be applied to other diseases once an appropriate mathematical model is established for that disease.

In clinical practice, a medical doctor often chooses a treatment option for a current patient based on prior limited observations and experiences gained from prior patients. Use of a mathematical model based on data obtained from many prior patients as well as use of computational technology would permit more objective decision making for the individual patient. However, such a method has not been proposed yet as far as we are aware of.
In this paper, we apply a mathematical model derived from data of men treated with intermittent androgen suppression (IAS) [21][22][23] to current patients to assess how the model performs. First, we construct the prior information for the parameters of the mathematical model by fitting data from patients treated over longer intervals. Then, we fit the data from a shorter time course obtained from the current patient using the Bayesian formula [24].

Mathematical model of disease
We assume that a mathematical model of disease is given by a dynamical system. Namely, suppose that the state vectorx 2 R r represents the internal state of the disease, where r is the number of state variables. In addition, assume that the dynamical system f m q : R r ! R r is given by Hereq is a set of parameters for a patient of the disease, and m shows whether a treatment is on (m = 1) or off (m = 0). We assume that given a set of initial conditionsxð0Þ, a set of parametersq, and a series of times for starting and stopping the treatment, then the solutionxðtÞ exists uniquely. Although we cannot observexðtÞ directly, we can have some measurement gðxðtÞÞ through an observation function g : R r ! R. Typically, gðxðtÞÞ corresponds to measurement of some biomarker. To optimize a series of times for deciding when we stop and/or resume the treatment in the future, it is important to decide what are a set of initial conditionsxð0Þ and parametersq. We denote a set of ðxð0Þ;qÞ byp, which is called a set of combined parameters. Our problem is how to determinep given a series of times for starting and/or stopping the treatment, and a short series of observationsõ ¼ fo k 2 R j k ¼ 1; 2; . . . ; Kg corresponding to the values fô k ¼ gðxðt k ÞÞg for the generated orbit at discrete days {t k jt k ! 0, k = 1, 2, . . ., K}.

Bayesian formula
We employed the Bayesian formula [24] to estimate a set of combined parameter values for the current patient by using the prior information obtained from the past other patients. The Bayesian formula enables us to estimate the probability for the combined parameter values given a series of observations from the prior probability for the combined parameter values and the probability for the series of observations given the combined parameter values. Initially, we obtained the prior probability for the combined parameter values by fitting entire series of observations for the past other patients. Then, for each patient for whom we now want to estimate their set of combined parameters, we employ a dynamical model of disease to obtain the probability for observations given a set of combined parameter values. Then, we maximize the probability for the combined parameter values, and hence the likelihood for the generated orbit given the series of observations to obtain the combined parameter values. Mathematically we use the Bayesian formula [24] to describe the posterior probability Pðp jõÞ of combined parametersp given observationõ of biomarker time series by combining the conditional probability Pðõ jpÞ of observationõ given the combined parametersp with the prior probability PðpÞ for the combined parameters [24] as follows: By taking the logarithm, we obtain the following equation with a constant: In addition, suppose that there are constraints C a ðpÞ ! 0 for a = 1, . . ., A and D b ðpÞ ¼ 0 for b = 1, . . ., B that realize physiological appropriateness for the disease. Using 10000ð1 À eÞ; e < 0; ð4Þ we combine these constraints in QðpÞ using the penalty method [25] as Observe that QðpÞ is non-negative and attains QðpÞ ¼ 0 when all the constraints are fulfilled. See Eq 22 for a more concrete example of QðpÞ. Although we used the penalty method to implement these constraints, readers might use the method of Lagrangian alternatively. By flipping the sign of the right-side of Eq 3, we minimize À log PðõjpÞ À log PðpÞ þ QðpÞ ð6Þ to obtain a set of the combined parametersp. Namely, the optimal and personalized combined parametersp can be written aŝ Through the minimization, we can obtain a set of combined parameters that realizes the physiological appropriateness discussed above, as well as achieving that the corresponding generated orbit fxðtÞg is the most likely given the observed datasetõ. There are many possible ways to prepare the prior distribution PðpÞ. In this paper, we fit time series of biomarker for prior patients, obtain the mean p and the co-variance matrix S, and approximate PðpÞ, for our first step, by a multivariate Gaussian distribution [24] as where n is the number of the combined parameters. Let o i andô i be the actual value for the ith observation and its estimation by the mathematical model. Because the prediction error Àlog Pðõ jpÞ can be written as plus a constant value if we assume that the observational noise is Gaussian with mean 0 and standard deviation σ, Eq 7 can be rewritten aŝ We solve this minimization problem using differential evolution [25], a heuristic method for minimization. Differential evolution is one of variants of genetic algorithms [25].

Intermittent androgen suppression as an example
In this paper, we use a mathematical model derived from prostate cancer patients treated with IAS [15] as an example. Prostate cancer is dependent on testosterone for growth and therefore, suppression of testosterone production (usually accomplished with a GnRH analog) is the mainstay of therapy for metastatic prostate cancer. It is also used in men who have only biochemical evidence of recurrent prostate cancer after either surgery or radiation therapy. While androgen suppression is usually very effective early in the disease, eventually the cancer cells acquire the ability to survive even when serum androgen levels are in the castrate range. This disease state is called castration resistant prostate cancer (CRPC). Intermittent androgen suppression delayed time to CRPC compared to continuous androgen suppression in animal models and has been studied in numerous clinical trials in an effort to delay time to CRPC as well as to improve quality of life [26,27]. In clinical trials of IAS, androgen suppression is generally stopped after 9-12 months of treatment and is resumed when PSA increases above a pre-specified threshold.
Although there are many mathematical models [9][10][11][13][14][15][16][17][18][19][20] proposed until now, we used the model of Ref. [15] because there are two published papers [28,29] that compared the model of Ref. [15] with that of Ref. [19], concluding that the model of Ref. [15] is relatively better, especially in quality on whether relapse can be reproduced or not (the model of Ref. [19] could not reproduce the relapse of prostate cancer under the assumption of continuous androgen suppression with the combined parameters obtained after fitting the real data), while Ref. [28] showed that the model of Ref. [15] outperformed the model of Ref. [19] when they were compared in the mean square prediction errors although their difference in predictability is not yet significant. We have not yet compared the model of Ref. [15] with other models such as the one in Ref. [18] because the model of Ref. [18] contains many parameters and might need longer data to fit them while the lengths of our datasets are limited.
In the mathematical model of Hirata et al. [15], multiple pathways of castration resistance are summarized into two categories: reversible (epigenetic) and irreversible (genetic). In the model, there are three variables: the first one x 1 for androgen dependent cancer (AD) cells, and the other two x 2 and x 3 , androgen independent (castration resistant) cancer (AI) cells with reversible and irreversible changes, respectively. When we apply hormone therapy, AD cells whose population is expressed by x 1 may change to the AI cells whose populations are expressed by x 2 and x 3 . In addition, the AI cells in x 2 can change to the AI cells in x 3 . If we stop the hormone therapy, the AI cells in x 2 may change back to x 1 , AD cells. However, the AI cells in x 3 would not change back to x 1 because of genetic mutation. Therefore, the mathematical model can be described as for the on-treatment periods and d dt x 1 x 2 for the off-treatment periods. The parameter w m i;j means the contribution of x j to the growth of x i for the treatment period labeled by m (m = 1 during the on-treatment periods and m = 0 during the off-treatment periods). We assume that we can observe the PSA level which is given by Patients are classified into 3 types [15,30] (see Fig 1): Type (i) is for patients whose relapse, namely the increase in PSA, can be prevented by IAS; Type (ii) is for patients whose relapse cannot be prevented eventually but can be delayed by IAS later than the continuous androgen suppression (CAS); Type (iii) is for patients for whom CAS is more desirable than IAS in a long run. In this paper, we follow the criteria proposed in Ref. [30].
When we fit the datasets for Eqs 11 and 12, we use the Euler approximation and describe the dynamics with difference equations as for the on-treatment periods and for the off-treatment periods, where d m i;i ¼ ð1 þ w m i;i DtÞ, d m i;j ¼ w m i;j Dt for i 6 ¼ j, and we set Δt = 1 (day). In addition, we enforce some constraints to realize the biological appropriateness during fitting the datasets for the mathematical model [15]. These constraints include the non-negativity for the parameters and initial conditions, the bounds for changes in the cell type within a day, and the possibility to relapse if we continue the hormone therapy by CAS. We use the  [15]. These criteria used here were originally proposed in Ref. [30]. penalty method [25] to enforce these constraints. These constraints can be written as for each i, j, m, Therefore, herep consists of parameters d m i;j and initial conditions x i (0) in this example. We used differential evolution [25] for solving the minimization problem of Eq 7, and estimating the optimal set of the combined parametersp. See the supporting information for our program implementing the proposed method in C. We can clearly see whether or not all the constraints are satisfied because the cost function becomes smaller than 10000 if all the constraints are satisfied.

Data
We used three datasets in this study. The first dataset is that obtained from the Canadian Phase II clinical trial [22,23] of IAS (There are neither registry name nor registration number for this trial because it is too old). The Health Clinical Research Ethics Review Boards of each participating center approved this study and all patients signed written informed consent. This study was previously analyzed in Refs. [15,16,30,31]. We divided 72 patients into 2 groups. The combined parameters of the model for the first 36 patients were fitted using the whole datasets in the way described in Ref. [15]. Then, we obtained the mean vector and the covariance matrix for their combined parameters to construct the prior information. We used the remaining 36 patients for testing the proposed method.
The second and the third datasets are from the studies performed in Japan and the United States. The ethics committees of JCHO Tokyo Shinjuku Medical Center and University of Tokyo approved the Japanese study of 26 patients. In the Japanese study, all the patients provided informed consent orally. The ethics committee of JCHO Tokyo Shinjuku Medical Center approved that the oral consent is sufficient because the study was part of usual clinical practice, retrospective, and did not intervene their treatments. The medical doctor for each patient documented the patient's oral consent by writing down his agreement on his medical record. This Japanese study does not have the registry name or the registration number because it is not a clinical trial. This dataset was previously analyzed in Ref. [30]. The University of Washington Institutional Review Board approved the American study, an on-going Phase II study of IAS [32,33] that included 79 patients. The NCI number for this American trial is NCT00223665. In the American study, all the patients provided written informed consent. This American study was also analyzed previously in Ref. [30]. These datasets comprised of a total of 141 patients were also used for testing the proposed method. We set s ¼ 1= ffiffi ffi 2 p because the mean and the standard deviation for the residuals per measurement for the first 36 patients using the fitting method of Ref. [15] were 0.64±0.29, including 1= ffiffi ffi 2 p within this error bar.

Results
We compared the classifications obtained from fitting the whole data by the method of Ref. [15] with those obtained from fitting the whole data by the proposed method. These two groups of the classifications are shown in Tables 1 and 2, respectively. Although the method of Ref. [15] lost the correlation between the classifications by the mathematical model and those by the medical doctors when we restricted the datasets to the second half of patients (see Table 1, especially the p-value of 0.13), the proposed method kept the correlation by using the prior distribution (see Table 2, which achieved the smaller p-value less than 0.001 due to the fact that most patients without relapse were classified into Type (i), while most patients with metastasis and androgen independence were classified into Type (ii)). In addition, Table 3 shows that the classifications by the proposed method and the whole dataset retain the typical characteristics of the classifications by the method of Ref. [15] because more than the half of the patients concentrated on the diagonal elements within this table. Therefore, the proposed method seems to work well. Table 1. Results obtained by the original method used in Ref. [15]. MD and MM [15] correspond to classifications by medical doctors and those by mathematical models using the fitting method of Ref. [15], respectively (the p-value obtained by Fisher's exact test implemented in R (we applied the same method for obtaining the p-values in the other tables): 0.13). For example, there were 35 patients who were classified to "Without relapse" by medical doctors and were classified to "Type (i)" by the mathematical models using the original method of Ref. [15]. Then, we shortened the lengths of PSA time series for the estimation of the prior distribution and used only the first one and half cycles of IAS, meaning the first on-and off-treatment periods and the second on-treatment period. Examples of fitting and prediction using the proposed method are shown in Fig 2. Even when we limited the observation to the first one and half cycles, the subsequent PSA values were predicted well as similarly as Fig. 5 of Ref. [15]. Under the proposed method, we compared the classifications obtained using the first one and half cycles with those obtained using the whole data set (Table 4). We found that these two groups of the classifications matched well. We also compared the classifications by fitting the first one and half cycles with the classifications by the medical doctors (Table 5). We found that their correlation persisted because the p-value was as small as 0.005. In addition, the classifications obtained by the proposed method using the first one and half cycles are consistent with the classifications obtained by the method of Ref. [15] using the whole dataset (Table 6) because in more than the half of patients, the type obtained by the proposed method using the first one and half cycles agreed with the type obtained by the method of Ref. [15] using the whole data. These results mean that the first one and half cycles can provide sufficient information for fitting the datasets for the mathematical model with the proposed method.
However, when we only used the first half cycle of IAS using the proposed method (meaning the first 9-12 months of androgen suppression), we could not obtain the combined parameter values of the mathematical model in such a way that the two groups of the classifications are statistically correlated (see Fig 3, and Tables 7 and 8). In particular, the correlation between the classifications by the mathematical model and the classifications by the medical doctors was lost (Table 8).
We tested whether the period before starting the first cycle of the on-treatment period helped to estimate the combined parameters or not. We tested this hypothesis by generating artificial data simulating the off-treatment period of 16 weeks followed by the first half cycles of IAS for the second half of Canadian patients. However, the classifications obtained from the Table 2. Results obtained by the proposed method. MD and MMWD correspond to classifications by medical doctors and those by mathematical models with the whole data, respectively (the p-value < 0.001).  Table 3. Comparison between the classifications obtained by the proposed method with the whole data (MMWD) using the classifications by mathematical models using the fitting method of Ref. [15] (MM [15]) (p-value < 0.001).

MMWD
MM [15] MMWD    Table 6. Comparison between the classifications obtained by the proposed method with the first one and half cycles (MM1H) and the classifications by mathematical models using the fitting method of Ref. [15] (MM [15]) (p-value: 0.029).
MM [15] MM1H    artificial data did not match the classifications of the combined parameter values with which the artificial data were generated (Table 9).

Discussion
There are some correlations between pairs of the combined parameters obtained by fitting time courses of the tumor marker for the mathematical model. By enforcing the prior distribution of the combined parameters constructed by the whole datasets of the past other patients, we can eventually shrink the space of the combined parameters to the space where the combined parameters for the past other patients are distributed. This shrinkage is why the proposed framework enables us to obtain the optimal set of the combined parameters from a short PSA time course of the patient who is currently being treated. We tested whether or not the distribution of combined parametersp follows a multivariate Gaussian distribution by using the skewness and the kurtosis proposed by Mardia [34]. We found that neither the skewness nor the kurtosis was consistent with a multivariate Gaussian distribution; both p-values were less than 0.001 using the χ 2 distribution and the normal distribution, respectively. This finding implies that we may be able to estimate a set of combined parametersp better if we use a more sophisticated distribution such as a mixture of Gaussians [24]. We will follow this direction of research in the future.
There are two possible reasons why Tables 7 and 8 were not statistically significant: The first possible reason is that the first on-period and/or the first on-period with the period before starting the hormone therapy were too short to identify the types of patients; the second possible reason is that the number of patients used for this study was too small to conclude the usefulness for the first on-period and/or the first on-period with the period before starting the  Table 9. Comparison between the classifications used for generating the artificial data (MM) and those obtained by fitting the artificial data for the mathematical model using the proposed method (MMAD). The artificial data were generated by simulating the treatment schedule of the off-treatment period of 16 weeks followed by the first half cycle of IAS (the p-value: 0.18). Personalized Estimation of Parameters for IAS Therapy hormone therapy. We will answer this point in a future study by analyzing a larger dataset from a phase III trial [26]. The problem considered in this paper, namely estimating the combined parameters for the patient who is currently being treated using information of prior patients, is one of the typical problems we encounter in the context of mathematical medicine. We have another but complementary paper [35] on this topic, within which we extended a method of machine learning so that we can find some patients whose cancer's temporal behavior was similar to that of a current patient, and take weighted averages for the orbits of the past patients to predict when an additional treatment option is necessary for the current patient. Other problems include finding a treatment schedule such that some medical index is kept within a certain range for a finite time [36][37][38]; integrating different kinds of information for better diagnosis, treatment, and intervention; and improving the accuracy of a mathematical model while keeping its complexity. Because these mathematical problems have become clearer, it is expected that, in the future, mathematicians will play a significant role in providing efficient personalized therapy based on an individual patient's data.

MMAD
If we limit our program for choosing and/or optimizing a treatment option for the hormone therapy of prostate cancer, we now have the proposed method for estimating the initial conditions and parameters as well as some methods for optimizing the treatment schedule [36][37][38]. Therefore, we will be ready for a clinical trial that tests the validity for the mathematical optimization of the treatment schedule after we clarify whether the first on-period with the off-period beforehand is useful or the first one and half cycles of IAS is necessary for estimating the initial conditions and parameters. This remaining problem is exactly the problem discussed at the second last paragraph.
In sum, we have proposed a method for fitting a relatively short time course of bio-marker levels for a mathematical model of disease when the sets of the Initial conditions and parameters for past other patients are given. We have constructed the prior distribution from the past other patient's information and formulated a new method of time course fitting by using the Bayesian formula. In this report, we showed that we can estimate a set of the Initial conditions and parameters for the mathematical model of prostate cancer treated with intermittent androgen suppression by using only its first one and half cycles (approximately 2 years). These results are promising and are representative of the potential role of mathematical modeling and rapid learning in the context of decision making in medicine.
Supporting Information S1 File. A zipped file containing a program in C named S1_File.c for estimating the set of combined parametersp, or initial conditionsxð0Þ and parameters d m i;j from the dataset of the first one and half cycles of each patient using the prior distribution obtained from the set of the past patients. The input file should be the CSV file format as in ones at http://www. nicholasbruchovsky.com/clinicalResaerch.html, where the columns should be, from the left, the patient number (not used), the date (not used), the amount of prescribed CPA (not used), the amount of prescribed LEU (not used), the PSA level, the testosterone level (not used), the number of cycle, whether the hormone therapy is on (1) or off (0), the accumulated number of days, and the accumulated number of days starting from a different day (not used). Each line corresponds to a day where either the PSA and/or testosterone level was measured or the treatment option was changed. The output is generated in a file called fitted_params8a2c2b2.dat. The output is a single column containing the estimated combined parametersp. The values from the first line show x 1 (0), x 2 (0), x 3 (0), d 1 1;1 , d 1 2;1 , d 1 2;2 , d 1 3;1 , d 1 3;2 , d 1 3;3 , d 0 1;1 , d 0 1;2 , d 0 2;2 , and d 0 3;3 . In the last line, the program outputs the value for the cost function. In between, you find eight 0s, which may be used for the future developments. (ZIP)