A Markov Chain Monte Carlo Approach to Estimate AIDS after HIV Infection

The spread of human immunodeficiency virus (HIV) infection and the resulting acquired immune deficiency syndrome (AIDS) is a major health concern in many parts of the world, and mathematical models are commonly applied to understand the spread of the HIV epidemic. To understand the spread of HIV and AIDS cases and their parameters in a given population, it is necessary to develop a theoretical framework that takes into account realistic factors. The current study used this framework to assess the interaction between individuals who developed AIDS after HIV infection and individuals who did not develop AIDS after HIV infection (pre-AIDS). We first investigated how probabilistic parameters affect the model in terms of the HIV and AIDS population over a period of time. We observed that there is a critical threshold parameter, R 0, which determines the behavior of the model. If R 0 ≤ 1, there is a unique disease-free equilibrium; if R 0 < 1, the disease dies out; and if R 0 > 1, the disease-free equilibrium is unstable. We also show how a Markov chain Monte Carlo (MCMC) approach could be used as a supplement to forecast the numbers of reported HIV and AIDS cases. An approach using a Monte Carlo analysis is illustrated to understand the impact of model-based predictions in light of uncertain parameters on the spread of HIV. Finally, to examine this framework and demonstrate how it works, a case study was performed of reported HIV and AIDS cases from an annual data set in Malaysia, and then we compared how these approaches complement each other. We conclude that HIV disease in Malaysia shows epidemic behavior, especially in the context of understanding and predicting emerging cases of HIV and AIDS.


Introduction
Acquired immune deficiency syndrome (AIDS), caused by infection with human immunodeficiency virus (HIV), is one of the most alarming and deadly diseases in human history. The total number of people living with HIV and AIDS in 2013 was 35 million [1]. The spread of AIDS through populations has caused panic and economic disturbance. In the last three decades since the emergence of AIDS, many interdisciplinary scientific efforts have coalesced to model the spread of this disease. In 1989, Hyman and Stanley [2] generated mathematical models based on the underlying transmission mechanisms of HIV/AIDS that were used to understand and anticipate its spread in different populations. In 1991, Romieu et al. [3] presented work demonstrating how to model the spread of HIV/AIDS in Mexico City; the goal of their work was to provide a conceptual framework to help understand the transmission dynamics of infection and give a reasonable estimation of the short-term cumulative number of AIDS cases. Mathematical modeling of the spread of HIV/AIDS has become even more useful in the modern era of AIDS research. In 2011, Nyabadza [4] presented a simple deterministic HIV/AIDS model that applied ordinary differential equations to the current South African situation and considered the use of condoms, sexual partner acquisition, behavior change, and treatment; their results suggested that HIV/AIDS could be controlled through these measures. Naresh [5] calculated the spread of the AIDS epidemic with immigration among HIV-infected individuals, and the findings revealed a constant flow of immigrating susceptible individuals and individuals infected with HIV. Merli [6] presented an exploration of the implications of patterns of sexual behavior for the spread of HIV in China; this model reflected the uncertainty surrounding key parameters, and the analyses used showed a range of possible outcomes. In 1999, Kakehashi formulated a mathematical model to describe the spread of HIV/AIDS among adult commercial sex workers in Japan that was used to analyze the effect of introducing HIVinfected commercial sex workers into a population without HIV [7]. De Arazoza and Lounes (2002) outlined how a non-linear model could be used to develop an epidemic with contact tracing, specifically in Cuba. These authors suggested that to control the spread of HIV/AIDS, the target group must be in contact with individuals who carry HIV [8]. In 2008, Kim [9] formulated a simple continuous model for the transmission of HIV, although this model failed to consider the demographic parameters that have a significant impact on modeling the spread of HIV. Furthermore, most of these previous models have serious drawbacks. For instance, most of these models failed to demonstrate how the impact of AIDS causes the death of HIVinfected individuals. These models also typically describe changes in time and are therefore referred to as 'dynamic' models, where time is the independent variable. Similar work was conducted by Haario et al. (2006) [10], who proposed various strategies to combine two quite powerful ideas in the Markov chain Monte Carlo method (MCMC), adaptive Metropolis samplers and delayed rejection, to study the spread of algae.
The current study assessed the robustness of a new method for predicting the spread of AIDS among HIV-infected individuals. We used Monte Carlo-based methods, including importance sampling and MCMC approaches, which are more useful in dealing with the nonlinearity and interdependency of parameters through their application to a model describing the dynamics of HIV [11]. MCMC is one of the most important numerical techniques for creating a sample from the posterior distribution, and it has been widely used in mathematical modeling to quantify parameter uncertainties [10,12,13]. In the current study, we formulated a deterministic mathematical model to reflect the trend of AIDS cases after HIV infection, and we also applied MCMC approaches by considering the uncertainty in the model parameters and the model output to supplement the mathematical model.

Materials and Methods
We present the simplest HIV disease models where individuals classified as a sexually active population are divided into four classes: susceptible, S(t); infected, (HIV) I(t); pre-AIDS cases who did not progress to AIDS after HIV infection, A 1 (t); and AIDS cases who have AIDS after HIV infection at time t, A 2 (t). HIV can be transmitted to a susceptible person when they come into contact with an infected person via the appropriate transmission routes. In 2003, Rao [14] formulated a model for individuals who did or did not develop AIDS after the HIV epidemic in India. Unlike the model from this report [14], our model assumed that γ is the rate at which an individual will fully move from A 1 (t) class to A 2 (t) class, which is a very significant indicator of when an intervention should be introduced. We assumed that the infected individuals are capable of having children that are either infected with HIV or will not have HIV. However, the susceptible class has a recruitment rate equivalent to the birth rate, b, which is independent of vertical transmission. Moreover, this model assumes that infected newborn babies enter the HIV class at the rate of b(I + A 1 + A 2 ), for which we assume that I, A 1 , and A 2 are sexually active, and πb(I + A 1 + A 2 ) are individuals who are infected and enter the HIV stage. The portion π of these individuals is assumed to be infected with HIV (categorized in the I class), and the complementary portion (1 − π)b(I + A 1 + A 2 ) is considered susceptible (and moves to the susceptible class S). The removal rate of infected HIV individuals who enter the AIDS class is represented by α; the portion of HIV-infected individuals is δ. This model also assumes that at rate δα, some of the HIV-infected cases transition to the AIDS group, whereas the remaining HIV-infected cases move to the class of individuals who do not develop AIDS (pre-AIDS) after an HIV infection rate of (1 − δ)α, where 0 δ 1. The model also assumes the natural death rate μ of individual deaths from all four compartments. β is the contact rate between susceptible individuals and exposed or HIV-infected individuals. AIDS patients are given an additional disease-induced mortality rate: σ > 0, ε > 0 and ρ > 0 for I(t), A 1 (t) and A 2 (t), respectively. This form of a susceptible-infected-pre-AIDS-AIDS (SIA 1 A 2 ) model can be used to model HIV disease based upon the assumption that once an individual becomes infected, that individual remains infectious for life, as shown in Fig 1. The deterministic systems of nonlinear differential equations describing the SIA 1 A 2 models of HIV disease with additional demographics (birth and death) for an individual population are listed as follows: The summation of 1)-(4) and substituting S = N − I − A 1 − A 2 into (2) are given by where N(t) represents the total population.

MCMC approach
The MCMC method consists of algorithms for inverse modeling, in particular identifiability [15], which includes the local and global sensitivity analysis, and this was employed to estimate parameter uncertainties [10,16,17]. One of the MCMC methods suited for use with dynamic models is delayed rejection adaptive metropolis [10], which was applied to analyze how the model fits the reported HIV and AIDS cases. The MCMC method was applied to sample from the probability distribution by creating a Markov chain with a set of parameters, which combine the parameter values representing the parameter distribution at the equilibrium distribution. We set the prior distribution for the parameters to θ and independent variables t (for details see [11]). Similarly, we set y to represent our system of non-linear Eqs (1)-(4) model (e.g., f(t, θ)). We also assumed that χ is the additive and the independent Gaussian error, with unknown variance σ 2 . These terms can be defined as follows: The posterior for the parameters is estimated as [16] p À yjy; s 2 Á / exp À0:5 SSðyÞ s 2 p pri ðyÞ ð 11Þ where SS is the sum of squares function SS(θ) = ∑ (y i − f(t i , θ i )) 2 and p pri (θ) is the prior distribution of the parameters. To obtain proper results from the MCMC method, a constrained least squares approach is necessary to provide initial estimates of θ i . If the non-informative prior p pri (θ) is constant for all of the values of (θ), this can be ignored. For the reciprocal of the error variance (σ −2 ), a gamma distribution is used: The reciprocal of the error variance at each MCMC step is sampled from a gamma distribution [18] as follows: where n 0 and n input arguments to the function and the number of observations, respectively [11].

Equilibrium, stability analysis and material
In this section, we describe the selection of equilibrium points to determine whether the populations are constant or changing. Additionally, we determined whether the model system was stable or unstable. At equilibrium, From (16) and (17) we get By submitting (18) and (19) into (14), and solving for I, A 1 , and A 2 , we obtain: Local stability of the equilibrium point To determine the stability of the endemic equilibrium, we assessed the eigenvalues of the characteristic equation of the corresponding Jacobian matrix, JðN Ã ; I Ã ; A Ã 1 ; A Ã 2 Þ ¼ JðE Ã Þ, which is given by The characteristic polynomial equation corresponding to J(E Ã ) is given by where By substituting the appropriate parameter values into the eigenvalues from Table 1, we obtained λ 1 = -0.0797, λ 2 = 17.42408, λ 3 = 0, and λ 4 = 9.998e-01 from the above Jacobian matrix at the point E Ã (17). Thus, E Ã is unstable because threshold parameter basic reproduction number R 0 is unconditionally greater than unity. This result represents a major concern and an unsatisfactory indicator from the public health point of view because the aim is to stabilize the epidemic at the disease-free equilibrium.
Moreover, to determine whether the disease will continue to spread, we evaluated the stability of the disease-free equilibrium point. The reproduction number R 0 is a threshold value that can be used to determine the stability of the disease-free equilibrium [19][20][21][22][23]. We write the right-hand side of system (5)-(8) as F − V with the following equations: Then, we consider the Jacobian matrices associated with F and V: The basic reproduction number of the system is obtained as the spectral radius of the matrix Source of data  Where β = the contact rate between susceptible individuals and exposed or HIV-infected individuals, α = removal rate, μ = nature death rate, δ = the portion of HIV-infected individuals, ρ = disease-induced mortality rate of A 1 (t), ε = disease-induced mortality rate of A 2 (t), σ = disease-induced mortality rate of I(t), π = the portion of individuals infected with HIV, b = birth rate, γ = is the rate at which an individual will fully move from A 1 (t) class to A 2 (t) class.  Where β = the contact rate between susceptible individuals and exposed or HIV-infected individuals, α = removal rate, μ = nature death rate, δ = the portion of HIV-infected individuals, ρ = disease-induced mortality rate of A 1 (t), ε = disease-induced mortality rate of A 2 (t), σ = disease-induced mortality rate of I(t), π = the portion of individuals infected with HIV, b = birth rate, γ = is the rate at which an individual will fully move from A 1 (t) class to A 2 (t) class.

Results and Discussion
This section discusses the various results obtained using Malaysian data to analyze the accuracy of our model. Ten parameters were used to enable us to determine the inaccuracies of the model and to obtain better graphical representations. Model fitting has become an art and requires a good understanding of the behavior of the applied model for the known parameters. The mathematical Eqs (1)-(4) are nonlinear and depend on constant parameters. The following sections will address the issue of determining parameter values that minimize a measure of badness-of-fit, usually a least square function or a weighted sum of squared residuals. This analysis will provide an estimate of the parameter uncertainty and will quantify the effects of that uncertainty on the data.
We chose parameters with the highest sensitivity values, as shown in Table 1. Based on the summary statistics shown in Table 1, it is clear that parameter ε has the least effect on the output variables, whereas b shows the highest sensitivity value. This result shows that HIV-infected, pre-AIDS, and AIDS individuals are born at a rate of b, the newborn baby birth rate, which is more significant than the remaining nine parameters.
are the L1 norm, this condition is referred to as the least absolute deviations, and the L2 norm is known as the least squares. The mean represents the mean of the sensitivity functions, the Min represents the minimal value of the sensitivity functions, and Max represents is the maximal value of the sensitivity functions, as shown in Table 2.  Table 2 shows that all ten parameters have a small standard error, which provides a good representative of the real data of the entire population. Because the standard error is also inversely proportional to the sample size, this implies that the larger the sample size, the smaller the standard error. There were 25 data points with a degree of freedom of 23. All of the parameters have p-values less than 0.05, which suggests that the differences of overall parameters are statistically noteworthy. Because the t-values are significantly greater and because p-values are smaller as t-values get larger, there is a difference between the four compartments (i.e., S(t), I(t), A 1 (t), and A 2 (t)).
The empirical mean and standard deviation for each variable, standard error of the mean, and their respective 95% confident intervals are reported in Table 2. The disease-induced mortality rate ε for AIDS A 1 (t) cases that did not progress to AIDS after HIV infection at time t showed the highest standard deviation, which has a significant impact on the spread of AIDS.
For comparison, the initial model output and the best-fit model are plotted against the data shown in Fig 2 and Fig 3. The following figures show the simulations from the MCMC approach for each parameter. From Fig 4, the traces of the MCMC chain (grey line) show that the chain has converged, which indicates that there is no apparent drift. The last figure also shows the error variances for each observed variable. In Fig 5, the pairs plot shows a strong relationship between parameters γ and ρ (the rate at which an individual will fully move from A 1 (t) class to A 2 (t) class and the disease-induced mortality rate for A 2 (t), respectively). This plot visualizes the pairwise relationship in the upper panel, the correlation coefficients in the lower panel, and the marginal distribution for each parameter, represented by a histogram, on the diagonal. This figure also shows the correlation between the mean HIV reported cases and the various parameters, as well as their positive relationships.
As shown in Fig 6, the high variances were observed in the following compartment order: S > I > A 2 > A 1 , which shows the predictive accuracy of the model reflected by the variance of the predictive distribution. The large variance is due to either the uncertainties in the model or noise in data collection, and this model fit the noisy data reasonably well.

Conclusion
This study demonstrates how to model the spread of AIDS after HIV infection. As with any modeling study of such a complex system as HIV and AIDS, several assumptions were necessary to make the analysis tractable. We assumed that there were sexual interactions between the susceptible and HIV-infected populations, that infected newborn babies moved directly to the HIV class and that a fraction of the remaining population also moved to the susceptible class to increase the growth of the total population. The model also assumed that γ is the rate at which an individual will fully move from A 1 (t) class to A 2 (t) class; this rate was not considered by the model reported in a previous study [14]. HIV/AIDS continues to infect the susceptible population if no control measures are swiftly enacted, and the endemic point, if it existed, could have been stable if all the eigenvalues were negative. The reproduction number R 0 is a threshold value or number that determines the stability of the disease-free equilibrium. If R 0 > 1, then an epidemic of AIDS occurs, and if R 0 < 1, then the disease-free equilibrium is locally asymptotically stable and disease becomes endemic. Our results show that the disease-free steady state is unstable because the basic reproduction number R 0 was 13.22657. These results show that the number of HIV cases and AIDS cases is still epidemic within the Malaysian population, and this will have policy implications for the most at-risk groups of populations, especially the HIV-infected population (Fig 2 and Fig 3). The public health implication of this instability is that HIV will continue to infect the susceptible population because in the rate of newborn babies b(I + A 1 + A 2 ), b is the parameter with the highest value compared with the other parameters. Thus, there must be effective intervention measures that will continue to minimize the spread of the HIV epidemic within the unaffected population. Furthermore, there must be effective ways to minimize the spread of pre-AIDS A 1 (t) cases that progress to AIDS after HIV infection, especially the rate of newborn babies b(I + A 1 + A 2 ), because this had the highest impact on disease spread and indicated that more infected HIV/AIDS individuals are born at these stages than at the other stages. Our results further suggest that without the intervention of antiretroviral medication (drug treatment), the rate γ at which an individual will fully move from the A 1 (t) class to the A 2 (t) class is 0.99/year. This information will assist policymakers in deciding at which stage to introduce intervention measures. The analysis presented herein with MCMC can be applied to a large class of HIV/AIDS epidemic models by taking into account both the uncertainty in the model parameters and other characteristics of the target posteriors by generating chains of samples. Contrasts were found as the posterior standard deviations exceeded the standard errors, as shown in Table 2 and Table 3. The graphical descriptions further demonstrate and support the empirical results and the long-term model prediction. We conclude that the predictive distributions generated predicted the model to a large degree of accuracy, as shown in Fig 6. Finally, there were some significant differences in the estimated parameters that will be useful to public health, potentially representing a practical and more effective way to epidemiologically model AIDS disease after HIV infection. Where β = the contact rate between susceptible individuals and exposed or HIV-infected individuals, α = removal rate, μ = nature death rate, δ = the portion of HIV-infected individuals, ρ = disease-induced mortality rate of A 1 (t), ε = disease-induced mortality rate of A 2 (t), σ = disease-induced mortality rate of I(t), π = the portion of individuals infected with HIV, b = birth rate, γ = is the rate at which an individual will fully move from A 1 (t) class to A 2 (t) class. doi:10.1371/journal.pone.0131950.t003