Development of an Agent-Based Model (ABM) to Simulate the Immune System and Integration of a Regression Method to Estimate the Key ABM Parameters by Fitting the Experimental Data

Agent-based models (ABM) and differential equations (DE) are two commonly used methods for immune system simulation. However, it is difficult for ABM to estimate key parameters of the model by incorporating experimental data, whereas the differential equation model is incapable of describing the complicated immune system in detail. To overcome these problems, we developed an integrated ABM regression model (IABMR). It can combine the advantages of ABM and DE by employing ABM to mimic the multi-scale immune system with various phenotypes and types of cells as well as using the input and output of ABM to build up the Loess regression for key parameter estimation. Next, we employed the greedy algorithm to estimate the key parameters of the ABM with respect to the same experimental data set and used ABM to describe a 3D immune system similar to previous studies that employed the DE model. These results indicate that IABMR not only has the potential to simulate the immune system at various scales, phenotypes and cell types, but can also accurately infer the key parameters like DE model. Therefore, this study innovatively developed a complex system development mechanism that could simulate the complicated immune system in detail like ABM and validate the reliability and efficiency of model like DE by fitting the experimental data.

Recently, researchers did develop several ABMs for the immune system simulation.For example, The Basic Immune Simulator (BIS) [10] is an agent-based model (ABM) that can be used to study the interactions between cells of the innate and adaptive immune systems.The BIS demonstrated that the degree of the initial innate response was a crucial determinant for an appropriate adaptive response [10].Also, the ImmunoGrid project [11] is to develop a natural-scale model of the human immune system using an ABM, that can reflect both the diversity and the relative proportions of the molecules and cells.This model will be of great value for specific applications in the field of immunology [11].
ABM has several significant advantages.First, its natural representational formalism can be employed to denote a cell's biological properties and behavior in detail [1].Second, its flexible features can be employed to reflect the real complex dynamic environment [12].However, it is difficult for ABM to incorporate experimental data, because ABM describes the system at the level of its constituent units but not at the top level [13].
DE is broadly employed to approximate experimental data and predict the progression of the immune system.For example, researchers have applied it to the case of influenza A virus (IAV) infection.Miao et al., [14] developed a differential equation model to describe the dynamic interactions among the components (i.e., epithelial cells, virus, CD8 CTLs, and antibody) in the lung.The model was used to quantify the immune responses and to estimate the key parameters in primary infection.Not limited to IAV infection, DE can also be widely used for other virus infections, such as HIV in the study of Miao et al. [9].The researchers developed statistical estimation, model selection, and multi-model averaging methods for in vitro HIV viral fitness experiments using a set of nonlinear ordinary differential equations and addressed the parameter identifiability of the model [9].
The DE has been the focus of a great deal of attention due to its great potential as a new optimization technique to solve complex nonlinear problems and widespread use in various areas [15].Compared to ABM, DE can be easily employed to solve the optimization problem by estimating a few control parameters [15].However, it has difficulty describing the details of biological systems because DE falls short in constructing a biological model to a sufficient degree, especially when faced with the simulation of complex phenomena.
To integrate the advantages of these two commonly used models, we developed an integrated ABM regression model (IABMR) and employed the IAV data set [14] to evaluate its efficiency and accuracy.IABMR employed ABM to denote each cell as an agent with three phenotypes (i.e., quiescence, proliferation and apoptosis).Then, it employed Loess regression to build a Loess model based on the input and output of ABM.The model's key parameters were optimized using the particle swarm optimization algorithm (PSO) [16][17][18][19][20][21].The concept of PSO is illustrated in the S1 File.
Next, we employed the classical greedy algorithm [22][23][24] to optimize the ABM parameter and compare the efficiency of ABM with the greedy algorithm and IABMR.The results demonstrated that IABMR not only described the immune response at the cellular level using various cells' phenotypes and possessed great potential for investigating interactions and special information for the cells but also overcame the limitations of ABM in parameter estimation.

Using ABM to describe the immune system
To describe the dynamic interactions among the components (i.e., epithelial cells, infected epithelial cells and virus) in the lung, Fig 1 is used to quantify immune responses in primary infections.
An epithelial cell in a quiescent state Ep q can be transited to three other states.Two of these states belong to the Ep cell, where P Ep q B and P Ep q Q are the probabilities for Ep q to change its state.Ep q and Ep p are two states of the Ep cell.The Ep q cell can also be differentiated into another type of the cell (Ep Ã ) with a probability P Ep q T .Once the Ep q state transits to the Ep p state with a probability of P Ep q B , it will have P Here, V represents the infective viral titer and Ndiv Ep t is used to represent the number of cells which will divided into two cells.The case of an infected epithelial cell is shown in Fig 1

(Infected epithelial cells).
The Ep Ã q state can transit to itself and Ep Ã d with the probability P Ep Ã q Q and P Ep Ã q D , respectively.The transition equations are described as the following equations. ð2:2Þ Different from the epithelial cell and infected epithelial cell, the virus is too small to be described as a discrete variable.In Fig 1(Virus), the virus is described as a continuous variable with P V D percentage of dying (V d state) and P V Q percentage of living.Here, we set Additionally, the virus can be produced by Ep Ã q with respect to the rate of π v .The case of the virus can described using the following equations.
To simulate the process of cellular immunity among the epithelial cells, virus and infected epithelial cells, an agent based model (ABM) is developed based on the diagrams and equations provided above.The parameters listed in Table 1 agree with the following rules.

Parameter Estimation
To estimate the parameters in this study, parameter vector space (H) is generated by the Sparse Grid method [25], which consists of a set of parameter vectors; each vector has 4 dimensions.The Sparse Grid method always chooses the most important points in the high dimension space to approximate the complicated surface [26][27][28].
In what follows, the input parameter of ABM is denoted by a four-dimensional vector θ, where the components θ k , k = 1,2,3,4 represents (P Ep q B ; P Ep q T ; P Ep Ã q D ; P V D ) respectively.Reported by the previous research [14], the input data θ are estimated as (6.2×10 −9 ,2.42×10 −7 ,5.98×10 −2 ,4.23×10 −1 ), which we call as the initial parameter θ 0 .In this study, we set the input parameter of ABM in the region (0,2θ 0 ) = {(θ 1 , θ 2 , θ 3 , θ 4 )2 R 4 ,0 θ K 2θ 0k ,k = 1,2,3,4}.However, according to the rules of the Sparse Grid, each component of parameter vector h2H is between 0 and 1.Therefore, we need to map the parameter vector space H generated by Sparse Grid into the region (0,2θ 0 ).The mapping function is: Where h is a parameter vector in the space H, a = 0, b = 2θ 0 .θ 1 is employed as the input parameters for the ABM to generate L sets of output data (G 1 ), which represents the number of cells in 5 days.To generate randomness for ABM, we performed Lr replicates for each set of θ 1 .Next, θ 1 and G 1 are employed to develop a Loess regression [29][30][31][32] mode M 0 .
In our model M 0 , the Loess regression is described in Eq 6.
Here, w is a weighting function and θ 1i is an input parameter of ABM, where i denotes the i-th sampling point in the parameter vector space.θ 1i represents the points in the neighborhood of x to be weighted by w depending on the distance to x. g is a key parameter in the procedure called the "bandwidth" or "smoothing parameter" that determines how much of the data is used to fit each local polynomial.G 1i is the output data value of ABM corresponding to the input data θ 1i .α and β are two coefficients of the least squares method [33] that is employed to approximate their value by minimizing the value of χ 2 in Eq 6.
Next, the particle swarm optimization algorithm (PSO) [16] is employed to locate the optimal parameter by fitting the real experimental data.PSO [17][18][19][20][21] is illustrated in the S1 File in detail, and its key equations are described by Eqs 7.1 and 7.2.
First, let S be the number of particles in the swarm.Then initialize the particle's position with a uniformly distributed random vector x i *U(lb,ub), where lb and ub are the lower and upper boundaries of the search-space, here (lb,ub) = (0,2θ 0 ).Obviously, x i can be considered as the input parameter.The particle's initial velocity is: v i *U(−|ub−lb|,|ub−lb|).Here, w is a weight function used to maintain the inertia force of each particle.Let p i be the best known position of particle i and let p g be the best known position of the entire swarm.Then, Eq 8 is Proliferation rate of Ep q (hour −1 ) 6.2×10 −9

P Ep p P
Probability value for Ep p to stain resting(hour −1 ) 9.999997518×10 −1 Probability value for Ep Ã q to stain resting(hour −1 ) 9.402×10 −1 employed as the object function for the parameter estimation.
Here, m is the time point, and n is the replicates at each time point, V 1 is the real experimental data in five days.y i is the predictive value from the Loess model based on input value x i .By using the PSO algorithm and Loess model, we can minimize the object function f obj to locate the local optimal parameter θ Ã in the region (0,2θ 0 ).
Next, we reemployed the mapping function (Eq 5) to map parameter vector space H on region (0,2θ Ã ) to generate L sets of input parameters θ 2 and n replicates for each set of θ 2 .These data will be employed as input parameters in the ABM; then, we can obtain G 2 output data with m time points.Next, θ Ã will be employed as the input data of ABM to generate the simulated experimental data set V 2 with n replicates, which will replace V 1 by adding random noise.
The normal distribution method (Eqs 9.1 and 9.2) [34] is used to add noise for each replicate of the V 2 data set and develop the simulated experimental data set N(0,α i ) denotes a normal distribution with mean 0 and standard deviation α i .Next, a new Loess regression model M 1 is built based on θ 2 and G 2 in a process similar to M 0 .We used PSO [35] to explore the optimal local parameter Estθ i by fitting the simulated experimental data V Ã 2 .Finally, we can compute average relative error (ARE) [9] for each Estθ i using Eq 10.
Here, M is the total number of ABM simulation runs for each sample.This parameter estimation process is illustrated in Fig 2.

Results
The IABMR model is developed using C++ and R program language and works in the Linux environment.

Primary data for model fitting
We used real experimental data V 1 [14] from infection of mice with the H3N2 influenza virus A/X31 strain to fit the model.This study employs data from the initial preadaptive phase constituting 0 to 5 days post-infection.The real experimental data contains 6 samples and each sample has 13 time points.The detailed experimental data information is listed in Table 2.The initial key parameters of ABM are also from the literature [14].

Obtain the sampling data using Sparse Grid function
We employed the "createIntegrationGrid" function of the R "SparseGrid" package to create three sampling data sets in the region (0, 1) (sample size: 41, 137 and 385) (listed in S1-S3 Tables).Then, these sampling data are mapped to the input parameters sets of ABM (θ 1 ) by Eq 5.The values of θ 1 are listed in S4-S6 Tables.

Estimate the parameter of ABM by fitting the real experimental data
To obtain randomness, we run data sample 41, 137 and 385 with 9,9 and 6 times.And then, we denote them as model 41×9, 137×9 and 385×6, respectively.The output data set G 1 (S7-S9 Tables) of ABM is obtained by inputting θ 1 .Eqs 7.1 and 7.2 is employed to explore the local optimum parameter θ Ã for each sampling data set listed in Table 3.

Generate the simulated experimental data by ABM
We can obtain an output of ABM V 2 by inputting θ Ã .The simulated experimental data V Ã 2 is developed from V 2 by Eq 9 by adding three levels of noise (α i ), such as ffiffiffiffiffiffiffiffiffi 0:75 p , ffiffiffiffiffiffiffiffiffi 1:50 p and ffiffiffiffiffiffiffiffiffi 3:00 p regarding to our previous study [36].Part of the simulated experimental data is listed in S10-S12 Tables.

Average relative error computing
After fitting the model to the simulated experimental data using Eqs 7.1 and 7.2, we obtain the local optimal parameter Estθ i .Then, Eq 10 is employed to compute the average relative error for each set of simulated experimental data.Here, we set the total number of ABM simulation runs as M = 100 and the three sample sizes as 5×3 (5 is time points (m), 3 is the replicates (n)),10×6 and 15×9.The values of ARE for each sample size are listed in Tables 4-6.

Evaluate the accuracy and efficiency of the IABMR model
To evaluate the accuracy and efficiency of the IABMR model in parameter estimation, we employed the greedy algorithm [22,37] with ABM to estimate the parameters.

Discussion
In this work, we developed an agent-based model (ABM) to simulate influenza A virus (IAV) infection and integrated the ABM with Loess regression to develop an integrated ABM regression model (IABMR).This model can be employed to locate the key ABM parameter by fitting the real experimental data.By inheriting the advantages of ABM, IABMR is capable of mimicking the biological system in detail.Here, IABMR not only showed quantitative changes in the system but also simulated the phenotypic switch for each cell type.Compared to the previous well-developed ODE model [14], it was possible to describe a multi-scale biological system in a very complicated external environment.IABMR Integrated with Loess regression [29] can employ classical numerical optimization methods such as the genetic algorithm [38,39] to estimate key parameter of the model, which is much faster than the greedy algorithm [22][23][24] used by ABM.These two theoretical advantages made IABMR an attractive application to simulate biological systems, compared to the ODE and ABM.
The average relative error (ARE) is commonly employed to evaluate the capacity of parameter estimation for statistical models.The smaller the ARE, the better the model's performance.Tables 4-6 showed the ARE values of four key probabilities of the IABMR under the control of the following two aspects: the number of time points collected from the preadaptive phase and the level of noise added to the simulated experimental data.Second, the ARE values increase when the noise level increases from ffiffiffiffiffiffiffiffiffi 0:75 p to ffiffiffiffiffiffiffiffiffi 3:00 p under the same number of time points, which demonstrates that the parameter estimation accuracy is higher with a smaller noise level.For instance, in the case of sample 5×3 (Table 6), the ARE value of P Ep q B has the order: P and P V D ) in the parameter have similar trends to P Ep q B (Table 6).Fig 3 compared the accuracy and parameter estimation speed between the IABMR and ABM models.IABMR is much faster than ABM in terms of locating key parameter.For example, it takes at least 54,600 runs for ABM with the greedy algorithm to make the RSS converge, but only 2310 runs for IABMR with the largest size of parameter space to make the RSS converge.Additionally, the size of the parameter vector space has high impact on the parameter estimation accuracy.The larger the size, the more accurate the estimated results.As described by the Fig 3, model 41×9 has the greatest RSS and model 385×6 has the least RSS.Meanwhile, the trends of the ARE values in Tables 4 and 5 are not as perfect as in Table 6.Lastly, Fig 4 demonstrated that the IABMR simulation results had high similarity like the ODE to approximate the real experiential data, which validated the efficiency and accuracy of the IABMR.
In conclusion, this study developed an IABMR method to simulate detailed biological systems and locate their key parameter using classical numerical optimization methods.By integrating the advantages of both the ABM and ODE modes, it not only described the complicated microenvironment of the biological system and the cell's behavior in multiple scales in detail, but also easily to incorporate real experimental data.To evaluate the efficiency and accuracy of IABMR, we employed primary influenza infection data as the case study to exhibit the advantages of the IABMR.The validation results demonstrated that IABMR could mimic the immune system on multiple levels similar to ABM and approximate real experimental data similar to ODE with a reasonable parameter estimation cost.

Fig 4
Fig 4 illustrates that IABMR can approximate the primary data with a similar effect as the ODE model [14].

Fig 4 .
Fig 4. Comparison between IABMR and ODE.doi:10.1371/journal.pone.0141295.g004 probabilities to become Ep p and Ep Ã q , respectively.With respect to the above state transition diagram (Fig 1 (Epithelial cells)), the state transition equations for epithelial cells are developed as follows.

Table 2 .
Real experimental data between 0 to 5 days.

Table 4 .
The summary table of ARE values for model 41×9.

Table 5 .
The summary table of ARE values for model 137×9.

Table 6 .
The summary table of ARE values for model 385×6.