Entropy Based Modelling for Estimating Demographic Trends

In this paper, an entropy-based method is proposed to forecast the demographic changes of countries. We formulate the estimation of future demographic profiles as a constrained optimization problem, anchored on the empirically validated assumption that the entropy of the age distribution increases over time. The proposed method involves three stages: 1) prediction of the age distribution of a country’s population based on an “age-structured population model”; 2) estimation of the age distribution of each individual household size with an entropy-based formulation based on an “individual household size model”; and 3) estimation of the number of households of each size based on a “total household size model”. The last stage is achieved by projecting the age distribution of the country’s population (obtained in stage 1) onto the age distributions of individual household sizes (obtained in stage 2). The effectiveness of the proposed method is demonstrated on real-world data, and the method is general and versatile enough to be extended to other time-dependent demographic variables.


Introduction
Predicting demographic trends (DT) [1] in the light of the emerging complex processes [2] of the 21st century continues to be an important and open research topic. Understanding developments and changes in population is critical in helping governments target policies for the future and save money for education, public health, retirement, transportation and energy consumption, among others [3][4]. Specifically, DT refers to changes over time in the joint distribution of the population over age and other demographic factors, such as household size, health measures, economic status, religious affiliation, education, marriage, etc. [5][6][7].
Forecasting DT is a challenging task and remains a fundamental concern in both basic and applied ecology [8]. The complexity lies in DT's intricate connection to the heterogeneous activities of a large group of individuals, and DT is affected by both observed and unobserved time-dependent factors [9][10]. Existing methods such as least squares [11][12][13] and Bayesian inference [14], despite being the most extensively used procedures for estimation and prediction in various engineering problems, fail to capture the driving mechanisms of the complex processes that shape DT [15]. There is very little literature on building optimization models for understanding DT. Typical approaches involve incorporating factors such as environmental [16][17][18], demographic [19] and/or observer-related covariates [20]. However, data to support and verify such techniques are often not readily available [21]-[23], which suggests that building an optimization model constrained by limited data to characterise DT is fundamentally important and has many potential applications.
Entropy, a measure of the uncertainty in random variables, underpins methods that have been successfully applied to many modelling and estimation problems, as seen in [24][25][26][27]. In this paper, we introduce an entropy-based method to estimate DT. The model is motivated by our empirical observation that the entropy of the age distribution of a population tends to increase over time. The paradigm is based on minimizing an entropy-based objective function and incorporating parameters that describe the historical trends into the constraints, so that both the dynamic and intrinsic properties are reflected. We illustrate this procedure by estimating the evolution of demographic distributions over ages and household sizes. Our work involves three modeling stages. Firstly, an "age-structured population model" based on the Leslie matrix [28][29][30][31] is used to predict the age distribution of a country's population. This makes the modelling of demographic temporal distributions possible, as one usually needs to project the age distribution of the population onto other factors. Secondly, the age distribution of each household size is estimated with a proposed entropy-based model, in which we formulate an entropy-based cost function and incorporate the DT into the constraint conditions. The model applied in this stage is called the "individual household size model". Finally, the age distribution of the country's population (obtained in stage 1) is projected onto the age distributions of individual household size types (obtained in stage 2), which we refer to as the "total household size model". Note that our estimation does not rely on any observed determinant of household formation. The evolution of the household size is estimated based on the historical information and the entropy principle.
Compared with existing works [3][32], our method predicts DT with limited information [33]. The output is a joint distribution of age and other demographic variables over time. Its applications include policy analysis, economic forecasting and urban planning. For the purpose of illustration, we use population data from the US Census and predict the age DT for each household size in 2010, based on the historical data from 2000 and 2006. The remainder of the paper is organized as follows. Section 2 lists the definitions and notations used throughout the article. Section 3 presents the three stages for the estimation of DT. The simulation results based on US data are illustrated in Section 4, and we conclude the article in Section 5.
Definitions and notations
$i$: The age index ($i = 0, 1, \dots, A_{upper}$).
$P_i(t)$: The population of people at age $i$ (older than $i$ but younger than $i + 1$) in the year $t$.
$P(t) = [P_0(t) \dots P_i(t) \dots P_{A_{upper}}(t)]^T$: The population vector for people at all ages in the year $t$.
[...] (12).
$\omega_1, \dots, \omega_{k_0}$: The weights defined in Model (12) and Formula (13).
$H$: The Hessian matrix.
$x_j(t)$: The number of households of size $j$ ($j = 1, \dots, m_0$) in the year $t$.
$W$: A weighting matrix in the total household size Model (16).
$\tau$: A parameter in the matrix $W$ in the total household size model (Formula (18)).
$u$: A small positive weight parameter in the total household size Model (16).
$\hat{F}$: Predicted weighting matrix obtained by collecting the predicted age distributions of all household sizes.
Three stages for forecasting the demographic trends
The forecasting procedure has three stages. Stage 1: using an "age-structured population model" to predict the population in the year $t + 1$. Stage 2: using an "individual household size model" to estimate the age distribution for each household size $j$ based on data from the historical years, where the DT reflected in the previous years can be incorporated into the constraint conditions. Stage 3: combining the results from Stages 1 and 2, and employing a "total household size model" to predict the number of households of each size. We detail each of the three stages in the following subsections.
Age-structured population model: for estimating age distribution of the population
We consider the population as the sum of all organisms of the same group or species that live in the same geographical area and are capable of interbreeding. Quite frequently, the prediction of demographic temporal distributions is closely linked to the population's age structure. Demographic temporal distribution modeling is achievable using the "age-structured population model" since it allows projection of the age distribution onto other factors.
Assumptions. We apply the Leslie matrix method [28]-[31], which assumes:
a. There is no plague, disaster or war that leads to abrupt changes in the age-specific death rate.
b. Statistical variables such as birth rates and the birth ratio change slowly and are predictable.
c. The fertility rate is the same for local residents and immigrants.
d. All people who are older than $A_{upper}$ are in the same age group. Here, we set $A_{upper} = 90$.
Problem formulation. We first consider the case without immigration and emigration. In the year $t + 1$, the number of people at age $i + 1$ is [...], where $t$ and $t + 1$ denote the current year and the next year, respectively, and $i = 0, 1, \dots, A_{upper} - 1$ is the age index. When $i = A_{upper}$, we have [...]. Let $[i_1, i_2]$ be the age interval in which a female is able to give birth. Then, $P_0(t + 1) = N_0(m, t + 1) + N_0(f, t + 1)$ and [...], where $\mathrm{Ratio}_{mf}(t)$ is the ratio of newborn boys ($N_0(m, t)$) to newborn girls ($N_0(f, t)$) in year $t$. Let [...] be the vectors of the population, male population and female population, respectively, for ages between 0 and $A_{upper}$ in year $t$.
Next, we extend the model to take immigration effects into account. Let $\mathrm{Immig}(m, t)$/$\mathrm{Emig}(m, t)$ and $\mathrm{Immig}(f, t)$/$\mathrm{Emig}(f, t)$ be the respective immigrant and emigrant vectors for males/females in year $t$. We obtain the "age-structured population model" as follows: [...], where $A(t)$, $B(t)$ and $C(t)$ are the matrices constructed based on Eqs (2)-(4), and given by [...]. Note that the population data we collected allow us to estimate the values of all the above parameters (such as the fertility rates and death rates). These parameters change slowly and are predictable, which supports our assumption. Thus, the population distribution for the coming years can be predicted based on the age-structured population Model (5), and its estimate for the year $t + 1$ is denoted as $\hat{P}(t + 1)$, as used in Model (16) later.
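As an illustration of the projection idea, the following is a minimal sketch of one yearly step of a Leslie-type model without migration. It is not Model (5) itself: the function name `project_one_year`, the per-age survival probabilities `surv_f`/`surv_m` (one minus the death rate), the per-age `fertility` list and the timing of births are all assumptions made for this example.

```python
def project_one_year(pop_f, pop_m, surv_f, surv_m, fertility, ratio_mf):
    """Project female/male populations by age (0..A_upper) one year ahead.

    Assumed inputs (hypothetical names, not from the paper):
      surv_*[i]    probability that a person aged i survives the year
      fertility[i] births per female aged i (zero outside [i1, i2])
      ratio_mf     newborn boys per newborn girl
    """
    a_upper = len(pop_f) - 1
    births = sum(rate * n for rate, n in zip(fertility, pop_f))
    girls = births / (1.0 + ratio_mf)
    boys = births - girls

    def age_forward(pop, surv, newborns):
        nxt = [0.0] * (a_upper + 1)
        nxt[0] = newborns
        for i in range(a_upper):
            nxt[i + 1] = surv[i] * pop[i]       # survivors move up one age
        # the open-ended last group also keeps its own survivors
        nxt[a_upper] += surv[a_upper] * pop[a_upper]
        return nxt

    return age_forward(pop_f, surv_f, girls), age_forward(pop_m, surv_m, boys)
```

Migration could be added to this sketch by adding net immigrant vectors to the two returned lists, again as an assumption about how the extended model is assembled.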
Individual household size model: for estimating age distribution for each household size
In this section, we describe in detail our individual household size model, which estimates the age distribution of each household size. The model works by minimizing an entropy-based objective function and using the historical trends as constraints, so that both the dynamic and intrinsic properties are reflected.
Let $p_i(t)$ be the probability that a person is at age $i$ in year $t$. We define an entropy function for year $t$ as follows:
$$\mathrm{Entropy}(t) = -\sum_{i=0}^{A_{upper}} p_i(t) \log p_i(t), \qquad (9)$$
where $\sum_{i=0}^{A_{upper}} p_i(t) = 1$ and $p_i(t) \geq 0$. Fig 2 plots the entropy of the age distribution based on the population data collected from six countries. In general, the entropy of the age distribution increases monotonically with time in most countries. This observation suggests that we can estimate the age distribution of a particular household size based on entropy concepts. To this end, we divide the household size into $m_0$ types, i.e., 1 person per household, 2 persons per household, and so on up to $m_0$ persons per household.
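The entropy of an age distribution can be computed directly from population counts by age. The sketch below uses the natural logarithm and the usual convention that empty age classes contribute zero; the function name `age_entropy` and the choice of logarithm base are assumptions for illustration.

```python
import math

def age_entropy(counts):
    """Shannon entropy of the age distribution implied by counts per age."""
    total = sum(counts)
    probs = [c / total for c in counts]
    # by convention 0 * log 0 = 0, so empty age classes are skipped
    return -sum(p * math.log(p) for p in probs if p > 0)
```

A flatter age profile gives a larger value, with the maximum reached by the uniform distribution, which is the sense in which an ageing, more evenly spread population has increasing entropy.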
Let $j$ be the household size index and assume that we already have the age distributions for each household size $j$ ($j \in \{1, \dots, m_0\}$) in the years $t, t - 1, \dots, t - k_0$, which are denoted as $q_i^j(t - \kappa)$ for $\kappa = 0, 1, \dots, k_0$. Let $p_i^j(t + 1)$ represent the proportion of persons at age $i$ in household size $j$ in the year $t + 1$, where $i = 0, 1, \dots, A_{upper}$. This means that we again group the people whose ages are above 90 years together. Our objective is to estimate the age distribution $p_i^j(t + 1)$ in the year $t + 1$ based on the historical data.
We group the people from 0 to $A_{upper}$ years old into $n_0$ groups, i.e., the groups $G_n$ for $n = 1, \dots, n_0$, where $n_0 \ll A_{upper}$. The age interval for the group $G_n$ is $[0, A_n]$ and $0 < A_1 < A_2 < \dots < A_{n_0} = A_{upper}$. It is easy to see that $G_{n-1} \subset G_n$. Define $\alpha_n^j(t)$ as a parameter such that
$$\alpha_n^j(t) = \sum_{i=0}^{A_n} p_i^j(t), \qquad (10)$$
which means that $\alpha_n^j(t)$ is the ratio of people in group $G_n$, i.e., in the age interval $[0, A_n]$, to the population in household size $j$. Note that $\forall j \in \{1, \dots, m_0\}$, $\alpha_{n_0}^j(t + 1) = 1$ since $A_{n_0} = A_{upper}$. Let
$$\tilde{\alpha}_n^j(t + 1) = \alpha_n^j(t + 1) - \alpha_n^j(t) \qquad (11)$$
be the parameter which reflects the change of the ratio $\alpha_n^j(t)$ from the year $t$ to the next year $t + 1$.
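The cumulative ratios of Eq (10) and their year-on-year change in Eq (11) can be computed as in the following sketch for a single household size. The function names and the representation of the group boundaries $A_n$ as a list of age indices are assumptions for illustration.

```python
def cumulative_ratios(p, boundaries):
    """Eq (10): share of people aged 0..A_n, for each boundary A_n."""
    return [sum(p[: a_n + 1]) for a_n in boundaries]

def ratio_changes(p_next, p_now, boundaries):
    """Eq (11): change of each cumulative ratio from year t to t + 1."""
    nxt = cumulative_ratios(p_next, boundaries)
    now = cumulative_ratios(p_now, boundaries)
    return [a1 - a0 for a1, a0 in zip(nxt, now)]
```

Because the last boundary equals $A_{upper}$, the last cumulative ratio is always 1 and its change is always 0.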
From here, we build an individual household size model to predict the age distribution $\{p_i^j(t + 1)\}_i$ for each household size type $j$, where $j = 1, 2, \dots, m_0$, by optimizing the following: [...] Again, given that the entropy of the population increases monotonically with time, we can minimize an entropy-based cost function under some constraints by employing the historical data. Compared with Eq (9), we omit the minus sign "−" so that the model becomes a minimization problem. The upper limit of such entropy as $t = +\infty$ corresponds to a uniform distribution, whose histogram has a constant magnitude of $1/A_{upper}$. Essentially, there are two parts in this cost function, where $\omega_0^*$ is a small positive weight parameter. The first part is the cross entropy distance (KL distance [34]) between $\{p_i^j(t + 1)\}_i$ and the historical data, and the second part is the relative entropy distance between $\{p_i^j(t + 1)\}_i$ and the population distribution when $t = +\infty$.
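To make the structure of such a problem concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: the cost is taken to be KL(p || q) + omega0 * KL(p || uniform) with a single strictly positive historical reference distribution `q`, and the cumulative ratios are fixed exactly to given targets (zero error bound). Under these assumptions the age bands decouple, and the Lagrange conditions give a closed form in which, within each band, the solution is proportional to `q_i ** (1 / (1 + omega0))`. The names `cost`, `solve_bands`, `alpha_target` and `omega0` are hypothetical.

```python
import math

def cost(p, q, omega0):
    """Assumed objective: KL(p || q) + omega0 * KL(p || uniform)."""
    u = 1.0 / len(p)
    return sum(pi * math.log(pi / qi) + omega0 * pi * math.log(pi / u)
               for pi, qi in zip(p, q) if pi > 0)

def solve_bands(q, boundaries, alpha_target, omega0):
    """Minimise cost() subject to sum_{i <= A_n} p_i = alpha_target[n]."""
    p, start, prev_alpha = [], 0, 0.0
    for a_n, alpha in zip(boundaries, alpha_target):
        # stationarity within a band gives p_i proportional to q_i^(1/(1+omega0))
        band = [qi ** (1.0 / (1.0 + omega0)) for qi in q[start : a_n + 1]]
        mass = alpha - prev_alpha            # probability mass of this age band
        scale = mass / sum(band)
        p.extend(w * scale for w in band)
        start, prev_alpha = a_n + 1, alpha
    return p
```

With a nonzero error bound or several weighted historical years, the closed form no longer applies directly, and a general convex solver would be a more natural choice.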
Note that we can never know the value of $\tilde{\alpha}_n^j(t + 1)$ at the year $t$, as we do not know $\alpha_n^j(t + 1)$. However, it can be estimated from the historical data as: [...], where $\omega_\kappa$ for $\kappa = 1, \dots, k_0 + 1$ are decreasing weights, which implies that more recent data are valued more. Let $\xi_n^j(t + 1)$ [...] be an error term of the estimation; then we have [...], where the distribution of $\xi_n^j(t + 1)$ is known and bounded within $[-\bar{\xi}, \bar{\xi}]$. Usually $\xi_n^j(t + 1)$ can be assumed to be a random variable uniformly distributed in $[-\bar{\xi}, \bar{\xi}]$. We now have the following result.
Theorem 1. The optimization problem defined in Model (12) is a strictly convex optimization problem.
Proof. Note that the Hessian matrix $H$ of the objective function is given by: [...] Since $p_i^j(t + 1) \geq 0$ for all $i$ and $j$, it is easy to see that $H$ is a positive definite matrix. On the other hand, the constraints of the optimization problem in Model (12) are linear. Therefore, the feasible domain is a convex set. Both the objective function and the feasible domain are convex; hence the problem is a convex optimization problem. Note that one only needs to find a local minimum point of a convex optimization problem to obtain the global minimum point [35][36][37][38].
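For intuition, the Hessian of an entropy-type cost can be worked out by hand. As a sketch, assume the cost has the form $f(p) = \sum_i p_i \ln(p_i/q_i) + \omega_0^* \sum_i p_i \ln(p_i/u_i)$, where $q_i > 0$ and $u_i > 0$ are fixed reference distributions; this form and the symbols $f$, $q_i$, $u_i$ are assumptions made for the illustration. Differentiating twice gives:

```latex
\frac{\partial f}{\partial p_i}
  = \ln\frac{p_i}{q_i} + 1 + \omega_0^{*}\left(\ln\frac{p_i}{u_i} + 1\right),
\qquad
\frac{\partial^2 f}{\partial p_i\,\partial p_k}
  = \begin{cases}
      \dfrac{1 + \omega_0^{*}}{p_i}, & i = k,\\[6pt]
      0, & i \neq k.
    \end{cases}
```

Under this assumed form the Hessian is diagonal, and every diagonal entry is strictly positive wherever $p_i > 0$, which is what makes the cost strictly convex on the interior of the feasible set.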

Total household size model: for estimating the number of each household size
In this section, we build a total household size model to estimate the number of households of each size $j$, for $j = 1, 2, \dots, m_0$, in the year $t + 1$, based on the predicted age distribution of the population and the predicted age distribution of each individual household size.
Let $x_j(t)$ be the number of households of size $j$ in the year $t$ and denote $X(t) = [x_1(t) \dots x_{m_0}(t)]^T$. We wish to estimate the vector $X(t + 1) = [x_1(t + 1) \dots x_{m_0}(t + 1)]^T$. As mentioned, the first stage is to obtain the estimated total population distribution $\hat{P}(t + 1)$ based on the current fertility and death rates. The second stage is then to obtain the estimated age distribution of each household type $j$, denoted as $\hat{p}^j(t + 1)$. Now we estimate the household number distribution by solving the following total household size model: [...], where $\|\cdot\|$ is the $L_2$ norm, $X(t + 1) \geq 0$ means that each component of $X(t + 1)$ is nonnegative, and $\hat{F}$ is a weighting matrix collected from the predicted age distributions of all household sizes: $\hat{F} =$ [...]. The above objective function contains two parts, with $u$ being a small positive weight parameter. The first part is the distance between the estimated age distribution of the population and the accumulated age distributions of all household sizes. The other part is the weighted distance of the estimated $X(t + 1)$ (denoted as $\hat{X}(t + 1)$) to $X(t)$. As there are $j$ persons in a household of size $j$, we construct a diagonal weighting matrix $W$ with a given power $\tau > 0$ in Eq (18). As shown in Theorem 2, the optimization problem in Eq (16) is also convex.
Theorem 2. The optimization problem defined in Model (16) is convex.
Proof. The proof is similar to that of Theorem 1. The Hessian matrix of the objective function in Eq (16) is [...]. $H$ is a positive definite matrix, and hence the theorem holds.
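A problem of this kind can be solved with very simple tools because it is convex. The sketch below assumes, based on the verbal description of the two parts of the objective, that the problem has the form min ||F x - p_hat||^2 + u ||W (x - x_prev)||^2 subject to x >= 0, and solves it by projected gradient descent. The function name, the argument names and the representation of the diagonal of W as a list `w_diag` are assumptions for illustration.

```python
def solve_total_household(F, p_hat, x_prev, w_diag, u, steps=5000):
    """Projected gradient descent for the assumed problem
    min ||F x - p_hat||^2 + u * ||W (x - x_prev)||^2  subject to  x >= 0."""
    rows, cols = len(F), len(x_prev)
    # upper bound on the gradient's Lipschitz constant: 2 (||F||_F^2 + u max w^2)
    lip = 2.0 * (sum(v * v for row in F for v in row)
                 + u * max(w * w for w in w_diag))
    x = list(x_prev)
    for _ in range(steps):
        resid = [sum(F[i][j] * x[j] for j in range(cols)) - p_hat[i]
                 for i in range(rows)]
        grad = [2.0 * sum(F[i][j] * resid[i] for i in range(rows))
                + 2.0 * u * w_diag[j] ** 2 * (x[j] - x_prev[j])
                for j in range(cols)]
        # gradient step followed by projection onto the nonnegative orthant
        x = [max(0.0, xj - gj / lip) for xj, gj in zip(x, grad)]
    return x
```

Under the assumed form, the Hessian is 2 (F^T F + u W^T W), which is positive definite whenever u > 0 and all diagonal entries of W are positive, so the fixed step 1/lip converges to the unique minimiser. One plausible choice, again an assumption, is `w_diag[j - 1] = j ** tau`.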

Simulations
In this section, we illustrate the procedure discussed above using US Census population data. We predict the demographic distribution in the year 2010 based on the historical data from the years 2000 and 2006. The prediction is then compared with the actual Census data for the year 2010. We show that the method described here accurately captures the actual statistics. As mentioned, there are three stages in the estimation:
Stage 1: Estimating the age distribution of the population by employing the age-structured population model in Section 3A.
Stage 2: Estimating the age distribution for each household size type by employing the individual household size model in Section 3B.
Stage 3: Estimating the number of households of each size type by employing the total household size model in Section 3C.
In Stage 1, we collect the population data from the US Census and obtain the values of all parameters that are required in Model (5). By solving this model, we obtain the estimate of the population in the year 2010 based on the data from the years 2000 and 2006, as shown in Fig 3. In Stage 2, by letting $\omega = [0.95\ \ 0.025\ \ 0.025]$ and assuming that the error term bound $\bar{\xi} = 0$, we divide the population into 9 groups ($G_n$, $n = 1, 2, \dots, 9$) and let $A_n = n \cdot 10$. By solving the individual household size Model (12), the age distributions for the household sizes $j = 1, 2, \dots, 7$ are obtained in Figs 4-10, respectively. It is seen that the individual household size model accurately predicts the age distribution of all household sizes.
In Stage 3, the number of households of each size is estimated by solving the total household size Model (16). As seen in Fig 11, the estimates are close to the real values, which again shows the accuracy of our proposed method.
In addition, we also look at the cases when the error term bound [...], implying that $\xi_k(t + 1)$ is randomly distributed in the interval $[-2, 2]$. By repeating the

Discussion and Conclusions
In this paper, we have demonstrated a new method that estimates the development of age and household size distributions. The procedure consists of three models in three coupled stages, which we refer to as: the age-structured population model in stage 1, where the age distribution of a country's population is predicted; the individual household size model in stage 2, where the age distribution of each individual household size is estimated; and the total household size model in stage 3, where the number of households of each size is derived by projecting the age distribution of the total population onto the age distributions of individual household sizes. The procedure described here indicates that demographic trends can be accurately estimated using an entropy-based optimisation objective, which we believe will be of interest to academics and practitioners alike. We have illustrated and validated the accuracy of the proposed method using US data. While we have considered age and household size distributions in this article, we note that the method is general and versatile enough to be extended to other time-dependent demographic variables.