Using a penalized likelihood to detect mortality deceleration

We suggest a novel method for detecting mortality deceleration by adding a penalty to the log-likelihood function in a gamma-Gompertz setting. This is an alternative to traditional likelihood inference and hypothesis testing. The main advantage of the proposed method is that it does not involve using a p-value, hypothesis testing, and asymptotic distributions. We evaluate the performance of our approach by comparing it with traditional likelihood inference on both simulated and real mortality data. Results have shown that our method is more accurate in detecting mortality deceleration and provides more reliable estimates of the underlying parameters. The proposed method is a significant contribution to the literature as it offers a powerful tool for analyzing mortality patterns.


Introduction
Human death-rate patterns are astoundingly log-linear over a wide range of adult ages.The Gompertz distribution (Gompertz, 1825) with an exponentially increasing hazard function captures this accurately.The theory of unobserved heterogeneity and the associated frailty model (Vaupel et al., 1979) predicts a downward deviation at the oldest ages, to which only the most robust individuals in the population survive.Detecting such a deceleration in real data is not always successful (Gavrilova and Gavrilov, 2015;Newman, 2018), even though the vast majority of studies indicate that death rates at older ages increase at lower rates and can even level off (Curtsinger et al., 1992;Fukui et al., 1993Fukui et al., , 1996;;Carey et al., 1995;Khazaeli et al., 1998;Gampe, 2010Gampe, , 2021;;Rootzén and Zholud, 2017;Alvarez et al., 2021;Camarda, 2022;Belzile et al., 2022).In a frailty model setting, testing for mortality deceleration is equivalent to testing whether the non-negative frailty parameter is strictly positive.
Formally, denote by X a non-negative continuous random variable that describes individual human lifespans (complete or after a given adult age).If X ∼ Gompertz(a, b), where a is the mortality level at the initial age and b is the rate of aging, the associated hazard function (force of mortality) at time x µ(x) = lim ε↓0 P(x ≤ X < x + ε|X ≥ x).
is µ(x) = ae bx .Vaupel et al. (1979) introduce a positive continuous random variable Z, called frailty, that acts multiplicatively on µ(x) and captures one's unobserved susceptibility to death.The force of mortality for an individual with frailty Z = z is For a gamma-distributed frailty with E(Z) = 1 and VAR(Z) = σ 2 , the force of mortality of the population, i.e., the marginal hazard is (see Vaupel et al. (1979) and Vaupel and Missov (2014) for all technicalities).Note that the variance of Z is often denoted by γ (e.g., in Vaupel and Missov, 2014) because it is also equal to the squared coefficient of variation of the distribution of frailty among survivors to any age x.If σ 2 > 0, the force of mortality for the population μ(x) starts deviating from the exponential pattern with increasing x and reaches an asymptote b/σ 2 .When σ 2 = 0, i.e., when there is no unobserved heterogeneity, the model for the population reduces to the (Gompertz) model for individuals with an exponentially increasing hazard function µ(x) = ae bx .Testing for mortality deceleration in this setting reduces to statistical testing whether σ 2 = 0 given the alternative σ 2 > 0. The frailty parameter σ 2 can take a value on the boundary of the parameter space (σ 2 = 0).This violates the standard underlying assumptions about the asymptotic properties of likelihood-based inference and statistical hypothesis testing (see, for example, Böhnstedt and Gampe, 2019).As a result, the asymptotic distribution of the maximum likelihood estimator may not be Gaussian.
In this paper, we treat the problem of identifying whether σ 2 > 0 or σ 2 = 0 as a model misspecification problem, i.e., we consider the gamma-Gompertz model when it is the Gompertz model that actually holds.In this setting, we suggest subtracting a penalty from the log-likelihood function.This penalty will be responsible for shrinking σ 2 to zero when there is no heterogeneity, as well as for adding a small bias to the Maximum Likelihood Estimator (MLE) when the effect of unobserved heterogeneity is non-negligible.We carry out Monte Carlo simulation experiments to evaluate the accuracy and precision of the estimates obtained by maximizing the likelihood function, on the one hand, and the penalized likelihood function, on the other.
In Section 2, we formulate the model misspecification problem and introduce inference methodology taking advantage of the maximum a posteriori probability (MAP).Then we carry out a Monte Carlo simulation study to compare the performance of maximizing a standard and a penalized likelihood.In Section 3, we compare the latter on mortality data for France, Japan and the USA.Section 4 discusses the advantages and drawbacks of applying our method to detect heterogeneity (deceleration) in mortality patterns.

Methodology
Suppose X is a random sample with a cumulative distribution function G(x), and we fit the incorrect family of densities {f (x; θ), θ ∈ Θ} to the data using MLE.The misspecified log-likelihood is Applying the law of large numbers, we get in the limit what the misspecified log-likelihood function (θ; X) looks like for each θ ∈ Θ (see the right-hand side below): Assume there is no heterogeneity in the data (σ 2 = 0), and we fit a gamma-Gompertz model.In other words, we observe an exponential death-rate increase in the data, but we estimate a model that implies a downward deviation from the exponential at the oldest ages.As shown in (2), we will estimate σ 2 close to but never equal to zero.
In this model setting, the standard technique is to estimate both the Gompertz and the gamma-Gompertz models and compare their goodness of fit.However, minor changes in the data can result in different models being selected, which can reduce prediction accuracy and lead to misinterpretations about the mortality deceleration and the mortality plateau.Böhnstedt and Gampe (2019) derive the asymptotic distribution of the likelihood ratio test statistic to detect heterogeneity.Here, we would like to suggest an alternative that does not involve hypothesis testing.Using the latter has been widely discussed and rethought in the Statistics community (Berk et al., 2010;Head et al., 2015;Vidgen and Yasseri, 2016;Bruns and Ioannidis, 2016), especially in relation to the arbitrary choice of the α-level (most often 0.1, 0.05, or 0.01) and sample size issues.
Maximum likelihood estimators, obtained by maximizing the log-likelihood function, often have low bias and large variance.Estimation accuracy can sometimes be improved by shrinking some parameters to zero (Tibshirani, 1996).The associated shrinkage estimator improves the overall prediction accuracy at the expense of introducing a small bias to reduce the variance of the parameters.This class of estimators is implicit in Bayesian inference and penalized likelihood inference.Using shrinkage estimators is applied as an alternative to hypothesis testing.Lasso, Ridge and Stein-type estimators are the most widely used examples of penalizing methods (see, for example, Hastie et al., 2009).

Inference
Let D x be the number of deaths in a given age interval [x, x+1) for x = 0, . . ., m, and E x denote the number of person-years lived in the same interval (see, for example Brillinger, 1986;Macdonald et al., 2018).Define D = (D 0 , D 1 , . . ., D m ) and E = (E 0 , E 1 , . . ., E m ) .In addition, let θ = (a, b, σ 2 ) ∈ Θ be the parameter vector that characterizes the force of mortality at age x of the gamma-Gompertz model given by (1).
Let us now define a penalized log-likelihood function as where (θ) is the standard log-likelihood (3), while p(σ 2 ) is a penalty function.The penalized maximum-likelihood estimate is obtained by maximizing p (θ) with respect to θ = (a, b, σ 2 ) .
For the problem addressed in this paper, the penalty function p(σ 2 ) must be a non-decreasing monotonic continuous function and lim In a Bayesian framework, maximizing (4) is equivalent to maximizing a posterior distribution in a setting, in which e −p(σ 2 ) /C p , C p := Θ e −p(σ 2 ) ∇θ < ∞, is taken as a prior distribution of θ.This procedure yields the maximum a posteriori probability (MAP) estimator.MAP is the only Bayesian estimator that minimizes the expected canonical loss (Pereyra, 2019) and is widely used in image and video processing (Greig et al., 1989;Afonso et al., 2010;Belekos et al., 2010).
As σ 2 describes the variance of frailty at the starting age of analysis, the standard approach would be to specify an inverse gamma prior distribution for it (Gelman et al., 1995).The inverse gamma distribution is heavy-tailed and keeps probability mass further from zero than the gamma distribution.In addition, while the inverse-gamma mode is always positive, the gamma mode can also be zero (Llera and Beckmann, 2016).As we aim to test whether σ 2 = 0 or σ 2 > 0, we will use the log-kernel of the gamma distribution to define the penalty function as for some non-negative λ.When λ < 1, using ( 5) is equivalent to specifying a gamma prior distribution for σ 2 with parameters α = 1 − λ and β = λ.When m → ∞, the effect of the penalty diminishes regardless of the size of λ.For human life table data m is finite, thus λ ≥ 0 is a constant that controls the relative impact of the penalty function on the estimates.When λ = 0, the penalty term has no effect, and maximizing the penalized likelihood will produce the standard maximum likelihood estimates (MLE).However, as λ → ∞, the impact of the penalty grows, and the maximum penalized likelihood estimates for σ 2 will approach zero, providing high precision, but low accuracy.
Choosing λ is sensible in a wide range of applications (Li et al., 2009;Bhattacharya and Mc-Nicholas, 2014).Therefore, in accordance with the recommendations in Li et al. (2009), we carry out a pilot simulation study, in which we find that choosing λ = 1 2 provides similar precision to the one by MLE when σ 2 > 0, but better accuracy and precision when σ 2 = 0 (simulation results are presented in the next subsection).As a result, the final expression for the penalized log-likelihood we propose is From a Bayesian perspective, choosing λ = 1 2 provides an informative prior distribution for σ 2 .As for human populations we are likely to estimate σ 2 < 1 (Missov, 2013), the specified prior will provide for σ 2 a distribution with a mode equal to zero, a median equal to 0.4549, and mean equal to 1. Furthermore, the prior provides a probability mass of 0.6826 in the interval (0, 1]. Figure 1 shows the log-likelihood and penalized log-likelihood functions for all parameters when σ 2 > 0 (first row) and σ 2 = 0 (second row).When σ 2 > 0, the penalty function affects neither the shape of the log-likelihood nor the location of its maximum.However, when σ 2 = 0, adding a penalty yields a higher maximum at 0. Moreover, when σ 2 = 0, the first and second derivatives of the penalized log-likelihood are higher than their respective counterparts of the loglikelihood.As a result, derivative-based optimization methods may reach the maximum point faster, and the estimator σ2 may have a smaller variance.

Monte Carlo simulations
We carry out Monte Carlo simulations to explore the performance of the MAP and ML methods in estimating the gamma-Gompertz model parameters.We use the R software (Team et al., 2022) to maximize the log-likelihood and the penalized log-likelihood functions via the optim function applying as a pre-step differential evolution (Storn and Price, 1997;Ardia et al., 2011).The performance of the ML and MAP estimators are evaluated by calculating two measures: the bias and the standard deviation.
We generate 10,000 random samples from this model for some parameter values (scenarios with sample sizes of 2,000 and 5,000 were also considered, and are presented in the appendix).
From these samples, we generate life tables and use them to estimate model parameters via the MAP and MLE methods.This process was repeated 2,000 times.In the presence of unobserved heterogeneity, the true parameter values are a 1 = 0.0001 and a 2 = 0.00001 for a, b 1 = 0.1 and b 2 = 0.15 for b, and σ 2 1 = 0.2 and σ 2 2 = 0.8 for σ 2 .When there is no heterogeneity (σ 2 = 0), the true parameter values are a 1 = 0.0001, a 2 = 0.0003 and a 3 = 0.0005 for a, and b 1 = 0.09, b 2 = 0.10 and b 3 = 0.11 for b.
In the absence of unobserved heterogeneity, the ML estimator provides again a smaller bias for a and b than the MAP estimator.However, in this case, the MAP method estimates more precisely the frailty parameter σ 2 , with a bias and a standard deviation close to zero (∝ 10 −15 ).The MAP estimator also provides a slight reduction in the standard deviation of parameter b.
By the Monte Carlo simulation we also calculate the proportion of trials in which MAP estimates σ 2 > 0 when the true values is σ 2 = 0 (error type I), as well as the proportion of trials in which MAP estimates σ 2 = 0 when the true values is σ 2 > 0 (error type II).Based on our simulations, the type I errro equals 0.001502, while the type II error is 0.001126.
The Monte Carlo simulations show that using a penalizing likelihood function ( 6) is an alternative to hypothesis testing, the latter being dependent on the asymptotic distribution of the ML estimator, sample size and the arbitrary choice of the α-level (Böhnstedt and Gampe, 2019).

Performance of MAP and ML estimators on HMD data
In this section, we estimate the gamma-Gompertz model via ML and MAP using mortality data from the Human Mortality Database (HMD, 2022).We take exposures and raw death counts for the female population of France, Japan and the USA in the years 1960, 1980, 2000, and 2020, after age 70.We apply again R (Team et al., 2022) to compute the ML and MAP estimates of θ = (a, b, σ 2 ) by using differential evolution.We use the mean squared error given by to assess the goodness of fit.Table 2 shows the results of applying ML and MAP methods to the datasets described above.The MAP estimator provides lower MSEs in 8 of the 12 datasets.When the standard ML method estimates σ 2 < 10 −4 , our novel method estimates σ 2 = 0 and provides a smaller MSE.This suggests that the MAP provides a slightly better fit to the data.Overall, MAP performs better than ML when unobserved heterogeneity is not detected, and while for estimates of σ2 > 0 ML has a slight advantage.
The results from the real-data application back up the results from the Monte Carlo simulations in Section 2. In the presence of unobserved heterogeneity, the MLE method provides the most precise and accurate estimates.The MAP method, though, has just slightly lower precision.On the other hand, in the absence of unobserved heterogeneity, the MAP provides a smaller bias and variance in its estimates compared to MLE.

Examples when MAP and ML estimators yield different outcomes
Using MAP and ML estimators does not always lead to the same statistical inference.One of them can detect heterogeneity in cases when the other does not.We will illustrate this on HMD data for the Japanese female population in 2009 and the French female population born in 1848, ages 70+.To assess the goodness of fit, we will use MSE again.
For Japanese females in 2009, ML yields estimates θMLE = (0.006359, 0.133805, 0.070513) with standard errors SE(a) = 0.000188, SE(b) = 0.002263 and SE(σ 2 ) = 0.021156.The 95% confidence interval for σ 2 is (0.029047, 0.111978) indicating stasitically significant unobserved heterogeneity, i.e., the existence of mortality deceleration.On the other hand, the MAP method estimates θMAP = (0.006966, 0.125440, 0) , indicating the absence of unobserved heterogeneity.Comparing the goodness of fit of both methods speaks in favor of the MAP outcome: MAP's MSE is by 37% lower than ML's LSE (0.018691 for MAP vs 0.029958 for ML).It indicates that unobserved heterogeneity is negligible and that the gamma-Gompertz model is misspecified.The left panel of Figure 2 shows that both methods estimate a similar logarithmic force of mortality at most ages.However, after age 100, the MLE deviates downward from the observed logarithmic death rates.
Furthermore, while the MAP estimate of σ 2 suggests that there is non-negligible unobserved heterogeneity, the ML estimate and standard error for σ 2 indicates the opposite: the amount of unobserved heterogeneity is not statistically significant.The right panel of Figure 2 shows the difference between these estimates.MAP's estimate shows a leveling-off in the force of mortality, while the MLE shows a log-linear increase in the hazard function.

Comparison between MAP and ML estimators for different populations
To evaluate and compare empirically the performance of the ML and MAP methods we are going to use them to estimate the force of mortality for the male and female populations of France, Denmark, Sweden, Italy, Japan, Czechia, and the United States of America from 1950 to 2019, resulting in 980 populations.To access the goodness of fit we are using the MSE.
Figure 3 presents the method that provides a better fit (which has the smallest MSE).Overall both methods provide similar goodness of fit, with MAP providing a slightly lower MSE (on average 0.5% smaller than the ML method).Over the 980 populations, the MAP provides a better fit for 502 of them.For Czechia, Sweden, and France, the MAP also provided a slightly better fit than the ML method; however, the MSE difference is small within the country.We also estimate the standard error for the ML estimates of σ 2 to test if the parameter is not statistically significantly different from zero.The results were compared with the MAP estimate.Figure 4 presents for which populations σ 2 is zero through the ML and MAP method.From both methods, it is clear that the non-estimation of unobserved heterogeneity is related to the male populations -especially for the Danes.It might be caused by the small male population surviving after age 105.
In about 96% of the populations, the methods agree on the significance or non-significance of σ 2 .However, when the MAP estimates σ 2 = 0 the ML provided positive confidence intervals for about 5.1948% of the cases, which corresponds to the chosen α-level (probability of Type I error) of 5% for those intervals.

Concluding remarks
Böhnstedt and Gampe (2019) introduced a formal procedure to identify whether σ 2 > 0 or σ 2 = 0 in a hypothesis testing setting: they studied the asymptotic properties of the maximum likelihood estimator and the likelihood ratio test (LRT) for H 0 : σ 2 = 0 vs. H 1 : σ 2 = 0 for the gamma-Gompertz model.However, LRTs are based on the asymptotic distribution of the maximum likelihood estimator, hence its convergence depends on the sample size.Moreover, conclusions drawn from hypothesis tests are dependent on the arbitrary choice of the significance level or p-value.
We suggest an alternative method by considering the problem as model misspecification.We add a penalty function to the likelihood so that we make sure that σ2 = 0 when there is no heterogeneity.We also present a Bayesian interpretation (MAP) to our method.We take advantage of robust Monte Carlo simulations to measure the bias and standard deviation of the ML and MAP methods in scenarios with and without unobserved heterogeneity.We also compare the performance of both methods for estimating the gamma-Gompertz model parameters using actual mortality data from the Human Mortality Database.The two methods work almost equally well, the ML having a slight advantage, in the presence of unobserved heterogeneity.However, in the absence of the latter, the MAP method provides an estimate closer to 0 (σ 2 ≈ 10 −20 ) and a better fit to the model in comparison to ML.As a result, the method we propose here can be used as an alternative to likelihood ratio testing for the gamma-Gompertz model with H 0 : σ 2 = 0 vs. H 1 : σ 2 > 0. On the one hand, the MAP method does not depend on any asymptomatic distribution, its performance is not strongly affected by sample size, and it also does not depend on the arbitrary choice of the significance level.On the other hand, MAP provides similar estimates to the ones by ML when σ 2 > 0 and more accurate estimates when σ 2 = 0.

Figure 1 :
Figure 1: Plots of the profile log-likelihood (first row) and penalized log-likelihood functions (second row) of the parameters.In the first row we used synthetic data from a gamma-Gompertz model with parameters a = 0.0001, b = 0.1 and σ 2 = 0.1, in the second row we from a Gompertz model with parameters a = 0.0001 and b = 0.1.

FranceFigure 2 :
Figure 2: MAP and MLE estimates of the force of mortality for the Japanese population in 2009 and the Swedish population born in 1881, after age 70

Figure 3 :
Figure 3: Method providing the best goodness of fit.

Table 1 :
Simulation results: gamma-Gompertz model and sample size 10,000.

Table 3 :
Simulation results: gamma-Gompertz model and sample size 2,000.