
Some simulations of age-period-cohort analysis applying Bayesian regularization: Conditions for using random walk model

Abstract

Age-period-cohort (APC) analysis, one of the fundamental time-series models, has an identification problem: the linear components of the three effects cannot be separated. However, constraints to solve the problem are still controversial, because the multilevel analysis used in many studies results in the linear component of cohort effects being close to zero. In addition, previous studies do not compare the Bayesian cohort model proposed by Nakamura with the well-known intrinsic estimator. This paper focuses on three models of Bayesian regularization using priors of normal distributions. A random effects model refers to multilevel analysis, a ridge regression model is equivalent to the intrinsic estimator, and a random walk model refers to the Bayesian cohort model. Here, applying Bayesian regularization in APC analysis means estimating linear components by using nonlinear components and priors. We aim to suggest conditions for using the random walk model by comparing the three models through simulations with settings for the linear and nonlinear components. Simulation 1 emphasizes the impact of the indexes by making the absolute values of the nonlinear components small. Simulation 2 randomly generates the amounts of change in the linear and nonlinear components. Simulation 3 randomly generates artificial parameters such that patterns with only linear components are less likely to appear, to reflect the Bayesian regularization assumption. As a result, Simulation 1 shows that the random walk model, unlike the other two models, mitigates underestimating the linear component of cohort effects. On the other hand, in Simulation 2, none of the models can recover the artificial parameters. Finally, Simulation 3 shows that the random walk model has less bias than the other models. Therefore, there is no one-size-fits-all APC analysis. However, this paper suggests that the random walk model performs relatively well in data generating processes where only linear components are unlikely to appear.

Introduction

Age-period-cohort (APC) analysis is one of the fundamental time-series analyses used in many research fields. In APC analysis, age effects reflect the influence of individual differences in age, and period effects reflect the influence of differences in time period. Cohort effects represent the influence of differences in birth year. APC analysis is important because long-term changes are also the result of demographic metabolism, in which older generations leave society and younger generations with different characteristics enter [1]. Given the nature of such cohort replacement, we need to consider not only period effects but also cohort effects.

It is well-known that APC analysis has a serious issue of identification. In general, cohort is linearly associated with age and period according to the relationship cohort = period – age. This linear dependence of the three factors confounds linear components of the three effects. In other words, the APC identification problem makes it impossible to directly estimate the linear components of these effects. There are various constraints, such as using unequal-interval widths for age, period, and cohort indexes [2]. Many previous studies have applied multilevel analysis to solve the rank deficiency of the design matrix. However, this constraint is problematic [3], especially because it results in the linear component of cohort effects being close to zero [4, 5]. The intrinsic estimator [6] is also a well-known method in APC analysis. This method is sensitive to the type of dummy parameterization of the design matrix [7, 8], and is difficult to verify in empirical research [9].

On the other hand, there are few studies using the Bayesian cohort model proposed by Nakamura [10]. Sakaguchi and Nakamura [11] suggest that its constraint, unlike multilevel analysis, mitigates underestimation of the linear component of cohort effects caused by the indexes of the three effects. However, these constraints are still controversial [12], as the above previous studies do not compare the model with the intrinsic estimator and do not evaluate the performance of the three models. Furthermore, they do not show whether the mitigation of underestimating the linear component of cohort effects reduces the bias in APC analysis. Therefore, it is unclear when we should use the Bayesian cohort model.

This paper examines the major models of APC analysis through simulations with settings for the linear and nonlinear components. To compare the assumed constraints, we focus on three models of Bayesian regularization using prior probabilities of normal distributions. A random effects model refers to multilevel analysis, a ridge regression model is equivalent to the intrinsic estimator, and a random walk model refers to the Bayesian cohort model. Here, the linear components of the three effects are estimated using the nonlinear components and the priors. This paper evaluates the three models in terms of how well the linear components are recovered. The first simulation makes the absolute values of the nonlinear components small to emphasize the impact of the indexes mentioned in the previous studies. The next simulation randomly generates the amounts of change in the linear and nonlinear components according to normal distributions. The final simulation sets artificial parameters so that a pattern with only linear components is unlikely to appear, in keeping with the Bayesian regularization assumption. By comparing the three models through these simulations, this paper suggests conditions for using the random walk model.

This paper reviews APC analysis in the “Theory” section. Specifically, the “APC analysis” subsection shows the notation and the identification problem. The “Bayesian regularization” subsection describes the constraints for the random effects model, the ridge regression model, and the random walk model. The “Mathematical mechanism” subsection explains why the previous study suggests that the random walk model performs better than the random effects model. The “Methods” section presents the systematic simulations adopted in this paper and the definition of a bias evaluation function. The “Results” section verifies the performance of the three models through these simulations. The “Discussion” section concludes that while there is no one-size-fits-all APC analysis, the random walk model performs relatively well in data generating processes where only linear components are unlikely to appear.

Theory

APC analysis

Notation.

Let $i\,(=1,\dots,I)$ denote the index of the age group, $j\,(=1,\dots,J)$ denote the index of the period group, and $k\,(=1,\dots,K)$ denote the index of the cohort group. These three indexes are determined by

$$k = I - i + j, \qquad K = I + J - 1, \tag{1}$$

if the intervals of age and period have the same scale. The general model for APC analysis is

$$y_{i,j} = b_0 + b^A_i + b^P_j + b^C_k + e_{i,j}, \tag{2}$$

where $y_{i,j}$ denotes the observed value (see Table 1), $b_0$ denotes the intercept, $b^A_i$ denotes the age effect, $b^P_j$ denotes the period effect, $b^C_k$ denotes the cohort effect, $e_{i,j}$ denotes the error term, and each effect satisfies the sum-to-zero condition,

$$\sum_i b^A_i = \sum_j b^P_j = \sum_k b^C_k = 0.$$

Here, when we approximate the error terms with normal distributions, the model becomes

$$y_n \sim \mathrm{Normal}\!\left(b_0 + \sum_i x^A_{n,i} b^A_i + \sum_j x^P_{n,j} b^P_j + \sum_k x^C_{n,k} b^C_k,\; \sigma\right),$$

where $y_n$ is $y_{i,j}$ rearranged as the component of a vector with $N$ rows, $\sigma$ denotes the standard deviation, and $x^A_{n,i}$, $x^P_{n,j}$, and $x^C_{n,k}$ are the components of the design matrix composed of the three factors. Then, the log likelihood is

$$\log L = -N \log \sigma - \frac{1}{2\sigma^2} \sum_{n=1}^{N} \left( y_n - b_0 - \sum_i x^A_{n,i} b^A_i - \sum_j x^P_{n,j} b^P_j - \sum_k x^C_{n,k} b^C_k \right)^2, \tag{3}$$

excluding the constant term. In general, estimates are obtained by maximizing Eq (3); however, we need to add constraints in APC analysis, since the identification problem described below makes it impossible to determine the estimates uniquely.
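This rank deficiency can be illustrated numerically. The sketch below (a hypothetical 10×10 layout, not the paper's code) builds the dummy-coded APC design matrix and checks that it loses exactly one rank beyond the usual intercept aliasing:

```python
import numpy as np

# Dummy-coded APC design matrix for an I x J age-by-period table,
# with cohort index k = I - i + j (1-based). Sizes are illustrative.
I, J = 10, 10
K = I + J - 1

rows = []
for i in range(I):
    for j in range(J):
        k = I - (i + 1) + (j + 1) - 1          # 0-based cohort index
        row = np.zeros(I + J + K)
        row[i] = 1.0                            # age dummy
        row[I + j] = 1.0                        # period dummy
        row[I + J + k] = 1.0                    # cohort dummy
        rows.append(row)
X = np.array(rows)

# Two deficiencies come from intercept aliasing across the three dummy
# blocks, and one more from the linear dependency cohort = period - age.
print(np.linalg.matrix_rank(X), I + J + K - 3)
```

The extra unit of rank deficiency is exactly the APC identification problem: no reparameterization of the dummies removes it.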

Identification problem.

To understand the identification problem, it is convenient to center each index [13],

$$i^{*} = i - \frac{I+1}{2}, \qquad j^{*} = j - \frac{J+1}{2}, \qquad k^{*} = k - \frac{K+1}{2}.$$

Here, the equation

$$k^{*} = j^{*} - i^{*}$$

is satisfied using the relationship of the cohort index in Eq (1). Thus, the right-hand side of Eq (2) becomes

$$b_0 + (b^A_i + s i^{*}) + (b^P_j - s j^{*}) + (b^C_k + s k^{*}) + e_{i,j},$$

and we can write the general solutions of the three effects as

$$b^A_i = \hat{b}^A_i + s i^{*}, \qquad b^P_j = \hat{b}^P_j - s j^{*}, \qquad b^C_k = \hat{b}^C_k + s k^{*}, \tag{4}$$

where $\hat{b}^A_i$, $\hat{b}^P_j$, and $\hat{b}^C_k$ are the particular solutions of the three effects and $s$ denotes an arbitrary real number.

In summary, the APC identification problem is that there are infinitely many maximum likelihood estimates of Eq (3) owing to the linear dependency cohort = period – age. In other words, the changes $s i^{*}$, $-s j^{*}$, and $s k^{*}$ in the linear components of the three effects cancel each other out completely when the slopes of the age and cohort effects increase by $s$ and the slope of the period effect decreases by $s$. On the other hand, we can easily separate the nonlinear components, which are irrelevant to the identification problem.
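This cancellation can be checked numerically. The sketch below (hypothetical effect values, not the paper's data) shifts the three slopes by an arbitrary s and confirms that every fitted value is unchanged:

```python
import numpy as np

I, J = 10, 10
K = I + J - 1

# Centered indexes i*, j*, k*, with cohort index k = I - i + j.
i_star = np.arange(1, I + 1) - (I + 1) / 2
j_star = np.arange(1, J + 1) - (J + 1) / 2
k_star = np.arange(1, K + 1) - (K + 1) / 2

rng = np.random.default_rng(0)
a = rng.normal(size=I); a -= a.mean()   # age effects (sum to zero)
p = rng.normal(size=J); p -= p.mean()   # period effects
c = rng.normal(size=K); c -= c.mean()   # cohort effects

def fitted(a, p, c):
    y = np.empty((I, J))
    for i in range(I):
        for j in range(J):
            k = I - (i + 1) + (j + 1) - 1   # 0-based cohort index
            y[i, j] = a[i] + p[j] + c[k]
    return y

s = 0.7  # arbitrary slope shift
a2 = a + s * i_star
p2 = p - s * j_star
c2 = c + s * k_star

# Because k* = j* - i*, the shifts cancel: identical data, different slopes.
assert np.allclose(fitted(a, p, c), fitted(a2, p2, c2))
```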

Bayesian regularization

To overcome the identification problem, Bayesian regularization constrains parameters of the three effects by assuming prior probabilities. It is a strategy to statistically estimate mathematically indistinguishable linear components by using mathematically identifiable nonlinear components and priors. Here, if there are an infinite number of maximum likelihood estimates, as in APC analysis, point estimates that maximize the posterior probabilities of Bayesian models are determined by maximizing the priors.

Random effects model.

Many studies use multilevel analysis that reflects the nesting of individuals in groups of period and cohort [14]. They treat the age effects as fixed effects, which means that the age effects are unconstrained. In this paper, we consider the random effects model with reference to multilevel analysis, where the model assumes a normal distribution for the prior probabilities of each of the three effects. The priors are

$$b^A_i \sim \mathrm{Normal}(0, \sigma_A), \qquad b^P_j \sim \mathrm{Normal}(0, \sigma_P), \qquad b^C_k \sim \mathrm{Normal}(0, \sigma_C),$$

where $\sigma_A$, $\sigma_P$, and $\sigma_C$ denote the standard deviations of the three effects. The log priors are

$$-\frac{1}{2\sigma_A^2}\sum_i (b^A_i)^2 - \frac{1}{2\sigma_P^2}\sum_j (b^P_j)^2 - \frac{1}{2\sigma_C^2}\sum_k (b^C_k)^2, \tag{5}$$

excluding the constant term. Here, maximizing Eq (5) means minimizing the sum of squares of the parameters.

Ridge regression model.

Ridge regression analysis is a method that imposes the sum of squares of parameters as a penalty. The aim is to overcome the adverse effects of multicollinearity. The ridge regression model is implemented by assuming normal distributions with zero means and equal standard deviations for the prior probabilities of the three effects. Unifying the standard deviations,

$$\sigma_A = \sigma_P = \sigma_C = \lambda, \tag{6}$$

and substituting Eq (6) into Eq (5), we can write the log priors as

$$-\frac{1}{2\lambda^2}\left[\sum_i (b^A_i)^2 + \sum_j (b^P_j)^2 + \sum_k (b^C_k)^2\right]. \tag{7}$$

Here, maximizing Eq (7), like Eq (5), means minimizing the sum of squares of the parameters.

The intrinsic estimator is another well-known method in APC analysis and produces similar results to the ridge regression model. The reason is that this operation minimizes the Euclidean norm of the parameters, giving a particular solution that is the average of the general solution [15].

Random walk model.

We can also apply time-series models to APC analysis based on the previous study that proposes smoothing cohort effects [16]. The random walk model literally assumes a random walk for the prior probabilities of the three effects [17]. We write this model as

$$b^A_{i+1} \sim \mathrm{Normal}(b^A_i, \sigma_A), \qquad b^P_{j+1} \sim \mathrm{Normal}(b^P_j, \sigma_P), \qquad b^C_{k+1} \sim \mathrm{Normal}(b^C_k, \sigma_C).$$

The log priors can be summarized as follows:

$$-\frac{1}{2\sigma_A^2}\sum_{i=1}^{I-1} (b^A_{i+1} - b^A_i)^2 - \frac{1}{2\sigma_P^2}\sum_{j=1}^{J-1} (b^P_{j+1} - b^P_j)^2 - \frac{1}{2\sigma_C^2}\sum_{k=1}^{K-1} (b^C_{k+1} - b^C_k)^2, \tag{8}$$

excluding the constant term.

Here, maximizing Eq (8) means, unlike Eqs (5) and (7), minimizing the sum of squares of the differences between adjacent parameters. Furthermore, the random walk model is equivalent to the Bayesian cohort model proposed by Nakamura [10], and this constraint takes advantage of the fact that the age, period, and cohort indexes are ordered.
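The contrast between the two penalties can be made concrete with a minimal sketch (illustrative values, one effect at a time): the random effects prior of Eq (5) penalizes the parameters themselves, while the random walk prior of Eq (8) penalizes adjacent differences, so a pure linear trend is penalized far less by the latter:

```python
import numpy as np

def log_prior_random_effects(b, sd):
    # Eq (5) for a single effect: penalize the parameters themselves.
    return -np.sum(b ** 2) / (2 * sd ** 2)

def log_prior_random_walk(b, sd):
    # Eq (8) for a single effect: penalize adjacent differences.
    return -np.sum(np.diff(b) ** 2) / (2 * sd ** 2)

b_trend = 0.5 * np.arange(10)   # a pure linear trend (hypothetical values)

# The trend has constant adjacent differences, so the random walk prior
# penalizes it only mildly, while the random effects prior penalizes the
# growing parameter values heavily.
assert log_prior_random_walk(b_trend, 1.0) > log_prior_random_effects(b_trend, 1.0)
```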

Mathematical mechanism

Linear and nonlinear components.

APC analysis depends heavily on the way in which the constraints assign the linear components to the three effects. Thus, this paper separates the linear and nonlinear components of the general solution, following [11], in order to discuss constraint bias. For example, we regress the particular solution of the age effects on the centered index $i^{*}$ and let $\beta_A$ denote the obtained slope. The equation is $\hat{b}^A_i = \beta_A i^{*} + \tilde{b}^A_i$ with $\sum_i i^{*} \tilde{b}^A_i = 0$, as $\tilde{b}^A_i$ does not contain the linear component. Here, the particular solutions of the three effects are

$$\hat{b}^A_i = \beta_A i^{*} + \tilde{b}^A_i, \qquad \hat{b}^P_j = \beta_P j^{*} + \tilde{b}^P_j, \qquad \hat{b}^C_k = \beta_C k^{*} + \tilde{b}^C_k,$$

where $\beta_P$ and $\beta_C$ are the slopes calculated from the particular solutions of the period and cohort effects. By substituting the above solutions into Eq (4), we rewrite the general solutions as follows:

$$b^A_i = (\beta_A + s) i^{*} + \tilde{b}^A_i, \qquad b^P_j = (\beta_P - s) j^{*} + \tilde{b}^P_j, \qquad b^C_k = (\beta_C + s) k^{*} + \tilde{b}^C_k. \tag{9}$$

Therefore, the linear components of the general solutions are

$$(\beta_A + s) i^{*}, \qquad (\beta_P - s) j^{*}, \qquad (\beta_C + s) k^{*}, \tag{10}$$

using the centered indexes.
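The decomposition can be sketched as follows (hypothetical effect values): regressing a particular solution on its centered index yields the slope, and the residual carries no linear component:

```python
import numpy as np

I = 10
i_star = np.arange(1, I + 1) - (I + 1) / 2   # centered age index

# An example particular solution (illustrative, not the paper's values).
effect = 0.3 * i_star + np.sin(np.arange(I))
effect -= effect.mean()                      # sum-to-zero condition

beta = (i_star @ effect) / (i_star @ i_star)  # OLS slope on centered index
nonlinear = effect - beta * i_star            # residual: nonlinear component

# The nonlinear component is orthogonal to the index, i.e., it contains
# no linear component.
assert abs(i_star @ nonlinear) < 1e-10
```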

Linear components represented by indexes.

We express the linear components of the general solutions for the models of Bayesian regularization using the centered indexes. Here, the equation

$$\sum_i i^{*} \tilde{b}^A_i = \sum_j j^{*} \tilde{b}^P_j = \sum_k k^{*} \tilde{b}^C_k = 0$$

is satisfied because the linear and nonlinear components are orthogonal. The sum of squares of the parameters is

$$\sum_i (b^A_i)^2 = (\beta_A + s)^2 \sum_i i^{*2} + \sum_i (\tilde{b}^A_i)^2,$$

and similarly for the period and cohort effects, using Eq (9). Therefore, the log priors of the random effects model in Eq (5) become

$$-\frac{1}{2\sigma_A^2}\Big[(\beta_A+s)^2 \sum_i i^{*2} + \sum_i (\tilde{b}^A_i)^2\Big] - \frac{1}{2\sigma_P^2}\Big[(\beta_P-s)^2 \sum_j j^{*2} + \sum_j (\tilde{b}^P_j)^2\Big] - \frac{1}{2\sigma_C^2}\Big[(\beta_C+s)^2 \sum_k k^{*2} + \sum_k (\tilde{b}^C_k)^2\Big]. \tag{11}$$

We obtain the log priors of the ridge regression model by substituting Eq (6) into Eq (11).

Next, the sum of squares of the differences in the adjacent parameters is

$$\sum_{i=1}^{I-1} (b^A_{i+1} - b^A_i)^2 = (I-1)(\beta_A+s)^2 + 2(\beta_A+s)(\tilde{b}^A_I - \tilde{b}^A_1) + \sum_{i=1}^{I-1} (\tilde{b}^A_{i+1} - \tilde{b}^A_i)^2,$$

and similarly for the other effects, using the general solutions of Eq (9). Thus, the log priors of the random walk model in Eq (8) become

$$-\frac{1}{2\sigma_A^2}\Big[(I-1)(\beta_A+s)^2 + 2(\beta_A+s)(\tilde{b}^A_I - \tilde{b}^A_1) + \sum_i (\tilde{b}^A_{i+1} - \tilde{b}^A_i)^2\Big] - \frac{1}{2\sigma_P^2}\Big[(J-1)(\beta_P-s)^2 + 2(\beta_P-s)(\tilde{b}^P_J - \tilde{b}^P_1) + \sum_j (\tilde{b}^P_{j+1} - \tilde{b}^P_j)^2\Big] - \frac{1}{2\sigma_C^2}\Big[(K-1)(\beta_C+s)^2 + 2(\beta_C+s)(\tilde{b}^C_K - \tilde{b}^C_1) + \sum_k (\tilde{b}^C_{k+1} - \tilde{b}^C_k)^2\Big]. \tag{12}$$

Index weights of linear components.

The linear components cause the difference in the estimates and are weighted by the indexes in the general solutions of Eq (10). Focusing on $(\beta_A+s)^2$, $(\beta_P-s)^2$, and $(\beta_C+s)^2$, which are common to these models, the corresponding terms in the random effects model of Eq (11) are

$$-\frac{(\beta_A+s)^2}{2\sigma_A^2}\sum_i i^{*2} - \frac{(\beta_P-s)^2}{2\sigma_P^2}\sum_j j^{*2} - \frac{(\beta_C+s)^2}{2\sigma_C^2}\sum_k k^{*2}. \tag{13}$$

Here, maximizing Eq (13) often yields an $s$ for which $|\beta_C + s|$ is smaller than $|\beta_A + s|$ and $|\beta_P - s|$, because $K = I + J - 1$ makes the index weight $\sum_k k^{*2}$ larger than $\sum_i i^{*2}$ and $\sum_j j^{*2}$. In other words, the index weights exert a strong pressure to shrink the linear component of the cohort effects; consequently, the estimated cohort effects tend to be flat. The same occurs with the ridge regression model. The corresponding terms in the random walk model of Eq (12) are

$$-\frac{(I-1)(\beta_A+s)^2}{2\sigma_A^2} - \frac{(J-1)(\beta_P-s)^2}{2\sigma_P^2} - \frac{(K-1)(\beta_C+s)^2}{2\sigma_C^2}. \tag{14}$$

For the same reason, the random walk model also tends to underestimate the linear component of the cohort effects, owing to the index weight $K-1$.

Bayesian regularizations using normal distributions have in common that the estimated linear component of the cohort effects is close to zero. Here, Sakaguchi and Nakamura [11] suggested that the index weights of the random effects model had a greater impact than those of the random walk model. However, the previous study did not verify that the index weights of the random effects model were larger regardless of the values of I or J. Consequently, this paper compares the impact of the index weights in Eqs (13) and (14). Focusing on period and cohort, the ratio of the index weights is $(K-1)/(J-1)$ for the random walk model and $\sum_k k^{*2} / \sum_j j^{*2}$ for the other two models. The squared sums of the centered indexes are

$$\sum_j j^{*2} = \frac{J(J^2-1)}{12}, \qquad \sum_k k^{*2} = \frac{K(K^2-1)}{12}.$$

Here, a comparison of the above ratios of the index weights shows

$$\frac{K-1}{J-1} < \frac{K(K^2-1)}{J(J^2-1)} \quad \text{for } K = I + J - 1 > J. \tag{15}$$

Thus, the comparison of the index weights in Eq (15) suggests that the random walk model is less affected by the index weights than are the random effects and ridge regression models. In other words, minimizing the sum of squares of the differences in the adjacent parameters rather than the parameters themselves mitigates underestimating the linear component of cohort effects.
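A quick numerical check of this comparison for I = J = 10, assuming the closed form $\sum_j j^{*2} = J(J^2-1)/12$ for a centered index:

```python
def sq_sum(n):
    # Sum of squared centered indexes 1..n: n(n^2 - 1)/12.
    return n * (n * n - 1) / 12.0

I, J = 10, 10
K = I + J - 1

ridge_ratio = sq_sum(K) / sq_sum(J)   # cohort-to-period weight, Eq (13)
rw_ratio = (K - 1) / (J - 1)          # cohort-to-period weight, Eq (14)

# The random walk model puts relatively less pressure on the cohort slope.
assert rw_ratio < ridge_ratio
print(rw_ratio, ridge_ratio)   # 2.0 versus roughly 6.9
```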

Methods

Artificial parameter and bias evaluation function

This paper examines some simulations to evaluate the performance of the three models applying Bayesian regularization. This subsection describes a common framework for the simulations. Since estimation of the linear components is important for the identification problem, artificial parameters need to be separated into linear and nonlinear components. There are some candidates for data generating processes, including polynomial functions. However, analysts cannot freely determine each component of artificial parameters generated by polynomial functions, because they contain not only linear components but also nonlinear components. Thus, this paper uses trigonometric functions for the nonlinear components, as they contain no linear components and can be set to any amount of change. Another reason is to demonstrate the robustness of the results by adopting data generating processes that do not match the priors of normal distributions. Accordingly, the artificial parameters of the three effects are as follows:

where $\beta_A$, $\beta_P$, and $\beta_C$ denote the slopes of the artificial parameters, and $v_A$, $v_P$, and $v_C$ denote the amounts of change in the nonlinear components. Since the trigonometric term does not sum to zero when I is an odd number, a correcting constant is added to each artificial parameter of the age effects to satisfy the sum-to-zero condition; the zero-sum conditions of the other effects are derived in the same way. Here, yn represents the artificial data generated as follows:

This paper then sets I = 10 and J = 10, with a small standard deviation for the error terms, so that the noise generated by the normal distributions does not greatly affect the simulation.
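The generating process can be sketched as below. The slope and amplitude values, the cosine form of the nonlinear component (chosen here because an even function of the centered index is exactly orthogonal to it), and the error standard deviation are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

I, J, sigma = 10, 10, 0.01   # sigma: assumed small noise level
K = I + J - 1
i_star = np.arange(1, I + 1) - (I + 1) / 2
j_star = np.arange(1, J + 1) - (J + 1) / 2
k_star = np.arange(1, K + 1) - (K + 1) / 2

def effect(idx_star, slope, amp):
    # Linear part plus an even (hence linear-free) trigonometric part.
    e = slope * idx_star + amp * np.cos(2 * np.pi * idx_star / len(idx_star))
    return e - e.mean()   # enforce the sum-to-zero condition

a = effect(i_star, 0.1, 0.05)   # age: positive slope (as in case 1)
p = effect(j_star, 0.0, 0.05)   # period: no linear component
c = effect(k_star, 0.1, 0.05)   # cohort: positive slope

rng = np.random.default_rng(2)
y = np.empty((I, J))
for i in range(I):
    for j in range(J):
        k = I - (i + 1) + (j + 1) - 1   # 0-based cohort index
        y[i, j] = a[i] + p[j] + c[k] + rng.normal(0.0, sigma)
```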

In addition, we need to define a bias evaluation function. The estimates of these models can be approximated by using the artificial parameters $\theta^A_i$, $\theta^P_j$, and $\theta^C_k$ and the linear components,

$$\hat{b}^A_i \approx \theta^A_i + s i^{*}, \qquad \hat{b}^P_j \approx \theta^P_j - s j^{*}, \qquad \hat{b}^C_k \approx \theta^C_k + s k^{*},$$

taking the medians of the estimates as the particular solutions and referring to the general solutions of Eq (4). Here, a small absolute value of s means the model succeeded in recovering the artificial parameters. Thus, this paper defines the bias evaluation function

$$f(s) = \sum_i \left(\theta^A_i + s i^{*} - \hat{b}^A_i\right)^2 + \sum_j \left(\theta^P_j - s j^{*} - \hat{b}^P_j\right)^2 + \sum_k \left(\theta^C_k + s k^{*} - \hat{b}^C_k\right)^2,$$

and calculates the s such that the above function satisfies $\partial f / \partial s = 0$,

$$s = \frac{\sum_i i^{*} (\hat{b}^A_i - \theta^A_i) - \sum_j j^{*} (\hat{b}^P_j - \theta^P_j) + \sum_k k^{*} (\hat{b}^C_k - \theta^C_k)}{\sum_i i^{*2} + \sum_j j^{*2} + \sum_k k^{*2}}. \tag{16}$$
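A sketch of computing this s, with hypothetical estimate-minus-truth differences; the closed form follows from setting the derivative of the convex quadratic f to zero:

```python
import numpy as np

I, J = 10, 10
K = I + J - 1
i_star = np.arange(1, I + 1) - (I + 1) / 2
j_star = np.arange(1, J + 1) - (J + 1) / 2
k_star = np.arange(1, K + 1) - (K + 1) / 2

rng = np.random.default_rng(1)
d_a = rng.normal(size=I)   # estimate minus artificial parameter, age
d_p = rng.normal(size=J)   # estimate minus artificial parameter, period
d_c = rng.normal(size=K)   # estimate minus artificial parameter, cohort

def f(s):
    # Bias evaluation function written in terms of the differences.
    return ((d_a - s * i_star) ** 2).sum() + ((d_p + s * j_star) ** 2).sum() \
         + ((d_c - s * k_star) ** 2).sum()

# Setting df/ds = 0 gives the closed form of Eq (16).
num = i_star @ d_a - j_star @ d_p + k_star @ d_c
den = i_star @ i_star + j_star @ j_star + k_star @ k_star
s = num / den

# s minimizes f: nearby values are no better.
assert f(s) <= f(s + 1e-3) and f(s) <= f(s - 1e-3)
```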

Simulation 1

Simulation 1 focuses on underestimating the linear component of cohort effects under constraints that shrink the parameters, because the indexes give the cohort effects the largest weight owing to $K = I + J - 1$ in Eq (1). To conduct a systematic simulation, we discuss combinations of the linear components that are the basis of the identification problem. First, this paper assumes three types of slopes for the artificial parameters: 0, +, and –. The total number of combinations here is 3^3 = 27, since each effect has three patterns. In fact, we need only consider 13 cases, since this paper excludes the case with no linear component at all and treats combinations whose positive slopes are merely reversed to negative as equivalent. The combinations are thus cases 1 to 3 having a positive linear component in one factor, cases 4 to 6 having positive linear components in two factors, cases 7 to 9 having positive and negative linear components in two factors, and cases 10 to 13 having linear components in all factors.
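The count of 13 cases follows from simple enumeration, as this sketch shows: of the 3^3 = 27 sign patterns, the all-zero pattern is excluded and each remaining pattern is identified with its sign-reversed twin:

```python
from itertools import product

# All sign patterns for (age, period, cohort) slopes, excluding all-zero.
patterns = [p for p in product((-1, 0, 1), repeat=3) if p != (0, 0, 0)]

distinct = set()
for p in patterns:
    neg = tuple(-x for x in p)
    distinct.add(max(p, neg))   # one representative per sign-flip pair

# (27 - 1) / 2 = 13 distinct cases.
assert len(distinct) == 13
```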

Simulation 1 emphasizes the impact of the index weights shown in the “Mathematical mechanism” subsection by making the absolute values of the nonlinear components small. This paper sets the variation of the slope to 0.1 and the amount of change in the nonlinear components to 0.05; a + slope thus represents 0.1 and a – slope represents –0.1. To understand the 13 cases, Fig 1 visualizes artificial data that include only the linear components, and Fig 2 visualizes data that also include the nonlinear components. The dot plots are visualizations by period, and the x-axis represents cohort. The solid lines connect the dots within each period. Here, yi,j in case 1 of Fig 1 increases by 0.1 as the age index increases, and we see the values of older cohorts as higher within the same period. Case 2 increases by 0.1 as the period index increases, and case 3 increases by 0.1 as the cohort index increases. However, Fig 1 shows that cases 1 and 7 are identical and very similar to case 10. Moreover, cases 2 and 5 are identical and very similar to case 11, while cases 3 and 9 are identical and very similar to case 12. In addition, the linear components of case 13 are offset, and no variation appears in the artificial data. In other words, the mixture of linear components in the identification problem means that combining different linear components can generate precisely the same data. Unlike Fig 1, Fig 2 does not contain identical data. Consequently, this paper verifies whether the models of Bayesian regularization can recover the artificial parameters using this small difference.

Fig 1. Artificial data generated with only linear components for age, period, and cohort effects (Simulation 1).

Note: Each panel represents a different combination of linear slopes as described in Table 2.

https://doi.org/10.1371/journal.pone.0329223.g001

Fig 2. Artificial data generated with both linear and nonlinear components for age, period, and cohort effects (Simulation 1).

Note: Each panel represents a different combination of linear slopes as described in Table 2.

https://doi.org/10.1371/journal.pone.0329223.g002

Simulation 2

Simulation 2 randomly generates the amounts of change in the linear and nonlinear components according to normal distributions,

This paper generates artificial data 500 times, obtains s in Eq (16) for each model, and evaluates the models by calculating the root mean square of s,

$$\sqrt{\frac{1}{T}\sum_{t=1}^{T} s_t^2}, \tag{17}$$

where T denotes the number of times the model converged in the simulation.

Fig 3 visualizes the linear and nonlinear components of the generated artificial parameters as dots. Here, this paper classifies the artificial parameters into four main patterns: (1) no linear and nonlinear components, (2) only linear components, (3) only nonlinear components, and (4) both linear and nonlinear components. This simulation has an equal probability of the patterns containing only linear components and only nonlinear components.

Fig 3. Linear and nonlinear components of artificial parameters (Simulation 2).

https://doi.org/10.1371/journal.pone.0329223.g003

Simulation 3

Simulation 3 modifies the assumption of Simulation 2 that patterns with only linear components and patterns with only nonlinear components appear equally often. In addition, Simulation 3 generates the artificial parameters in keeping with the assumption of Bayesian regularization that linear components are estimated by using nonlinear components. In other words, we set the artificial parameters so that the pattern with only linear components is less likely to appear than the pattern with only nonlinear components. Specifically, this paper randomly generates the amounts of change in the nonlinear components according to normal distributions and uses their absolute values to generate the linear components,

Fig 4 visualizes the linear and nonlinear components of the generated artificial parameters as dots. Moreover, we add to Fig 4 a gray area, including the horizontal axis, in which an artificial parameter would contain only a linear component. The absence of dots in the gray area indicates that Simulation 3 does not generate artificial parameters containing only linear components.

Fig 4. Linear and nonlinear components of artificial parameters (Simulation 3).

https://doi.org/10.1371/journal.pone.0329223.g004
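One way to realize such a generating process (the exact distributions here are assumptions, not the paper's settings) is to draw the nonlinear amount first and bound the linear slope by its absolute value, so that a vanishing nonlinear component forces a vanishing linear component:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500

# Amount of change in the nonlinear component, drawn first.
nonlinear = rng.normal(0.0, 0.1, size=n)

# The linear slope is scaled by |nonlinear|: no nonlinearity, no slope.
linear = rng.uniform(-1.0, 1.0, size=n) * np.abs(nonlinear)

# The "only linear components" pattern cannot occur.
assert np.all(np.abs(linear) <= np.abs(nonlinear))
```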

Results

The three models of Bayesian regularization were implemented in the probabilistic programming language Stan [18] and were run in R [19]. Sampling settings were chains = 4, iter = 2000, warmup = 500, and thin = 3. The lower bounds of $\sigma_A$, $\sigma_P$, and $\sigma_C$ in the random effects model were set to 0.05 in order to search for parameters in a wide range, as this model can get stuck in locally optimal solutions. This paper judges a model to have converged when all parameters satisfy the $\hat{R}$ convergence criterion.

The three models in Simulation 1 satisfy the convergence criterion in all cases. Table 2 summarizes the systematic combinations of the linear components and the results of the three models. The letters A through E that appear in three of the table columns are used to categorize the results: A if the absolute value of s is less than 0.02, B if less than 0.04, C if less than 0.06, D if less than 0.08, and E if 0.08 or more. As shown, among the 13 cases of artificial data, 10 cases in the random walk model rated B or better (i.e., the value of s was less than 0.04) as compared to 4 cases in the random effects and ridge regression models, indicating that the random walk model performed relatively well.

Table 2. Combinations of linear components and results (Simulation 1).

https://doi.org/10.1371/journal.pone.0329223.t002

The cases where the models failed to effectively recover the artificial parameters in Simulation 1 contain the linear component of cohort effects. First, constraints shrinking the parameters, such as in Bayesian regularization, always fail in case 13, where the linear components completely cancel. Moreover, we found the estimated linear component of the cohort effects in the failed cases to be close to zero, because $\beta_C > 0$ leads to s < 0 and $\beta_C < 0$ leads to s > 0. Specifically, Fig 5, which visualizes case 3, shows that the estimated slope of the cohort effects becomes horizontal and the linear component is incorrectly assigned to the other effects. The random effects and ridge regression models obtain estimates close to the artificial parameters in case 9 because the age effects have a negative slope and the period effects have a positive slope. However, the random walk model did not underestimate the linear component of the cohort effects.

Fig 5. Comparison of the three models’ estimates (case 3 in Simulation 1).

https://doi.org/10.1371/journal.pone.0329223.g005

In Simulation 2, the number of times the models converged was 479 for the random effects model, 500 for the ridge regression model, and 487 for the random walk model. In addition, the value of Eq (17) was 0.092 for the random effects model, 0.077 for the ridge regression model, and 0.088 for the random walk model. Fig 6 shows that the histograms of s in Eq (16) for all three models are widely spread, indicating that none of the models can recover the artificial parameters.

Fig 6. Histograms of s for the three models (Simulation 2).

https://doi.org/10.1371/journal.pone.0329223.g006

Unlike in Simulation 1, the random walk model did not perform well in Simulation 2. The reason is that different artificial parameters can generate the same artificial data, as shown in Fig 1. For example, the artificial parameters shown in Fig 7 generate the artificial data of case 3, meaning that the random walk model in this case incorrectly assigns the linear component to the cohort effects, which include the nonlinear components. Therefore, it is impossible to decide which model performs well when there are no constraints on the linear and nonlinear components.

Fig 7. Alternative artificial parameters that generate case 3 of Simulation 1.

https://doi.org/10.1371/journal.pone.0329223.g007

In Simulation 3, the number of times the models converged was 469 for the random effects model, 500 for the ridge regression model, and 499 for the random walk model. In addition, the value of Eq (17) was 0.063 for the random effects model, 0.070 for the ridge regression model, and 0.033 for the random walk model. Fig 8 shows that the histogram of s in Eq (16) for the random walk model is concentrated near zero, so this model has less bias than the other models.

Fig 8. Histograms of s for the three models (Simulation 3).

https://doi.org/10.1371/journal.pone.0329223.g008

The models applying Bayesian regularization estimate the linear components using the nonlinear components and the priors. Simulation 3 is less likely to generate artificial parameters with only linear components, which makes the random walk model advantageous because this simulation does not generate patterns like Fig 7. Here, the nonlinear components of the three effects determine the lower bounds of $\sigma_A$, $\sigma_P$, and $\sigma_C$ in the log prior probabilities of the random effects model and the random walk model, and these standard deviations affect the assignment of the linear components. However, the ridge regression model cannot effectively use the nonlinear components because it uses a single λ common to the three effects. Furthermore, Table 2 shows that the random effects model underestimates the linear components of the cohort effects more than the random walk model does. Therefore, the random walk model performs relatively well.

Discussion

Findings

This paper focused on the three models in APC analysis applying Bayesian regularization with priors of normal distributions. The random effects model refers to multilevel analysis, the ridge regression model is equivalent to the intrinsic estimator, and the random walk model refers to the Bayesian cohort model. We verified the models through simulations with settings for the linear and nonlinear components. Simulation 1 considers the systematic combinations of the linear components and emphasizes the impact of the indexes by making the absolute values of the nonlinear components small. Simulation 2 randomly generates the amounts of change in the linear and nonlinear components according to normal distributions. Simulation 3 sets the artificial parameters so that the pattern with only linear components is unlikely to appear. Table 3 briefly summarizes the settings for the amount of change in each component of Simulations 1 to 3. The purpose of this paper is to suggest conditions for using the random walk model by comparing the three models through these simulations in terms of how well the artificial parameters are recovered.

Table 3. Settings for the amount of change in each component (Simulation 1 to 3).

https://doi.org/10.1371/journal.pone.0329223.t003

In general, the constraints of shrinking the parameters tend to drive the linear component of the cohort effects close to zero, because the indexes give the cohort effects the largest weight owing to $K = I + J - 1$ in Eq (1). The results of Simulation 1 showed that the random effects model reproduced the findings [4, 5] that the linear component of the cohort effects becomes flat. Unlike the other two models, the random walk model mitigated underestimating the linear component of the cohort effects and successfully recovered the artificial parameters when one or more of the effects had a zero slope. On the other hand, Simulation 2 showed that none of the models could recover the artificial parameters. Therefore, the mitigation of underestimating the linear component of the cohort effects, mentioned in the previous study [11], does not determine which model performs well when there are no constraints on the linear and nonlinear components. In Simulation 3, as the pattern with only linear components was unlikely to appear, the random walk model had less bias than the other models.

Applying Bayesian regularization in APC analysis means statistically estimating the mathematically indistinguishable linear components by using the mathematically identifiable nonlinear components and the priors. This shrinking constraint always fails in cases where the linear components completely cancel each other, such as case 13 in Simulation 1. However, the setting in Simulation 3 is consistent with the assumption of Bayesian regularization that the nonlinear components are used to estimate the linear components. In addition, this means that the artificial parameters in Fig 7, where the random walk model fails, are less likely to appear. Here, the ridge regression model cannot effectively use the nonlinear components to estimate the linear components. Furthermore, Table 2 shows that the random effects model underestimates the linear components of the cohort effects more than the random walk model does. As a result, the random walk model in Simulation 3 recovered the artificial parameters even though they were generated by trigonometric functions rather than random walks.

Applicability

This paper classified artificial parameters into four main patterns: (1) no linear and nonlinear components, (2) only linear components, (3) only nonlinear components, and (4) both linear and nonlinear components. Among them, the simulations using trigonometric functions for the nonlinear components showed that the random walk model recovered the artificial parameters better than the other models if the pattern with only linear components is unlikely to appear. This subsection briefly discusses the possibility that the random walk model can estimate data generating processes other than trigonometric functions. For example, polynomial functions are more common than trigonometric functions in analysis, as in polynomial regression. Therefore, we describe artificial parameters using polynomial functions.

Let $m = 1, \dots, M$ denote an index and $\tilde{m}$ denote the centered index, $\tilde{m} = m - (M+1)/2$. Moreover, $h$ denotes the exponent of the polynomial functions and $z_{m,h}$ denotes the standardized $\tilde{m}^h$,

$$z_{m,h} = \frac{\tilde{m}^h - \overline{\tilde{m}^h}}{\mathrm{sd}\!\left(\tilde{m}^h\right)}.$$

Using $z_{m,h}$, $f_m$ is a centered polynomial function,

$$f_m = \sum_{h=1}^{H} w_h z_{m,h},$$

where $w_h$ denotes a random number generated by a normal distribution,

$$w_h \sim \mathrm{Normal}(0, 1).$$

Here, $\beta$ denotes the linear component of the polynomial function, $\gamma_m$ denotes the nonlinear component, and $\sigma$ denotes the standard deviation of the nonlinear components. The above $f_m$ is decomposed as $f_m = \beta z_{m,1} + \gamma_m$, where $\beta$ minimizes $\sum_{m} \gamma_m^2$. Specifically, we obtain the following with reference to matrix calculations,

$$\beta = \left(\mathbf{z}_1^{\top} \mathbf{z}_1\right)^{-1} \mathbf{z}_1^{\top} \mathbf{f},$$

and the standard deviation is

$$\sigma = \sqrt{\frac{1}{M} \sum_{m=1}^{M} \gamma_m^2}.$$
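As a minimal numerical sketch of this generating process (the paper's own code is in R and Stan; the function and variable names here are ours, not the authors'), the centered polynomial can be built from standardized powers of the centered index, and its least-squares projection onto the first-order term gives the linear component, with the residual as the nonlinear component:

```python
import numpy as np

def polynomial_parameter(M=20, H=8, rng=None):
    """Generate an artificial parameter from a centered polynomial and
    split it into a linear component (beta) and a nonlinear residual (gamma)."""
    rng = np.random.default_rng(rng)
    m = np.arange(1, M + 1)
    m_c = m - (M + 1) / 2.0                       # centered index
    # standardized powers z_{m,h}: each column has mean 0 and sd 1
    Z = np.stack([m_c ** h for h in range(1, H + 1)], axis=1)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    w = rng.normal(0.0, 1.0, size=H)              # random weights w_h
    f = Z @ w                                     # centered polynomial f_m
    # least-squares slope on z_{m,1}: the linear component
    z1 = Z[:, 0]
    beta = (z1 @ f) / (z1 @ z1)
    gamma = f - beta * z1                         # nonlinear component
    sigma = np.sqrt(np.mean(gamma ** 2))          # sd of the nonlinear part
    return beta, gamma, sigma

beta, gamma, sigma = polynomial_parameter(M=20, H=8, rng=0)
```

By construction, the residual `gamma` is orthogonal to `z1` and has mean zero, so `beta` captures all of the linear trend in the draw.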

Fig 9 visualizes the linear components and the standard deviations of the nonlinear components of the generated artificial parameters as dots. Moreover, we add to Fig 9 a gray area including the horizontal axis, where the standard deviation of the nonlinear components is close to zero. As in Simulation 3, the absence of dots in the gray area indicates that the polynomial function does not generate artificial parameters containing only linear components. Fig 9 shows that there are many dots in the gray area at H = 2 and no dots at H = 8. The reason is that w1 affects only the linear component, while w3 and w5 affect both the linear and nonlinear components. In addition, even if the absolute value of w3 is large, the linear component can still be close to zero because of w1 and w5. However, it is difficult to cancel out the corresponding nonlinear component with other terms. In summary, the polynomial functions in this subsection are less likely to generate the pattern with only linear components as the degree of the polynomial increases. Therefore, the random walk model may have a smaller bias than the other models even when data generating processes can be approximated by polynomial functions of high degree.
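The contrast between H = 2 and H = 8 can be checked by Monte Carlo: counting draws whose nonlinear standard deviation falls below a small threshold approximates how often the "only linear components" pattern appears (a sketch under our own assumptions; the threshold `tol` and draw count are illustrative choices, not taken from the paper):

```python
import numpy as np

def only_linear_rate(M=20, H=2, draws=2000, tol=0.1, rng=0):
    """Fraction of polynomial draws whose nonlinear sd is below `tol`,
    i.e., draws close to the 'only linear components' pattern."""
    rng = np.random.default_rng(rng)
    m_c = np.arange(1, M + 1) - (M + 1) / 2.0
    Z = np.stack([m_c ** h for h in range(1, H + 1)], axis=1)
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    z1 = Z[:, 0]
    hits = 0
    for _ in range(draws):
        w = rng.normal(size=H)
        f = Z @ w
        beta = (z1 @ f) / (z1 @ z1)
        gamma = f - beta * z1
        if np.sqrt(np.mean(gamma ** 2)) < tol:
            hits += 1
    return hits / draws

# At H = 2 only w_2 feeds the nonlinear part, so a near-zero nonlinear
# component requires just one small weight; at H = 8 all of w_2,...,w_8
# would have to be small at once, which is far rarer.
rate_h2 = only_linear_rate(H=2)
rate_h8 = only_linear_rate(H=8)
```

This mirrors the pattern in Fig 9: the low-degree setting leaves many draws near the horizontal axis, while the high-degree setting leaves essentially none.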

Fig 9. Linear and nonlinear components generated by polynomial functions.

https://doi.org/10.1371/journal.pone.0329223.g009

Finally, this paper has several limitations: it discusses only the three main models applying Bayesian regularization, does not vary the indexes (for example, I = 12 or J = 8), and does not consider nonlinear components other than trigonometric functions. Other data generating processes should be verified, and the simulations and bias evaluation functions in this paper will be useful in such cases. This paper presented simulations not only where no model can recover the artificial parameters but also where the random walk model performs well. Therefore, future studies need to investigate not only failure cases but also constraints that reduce bias by imposing weak conditions on the linear and nonlinear components.

Supporting information

S1 Fig. Comparison of the three models’ estimates.

https://doi.org/10.1371/journal.pone.0329223.s001

(PDF)

S1 Appendix. Stan codes to implement Bayesian regularization models.

https://doi.org/10.1371/journal.pone.0329223.s002

(PDF)

S2 Appendix. R codes to reproduce the systematic simulation.

https://doi.org/10.1371/journal.pone.0329223.s003

(PDF)

References

  1. Ryder NB. The cohort as a concept in the study of social change. Am Sociol Rev. 1965;30(6):843.
  2. Luo L, Hodges JS. Block constraints in age–period–cohort models with unequal-width intervals. Sociol Methods Res. 2016;45(4):700–26.
  3. Bell A, Jones K. Another "futile quest"? A simulation study of Yang and Land's hierarchical age-period-cohort model. Demogr Res. 2014;30:333–60.
  4. Bell A, Jones K. Don't birth cohorts matter? A commentary and simulation exercise on Reither, Hauser and Yang's 2009 age-period-cohort study of obesity. Soc Sci Med. 2014;101:176–80.
  5. Fosse E, Winship C. Analyzing age-period-cohort data: a review and critique. Annu Rev Sociol. 2019;45(1):467–92.
  6. Yang Y, Land KC. Age-period-cohort analysis: new models, methods, and empirical applications. Florida: CRC; 2013.
  7. Luo L, Hodges J, Winship C, Powers D. The sensitivity of the intrinsic estimator to coding schemes: comment on Yang, Schulhofer-Wohl, Fu, and Land. Am J Sociol. 2016;122(3):930–61.
  8. te Grotenhuis M, Pelzer B, Luo L, Schmidt-Catran AW. The intrinsic estimator, alternative estimates, and predictions of mortality trends: a comment on Masters, Hummer, Powers, Beck, Lin, and Finch. Demography. 2016;53(4):1245–52.
  9. Luo L. Assessing validity and application scope of the intrinsic estimator approach to the age-period-cohort problem. Demography. 2013;50(6):1945–67.
  10. Nakamura T. Bayesian cohort models for general cohort table analyses. Ann Inst Stat Math. 1986;38(B):353–70.
  11. Sakaguchi N, Nakamura T. Age-period-cohort (APC) model as the mixed effects model: comparison of the hierarchical APC model and the Bayesian APC model. Sociol Theory Methods. 2019;34(1).
  12. O'Brien R. Age-period-cohort models. Chapman and Hall/CRC; 2014. https://doi.org/10.1201/b17286
  13. Kupper LL, Janis JM, Karmous A, Greenberg BG. Statistical age-period-cohort analysis: a review and critique. J Chronic Dis. 1985;38(10):811–30.
  14. Yang Y. Bayesian inference for hierarchical age-period-cohort models of repeated cross-section survey data. Sociol Methodol. 2006;36(1):39–74.
  15. Yang Y, Fu WJ, Land KC. A methodological comparison of age-period-cohort models: the intrinsic estimator and conventional generalized linear models. Sociol Methodol. 2004;34(1):75–110.
  16. Fu WJ. A smoothing cohort model in age–period–cohort analysis with applications to homicide arrest rates and lung cancer mortality rates. Sociol Methods Res. 2008;36(3):327–61.
  17. Schmid VJ, Held L. Bayesian age-period-cohort modeling and prediction-BAMP. J Stat Soft. 2007;21(8):1–15.
  18. Stan Development Team. RStan: the R interface to Stan. R package version 2.32.5. 2024. https://mc-stan.org/
  19. R Core Team. R: A language and environment for statistical computing. Version 4.3.2. 2023. https://www.R-project.org/