A mechanistic model for atherosclerosis and its application to the cohort of Mayak workers

We propose a stochastic model for use in epidemiological analysis, describing the age-dependent development of atherosclerosis with adequate simplification. The model features the uptake of monocytes into the arterial wall, their proliferation and transition into foam cells. The number of foam cells is assumed to determine the health risk for clinically relevant events such as stroke. In a simulation study, the model was checked against the age-dependent prevalence of atherosclerotic lesions. Next, the model was applied to incidence of atherosclerotic stroke in the cohort of male workers from the Mayak nuclear facility in the Southern Urals. It describes the data as well as standard epidemiological models. Based on goodness-of-fit criteria the risk factors smoking, hypertension and radiation exposure were tested for their effect on disease development. Hypertension was identified to affect disease progression mainly in the late stage of atherosclerosis. Fitting mechanistic models to incidence data allows to integrate biological evidence on disease progression into epidemiological studies. The mechanistic approach adds to an understanding of pathogenic processes, whereas standard epidemiological methods mainly explore the statistical association between risk factors and disease outcome. Due to a more comprehensive scientific foundation, risk estimates from mechanistic models can be deemed more reliable. To the best of our knowledge, such models are applied to epidemiological data on cardiovascular diseases for the first time.


A Solving the Mechanistic Atherosclerosis Model
To solve the model outlined in the main text, first we defined P (M, F, R; a) to be the probability that in an individual of age a there are M macrophages, F foam cells and R atherosclerotic states (plaques). As motivated in the Materials and Methods section of the main text, we assume the absence of atherosclerotic lesions at birth, i.e. P (0, 0, 0; 0) = 1 and P (M, F, R; 0) = 0 for other values of M , F and R. The survival function S(a) is the probability of not having had stroke until age a for a worker not deceased from any other cause until age a. In the model it is assumed that first stroke occurs a lag time t lag after the first vulnerable plaque has developed. The survival function S(a) is therefore the lagged probability of the absence of vulnerable plaques: This system of ordinary differential equations for P (M, F, R; a) for different M , F , R can be rewritten using the generating function yielding the partial differential equation The survival function is rewritten and the initial condition evaluates to Ψ(m, f, r, 0) = 1 from the absence of lesions at birth. The partial differential equation, Eq (d), can be transformed into a set of 1/4 ordinary differential equations by the method of characteristics: As we are interested in the survival function, Eq (e), we can apply the conditions m(a f ) = 1, f (a f ) = 1 and r(a f ) = 0 where a f is the age for which the survival is calculated. This immediately eliminates r as r(a) ≡ 0. A semi-analytical solution to the remaining set of ordinary differential equations, assuming constant parameters on successive, short time intervals, can be constructed analogous to ref. [1]. However, the direct numerical integration turned out to be more efficient.

B A Descriptive Model for Stroke in Mayak Workers
The previous section dealt with the calculation of the survival function of the stochastic model. This section is about the descriptive model which is most easily parameterized in terms of the hazard function h(a). The hazard function is equally suited for model definition as it is connected to the survival function S(a) by: When analyzing the cohort restricted to workers with doses below 2 Gy, we set h = h 0 where h 0 = 10 −5 e ψage+ψ birth +ψ calendar +ψcat (h) and ψ age = ψ 0 + ψ 1 ln a 60 + ψ 2 ln 2 a 60 (i) Here, a and b denote age and birth date, respectively. Units of years have been dropped. We have applied a function LT(t): Summands in ψ cat depend on the workers' individual information. They evaluate to zero for workers not entered into higher education, with normal blood pressure, and non-smoking. For other persons, the corresponding summand was determined by the fit. Parameter values of the best fit can be found in Table A and Table B.
When analyzing the full cohort, the response to ionizing radiation is parametrized by Inclusion of ψ b did not improve the fit significantly, therefore we set it to zero.

C Applying the Mechanistic Model to Stroke in Mayak Workers
In the mechanistic model, variables such as birth year, graduation etc. cannot directly be applied to the hazard function. Instead, they are implemented by applying them to any of the biological parameters. Like for the empirical model, we started the analysis with the variables of birth year, calendar year and graduation. As the effects of birth year should be relevant especially for young ages, i.e. for early stages of the disease, we modified N ν 0 with birth year. Calendar year could act on any stage of the disease progression. However, the observed kink in the risk in the early 90s (see ref. [31] of the main text), around the time of the dissolution of the Soviet Union, can be best described if the last stochastic step proportional to ν 2 was affected. Graduation can be viewed as a surrogate for lifestyle and working conditions. Thus, we cannot causally assign it to any step in the development of the disease. The choice for N ν 0 was motivated by the fact that it most closely resembles the way graduation is implemented in the empirical model.
Here, N ν 0 corresponds to N ν 0 for a worker born in 1930 and without higher education. Such an equivalence cannot be established for ν 2 as the second term in the exponential does not vanish for the year 1990. After some testing, birth year turned out to be insignificant and was therefore dropped from the model. As explained in the last part of the Material and Methods section of the main text, we tested for age dependence of any biological parameter. When age dependence was 3/4 A lag time of 10 years has been applied. Values for α = 12 year −1 and ν 2 = 10 −7 year −1 have been fixed as the choice does not affect the fit. For a definition of N ν 0 and ν 2 see Eqs (m) and (n).  N ν 0 (a, graduation) = N ν 0 a + 10 20 Information on blood pressure and smoking status was revealed to be best applied to ν 2 . This adds to the dependence on calendar year b + a: ν 2 (b + a, blood pressure, smoking) = = ν 2 exp ψ c, 0 b + a − 1990 10 + ψ c, 1 LT(b + a − ψ c, knot ) 10 e ψ smoking +ψ blood pressure (n) The best estimates of the parameter values can be found in Tables C and D.