A model-based framework for chronic hepatitis C prevalence estimation

Chronic hepatitis C (CHC) continues to be a highly burdensome disease worldwide. The often-asymptomatic nature of early-stage CHC means that the disease often remains undiagnosed, leaving its prevalence highly uncertain. This generates significant uncertainty in the planning of hepatitis C eradication programs to meet WHO targets. The aim of this work is to establish a mathematical framework for the estimation of a geographic locale’s CHC prevalence and the proportion of its CHC population that remains undiagnosed. A Bayesian MCMC approach is taken to infer these populations from the observed occurrence of CHC-related events using a recently published natural history model of the disease. Using the Canadian context as a specific example, this study estimates that in 2013, the CHC prevalence rate in Canada was 0.63% (95% CI: 0.53% - 0.72%), with 27.1% (95% CI: 19.3% - 36.1%) of the infected population undiagnosed.


Introduction
Chronic hepatitis C (CHC) is a progressive and infectious disease that can remain asymptomatic for decades before ultimately causing liver damage, liver failure and liver cancer. Hepatitis C-related illness annually claims in excess of 399,000 lives worldwide [1]. The primary modes of transmission of the hepatitis C virus (HCV) include injection drug use and blood transfusion [1].
New and highly effective therapies for HCV chronic infections, in the form of direct acting antivirals (DAA), present the opportunity to meet the WHO hepatitis C eradication target of 2030. However, the planning of eradication strategies is highly sensitive to estimates of the CHC prevalence and diagnosis rates [2]. Because CHC often remains asymptomatic until its late-stage, there is considerable uncertainty in current prevalence estimates. Furthermore, CHC treatment with DAAs remains expensive, making estimates of budget impact from the public funding of DAAs also highly uncertain and sensitive to estimates of disease prevalence.
In this paper, we present a model-based framework for the inference of CHC-infected populations, using the Canadian context as a specific example. The case of Canada is of particular interest for two reasons: the uncertainty in CHC prevalence is compounded by the absence of a1111111111 a1111111111 a1111111111 a1111111111 a1111111111 a population-level hepatitis C screening program. In addition, there is limited public data available from which to infer the size of the HCV-infected population.
In a 2007 Public Health Agency of Canada study, an actuarial method was used to estimate hepatitis C prevalence in Canada [3]. The population was divided into subgroups according to their risk of having or contracting hepatitis C. Rates of prevalence and incidence were estimated for each subpopulation, and these were then used in a state transition model to obtain projections of the total number of individuals with hepatitis C-induced liver diseases.
Thein et al. [4] reported the results of a systematic review of annual probabilities of fibrosis progression. A study of the 2011 prevalence rate of hepatitis C in Canada [5] incorporated the rates in [4] into a Markov model of disease progression and used a backcalculation method to obtain rate estimates. Alongside the backcalculation method, an actuarial method was also used, stratifying the Canadian population according to the risk of infection and prevalence. While the backcalculation estimated the 2011 prevalence of chronic hepatitis C to be 0.64% of the Canadian population, the actuarial approach yielded a higher rate of 0.71%. The backcalculation methodology and model assumptions used in [5] were not disclosed.
In 2015, a new Markov model of the natural history of hepatitis C in Canada was developed as part of economic evaluations of hepatitis C-related screening and treatment [6]. This model has been used by various agencies in Canada for decision making in hepatitis C-related policies on screening and treatment [2]. This model divided the progression of hepatitis C into two broad stages: pre-advanced liver disease and advanced liver disease. The population of individuals in the pre-advanced stage of hepatitis C was modeled as progressing through the liver fibrosis stages of the disease at rates that were obtained from [4]. The model additionally included the diagnosis of patients at the various fibrosis stages as well as their subsequent treatment and response to treatment. Patients who progressed to the last fibrosis stage were modeled as being at risk of progressing to advanced liver disease: either hepatocellular carcinoma or decompensated cirrhosis. From these latter stages, the model captured the progression of patients through advanced liver disease, including receipt of liver transplants, liver-related deaths and non-liver related deaths.
In this paper, we adopt a Bayesian Markov Chain Monte Carlo approach, based on the Metropolis-Hastings algorithm [7], to estimate the Canadian CHC prevalence and undiagnosed CHC population. We derive a comprehensive mathematical framework for the backcalculation using the model reported in [6]. The estimates are obtained by calibrating the model using publicly available hepatocellular carcinoma (HCC) diagnosis data from Statistics Canada [8] and HCV diagnosis data from the Public Health Agency of Canada [9], together with estimates of liver fibrosis distributions at diagnosis from [6].
This paper is organized as follows: First, a qualitative description of the hepatitis C natural history model of [6] is provided followed by a discussion of its key parameters and its mathematical formulation. An overview of the MCMC algorithm used in this study is then provided, followed by the modeling results and a discussion of key conclusions.

Methods
The aim of this work is to establish a mathematical framework for the estimation of CHC populations of interest. For the illustrative case of Canada, we focus on the estimation of the historical prevalence of CHC and the proportion of the CHC population that is undiagnosed for each of three birth cohorts: • individuals born after 1964.
We first give an overview of a state transition model, introduced in [6], that describes the natural history of hepatitis C within a single, generic, birth cohort. Each state of the model corresponds to the number of individuals, within the cohort, that populate a particular CHC stage and that fall under a particular patient status. We next give an overview of the model parameters and introduce a mathematical formulation of the model dynamics that describe the transition of patients from one state to another between an initial year 0 and a final year T. Using this formulation, we demonstrate how the model is able to simulate the number of individuals in each state of the disease between years 0 and T, inclusive, for a given set of input parameters. This input parameter set includes state transition probabilities, the states of the model in year 0 and the number of new infections in the T subsequent years. We then describe how the simulation results can be used to estimate the cohort's expected CHC prevalence, undiagnosed CHC population, as well as the expected number of HCC and HCV diagnoses from within the cohort in the years 0 to T.
Arriving at estimates in this way, however, relies on knowledge of the input parameter set, which is not readily available from the literature. Therefore, we then present a Bayesian calibration method, based on the Metropolis-Hastings algorithm, through which these unknown quantities are estimated from recorded HCC and HCV diagnosis data and from statistics on the stages of liver fibrosis that are observed upon HCV diagnosis.

The hepatitis C natural history model
We assume that for the entirety of any given calendar year, a CHC patient's disease state can be categorized into only one of several stages of the disease. The actual number of individuals in each disease state is not known. Such latent variables will, over the course of time, change as patients progress from one disease state to the next. On the other hand, there are observables arising from HCV-related events, such as the diagnosis of a patient with HCV infection or HCC. Such observables are consequences of HCV infection and so depend directly on the latent variables. As a result, these observables provide important information for estimating the unknown latent variables. The state transition model illustrated in Fig 1 is assumed to describe the movement of hepatitis C patients, within one birth cohort, between the various latent states of hepatitis C infection that precede advanced liver disease. A state transition model of progression between the advanced liver disease stages is illustrated in Fig 2. The transition probabilities emanating from each state in Fig 1 and Fig 2 sum to one, and are derived in S1 Appendix as functions of annual probabilities of fibrosis progression, CHC diagnosis, background death, treatment and SVR.
It is assumed that in the year t there are u(t) new HCV infections. Newly infected patients are assumed to have a chance of clearing the virus in the acute stage of infection [10]. If the virus is not cleared, the disease becomes chronic and the patient risks developing increasingly advanced levels of liver fibrosis. The model assumes that HCV infection may be diagnosed in either the acute or the chronic stages.
The degree of liver fibrosis is divided into five stages, denoted F0, F1, F2, F3 and F4 [6]. In Fig 1, individuals who have CHC but are undiagnosed are grouped into latent states X 0 , X 1 , X 2 , X 3 , X 4 , where the subscripts represent the five fibrosis levels F0-F4. Individuals who have CHC and are diagnosed but have not yet received treatment are similarly grouped into latent states D 0 , D 1 , D 2 , D 3 , D 4 . Individuals who have CHC and who are receiving treatment are grouped into latent states T 0 , T 1 , T 2 , T 3 , T 4 . A patient treated for CHC in one year may, the following year, show SVR. Individuals who show SVR following treatment are grouped into latent states S 0 , S 1 , S 2 , S 3 , S 4 . Once in these SVR states, it is assumed that patients are no longer at risk of progressing in liver fibrosis severity. Individuals who do not show SVR following treatment are grouped into latent states N 0 , N 1 , N 2 , N 3 , N 4 and their liver fibrosis is assumed to progress at a rate equal to that of the untreated population. CHC patients who reach the cirrhosis stage F4 (latent states X 4 , D 4 , T 4 , S 4 , N 4 ) become at risk of developing more advanced liver disease. Advanced liver disease (ALD) states are categorized into the following states: • DC: patients with decompensated cirrhosis, • HCC: patients with hepatocellular carcinoma, • LT: patients receiving liver transplants, • PT: patients who have survived more than one year after their liver transplantation.
The model assumes that individuals in these advanced liver disease states may transition to a liver-related death state (LD), with a probability greater than the background all-cause mortality.  Parameters of the natural history model Viral genotype. The hepatitis C virus has six major genotypes [11], which we denote v = 1,� � �,6. The distribution of the prevalence of the different genotypes in Canada is reported in [6]. The genotype affects the treatment success rate and, consequently, the treatment adoption rate.
Model transition probabilities. We denote by q chr the annual probability of a newly infected individual transitioning from an acute to a chronic infection. S1 Table gives the mean value of q chr as well as its range (from [10]), over which we assume it is uniformly distributed.
For individuals with CHC who do not show a sustained virologic response, we let q i,i+1 be the annual probability of progression between any two consecutive fibrosis stages F i and F i+1 . The annual probability of fibrosis level progression varies depending on viral genotype, and we assume each parameter q i,i+1 to be uniformly distributed within a range of values that captures this variation. These ranges, which we derive from the systematic review of progression statistics for CHC patients in [4], are given in S1 Table. For patients showing a sustained virologic response, the probability of fibrosis stage progression is assumed to be zero. S1 Table  additionally gives means and ranges of the annual probabilities of progression between the stages of advanced liver disease, denoted a 1 ,� � �,a 6 , which are assumed to be uniformly distributed.
We assume that the symptoms of hepatocellular carcinoma and decompensated cirrhosis are severe enough that an individual's progression to, and subsequent diagnosis with, either HCC or DC occur in the same calendar year. We additionally assume that patients who were previously undiagnosed with CHC and who progress to advanced liver disease are additionally diagnosed with CHC upon their diagnosis with the ALD. The annual probability of progression to decompensated cirrhosis is denoted by d DC for non-SVR individuals and by d DC−SVR for patients who show SVR. The parameters d DC and d DC−SVR are assumed to be uniformly distributed, with their means and ranges, reported in [12], given in S1 Table. The corresponding probabilities of progression to hepatocellular carcinoma, respectively denoted d HCC and d HCC−SVR , will be estimated.
We let d a be the annual acute hepatitis C probability of diagnosis. We also denote by d i the annual CHC probability of diagnosis for fibrosis stage F i . These parameters are not available from the literature and will be estimated.
We denote by t i (v) the annual probability that a patient who was previously diagnosed with CHC of viral genotype v and who is currently at fibrosis stage F i , will adopt pegylated interferon and ribavirin therapy. The estimated cumulative probabilities of starting treatment over the period 1999-2013 in Canada, per fibrosis stage and genotype were reported in [6]. We converted these cumulative probabilities to annual probabilities of treatment t i (v), given in S2 Table. We let s i (v) be the probability that an individual infected with CHC, of genotype v, who receives treatment when at fibrosis stage F i will show a sustained virologic response. The probability of a patient not showing a sustained virologic response at that fibrosis stage is then 1 −s i (v). Wong et al., [6], report the probabilities of SVR by fibrosis stage and viral genotype following pegylated interferon and ribavarin therapy.
We denote the annual probability of background death (i.e. all-cause mortality) for HCVinfected individuals in a year t by d r (t). For the specific example of Canada [13], the background mortality among CHC patients with respect to the general population varies with age. In particular, with respect to the general population, younger CHC patients are at an increased risk of mortality from activities associated with HCV acquisition [14]. One such activity is injection drug use, for which the standard mortality ratio is 16.4 (95% CI: 9.1-27.1) [3,15,16] and which accounts for 70-80% of new HCV injections in Canada [17]. In contrast, older patients are more likely to die of liver-related causes [14], which are accounted for in the model through the advanced liver disease states. Therefore, as a simplifying assumption, we assume that infected individuals born before 1965 have a background mortality equal to that of the general population.

Mathematical formulation of natural history model
State transition modeling. We denote the expected number of individuals in any one of the disease states PT(t) respectively. We define a state vector x(t) that consists of the expected number of individuals in each latent state: where To model changes in x(t) over time, we use the state transition model described illustrated in Fig 1 and Fig 2. The elements of the vector x(t+1) can be written in terms of u(t) (the number of new infections in year t) and the elements of the vector x(t), as follows: Definitions of the terms in (2) are given in S1 Appendix. In this study, patients who show SVR are assumed to be 'cured' of the disease, and therefore do not contribute to the total CHC prevalence from the time they reach SVR. The expected total number of individuals with CHC within a cohort in a year t, denoted P x (t), is given by The prevalence rate R x (t) within a given cohort in a year t is then given by where p n (t) is the cohort's average population in year t. Estimates of p n (t) are available online from Statistics Canada [18] and are given in S4 Table. The expected proportion of the cohort's CHC population that is undiagnosed, D x , in year t is given by the ratio Model observables. The model's expected total number of HCC cases diagnosed in a given cohort in a year t (including HCC cases not induced by CHC), is given by where c HCC is the proportion of all diagnosed HCC cases that are CHC-induced. The total number of HCC cases diagnosed each year in Canada and in each province is available from Statistics Canada [8]. By comparing the number of CHC-induced HCC cases diagnosed in British Columbia [19] with the total number of HCC cases diagnosed in that province between 1990 and 2012 (from [8]), we estimated the percentage of all HCC diagnoses that were attributable to chronic hepatitis C, for the three birth cohorts we consider. These estimates are summarized in S3 Table. Let y acute (t) = d a u(t) be the model's expected number of acute hepatitis C diagnoses in year t and let y F i ðtÞ be the expected number of CHC diagnoses at fibrosis state F i in year t. The total number of chronic cases diagnosed in year t, denoted y CHC (t), is then where The model's expected total number of hepatitis C cases diagnosed in year t for a given cohort, denoted y hepC (t), is the sum of diagnosed acute cases and chronic cases y hepC ðtÞ ¼ y acute ðtÞ þ y CHC ðtÞ ð9Þ For notational purposes, we define the vector of time-series observables: yðtÞ :¼ ½y HCC ðtÞ y hepC ðtÞ� 0 : Solutions to the model equations. Eqs (2), (6) and (9) are linear in u(t) and in the elements of x(t). The natural history model can therefore be expressed in the concise state-space form: Where matrices A(t), B(t), C(t), D(t) are given in S1 Appendix. If the expected disease state vector x(t) is known at a starting year t = 0, and if the expected number of yearly new infections u(t) is known for years t = 0� � �,T, we can use (10) to obtain x(t) for 0<t�T as Together with (11), this yields the expected number of diagnoses for year t�0 as: For this analysis, observational data on the diagnoses of HCC and HCV were only available for the calendar years 1999-2013. We therefore set the start year of the analysis, t = 0, at 1999 and denote by t = T, with T = 14, the last year of the analysis, 2013.
Model calibration. Using Eqs (4) and (5), estimates of the prevalence rate R x (t) and the undiagnosed proportion D x ðtÞ can be obtained from estimates of the vector x(t). The estimate of x(t) in (12) is a function of many input parameters, only some of which are available from the literature. We organize these literature-derived input parameters of the model into a vector with q = [q 01 ,q 12 ,q 23 ,q 34 ] 0 , a = [a 1 ,a 2 ,a 3 ,a 4 ,a 5 , and s i (v) are the annual probabilities of treatment adoption and of SVR for fibrosis stage F i and viral genotype v. As V can be treated as a vector of random variables, we denote by P(V) the joint probability distribution of these parameters. On the other hand, input parameters in model (12), (13) that are not available from the literature can be grouped into three categories: • Initial states S 0 (0). . .S 3 (0) were not calibrated since the model assumes that no eventual progression in fibrosis or to advanced liver disease is possible from these states, and their values therefore do not impact the number of HCC and hepatitis C diagnoses.

a) Unknown initial year data
• The ALD initial states HCC(0), DC(0), LT(0) and PT(0) were not calibrated since progression to these states only occurs upon or after diagnosis with HCC and hepatitis C (the model observables that are used for calibration).

b) Unknown yearly new infections
The yearly incidence of new infections u(t), for t = 0,� � �,14 is unknown. We assume that, at the population level, the number of new infections occurring yearly will vary over time in line with trends in activities that carry a high HCV-acquisition risk, such as injection drug use. As a simplifying approximation, we assume that this trend is linear between t = 0,� � �,14. We let u

c) Unknown probabilities of diagnosis
We treat as unknown parameters the annual acute hepatitis C probability of diagnosis d a , the annual probability of CHC diagnosis at the F i fibrosis stage, d i , the annual probability of progression from F4 to HCC for non-SVR individuals, d HCC , and for SVR individuals, d HCC−SVR . We assume that aggregate diagnosis rates are driven by trends in the general awareness of HCV infection and access to testing facilities. Furthermore, for simplicity, we assume that these trends are linear between in years t = 0 and t = 14. The parameters d a , d i , d HCC , and d HCC−SVR therefore satisfy the following constraints: • In any given year t, the annual probabilities of diagnosis of acute hepatitis C and chronic hepatitis C at fibrosis stage F0 and F1 are equal. That is, d a (t) = d 0 (t) = d 1 (t)≕d early (t). We let d early ð0Þ ¼ d early 0 and d early ð14Þ ¼ d early 14 . The value of d early (t) in intervening years is evaluated through linear interpolation between d early 0 and d early 14 .
• The annual probabilities of diagnosis of chronic hepatitis C at fibrosis stages F2, F3, F4 are equal. That is d 2 (t) = d 3 (t) = d 4 (t)≕d late (t). We let d late ð0Þ ¼ d late0 and d late ð14Þ ¼ d late14 . The value of d late (t) in intervening years is evaluated through linear interpolation between d late0 and d late14 .
• The annual probabilities of progression from stage F4 to HCC, d HCC (t) is assumed to vary linearly between years t = 0 and t = 14, with d HCC ð0Þ ¼ d HCC0 and d HCC ð14Þ ¼ d HCC14 .
• Following [12], we assume that the annual probability of progression to HCC among F4 patients with SVR is a fifth of that among non-SVR F4 patients.
The ranges of the twenty parameters to be estimated are listed in Table 1. We organize these parameters into a vector M and assume that each parameter has a flat uniform prior distribution over the ranges given. Priors for initial states and for the numbers of new infections were based on the previous estimates of hepatitis C incidence and prevalence in [3]. Priors for annual probabilities of diagnoses with HCV infection were based on the model validation study in [6]. Priors for annual probabilities of progression to HCC were based on estimates in [12]. The joint prior distribution for M is denoted P(M). Data for model calibration. The posterior distributions of the model's unknown parameters are proportional to the product of the likelihood function of parameters M and the prior distribution P(M). We used three sets of data in the evaluation of the likelihood of M. The first is a set of time-series data, composed of the number of new primary HCC cases recorded annually in Canada, available from Statistics Canada [8]. We denote by z HCC (t) the total number of new HCC cases diagnosed in Canada within a given birth cohort in year t. The second is also a set of time-series data, composed of the number of hepatitis C diagnoses recorded annually in Canada, available from the Public Health Agency of Canada [9]. In this set, no distinction is made between acute and chronic cases of hepatitis C. We denote by z hepC (t) the recorded number of individuals diagnosed with hepatitis C in Canada in the year t from within a given birth cohort. These recorded time-series observations are summarized in S4 Table for the years 1999-2013 (years t = 0,� � �14) for the three birth cohorts of interest. For notational purposes, we organize the time series observations in year t for a given cohort into the vector The third set of data used in evaluation of the likelihood of M is composed of estimates of the proportions z F i (i = 0,� � �,4) of each birth cohort's total CHC diagnoses falling in fibrosis stages F0 -F4. These estimates, given in S5 Table, are obtained using clinical data from [6]. We organize this data into the vector For each birth cohort, we estimated the joint posterior distribution P(M,V|z) by fitting the model (12) to the cohort's data vector z from S4 Table and S5 Table. Samples from this joint distribution were then used to simulate the model (12) and hence evaluate R x (t) and D x ðtÞ using (4) and (5). The joint distribution was generated using Algorithm 1 in S3 Appendix, which is composed of two stages.
Stage 1 of Algorithm 1: The first stage consists of two for loops, one embedded inside the other. In the k th run of the total K 1 = 10 4 runs of the outer loop, the algorithm first obtains a sample V k from the distribution P(V) of the literature-derived parameters V (14). Then, the Metropolis-Hastings MCMC algorithm (S3 Appendix) is implemented in the inner loop for K 2 = 10 6 iterations to ensure that the Markov chain converges to the stationary distribution of the posterior of M,P(M|z,V k ). At the end of Stage 1, the algorithm outputs P(M|z,V k ), k = 1,� � �, K 1 , the posterior distributions of M given z and V k for K 1 samples V k from P(V).
Stage 2 of Algorithm 1: The second stage is composed of a single for loop in which we obtain samples from the joint posterior distribution P(M,V|z). In each of the K 3 runs of this loop, we first uniformly sample a vector V � k from among the vectors V k generated in the outer loop of Stage 1, thus yielding a sample from P(V). Next, we obtain a sample M � from among the samples of the distribution PðMjz; V � k Þ obtained in Stage 1. The combined vector ½V �0 k ; M �0 � 0 constitutes a sample from the distribution P(M,V|z).
Algorithm 1 was used to obtain samples from the joint posterior distribution P(M,V|z) for each of the three cohorts listed in S4 Table. Distributions for the CHC prevalence rate R x (t) and the percentage of CHC population undiagnosed D x ðtÞ were then obtained for each cohort by repeatedly sampling the cohort's joint posterior distributions, using the samples to simulate the model, and then evaluating (4) and (5).

Results
We next summarize the results obtained for each cohort.

1) CHC in Canada for pre-1945 births cohort
For the cohort of individuals across Canada who were born before 1945, Fig 3A and Fig 3B show the estimates by model (12), (13) of the HCC and hepatitis C diagnosis data listed in S4 Table. The mean estimates fit the HCC diagnosis data with R 2 = 0.76, and the hepatitis C diagnosis data with R 2 = 0.93. Fig 3C gives (12), (13) of the HCC and hepatitis C diagnosis data listed in S4 Table. The mean estimates fit the HCC diagnosis data with R 2 = 0.98, and the hepatitis C diagnosis data with R 2 = 0.99. Fig 4C gives the estimate of hepatitis C prevalence rate R x in 2013 as 1.13% (95% CI: 0.92% -1.33%). From Fig 4D, the percentage of CHC cases undiagnosed D x in 2013 was 18.8% (95% CI: 13.3% -23.9%). The estimated mean number of total CHC cases and undiagnosed CHC cases is given in S4 Appendix.

CHC in Canada for post-1964 birth years cohort
For the cohort of individuals across Canada who were born after 1964, Fig 5A and Fig 5B show the estimates by model (12), (13) of the HCC and hepatitis C diagnosis data listed in S4 Table. The mean estimates fit the HCC diagnosis data with R 2 = 0.85, and the hepatitis C diagnosis data with R 2 = 0.28. Fig 5C gives the estimate of hepatitis C prevalence rate R x in 2013 as 0.41% (95% CI: 0.32% -0.51%). From Fig 5D, the percentage of CHC cases undiagnosed D x in 2013 was 31.9% (95% CI: 19.1% -44.2%). The estimated mean number of total CHC cases and undiagnosed CHC cases is given in S4 Appendix. CHC in Canada averaged over all birth cohorts. Fig 6A shows the prevalence rate estimate for Canada averaged over all birth cohorts. The CHC prevalence rate estimate in 2013 was 0.63% (95% CI: 0.53% -0.72%). Fig 6B shows the percentage of CHC cases in Canada that were undiagnosed, averaged over all birth cohorts. The percentage of CHC cases undiagnosed in 2013 was estimated to be 27.1% (95% CI: 19.3% -36.1%).

Sensitivity analysis
A one-way sensitivity analysis of the calibrated models was conducted to determine the parameter categories causing the most variation in estimates of prevalence rate and undiagnosed CHC proportion. The parameters tested were grouped into nine categories, listed in S6 Table. The parameters in each group were varied together by +/-50%, and the effects of these perturbations on the areas under the curves (AUC) of R x and D x between years 1999 and 2013 were evaluated. The groups were then ranked according to how sensitive the cohort-averaged estimates of R x and D x AUCs were to each. The CHC prevalence rate (R x ) AUC, averaged over all three birth cohorts, was most sensitive to perturbations in the number of new infections (Fig  7). A perturbation of +50% in the yearly number of new infections yielded a change of +9.4% in the cohort-averaged R x AUC. The AUC for the proportion of the CHC population undiagnosed (D x ), averaged over the three birth cohorts, was also most sensitive to perturbations to the number of new infections, followed by perturbations to the annual probability of CHC diagnosis (Fig 8). Here, a perturbation of -50% in the number of new infections yielded a change of -6.50% in the cohort-averaged D x AUC, whereas a perturbation of -50% in the annual probability of CHC diagnosis yielded a corresponding increase of +3.53%.

Discussion
CHC often progresses silently and asymptomatically until late in its course. Recently, hepatologists have issued hepatitis C screening recommendations for Canadian baby-boomers [20]. However, hepatitis C screening in Canada has only been recommended for high-risk individuals by the Public Health Agency of Canada [21], partly due to the uncertain prospective financial burden of a publicly funded screening and treatment policy. An accurate estimate of the national prevalence rate of CHC and the proportion undiagnosed is therefore needed to reliably predict the disease burden and to perform economic evaluations of population screening and disease eradication strategies.
The aim of this work was to establish a mathematical framework to estimate CHC populations of interest. Starting with a well-established hepatitis C natural history model, our study proposed a method with which to calibrate the model using available diagnosis data. As a case study, we demonstrated the use of the framework in estimating Canada's CHC prevalence and the proportion of the Canadian CHC population that is undiagnosed. The averaged prevalence rate of CHC in Canada was estimated to be 0.63% (95% CI: 0.53% -0.72%) in 2013. The averaged percentage of CHC cases undiagnosed in Canada in 2013 was estimated to be 27.1% (95% CI: 19.3% -36.1%). Among the Canadian baby boomer generation (birth cohort 1945-1964), the CHC prevalence rate was estimated to be falling between 1999 and 2013, and at 1.13% (95% CI: 0.92% -1.33%) with 18.8% (95% CI: 13.3% -23.9%) of individuals with CHC undiagnosed in 2013. On the other hand, among the younger birth cohort (births years > 1964), the CHC prevalence rate was estimated to be rising between 1999 and 2013, and at 0.41% (95% CI: 0.32% -0.51%) in 2013 with 31.9% (95% CI: 19.1% -44.2%) of individuals with CHC undiagnosed in 2013. The falling trend over 1999-2013 in the baby boomer cohort is to be expected since a greater number of CHC infected individuals in this cohort have already progressed to advanced liver disease stage, in which the mortality rate is high. The prevalence rate in the younger cohort is low but shows a rising trend over the 1999-  2013 period. With the baby-boomer infected population falling and the lower-prevalence rate younger cohort forming an increasingly large proportion of the Canadian population over 1999-2013, the overall trend in the CHC prevalence rate was a downward one, as seen in Fig   Fig 6. Cohort- 6A. As more data becomes available with time, it can easily be incorporated into the modeling framework of this paper to yield more up-to-date estimates of CHC prevalence.
This study's estimates were based on 1) a comprehensive mathematical framework that is built on a validated hepatitis C natural history model; and 2) an approach that incorporates the sources of uncertainty inherent to hepatitis C natural history. Compared with previous results, our national CHC prevalence rate estimate for 2007 is 0.68% (95% CI 0.61% -0.76%) and is therefore lower than the 2007 estimate by Remis of 0.78% [3]. On the other hand, our 2011 prevalence rate estimate is 0.65% (95% CI 0.56-0.73) and is therefore in agreement with the 0.64-0.71% estimate for 2011 reported in the Canada Communicable Disease Report [5]. When compared with the seroprevalence rate of 0.5% (range 0.3-0.9%) estimated from the Canadian Health Measures Survey from 2007 to 2011 [22], our model estimated a higher prevalence rate. Estimates from the Canadian Health Measures Survey may be biased towards the low side as only regular households were surveyed while marginalized and high-needs populations may have been undersampled [22].
The one-way sensitivity analyses identified those parameters which had the greatest impact on the model estimates. Both the prevalence rate estimates and the undiagnosed CHC proportion were found to be most sensitive to the annual number of new infections. The undiagnosed CHC proportion was additionally found to be sensitive to the annual probability of CHC diagnosis. These particular parameters are the least well known of all the parameters of the model. As shown in S5 Appendix, there are significant differences between the posterior and prior distributions of these parameters, indicating their sensitivity to the observed numbers of diagnosed HCC and hepatitis C cases. This highlights the importance of high quality calibration data for the accurate calibration of these parameters and the subsequent generation of reliable model estimates. The sensitivity analyses therefore point to where additional research effort should be directed to improve estimate accuracy. This is particularly useful information for health economists who may wish to use this model for a 'value of information' analysis [23,24].
With respect to the backcalculation estimate in [5], our model quantifies the hepatitis C epidemic in more detail, in several respects: 1) it models the adoption of treatment among CHC patients as well as the probability of achievement of a sustained virologic response (SVR), a state in which HCV patients are considered effectively cured of the viral infection, 2) it accounts for the dependence, on viral genotype, of the rates of treatment, SVR and CHC progression, 3) it models the progression of CHC to advanced liver disease, and, 4) it models the heightened rate of mortality among younger CHC population cohorts [3,15,16]. Furthermore, a wider and more up-to-date set of data is used in this study with respect to [5]: first, our framework quantifies the proportion of HCC diagnosis numbers (used in the calibration) that are attributable to CHC. Second, Trubnikov et al. [5] made no use of statistics on liver fibrosis distributions at diagnosis as was done in this work. Finally, the Bayesian approach we adopt in calibrating the model returns prevalence estimates in the form of posterior probability distributions that reflect the parametric uncertainty in the model.
Although the model used in this study is more comprehensive than those used in [3] and [5], the framework we have presented is also subject to some limitations. The model relied on the total number of HCC cases diagnosed each year as one of the observables. Because progression from a new HCV infection to HCC may occur over a period of 20-40 years [25], the model may underestimate the prevalence of CHC in the youngest cohort. A further limitation arises from uncertainty in the calibration data. These data were derived from public databases which are subject to the usual limitations of under-reporting, incompleteness, duplication and sampling bias. In future work, uncertainty in our estimates can be reduced with calibration data that i) report the annual numbers of HCC cases that are confirmed as being CHC-induced and ii) decompose the annual numbers of hepatitis C diagnoses into acute, resolved and chronic cases.
Using a Bayesian Markov Chain Monte Carlo approach, this paper has presented a method for the estimation of CHC populations of interest from a hepatitis C natural history model calibrated using publicly available data on the occurrence of HCV-related events. One of the advantages of this framework is its ability to incorporate multiple sources of uncertainty into the model in arriving at its estimates. This factor makes the estimates more robust and is essential for decision-making [26]. In future work, this approach can be generalized and used to produce region-specific estimates.
Supporting information S1 Table. Literature-derived model parameters. Annual probabilities of progression to CHC, progression between fibrosis stages, progression to DC, and advanced liver disease progression. (PDF) S2  Table. Parameter groups perturbed in one-way sensitivity analysis.