A Standardized Vascular Disease Health Check in Europe: A Cost-Effectiveness Analysis

Background: No clinical trials have assessed the effects or cost-effectiveness of health check strategies to detect and manage vascular disease. We used a mathematical model to estimate the cost-effectiveness of several health check strategies in six European countries.

Methods: We used country-specific data from Denmark, France, Germany, Italy, Poland, and the United Kingdom to generate simulated populations of individuals aged 40–75 eligible for health checks in those countries (e.g., individuals without a previous diagnosis of diabetes, myocardial infarction, stroke, or serious chronic kidney disease). For each country, we used the Archimedes model to compare seven health check strategies consisting of assessments for diabetes, hypertension, lipids, and smoking. For patients diagnosed with vascular disease, treatment was simulated in a standard manner. We calculated the effects of each strategy on the incidence of type 2 diabetes, major adverse cardiovascular events (MACE), and microvascular complications, in addition to quality of life, costs, and cost per quality-adjusted life-year (QALY).

Results: Compared with current care, health checks reduced the incidence of MACE (6–17 events prevented per 1000 people screened) and diabetes-related microvascular complications (5–11 events prevented per 1000 people screened), and increased QALYs (31–59 discounted QALYs) over 30 years, in all countries. The cost per QALY of offering a health check to all individuals in the study cohort ranged from €14903 (France) to cost saving (Poland). Pre-screening the population and offering health checks only to higher-risk individuals lowered the cost per QALY. Pre-screening on the basis of obesity had a cost per QALY of €10200 (France) or less, and pre-screening with a non-invasive risk score was similar.

Conclusions: A vascular disease health check would likely be cost-effective at 30 years in Denmark, France, Germany, Italy, Poland, and the United Kingdom.


Introduction
This report describes the calibration of the care processes in the Archimedes Model, specifically focusing on the care processes used in the Simulator 2.3 version of the Model and used by ARCHeS, the web-based interface to the Model. As reported elsewhere 1, the Archimedes Model has several parts. Two of the most important are the sub-model of physiology, which determines the occurrence and progression of diseases, and the representations of care processes, which determine how patients are cared for to prevent or manage diseases 2. It is the latter that is the subject of this report. Methods for building and validating the physiology part of the Model are discussed elsewhere 3.
The calibration of care processes involves several steps. The first is to identify measures against which outcomes of the Model should be compared, and targets for those measures. The target values for the measures should be based on empirical observations in the setting of interest. The second is to run the Model to calculate values of the measures. The third is to compare the calculated values to the empirically-observed values, and make judgments about the goodness of fit. The fourth is to modify parameters of the care processes and repeat the steps until the calculated and observed values match within an acceptable range.
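The four steps above can be sketched as a simple tuning loop. This is only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for a simulation run, a single parameter is tuned, and the measure is assumed to rise monotonically with the parameter so that a bisection search is enough. The actual calibration involves many parameters and measures, and judgment about goodness of fit.

```python
from typing import Callable


def calibrate(run_model: Callable[[float], float],
              target: float,
              tolerance: float,
              low: float = 0.0,
              high: float = 1.0,
              max_iterations: int = 50) -> float:
    """Tune one parameter until the simulated measure is within tolerance of the target.

    Assumes the measure increases monotonically with the parameter
    (an assumption of this sketch, not a property of the Model).
    """
    for _ in range(max_iterations):
        param = (low + high) / 2.0
        simulated = run_model(param)      # step 2: run the model
        error = simulated - target        # step 3: compare with the observed target
        if abs(error) <= tolerance:
            return param
        if error > 0:                     # step 4: adjust the parameter and repeat
            high = param
        else:
            low = param
    raise RuntimeError("no parameter value matched the target within tolerance")


def toy_model(adherence: float) -> float:
    """Hypothetical model: 90% of patients are offered treatment, and a
    fraction `adherence` of them take it."""
    return 0.9 * adherence


# Step 1 (choosing the measure and its target) is represented by `target`.
fitted = calibrate(toy_model, target=0.54, tolerance=0.001)
print(f"fitted adherence parameter: {fitted:.3f}")
```

In this toy case the loop settles near an adherence of 0.6, the value at which the simulated treatment rate matches the illustrative target of 0.54.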
The report describes each of these steps, as well as the role of care processes and the importance of calibrating care processes to match actual practices in settings of interest, limitations of calibration of care processes, how care processes are implemented in the Archimedes Modeling framework, sources and methods for performing the calibration, the results of the calibration, and conclusions.

Role of Care Processes in the Archimedes Model
An integral part of the Archimedes Model is the set of care processes that describe how diseases are prevented and managed. These care processes are a critical part of every analysis. There are three main reasons. The first is that the effectiveness of an intervention depends on how it is delivered. For example, the effect of a drug will be different if it is simply given at a particular dose, versus titrated to achieve a goal, versus included in a step-wise algorithm (e.g., first-order drug, second-order drug, and so forth). The second reason is that the effect of an intervention also depends on the other care processes used to manage patients. More specifically, the effects of interventions are always measured relative to some baseline or reference standard of care. Thus the same intervention will have different effects in different settings, depending on what care processes are being practiced in the settings. For example, a protocol for determining when patients seen in an emergency room for chest pain should be referred to a specialist will have a different effect in a setting where the current practice is to have every chest-pain patient referred to a specialist than in a setting where the current practice is to refer no one to a specialist 4. A corollary is that to the greatest extent possible the background care processes in a simulation should match the background care processes in the setting of interest. Stated another way, analyses should be customized to match the settings of interest. Failure to explicitly include and calibrate care processes to the settings of interest amounts to an implicit assumption that the care processes in the setting of interest are the same as the care processes that were being followed in the setting where the effect of the intervention was originally measured 5. A third reason care processes are important is that they themselves are often the targets of interventions. The entire field of process improvement is devoted to improving the efficiency, cost, and safety of processes.
It is for these reasons that, for the interventions it is intended to analyze, the Archimedes Model explicitly separates the physiology of diseases from the management of those diseases, that is, the care processes followed to prevent, diagnose, and treat them. By addressing each of these separately, the Model can more accurately represent how an intervention will be implemented, and the background care into which the intervention will be introduced and against which it will be compared. This approach also enables the user to customize the Model to different settings and to analyze process improvement projects.

Objectives and Limitations of Calibration

Objectives
The objective of calibration is to create a model that represents care as it is actually delivered in the setting of interest. For Simulator 2.3 and ARCHeS, the reference setting is current care in the United States. Representing actual or current care involves three main steps. The first is to develop a systematic representation of care processes in a mathematical form. The method should be sufficiently flexible and general to enable representation of current guidelines at a realistic level of complexity and detail. The method should also include parameters that can be set to modify a care process to match the care that is actually being delivered in a particular setting. The second step is to identify the guidelines that are recommended by professional groups as the standard of care. We will call this "nominal" or "ideal" care and use it as the starting point for representing actual care. However, it is well known that not all providers follow guidelines precisely, and not all patients follow the recommendations of their providers. Thus a third step is needed in which the parameters in the care processes are tuned or calibrated to represent how providers and patients behave with respect to following recommended guidelines. It is the combination of the recommended care and the patient and provider behaviors that determines what actually happens, which we will call "actual" or "current" care.

Limitations
The ability to calibrate a model to current care is subject to a number of limitations. First, practices are continually changing to incorporate new technologies, new science, new evidence, and modifications of guidelines. At any time, the Model is calibrated to the most current available data, but the calibration is subject to change to keep the Model current.
A second limitation is that while the objective is to represent current care as it is practiced in the US, in reality practice patterns vary widely across different geographic regions, medical centers, physician groups, and providers. Practices also vary with socioeconomic status, insurance coverage, urban versus rural settings, and other non-medical factors. There is no such thing as a single practice pattern that applies to all settings in the US. Thus, current care as incorporated in the Archimedes Model is suitable for national policies and can be used as the starting point for delivery systems that, on average, follow national patterns. Because the Archimedes Model explicitly represents care processes and behaviors, it can be modified or customized to different settings as needed, provided sufficient data are available to perform the calibrations.
A third limitation is that guidelines also change over time. Because of this, in reality a person can be subject to a variety of different guidelines as he or she ages. Because it is impossible to know for each patient the spectrum and timing of guidelines to which the patient was subject, it is necessary to make simplifying assumptions. For calibrating the care processes in Simulator 2.3, we assumed that simulated patients were subject to current guidelines and healthcare utilization rates for their entire lives. For example, for a simulated 70-year-old man who was diagnosed with dyslipidemia at age 30, we assume that he has been taking a statin for 40 years, whereas in reality statins have only been on the market since 1987.
A fourth set of limitations relates to the available data. Because there is no single data source that defines all important aspects of care, calibration requires the use of several different sources, which can involve different populations, methods, definitions, time periods, and other factors. The data do not always map perfectly to outcomes and events calculated by the Model, or even to other data sources. Furthermore, for many conditions, the outcomes and events of interest are relatively infrequent, raising issues about statistical variability and wide ranges of uncertainty. Thus for many of the measures used to calibrate a model, there is no single unambiguous number to serve as the target. Rather, it is necessary to review several sources, make judgments about the applicability of each source, and define targets that, in the judgment of appropriately chosen experts, best represent current US care.
A fifth limitation derives from the fact that the starting point for calibration is the set of guidelines published by national organizations. Branch points in the guidelines, where providers and/or patients may take different actions, are identified. Probabilities are assigned to represent the proportions of providers and patients who choose various branches. Examples are the probability that a physician applies a guideline at all, the probability that the physician orders a recommended test, and the probability that a patient follows a provider's recommendation. As will be described below, parameters such as these are "tuned" to fit the observed levels of biomarkers and utilization rates in the population. The issue is that for some practices it is not possible to find any set of parameter values that fit all the available data. This is caused by the fact that some providers do not start with the national guidelines at all, but rather follow other practices which are rarely described and virtually never measured.
For all these reasons and others, it is not possible to find a set of parameters for care processes that cause the Model to fit all the data from all the sources, and there will always be some discrepancies that for all intents and purposes are unavoidable. The objectives of calibration are to get as close as possible to observed levels of care, and to identify places where potentially important discrepancies exist. Calibration is not a test of the accuracy or validity of the model. If calibration tests anything, it tests the extent to which actual practices are consistent with nationally recommended guidelines.

Initiation of Care Processes
In a simulation using the Archimedes Model, care processes are initiated when a patient encounters the healthcare system. There are two main categories of care processes, depending on whether the encounter is caused by an acute or a non-acute condition. Encounters can be initiated when a patient develops a symptom of a disease. In the Model, simulated patients have behaviors that recognize the occurrence of a symptom, determine whether they will seek care at that time or wait until the symptom progresses, and determine where they will seek care. Depending on the urgency of the symptom and the patient's behavior, a patient may seek care acutely in an emergency room or non-acutely in an outpatient setting. Acute care processes triggered by emergency room presentations can lead to ambulatory or inpatient testing and treatment. Each of these behaviors has parameters that can be modified to match observed behaviors.
While some encounters are initiated by the patient, encounters can also be initiated by healthcare providers. This can occur if the person meets criteria for some outreach or follow-up protocol, or it can occur during the course of managing a patient, where an evaluation done as part of one care process reveals a test result or diagnosis that in turn triggers another care process. Thus once initiated, care processes can occur in cascades depending on what information is revealed, how patients respond to treatments, and what events unfold.
For the calibration process, a screening visit is scheduled for all patients at the age of twenty. At this visit they are screened for a number of conditions (following guideline recommendations). Like any other visit, this visit can lead to a cascade of subsequent care processes, or a healthy patient may not return for years.

Representation of Care Processes
Care processes vary widely in their complexity. To enable representation and parameterization of the full range of care processes issued by national organizations, the Archimedes Modeling framework includes methods for breaking care processes into components which can then be used as building blocks to represent care processes at high levels of detail and complexity. Here we will describe the components of the non-acute care processes. The evaluation, diagnosis, treatment, and management steps of all modeled non-acute care processes are represented using this framework.
There are two main phases: the first involves evaluation and diagnosis. The second involves treatment and monitoring.

Non-acute evaluation and diagnosis
The evaluation/diagnosis phase has four steps:
1. Candidacy. Determine if a patient meets the inclusion and exclusion criteria for the care process. Eligibility can be based on a wide variety of factors such as demographic information, symptoms, risk factors, or past medical history. Some care processes involve multiple sets of inclusion and exclusion criteria, such as "low-," "medium-," and "high-" risk groups. If so, then in this step a patient will be placed in the appropriate group.
2. Eligibility. If a patient meets criteria for a care process or a particular group in a care process, the next step is to determine if the patient is eligible or indicated for any tests. This takes into account not only the list of indicated tests but any rules for the timing of tests. For example, a care process may call for a person on cholesterol medication to be tested every 6-12 months, in which case this step will search the records to determine the time of the last test and determine if 6 months have passed.
3. Testing. Based on the results of the previous step, perform any tests indicated by the care process.
4. Diagnosis. Finally, make a diagnosis based on the results of the tests and other information. The diagnosis can indicate one or more conditions, or no condition.
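The four evaluation steps can be illustrated with a small sketch. Everything here is hypothetical: the `Patient` record, the candidacy rule (patients on cholesterol medication), the 182-day retest interval standing in for "6 months", and the diagnostic cut-off are assumptions chosen for illustration, not values or structures taken from the Model.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Patient:
    on_cholesterol_medication: bool
    last_lipid_test: Optional[date]   # None if never tested
    ldl_mg_dl: float                  # value a lipid test would reveal


RETEST_INTERVAL = timedelta(days=182)  # roughly 6 months (illustrative)
LDL_CUTOFF = 130.0                     # hypothetical diagnostic cut-off


def evaluate(patient: Patient, today: date) -> Optional[str]:
    """Run one pass of the evaluation/diagnosis phase.

    Returns None when no test is performed, otherwise a diagnosis string.
    """
    # 1. Candidacy: does the patient meet the inclusion criteria?
    if not patient.on_cholesterol_medication:
        return None
    # 2. Eligibility: has enough time passed since the last test?
    last = patient.last_lipid_test
    if last is not None and today - last < RETEST_INTERVAL:
        return None
    # 3. Testing: perform the indicated test and record its date.
    measured = patient.ldl_mg_dl
    patient.last_lipid_test = today
    # 4. Diagnosis: one condition or no condition.
    return "elevated LDL" if measured >= LDL_CUTOFF else "no condition"
```

A patient tested three months ago is skipped at the eligibility step, while one tested seven months ago is retested and receives a diagnosis.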

Non-acute treatment and monitoring
After a diagnosis is made, the second phase of the care process (treatment and monitoring) begins. This has five steps.
1. Candidacy. Determine that the patient meets the inclusion and exclusion criteria for the treatment. In this phase the criteria will include the results of tests and diagnoses made in the evaluation/diagnosis phase, along with other relevant information.
2. Eligibility. Determine the treatments for which the patient is eligible and check any conditions on the timing of treatments. For example, newly diagnosed patients should receive treatment immediately, whereas the interval of follow-up care varies for each guideline and treatment.
3. Set goal. Determine whether any treatment goal applies, such as "if a patient has a history of myocardial infarction, LDL treatment should be given with the goal of reducing the LDL to <100 mg/dL."
4. Treatment. Order any treatment(s) specified by the guideline. Examples are writing a prescription and scheduling a procedure. Depending on the treatment, this step could trigger additional care processes such as those relating to the performance of a procedure. If there is a treatment goal in the guideline, then the treatment will be titrated depending on whether the patient has reached the goal.
5. Monitoring. After a treatment is given, arrange for any follow-up visits and tests that are recommended in the care process. These protocols frequently include recommendations for timing such as, "after a patient is placed on cholesterol-lowering treatment, retest the cholesterol in 4 to 6 months." Monitoring can also include referrals to specialists. The monitoring step is usually dynamic, with specific recommendations depending on the patient's response to treatment, the results of follow-up tests, and so forth.
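The interaction between the goal-setting, treatment, and monitoring steps can be sketched as a titration loop. This is a minimal illustration, assuming a hypothetical list of dose levels, each with a made-up fractional LDL reduction relative to the untreated value; it is not the Model's treatment algorithm.

```python
from typing import List, Tuple


def titrate(baseline_ldl: float,
            goal: float,
            dose_reductions: List[float]) -> Tuple[int, float]:
    """Step up through dose levels until LDL falls below the goal.

    dose_reductions[i] is the assumed fractional LDL reduction at dose
    level i + 1, relative to the untreated baseline.
    Returns (dose level reached, resulting LDL); level 0 means untreated.
    """
    level = 0
    current = baseline_ldl
    for next_level, reduction in enumerate(dose_reductions, start=1):
        if current < goal:          # monitoring shows the goal is met: stop
            break
        level = next_level          # otherwise intensify treatment
        current = baseline_ldl * (1.0 - reduction)
    return level, current


# Illustrative values only: goal of <100 mg/dL, three hypothetical dose levels.
level, ldl = titrate(baseline_ldl=160.0, goal=100.0,
                     dose_reductions=[0.2, 0.3, 0.4])
print(level, round(ldl, 1))
```

With these illustrative numbers the patient moves through all three dose levels before the follow-up value drops below the goal; a patient who starts below the goal is never treated.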
The non-acute components as well as the potential for a cascade of care processes are illustrated in Figure 1.

Acute care processes and prioritization of care
In the Archimedes Model, acute care processes are also broken down into components, with the objective of being able to prioritize the diagnosis and treatment of conditions based on the urgency of the condition. For example, if a patient seeks care in an emergency department for chest pain, an EKG finds signs of a myocardial infarction with ST elevation (STEMI), and a blood pressure test finds the patient has hypertension, then the care process for managing STEMI will take precedence over the care process for evaluating hypertension. Prioritization of the care processes is based on standard clinical practices as described in the guidelines themselves, supplemented by the judgments of clinicians and experts.
The components of acute care processes include two phases: testing and treatment. When a patient presents with one or more symptoms, the relevant testing processes are exercised in a way that will allow the most urgent condition to be diagnosed first. Once a diagnosis is made, treatment of that condition is initiated. When multiple conditions are diagnosed, treatment of those conditions is also ordered such that the most urgent condition is treated first.
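Ordering by urgency can be sketched with a simple ranked sort. The urgency ranks below are hypothetical placeholders; in the Model, prioritization comes from the guidelines and from clinical judgment, as described above.

```python
from typing import Dict, List

# Hypothetical urgency ranks: lower numbers are handled first.
URGENCY_RANK: Dict[str, int] = {
    "STEMI": 0,
    "stroke": 0,
    "hypertension": 2,
}


def order_by_urgency(conditions: List[str]) -> List[str]:
    """Return the conditions sorted so that the most urgent comes first.

    sorted() is stable, so conditions with equal rank keep their input order.
    """
    return sorted(conditions, key=lambda condition: URGENCY_RANK[condition])


print(order_by_urgency(["hypertension", "STEMI"]))
```

For the chest-pain example, the STEMI care process is placed ahead of the hypertension evaluation regardless of the order in which the two findings were made.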

Multiple care processes
Any of these components can trigger additional care processes. For example, if the evaluation component in one care process calls for certain tests to be performed, then depending on the results of those tests the patient may become eligible for another care process, in addition to the care process that is in progress. Or, depending on how a patient responds to a treatment, additional evaluations and treatments may become indicated. The components can be sequenced, repeated, and branched to any extent necessary to represent the full complexity of the progression of diseases and the application of care processes to manage them.

Scheduling of care processes
Because a patient may be subject to multiple care processes, each of which has monitoring and follow-up visits, the methods for representing care processes include time windows for visits as well as algorithms for coordinating visits and combining tests and treatments that occur at visits, as occurs in reality.
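One way to picture visit coordination is as a grouping of overlapping time windows. The sketch below is an assumption for illustration only: it uses a standard greedy approach (sort windows by closing day, then schedule a visit on the closing day of the first window not yet covered) and does not describe the algorithm actually used in the Model.

```python
from typing import List, Tuple

Window = Tuple[int, int]  # (earliest day, latest day) for a follow-up visit


def coordinate_visits(windows: List[Window]) -> List[int]:
    """Choose visit days so that every window contains at least one visit.

    Windows are processed in order of their latest allowed day. A window
    already containing the most recently scheduled visit needs no new visit.
    """
    visit_days: List[int] = []
    for earliest, latest in sorted(windows, key=lambda w: w[1]):
        if visit_days and earliest <= visit_days[-1]:
            continue  # the last scheduled visit already falls in this window
        visit_days.append(latest)
    return visit_days


# Hypothetical follow-up windows (in days from now) from three care processes.
windows = [(150, 200), (120, 180), (300, 365)]
print(coordinate_visits(windows))
```

Here the first two windows overlap, so their tests are combined at a single visit, and the third window gets its own visit.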

Resolution of ambiguities in guidelines
In our experience it is possible to represent virtually all existing guidelines using these building blocks. Ambiguities in existing guidelines such as "[Test A] may be considered in some cases" are resolved when possible using data on utilization, or by expert judgment and observation of common clinical practice when utilization data are not available. Ambiguities in existing guidelines and lack of sufficient data to resolve ambiguities are important limitations of the calibration process.

Parameterization of Care Processes
Construction of care processes is only the first step in the calibration of the Model to match current practice patterns. The care processes built using the methods just described represent what we are calling "ideal care", that is, the care that is specified by the national organizations. This level of care is not realistic in the sense that not all providers and patients behave as called for in the care process. Discrepancies between the "ideal care" specified by the guideline and the "actual care" or "current care" that is actually practiced are addressed by calibration.
To enable calibration of the care processes from the ideal to the real practices, each step in the care process includes two sets of parameters. One set specifies the probabilities that the provider will execute that step in the care process as specified (e.g., correctly determine a patient's eligibility, conduct the evaluation as specified, order recommended tests, make a diagnosis accurately, give a treatment as recommended, schedule follow-up visits as recommended, and so forth). The other set of parameters specifies the probabilities that the patient will adhere to the actions recommended by the provider (e.g., take the test, take the treatment, show up for the follow-up visits, see the specialist, and so forth). The parameters can be simple probabilities, or can be calculated as functions of other variables in the model such as a patient's age, risk factors, and so forth. It is these parameters or equations for parameters that are tuned to calibrate the "ideal" care processes to represent observed behaviors and levels of utilization.
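The effect of pairing a provider probability with a patient probability can be illustrated with a small simulation. This is a sketch under simplifying assumptions: both parameters are plain probabilities, the two decisions are treated as independent, and the values 0.8 and 0.7 are arbitrary.

```python
import random


def step_completed(p_provider: float, p_patient: float,
                   rng: random.Random) -> bool:
    """Simulate one guideline step, e.g. a recommended test.

    The step happens only if the provider orders it AND the patient adheres.
    Independence of the two decisions is an assumption of this sketch.
    """
    provider_acts = rng.random() < p_provider
    patient_adheres = rng.random() < p_patient
    return provider_acts and patient_adheres


rng = random.Random(0)          # fixed seed so the run is repeatable
n_patients = 100_000
completed = sum(step_completed(0.8, 0.7, rng) for _ in range(n_patients))
fraction = completed / n_patients
print(f"share of patients completing the step: {fraction:.3f}")
```

Under these assumptions the simulated share approaches the product of the two probabilities (0.8 × 0.7 = 0.56), which is the kind of utilization level that would be compared against an observed target during calibration.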

Calibration Parameters
Because the structure for the care processes is general and all care processes are built using the same building blocks, it is possible to specify a set of parameters that can be used to calibrate any care process. By setting the parameters to different values, the Model can be calibrated or customized to a wide variety of different settings, and to different patterns of practice within any particular setting (e.g., conservative, average, aggressive practice). The parameters are divided into three main categories: patient response to symptoms, provider performance, and patient adherence. They are described in Table 1.

Table 1. Healthcare Process Calibration Parameters.

Parameter Number    Description
1     Probability that the person seeks care for a symptom.
2     Probability that the person presents to the emergency room when symptomatic.
3     Probability that the person presents to a specialist for care when symptomatic.
4     Probability that an individual does not receive a care process for which he or she is eligible.
5     Probability that a patient with a disease is not diagnosed with that disease.
6     Probability that a condition that is controlled is incorrectly determined to be uncontrolled (possibly causing intensification of treatment).
7     Probability that a condition that is not controlled is incorrectly determined to be controlled (possibly leading to an inappropriate withholding of treatment).
8     Probability that a patient has a prescribed test.
9     Probability that a patient takes a prescribed treatment.
10    Probability that a patient attends a scheduled appointment.
11    A window of time in which a follow-up visit can occur.

When a simulated individual "perceives" a symptom, a part of the Model that addresses patient behavior determines how the person responds to the symptom. The probability that the patient seeks care is determined by a parameter (parameter #1). If the patient decides not to seek care, he or she does not enter the healthcare system. If the patient does seek care, two parameters determine where the patient will seek care: at an emergency department (parameter #2), from a specialist (parameter #3), or from his or her primary care physician. For example, if parameter #1 is 0.8, parameter #2 is 0.5, and parameter #3 is 0.25, then 20% of individuals who perceive symptoms will ignore them. Of the 80% who seek care for their symptom, 50% will present to the emergency department (40% of the total group). Of the remaining 40%, 25% will present to a specialist (10% of the total group). The other 30% of those who perceive symptoms will present to their primary care physician. This level of detail matters because where patients seek care is an important determinant of the cost of care.
In the non-acute setting, the performance of providers is affected by four parameters (parameters #4 through #7). Parameters #4 and #5 affect the flow of patients through the evaluation phase of care, while the other two affect monitoring.
The acute care setting does not include parameters associated with provider performance. In this setting providers are assumed to follow recommended care processes with 100% performance. This aspect of the care-process model can be made more realistic if needed.
In addition to parameters that affect the performance of providers, the Model includes parameters that affect the probabilities that patients will adhere to particular steps in the care processes (parameters #8 through #10). The patient adherence parameters apply to specific tests, treatments, and appointment requests and can be used in both the non-acute and acute settings. The methods for setting these parameters are very general; adherence to a specific test, treatment, or encounter can be a simple probability, or it can be a more complex function of such factors as the patient's history, co-morbidities, past adherence, and other measures of health and behavior.
The final parameter (parameter #11) affects the scheduling of care processes. This parameter defines a window of time in which a follow-up visit can occur. This is used by the Model to coordinate visits. Figure 2 illustrates how the calibration parameters are integrated into the non-acute components.
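The arithmetic behind parameters #1 through #3 can be written out explicitly. The sketch below reads parameter #1 as the probability of seeking care, as Table 1 defines it, and applies parameters #2 and #3 sequentially, as in the worked example; the function and key names are hypothetical.

```python
from typing import Dict


def care_seeking_split(p_seek: float, p_ed: float,
                       p_specialist: float) -> Dict[str, float]:
    """Split people who perceive a symptom across the four outcomes.

    p_seek       -- parameter #1: probability of seeking care at all
    p_ed         -- parameter #2: probability of choosing the emergency
                    department, applied to those who seek care
    p_specialist -- parameter #3: probability of choosing a specialist,
                    applied to care seekers who did not choose the
                    emergency department
    """
    not_emergency = p_seek * (1.0 - p_ed)
    return {
        "no care": 1.0 - p_seek,
        "emergency department": p_seek * p_ed,
        "specialist": not_emergency * p_specialist,
        "primary care": not_emergency * (1.0 - p_specialist),
    }


split = care_seeking_split(p_seek=0.8, p_ed=0.5, p_specialist=0.25)
for setting, share in split.items():
    print(f"{setting}: {share:.0%}")
```

With 0.8, 0.5, and 0.25, the shares are 20% no care, 40% emergency department, 10% specialist, and 30% primary care, and they sum to 100% of those who perceive a symptom.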

Sources for Guidelines
Because the reference version of Archimedes Simulator 2.3 is designed to represent current care as delivered in the US, the care processes are based on US national guidelines, and are calibrated using data from the US national datasets. The guidelines currently incorporated in Simulator 2.3 are shown in Table 2.

Sources for Calibration Data
Approximately thirty national-level datasets were surveyed to identify measures and target values for the calibration process. Five main datasets were chosen because they provided data that were nationally representative, spanned the diseases and care processes in the Archimedes Model, and included information on subpopulations. They are listed in Table 3. Additional sources were used when needed to calibrate aspects of the Model that were not adequately covered by these sources.

Calibration Measures
Using the sources in Table 3, and considering the level of detail available in these sources and in the Archimedes Model, we identified the following types of measures as appropriate targets for the calibration process (Table 4).

Mean biomarker values
Mean of biomarkers compared to estimates from NHANES.

Prevalence demographics
Prevalence of gender, race, and ethnicity demographics compared to estimates from NHANES.

Prevalence of diagnosed conditions
Prevalence of diagnosed conditions at a specified time, compared to estimates from NHANES and NAMCS/NHAMCS-OPD.

Incidence of diagnosed conditions
Incidence of diagnosed conditions over specified interval, then normalized to annual rate and compared to estimates from combined data from the NHDS and NHAMCS. Incidence is defined prospective per capita (number of diagnoses per capita in one year).

Incidence of diagnosed deaths 6
Incidence of deaths due to different causes over specified interval, then normalized to annual incidence and compared to estimates from CMF. Incidence is defined prospective per capita (number of diagnoses per capita in one year).

Hospitalizations per capita
Number of inpatient admissions prospective per capita over specified interval, then normalized to annual rate and compared to estimates from NHDS (only admissions for conditions in the Model).

Outpatient visits per capita
Number of outpatient visits prospective per capita over specified interval, then normalized to annual rate and compared to estimates from NAMCS/NHAMCS-OPD.

Tests per capita
Number of tests performed prospective per capita over specified interval, then normalized to annual rate and compared to estimates from NAMCS and NHAMCS (both OPD and ED).

Treatments per capita
Number of treatments prospective per capita over specified interval, then normalized to annual rate and compared to estimates from combined data from the NHDS and NAMCS/NHAMCS-OPD.

Prevalence taking intervention
Prevalence of patients taking an intervention at specified time compared to estimates from NHANES.
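Several of the measures above are counts over a specified interval that are then normalized to an annual per-capita rate. A minimal sketch of that normalization follows, assuming a fixed population over the interval; the function name and the numbers are illustrative only.

```python
def annual_rate_per_capita(event_count: float,
                           population: float,
                           interval_years: float) -> float:
    """Convert a count of events over an interval into events per person per year.

    Assumes the population size is constant over the interval, which is a
    simplification made for this sketch.
    """
    if population <= 0 or interval_years <= 0:
        raise ValueError("population and interval_years must be positive")
    return event_count / (population * interval_years)


# Illustrative numbers: 150 admissions among 10,000 simulated people in 3 years.
rate = annual_rate_per_capita(event_count=150, population=10_000,
                              interval_years=3)
print(f"{rate:.4f} admissions per person per year")
```

The same calculation applies to diagnoses, visits, tests, and treatments, which puts simulated and observed values on a common scale before they are compared.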

Subpopulations
The calibration process involves not only calibrating care processes to match data for the entire population but also calibrating the Model for important subpopulations. Subject to the availability of data, for each of the measures in Table 4, we defined targets for the following subpopulations (ages and co-morbidities at baseline): •

Target Values for Calibration Measures

General Methods
To the greatest extent permitted by the available data, a target value was specified for each measure, and the care processes in the Model were calibrated to match those values. For some measures there were multiple data sources for specifying target values, in which case judgments had to be made about a "primary" target and upper and lower bounds. This section describes some of the methodological issues that arise in specifying target values and bounds.
The methodology for estimating targets requires mapping simulated outcomes to real-world diagnosis-related groups (DRGs), ICD codes, or CPT codes. Mappings were established by a committee of physiology modelers and internal medical staff. All primary datasets include weights for the individuals that enable generalization of the dataset to a nationally representative population. Subpopulation sample size estimates derived from 2005-2006 NHANES data using the interview weight 7 were used as normalization factors across the various source datasets. These steps ensured that the measures were as uniform as possible.
For many of the measures used to calibrate care processes, target values had to be obtained by adding rates obtained from multiple data sources. Setting a target for the incidence of myocardial infarctions is one example. MIs seen in hospital emergency departments and discharged without admission are recorded in NHAMCS. MIs seen in ambulatory care settings are recorded in NAMCS, and MIs admitted are recorded in NHDS. Silent MIs and sudden deaths may not be recorded in any of these. Furthermore, an event might be recorded in more than one dataset; an MI that occurred in an ambulatory setting but was later admitted might be recorded in both ambulatory and hospital datasets. Because of gaps and overlaps like these, the data reported in each dataset had to be analyzed carefully to derive the most accurate possible estimate of the target value. In some cases the best that could be accomplished was to identify values that represented upper or lower bounds for the "true" value.
Another problem was that for some measures, such as the use of tests, a value reported in the dataset might be too inclusive. For example, a particular test might be used for a wide range of conditions, not all of which are currently in the Archimedes Model. In such cases the observed rate of test usage would be expected to overestimate the rate calculated by the Model. To the extent permitted by the available data, observed rates were adjusted to reflect the diseases in the Model.

Calculation of Target Values Used for Calibration
The following sources were used to identify target values for each type of measure used in the calibration.

Target values for prevalence of diagnosed conditions
Prevalence rates of diagnosed conditions were estimated primarily from NHANES and NAMCS/NHAMCS-OPD data. Neither NHANES nor NAMCS/NHAMCS-OPD has chronic disease markers that cover all the diseases that are in the Archimedes Model. NHANES was used because it has the best coverage. Another complication with the NAMCS/NHAMCS-OPD data was that the three diagnosis fields were related to visits and did not include all the diagnoses a person may have. Therefore NAMCS/NHAMCS-OPD data represented lower bounds for the target values. The prevalence rates of obesity and dyslipidemia are much higher in NHANES than in NAMCS/NHAMCS-OPD and probably too high for diagnosed conditions; however, NHANES was still used for these targets to maintain consistency.

Target values for incidence of diagnosed conditions (other than death)
No single data source reported all the possible ways the first occurrence of a condition might be recorded for estimating incidence rates. For example ambulatory-care surveys record cases that are seen in ambulatory care, while hospital-based surveys record cases first seen in that setting. To derive target values for incidence rates of diagnosed conditions, for most conditions it was necessary to add rates reported in NAMCS/NHAMCS-OPD and NHAMCS-ED. Exceptions were myocardial infarction (MI) and stroke. For these conditions it was necessary to also include incidence rates reported in NHDS. The NHDS dataset does not have sufficient markers of chronic condition to enable estimation of incidence for subpopulations based on disease histories; it provided information only for subpopulations based on gender and age.
When using data from NAMCS and NHAMCS, it was important to distinguish between visits for "new" conditions and existing conditions. Only the individual's first diagnosis and the reason for the visit were considered. The "high," "best," and "low" estimates for incidence rates were based on the recorded reason for the visit.

Target values for diagnosed deaths
Incidence rates of diagnosed deaths were derived from CDC mortality data. As for other diagnoses, deaths that occur in the simulation had to be mapped to ICD codes (in this case ICD-113 group codes and ICD-10 codes). There is some uncertainty in this approach because definitions of conditions by ICD code are not consistent. For example, the definition of CHD death can vary greatly across different studies and sources, if a definition is provided at all. Because of this, it was not always clear how conditions calculated in the Model should be mapped to ICD codes. Where there was ambiguity about whether a particular ICD code was calculated by the Model, we conducted a sensitivity analysis in which we designated the code as either "modeled," "possibly modeled," "probably not modeled," or "not modeled." This method was used to define a lower bound, best estimate, and upper bound for the target value for the outcome, as follows:
• Lower bound: Count only persons who had a "modeled" ICD code
• Best estimate: Count all persons who had a "modeled" ICD code, and half of those with a "possibly modeled" ICD code
• Upper bound: Count all persons with "modeled," "possibly modeled," or "probably not modeled" ICD codes.
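The three counting rules above can be sketched in a few lines of Python. This is only an illustration: the function name, the record layout (an ICD code paired with a survey weight), and the placeholder codes CODE_A to CODE_D are assumptions made for the example and are not the mapping used for the Model.

```python
from collections import Counter

# Hypothetical designation of ICD codes. The code names are placeholders,
# not real ICD codes and not the classification used for the Model.
CODE_STATUS = {
    "CODE_A": "modeled",
    "CODE_B": "possibly modeled",
    "CODE_C": "probably not modeled",
    "CODE_D": "not modeled",
}


def death_target_bounds(deaths, code_status):
    """Return (lower, best, upper) weighted death counts.

    deaths: iterable of (icd_code, weight) pairs.
    Codes missing from code_status are treated as "not modeled".
    """
    totals = Counter()
    for code, weight in deaths:
        totals[code_status.get(code, "not modeled")] += weight

    # Lower bound: "modeled" codes only.
    lower = totals["modeled"]
    # Best estimate: all "modeled" plus half of "possibly modeled".
    best = totals["modeled"] + 0.5 * totals["possibly modeled"]
    # Upper bound: everything except "not modeled".
    upper = (totals["modeled"] + totals["possibly modeled"]
             + totals["probably not modeled"])
    return lower, best, upper


deaths = [("CODE_A", 100.0), ("CODE_B", 40.0), ("CODE_C", 10.0), ("CODE_D", 500.0)]
print(death_target_bounds(deaths, CODE_STATUS))  # (100.0, 120.0, 150.0)
```

Dividing each of the three counts by the same population denominator would turn them into lower, best, and upper target rates.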

Target values for incidence of treatments/procedures
Incidence rates for treatments (procedures) were calculated as the sum of rates reported in the NAMCS/NHAMCS-OPD and NHDS datasets. When using the NAMCS/NHAMCS-OPD data, diagnoses associated with a visit were checked to confirm that the procedure of interest occurred. These data have several types of procedure codes spanning eight potential fields. Each of these eight procedure fields was checked. In the NHDS dataset the fields that describe procedures are not as complex but had to be checked to confirm the DRG codes associated with each visit. Because the NHDS dataset does not have fields that describe chronic conditions, it was not possible to estimate incidence rates for subpopulations that had chronic conditions.

Target values for prevalence of individuals receiving interventions
Prevalence rates of persons taking various treatments were calculated from NHANES. The primary limitation of using NHANES for this measure is that in this dataset information on medication usage is acquired by questionnaire. This methodology often overestimates true rates. As an extreme example, it has been observed that 87% of persons with diabetes report that they adhere to diabetes lifestyle changes, which is a much higher rate than is reported in the literature using more reliable methods.

Target values for hospitalizations
Data on hospitalization rates were obtained from NHDS. As described above, this dataset does not have sufficient markers of chronic conditions to enable estimation of hospitalizations for subpopulations based on disease histories; data are available only by gender and age. In the real world, admissions are grouped into hospitalization categories (DRGs), and these are used for estimating costs. To calibrate the care processes in the Archimedes Model, the data were queried in such a way as to mimic the DRG methodology.
The first step was to create a set of "Archimedes DRG" codes that map to "real-world DRG" codes. This was necessary because real-world DRGs are based on both diagnoses and procedures. While the DRG reported in NHDS may not be included in the Model, a component of the hospitalization may be included in the Model. For example, a patient could present with an MI but receive a heart transplant during the admission. The real-world DRG recorded would be for the heart transplant. The Archimedes Model does not include transplants, but it is important to include the admission for the MI in the data analysis. Therefore in this example, the real-world DRG for the heart transplant should be mapped to an Archimedes DRG for the MI. For these reasons we analyze hospitalization data at a finer level of detail than is captured by the real-world DRG classification.
These DRG, ICD-9, and procedure code mappings enable us to query the NHDS dataset to determine all possible Archimedes-DRG categories in which a hospitalization could be classified. Archimedes-DRG outcomes are ranked by cost using Medicare data, and the most expensive Archimedes DRG is assigned to each hospitalization. This technique mimics how hospitalizations are classified into real-world DRGs, by cost.
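The assignment step described above (collect every Archimedes-DRG category a hospitalization could fall into, then keep the most expensive) can be sketched as follows. The category names, cost figures, and code keys are hypothetical placeholders, and looking up categories code by code is a simplification of the committee-defined mappings described in the text.

```python
# Hypothetical average costs per Archimedes-DRG category (illustrative values,
# not Medicare figures).
ARCHIMEDES_DRG_COST = {"MI": 20000.0, "CHF": 12000.0, "ANGINA": 8000.0}

# Hypothetical mapping from recorded codes (real-world DRG, diagnosis, or
# procedure) to the Archimedes-DRG categories they could indicate.
CODE_TO_ARCHIMEDES_DRG = {
    "dx:mi": ["MI"],
    "dx:chf": ["CHF"],
    "dx:angina": ["ANGINA"],
    "drg:heart_transplant": [],  # not in the Model, so it adds no category
}


def assign_archimedes_drg(recorded_codes):
    """Return the most expensive Archimedes-DRG category for one
    hospitalization, or None if no recorded code maps to the Model."""
    candidates = {
        category
        for code in recorded_codes
        for category in CODE_TO_ARCHIMEDES_DRG.get(code, [])
    }
    if not candidates:
        return None
    return max(candidates, key=lambda category: ARCHIMEDES_DRG_COST[category])


# A transplant admission that also carries MI and CHF diagnoses is kept as an MI.
print(assign_archimedes_drg(["drg:heart_transplant", "dx:mi", "dx:chf"]))  # MI
```

In this sketch a hospitalization whose codes map to nothing in the Model returns None and would simply be left out of the target.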
Even after making these adjustments, it was apparent that target values estimated from the datasets did not always align with simulated points. In some cases the Model includes the "chronic" aspects of a disease but does not include acute exacerbations that could bring an individual to the hospital (e.g. atrial fibrillation). Most of the missed costs are corrected (e.g. an average cost of having atrial fibrillation is applied to life years spent with this chronic condition), but the true rate of hospitalizations was still low. Furthermore, hospitalizations for some conditions are relatively infrequent. For all these reasons we determined that the range of uncertainty and potential for errors were too high to justify using hospitalization rates to calibrate the care processes. We calculate them but use them only for general insights into the Model and the data, and therefore do not report the results here.

Target values for outpatient visits
Incidence rates of outpatient visits were calculated from NAMCS/NHAMCS-OPD datasets. Use of these datasets was complicated by the need to define which visits reported in the datasets are relevant to the diseases in the Model. This was addressed by reviewing the full set of ICD-9 codes covered by the Model and finding visits that include these codes as the primary diagnosis. All of the reasons for visits given in the dataset were reviewed, and visits for irrelevant reasons (such as visits for social-problem counseling, injuries, or administrative purposes like school exams or driving tests) were excluded. A visit was included in data analysis if a relevant diagnosis was made or if the reason for the visit is covered by the Model.
However, despite taking these steps, it was determined that the potential mismatches between visits calculated by the Model and real-world visits recorded by ICD-9 codes were too great to justify reliance on these data for calibrating the Model. Therefore, as with hospitalizations, these results were calculated and examined for general insights, but are not reported here.

Target values for tests
Incidence rates for tests were estimated from NAMCS and NHAMCS (both outpatient and emergency department (ED)) datasets, which cover ambulatory care. Ideally the target values for rates of test use would also include tests performed on an inpatient basis. However, NHDS does not have information on tests. Furthermore, while the ED component of NHAMCS contains test information, it does not capture repeat tests or tests taken if someone is admitted to the hospital for several days. For these reasons, estimates based on these datasets almost certainly underestimate the actual rates of test use.
An additional problem is that use of tests calculated in the Model does not always align perfectly with the use of tests reported in the datasets, or with the reasons tests are given in the real world. For example, the NAMCS and NHAMCS-OPD datasets do not have specific fields for FPG tests, but instead have a general data field for "glucose tests." In contrast, the Archimedes Model includes specific fields for FPG, RPG, OGTT, and HbA1c tests. In the case of blood pressure tests, in the real world, blood pressure is measured almost every time a person makes contact with the healthcare system, but in many instances the provider does not act on an abnormal finding. In the Model, each test is taken with the purpose of making a diagnosis, and providers act on abnormal findings with 100% performance. This difference in performance rates either causes prevalence rates of conditions in the Model to be greater than observed prevalence rates (if testing is done at the same rate in both the Model and the real world), or causes rates of tests to be lower in the Model than in the real world (if testing rates are decreased to create alignment between prevalence rates in the Model and prevalence rates in the real world). The Model can be calibrated to either the rates of tests or the prevalence of conditions diagnosed by the tests, but not both.
For these and other reasons, we determined that rates of test usage were too uncertain and unreliable to justify using them for calibration. Results were calculated and examined for general insights, but are not reported here.

Example of Estimating a Target Value: Incidence of MIs
The previous section gives an overview of how target values are set for different types of calibration measures. This section illustrates the methods by describing how target values were estimated for the incidence of MI.
When specifying a target value for calibration, the objective is to use real data to determine the true rate of the event or outcome of interest. Occasionally a dataset will measure exactly the event or outcome of interest, and the rate reported in the dataset can be used without modification to set a target value for that measure. Far more commonly, estimation of the true rate requires examination and interpretation of multiple datasets. Myocardial infarctions provide a good example, because people with MIs can receive care in several different settings: ambulatory, hospital ambulatory, hospital admission, etc. Thus to capture all the MIs that occur it is necessary to add the incidence rates found in several different surveys, while trying to avoid double counting. An additional complication is that when using NAMCS and NHAMCS, one has to distinguish between visits for "new" conditions and existing conditions. This issue does not arise when using NHDS because this dataset only includes hospitalizations (inpatient stays), and these are a direct result of the condition diagnosed in that stay.

Use of NAMCS and NHAMCS Datasets
For these two datasets it is important to distinguish "new" from existing conditions when calculating incidence. For example, if the visit results in a final diagnosis of CAD, it should count as a CAD diagnosis and contribute to incidence only if the person had never been diagnosed with CAD previously. While there is no unequivocally correct way to determine this, for the NAMCS and NHAMCS datasets two steps were taken to address this issue. First, only the person's first diagnosis (three are provided) was used, and second, the reason for the visit was taken into account. To determine the visits that were relevant to a condition, each visit was reviewed to see if the person's first diagnosis (DIAG1) was in the list of codes provided for that disease (see ICD-9 diagnosis codes listed for MIs below). The diagnosis was not counted if it was a second or third diagnosis (DIAG2 or DIAG3) because those are not incident (new) cases.

Specification of Lower and Upper Bounds and Best Estimates for MI Incidence Based on NAMCS and NHAMCS
After finding all visits with diagnoses relevant to MIs, lower and upper bounds and best estimates of incidence were specified based on the reason for the visit. There are variables in the NAMCS and NHAMCS datasets that describe the major reason for a visit (e.g., whether the visit was due to a new problem, a chronic problem, a chronic flare-up, post-/pre-operative, preventive, or none), and variables that describe the actual reasons for the visit (including disease-related codes). This information was used in conjunction with the ICD-9 codes to determine whether a particular visit should be included in the calculation of lower and upper bounds, and best estimates for the incidence rate.
The following ICD-9 codes are used for the diagnosis of MI with ST elevation (STEMI) and MI with no ST elevation (NSTEMI).
The following rules were used to combine information on the ICD-9 codes and the reasons for visits in order to derive lower-bound, best, and upper-bound estimates for the incidence rate:
• Upper bound: The person must have one of the ICD-9 diagnoses listed above, and the first reason for the visit must not be administrative (e.g., driver's test), adverse effect, or family planning.
• Best estimate: The person must meet all of the specifications for the upper bound calculation, and the major reason for the visit must be either a new problem or blank; here we exclude major reasons for visit that are coded as chronic, chronic flare-up, pre-/post-surgical, or preventive.
• Lower bound: The person must meet all of the specifications required for the best estimate, and one of the reasons for the visit must be heart examination, EKG, treadmill test, heart catheterization, or review of EKG test results.
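Because the three rules are nested (every visit counted in the lower bound is also counted in the best estimate, and every visit in the best estimate is also counted in the upper bound), they can be expressed as successive filters. The sketch below is illustrative only. Apart from DIAG1, the field names are assumptions rather than the actual NAMCS/NHAMCS variable names, the reason labels are plain-text stand-ins for the coded survey values, and MI_CODES holds a placeholder rather than the ICD-9 list referred to in the text.

```python
from dataclasses import dataclass
from typing import List

MI_CODES = {"MI_CODE"}  # placeholder for the ICD-9 MI code list
EXCLUDED_FIRST_REASONS = {"administrative", "adverse effect", "family planning"}
NEW_OR_BLANK = {"new problem", ""}
CARDIAC_REASONS = {
    "heart examination", "EKG", "treadmill test",
    "heart catheterization", "review of EKG test results",
}


@dataclass
class Visit:
    diag1: str           # first-listed diagnosis (DIAG1)
    major_reason: str    # e.g. "new problem", "chronic", or "" if blank
    reasons: List[str]   # reasons for visit, in recorded order
    weight: float = 1.0  # survey weight


def mi_visit_estimates(visits):
    """Return weighted visit totals for the lower, best, and upper rules."""
    totals = {"lower": 0.0, "best": 0.0, "upper": 0.0}
    for v in visits:
        first_reason = v.reasons[0] if v.reasons else ""
        upper = v.diag1 in MI_CODES and first_reason not in EXCLUDED_FIRST_REASONS
        best = upper and v.major_reason in NEW_OR_BLANK
        lower = best and any(r in CARDIAC_REASONS for r in v.reasons)
        for tier, counted in (("upper", upper), ("best", best), ("lower", lower)):
            if counted:
                totals[tier] += v.weight
    return totals


visits = [
    Visit("MI_CODE", "new problem", ["chest pain", "EKG"], 10.0),  # all three
    Visit("MI_CODE", "", ["chest pain"], 3.0),                     # upper, best
    Visit("MI_CODE", "chronic", ["chest pain"], 5.0),              # upper only
    Visit("MI_CODE", "new problem", ["administrative"], 7.0),      # excluded
    Visit("OTHER", "new problem", ["EKG"], 2.0),                   # not an MI
]
print(mi_visit_estimates(visits))  # {'lower': 10.0, 'best': 13.0, 'upper': 18.0}
```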

Use of NHDS Dataset
The NHDS includes only hospitalizations (inpatient stays), and these are a direct result of the condition diagnosed in that stay. To determine if a hospitalization is for an MI, the person's first diagnosis (DX1) is checked against the same diagnosis codes used when querying NAMCS and NHAMCS. As was done for NAMCS and NHAMCS, only the primary diagnosis code was considered. (NHDS can have seven diagnoses associated with each visit, labeled DX1-DX7).

MI Incidence Estimates
Final estimates of MI incidence were calculated as the sum of incidences found in NAMCS, NHAMCS, and NHDS surveys. The incidence calculated with the NHDS dataset was summed with results of analyzing the NAMCS and NHAMCS datasets to calculate final values for lower and upper bounds and best estimates.
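A minimal sketch of this summation is shown below. It assumes, for illustration only, that the hospital survey contributes a single weighted count that is added to each of the three ambulatory estimates, and that a common population denominator is available. The function name, argument names, and numbers are hypothetical.

```python
def combine_incidence(ambulatory, nhds_events, population, per=1000.0):
    """Sum weighted event counts across surveys and convert to a rate.

    ambulatory: dict of survey name -> {"lower", "best", "upper"} weighted counts
    nhds_events: weighted count of hospitalizations with the condition
    population: number of people in the (sub)population of interest
    """
    combined = {}
    for tier in ("lower", "best", "upper"):
        events = nhds_events + sum(est[tier] for est in ambulatory.values())
        combined[tier] = per * events / population
    return combined


ambulatory = {
    "NAMCS": {"lower": 10.0, "best": 20.0, "upper": 30.0},
    "NHAMCS": {"lower": 5.0, "best": 10.0, "upper": 20.0},
}
rates = combine_incidence(ambulatory, nhds_events=100.0, population=50000.0)
print(rates)  # events per 1000 people for each of the three estimates
```

Using one denominator for all surveys is what keeps the summed rates comparable; a simple sum like this does nothing to remove events recorded in more than one survey.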
Although data from NAMCS, NHAMCS, and NHDS are highly reliable and our methodology was rigorous, additional sources were sought to confirm these estimates. The National Heart, Lung, and Blood Institute's (NHLBI) Incidence & Prevalence: 2006 Chart Book on Cardiovascular and Lung Diseases was used to provide additional estimates of MI incidence rates. The data sources are included in Table 5.
Because the NHLBI resource includes many studies, Table 5 also provides information on the study and population from which the data were obtained.

Caveats
This example illustrates several of the limitations inherent in using datasets to define target values for calibration. Even with rigorous survey methods and highly detailed data it is not always clear how the observed data should be mapped to the outcomes of a simulation. Using upper and lower bounds helps show how sensitive the data are to unknown factors but it does not resolve all the uncertainties. It is often necessary to consider several datasets when estimating a target value for a measure because no single dataset covers all of the factors completely. When combining data in this way, care must be given to use a consistent set of assumptions and not double-count events. Finally it is necessary to confirm estimates with independent sources whenever possible.

Prioritization of Measures
As described above (Table 3), a wide range of measures have been identified for calibrating the care processes in the Model. The calibration process begins with an internal committee of science and medical staff who select the measures, outcomes, and subpopulations of greatest importance, based on new science or evidence and the intended uses of the Model. Calibration objectives are then set for the chosen measures, outcomes, and subpopulations.
An important consideration when calibrating the care processes in the Model to a particular measure is the population to which the measure most closely applies. In general, matching utilization rates for a subpopulation is more important than matching rates for the total population. For example, the use of diabetic medications should be calibrated to rates observed in the diabetic subpopulation before attempting to match utilization rates in the total population. The objective is to match outcomes in all the relevant subpopulations first, and then to the total population. This will ensure that outcomes in the relevant subpopulations are correct.

Objectives of Measures
The measures are used in three main ways to help calibrate care processes. We use the terms "match," "observe," and "review" to describe the different objectives. The distinction is required by the fact that the measures vary in the extent to which they depend on the underlying physiology (i.e., the occurrence of diseases and their symptoms) versus the care processes used to prevent and manage the diseases. For example, the age-specific incidence of type II diabetes is determined by the interaction between two parts of the Archimedes Model. One is the physiology model which calculates glucose physiology and the development of elevated FPG levels. The other is the part of the Model that represents care processes and behaviors, and, in the context of this particular example, determines the use of FPG tests. In a setting in which everyone is screened frequently, the age-specific incidence rates will be shifted to younger ages and will be higher than in settings in which no testing is done. In the latter setting the disease is diagnosed only after a patient seeks care for symptoms, or perhaps not at all. Thus if there is a discrepancy between the diabetes incidence rates calculated by the Model and those observed in reality, the gap could be due to the physiology model, the care processes for FPG testing, or a combination.
The extent to which measures depend on the underlying physiology affects how discrepancies between the Model's results versus observed results should be interpreted, and what modifications if any should be made to the care processes or physiology model. In general, measures of demographics, biomarkers, diagnosed conditions (prevalence and incidence), and diagnosed deaths are determined primarily by the physiology model. Measures of utilization of procedures, tests, drugs, and other interventions are more affected by care processes. Because of these differences, these two groups or types of measures have different objectives associated with them. The measures that primarily reflect utilization are used to calibrate the care processes and behaviors. Specifically they are the measures used to tune the parameters in Table 1 to achieve as close a fit or "match" as possible. In contrast, the measures that are primarily related to physiology are rarely used to modify any parameters in the care processes. They are simply "observed." If there are discrepancies between the values of these observed measures calculated by the Model compared to the values observed in reality, then that information is most often used to focus attention on the relevant parts of the physiology model to see if any changes may be indicated.
That is, they serve as flags to help identify potential improvements in the physiology model, but they are rarely used directly to modify or tune the physiology model.
While calibration efforts focus on those measures that are intended to be matched or observed, other measures are calculated and reviewed to gain greater visibility into the Model. They may flag something that needs further investigation, or they may be helpful in understanding an issue flagged elsewhere in the model-building and validation processes.
In the tables below that report the results of the calibration exercises, each measure is marked as to whether it is used to seek a match ("M"), is observed ("O"), or is reviewed ("R").

Calculation of Goodness of Fit of the Model to the Target Value
Calibration is driven by how close the value of the measure calculated by the Model is to the target value. To determine this "goodness of fit," for each measure we calculated a ratio of the calculated value to the target value. (For convenience we call this the "match ratio.") Ideally the match ratio would be 1 for every measure, implying that the calculated values exactly matched the target values. With this said, the values are never expected to match exactly (i.e., match ratio = 1), due to sampling variability if nothing else, and the acceptable discrepancy between the calculated and target values can vary widely.
Reasons include the quality of the data in the data sources, how well the data sources can be mapped to outcomes calculated by the Model, the frequency of the outcomes represented by the measure, and other factors discussed above.
Each of these was considered when defining the acceptable range of the match ratio for a measure. Additional factors considered were the importance of the measure in the disease area (e.g., incidence of MIs in people with diabetes is more important than frequency of FPG tests), and the extent to which a measure addresses the intended uses of the Model. The ranges of match ratios considered acceptable varied from 5% (i.e., a ratio of 0.95 to 1.05) to a factor of 5 (i.e., a ratio of 0.2 to 5), depending on the factors discussed above.
To provide additional context for interpreting the match ratio, we calculated the 95% confidence interval for the match ratio, based on the sample size of the data source and the simulation.

Criteria for a measure passing
A measure is considered to "pass" if the match ratio is within the acceptable range or if the 95% confidence interval for the ratio includes 1. Otherwise the measure "fails." Calibration continues until each of the match measures passes.
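The pass criterion can be illustrated with a short sketch. The text does not state how the confidence interval was computed, so the interval below is an assumption: both the calculated and target values are treated as proportions from independent samples, and a normal approximation is applied to the log of their ratio. The function names and example numbers are hypothetical.

```python
import math


def match_ratio_ci(p_model, n_model, p_target, n_target, z=1.96):
    """Return (ratio, ci_low, ci_high) for calculated / target proportions.

    Assumed method: delta-method standard error on the log ratio of two
    independent proportions. Not necessarily the method used for the Model.
    """
    ratio = p_model / p_target
    se_log = math.sqrt((1.0 - p_model) / (n_model * p_model)
                       + (1.0 - p_target) / (n_target * p_target))
    return ratio, ratio * math.exp(-z * se_log), ratio * math.exp(z * se_log)


def measure_passes(ratio, ci_low, ci_high, accept_low, accept_high):
    """Pass if the ratio is in the acceptable range or the CI includes 1."""
    in_range = accept_low <= ratio <= accept_high
    ci_includes_one = ci_low <= 1.0 <= ci_high
    return in_range or ci_includes_one


# Small target sample: ratio of 1.2 is outside 0.95-1.05, but the wide
# interval includes 1, so the measure passes.
small = match_ratio_ci(0.12, 10000, 0.10, 500)
print(measure_passes(*small, accept_low=0.95, accept_high=1.05))

# Large target sample: the interval excludes 1, so the measure fails
# unless a wider acceptable range applies.
large = match_ratio_ci(0.12, 10000, 0.10, 50000)
print(measure_passes(*large, accept_low=0.95, accept_high=1.05))
print(measure_passes(*large, accept_low=0.2, accept_high=5.0))
```

The example shows why the two conditions are combined with "or": a measure based on a small data source can pass on the confidence interval alone, while a measure with abundant data must fall within its acceptable range.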

Calibration Results
The tables in this section show the results of the calibrations. They show the values of the measures at the start of a simulation.
The methods for creating a simulated population are described in the model description and validation papers. Briefly, individuals are sampled from the NHANES dataset. Each sampled individual is returned to birth in the simulation and then "grows up" through the simulation to the age at which they were sampled from NHANES. The proximity of the biomarkers of the simulated individual to their values in NHANES provides a measure of the performance of the Simulator. The results are shown in the tables for biomarkers.
Prevalence and incidence measures are also calculated to evaluate the performance of the Model at baseline. References, where available, are shown for the prevalence and incidence measures. The reference table is at the end of this section. As discussed in the "Methods of Data Collection" section, we do not have data on the incidence of diagnosed conditions, deaths, or hospitalizations for the DM and CAD subpopulations.
In the tables, each measure is marked "M," "O," or "R" to indicate whether it is a matched, observed, or reviewed measure.

Summary and Conclusions
This report describes how care processes and related behaviors are represented in the Model, how the care processes in the Simulator 2.3 version of the Model have been calibrated to the US population and care delivery setting, and the results of the calibration.
Care processes are a critical part of the Archimedes Model. They enable more accurate representations of the processes by which interventions are implemented in realistic settings, the background care against which interventions will be compared, and process improvement programs. They also enable the Model to be customized to different settings.
The calibration of care processes is limited by the quality and quantity of the available data. Particularly important are the sizes of the datasets (especially for the analysis of subpopulations and infrequent events), and the correspondence between outcomes calculated by the Model versus the type of data reported in the data sources (e.g. ICD and CPT codes, and DRGs). These and other factors affect the acceptable range around a target value, and decisions about when calculated values of a measure are sufficiently close to the target values to be considered a satisfactory match. Another factor that is considered when evaluating the calibration of the Model to a particular measure is the importance of the measure to the intended uses of the Model. A final factor is the extent to which the measure is determined by the underlying physiology versus the care processes.
For these reasons, the measures against which the care processes are calibrated are sorted into three categories. Those for which there are good data, that are considered particularly important for the intended applications of the Model, and that are determined primarily by care processes (not the underlying physiology), are used to tune the parameters of the care processes to achieve a good match ("matched" measures). Other measures for which the data are sufficiently good to enable specification of a target value, but that are determined more by the underlying physiology than care processes, are observed but not used to modify parameters of the care processes. They serve primarily as flags to help identify potential improvements in the physiology model, but they are rarely used directly to modify or tune the physiology model ("observed" measures). "Reviewed" measures, a third category, are calculated to gain greater visibility into the Model. They may flag something that needs further investigation, or they may be helpful in understanding an issue flagged elsewhere in the model-building and validation processes.

Results
Overall, for 87% of the match measures, the calculated values were within an acceptable range of their corresponding target values. For the observed measures, the calculated values were within an acceptable range of their target values 83% of the time.
Measures relating to biomarkers and demographics matched their target values very well. Most were within ±5% of the target values. The most important exceptions were seen in the diabetes subpopulation. The calibration process identified the physiology related to BMI and triglycerides in diabetics as opportunities for potential improvement.
Measures relating to prevalence and incidence rates of conditions varied in the extent to which the calculated and target values matched. Matches for the prevalence of chronic conditions defined by biomarkers, such as hypertension and dyslipidemia, were very good. This is an important indication that care processes relating to screening for these biomarkers are accurately represented by the Model.
Measures for medication usage matched target values very well. All were within acceptable ranges of their target values except the prevalence of patients diagnosed with diabetes taking sulfonylurea. This measure was too low, even when adherence to the national guideline was set to 100%. This indicates a possibility that in the real world, providers are not observing contraindications for the use of this drug that are specified in the guidelines.
There were too few events and too much variability between data sources to enable specification of meaningful target values for the per-capita incidence rates of procedures. Better data are needed to improve the usefulness of this measure.

Future Work
The calibration of care processes could be improved by the availability of better data, specifically more and larger datasets, with a higher level of clinical detail to enable better mapping of observed events to events calculated by the Model. Larger datasets would also enable better analysis of events in subpopulations and of the events that are infrequent but still clinically important.