Wrote the first draft of the manuscript: WD DPW LAR TH AW. Contributed to the writing of the manuscript: WD DPW LAR DW MG TH AW.
The authors have declared that no competing interests exist.
Public health responses to HIV epidemics have long relied on epidemiological modelling analyses to help prospectively project and retrospectively estimate the impact, cost-effectiveness, affordability, and investment returns of interventions, and to help plan the design of evaluations. But translating model output into policy decisions and implementation on the ground is challenged by differences in the backgrounds and expectations of modellers and decision-makers. As part of the World Bank's modelling guidelines development process, we present nine principles for the construction, reporting, and interpretation of HIV epidemiological models used to inform public health decision-making.
In almost all areas of public health, mathematical models are used to provide quantification and insight that can inform decision-making. Epidemiological data can be collected about individuals, and clinical trials can measure individual-level effects in a selected study population (often under best-case circumstances), but public health decision-making requires an understanding of the dynamics of disease across a population under a variety of conditions. Mathematical modelling aims to unite knowledge and assumptions about behavioural dynamics, biology, costs, and constraints to generate estimates of impact and cost-effectiveness, and recommendations for resource allocation.
Models are especially useful in the case of infectious diseases, where they can estimate temporal changes in disease burden and treatment needs, and so underpin projections of the counterfactuals in some quasi-experimental impact evaluation designs, and power calculations for prospective experimental study designs. These are important applications, especially in contexts where empirical data are not available. Thus, models have increased in prominence over the last several years, including in establishing optimal responses to emerging pathogens.
Investigators from many different disciplines generate models, and the techniques and presentation formats employed have tended to follow a corresponding diverse set of conventions and presumptions. Meanwhile, those who rely on modelling output have highly varied needs and expectations from epidemiological modelling analyses. It is not uncommon for different models addressing very similar questions to produce—or appear to produce—widely different estimates.
Therefore, there is a need for constructive dialogue between “producers” and “consumers” of modelling results about a model's assumptions and structure, the policy implications of the results, and what further empirical and modelling studies should be planned. The World Bank Global HIV/AIDS Program, as a funder, coordinator, and evaluator of HIV prevention efforts, has become increasingly reliant on mathematical modelling and has initiated a modelling guidelines development process through its Prevention Science and Mathematical Modelling Reference Group, a panel of experts in HIV prevention, and modelling relating to HIV prevention, created and convened by the World Bank on the basis of individuals' publication records and institutional roles. In consultation with the reference group and other HIV modelling experts, we have developed a set of principles for the construction, reporting, and interpretation of HIV epidemiological models for public health decision-making on all aspects of HIV.
The nine principles are discussed below and summarised in the table that follows.
Principle 1
Model producer considerations: Are the rationale, scope, and objectives clearly stated? Is there a statement about why epidemiological modelling is appropriate for this problem?
Model consumer considerations: Are the rationale, scope, and objectives understood? Is epidemiological modelling appropriate for this problem?

Principle 2
Model producer considerations: Is the model structure completely described, such that all analyses can be reproduced? Is there a description of key model features? Has a justification for the model structure been provided?
Model consumer considerations: Is the model presented comprehensively, such that the inclusion/exclusion of any particular assumption or feature can be identified? Is the justification for model structure/key assumptions reasonable, considering the primary rationale, scope, and objectives of the study?

Principle 3
Model producer considerations: Is there an understandable and complete listing of the model parameters, their values, and their justification?
Model consumer considerations: Are the implicit inputs upon which the model predictions are made understood, and are they satisfactorily justified?

Principle 4
Model producer considerations: Are the model fitting, calibration, and validation approaches with respect to relevant data defined and justified?
Model consumer considerations: Does the model produce, or fail to produce, outputs that can be compared to real-world data, and does the model output reflect realistic conditions? Does the comparison with real-world data increase confidence in the suitability of the model for the current enquiry?

Principle 5
Model producer considerations: Have the uncertainties been captured for all relevant factors included in the model? Is the key result of the study robust to that uncertainty?
Model consumer considerations: Have the uncertainties been captured for all relevant factors included in the model? Are the results sufficiently robust for confident decision-making, or is further analysis or data collection required? Are specific recommendations for new data analyses/collections appropriate?

Principle 6
Model producer considerations: Are sufficient details provided about limitations of the study, specifically about model structure, parameterization, and application/generalisability?
Model consumer considerations: Are the limitations of the model and its findings clearly understood, including the limits of applicability and generalisability? Considering the strength of the evidence, how are the model findings relevant for informing public health decision-making?

Principle 7
Model producer considerations: Have relevant previous studies been referenced and differences/similarities discussed? Is it clearly specified whether a new result versus a confirmation/contradiction of a previous result is presented?
Model consumer considerations: Is there an understanding of the overarching conclusion(s) from modelling studies on the topic? Are the general reasons (assumptions or underlying real-world conditions) for why models differ in their conclusions understood?

Principle 8
Model producer considerations: Where relevant, are understandable and appropriate estimates of epidemiological impact provided, such that health economic inferences can be made?
Model consumer considerations: Can the model-based estimates be used to infer cost-effectiveness measures of relevant interventions or be extended to health economics? Is the degree of uncertainty in estimates relevant to cost-effectiveness understood, particularly with respect to the sensitivity of key parameters?

Principle 9
Model producer considerations: Are model scenarios described in clear formal terms (separate from interpretations about reality) that facilitate technical understanding and evaluation?
Model consumer considerations: Are there clear explanations of intended correspondences between inputs used in the model and key real-world conditions such as epidemiological conditions, policy, and programmes?
Our focus complements more general reviews of modelling.
As in any scientific report, the rationale, scope, and objectives of a modelling study should be clearly stated. The reporting of a modelling study should include an explicit explanation for why epidemiological modelling, rather than another study design (e.g., systematic review, meta-analysis, quasi-experimental design, or a randomized controlled trial), is appropriate for the problem, the exact questions the work seeks to address, and the readership for which it is intended. This statement of rationale, scope, and objectives provides the criteria against which all modelling decisions should be judged, assists in framing the interpretation of the work, and should be referred to at key points throughout the write-up, to maintain the alignment of aims, model, results, and interpretation. Examples might be: “We aimed to generate estimates for the cost of rolling out a male circumcision programme in South Africa so that stakeholders can compare these costs against those of other possible interventions, and use the comparison to inform decisions about allocation of funding”; “We aimed to explore the extent to which HIV incidence rates can be influenced by changes in condom use among sex workers and their clients under different assumptions about sexual mixing patterns in concentrated HIV epidemics, so that recommendations can be made for data collection during the implementation of a condom distribution campaign”.
For studies that aim to estimate the potential population-level impact of a given biomedical intervention, there are differences in emphasis in their purpose that should be clear from the outset and throughout the presented work. An important distinction is between investigation of the potential benefits of a hypothetical biomedical intervention that is currently in development but has unknown efficacy, and an intervention that has a proven efficacy, such as from a trial setting. Typically, the purpose of the first type of study is to estimate the population-level effectiveness of the hypothesized intervention and to identify key properties the intervention would need to have to be effective (such as for vaccines), whereas the purpose of the second type of study is typically to project the population-level impact of scaling up an intervention whose efficacy has already been demonstrated.
The model chosen for the analysis should be described completely and clearly (commonly in the form of an online technical appendix, ideally with the model's computer code made available), so that other investigators can reproduce its findings and projections. Justification for the choice of model (individual- versus population-based, stochastic versus deterministic, linear versus nonlinear) should be provided, along with a description of the model's structure and key features, with cross-references to the scope and objectives. A flow diagram representing how individuals or subpopulations transition through the different demographic, behavioural, or clinical states in the model can be an excellent way to communicate the model's main structure.
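To illustrate how such a flow diagram translates into model code, the following sketch implements a deliberately simplified, hypothetical compartmental structure (susceptible, infected and untreated, on ART); the structure, rates, and values are illustrative assumptions only, not recommended parameterisations.

```python
# Illustrative, hypothetical compartmental model: S (susceptible) -> I (infected,
# untreated) -> T (on ART). Births, deaths of susceptibles, and transmission from
# people on ART are omitted purely to keep the sketch short.
import numpy as np
from scipy.integrate import odeint

def hiv_model(y, t, beta, tau, mu_i, mu_t):
    S, I, T = y
    N = S + I + T
    new_infections = beta * S * I / N          # simplifying assumption: transmission from untreated individuals only
    dS = -new_infections
    dI = new_infections - tau * I - mu_i * I   # tau: ART initiation rate; mu_i: mortality while untreated
    dT = tau * I - mu_t * T                    # mu_t: mortality on ART (assumed lower than mu_i)
    return [dS, dI, dT]

# Placeholder rates per year, chosen only for illustration.
params = (0.35, 0.10, 0.08, 0.02)              # beta, tau, mu_i, mu_t
t = np.linspace(0, 30, 301)
trajectory = odeint(hiv_model, [990_000, 10_000, 0], t, args=params)
print(trajectory[-1])                          # compartment sizes after 30 simulated years
```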
The model structure, and the consequent key demographic, behavioural, biological, clinical, and epidemiological factors represented or omitted by the model, may affect the interpretation of the results. Certain biological or behavioural features of HIV transmission, prevention, and treatment may be at the core of the issue addressed by the model, and cannot be omitted. However, additional features that are irrelevant to the primary objectives of the analysis may obscure the main conclusions or may open unnecessary debate about the validity of parameter values that are not essential to interpretation of the model output.
Discussion of how the model structure could have influenced the results should always be included. Examples of formal evaluations of differently structured models addressing similar research questions but reaching different conclusions can be found in various branches within the infectious disease modelling field, e.g., in the modelling of chlamydia.
Another set of assumptions in a model concerns the values that are given to the parameters. Examples of parameters include the probability of HIV transmission per sex act for an individual on ART, the fraction of patients still alive and on ART three years after ART initiation, and the annual population growth rate. It is essential for any modelling study to include a transparent listing of all model parameters, providing the following for each parameter: the name of the parameter; the mathematical symbol of the parameter (if appropriate); the meaning of the parameter in plain language; the value(s) assigned to the parameter (a point estimate and range/confidence interval as appropriate); and a contextual justification for used values, with references for the origins of the model parameter(s), and any relevant caveats (particularly important if more than one value for the model parameter exists or if the parameter is fit in the model or is derived from another modelling analysis).
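One possible convention for such a listing, sketched below with entirely hypothetical values and placeholder sources, is to store each parameter alongside its symbol, plain-language meaning, point estimate, range, and justification, so that the appendix table can be generated directly from the model inputs.

```python
# Hypothetical parameter registry: each entry carries a symbol, plain-language
# meaning, point estimate, range, and a source/justification note (placeholders here).
PARAMETERS = {
    "beta_art": {
        "symbol": "beta_ART",
        "meaning": "Per-act HIV transmission probability when the index partner is on ART",
        "value": 0.0001,
        "range": (0.00005, 0.0004),
        "source": "Placeholder; cite the primary study or meta-analysis actually used.",
    },
    "retention_3yr": {
        "symbol": "r3",
        "meaning": "Fraction of patients alive and on ART three years after initiation",
        "value": 0.70,
        "range": (0.60, 0.80),
        "source": "Placeholder; note caveats if the value is fitted or derived from another model.",
    },
    "pop_growth": {
        "symbol": "g",
        "meaning": "Annual population growth rate",
        "value": 0.015,
        "range": (0.010, 0.020),
        "source": "Placeholder; typically taken from census projections.",
    },
}

def parameter_table(params):
    """Render the registry as plain-text rows for a technical appendix."""
    rows = [f"{'Symbol':<10}{'Value':<8}{'Range':<20}Meaning"]
    rows += [f"{p['symbol']:<10}{p['value']:<8}{str(p['range']):<20}{p['meaning']}" for p in params.values()]
    return "\n".join(rows)

print(parameter_table(PARAMETERS))
```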
This notion of justifying or formally “fitting” individual parameters—or a model in its entirety—to data covers many possibilities. As these also do not lie on a clear continuum from “rough heuristic/qualitative” to “formally rigorous and unbiased”, some ad hoc critical evaluation is appropriate for the most important inputs into any model. All model fitting relies on the notion of the likelihood of observing a set of data. There are then various possible approaches to (1) maximising the likelihood, i.e., selecting the particular model under which the data are most consistent, or (2) performing a sensitivity analysis, i.e., identifying the ranges of model parameters that are consistent with the data and determining the relative importance of each model parameter. Note that the “likelihood function” itself can capture multiple sources of randomness, such as the usually unavoidable incompleteness of sampling and random effects in population processes themselves.
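As a deliberately simplified illustration of approach (1), the sketch below maximises a binomial likelihood for a single prevalence parameter using hypothetical survey data; the final lines hint at approach (2) by retaining the range of parameter values that remain consistent with the data. In a real analysis, the likelihood would be evaluated on the outputs of the epidemic model rather than on a closed-form prevalence.

```python
# Toy maximum-likelihood fit: estimate a prevalence parameter p from hypothetical
# survey data (x positives out of n tested) by maximising a binomial log-likelihood.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import binom

x, n = 187, 1_000                                   # hypothetical survey result

def negative_log_likelihood(p):
    return -binom.logpmf(x, n, p)

fit = minimize_scalar(negative_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
p_hat = fit.x                                       # maximum-likelihood estimate (here simply x / n)

# Sensitivity-analysis flavour: retain the parameter values whose log-likelihood lies
# within 1.92 of the maximum (an approximate 95% likelihood-ratio interval).
grid = np.linspace(0.10, 0.30, 201)
loglik = binom.logpmf(x, n, grid)
consistent = grid[loglik > loglik.max() - 1.92]
print(f"MLE: {p_hat:.3f}; consistent range: {consistent.min():.3f}-{consistent.max():.3f}")
```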
Some parameters, such as the mother-to-child HIV transmission rate under a particular care regimen, can be more or less directly “measured” in an appropriate (typically randomized) study, using observation and standard robust biostatistical methods, but there may be subtle artefacts. For example, using logistic regression to identify the characteristics of individuals that are associated with an HIV infection or transmission event may be misleading in ways that are seldom systematically explored in routine application, beyond noting the potential for “residual confounding”. A particular shape for a relationship between a predictor (such as viral load or age) and an outcome (transmission) is implicitly assumed, although it may be inappropriate—age in particular may correlate strongly with health status, but not necessarily monotonically.
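The sketch below illustrates this point with simulated, hypothetical data in which risk peaks at intermediate ages: a logistic regression with a single linear age term imposes a monotone relationship and can therefore obscure the true shape.

```python
# Simulated, hypothetical illustration: the true log-odds of transmission peak near
# age 35, but a logistic model with a linear age term forces a monotone relationship.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
age = rng.uniform(15, 60, 5_000)
true_logit = -2.0 - 0.003 * (age - 35) ** 2         # non-monotonic age effect (assumed)
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

linear_fit = sm.Logit(y, sm.add_constant(age)).fit(disp=0)                                 # log-odds linear in age
flexible_fit = sm.Logit(y, sm.add_constant(np.column_stack([age, age ** 2]))).fit(disp=0)  # allows a peak

print(linear_fit.params)    # a single slope that smooths over, and may hide, the peaked risk profile
print(flexible_fit.params)  # the quadratic term recovers the assumed shape
```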
For parameters where it is very difficult to obtain direct measurements, e.g., to capture behavioural dynamics such as risk reduction in the face of risk perception, heuristic parametrization may indicate which parameter sets are plausible and which are clearly at odds with data: a heuristically sensible model and a formally fitted model should be clearly distinguished, with sensitivity analyses where applicable.
Often the most important assumptions are those specifying a simulated intervention, and it is recommended that these be prominently and exhaustively listed. For instance, if the intervention of interest relates to a policy change in ART, specifying a “coverage” and “efficacy” may not be enough: assumptions about enrolment rates, adherence, and retention, as well as behavioural characteristics (e.g., risk reduction or compensation) and demographic impacts (e.g., reduced mortality rates and increased size of the HIV-positive population), may also need to be made explicit.
Here the emphasis shifts to assessing the alignment of the output of a particular epidemiological scenario model with data. Understanding the modelled scenarios produced, and relating these to data by back-fitting them to a model, naturally forms an important component of the evaluation and application of any model. It is particularly important to indicate whether, and to what extent, input parameters were chosen to maximise the correspondence of outputs to data, or whether correspondences emerged naturally from choosing externally justified inputs. Demonstrating that a model can reproduce observed patterns provides a certain level of reassurance that the model is capturing the system appropriately, and where models cannot demonstrate this, extreme caution should be taken in interpreting results.
The most desirable situation is when a model that has been fitted to some data (a training set) produces output in close correspondence with additional data (a testing set). There are two primary caveats to this approach: (1) fitting a smooth model to slowly varying data and extrapolating a little may be “too easy”, and might indicate little about the suitability of the model, and (2) in key applications relevant to impact evaluation, asking the model to produce other independent data may be an unreasonable demand, tantamount to asking a model to predict future changes in the financial or political context. There may be deeper differences between the scenarios producing the training/testing datasets than can realistically be captured by a model—such as changes in treatment uptake or effects of improved treatment programmes on mortality.
While correspondence between models and data is reassuring and potentially useful—if not taken as absolute confirmation of the correctness of either model structure or parameter values—it is important to consider whether there are multiple ways to fit the data, and to realise that there may be scientific progress in a failure to fit data, either at all or without resorting to implausible values, ranges, or correlations of parameters. For example, simple (biological) models of ART cannot reproduce both the consistently strong reductions in patient viral loads and the inability to achieve viral eradication observed in the real world, without implausible “fine-tuning” of individual subjects' treatment efficacy parameters into a narrow range. This situation diagnoses a model limitation, namely, the neglect of the fact that interactions between cells, drugs, and virions vary among compartments within the infected host.
The difficulties of “correctly” capturing a complex set of shifting context-defining processes impinge not only on the interpretation of correspondence between models and historical data, but also on the interpretation of the predictive component of scenarios. One useful application of modelling, when there are insufficient data to construct scenarios with conventional predictive credibility, is to pose questions such as what characteristic of a programme would be required for certain goals to be achieved (e.g., what level of risk compensation, captured in a suitably clearly defined parameter, would be required to negate the risk reduction of a planned intervention).
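As a minimal numerical illustration of such a threshold question, using entirely hypothetical values, the sketch below asks how large an increase in the number of risk acts would exactly offset the per-act protection offered by a planned intervention.

```python
# Hypothetical threshold calculation: what relative increase in the number of risk
# acts (risk compensation, k) would cancel the per-act protection of an intervention?
import numpy as np

p = 0.001           # assumed baseline per-act transmission probability
efficacy = 0.60     # assumed per-act risk reduction when the intervention is used
coverage = 0.50     # assumed fraction of acts in which the intervention is used
p_int = p * (1 - efficacy * coverage)               # per-act probability under the intervention

# Annual risk over n acts is 1 - (1 - p)^n; setting pre- and post-intervention annual
# risks equal and solving for the act multiplier (1 + k) gives:
k_star = np.log(1 - p) / np.log(1 - p_int) - 1
print(f"About {k_star:.0%} more acts would negate the intervention's risk reduction.")
```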
The output of any modelling study needs to be presented clearly, using explicitly defined metrics and with any difference in interpretation between a model metric and its real-world analogue explained. The many assumptions involving the structure of the model, the parameter estimates, and the data will all have uncertainties, and it is important to understand how these propagate to key model outputs. In some cases, uncertainty in a particular parameter will be benign—the same result is reached irrespective of any credible assumption about that parameter—and this serves to increase confidence in the findings. In other cases, different credible values for a parameter (or model structure, or interpretation of data) would lead to different conclusions, and this should be noted.
Uncertainties are best depicted as part of the presentation of modelling results, either in tables or as part of the graphical output of the model. If sufficient information is available about the inputs, computational techniques can generate a distribution for the model outcomes, so that the main result can be given as a “credible interval”. In addition to uncertainty analyses, formal sensitivity analyses of the importance of each model parameter in influencing the variability in model outcomes can be useful for identifying items for further data collection or investigation.
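A common way to implement both steps, sketched below with a stand-in model and hypothetical parameter ranges, is to sample parameter sets from their assumed distributions, summarise the resulting spread of outcomes as a percentile interval, and rank parameters by the strength of their rank correlation with the outcome.

```python
# Sketch of Monte Carlo uncertainty propagation plus a rank-correlation sensitivity
# analysis, using a stand-in outcome function; replace toy_model with real model runs.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(42)
n_draws = 2_000

# Hypothetical parameter ranges (uniform for simplicity).
samples = {
    "efficacy":  rng.uniform(0.4, 0.8, n_draws),
    "coverage":  rng.uniform(0.2, 0.6, n_draws),
    "incidence": rng.uniform(0.005, 0.02, n_draws),   # baseline incidence per person-year
}

def toy_model(efficacy, coverage, incidence, population=1_000_000, years=10):
    """Stand-in outcome: infections averted over the time horizon (not a real epidemic model)."""
    return population * incidence * years * efficacy * coverage

outcomes = toy_model(**samples)

lower, median, upper = np.percentile(outcomes, [2.5, 50, 97.5])       # uncertainty interval for the outcome
sensitivity = {name: spearmanr(vals, outcomes)[0] for name, vals in samples.items()}

print(f"Infections averted: {median:,.0f} (95% interval {lower:,.0f} to {upper:,.0f})")
print(sorted(sensitivity.items(), key=lambda kv: -abs(kv[1])))         # parameters ranked by influence
```

Latin hypercube sampling and partial rank correlation coefficients are common refinements of this basic recipe.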
As Box and Draper famously observed, “all models are wrong, but some are useful”.
One thing that modellers may implicitly understand but that model consumers may not—and which therefore should always be made clear—is that capturing complex reality is not really the purpose of mathematical models. Practicality implies that one can never capture full dynamical structure, such as all conceivable population compartments, transition rules, or stochasticity. A mathematical model is a minimalist approach to representing the essential elements of reality that are necessary and sufficient for addressing a specific research question.
Some of the limitations of modelling studies can be addressed by uncertainty or sensitivity analyses as discussed above.
It is common for multiple modelling groups to attempt to address similar research questions but with different modelling approaches: using models that have been designed to describe different populations, involve different model structures, and make different parameter assumptions. Apparently conflicting results in the modelling literature may consequently lead to greater confusion for the consumers of models or to distrust in the use of models for decision-making. Therefore, it is necessary that interpretations of results are contextualised with previous modelling findings relevant to the topic. It should be made clear whether a new result is being presented or whether study findings concur with previously published results.
Meanwhile, journal editors should recognise the value of works that rigorously confirm or draw together previous findings. Review papers that summarise the modelling literature on a specific topic are highly useful (see, for example, the recent special issue on HIV epidemic modelling).
A public health policy or programme decision-maker generally desires to take actions that will have maximal impact whilst minimising the amount of money required to achieve the health outcomes—based, for example, on estimates of either the maximum impact that can be achieved for a given amount of money, or the money needed to achieve specific set levels of impact. Therefore, the cost-effectiveness, affordability, and returns on investment of interventions are among the most important considerations in their potential implementation. HIV epidemic modelling studies often attempt to estimate the population-level impact associated with changes in programme or policy conditions, and hence estimate the denominator (effectiveness) of the incremental cost-effectiveness ratio. Ideally, such models should be designed to produce outputs amenable to use in analyses of cost implications, and estimates of primary epidemiological effects that are understandable and relevant to decision-makers, such as the number of incident infections or deaths averted, quality-adjusted life years gained, or disability-adjusted life years averted. Effective assessment of affordability and cost-effectiveness may require different time horizons than those chosen in epidemiological modelling analyses; hence, additional simulations may be necessary before attaching costs, benefits, and utilities to epidemiological model outputs.
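As a worked illustration with entirely hypothetical numbers, the incremental cost-effectiveness ratio combines the model's effectiveness output with costing estimates as follows.

```python
# Hypothetical incremental cost-effectiveness calculation: the epidemic model supplies
# the effectiveness denominator (DALYs averted); costing data supply the numerator.
cost_intervention = 12_000_000    # total programme cost under the new policy (USD, placeholder)
cost_comparator = 9_000_000       # total cost under the status-quo policy (USD, placeholder)
dalys_intervention = 140_000      # DALYs incurred under the new policy (model output, placeholder)
dalys_comparator = 155_000        # DALYs incurred under the status quo (model output, placeholder)

dalys_averted = dalys_comparator - dalys_intervention
icer = (cost_intervention - cost_comparator) / dalys_averted
print(f"ICER: {icer:,.0f} USD per DALY averted")   # (12m - 9m) / 15,000 = 200 USD per DALY averted
```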
There are numerous good examples of modelling studies that have provided outputs that are relevant for use in health economic calculations or that have been integrated into cost-effectiveness analyses.
A particular challenge that arises when using models to evaluate the impact of interventions is a lack of clarity around the intervention itself. Such a lack of clarity reduces the usefulness of results for policymakers in deciding which interventions to prioritise. While modellers are usually keenly aware of the technical details of the model, the interpretation of model features—both in the input and output phase—is prone to oversimplification by both modellers and readers. It can be convenient, but misleading, to present model inputs and outputs as if they corresponded directly to an actual policy choice and its future real-world consequences. For instance, a write-up should highlight that what is modelled is a reduction in the proportion of “unprotected sex acts”, which is not an intervention per se but could be the outcome of an intervention (e.g., an increase in condom distribution points or a targeted education campaign).
It is probably better to err on the side of repetitiveness in keeping the focus on precise model assumptions (qualitative and quantitative), and for consumers to process the model first on its own terms, before evaluating model scenarios in their broad correspondence to reality and their potential policy implications. At the same time, it is important that modellers use language that facilitates easy communication with consumers, without loss of precision or of key real-world messages.
The issue of using models in decision-making is especially important for the field of HIV prevention, which has now reached a critical point, with spending on HIV having levelled off or declined.
Mathematical models are used to inform public health decision-making about many questions in the response to HIV epidemics, and here we present our recommendations for “best practices” for constructing, interpreting, and presenting such models.
An overarching theme of our recommendations is that it is crucial for modellers to be explicit about the choices they make—about model structure, parameters, and model fitting and interpretation—and the reasoning behind their choices.
Modellers need to make the limitations of their models clear, and model consumers (such as policy-makers and decision-makers) need to appreciate the caveats and limitations of modelling studies when considering their results.
One of the least appreciated ways to address the limitations of models is through comparing the parameters, structure, and outputs of alternate models of the same processes.
Especially useful are consensus documents that bring together conclusions from numerous modelling studies and summarise what researchers agree on and where uncertainty persists.
The authors are grateful to Ms Britta Jewell for editorial assistance.
ART: antiretroviral therapy