
Optimizing laboratory-based surveillance networks for monitoring multi-genotype or multi-serotype infections

  • Qu Cheng,

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, Berkeley, California, United States of America

  • Philip A. Collender,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Writing – review & editing

    Affiliation Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, Berkeley, California, United States of America

  • Alexandra K. Heaney,

    Roles Conceptualization, Writing – review & editing

    Affiliation Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, Berkeley, California, United States of America

  • Aidan McLoughlin,

    Roles Conceptualization, Writing – review & editing

    Affiliation Division of Biostatistics, School of Public Health, University of California, Berkeley, Berkeley, California, United States of America

  • Yang Yang,

    Roles Methodology, Writing – review & editing

    Affiliation College of Public Health and Health Professions and Emerging Pathogens Institute, University of Florida, Gainesville, Florida, United States of America

  • Yuzi Zhang,

    Roles Methodology, Writing – review & editing

    Affiliation Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, United States of America

  • Jennifer R. Head,

    Roles Conceptualization, Writing – review & editing

    Affiliation Division of Epidemiology, School of Public Health, University of California, Berkeley, Berkeley, California, United States of America

  • Rohini Dasan,

    Roles Data curation, Writing – review & editing

    Affiliation Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, Berkeley, California, United States of America

  • Song Liang,

    Roles Data curation, Writing – review & editing

    Affiliation Department of Environmental and Global Health College of Public Health and Health Professions, University of Florida, Gainesville, Florida, United States of America

  • Qiang Lv,

    Roles Data curation, Writing – review & editing

    Affiliation Institute of Health Informatics, Sichuan Center for Disease Control and Prevention, Chengdu, Sichuan, People’s Republic of China

  • Yaqiong Liu,

    Roles Data curation, Writing – review & editing

    Affiliation Institute of Acute Infectious Disease Control and Prevention, Sichuan Center for Disease Control and Prevention, Chengdu, Sichuan, People’s Republic of China

  • Changhong Yang,

    Roles Conceptualization, Data curation, Writing – review & editing

    Affiliation Division of Business Management and Quality Control, Sichuan Center for Disease Control and Prevention, Chengdu, Sichuan, People’s Republic of China

  • Howard H. Chang,

    Roles Methodology, Writing – review & editing

    Affiliation Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, United States of America

  • Lance A. Waller,

    Roles Methodology, Writing – review & editing

    Affiliation Department of Biostatistics and Bioinformatics, Rollins School of Public Health, Emory University, Atlanta, Georgia, United States of America

  • Jon Zelner,

    Roles Methodology, Writing – review & editing

    Affiliations Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America, Center for Social Epidemiology and Population Health, School of Public Health, University of Michigan, Ann Arbor, Michigan, United States of America

  • Joseph A. Lewnard,

    Roles Methodology, Writing – review & editing

    Affiliation Division of Epidemiology, School of Public Health, University of California, Berkeley, Berkeley, California, United States of America

  • Justin V. Remais

    Roles Conceptualization, Funding acquisition, Supervision, Writing – review & editing

    jvr@berkeley.edu

    Affiliation Division of Environmental Health Sciences, School of Public Health, University of California, Berkeley, Berkeley, California, United States of America


Abstract

With the aid of laboratory typing techniques, infectious disease surveillance networks have the opportunity to obtain powerful information on the emergence, circulation, and evolution of multiple genotypes, serotypes or other subtypes of pathogens, informing understanding of transmission dynamics and strategies for prevention and control. The volume of typing performed on clinical isolates is typically limited by its ability to inform clinical care, cost and logistical constraints, especially in comparison with the capacity to monitor clinical reports of disease occurrence, which remains the most widespread form of public health surveillance. Viewing clinical disease reports as arising from a latent mixture of pathogen subtypes, laboratory typing of a subset of clinical cases can provide inference on the proportion of clinical cases attributable to each subtype (i.e., the mixture components). Optimizing protocols for the selection of isolates for typing by weighting specific subpopulations, locations, time periods, or case characteristics (e.g., disease severity), may improve inference of the frequency and distribution of pathogen subtypes within and between populations. Here, we apply the Disease Surveillance Informatics Optimization and Simulation (DIOS) framework to simulate and optimize hand foot and mouth disease (HFMD) surveillance in a high-burden region of western China. We identify laboratory surveillance designs that significantly outperform the existing network: the optimal network reduced mean absolute error in estimated serotype-specific incidence rates by 14.1%; similarly, the optimal network for monitoring severe cases reduced mean absolute error in serotype-specific incidence rates by 13.3%. In both cases, the optimal network designs achieved improved inference without increasing subtyping effort. 
We demonstrate how the DIOS framework can be used to optimize surveillance networks by augmenting clinical diagnostic data with limited laboratory typing resources, while adapting to specific, local surveillance objectives and constraints.

Author summary

Laboratory-based tests can determine the specific agents that cause infectious diseases, providing important information for disease surveillance, and helping to understand the transmissibility, clinical spectrum, evolutionary trends, and subtype-specific risk factors of infections caused by pathogens with multiple types. However, pathogen typing is relatively expensive and scarce, and thus there is widespread interest in the optimal allocation of laboratory typing resources in the design of disease surveillance systems, even as such surveillance optimization methods have been understudied. Here we apply the Disease Surveillance Informatics Optimization and Simulation (DIOS) framework to the problem of optimal allocation of laboratory-typing within clinical surveillance systems. We develop methods for optimizing allocation of laboratory-typing across locations and clinical subgroups (e.g., severe vs. mild cases), and demonstrate the approach using real-world data from a surveillance network monitoring Hand Foot and Mouth Disease in western China. Using a series of simulation-optimization studies, we identified surveillance networks that are capable of reducing the mean absolute error of serotype-specific incidence rates by 13.3% among severe cases, and 14.1% among all cases. The methods demonstrated here are but one of many approaches through which the DIOS framework could be utilized to better leverage laboratory-typing infrastructure to track pathogen-specific epidemiologic trends.

1 Introduction

Laboratory procedures to identify pathogen subtypes (e.g., with respect to strain, genotype, serotype, variant, or phenotypic traits such as drug resistance) are important components of infectious disease surveillance, yielding information on transmissibility, clinical spectrum, evolutionary trends, and subtype-specific risk factors [1–7]. Indeed, information gathered from laboratory pathogen typing is integral to modern disease surveillance, enabling the discovery of SARS-CoV-2 variants with higher transmissibility [7], influenza A serotypes with high mortality and transmissibility [5], changes in the prevalence rate of drug-resistant tuberculosis and methicillin-resistant Staphylococcus aureus (MRSA) [8,9], shifts in dominant serotypes causing invasive pneumococcal disease [6], and differing routes of infection across hepatitis C virus genotypes [10].

Such findings can guide the development, allocation, and evaluation of public health interventions. For instance, knowledge about the relative prevalence and virulence of pathogen subtypes is used to prioritize subtypes for vaccine or treatment development [11–13]; identify high-risk subpopulations to target with interventions [14]; and evaluate the risk of unintended consequences of interventions, such as serotype replacement [15,16]. Because of the high cost and complexity of collecting and processing laboratory samples, and because data on pathogen subtype may not inform clinical decision-making for individual patients, typing is often undertaken for only a small subset of clinical cases. As examples, 2.8% of COVID-19 cases in the United States have been sequenced since January 10, 2020 [17]; <3% of hand foot and mouth disease (HFMD) cases in China were serotyped between 2011 and 2015 [2]; and only 9 influenza cases per participating laboratory are required to be characterized every other week across the United States to evaluate whether circulating influenza viruses are sufficiently similar genetically and/or antigenically to those that are included in current influenza vaccines [18].

Subtyping even a small proportion of cases may enable relevant inferences about the distribution of pathogen subtypes of interest within the larger set of clinically identified cases of a disease. However, in the absence of well-designed protocols for selection of isolates for subtyping, direct extrapolation of data from subtyped cases to the much broader population of clinical cases is susceptible to substantial biases, because selection for laboratory typing tends to be affected by clinical severity, healthcare capabilities, case clustering status, seasonality, and other factors. In China, for example, 72% of severe HFMD cases were serotyped, compared with only 2% of mild cases [2].

Such imbalanced sampling regimes, often arising from practical clinical considerations, can substantially impact estimates of genotype-, serotype-, or other subtype-specific epidemiologic parameters (e.g., subtype-specific incidence; response of pathogen subtype distribution to public health interventions; etc.) [2]. Statistical inference may be improved by modifying sampling design to minimize such biases across the surveillance network, such as by redistributing total samples across time, space, or populations. In practice, sampling designs for laboratory subtyping vary widely across surveillance systems, and are generally ad hoc in nature, constrained by budget, logistics, or infrastructure [2,4]. Optimizing sampling under these constraints is a high priority for laboratory surveillance systems [2,19].

Here, we develop methods to support the optimization of sampling clinical cases for laboratory typing with the goal of improved monitoring of the distribution of specific pathogen subtypes, while abiding by constraints on available resources, e.g., the total number of clinical cases subjected to subtyping. Our work is based on the Disease Surveillance Informatics Optimization and Simulation (DIOS) framework [20], which iteratively evaluates surveillance network performance on predefined goals while varying surveillance system design parameters using numerical optimization algorithms. We adapt the DIOS framework to the problem of optimal allocation of laboratory typing resources across subregions and case severity groups of a surveillance network in order to minimize error in estimating the incidence rates of pathogen subtypes causing a clinically-diagnosed disease. We examine major enteroviruses causing HFMD in a region experiencing a high HFMD burden in China to illustrate the application of this framework.

2 Materials and methods

2.1 General framework for optimizing laboratory-based surveillance systems to monitor multi-genotype or multi-serotype infections

Simulation framework.

DIOS [20] is a simulation-based optimization framework to facilitate the design of robust disease surveillance systems. DIOS functions by linking disease system models that simulate epidemiologic processes with surveillance system models that simulate information derived from alternative surveillance system designs. Applying DIOS involves specifying surveillance objectives (e.g., accurate estimation of disease frequency; timely outbreak detection; accurate estimation of intervention effectiveness), defining relevant surveillance design parameters (e.g., target population, diagnostic techniques, and site enrollment), and imposing operational constraints (e.g., total resources available for laboratory typing) (Box 1).

Box 1. Example DIOS optimization procedure

Consider the problem of identifying the optimal active surveillance strategy to estimate the incidence rate of a disease in key subpopulations, with possible designs given by altering the number of individuals to be surveyed across each subpopulation and the diagnostic test.

Objective: minimize bias in estimated incidence rate within each subpopulation

Design parameters: 1) number of persons to be selected from each subpopulation for diagnostic testing; and 2) laboratory technique used for diagnostic testing (e.g., polymerase chain reaction test, rapid antigen test, and culture)

Models: The disease system model simulates the underlying dynamics of the target disease in each subpopulation. The surveillance model selects a given number of persons from each subpopulation for testing according to the current design parameter values, simulates test results according to the sensitivity, specificity, or any other relevant characteristics of the test, and extrapolates incidence rates from the test results. The performance of the surveillance model is then evaluated by how close estimated incidence rates are, on average, to the true values simulated by the disease system model, using a score such as mean absolute error. After each evaluation, an optimization search algorithm (e.g., simulated annealing; evolutionary algorithm; particle swarm optimization) is used to update the design parameter, possibly based on an archive of previous performance including the current iteration. The following process is repeated:

  1. propose a new design →
  2. simulate disease and surveillance processes →
  3. evaluate performance of design →;
until a stopping criterion is met, such as exceeding a preset computational budget or failing to improve upon the best simulated design for a certain number of iterations.

The design parameters associated with the best performance are returned (see [20]).
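The propose, simulate, evaluate loop in Box 1 can be sketched in a few lines of Python. This is a minimal illustration, not the DIOS implementation: the toy disease model, the binomial testing model, the simple hill-climbing search (standing in for simulated annealing, an evolutionary algorithm, or particle swarm optimization), and all function names and numeric settings are assumptions made for the example.

```python
import random

def simulate_truth(rng, n_groups):
    """Toy disease system model: one 'true' incidence rate per subpopulation."""
    return [rng.uniform(0.01, 0.05) for _ in range(n_groups)]

def simulate_estimates(truth, design, rng):
    """Toy surveillance model: test design[g] people in group g and estimate
    the incidence rate as the fraction who test positive (perfect test assumed)."""
    estimates = []
    for rate, n_tested in zip(truth, design):
        positives = sum(rng.random() < rate for _ in range(n_tested))
        estimates.append(positives / n_tested)
    return estimates

def mean_absolute_error(truth, estimates):
    return sum(abs(t - e) for t, e in zip(truth, estimates)) / len(truth)

def evaluate(design, n_realizations=20, seed=0):
    """Average MAE of a design over several simulated realizations.
    Re-using the seed gives every design the same random draws."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_realizations):
        truth = simulate_truth(rng, len(design))
        total += mean_absolute_error(truth, simulate_estimates(truth, design, rng))
    return total / n_realizations

def propose(design, rng):
    """Move some tests from one group to another, keeping the total fixed."""
    new = list(design)
    src, dst = rng.sample(range(len(new)), 2)
    shift = min(rng.randint(1, 20), new[src] - 1)  # keep at least one test per group
    new[src] -= shift
    new[dst] += shift
    return new

def optimize(initial_design, max_iter=200, patience=50, seed=0):
    rng = random.Random(seed)
    best, best_score = list(initial_design), evaluate(initial_design)
    stale = 0
    for _ in range(max_iter):          # stopping rule 1: computational budget
        candidate = propose(best, rng)
        score = evaluate(candidate)
        if score < best_score:
            best, best_score, stale = candidate, score, 0
        else:
            stale += 1
        if stale >= patience:          # stopping rule 2: no recent improvement
            break
    return best, best_score

best_design, best_score = optimize([100, 100, 100])
```

The total number of tests plays the role of the operational constraint: every proposal redistributes tests but never changes their sum.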

The disease system model (see [20]) may be statistical, mechanistic, or an ensemble of different models or parameters that account for epistemic and parametric uncertainties, and should be developed with special attention to representing any processes thought to be relevant to the surveillance process. Multiple realizations of the disease system model, which may comprise incident cases or other phenomena of interest, are then filtered through measurement processes simulated by the surveillance model [20], which mimics relevant data collection and processing behaviors of a surveillance system. The surveillance model yields estimates of the target epidemiologic parameter(s) (e.g., disease incidence; probability of an outbreak; change in incidence following intervention), which can be compared to the true underlying values generated by the disease system model to assess the performance of the surveillance design.

Adaptation of DIOS to the design of laboratory-based surveillance for monitoring infections caused by multiple genotypes or serotypes

To apply the DIOS framework to the optimization of surveillance for multiple pathogen subtypes (Fig 1), a first step is to define objective functions to evaluate surveillance performance on estimating epidemiologic parameter(s) related to pathogen subtype(s) of interest. For instance, researchers may be interested in early detection of a more infectious variant of a circulating infection, e.g., the Delta variant of SARS-CoV-2, and therefore specify an objective as minimizing prevalence of that subtype by the time it is detected. If the overall composition of cases associated with multiple pathogen subtypes is of interest, a suitable objective might be to minimize the mean absolute error of incidence rate estimates across subtypes. More example objective functions can be found in Fig 1.

Fig 1. Schematic of the DIOS framework for optimizing surveillance of infections caused by multiple pathogen subtypes, with example design parameters and objective functions presented in green boxes.

https://doi.org/10.1371/journal.pcbi.1010575.g001

Second, design parameters relevant to laboratory surveillance must be conceptualized and defined in the surveillance system model. Examples of surveillance design parameters that may bear on the abovementioned objectives include the number of cases sampled for typing across different subpopulations; the sampling protocols used to select cases to subtype from these subpopulations; and the laboratory techniques used to identify pathogen subtypes.

Third, the disease system model must represent the dynamics of multiple pathogen subtypes and their possible interactions, and be able to correct for known biases in the observed surveillance data. For example, the negative interaction between dengue virus serotypes, possibly due to short-term cross-protection [21], would need to be accounted for in a disease model simulating the incidence of dengue fever associated with multiple serotypes. Similarly, any tendency to select severe cases for typing would need to be corrected by incorporating the heterogeneous selection probability of different disease severity groups in the disease system model [2,4].

Finally, the DIOS surveillance model must be able to represent necessary characteristics of laboratory-based surveillance systems, such as assay-dependent classification performance, turnaround time, or cost. For instance, if the design parameter subject to optimization is the laboratory technique used to determine the presence of a pathogen subtype, the surveillance model should be able to simulate known relevant attributes of the candidate techniques, such as the probability of false positive or false negative results.

2.2 Application of DIOS to optimize laboratory-based surveillance of serotypes of enteroviruses causing HFMD

2.2.1 Background.

HFMD is a pediatric infectious disease of growing public health importance [22,23], with a particularly high burden in East and Southeast Asia [22,24]. A variety of enteroviruses transmitted through fecal-oral or respiratory routes are causative agents of HFMD, including enterovirus-A71 (EV-A71), coxsackievirus-A16 (CV-A16), CV-A6, and CV-A10 [25]. EV-A71 and CV-A16 have long been the serotypes associated with the highest disease burden, but other serotypes, such as CV-A6, have gained clinical relevance in recent years [26,27]. The specific etiology of HFMD impacts the severity of symptoms, and has ramifications for intervention strategies, particularly vaccination. In China, recent deployment of monovalent vaccines against EV-A71, the most virulent serotype, has led to a reduction in the incidence of severe HFMD, but the overall incidence of HFMD is still rising, suggesting the possibility of serotype replacement [15]. Thus, it is critical to optimize laboratory surveillance to accurately estimate incidence of all HFMD and severe HFMD attributable to various enterovirus serotypes within the constraints of available resources.

2.2.2 Study region and surveillance system.

From 2004 to 2013, HFMD was the leading cause of death for children under five years old in China amongst all 39 nationally notifiable infectious diseases, and had the highest incidence of any infectious disease in the country [28,29]. Since the inclusion in 2008 of HFMD on the list of mandatory notifiable infectious diseases in China, over 22.5 million cases have been reported across the country as of 2019 [30]. Sichuan Province (population >80 million) exhibits strong spatial and temporal heterogeneity in HFMD disease burden across prefectures, and is one of several ongoing centers of transmission [31]. Clinically diagnosed HFMD cases are registered by the National Infectious Disease Reporting System (NIDRS), which covers nearly all healthcare facilities in China [32]. Clinical cases of HFMD are diagnosed by the presence of papular or vesicular rash on hands, feet, mouth or buttocks with or without fever, and are required to be reported to NIDRS within 24 hours [23]. Because of the narrow affected age group, distinct clinical features, and known seasonality of the disease, clinical diagnosis is considered to be highly specific [33].

Specimens are collected from a subset of clinical cases presenting at sentinel hospitals in an ad hoc manner to determine the underlying serotype using reverse-transcriptase polymerase chain reaction (RT-PCR), and the test results are reported to a laboratory surveillance system [31]. Deidentified data on clinical HFMD cases were obtained for the 21 prefectures of Sichuan from Sichuan Center for Disease Control and Prevention, including serotype information (recorded as EV-A71, CV-A16, or other enterovirus) and indicators of case severity, and were aggregated at the prefectural level for each year from 2009 up to 2015, stopping one year before the introduction of EV-A71 vaccines into the region in 2016 [15]. Prefecture-level population data were collected from public sources for 2009–2015 [34].

The epidemiologic data supporting the optimization analysis herein included a total of 388,365 HFMD cases reported from 2009–2015 in Sichuan, of which 0.87% (3,380 cases) were severe. Annual HFMD incidence rates increased gradually over time (Fig 2A) and varied substantially across space (Fig 2B), with the highest annual mean incidence rate observed in Chengdu, the capital prefecture, and its surrounding prefectures, as well as the city with the highest per capita gross regional product, Panzhihua, in the southwest of the province. Laboratory tests were conducted for 22,100 cases (5.7%), with 52% of severe cases and 5.3% of mild cases subjected to serotyping. The number of laboratory-tested cases increased over time (Fig 2C) and exhibited substantial spatial variation (Fig 2D). The proportions of all, mild, and severe HFMD cases tested from 2009–2015 by prefecture are shown in S1 Fig. CV-A16, EV-A71, and other enteroviruses caused 26.6%, 29.1% and 44.3% of all serotyped cases, and 7.3%, 58.5% and 34.1% of severe serotyped cases, respectively, indicating that EV-A71 tended to cause severe symptoms while CV-A16 tended to cause mild symptoms. CV-A6 and CV-A10 likely constitute the majority of other enteroviruses in circulation [35–38].

Fig 2.

Temporal and spatial variations in HFMD incidence rate (A,B) and laboratory serotyping (C,D). (A) HFMD incidence rate for Sichuan 2009–2015; (B) annual mean HFMD incidence rate for each prefecture; (C) number of serotyped HFMD cases by year; (D) proportion of all serotyped cases drawn from each prefecture from 2009–2015. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html.

https://doi.org/10.1371/journal.pcbi.1010575.g002

2.2.3 Defining the optimization problem.

We pursued optimization of estimates of total and severe HFMD incidence across serotypes, with the proportion of typing allocated to each prefecture (“location”) and case severity group (mild and severe) as design parameters. The optimization seeks the sample allocation vector θ = {θ1, θ2,…,θI, θs} (I = 21) that minimizes the mean absolute error (MAE) in the estimates of serotype-specific incidence rate of: 1) total; and 2) severe HFMD across time, space, and realizations, where θi represents the proportion of total serotyping resources allocated to the i-th location in the study province, and θs represents the probability of a severe case being tested, which is assumed to be fixed across locations. After typing resources are allocated to severe cases as defined by θs, the remaining typing available at location i according to θi is allocated to mild cases. The total number of cases sampled for subtyping each year is fixed at the observed frequency of typing (Fig 2C). The optimization problem can be formalized as:

$$\min_{\theta} f_n(\theta) \quad \text{subject to} \quad \sum_{i=1}^{I} \theta_i = 1, \quad 0 \le \theta_i \le 1, \quad 0 \le \theta_s \le 1,$$

where fn(θ) is the n-th objective function, representing MAE (i.e., performance) of the candidate surveillance system defined by the design parameter θ.

The first objective function explored (f1(θ)) represents the MAE of the estimated serotype-specific incidence rates of all cases (i.e., incidence rates of EV-A71, CV-A16, and other enteroviruses) across locations, time, serotypes, and realizations (i.e., samples from the posterior distribution) of the disease system model, expressed as:

$$f_1(\theta) = \frac{1}{I\,T\,K\,R} \sum_{i=1}^{I} \sum_{t=1}^{T} \sum_{k=1}^{K} \sum_{r=1}^{R} \left| \hat{\lambda}_{ikt}^{(r)}(\theta) - \lambda_{ikt}^{(r)} \right|,$$

where I, T, K, and R represent the total number of locations (I = 21), study years (T = 6), serotypes (K = 3; for CV-A16 [k = 1], EV-A71 [k = 2], and other enterovirus [k = 3]), and disease system model realizations (R = 80, selected to ensure convergence of the estimated MAE across model runs), respectively; $\lambda_{ikt}^{(r)}$ represents the simulated incidence rate of the ith location during the tth year for the kth serotype in the rth realization of the disease system model; and $\hat{\lambda}_{ikt}^{(r)}(\theta)$ represents the corresponding incidence rate estimated using the laboratory surveillance information ascertained by the surveillance system defined by the design parameter θ. The methods for simulating $\lambda_{ikt}^{(r)}$ and estimating $\hat{\lambda}_{ikt}^{(r)}(\theta)$ with HFMD surveillance data in the study province are described below in sections 2.2.4 and 2.2.5, respectively.
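As a small worked illustration of this objective, the MAE can be computed by averaging absolute differences over every combination of location, year, serotype, and realization. The sketch below stores rates in dictionaries keyed by (i, t, k, r); the function name, the data layout, and the numbers are assumptions made for the example and are not taken from the study.

```python
def mae_incidence(true_rates, est_rates):
    """Mean absolute error between simulated ('true') and estimated
    serotype-specific incidence rates, averaged over all (i, t, k, r) keys."""
    return sum(abs(est_rates[key] - true_rates[key]) for key in true_rates) / len(true_rates)

# Hypothetical rates per 100,000 for 2 locations, 1 year, 2 serotypes, 1 realization.
true_rates = {(1, 1, 1, 1): 10.0, (1, 1, 2, 1): 20.0, (2, 1, 1, 1): 30.0, (2, 1, 2, 1): 40.0}
est_rates = {(1, 1, 1, 1): 12.0, (1, 1, 2, 1): 19.0, (2, 1, 1, 1): 30.0, (2, 1, 2, 1): 44.0}

f1 = mae_incidence(true_rates, est_rates)  # (2 + 1 + 0 + 4) / 4 = 1.75
```

Because every cell carries equal weight, a design that is very accurate in large prefectures but poor in small ones can still score badly, which is what lets the optimizer trade precision across locations.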

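An alternative objective function examined (f2(θ)) represented the MAE of the estimated serotype-specific incidence rates of severe cases across locations, time, serotypes, and realizations of the disease system model, defined as:

$$f_2(\theta) = \frac{1}{I\,T\,K\,R} \sum_{i=1}^{I} \sum_{t=1}^{T} \sum_{k=1}^{K} \sum_{r=1}^{R} \left| \hat{p}_k^{(r)}(\theta)\,\hat{\lambda}_{ikt}^{(r)}(\theta) - p_k^{(r)}\,\lambda_{ikt}^{(r)} \right|,$$

where $p_k^{(r)}$ represents the simulated probability of the kth serotype causing severe disease in the rth realization, while $\hat{p}_k^{(r)}(\theta)$ represents its estimate with information ascertained by the surveillance system defined by θ.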

2.2.4 Disease system model.

We estimated the underlying serotype-specific incidence rates in each region (λikt) and the serotype-specific probability of severe disease (pk) using data in the study region with a multivariate spatio-temporal Bayesian hierarchical framework (i.e., “the disease system model”; see schematic and hyperparameter priors in S2 Fig). The unobserved incidence rate of cases caused by serotype k, in location i, in year t, λikt, is modeled as:

$$\log(\lambda_{ikt}) = \beta_0 + X_{ikt}\,\beta_{kt} + \gamma_{ikt},$$

where β0 represents the intercept; Xikt represents disease risk factors with corresponding coefficients βkt (although for simplicity, we incorporate only an intercept, but no risk factors, in the model); and γikt is a random effect. The random effects γikt are stacked into a single vector γ with covariance matrix Σ, which has a separable multivariate space-time conditional autoregressive (MSTCAR) structure. More specifically, Σ is the Kronecker product of three covariance matrices characterizing the spatial dependence, the between-serotype dependence, and the temporal dependence (see S2 Fig for details) [39].
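To make the separable structure concrete, the sketch below builds a covariance matrix as the Kronecker product of three small matrices. It is only an illustration: the matrix values are invented, the ordering of the three factors is an assumption (the actual ordering and the CAR construction are given in S2 Fig and [39]), and `kron` is a helper written for this example.

```python
def kron(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]

# Hypothetical 2 x 2 covariance matrices (2 locations, 2 serotypes, 2 years).
spatial = [[1.0, 0.5], [0.5, 1.0]]
serotype = [[1.0, 0.3], [0.3, 1.0]]
temporal = [[1.0, 0.8], [0.8, 1.0]]

# Assumed ordering: location varies slowest, year varies fastest.
sigma = kron(kron(spatial, serotype), temporal)

# Covariance between (location 1, serotype 1, year 1) and (location 2, serotype 2, year 2)
# is the product of the three pairwise terms: 0.5 * 0.3 * 0.8.
cross_term = sigma[0][7]
```

Separability means the full matrix for I locations, K serotypes and T years (I·K·T rows) is determined by three much smaller matrices, which keeps the number of covariance hyperparameters manageable.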

Observed data, representing total HFMD cases at location i, in year t, with severity s, are denoted as $Y_{its}$, and serotyping results, $Z_{ikts}$, are used to infer the latent disease process parameters, as well as parameters of the observation process. Given the large population size of each location, the number of new cases in each location is assumed to be adequately represented by a Poisson distribution [40]:

$$Y_{its} \sim \text{Poisson}(N_{it}\,\lambda_{its}),$$

where $\lambda_{its}$ represents the aggregated incidence rate across serotype groups in location i, during year t, with severity s (s = 1 represents severe disease; s = 2, mild disease); and Nit represents the population size of location i at year t. We denote the probability of serotype k causing severe disease as pk. Thus, the incidence rate of cases of severe disease is $\lambda_{it1} = \sum_{k} p_k \lambda_{ikt}$, while that of mild disease is $\lambda_{it2} = \sum_{k} (1 - p_k) \lambda_{ikt}$, and $\lambda_{it1} + \lambda_{it2} = \sum_{k} \lambda_{ikt}$. We assume that the probability of being selected for laboratory typing does not depend on serotype after conditioning on case severity [2]. The number of annual tests is large, and thus test-positive case counts for each serotype are assumed to be adequately represented by a Poisson distribution:

$$Z_{ikts} \sim \text{Poisson}(N_{it}\,\lambda_{ikts}\,\pi_{its}), \qquad \lambda_{ikt1} = p_k \lambda_{ikt}, \quad \lambda_{ikt2} = (1 - p_k)\,\lambda_{ikt},$$

where $Z_{ikts}$ represents the number of cases tested positive for serotype k at location i, year t, with severity s; and $\pi_{its}$ represents the probability of being selected for serotyping at location i, year t, with severity s, which was estimated by smoothing observed data in the study region by assuming spatial and temporal autocorrelation. Epidemiologic parameters estimated by the disease system model can be found in S1 Text.
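The consequence of severity-dependent selection can be illustrated with expected counts under this observation model. The sketch below is a deterministic simplification (Poisson means only, no sampling) with invented parameter values; the function and variable names are assumptions for the example. It shows that pooling typed cases without accounting for severity over-represents the serotype most likely to cause severe disease.

```python
def expected_counts(population, lam, p_severe, pi_severe, pi_mild):
    """Expected case counts and typed-positive counts for one location-year.

    lam[k]      : incidence rate of serotype k (cases per person-year)
    p_severe[k] : probability that a case of serotype k is severe
    pi_severe   : probability that a severe case is selected for typing
    pi_mild     : probability that a mild case is selected for typing
    """
    severe_rate = [p * l for p, l in zip(p_severe, lam)]
    mild_rate = [(1 - p) * l for p, l in zip(p_severe, lam)]
    return {
        "severe_cases": population * sum(severe_rate),
        "mild_cases": population * sum(mild_rate),
        "severe_typed": [population * x * pi_severe for x in severe_rate],
        "mild_typed": [population * x * pi_mild for x in mild_rate],
    }

# Hypothetical values; the second serotype is the most virulent.
lam = [0.0005, 0.0005, 0.001]
counts = expected_counts(1_000_000, lam, [0.002, 0.02, 0.006], pi_severe=0.5, pi_mild=0.05)

typed = [s + m for s, m in zip(counts["severe_typed"], counts["mild_typed"])]
naive_share = typed[1] / sum(typed)   # share of the virulent serotype among typed cases
true_share = lam[1] / sum(lam)        # its true share among all cases
```

With these inputs the naive share exceeds the true share, because severe cases are ten times more likely to be typed than mild cases.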

Generating disease data. To ensure that our optimized surveillance scenarios account for uncertainty in the observation process and parameter estimates, after fitting the disease process and observation model to data from 2009–2014, R sets (R = 80) of serotype-specific incidence rates ($\lambda_{ikt}^{(r)}$) and parameter values (including β0 and hyperparameters of Σ) were drawn from the joint posterior distributions. The sampled parameter values were used by the surveillance model (section 2.2.5) to simulate surveillance information and to estimate $\hat{\lambda}_{ikt}^{(r)}$ and $\hat{p}_k^{(r)}$ under different surveillance designs.

2.2.5 Surveillance model.

The surveillance model generates realizations of surveillance information conditional on the simulated disease data and the candidate design parameter. After proposing a sample allocation vector θ, we first estimated the number of typing tests allocated to location i in year t, for case severity s based on θ and the total number of laboratory typing tests conducted in year t across all locations (Fig 2C), then further estimated $\pi_{its}(\theta)$, the probability of being typed, based on the estimated number of typing tests and the total observed number of HFMD cases at location i and year t. The estimated probability $\pi_{its}(\theta)$, together with the rth sample of β0 and hyperparameters of Σ, was then used to re-estimate $\hat{\lambda}_{ikt}^{(r)}$ and $\hat{p}_k^{(r)}$ based on the disease system model described in section 2.2.4, and to further evaluate the objective functions f1(θ) and f2(θ).
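The first step, turning a proposed allocation vector into location- and severity-specific typing probabilities, might look like the sketch below for a single year. The capping rules (severe typing cannot exceed the tests allocated to the location, and mild typing cannot exceed the number of mild cases) and all names and numbers are assumptions for illustration; the source describes only the order of allocation.

```python
def typing_probabilities(theta_loc, theta_severe, total_tests, severe_cases, mild_cases):
    """Return (P(typed | severe), P(typed | mild)) for each location in one year.

    theta_loc[i] : share of the year's typing tests given to location i (sums to 1)
    theta_severe : probability that a severe case is typed, common to all locations
    """
    probs = []
    for share, n_severe, n_mild in zip(theta_loc, severe_cases, mild_cases):
        tests_here = share * total_tests
        # Severe cases are served first, up to the tests available here (assumed cap).
        severe_tests = min(theta_severe * n_severe, tests_here)
        # Whatever remains goes to mild cases, up to the number of mild cases (assumed cap).
        mild_tests = min(tests_here - severe_tests, n_mild)
        p_severe = severe_tests / n_severe if n_severe else 0.0
        p_mild = mild_tests / n_mild if n_mild else 0.0
        probs.append((p_severe, p_mild))
    return probs

# Hypothetical year with 1,000 typing tests and three locations.
probs = typing_probabilities(
    theta_loc=[0.5, 0.3, 0.2],
    theta_severe=0.6,
    total_tests=1000,
    severe_cases=[50, 20, 10],
    mild_cases=[5000, 3000, 1000],
)
```

In the first location, 500 tests are available, 30 go to severe cases, and the remaining 470 are spread over 5,000 mild cases.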

2.2.6 Optimization search.

Since the design vector θ is constrained by $\sum_{i=1}^{I} \theta_i = 1$, possibly rendering the optimization search process less efficient than an unconstrained optimization problem, we first converted the 22-dimensional design vector θ to an unconstrained 21-dimensional internal design vector $\tilde{\theta}$ by following methods described elsewhere [41]. This internal design vector $\tilde{\theta}$ was then optimized with a genetic algorithm (GA), a metaheuristic optimization algorithm inspired by the process of natural selection [42]. GAs have the ability to handle complex optimization problems, avoid local optima, and find near-optimal solutions within a reasonable amount of time [43,44], and have been used extensively in public health and medical research [45–50].
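One common way to remove a sum-to-one constraint is a softmax with a reference category, paired with a logistic transform for the probability parameter. The sketch below uses that mapping purely as an illustration of how 21 unconstrained numbers can encode 21 location shares plus one severe-case typing probability; it is an assumed stand-in and not necessarily the transformation described in [41].

```python
import math

def to_design(internal):
    """Map an unconstrained vector of length I (here 21) to a valid design.

    The first I - 1 entries set the location shares relative to a reference
    location (softmax); the last entry sets the severe-case typing probability
    (logistic). Both choices are assumptions made for this sketch.
    """
    *z_loc, z_severe = internal
    weights = [math.exp(v) for v in z_loc] + [1.0]  # reference location has weight 1
    total = sum(weights)
    theta_loc = [w / total for w in weights]         # non-negative, sums to 1
    theta_severe = 1.0 / (1.0 + math.exp(-z_severe))  # strictly between 0 and 1
    return theta_loc, theta_severe

theta_loc, theta_severe = to_design([0.0] * 21)
```

Any real-valued input yields a feasible design, so the search algorithm can move freely without rejecting or repairing proposals.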

To optimize with a GA, first an initial population of n random designs was generated and the objective function value (i.e., MAE of estimated subtype-specific incidence rates, f1(θ)) of each design was evaluated. A small number of designs with the lowest MAEs survived to the next generation, while other designs were selected for recombination with probability determined by a function of their MAE. For each randomly matched pair of designs in the recombination pool, two new descendants were produced for the next generation, during which crossover occurs with high probability, pcrossover, and mutation occurs with low probability, pmutation. If crossover occurs, the descendants were generated as linear combinations of the parent designs with randomly sampled weights. For example, if the two parents are $\tilde{\theta}^{a}$ and $\tilde{\theta}^{b}$, and the random weight sampled from Uniform(0,1) is ω, then the two descendants are $\omega\,\tilde{\theta}^{a} + (1 - \omega)\,\tilde{\theta}^{b}$ and $(1 - \omega)\,\tilde{\theta}^{a} + \omega\,\tilde{\theta}^{b}$, respectively. When mutation happens, one random element of the design vector is changed to a random number sampled from its domain. Following previous studies [49,51], we set the initial population size n = 50, pcrossover = 0.8, and pmutation = 0.05. The optimization process took about 45 hours on two nodes, each with 96 GB of RAM and two Skylake 20-core 2.1 GHz processors.
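The recombination step can be sketched as follows. The complementary weights used for the two descendants, the domain bounds of -5 and 5, and the function names are assumptions for illustration; only the crossover and mutation probabilities (0.8 and 0.05) come from the text.

```python
import random

def crossover(parent_a, parent_b, rng):
    """Descendants are linear combinations of the parents with a random weight."""
    w = rng.random()  # weight drawn from Uniform(0, 1)
    child_1 = [w * a + (1 - w) * b for a, b in zip(parent_a, parent_b)]
    child_2 = [(1 - w) * a + w * b for a, b in zip(parent_a, parent_b)]
    return child_1, child_2

def mutate(design, lower, upper, rng):
    """Replace one randomly chosen element with a draw from its domain."""
    mutated = list(design)
    j = rng.randrange(len(mutated))
    mutated[j] = rng.uniform(lower, upper)
    return mutated

def make_descendants(parent_a, parent_b, rng, p_crossover=0.8, p_mutation=0.05,
                     lower=-5.0, upper=5.0):
    """Produce two descendants from a matched pair of parent designs."""
    children = (list(parent_a), list(parent_b))
    if rng.random() < p_crossover:
        children = crossover(parent_a, parent_b, rng)
    return [
        mutate(child, lower, upper, rng) if rng.random() < p_mutation else list(child)
        for child in children
    ]

rng = random.Random(1)
parent_a, parent_b = [0.0, 1.0, 2.0], [1.0, 3.0, -2.0]
child_1, child_2 = crossover(parent_a, parent_b, rng)
descendants = make_descendants(parent_a, parent_b, rng)
```

With complementary weights, each element of the two descendants lies between the corresponding parent values, and the descendants' elementwise sum equals the parents' sum.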

2.2.7 Benchmarking and evaluation of robustness of optima.

The surveillance performance of the optimal design was benchmarked against seven archetypal designs: 1) the existing allocation of laboratory typing across locations (Fig 2D, hereafter referred to as Existing); 2) an equal allocation of typing across all locations (hereafter Equal); 3) allocation of typing proportional to the location’s population (hereafter PopSize); 4) allocation of typing proportional to absolute number of HFMD cases (hereafter Case); 5) allocation of typing proportional to HFMD incidence rate (hereafter IncRate); 6) allocation of typing proportional to absolute number of severe HFMD cases (hereafter SevereCase); 7) allocation of typing proportional to HFMD incidence rate of severe cases (hereafter SevereIncRate). See S3 and S4 Figs for the proportion of serotyping allocated to each location under each of these archetypal designs. The proportion of typing tests allocated to each location for these archetypal designs was estimated based on the 2009–2014 data, while the probabilities of severe cases serotyped were set to the values that minimize the MAEs with the corresponding locational allocation strategy, according to grid searches (S5 Fig). These seven archetypal designs were included in the initial population of the GA, together with another 43 randomly generated designs.
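Several of the archetypal designs are simple proportional allocations, which can be sketched as below. The three-location inputs are made-up numbers for illustration only, and the helper name proportional_allocation is an assumption of this example.

```python
def proportional_allocation(weights):
    """Normalize non-negative weights so that the allocations sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical toy inputs for three locations (not data from the study)
population = [1_000_000, 500_000, 100_000]
cases = [2_000, 1_500, 100]

equal = proportional_allocation([1.0] * len(population))   # Equal
pop_size = proportional_allocation(population)              # PopSize
case = proportional_allocation(cases)                       # Case
inc_rate = proportional_allocation(                         # IncRate
    [c / p for c, p in zip(cases, population)]
)
```

In this toy example the smallest location receives about 6% of typing under PopSize but about 17% under IncRate, which shows how the choice of weighting variable changes the allocation.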

To examine the robustness of the designs selected by the optimization process, epidemiologic data for 2015 were held out to establish whether optimal designs based on 2009–2014 data performed well for the near-term future. During this process, we compared surveillance performance of the optimal design obtained with the 2009–2014 data to that of the seven archetypal designs described above, using only 2015 data. Furthermore, to investigate how robust the optimal design was when the total typing capacity changes, we repeated the analyses with halved, doubled, and quintupled total frequency of typing across all locations in each year (i.e., scaling the observed frequencies shown in Fig 2C). As an alternative method to examine if the performance of the surveillance system changes with resource limits, we also randomly selected 300 designs from the design space, evaluated the two objective values with the original resource constraint and when each constraint was halved, doubled, or quintupled, and investigated if the designs that performed well under one constraint also performed well under others.

2.2.8 Computing platform and code availability.

All analyses were conducted in R 4.0.3 [52] on Berkeley’s Savio computational cluster [53], with rstan package 2.18.2 for Bayesian hierarchical modeling [54], GA package 3.2 for implementation of the genetic algorithm [55], and packages ggplot2 3.1.1 [56], cowplot 0.9.4 [57], and tmap 2.2 [58] for visualization. All code and data are available at: https://github.com/qu-cheng/Lab_surveillance_optimization

3 Results

3.1 Optimal designs

Allocation by location.

The existing laboratory surveillance network (Existing archetypal design) allocates approximately a quarter of all subtyping effort to the most populous prefecture in the study region (Chengdu), whereas less than one percent of subtyping effort is allocated to Ganzi, a remote prefecture in the northwestern mountainous region of the study area (Figs 3A and 2D). The optimal designs to minimize error in estimated serotype-specific incidence rates of all HFMD cases (Optimal for all) and only severe HFMD cases (Optimal for severe) shift the typing allocation substantially (Figs 3C and 3D and S6). Although the very populous Chengdu prefecture still receives the largest proportion of typing resources, the optimal designs allocate just 12.2% and 9.5% of total typing resources to it for the two objectives, respectively. Notably, S7 Fig, which shows the proportion of cases serotyped at each location under the Optimal for all and Optimal for severe designs, indicates that certain prefectures with low absolute typing allocations (e.g., Ganzi and Aba in Fig 3C and 3D) are able to serotype a large proportion of their cases (more than 30 percent in Ganzi and Aba); by comparison, the optimal designs serotype less than 2% of cases in the populous Chengdu prefecture.
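The contrast between a location's share of typing and the share of its own cases that get typed follows from simple arithmetic, sketched below. The numbers are hypothetical and the calculation ignores the split between mild and severe cases; the function name fraction_serotyped is an assumption of this example.

```python
def fraction_serotyped(allocation, total_tests, local_cases):
    """Share of a location's cases that can be typed given its share of all tests."""
    tests_here = allocation * total_tests
    return min(1.0, tests_here / local_cases)

total_tests = 1_000  # hypothetical annual typing capacity

# A populous location: a large share of tests, but far more cases
large = fraction_serotyped(allocation=0.12, total_tests=total_tests, local_cases=10_000)

# A remote location: a small share of tests, but very few cases
small = fraction_serotyped(allocation=0.01, total_tests=total_tests, local_cases=30)
```

Here the populous location types roughly 1% of its cases while the remote one types about a third, even though it receives only 1% of all tests.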

Fig 3. Comparison between Existing, IncRate, and Optimal subtyping allocation strategies across locations.

Treemaps show the proportion of typing efforts allocated to each location in the (A) Existing, (B) IncRate, and Optimal designs that minimize the error in estimated serotype-specific incidence rate of (C) all HFMD cases and (D) only severe HFMD cases. Tiles represent study locations, with the area of the tile representing the proportion of all typing efforts allocated to the location, and the color of the tile representing the location’s annual mean HFMD incidence rate. Tiles are ordered by decreasing annual mean incidence rate from top to bottom, then left to right. Scatterplots show the correlation between annual mean incidence rate and the optimal proportion of total typing resources allocated to each location to minimize error in estimated serotype-specific incidence rate of (E) all HFMD cases and (F) only severe HFMD cases. Black dots represent the archetypal design IncRate (see definition in section 2.2.7); blue triangles in (E) and squares in (F) represent the optimal allocation strategy for minimizing error in estimating serotype-specific incidence rate for all cases and only severe cases, respectively. The blue lines represent the best fit relating annual mean incidence rates to typing allocations across the Optimal designs. Vertical arrows represent changes from IncRate to Optimal: red arrows represent increases in typing efforts from IncRate to Optimal; green arrows represent reductions in typing efforts from IncRate to Optimal. Inset figures show data for all prefectures, with the red dashed rectangle marking the range displayed in the main panel.

https://doi.org/10.1371/journal.pcbi.1010575.g003

The designs Optimal for all and Optimal for severe are similar to the archetypal design IncRate (compare Fig 3B with Fig 3C and 3D). Moreover, the optimal proportions of total typing resources to allocate to each location for both surveillance objectives are correlated with the annual mean incidence rate of the location (Fig 3E and 3F), although the typing efforts are more equally distributed in the Optimal for severe design than in the Optimal for all design (compare locations across Fig 3C and 3D, and the difference in slope of the blue lines in Fig 3E and 3F).

Allocation by case severity.

The optimized proportion of severe cases to serotype depended strongly on the surveillance objective: 0.17 when minimizing errors in serotype-specific total HFMD incidence rates, and 0.70 when minimizing errors in serotype-specific severe HFMD incidence rates. To explore the effect of changing the proportion of severe cases being serotyped on surveillance performance, we fixed the spatial allocation of typing resources for each objective at the values in the Optimal designs while varying the proportion of severe cases subjected to serotyping from 0.01 to 0.99. The mean absolute error (MAE) of estimated total serotype-specific HFMD incidence rates was minimized at 11% of severe cases serotyped (Fig 4A). Notably, the MAE increases for this goal as severe cases are increasingly prioritized for serotyping.

Fig 4.

Impact of the proportion of severe cases serotyped on mean absolute error (MAE) of the estimated serotype-specific incidence rate of (A) all HFMD cases and (B) severe HFMD cases. Colored lines are smoothed by Gaussian process models. Black dot and triangle represent the probabilities of severe cases being serotyped that lead to the lowest error in estimating serotype-specific incidence rate of all (dot) and only severe (triangle) HFMD cases; blue dot and triangle represent the optimal designs from GA.

https://doi.org/10.1371/journal.pcbi.1010575.g004

For severe HFMD cases, the MAE initially decreases as greater proportions of severe cases are serotyped, then plateaus when about half of the severe cases are serotyped, reaching its minimum when the proportion of severe cases serotyped is 0.65. The optimal proportions of severe cases subjected to serotyping identified by the GA are close to those identified in this experiment, which suggests that the GA explored the design space successfully. For further analyses, we updated the probability of serotyping severe cases in both Optimal designs to the values identified in this grid search of θs, conditional on the optimal values of θ1, θ2,…,θI found by the GA, as the conditional grid search guarantees a better or equal estimate of θs.
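A conditional grid search of this kind can be sketched as follows. The objective toy_mae is a hypothetical stand-in for the simulated MAE (the real objective requires the disease system model), and the function name conditional_grid_search is an assumption of this example.

```python
def conditional_grid_search(objective, location_alloc, grid):
    """Hold the location-wise allocation fixed and scan theta_s over a grid."""
    best_s = min(grid, key=lambda s: objective(location_alloc, s))
    return best_s, objective(location_alloc, best_s)

def toy_mae(location_alloc, theta_s):
    """Hypothetical smooth objective with its minimum at theta_s = 0.65."""
    return (theta_s - 0.65) ** 2 + 0.1

grid = [i / 100 for i in range(1, 100)]  # 0.01, 0.02, ..., 0.99
fixed_alloc = [0.5, 0.3, 0.2]            # stands in for the GA's location allocation

best_s, best_mae = conditional_grid_search(toy_mae, fixed_alloc, grid)

# If the GA's own value (e.g., 0.70) lies on the grid, the grid optimum
# cannot be worse than it for the same location allocation.
ga_mae = toy_mae(fixed_alloc, 0.70)
```

The "better or equal" property holds only because the spatial allocation is held fixed and the GA's θs is among the candidate values; the grid search refines one coordinate rather than re-optimizing the whole design.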

3.2 Comparisons with archetypal designs

The optimal allocation of subtyping among regional subpopulations and case severity groups—while adhering to the same level of typing effort as the current design (Existing)—yielded a significant improvement in estimating the target parameters. The distribution of error (MAE) of estimated serotype-specific incidence rate of all HFMD and severe HFMD cases, across location, serotype, and year in 1000 realizations of the disease model for the optimal design was compared to the seven archetypal designs described in section 2.2.7 (Fig 5). When compared with the current surveillance design (Existing), with the same number of cases subjected to serotyping, the selected optimal designs (Optimal) exhibit 14.1 and 20.5 percent lower average MAE for the estimated serotype-specific incidence rate of all cases for the 2009–2014 (Fig 5A) and 2015 (Fig 5C) period, respectively; and a 13.3 and 14.8 percent lower average MAE of the estimated serotype-specific incidence rate of only severe cases for the 2009–2014 (Fig 5B) and 2015 (Fig 5D) period, respectively. Among the archetypal designs, IncRate generally performed well for both objectives. The results indicate that optimal designs based on historical observed data from 2009–2014 performed well for the 2015 year, which was held out of the optimization procedure, suggesting that optimal designs identified by DIOS may be useful for planning typing resource allocations in the short-term future.
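For readers less familiar with the metric, the MAE and the percent reductions quoted above are computed along the lines below. The incidence values are invented for illustration and are not results from the study.

```python
def mean_absolute_error(estimated, true):
    """Average absolute difference between estimated and true incidence rates."""
    return sum(abs(e - t) for e, t in zip(estimated, true)) / len(true)

def percent_reduction(mae_reference, mae_candidate):
    """Relative improvement of a candidate design over a reference design."""
    return 100.0 * (mae_reference - mae_candidate) / mae_reference

# Hypothetical serotype-specific incidence rates for three location-serotype cells
true_rates = [10.0, 4.0, 1.0]
est_existing = [12.0, 3.0, 2.0]  # estimates under a reference design
est_optimal = [11.0, 3.5, 1.5]   # estimates under a candidate design

mae_existing = mean_absolute_error(est_existing, true_rates)
mae_optimal = mean_absolute_error(est_optimal, true_rates)
reduction = percent_reduction(mae_existing, mae_optimal)
```

In the study, the average is taken across locations, serotypes, and years, and the distribution of MAE is summarized over 1000 realizations of the disease system model rather than computed from a single comparison.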

Fig 5. Surveillance performance of the optimal design and the seven archetypal designs evaluated with data from 2009–2014 and 2015 over 1000 realizations of the disease system model.

Violin plots and boxplots for different designs (shades of color) show the distribution of mean absolute error (MAE) in estimating serotype-specific incidence rates of (A) all cases and (B) only severe cases using 2009–2014 data; and (C) all cases and (D) only severe cases using 2015 data, which was not used in the optimization procedure. The horizontal dashed lines show the median MAEs of the optimal designs.

https://doi.org/10.1371/journal.pcbi.1010575.g005

3.3 Sensitivity of selected designs to the total number of cases sampled for subtyping

Allocation by location.

To investigate whether the optimal design is robust to changes in the availability of typing resources, we compared the optimal designs for both objectives when the frequency of typing is set to half, two times, or five times that of historical serotyping rates. With more typing resources, MAE of estimated serotype-specific incidence rate of total and severe cases decreases substantially (S8 Fig), while the optimal location-wise allocation changes modestly (Figs 6 and S9 and S10). For both surveillance objectives, as serotyping resources increase, the optimal proportion of typing allocated at each location tends to become more evenly distributed, particularly for estimating the serotype-specific incidence rates of severe HFMD, because the marginal benefits of more intensive serotyping at locations with higher incidence fall, while more frequent serotyping at locations with lower incidence rates can continue to reduce estimation error.

Fig 6.

Scatterplots of annual mean incidence rate and the proportion of typing resources allocated to each location under the archetypal design IncRate (black dots) and the Optimal designs for minimizing the MAE of the estimated serotype-specific incidence rate of all HFMD cases (blue triangles) when the available typing resources are (A) halved, (B) doubled, and (C) quintupled; and the Optimal designs for minimizing the MAE of the estimated serotype-specific incidence rate of severe HFMD cases (blue squares) when the available typing resources are (D) halved, (E) doubled, and (F) quintupled.

https://doi.org/10.1371/journal.pcbi.1010575.g006

When examining 300 randomly sampled designs, the MAEs of the estimated serotype-specific incidence rates of total and severe cases were highly correlated across the four resource limit scenarios (correlation >0.8, S11 Fig), which again suggests that the optimal allocation of laboratory resources is relatively insensitive to resource constraints in this framework, even as additional typing resources result in lower estimation errors.
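The comparison across resource scenarios amounts to correlating the objective values of the same designs under two constraints, as sketched below. The MAE values are invented, and the use of the Pearson coefficient is an assumption, since the text does not state which correlation measure was used.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical MAEs of the same five designs under two resource limits
mae_original = [0.30, 0.25, 0.40, 0.35, 0.28]
mae_doubled = [0.21, 0.18, 0.29, 0.24, 0.20]

r = pearson(mae_original, mae_doubled)
```

A high correlation means designs keep roughly the same ranking when capacity changes: all errors shrink with more tests, but the designs that were good remain good.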

Allocation by case severity.

When seeking to minimize error in estimated serotype-specific incidence rates of all HFMD cases, the optimal proportion of severe cases to serotype decreases as the availability of typing resources increases (Fig 7A). Conversely, when seeking to minimize error in estimated serotype-specific incidence rates of severe HFMD cases, the optimal proportion of severe cases to serotype increases as the availability of typing resources increases (Fig 7B). This is likely because for estimating the serotype-specific incidence rates of all HFMD cases, the marginal improvements diminish as long as enough samples of severe cases are tested to accurately estimate the virulence of each serotype; while for estimating the serotype-specific incidence rates of severe HFMD cases, the errors continue to decrease as more severe cases are tested.

Fig 7.

Optimal proportion of severe cases to be subjected to serotyping as the availability of typing resources changes, when seeking to minimize error in estimated serotype-specific incidence rates of (A) all HFMD cases and (B) only severe HFMD cases.

https://doi.org/10.1371/journal.pcbi.1010575.g007

4 Discussion

Laboratory-based disease surveillance networks are often designed in an ad hoc manner, guided by budgetary, logistical, or infrastructural considerations [19], which may lead to inefficient use of limited typing resources. Here, we adapted the DIOS framework to provide a quantitative platform for the simulation of epidemiologic and surveillance processes in the context of optimizing the allocation of scarce laboratory typing resources under operational constraints. In a case study, we apply the framework to determine how a limited number of samples for typing should be drawn from subpopulations to optimize estimation of serotype-specific incidence rates for all—and the subset of severe—HFMD cases in a study region in China.

We demonstrated that, with the same level of typing effort as the existing network, optimal designs chosen using DIOS can reduce the mean absolute error of estimated serotype-specific incidence rates of all HFMD cases and of severe HFMD cases by 14.1 and 13.3 percent, respectively, based on the 2009–2014 data. Although beyond the scope of this study, the DIOS framework also accommodates multi-objective optimization [20], providing a means to identify designs that optimize both objectives simultaneously. Changes to the total number of cases sampled for subtyping minimally impacted the relative performance of surveillance designs.

Our optimization identified that allocating laboratory typing resources across locations in proportion to their HFMD incidence rates gave near-optimal performance for estimating both the total serotype-specific incidence rates and the serotype-specific incidence rates of severe HFMD. For estimating total HFMD incidence, this is fairly intuitive: errors in incidence rates exhibit higher variance when the incidence rate itself is higher, so additional typing to stabilize these estimates benefits the average MAE. The optimal design for estimating serotype-specific incidence rates of severe cases involves a slightly more equal distribution of subtyping resources, in part because fewer tests are available to type mild cases as the proportion of severe cases typed increases in the optimal design, which results in insufficient resources to accurately estimate background serotype-specific incidence of mild cases at locations with low incidence rates.

Our study opens several areas for future research. While we focused on a surveillance design parameter representing the proportion of all typed cases to be drawn from each region, other design parameters can certainly be examined, such as the sampling of cases for subtyping across demographic groups, the selection of laboratories to include in the surveillance network, and the assays used for typing. Besides the total number of typing tests, other constraints, such as the total cost for processing and shipping the specimens given the fact that the cost may vary across locations, can also be considered. What is more, other surveillance objectives—beyond estimating serotype-specific incidence rates for all cases and only severe cases—are possible, such as early detection of a new subtype or an unusual increase in existing subtypes, evaluation of the effectiveness of subtype-specific interventions, and confirmation of the elimination or eradication of a specific subtype. Multiple objectives can also be evaluated simultaneously through multi-objective optimization, as we have demonstrated elsewhere [20].

Expanding on the use of a single disease system model here, multiple models with different structures (e.g., hierarchical models with different covariance structures, machine learning algorithms, and mechanistic models) and different parameter values or covariates could be run in an ensemble to better represent uncertainty in the underlying epidemiologic processes. Periodic intensive, cross-sectional sampling may also help to validate and fine-tune the design optimization process by providing high-resolution, high-confidence estimates of incidence rates. Furthermore, while this study assumed that the optimal design is fixed and does not change over time, future optimizations could update optimal designs iteratively as new data become available, refitting the disease system model and updating the optimal design. Such an adaptive sampling approach may result in improved surveillance performance in settings where transmission dynamics change substantially over time [59].

In conclusion, we have shown that designing laboratory networks for surveillance systems with the DIOS framework can reveal designs that allocate limited resources more efficiently. For jurisdictions with sophisticated computational capabilities, the analyses in this work could be repeated to identify the optimal designs for specific settings and surveillance goals. For regions with limited resources, rules of thumb, such as the allocation of typing resources in proportion to incidence rates, may emerge from simulations of general scenarios. Future work is needed to generate such transcendent surveillance rules for various surveillance design parameters and goals, and to yield improved understanding of the design parameters that would allow the most cost-effective laboratory-based surveillance architectures. The scope of applications of the DIOS framework extends across many dimensions of laboratory-based surveillance networks and associated goals, raising important opportunities for developing the next generation of laboratory surveillance systems to monitor pathogen subtypes.

Supporting information

S1 Text. Epidemiological parameter estimates by the disease system model.

https://doi.org/10.1371/journal.pcbi.1010575.s001

(DOCX)

S1 Fig.

Annual mean percentage of cases being tested for (A) all clinical HFMD cases, (B) mild cases, and (C) severe cases during 2009–2015. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html.

https://doi.org/10.1371/journal.pcbi.1010575.s002

(PDF)

S2 Fig. Schematic of the multivariate spatio-temporal Bayesian hierarchical model.

See the main text for the definitions of notations. Priors of the hyperparameters are highlighted in blue, while observed data are highlighted in green.

https://doi.org/10.1371/journal.pcbi.1010575.s003

(PDF)

S3 Fig.

Proportion of typing resources allocated to each location for the archetypal designs: (A) Existing, (B) Equal, (C) PopSize, (D) Case, (E) IncRate, (F) SevereCase, and (G) SevereIncRate. See descriptions of these designs in Section 2.2.7 of the main text. The prefectures are colored by the proportion of serotyping resources allocated to them, with darker colors representing more serotyping resources. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html.

https://doi.org/10.1371/journal.pcbi.1010575.s004

(PDF)

S4 Fig.

Proportion of serotyping resources allocated to each location for the archetypal designs: (A) Existing, (B) Equal, (C) PopSize, (D) Case, (E) IncRate, (F) SevereCase, and (G) SevereIncRate. See descriptions of these designs in Section 2.2.7 of the main text. Each tile represents one location, with the area of the tile proportional to the amount of typing resources allocated to it and the color of the tile representing the annual mean incidence rate of that location.

https://doi.org/10.1371/journal.pcbi.1010575.s005

(PDF)

S5 Fig.

Optimal probability of severe cases being serotyped for each archetypal design, minimizing the mean absolute error (MAE) of the estimated serotype-specific incidence rate of (A) all HFMD cases and (B) only severe HFMD cases. Different colors represent different archetypal designs. The colored lines are smoothed by Gaussian process models. Black dots and triangles represent the optimal probability of severe cases being serotyped for each archetypal design when minimizing the MAE of the estimated serotype-specific incidence rate of all HFMD cases and only severe HFMD cases, respectively.

https://doi.org/10.1371/journal.pcbi.1010575.s006

(PDF)

S6 Fig.

The optimal proportion of subtyping to allocate to each location for minimizing mean absolute error in estimating serotype-specific incidence rate of (A) all cases and (B) severe cases. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html.

https://doi.org/10.1371/journal.pcbi.1010575.s007

(PDF)

S7 Fig.

The proportion of cases being subtyped according to the optimal designs that minimize mean absolute error in estimating serotype-specific incidence rate of (A) all cases and (B) severe cases. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html.

https://doi.org/10.1371/journal.pcbi.1010575.s008

(PDF)

S8 Fig.

Mean absolute error in estimating serotype-specific incidence rate of (A) all cases and (B) only severe cases when the availability of typing resources changes.

https://doi.org/10.1371/journal.pcbi.1010575.s009

(PDF)

S9 Fig.

The optimal proportion of subtyping to allocate to each location for minimizing mean absolute error in estimating serotype-specific incidence rate of all cases when the total amount of subtyping resources is (A) half, (B) two times, or (C) five times that of the observed frequency; and for minimizing mean absolute error in estimating serotype-specific incidence rate of severe cases when the total amount of subtyping resources is (D) half, (E) two times, or (F) five times that of the observed frequency. The boundaries of the prefectures were obtained from https://gadm.org/download_country.html.

https://doi.org/10.1371/journal.pcbi.1010575.s010

(PDF)

S10 Fig.

Scatterplots of annual mean incidence rate and the proportion of typing resources allocated to each location under the archetypal design IncRate (black dots) and the Optimal designs for minimizing the MAE of the estimated serotype-specific incidence rate of all HFMD cases (blue triangles) when the available typing resources are (A) halved, (B) doubled, and (C) quintupled; and the Optimal designs for minimizing the MAE of the estimated serotype-specific incidence rate of severe HFMD cases (blue squares) when the available typing resources are (D) halved, (E) doubled, and (F) quintupled.

https://doi.org/10.1371/journal.pcbi.1010575.s011

(PDF)

S11 Fig. Correlation between the objective function values under four resource limit scenarios.

Correlation between the MAEs of estimated serotype-specific incidence rate of (A) all cases and (B) only severe cases.

https://doi.org/10.1371/journal.pcbi.1010575.s012

(PDF)

Acknowledgments

This research benefitted from the Savio computational cluster resource provided by the Berkeley Research Computing program at the University of California, Berkeley, which is supported by the UC Berkeley Chancellor, Vice Chancellor for Research, and Chief Information Officer.

References

  1. Caini S, Kroneman M, Wiegers T, El Guerche-Séblain C, Paget J. Clinical characteristics and severity of influenza infections by virus type, subtype, and lineage: a systematic literature review. Influenza and other respiratory viruses. 2018;12(6):780–92. pmid:29858537
  2. Tang X, Yang Y, Yu H-J, Liao Q-H, Bliznyuk N. A spatio-temporal modeling framework for surveillance data of multiple infectious pathogens with small laboratory validation sets. Journal of the American Statistical Association. 2019;114(528):1561–73. pmid:31937981
  3. Coudeville L, Garnett GP. Transmission dynamics of the four dengue serotypes in southern Vietnam and the potential impact of vaccination. PloS one. 2012;7(12):e51244. pmid:23251466
  4. Fisher L, Wakefield J, Bauer C, Self S. Time series modeling of pathogen-specific disease probabilities with subsampled data. Biometrics. 2017;73(1):283–93. pmid:27378138
  5. Viboud C, Bjørnstad ON, Smith DL, Simonsen L, Miller MA, Grenfell BT. Synchrony, waves, and spatial hierarchies in the spread of influenza. Science. 2006;312(5772):447–51. pmid:16574822
  6. Harboe ZB, Benfield TL, Valentiner-Branth P, Hjuler T, Lambertsen L, Kaltoft M, et al. Temporal trends in invasive pneumococcal disease and pneumococcal serotypes over 7 decades. Clinical Infectious Diseases. 2010;50(3):329–37. pmid:20047478
  7. Davies NG, Abbott S, Barnard RC, Jarvis CI, Kucharski AJ, Munday JD, et al. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science. 2021. pmid:33658326
  8. Shen X, DeRiemer K, Yuan Z-A, Shen M, Xia Z, Gui X, et al. Drug-resistant tuberculosis in Shanghai, China, 2000–2006: prevalence, trends and risk factors. The International Journal of Tuberculosis and Lung Disease. 2009;13(2):253–9. pmid:19146756
  9. Styers D, Sheehan DJ, Hogan P, Sahm DF. Laboratory-based surveillance of current antimicrobial resistance patterns and trends among Staphylococcus aureus: 2005 status in the United States. Annals of clinical microbiology and antimicrobials. 2006;5(1):1–9. pmid:16469106
  10. Martinot-Peignoux M, Roudot-Thoraval F, Mendel I, Coste J, Izopet J, Duverlie G, et al. Hepatitis C virus genotypes in France: relationship with epidemiology, pathogenicity and response to interferon therapy. Journal of viral hepatitis. 1999;6(6):435–43.
  11. Bruni L, Diaz M, Castellsagué M, Ferrer E, Bosch FX, de Sanjosé S. Cervical human papillomavirus prevalence in 5 continents: meta-analysis of 1 million women with normal cytological findings. Journal of Infectious Diseases. 2010;202(12):1789–99. pmid:21067372
  12. Garland SM, Hernandez-Avila M, Wheeler CM, Perez G, Harper DM, Leodolter S, et al. Quadrivalent vaccine against human papillomavirus to prevent anogenital diseases. New England Journal of Medicine. 2007;356(19):1928–43. pmid:17494926
  13. Xu J, Qian Y, Wang S, Serrano JMG, Li W, Huang Z, et al. EV71: an emerging infectious disease vaccine target in the Far East? Vaccine. 2010;28(20):3516–21. pmid:20304038
  14. Bubar KM, Reinholt K, Kissler SM, Lipsitch M, Cobey S, Grad YH, et al. Model-informed COVID-19 vaccine prioritization strategies by age and serostatus. Science. 2021;371(6532):916–21. pmid:33479118
  15. Head JR, Collender PA, Lewnard JA, Skaff NK, Li L, Cheng Q, et al. Early Evidence of Inactivated Enterovirus 71 Vaccine Impact Against Hand, Foot, and Mouth Disease in a Major Center of Ongoing Transmission in China, 2011–2018: A Longitudinal Surveillance Study. Clinical Infectious Diseases. 2019.
  16. Centers for Disease Control Prevention. Direct and indirect effects of routine vaccination of children with 7-valent pneumococcal conjugate vaccine on incidence of invasive pneumococcal disease—United States, 1998–2003. MMWR Morbidity and mortality weekly report. 2005;54(36):893. pmid:16163262
  17. GISAID. GISAID 2021 [cited 2021 09–16]. Available from: https://www.gisaid.org/.
  18. Association of Public Health Laboratories. Influenza Virologic Surveillance Right Size Roadmap 2021 [cited 2021 Oct 11]. Available from: https://www.aphl.org/programs/infectious_disease/influenza/Influenza-Virologic-Surveillance-Right-Size-Roadmap/Pages/default.aspx.
  19. World Health Organization. Global epidemiological surveillance standards for influenza. 2013.
  20. Cheng Q, Collender PA, Heaney AK, Li X, Dasan R, Li C, et al. The DIOS framework for optimizing infectious disease surveillance: Numerical methods for simulation and multi-objective optimization of surveillance network architectures. PLOS Computational Biology. 2020;16(12):e1008477. pmid:33275606
  21. Reich NG, Shrestha S, King AA, Rohani P, Lessler J, Kalayanarooj S, et al. Interactions between serotypes of dengue highlight epidemiological impact of cross-immunity. Journal of The Royal Society Interface. 2013;10(86):20130414. pmid:23825116
  22. Koh WM, Bogich T, Siegel K, Jin J, Chong EY, Tan CY, et al. The epidemiology of hand, foot and mouth disease in Asia: a systematic review and analysis. The Pediatric infectious disease journal. 2016;35(10):e285. pmid:27273688
  23. Xing W, Liao Q, Viboud C, Zhang J, Sun J, Wu JT, et al. Hand, foot, and mouth disease in China, 2008–12: an epidemiological study. The Lancet infectious diseases. 2014;14(4):308–18. pmid:24485991
  24. Koh WM, Badaruddin H, La H, Mark I, Chen C, Cook AR. Severity and burden of hand, foot and mouth disease in Asia: a modelling study. BMJ global health. 2018;3(1):e000442. pmid:29564154
  25. Wong S, Yip C, Lau S, Yuen K. Human enterovirus 71 and hand, foot and mouth disease. Epidemiology & Infection. 2010;138(8):1071–89. pmid:20056019
  26. Li J, Sun Y, Du Y, Yan Y, Huo D, Liu Y, et al. Characterization of coxsackievirus A6-and enterovirus 71-associated hand foot and mouth disease in Beijing, China, from 2013 to 2015. Frontiers in microbiology. 2016;7:391. pmid:27065963
  27. Zeng H, Lu J, Zheng H, Yi L, Guo X, Liu L, et al. The epidemiological study of coxsackievirus A6 revealing hand, foot and mouth disease epidemic patterns in Guangdong, China. Scientific reports. 2015;5:10550. pmid:25993899
  28. Yang B, Liu F, Liao Q, Wu P, Chang Z, Huang J, et al. Epidemiology of hand, foot and mouth disease in China, 2008 to 2015 prior to the introduction of EV-A71 vaccine. Eurosurveillance. 2017;22(50):16–00824.
  29. Yang S, Wu J, Ding C, Cui Y, Zhou Y, Li Y, et al. Epidemiological features of and changes in incidence of infectious diseases in China in the first decade after the SARS outbreak: an observational trend study. The Lancet Infectious Diseases. 2017;17(7):716–25. pmid:28412150
  30. National Health Commission of the People’s Republic of China. Overview of the national notifiable infectious diseases in 2008–2019 2009. Available from: http://www.nhc.gov.cn.
  31. Peng D, Ma Y, Liu Y, Lv Q, Yin F. Epidemiological and aetiological characteristics of hand, foot, and mouth disease in Sichuan province, China, 2011–2017. Scientific reports. 2020;10(1):1–9.
  32. Liang S, Yang C, Zhong B, Guo J, Li H, Carlton EJ, et al. Surveillance systems for neglected tropical diseases: global lessons from China’s evolving schistosomiasis reporting systems, 1949–2014. Emerging themes in epidemiology. 2014;11(1):19. pmid:26265928
  33. Ventarola D, Bordone L, Silverberg N. Update on hand-foot-and-mouth disease. Clinics in dermatology. 2015;33(3):340–6. pmid:25889136
  34. Statistical Bureau of Sichuan. Sichuan statistical Yearbook 2010–2016. Beijing: China Statistics Press; 2017.
  35. Guan H, Wang J, Wang C, Yang M, Liu L, Yang G, et al. Etiology of multiple Non-EV71 and non-CVA16 enteroviruses associated with hand, foot and mouth disease in Jinan, China, 2009—June 2013. PloS one. 2015;10(11):e0142733. pmid:26562154
  36. Yang Q, Ding J, Cao J, Huang Q, Hong C, Yang B. Epidemiological and etiological characteristics of hand, foot, and mouth disease in Wuhan, China from 2012 to 2013: outbreaks of coxsackieviruses A10. Journal of medical virology. 2015;87(6):954–60. pmid:25754274
  37. Hu Y-Q, Xie G-C, Li D-D, Pang L-L, Xie J, Wang P, et al. Prevalence of coxsackievirus A6 and enterovirus 71 in hand, foot and mouth disease in Nanjing, China in 2013. The Pediatric infectious disease journal. 2015;34(9):951–7. pmid:26090576
  38. Wang J, Teng Z, Cui X, Li C, Pan H, Zheng Y, et al. Epidemiological and serological surveillance of hand-foot-and-mouth disease in Shanghai, China, 2012–2016. Emerging microbes & infections. 2018;7(1):1–12.
  39. Quick H, Waller LA, Casper M. Multivariate spatiotemporal modeling of age-specific stroke mortality. The Annals of Applied Statistics. 2017;11(4):2165–77.
  40. Wang J, Zhou J, Xie G, Zheng S, Lou B, Chen Y, et al. The Epidemiological and Clinical Characteristics of Hand, Foot, and Mouth Disease in Hangzhou, China, 2016 to 2018. Clinical Pediatrics. 2020;59(7):656–62. pmid:32146823
  41. 41. Moeini A, Abbasi B, Mahlooji H. Conditional distribution inverse method in generating uniform random vectors over a simplex. Communications in Statistics—Simulation and Computation. 2011;40(5):685–93.
  42. 42. Mitchell M. An introduction to genetic algorithms: MIT press; 1998.
  43. 43. Yang X-S. Nature-inspired optimization algorithms: Academic Press; 2020.
  44. 44. Katoch S, Chauhan SS, Kumar V. A review on genetic algorithm: past, present, and future. Multimedia Tools and Applications. 2020:1–36.
  45. 45. Uyheng J, Rosales JC, Espina K, Estuar MRJ, editors. Estimating parameters for a dynamical dengue model using genetic algorithms. Proceedings of the genetic and evolutionary computation conference companion; 2018.
  46. 46. VanderWaal K, Enns EA, Picasso C, Alvarez J, Perez A, Fernandez F, et al. Optimal surveillance strategies for bovine tuberculosis in a low-prevalence country. Scientific reports. 2017;7(1):1–12.
  47. 47. Vandewater L, Brusic V, Wilson W, Macaulay L, Zhang P. An adaptive genetic algorithm for selection of blood-based biomarkers for prediction of Alzheimer’s disease progression. BMC bioinformatics. 2015;16(18):1–10. pmid:26680269
  48. 48. Modersitzki N, Phan LV, Kuper N, Rauthmann JF. Who is impacted? Personality predicts individual differences in psychological consequences of the COVID-19 pandemic in Germany. Social Psychological and Personality Science. 2020:1948550620952576.
  49. 49. Matrajt L, Halloran ME, Longini IM Jr. Optimal vaccine allocation for the early mitigation of pandemic influenza. PLoS Comput Biol. 2013;9(3):e1002964. pmid:23555207
  50. 50. Araz OM, Fowler JW, Nafarrate AR. Optimizing service times for a public health emergency using a genetic algorithm: Locating dispensing sites and allocating medical staff. IIE Transactions on Healthcare Systems Engineering. 2014;4(4):178–90.
  51. 51. Hassanat A, Almohammadi K, Alkafaween E, Abunawas E, Hammouri A, Prasath V. Choosing mutation and crossover ratios for genetic algorithms—a review with a new dynamic approach. Information. 2019;10(12):390.
  52. 52. R Core Team. R: A language and environment for statistical computing. Vienna, Austria; 2013.
  53. 53. Research IT. High Performance Computing 2021 [2021/06/05]. Available from: https://docs-research-it.berkeley.edu/services/high-performance-computing/overview/.
  54. 54. Carpenter B, Gelman A, Hoffman MD, Lee D, Goodrich B, Betancourt M, et al. Stan: A probabilistic programming language. Journal of statistical software. 2017;76(1).
  55. 55. Scrucca L. GA: a package for genetic algorithms in R. Journal of Statistical Software. 2013;53(4):1–37.
  56. 56. Wickham H. ggplot2: elegant graphics for data analysis: springer; 2016.
  57. 57. Wilke CO. cowplot: streamlined plot theme and plot annotations for ‘ggplot2’. CRAN Repos. 2016;2:R2.
  58. 58. Tennekes M. tmap: Thematic Maps in R. Journal of Statistical Software. 2018;84(6):1–39.
  59. 59. Pacheco RA, Rerolle F, Lemoine J, Hernandez L, Meïté A, Bibaut A, et al. Finding hotspots: development of an adaptive spatial sampling approach. medRxiv. 2020.