Figures
Abstract
Identifying and controlling the emergence of antimicrobial resistance (AMR) is a high priority for researchers and public health officials. One critical component of this control effort is timely detection of emerging or increasing resistance using surveillance programs. Currently, detection of temporal changes in AMR relies mainly on analysis of the proportion of resistant isolates based on the dichotomization of minimum inhibitory concentration (MIC) values. In our work, we developed a hierarchical Bayesian latent class mixture model that incorporates a linear trend for the mean log_{2}MIC of the non-resistant population. By introducing latent variables, our model addressed the challenges associated with the AMR MIC values, compensating for the censored nature of the MIC observations as well as the mixed components indicated by the censored MIC distributions. Inclusion of linear regression with time as a covariate in the hierarchical structure allowed modelling of the linear creep of the mean log_{2}MIC in the non-resistant population. The hierarchical Bayesian model was accurate and robust as assessed in simulation studies. The proposed approach was illustrated using Salmonella enterica I,4,[5],12:i:- treated with chloramphenicol and ceftiofur in human and veterinary samples, revealing some significant linearly increasing patterns from the applications. Implementation of our approach to the analysis of an AMR MIC dataset would provide surveillance programs with a more complete picture of the changes in AMR over years by exploring the patterns of the mean resistance level in the non-resistant population. Our model could therefore serve as a timely indicator of a need for antibiotic intervention before an outbreak of resistance, highlighting the relevance of this work for public health. Currently, however, due to extreme right censoring on the MIC data, this approach has limited utility for tracking changes in the resistant population.
Citation: Zhang M, Wang C, O’Connor A (2020) A hierarchical Bayesian latent class mixture model with censorship for detection of linear temporal changes in antibiotic resistance. PLoS ONE 15(1): e0220427. https://doi.org/10.1371/journal.pone.0220427
Editor: Praveen Rishi, Panjab University, INDIA
Received: July 11, 2019; Accepted: December 11, 2019; Published: January 31, 2020
Copyright: © 2020 Zhang et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: The CDC NARMS data (for application examples 1 and 2) can be found either in the "Download" panel of the website https://wwwn.cdc.gov/narmsnow/, or in the Github repository https://github.com/MinZhang95/AMR-Linear/blob/master/IsolateData_all%20NARMS%20Now%20data.csv. Please note that the data on the CDC website can be updated by the NARMS program, while the Github data was downloaded from the CDC website in March 2019. The ISU VDL data (for application example 3) is also available in the Github repository https://github.com/MinZhang95/AMR-Linear/blob/master/Reduced%20Food%20Animal%20Susceptibility%20and%20Salmonella%20NVSL%20from%202003_used%20for%20PLOS%20ONE%20linear%20paper.xlsx.
Funding: This project was supported by funding from United States Department of Agriculture National Institution for Food and Agriculture (https://nifa.usda.gov/). The grant was awarded to Dr. Annette O’Connor and Dr. Chong Wang (Grant Award Number: 2018-67030-28361). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Rationale
Identifying and controlling the emergence of antimicrobial resistance (AMR) is a high priority for researchers and public health officials. A critical component of this control effort is surveillance for emerging or increasing resistance, as evidenced by the number and scale of surveillance programs around the world [1] [2]. The aims of these surveillance programs are to enable detection of emerging resistance in a timely manner and to facilitate antimicrobial stewardship programs to be implemented properly and accurately [3]. Currently, detection of temporal changes in AMR relies primarily on analysis of the proportion of resistant isolates based on the dichotomization of minimum inhibitory concentration (MIC) [3]. The MIC can be obtained from laboratory methods or machine-learning approaches [3] [4]. Regardless of the approach to MIC determination or the breakpoint used, dichotomization results in a loss of information.
Previous work and challenges
To date, the predominant approach to assessing changes in AMR has focused on evaluation of changes in the proportion of isolates resistant to a particular antibiotic over time. Several statistical methods have been employed, including the Cochran-Armitage trend test, logistic regression model with time as a co-variate [5] [6] [7], and the Mann-Kendall non-parametric method to test monotonic trends over time [8]. These statistical methods are based on MIC data that are dichotomized to resistant and non-resistant. Mazloom et al. [9] pointed out that methods based on categorizations cause information loss. As such, a focus on changes in proportion of bacteria that are categorized as resistance means that changes in the mean MIC of isolates above or below the resistant breakpoint, a phenomena referred to as MIC creep/decline, are not included in the current surveillance monitoring [10]. Similarly, reliance on dichotomized MIC data prevents monitoring of correlations in mean MIC, despite the fact that such information would aid in the identification of emerging joint resistance patterns.
Mean MIC estimation must address the natural characteristics of MIC values, which are measured from serial dilution experiments [11] or obtained using the whole-genome sequencing based machine learning method [4]. Regardless of the method for obtaining MIC, currently observed MIC values are all censored. For example, an observed or predicted MIC of 8 for the organism A actually implies that the true MIC is >4, ≤8, and ultimately unknown. Estimation of the mean MIC without adjusting for censorship is biased and likely to overestimate the bacterial resistance to an antibiotic [12]. An additional issue is modeling of the underlying distribution of the true unknown MIC values. With respect to MIC data, bacteria isolates typically consist of a mixture of two components. Depending upon the focus (or region), these two populations may be considered resistant and non-resistant populations or wild-type and non-wild-type populations; however, the presence of two overlapping populations is commonly considered reasonable for these bacteria. For each component, the true concentration of antibiotic required to inhibit bacterial growth (i.e., the true MIC value) is believed to follow a log-normal distribution [13]; hence, the log_{2}MIC follows a normal distribution.
Statistical approaches for estimation of the mean MIC have been proposed previously. Kassteele et al. [14] suggested a model for mean log_{2}MIC estimation that incorporated the censored nature of MIC data and adjusted for such bias using the interval-censored normal distribution as the underlying distribution. This model is a reasonable accommodation for censorship; however, the approach did not address the mixture of resistant and non-resistant populations in the observed data. Craig [15] proposed that the underlying distribution of log_{2}MIC can be modeled by a mixture of Gaussian distributions with resistant and non-resistant populations. Jaspers et al. [16] [17] [18] [19] modeled the full continuous MIC distribution for wild-type and non-wild-type bacteria populations determined by epidemiological cut-off rather than clinical breakpoints. According to their definition of bacterial categorization, the non-wild-type population has less informative distributions and was therefore estimated in non-parametric ways. These previously published approaches suggest that log_{2}MIC mean can be estimated, although none of the above mentioned studies evaluated an approach for estimating a temporal trend in mean log_{2}MIC, and such an approach is clearly a critical need for AMR surveillance programs. Therefore, building upon these previous studies, we sought to describe an approach to estimate the mean log_{2}MIC and describe temporal changes in AMR, while simultaneously addressing the censored nature of MIC data and the mixed distributions of populations.
Contribution
In this paper, we proposed a hierarchical Bayesian latent class mixture model with censoring to detect temporal changes in AMR. This proposed model was applied to data from human samples from the Center for Disease Control National Antimicrobial Resistance Monitoring System (NARMS) surveillance program and swine samples from the Iowa State University Veterinary Diagnostic Laboratory (ISU VDL). The human data consisted of Salmonella I,4,[5],12:i:- tested with ceftiofur (TIO) and chloramphenicol (CHL). The swine data included Salmonella I,4,[5],12:i:- tested with TIO. Our applications revealed some interesting patterns in the Results Section. Simulation studies showed that our model was accurate and robust in the estimation of the mean log_{2}MIC in non-resistant populations, the linear temporal trend in non-resistant populations, and the proportion of resistant bacteria over time. Future work stemming from our model is also discussed in the Discussion Section. Inclusion of such analyses into current surveillance programs would provide additional insight for monitoring of temporal changes in AMR and would increase the value of information extracted from such surveillance systems.
Methods
Model notation and assumptions
Our hierarchical Bayesian model for detection of linear temporal changes in AMR must take into account the censored nature of the data and the underlying mixed distribution of the observations. The commonly used approach for analysis of two-fold serial dilution observations is to transform to base 2 logarithm. To account for censoring statistically, each observed MIC value was assumed to represent an interval of true MIC values rather than a single discrete point value. With the following notations, Table 1 explains the conversion between the observed log_{2}MIC values and a continuous scale interval (l_{ij}, u_{ij}) for each isolate and each antibiotic.
- : observed value of log_{2}MIC for isolation j in year i;
- y_{ij}: latent true value of log_{2}MIC for isolation j in year i;
- l_{ij}, u_{ij}: lower bound and upper bound of the true value y_{ij}, y_{ij} ∈ (l_{ij}, u_{ij}); and
- c_{ij}: latent indicator of the bacterial population from which the observation was drawn, where i = 1, 2, …I, and j = 1, 2, …, n_{i}. Here, I is the total number of years, and n_{i} is the total number of isolates tested in the ith year.
Denote S as the conversion function between and y_{ij}, and therefore, , which is depicted in Fig 1 by the step-like plot. The distribution of the latent true value of log_{2}MIC was assumed to be a bimodal Gaussian mixture model, which corresponds to the underlying mixture of resistant and non-resistant populations.
In this example, the serial dilution experiment starts at MIC = 2^{−2} = 0.25 and ends at MIC = 2^{4} = 16.
We proposed a model with a linear trend in the mean log_{2}MIC over time. The motivation for this model stemmed from the results of a naïve analysis of Salmonella enterica I,4,[5],12:i:- and antibiotic CHL in the CDC NARMS dataset (the grey line displayed in Fig 2). Since the true log_{2}MIC cannot be observed with the raw data due to censoring, the naïve mean log_{2}MIC was calculated and presented over time. The naïve analysis for mean log_{2}MIC ignored the nature of censoring of MIC data, resulting in calculation of the arithmetic average of log_{2}MIC each year. For example, if the observed MIC value was ≤ 2, then log_{2}MIC = log_{2}(2) = 1 was treated as the corresponding log_{2}MIC value and was therefore used for the naïve mean calculation. Although mathematically this approach has some issues, it served to illustrate the linear trend in the non-resistant population. The potential to observe and assess the presence of a linear trend is less useful in the resistant population, because the majority of the observed log_{2}MIC results are right censored at the highest concentration of the serial dilutions. As a consequence, in the resistant population, the naïve mean log_{2}MIC barely changed over time. Our assumption about the linear trend in the naïve mean log_{2}MIC was only applicable to the non-resistant population.
The grey bars represent the observed proportions of resistant bacteria (left y-axis). The grey line indicates the naïve mean of log_{2}MIC in the non-resistant population. The red line connects the point estimates of the mean log_{2}MIC in the non-resistant population. The blue line represents the estimated linear trend, shaded with its 95% CI.
Model description
Below, we describe in detail the approaches we used; however, as an introduction, we first provide a brief lay summary of the approach. In the first level of the model, we modeled the data with a mixture Gaussian distribution within each year. This level of model did not allow for estimation of a time effect, because each year was analyzed separately without being imposed by any pattern in time. The second level added complexity to the model via regression of the yearly mean log_{2}MIC towards a line. For the resistant population, we assumed a flat line and that yearly randomness arose around this line. For the non-resistant population, we assumed a non-flat line that is a linear function of time. This time effect was estimated by a slope parameter in the model (described below): a positive result implies that the AMR is increasing with time, while a negative result implies that the AMR is decreasing with time. We used the real data for such a model with two levels and showed that the linear trend model was able to quantify the year effect observed in the year-to-year mode. Based on the notations and assumptions described in the beginning of the Methods Section, the construction procedure of the hierarchical Bayesian latent class mixture model with censoring and linear trends is described as follows: (1) (2) where i = 1, 2, …, I; j = 1, 2, …, n_{i}. The variable before the pipe (“|”) was modeled with some distribution parameterized by the variable(s) behind the pipe. Ber(p) denotes a Bernoulli distribution with probability p, and N(β, σ^{2}) denotes a normal distribution with mean β and variance σ^{2}. In the ith year, the jth isolate comes from the resistant population with probability p_{i} and from the non-resistant population with probability 1 − p_{i}. The parameters β_{0i} represent the mean log_{2}MIC for the non-resistant group in ith year, and the parameters β_{1i} represent the mean log_{2}MIC for the resistant group in the ith year. The variances for both components, and , were set as invariant across the years, because we expected the spread of the observations within one population to be consistent over time. Using a predetermined cutoff value to categorize isolates into resistant and non-resistant can be problematic, because the threshold is not always consistent over years for some datasets (e.g., ISU VDL dataset contains examples of inconsistent threshold). With the Gaussian mixture model described in (1) and (2), the isolates for each year are separated into two components based on the posterior probabilities.
So far, the model allowed estimation of the mean log_{2}MIC for each year but has not imposed any constraints on the yearly means. Considering the heterogeneity of bacteria isolates in the MIC dataset, due perhaps to different sampling collection methods from year to year or different labs used to test isolates (e.g., CDC NARMS dataset contains data collected from multiple institutes), a hierarchical modeling strategy was adopted to borrow information about mean log_{2}MIC values across years and to integrate uncertainty from each individual year.
Based on the descriptive naïve means of log_{2}MIC in the non-resistant population, we proposed to incorporate a linear trend into the model above to describe the temporal changes of the mean log_{2}MIC in the non-resistant group for the organisms and antibiotics that appeared to be candidates for formal assessment of a linear pattern. First, we modeled the yearly mean log_{2}MIC of the non-resistant population by introducing the hyper-parameters γ_{0} and γ_{1}, with a simple linear model as follows: (3) where i = 1, 2, …, I. . Time (year) was used as a covariate with t_{i} = i. For the first year of our observation t_{i} = i = 1. This is equivalent to (4) where μ_{0i} = γ_{0} + γ_{1}t_{i}. Second, we modeled the yearly mean log_{2}MIC of the resistant population, using the hyper-parameter μ_{1} which is a constant: (5) This model implied that the yearly mean of log_{2}MIC in the non-resistant population is distributed around a straight line with intercept γ_{0} and slope γ_{1}, with normally distributed error and variance quantified by , while the yearly mean of log_{2}MIC in the resistant population is distributed around a constant μ_{1}, also with normally distributed error and variance quantified by . We modeled the yearly mean log_{2}MIC of the resistant population with a constant instead of a linear trend because the MIC determination of the observed log_{2}MIC is often highly right censored, meaning that we do not have enough information to estimate β_{1i} or its trend. If more studies reported end-point dilutions for resistant isolates, modeling the time trend in the resistant population would likely be feasible.
Further modeling on top of the first level, i.e. Eqs (1) and (2), involved addition of more hyper-parameters in the hierarchical structure for the proportion of the resistant population in the ith year, p_{i}. This parameter was modeled with a normal distribution through a logit link function: (6) (7)
Let Θ be the vector of all unknown parameters ; f be a generic expression for probability density function (pdf) or probability mass function (pmf). Also,, where . The joint likelihood function was used as follows: (8) (9) (10) Based on densities and masses from Eqs (1) to (7): (11)
For Eq (9), our latent variables y_{ij} and c_{ij} were integrated (summed) over their possible range (values) to obtain the likelihood function of the observed data. In Eq (10), the mean and proportion parameters were also integrated over their supports. Eq (11) shows the derivation of the likelihood of latent variables and parameters from the data model.
Prior distribution for hierarchical model parameters
The full Bayesian analysis required a joint prior distribution of all unknown parameters in the model. In our model setting, the vector of unknown parameters was . Furthermore, we assumed independent prior distribution for each parameter. The inverse gamma distribution was assigned to each variance of the data model and hierarchical part, due to their positive supports. The normal distribution was assigned to each linear parameter and each mean parameter of the hierarchical part, due to their supports on the whole real line. Since we did not have sufficient prior knowledge about these parameters, non-informative priors were used within the conjugate family. In particular, small values were chosen for the shape and scale parameters of the inverse gamma prior distribution; and large variance was chosen for the normal prior distribution. (12) (13) Using the Bayesian rule, our goal was to obtain samples and draw inference from the posterior distribution, which can be expressed based on densities from (8) to (13): (14) The posterior distribution did not have a closed form, and we illustrated the sampling approach in the following section with some real data applications.
Application to real datasets
The goal of the assessment of the linear trend was to add a new dimension to understanding the temporal changes of AMR in both humans and livestock. To illustrate this, we applied the proposed model to human data from the CDC NARMS surveillance program and swine data submissions to the ISU VDL. The CDC NARMS data included Salmonella I,4,[5],12:i:- tested with CHL and TIO, while the swine data included Salmonella I,4,[5],12:i:- tested with TIO.
Description of the CDC NARMS data used
NARMS program collects isolates of Salmonella spp., Escherichia coli and Campylobacter spp. The AMR data collected for the CDC surveillance program were obtained from bacteria isolated from patients who attend public health departments or hospitals that are part of the CDC NARMS surveillance network between 1996 and 2015. In the CDC NARMS dataset, the most abundant species was Salmonella enterica, which accounted for 58.70% of the 54351 total isolates. Serotype I,4,[5],12:i:- had 892 records in the dataset, accounting for 2.79% of the Salmonella enterica isolates. We chose Salmonella enterica I,4,[5],12:i:- because this strain is an emerging pathogen with public health implications for both hosts. The antibiotics CHL and TIO were selected for evaluation of a linear trend based on evidence of a linear trend in the naïve mean log_{2}MIC in the non-resistant population descriptive analysis. For each of the two organism-antibiotic combinations, the distribution of the observed MIC was a mixture of two components: resistant and non-resistant populations. Therefore, the model assumptions were satisfied for these two combinations. Prior to the linear trend analysis, we discarded records before 2002, because the data were sparse for the first six years of the surveillance dataset (fewer than 10 isolates).
Description of the VDL data used
Organisms isolated from livestock and submitted to veterinary diagnostic laboratories offer insight into the emergence and patterns of AMR. This population of organisms could offer different and unique information about AMR, providing insight into the implications for the spread of resistance through the local environment as well as occupational exposure. Yuan et al. (2018) reported that the swine population submitted to the ISU VDL suggested emergence of the pathogen Salmonella I,4,[5],12:i:- sooner than the NARMS data. For comparison with the analysis for the CDC NARMS data and also for the sake of model assumptions, we applied the proposed model to a subset of ISU VDL data: Salmonella enterica I,4,[5],12:i:- tested with TIO from swine submissions. We did not evaluate CHL as for the CDC NARMS data, as this antibiotic was not used by the ISU VDL.
The ISU VDL dataset provided surveillance data on MIC from 2003 to 2018, comprising 93,634 isolates in total in the swine subset. Of these, 21,392 isolates are Salmonella enterica. Our target subset, Salmonella enterica serotype I,4,[5],12:i:- in the swine submissions included 1,967 isolates. We removed data obtained in 2018, because records in 2018 were not complete and still being processed in the diagnostic laboratory at the time of our analysis. We also excluded data from the first 8 years and focused on data obtained between 2011 and 2017, due to the sparsity of isolates observed in the first few years.
The implementation of the Bayesian model to the CDC NARMS dataset and the VDL dataset followed the same procedure. We describe the determination of the initial values of the Markov Chain Monte Carlo (MCMC) chain and calculation for inference in the next subsection.
Implementation
Our proposed hierarchical Bayesian latent class mixture model with censoring and linear trend was implemented using the MCMC Gibbs sampling method. The Gibbs sampling algorithm was adapted for censorship in a finite mixture model [20] [21]. The algorithm of the Gibbs sampler is provided in S1 Appendix. All computation was implemented using R 3.3.5. All R scripts (including data cleaning, model construction, model implementation, results extraction, and results visualization) are provided on Github Github repository: https://github.com/MinZhang95/AMR-Linear.
The initial values for MCMC were obtained from the raw data, so that the MCMC chain converged soon. For one combination of organism and antibiotic, the naïve mean of log_{2}MIC in the non-resistant population in each tested year with censorship was calculated as the initial values for β_{0i}. Similarly, the naïve mean of log_{2}MIC values in the resistant population in each year was calculated as initial values for β_{1i}, i = 1, 2, …, I. For the CDC NARMS dataset, I = 14, and for the VDL dataset, I = 7. The initial values for the linear parameters γ_{0} and γ_{1} were obtained from fitting β_{0i} and years i = 1, 2, …, I with simple linear regression. The estimated standard deviation of the error term of the simple linear regression was used as the initial value for τ_{0}. The sample mean and sample standard deviation of β_{1i}(i = 1, 2, …, I) were calculated as the initial values for μ_{1} and τ_{1}, respectively. The initial values for the proportion of the resistant population p_{i} were calculated by dividing the number of resistant isolates by the total number of isolates in each year. The initial values of α_{i}(i = 1, 2, …, I) were obtained by performing a logit transformation on the proportions p_{i}.
Ten thousand iterations were performed, and the remaining 6,000 iterations after the 4,000 burn-in iterations were collected to make inferences. The parameters in the model were estimated by the mean of posterior distribution. The 2.5^{th} and 97.5^{th} percentiles of those 6,000 samples of posterior draws were used for determination of the 95% credible interval (CI). Results are presented and interpreted in the following section.
Results
Using the proposed hierarchical Bayesian latent class mixture model with censoring and linear trend, we analyzed the resistance level of Salmonella enterica I,4,[5],12:i:- tested with CHL and TIO in the CDC NARMS human data and also Salmonella enterica I,4,[5],12:i:- tested with TIO in the ISU VDL swine dataset. The goal of the analyses was to illustrate the evaluation of increasing or decreasing trends in mean log_{2}MIC over time in order to identify trends with important public health implications. The estimates of the yearly mean log_{2}MIC in both the non-resistant () and resistant populations () and estimates of the proportions of the population designated as resistant (), along with their 95% CIs, are presented in Tables 2, 3 and 4. A linear model was fitted to the mean log_{2}MIC in the non-resistant population to borrow information across years and to reveal a potential linear trend. The intercept (γ_{0}) was interpreted as the baseline of the mean log_{2}MIC in the non-resistant population, while the slope (γ_{1}) was interpreted as an increase in the resistance from the previous year (for i = 2, …, I) or from the baseline (for i = 1). The point and interval estimates of the linear model parameters for each example are presented in Table 5. Among all the above-mentioned estimates, the estimates for the yearly mean log_{2}MIC () and the linear parameters () of the non-resistant population are of the greatest interest, as the main objective of our study was to use the proposed model to detect linear temporal changes in AMR in the susceptible group. If shows an increase over time and the estimated slope is positive, then this result could signify increasing resistance for the organism to the antibiotic. The estimates for these parameters are also presented in Figs 2, 3 and 4 for better visualization. The yearly mean log_{2}MIC in the resistant population () was not our priority in this study, as most of the observations in the resistant population are right censored and thus do not provide enough information for parameter estimations. This right censoring also underlay the wide credible intervals of .
The grey bars represent the observed proportions of resistant bacteria (left y-axis). The grey line indicates the naïve mean of log_{2}MIC in the non-resistant population. The red line connects the point estimates of the mean log_{2}MIC in the non-resistant population. The blue line represents the estimated linear trend, shaded with its 95% CI.
The grey bars represent the observed proportions of resistant bacteria (left y-axis). The grey line indicates the naïve mean of log_{2}MIC in the non-resistant population. The red line connects the point estimates of the mean log_{2}MIC in the non-resistant population. The blue line represents the estimated linear trend, shaded with its 95% CI.
From the first row in Table 5, which shows the estimations of the linear model parameters for Salmonella I,4,[5],12:i:- tested with CHL in the CDC NARMS dataset, the 95% CI for the slope was (0.0088, 0.0773), indicating a significantly positive slope estimation and therefore a significantly increasing trend in the mean log_{2}MIC of the non-resistant population. Fig 2 depicts this increasing pattern by plotting the fitted regression line (blue line) through the estimated non-resistant means (red points). The yearly non-resistant means were scattered around the regression line in a random pattern, in agreement with the linear model assumption of independence. The grey histogram in Fig 2 shows the observed proportions of the resistant isolates. Notably, no Salmonella I,4,[5],12:i:- isolate resistant to CHL was observed for some years, while the mean of log_{2}MIC below the resistance threshold was estimated to be increasing constantly. This example demonstrated a situation in which analysis based on dichotomized MIC alone would misleadingly indicate a decreasing level of resistance from 2009 to 2012 and neglect the increasing level of resistance below the break point. Based on these results, we concluded that an intervention for use of CHL for Salmonella I,4,[5],12:i:- in human is suggested to prevent a possible outbreak of resistance if the linearly increasing pattern is allowed to continue in the following years.
In the example of Salmonella I,4,[5],12:i:- tested with TIO in the CDC NARMS dataset, we found an insignificant slope estimation, with a 95% CI of (−0.0168, 0.0844) (second row in Table 5). Despite the notion the true value of the slope parameter is within an interval that contains zero, our best estimation was positive, and the major coverage of the CI was greater than zero. No organism exhibited resistance to TIO above the threshold in 2012, reflecting a rapid decrease from more than 5% in 2011. This phenomenon was accompanied by a stable MIC increase in the non-resistant population.
For the ISU VDL dataset, we detected a significantly increasing pattern in the non-resistant means for Salmonella I,4,[5],12:i:- tested with TIO, with a 95% CI of (0.0049, 0.2713) (third row in Table 5). As shown in Fig 4, the 95% CI of the estimated regression line (the shaded area) is rather wide compared with those in Figs 2 and 3, and the difference likely resulted from the limited number of observations in this example. In this VDL swine dataset, the observed resistant proportions exhibited an abrupt decrease in 2012, an abrupt increase in 2015, and a continuous decrease since that time. Our estimated regression line added more dimensions in the non-resistant population by revealing the creep in its mean log_{2}MIC.
Model evaluation with simulation
In order to assess the performance of the proposed hierarchical Bayesian latent class mixture model with censoring and linear trend, a simulation study was conducted based on the CDC NARMS-CHL example. In our model, the parameters of interest were , and ν^{2}. In the simulation study, these parameters were predetermined according to the estimation results from the example and were denoted with a “hat” on the Greek letter. Also, we simulated the same number of observations for the ith year as the total number of isolates (n_{i}) in the CDC NARMS-CHL dataset. The data generation process is described in the following steps.
For i = 1, 2, …, I; j = 1, 2, …, n_{i}; and t_{i} = i:
- Generate .
- Calculate p_{i} by performing an expit transformation (inverse of logit) on α_{i}, .
- Generate the latent class indicator .
- Generate the mean log_{2}MIC in the non-resistant population .
- Generate the mean log_{2}MIC in the resistant population .
- Generate the log_{2}MIC for isolate j in year i with .
The simulated p_{i} and β_{0i} were treated as the real proportion of resistant population and the real yearly mean of log_{2}MIC, which were compared with estimations based on the simulated observations subsequently. Until this point, the simulated y_{ij} were continuous values drawn from a mixture of Gaussian distributions. To mimic the censoring nature of log_{2}MIC, y_{ij} was also censored according to its value. We previously defined l_{ij} and u_{ij} to be the lower bound and upper bound of y_{ij}. For CHL, the starting dilution was 2 mg/ml, and the ending dilution was 32 mg/ml for both serotypes. This indicated that if y_{ij} ≤ log_{2}(2), then it will be left censored as y_{ij} ≤ 1 with l_{ij} = −∞ and u_{ij} = 1. Similarly, if y_{ij} > log_{2}(32), it will be right censored as y_{ij} > 5, with l_{ij} = 5 and u_{ij} = ∞. If log_{2}(2) < y_{ij} ≤ log_{2}(32), then y_{ij} will be interval censored with l_{ij} as its nearest integer to the left and u_{ij} as its nearest integer to the right. This censoring operation corresponds to step 7. - Censor the underlying true values of log_{2}MIC, y_{ij}, to the observed values with , where represents rounding up to the nearest integer.
End of simulation.
As a single set of observed log_{2}MIC (i.e., ) could be generated by completing steps 1 to 7, we simulated 100 datasets by repeating the above procedure 100 times. With each simulated dataset, estimation was conducted using the proposed hierarchical Bayesian model, which produced a set of estimations: , etc., for i = 1, 2, …, I.
As a comparison, parameters were also estimated with the naïve estimation method, where is nothing but the yearly proportion of resistant component; is the yearly arithmetic mean of the observed log_{2}MIC in the non-resistant component; and the are estimations obtained from simple linear regression on the against time. The naïve estimation method does not resolve the censorship issue, and separate the isolates with a hard threshold.
In order to assess and compare the performance of our hierarchical Bayesian model and the naïve approach, two metrics were calculated. The first was the mean bias over the 100 simulations, and the second was the root of mean squared error (RMSE). Mean bias measures how close the estimations are relative to the true parameter values, while RMSE measures the variation of estimates around the true parameters.
Table 6 shows the mean bias and RMSE for the yearly parameters, and Table 7 shows the two metrics for the linear parameters, both of which compare between the proposed hierarchical Bayesian model and the naïve approach. The mean biases and RMSE for p_{i}, β_{0i}, and γ_{0} from our proposed model were closer to 0 compared with those from the naïve approach. The mean biases and RMSE for γ_{1} produced by the two approaches were both fairly small with similar values. This simulation study indicates that the estimations for the most relevant parameters are precise and robust by utilizing our proposed model.
Discussion
Our goal with this study was to address a deficiency in the available approaches for analysis of AMR data using MIC values. AMR is a serious public health issue worldwide, and enormous resources have been devoted to monitoring the changes in MIC over the years. We proposed a Bayesian hierarchical model with a linear trend and demonstrated that this model enables additional information about the linear patterns in the mean log_{2}MIC in the non-resistant population in addition to the proportion of resistant bacteria. The linear changes in the non-resistant population may be linear creep that signals a need for intervention or a linear decline that implies a successful intervention. Therefore, our proposed model offers a tool to a more complete picture of the resistance level of organisms against various antibiotics, providing valuable information for the surveillance programs.
The proposed approach was founded on the concept of identifying a valid and robust mean log_{2}MIC estimate that addresses the latent nature of the MIC data. Under our framework, there were two sources of latent parameters. One resulted from censorship, as the true underlying continuous values for the censored observed MIC values are unknown. The other involved the population from which the bacterial isolates arose (resistant or non-resistant population). By tackling the censorship problem and incorporating the mixed components of the data, our Bayesian hierarchical model corrected the systematic bias of the mean MIC estimations and separated the isolates from different groups. We then added a higher level of complexity to this fundamental model setup: linear regression in the hierarchical model.
Considering the variation of the MIC values shown in the dataset, we allowed the mean log_{2}MIC to vary across different years. The Bayesian hierarchical model yielded a more robust estimation by shrinking the mean estimates towards a linear regression line in the non-resistant population and towards a common constant in the resistant population. In this way, we were able to quantify the linear pattern in the mean log_{2}MIC in the group of isolates that are often undervalued by researchers. Compared with regressing the mean log_{2}MIC to a constant in the non-resistant population, a linear trend with a non-zero slope provided a better fit to the datasets and satisfied our model assumptions.
Our model relied on several assumptions. We assumed normal distributions for resistant and non-resistant populations. This assumption was supported by the observed MIC distribution for the examples used in this paper; however, under violation of this assumption, non-parametric methods, such as spline fitting, could be used to replace the normality assumption [16]. For both resistant and non-resistant populations, we also assumed invariant variances across years, following the principle of parsimonious models. This assumption could be important when there are observations from many years but not enough observations within each year. In addition, we assumed that the proportion of the resistant population was independent across all years. We also assumed the mean log_{2}MIC had independent errors in the linear model and the constant mean model in the sub-populations. Violation of this assumption would require inclusion of a correlation structure in the proportions or the errors terms.
In conclusion, we proposed a framework of analysis of longitudinal log_{2}MIC data using a Bayesian hierarchical approach with linear trend. We not only estimated the mean of log_{2}MIC values properly and accurately but also detected a significant linear increase in the mean log_{2}MIC in the non-resistant population for some given organisms and antibiotics, potentially signaling the need for intervention. Additional directions from this proposed framework include studying the correlations among multiple antibiotics, between human and animal resistance, and between different surveillance programs for the same population. In addition, analysis of the relationship between clinical interventions and the MIC responses to the interventions based on these models is of interest.
Supporting information
S1 File. Editing certificate.
The certificate of English editing is attached in the Supporting information as an external file.
https://doi.org/10.1371/journal.pone.0220427.s002
(PDF)
Acknowledgments
We are grateful to Dr. Chaohui Yuan for providing help with the R scripts for the basic model without the linear temporal trend. We also appreciate the staff at the Veterinary Diagnostic Laboratory, Iowa State University for assistance with the swine isolate data.
References
- 1. Mouton JW. Breakpoints: current practice and future perspectives. International Journal of Antimicrobial Agents. Elsevier. 2002 Apr;19(4):323–331.
- 2. van de Kassteele J, van Santen-Verheuvel MG, Koedijk FD, van Dam AP, van der Sande MA, de Neeling AJ. New statistical technique for analyzing MIC-based susceptibility data. Antimicrobial Agents and Chemotherapy. American Society for Microbiology. 2012 Jan;56:1557–1563.
- 3. Craig BA. Modeling approach to diameter breakpoint determination. Diagnostic Microbiology and Infectious Disease. Elsevier. 2000 Mar;36(3):193–202.
- 4. Komárek A. A new R package for Bayesian estimation of multivariate normal mixtures allowing for selection of the number of components and interval-censored data. Computational Statistics & Data Analysis. Elsevier. 2009 Oct;53(12):3932–3947.
- 5. Jaspers S, Aerts M, Verbeke G, Beloeil PA. A new semi-parametric mixture model for interval censored data, with applications in the field of antimicrobial resistance. Computational Statistics & Data Analysis. Elsevier. 2014 Mar;71:30–42.
- 6. Jaspers S, Verbeke G, Böhning D, Aerts M. Application of the vertex exchange method to estimate a semi-parametric mixture model for the MIC density of Escherichia coli isolates tested for susceptibility against ampicillin. Biostatistics. Oxford University Press. 2016 Jan;17(1):94–107.
- 7. Jaspers S, Lambert P, Aerts Marc. A Bayesian approach to the semiparametric estimation of a minimum inhibitory concentration distribution. The Annals of Applied Statistics. Institute of Mathematical Statistics. 2016;10(2):906–924.
- 8. Deckert A, Gow S, Rosengren L, Leger D, Avery B, Daignault D, et al. Canadian integrated program for antimicrobial resistance surveillance (CIPARS) farm program: results from finisher pig surveillance. Zoonoses and Public Health. Wiley Online Library. 2010 Nov;57(1):71–84.
- 9. Gagliotti C, Balode A, Baquero F, Degener J, Grundmann H, Gür D, et al. Escherichia coli and Staphylococcus aureus: bad news and good news from the European Antimicrobial Resistance Surveillance Network (EARS-Net, formerly EARSS), 2002 to 2009. Euro surveillance. European Centre for Disease Prevention and Control. 2011 Mar 17;16(11).
- 10.
The National Antimicrobial Resistance Monitoring System: manual of laboratory methods (third edition). Available from: https://www.fda.gov/media/101423/download. 2016.
- 11. Aerts M, Faes C, Nysen R. Development of statistical methods for the evaluation of data on antimicrobial resistance in bacterial isolates from animals and food. EFSA Supporting Publications. Wiley Online Library. 2011 Dec;8(12).
- 12. Cummings KJ, Perkins GA, Khatibzadeh SM, Warnick LD, Aprea VA, Altier C. Antimicrobial resistance trends among Salmonella isolates obtained from horses in the northeastern United States (2001–2013). American Journal of Veterinary Research. American Veterinary Medical Association. 2016 May;77(5):505–513.
- 13. Hanon JB, Jaspers S, Butaye P, Wattiau P, Meroc E, Aerts M, et al. A trend analysis of antimicrobial resistance in commensal Escherichia coli from several livestock species in Belgium (2011–2014). Preventive Veterinary Medicine. Elsevier. 2015 Dec 1;122(4):443–452.
- 14. Tadesse DA, Zhao S, Tong E, Ayers S, Singh A, Bartholomew MJ, et al. Antimicrobial drug resistance in Escherichia coli from humans and food animals, United States, 1950–2002. Emerging Infectious Diseases. Centers for Disease Control and Prevention. 2012 May;18(5):741–749.
- 15. Mazloom R, Jaberi-Douraki M, Comer JR, Volkova V. Potential information loss due to categorization of Minimum Inhibitory Concentration frequency distributions. Foodborne Pathogens and Disease. 2018 Jan;15(1):44–54. pmid:29039983
- 16. Ruiz J, Villarreal E, Gordon M, Frasquet J, Castellanos A, Ramirez P. From MIC creep to MIC decline: Staphylococcus aureus antibiotic susceptibility evolution over the last 4 years. Clinical Microbiology and Infection. Elsevier. 2016 Aug;22(8):741–742.
- 17. Hamilton MA, Rinaldi MG. Descriptive statistical analyses of serial dilution data. Statistics in Medicine. Wiley Online Library. 1988 Apr;7(4):535–544.
- 18. Annis DH, Craig BA. Statistical properties and inference of the antimicrobial MIC test. Statistics in Medicine. Wiley Online Library. 2005 Dec 15;24(23):3631–3644.
- 19.
Dey DK, Ghosh SK, Mallick BK. Generalized Linear Models: A Bayesian Perspective. CRC Press. 2000 May.
- 20. Nguyen M, Long SW, McDermott PF, Olsen RJ, Olson R, Stevens RL, et al. Using machine learning to predict antimicrobial MICs and associated genomic features for nontyphoidal Salmonella. Journal of Clinical Microbiology. American Society for Microbiology. 2019 Jan;57(2).
- 21. Jaspers S, Komárek A, Aerts M. Bayesian estimation of multivariate normal mixtures with covariate -dependent mixing weights, with an application in antimicrobial resistance monitoring. Biometrical Journal. Wiley Online Library. 2018 Jan;60(1):7–19.