Abstract
Classifiers have been developed to help diagnose dengue fever in patients presenting with febrile symptoms. However, classifier predictions often rely on the assumption that new observations come from the same distribution as training data. If the population prevalence of dengue changes, as would happen with a dengue outbreak, it is important to raise an alarm as soon as possible, so that appropriate public health measures can be taken and also so that the classifier can be re-calibrated. In this paper, we consider the problem of detecting such a change in distribution in sequentially-observed, unlabeled classification data. We focus on label shift changes to the distribution, where the class priors shift but the class conditional distributions remain unchanged. We reduce this problem to the problem of detecting a change in the one-dimensional classifier scores, leading to simple nonparametric sequential changepoint detection procedures. Our procedures leverage classifier training data to estimate the detection statistic, and converge to their parametric counterparts in the size of the training data. In simulated outbreaks with real dengue data, we show that our method outperforms other detection procedures in this label shift setting.
Citation: Evans C, G’Sell M (2024) Sequential label shift detection in classification data: An application to dengue fever. PLoS ONE 19(9): e0310194. https://doi.org/10.1371/journal.pone.0310194
Editor: José Ramos-Castañeda, Instituto Nacional de Salud Pública: Instituto Nacional de Salud Publica, MEXICO
Received: April 15, 2024; Accepted: August 26, 2024; Published: September 16, 2024
Copyright: © 2024 Evans, G’Sell. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Full code for the data analysis and simulations presented in this paper is available at https://github.com/ciaran-evans/label-shift-detection. The data used is publicly available at https://journals.plos.org/plosntds/article?id=10.1371/journal.pntd.0003638.
Funding: Both authors were partially supported by National Science Foundation DMS1613202. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Competing interests: The authors have declared that no competing interests exist.
Introduction
Dengue fever is a viral infection which affects up to 400 million people a year [1]. To improve diagnosis, several authors have developed classifiers based on simple diagnostic and laboratory measurements, such as temperature, vomiting, and white blood cell count [2–4]. Such a classifier will necessarily be applied sequentially, making a prediction for each new patient with possible dengue symptoms, while the true dengue status may remain unobserved. However, the prevalence of dengue in a community may change quickly, due to both seasonal trends and outbreaks [5–7]. We need to detect this change as soon as possible, because a change in community prevalence impacts the quality of our classifier predictions, and also as a matter of public health.
In this paper, we propose a method to detect changes in sequentially-observed classification data, by directly using classifier predictions to construct a detection statistic. We apply our approach to simulated outbreaks of different speeds and severities, using existing dengue classification data from [3], and demonstrate competitive performance compared to other changepoint detection procedures.
In the case of a dengue outbreak, we have (a) a stream of unlabeled data (new patients) which require predictions, (b) a classifier making predictions sequentially on the new data, and (c) a change in the distribution of new data (i.e., the outbreak). To detect changes in the prevalence of dengue fever, our work leverages the label shift assumption, in which the marginal distribution of the labels changes, but the conditional distributions of the features (given the label) do not. In the dengue setting, label shift occurs if the prevalence of dengue changes but the symptoms of the disease do not, and label shift has previously been proposed as a reasonable mechanism for changes in disease prevalence [8]. From a public health perspective, an assumption like label shift is valuable because it allows us to characterize the type of change we expect to see in the population, and thereby develop a method to target this change.
Our work proposes a nonparametric procedure for detecting label shift by calculating a detection statistic with each new classifier prediction. That is, by using classifier predictions, we do not need to specify the underlying distribution of the observed data. Our method leverages the label shift assumption directly, and outperforms other nonparametric procedures when the label shift assumption is approximately correct, while requiring less knowledge about the data distributions than optimal procedures.
Below, we formally define the problem of detecting label shift in dengue data, and review existing changepoint detection methods. We then describe our proposed label shift detection procedure. Through simulations, we show that our proposed method outperforms other detection procedures when the label shift assumption holds, and can still detect changes even when the label shift assumption is violated; this is consistent with the work of [9], who showed that two-sample tests for label shift perform well for detecting other, more general changes in distribution. To demonstrate performance of our procedure with real classifier data, we use real dengue data from [3] and simulate a variety of changes in dengue prevalence. All code and data to reproduce the analysis in this paper are available at https://github.com/ciaran-evans/label-shift-detection.
Methods
Motivation: Dengue fever
Dengue, a viral infection transmitted by mosquitoes, is found in tropical and sub-tropical regions around the world, and affects up to 400 million people a year [1]. Diagnosis of dengue is important for the patient to receive appropriate treatment, and early treatment can improve prognosis. However, dengue cases are commonly mis-diagnosed [1]; while gold-standard diagnostic tests and rapid antigen tests exist, these may not always be available to healthcare providers. To assist healthcare workers in diagnosis and early detection of dengue, [3] developed a classifier based on simple diagnostic and laboratory measurements, such as temperature, vomiting, and white blood cell count. The authors recommend deploying the classifier to help diagnose dengue in patients, which entails sequentially applying the classifier to make a prediction for each new patient.
However, the prevalence of dengue in a community may change quickly, due to both seasonal trends and outbreaks [5–7]. When a sudden change in dengue prevalence occurs, it is vital to raise an alarm; as noted by [7], “strategies are needed to respond quickly to unexpected incidents.”
We apply our changepoint detection procedure to the problem of detecting a change in the prevalence of dengue, using data and classifier predictions from the work of [3]. As the prevalence of dengue changes, but the symptoms are expected to stay the same, the label shift assumption is appropriate for this change. We have simulated the changes in dengue prevalence to explore a variety of different changes, but the data used for simulation are real, and the classifier is adapted from [3].
Problem statement
To formally define the problem and our proposed method, some notation is needed. Suppose (X1, Y1), (X2, Y2), … is a sequence of feature vectors Xi and associated unobserved binary labels Yi. In our dengue example, Xi represents diagnostic measurements like white blood cell count and platelet count, while Yi represents true dengue status. We emphasize here that while the feature vectors Xi are observed, the true labels Yi are unobserved. That is, if we are diagnosing dengue fever, we observe the symptoms but not the true disease status.
Definition 1 (Changepoint). Suppose that at some time ν ≥ 0, the joint distribution of (Xi, Yi) changes, with (Xi, Yi) drawn independently from f∞,X,Y for i ≤ ν before the change, and independently from f0,X,Y for i > ν after the change occurs. (For example, a dengue outbreak causes an increase in the rate of positive cases.) We call ν the changepoint.
Our aim is to detect this change in the distribution of (Xi, Yi), using only the observed Xi; the general problem of detecting that a change has occurred in sequentially-observed data is called sequential changepoint detection, and we discuss mathematical details below. In this case, we are aided by a labeled training set drawn from the pre-change distribution. The assumption that the observations are independent is common in many changepoint detection methods, and is used when characterizing the performance of the detection procedure.
Because arbitrary changes to high-dimensional classification data (that is, data with many features recorded for each observation) may be impossible to correct or detect, it is standard to make additional assumptions on the nature of the change. Because it frequently arises in practice, we will focus on the label shift setting [10, 11], which has received recent attention in the machine learning literature [8, 9, 12, 13]. Label shift assumes that the marginal distribution of Y changes, but the conditional distribution of X|Y does not:
Definition 2 (Label shift). Let f∞,X,Y, f∞,Y, f∞,X, and f∞,X|Y=y denote the probability functions of (X, Y), Y, X, and X|Y = y respectively, under the pre-change distribution. Similarly define f0,X,Y, f0,Y, f0,X, and f0,X|Y=y under the post-change distribution. The label shift assumption is that f0,X|Y=y ≡ f∞,X|Y=y for all y, so

f0,X(x) = f0,Y(1) f∞,X|Y=1(x) + f0,Y(0) f∞,X|Y=0(x)   (1)

and

f∞,X(x) = f∞,Y(1) f∞,X|Y=1(x) + f∞,Y(0) f∞,X|Y=0(x).   (2)
Label shift is simply a change in the mixing proportion for the class distributions X|Y = 0 and X|Y = 1. In the dengue example, label shift implies that the symptoms X have a common distribution conditional on the dengue status Y, while the prevalence of dengue cases changes. An illustration of label shift with a toy univariate distribution is shown in Fig 1. Below, we show how to leverage the label shift assumption for changepoint detection.
The left-hand panel shows the conditional distributions f∞,X|Y=y(x) for y = 0 and y = 1; these conditional distributions are the same for pre- and post-change data. The middle panel shows the marginal distribution f∞,X when f∞,Y(1) = 0.25, and the right-hand panel shows the marginal distribution f0,X when f0,Y(1) = 0.4.
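To make the mixture structure in Eqs (1) and (2) concrete, here is a minimal R sketch that evaluates the pre- and post-change marginal densities when only the class prior changes from 0.25 to 0.4, matching the priors in Fig 1; the normal class-conditional densities are an illustrative assumption, not the distributions used to draw the figure.

```r
# A label shift change: only the class prior changes; the class-conditional
# densities stay fixed (illustrative normals, not the densities from Fig 1).
f_x_given_y0 <- function(x) dnorm(x, mean = 0, sd = 1)
f_x_given_y1 <- function(x) dnorm(x, mean = 2, sd = 1)

# Mixture form of Eqs (1) and (2): f_X(x) = pi1 * f(x|Y=1) + (1 - pi1) * f(x|Y=0)
marginal_density <- function(x, pi1) {
  pi1 * f_x_given_y1(x) + (1 - pi1) * f_x_given_y0(x)
}

x_grid <- seq(-4, 6, length.out = 200)
f_pre  <- marginal_density(x_grid, pi1 = 0.25)  # pre-change:  f_inf,Y(1) = 0.25
f_post <- marginal_density(x_grid, pi1 = 0.40)  # post-change: f_0,Y(1)   = 0.40
```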
Sequential changepoint detection
To detect a change in the unlabeled sequence X1, X2, …, classical changepoint detection procedures use a recursive detection statistic

Rt = Ψ(Rt−1) λ(Xt),   (3)

where λ(Xt) = f0,X(Xt)/f∞,X(Xt) is the likelihood ratio at time t, Ψ is an update function, and the initial value is R0 = x. For example, the CUSUM procedure has Ψ(r) = max{1, r} and x = 1, while the Shiryaev-Roberts procedure has Ψ(r) = 1 + r and x = 0 [14]. A change is detected when Rt crosses a pre-specified threshold A, with stopping time

Tx(A) = inf{t ≥ 1 : Rt ≥ A}.   (4)
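To illustrate the recursion in Eq (3) and the stopping rule in Eq (4), the following R sketch runs either the CUSUM or Shiryaev-Roberts update until a threshold A is crossed. The likelihood-ratio function `lambda` is assumed to be supplied by the user; all names here are hypothetical.

```r
# Recursive detection statistic of Eq (3) with stopping rule of Eq (4).
# `lambda` is a user-supplied likelihood ratio function (hypothetical here).
run_detection <- function(x_stream, lambda, A,
                          method = c("cusum", "shiryaev-roberts")) {
  method <- match.arg(method)
  Psi <- if (method == "cusum") function(r) max(1, r) else function(r) 1 + r
  r   <- if (method == "cusum") 1 else 0          # initial value x
  for (t in seq_along(x_stream)) {
    r <- Psi(r) * lambda(x_stream[t])             # R_t = Psi(R_{t-1}) * lambda(X_t)
    if (r >= A) return(t)                         # T_x(A): first t with R_t >= A
  }
  Inf                                             # no alarm within the observed stream
}
```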
However, the pre- and post-change data distributions are rarely known in practice, and a variety of nonparametric alternatives have been proposed. Several authors have adapted nonparametric hypothesis tests to the changepoint detection problem, such as Kolmogorov-Smirnov tests [15], Cramér-von Mises tests [16], and graph-based nearest-neighbors tests [17, 18]. Others have replaced the likelihood ratio λ with an estimate, to detect specific shifts in the mean or variance [19–23] or a change to a stochastically larger/smaller distribution [24–27]. Several papers have also estimated λ using samples from the pre- and post-change distributions, either with nonparametric estimation of the densities [28] or by directly estimating the ratio [29–35].
In general, we expect that samples from the post-change distribution are unavailable until the (unknown) changepoint occurs. In the label shift case, however, the need for post-change samples is reduced to knowing the post-change marginal distribution of Y (see Eq (1)). In this manuscript, we propose a simple estimate of the likelihood ratio that leverages the label shift assumption.
Operating characteristics
The performance of sequential detection procedures, with stopping time Tx(A) at threshold A, is typically assessed by two operating characteristics: the average time to false alarm E∞[Tx(A)] (also called the average run length, or ARL), and the average detection delay E0[Tx(A)], which are the expected stopping times under the pre- and post-change distributions respectively. The goal is to minimize the average detection delay, subject to a lower bound on the average time to false alarm, and the CUSUM and Shiryaev-Roberts procedures are known to be optimal or approximately optimal for this problem [36–38]. We therefore compare average detection delay and average time to false alarm as a way to assess procedures in this manuscript.
Proposed method
Detection procedures which are at least approximately optimal for detecting a change in the unlabeled sequence X1, X2, … require the likelihood ratio λ(Xt) = f0,X(Xt)/f∞,X(Xt). While this likelihood ratio is hard to estimate in general, under the label shift assumption the likelihood ratio has a simple expression:
λ(x) = f0,X(x)/f∞,X(x) = (π0/π∞) P∞(Y = 1|X = x) + ((1 − π0)/(1 − π∞)) P∞(Y = 0|X = x),   (5)

with pre- and post-change proportions π∞ = P∞(Y = 1) and π0 = P0(Y = 1). Our proposed method for detecting label shift is straightforward: use labeled training data to estimate P∞(Y = 1|X = x). Since label shift is a concern precisely because we wish to apply a classifier to new data, the existence of labeled training data is no burden.

Let our labeled training set (from the pre-change distribution) contain m observations, and suppose we use the labeled training set to train a classifier ĝm with ĝm(x) ≈ P∞(Y = 1|X = x). For example, ĝm could be a logistic regression classifier, a random forest, or a neural network. Given π∞ and π0, our estimated likelihood ratio is

λ̂g,m(x) = (π0/π∞) ĝm(x) + ((1 − π0)/(1 − π∞)) (1 − ĝm(x)),   (6)

where the subscripts g and m denote dependence on the classifier and the size of the training set.
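A minimal sketch of Eq (6), assuming the trained classifier is wrapped in a function `g_hat` that returns predicted probabilities for P∞(Y = 1|X = x); the names are hypothetical.

```r
# Estimated likelihood ratio of Eq (6), built from a classifier's predicted
# probabilities. `g_hat` is a hypothetical function returning P(Y = 1 | X = x).
estimate_lr <- function(g_hat, pi_inf, pi0) {
  function(x) {
    p1 <- g_hat(x)
    (pi0 / pi_inf) * p1 + ((1 - pi0) / (1 - pi_inf)) * (1 - p1)
  }
}
```

The resulting function can be passed as `lambda` to a CUSUM or Shiryaev-Roberts recursion such as the sketch above.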
Label shift detection with known π0.
To detect a label shift change from the unlabeled sequence X1, X2, …, we calculate the classifier prediction ĝm(Xt) for each new observation. Let

R̂t = Ψ(R̂t−1) λ̂g,m(Xt),   (7)

and

T̂g,m(A) = inf{t ≥ 1 : R̂t ≥ A},   (8)

for some detection threshold A. Our goal is to minimize the detection delay E0[T̂g,m(A)], while controlling the time to false alarm E∞[T̂g,m(A)]. The process is summarized in Fig 2.
(a) Data X1, X2, … is observed from the pre-change distribution f∞,X and the post-change distribution f0,X. At each time t, a prediction ĝm(Xt) is made. If f0,X and f∞,X are known, then a detection statistic Rt can be calculated using the likelihood ratio λ(Xt). A change is detected when Rt ≥ A (or equivalently log Rt ≥ log A). (b) When the true likelihood ratio λ is unknown, we can use an estimate λ̂ instead; R̂t is the resulting detection statistic. When λ̂ is close to λ, the stopping times T̂(A) and T(A) are also expected to be close. (c) When a change is detected after the true changepoint ν, then T(A) − ν is the detection delay. (d) When T(A) < ν, then we have a false alarm, and T(A) is the time to false alarm.
Using the estimated likelihood ratio λ̂g,m will not improve detection performance over the true likelihood ratio λ. However, as classifier performance improves (that is, as ĝm(x) approaches P∞(Y = 1|X = x)), we expect that performance of our detection method will approach the optimal performance with the true likelihood ratio.
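Putting the pieces together, the following is a minimal end-to-end sketch of the proposed procedure: fit a classifier on labeled pre-change data (here an LDA classifier via `MASS::lda`, as in the simulations below) and run the CUSUM recursion on the estimated likelihood ratio from Eq (6). The data layout (a data frame `train` with a binary column `y` plus feature columns, and a stream `x_stream` with the same feature columns) is an assumption of the sketch.

```r
library(MASS)  # for lda()

# Proposed classifier-based CUSUM: train on labeled pre-change data, then
# monitor the unlabeled stream with the estimated likelihood ratio of Eq (6).
classifier_cusum <- function(train, x_stream, pi_inf, pi0, A) {
  train$y <- factor(train$y, levels = c(0, 1))
  fit <- lda(y ~ ., data = train)                 # g_hat(x): estimate of P(Y = 1 | X = x)
  r <- 1                                          # CUSUM initial value
  for (t in seq_len(nrow(x_stream))) {
    p1 <- predict(fit, newdata = x_stream[t, , drop = FALSE])$posterior[, "1"]
    lr <- (pi0 / pi_inf) * p1 + ((1 - pi0) / (1 - pi_inf)) * (1 - p1)
    r  <- max(1, r) * lr                          # CUSUM update (Eq (7))
    if (r >= A) return(t)                         # stopping time (Eq (8))
  }
  Inf                                             # no alarm within the stream
}
```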
Remark 1. Using classifier predictions to estimate the likelihood ratio is natural in the label shift setting, as a classifier is already constructed and being applied to make predictions for new data. However, an advantage of the label shift setting is that it supports a variety of other approaches to likelihood ratio estimation. For example, kernel mean matching [34] and uLSIF [30] rely on both pre- and post-change data; under the label shift assumption, a post-change sample can be generated by re-sampling or re-weighting the training data when π0 is known. We compare this approach to our classifier-based likelihood ratio estimate in simulations below.
Label shift detection with unknown π0.
The estimated likelihood ratio in Eq (6) requires the post-change fraction of positive cases π0. While we have access to labeled pre-change training data to train our classifier ĝm, we do not expect a sample of post-change data (labeled or not), and so the post-change parameter π0 may be unknown. To overcome an unknown π0, we mix over a set Π0 ⊂ [0, 1] of potential values for the post-change parameter, with a weight distribution w. Here we are inspired by the work of [39], which deals with the computational complexity involved in the integration by considering a window-limited approach that uses only a fixed number of the most recent observations. Let Π0 be the set of possible values for π0, and let w(π0) be a density on Π0. Each potential π0 results in a different likelihood ratio function λπ0. Lai defines a CUSUM-type mixture stopping rule with detection statistic Rt,w and stopping time Tw(A) [39]:

Rt,w = max over k with t − mα < k ≤ t of ∫Π0 [∏i=k,…,t λπ0(Xi)] w(π0) dπ0,   Tw(A) = inf{t ≥ 1 : Rt,w ≥ A},   (9)

where mα is the window size. In our label shift setting, we have

λπ0(x) = (π0/π∞) P∞(Y = 1|X = x) + ((1 − π0)/(1 − π∞)) P∞(Y = 0|X = x).   (10)

For each π0, we replace λπ0 with its estimate λ̂π0,g,m from Eq (6), yielding the detection statistic R̂t,w,g,m and stopping time T̂w,g,m(A):

R̂t,w,g,m = max over k with t − mα < k ≤ t of ∫Π0 [∏i=k,…,t λ̂π0,g,m(Xi)] w(π0) dπ0,   T̂w,g,m(A) = inf{t ≥ 1 : R̂t,w,g,m ≥ A}.   (11)

When λ̂π0,g,m is a good estimate of λπ0, we expect T̂w,g,m(A) in (11) to be close to Tw(A) in (9).
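The following sketch approximates the mixture statistic in Eq (11) by discretizing the integral over Π0 on a grid of candidate values with weights summing to one; the stream is taken to be the classifier's predicted probabilities, and all names are hypothetical.

```r
# Window-limited mixture CUSUM, a sketch of Eq (11). The stream is taken to be
# the classifier's predicted probabilities; pi0_grid and weights discretize the
# integral over Pi_0 with weight function w (all names are hypothetical).
mixture_cusum <- function(p_stream, pi_inf, pi0_grid, weights, A, window = 100) {
  n_pi <- length(pi0_grid)
  log_lr <- matrix(0, nrow = 0, ncol = n_pi)
  for (t in seq_along(p_stream)) {
    p1 <- p_stream[t]
    # Estimated likelihood ratio (Eq (6)) at each candidate pi0.
    lr_t <- (pi0_grid / pi_inf) * p1 + ((1 - pi0_grid) / (1 - pi_inf)) * (1 - p1)
    log_lr <- rbind(log_lr, log(lr_t))
    if (nrow(log_lr) > window) log_lr <- log_lr[-1, , drop = FALSE]
    # For each start k in the window, sum log-LRs from k to t (rows reversed so
    # row j corresponds to k = t - j + 1), mix over pi0, and take the max over k.
    cum <- apply(log_lr[nrow(log_lr):1, , drop = FALSE], 2, cumsum)
    if (is.null(dim(cum))) cum <- matrix(cum, nrow = 1)
    if (max(exp(cum) %*% weights) >= A) return(t)   # stopping time T_hat_w(A)
  }
  Inf
}
```

For example, Π0 = [0.6, 0.8] could be discretized as `pi0_grid = seq(0.6, 0.8, by = 0.01)` with uniform weights.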
Remark 2. An alternative to mixing over Π0 is to maximize over possible values of π0 at each time step. This is the generalized likelihood ratio (GLR) approach, and has also been studied in previous research (see, e.g., [40]). For exponential families, some optimality properties of the GLR have been shown, but it is typically harder to control the average run length to false alarm [41]. Another option is to perform detection with a worst-case choice of π0 [42], which provides a worst-case bound on detection delay.
Simulations
We investigate the empirical performance of the classifier-based label shift detection procedure with the likelihood ratio estimate in Eq (6). Our likelihood ratio estimate depends on a classifier, and for simplicity we will use an LDA classifier, since it is easy to control whether the LDA assumptions are satisfied. For comparison, we consider several other detection procedures, which represent different approaches to changepoint detection. These procedures are summarized below and in Table 1. Because our label shift detection procedure is designed specifically for label shift, it leverages more information than the other nonparametric detection procedures. In particular, as summarized in Table 1, estimating the likelihood ratio with Eq (6) assumes that the label shift assumption holds, and the classifier performs well. Through simulations, we show that detection with Eq (6) outperforms the other nonparametric procedures when these assumptions are met, and can still perform well when the assumptions are violated. While we use a simple setting for simulations, we also apply the same methods to detect a change in dengue prevalence using the data and classifier from [3], with similar results to our simulations in this section.
We compare the following methods:
- Classifier-based CUSUM This is the nonparametric method proposed in the Methods, with likelihood ratio estimate (6). For the purposes of simulations, the classifier ĝm in Eq (6) is an LDA classifier. Here we use a CUSUM procedure, so Ψ(r) = max{1, r}.
- Optimal CUSUM The optimal CUSUM procedure [43] uses the true likelihood ratio, and can be implemented when the true likelihood ratio is known.
- uLSIF CUSUM uLSIF [30] is a nonparametric method for estimating the likelihood ratio, by maximizing an empirical divergence. As described above, uLSIF can be used with training data under the label shift assumption by re-weighting or re-sampling training points, but it does not exploit the label shift structure of the likelihood ratio. A variety of similar density ratio estimation approaches exist, including KLIEP and kernel mean matching [32, 34, 44], and we take uLSIF as a representative. Here we use the densratio package [45] to implement uLSIF, and employ the resulting estimate in a CUSUM procedure.
- CPM [12] perform nonparametric label shift detection using the CPM framework described in [46, 47]. The CPM framework detects changes in a sequence of univariate data using repeated nonparametric tests; [12] applied repeated Cramér-von Mises tests to a sequence of cosine divergences calculated between new data and training data. We evaluate CPM applied to both the classifier predictions and the cosine divergences used by [12]. CPM stopping times are calculated with the cpm package [47].
- kNN [17, 18] propose a sequential graph-based k-nearest neighbors (kNN) detection procedure, based on repeated nearest-neighbor two-sample tests in a sliding window. Note that while the kNN approach uses training data, only a fixed window of data is considered. Following parameter choices similar to those in [18], we set the window size to 200 and the number of nearest neighbors to k = 5. Stopping times are calculated with the gStream package [48].
Metrics.
Performance of each detection procedure is measured by the average detection delay E0[T(A)] (for CUSUM procedures, this corresponds to Lorden's [36] detection delay). As is standard, we compare detection delays with each method calibrated to have the same average run length E∞[T(A)]. We use a target average run length that is common in the sequential detection literature, and expected stopping times are estimated via Monte Carlo simulation.
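As a sketch of how this calibration can be carried out, the following Monte Carlo bisection adjusts the threshold A until simulated pre-change streams give the target average run length; `sim_stream` and `detector` are hypothetical placeholders for a pre-change stream generator and a detection procedure returning a stopping time.

```r
# Monte Carlo calibration of the detection threshold A: estimate the average
# run length E_inf[T(A)] from simulated pre-change streams, then bisect on
# log(A) until the target ARL is reached. `sim_stream` should return one
# pre-change stream long enough that an alarm always occurs; `detector` takes
# a stream and a threshold and returns the stopping time.
estimate_arl <- function(A, sim_stream, detector, n_rep = 200) {
  mean(replicate(n_rep, detector(sim_stream(), A)))
}

calibrate_threshold <- function(target_arl, sim_stream, detector,
                                log_A_lo = 0, log_A_hi = 10, n_iter = 20) {
  for (i in seq_len(n_iter)) {
    log_A <- (log_A_lo + log_A_hi) / 2
    if (estimate_arl(exp(log_A), sim_stream, detector) < target_arl) {
      log_A_lo <- log_A           # alarms too often: raise the threshold
    } else {
      log_A_hi <- log_A           # run length already long enough: lower it
    }
  }
  exp((log_A_lo + log_A_hi) / 2)
}
```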
Scenarios.
Under the label shift assumption, the classifier-based CUSUM procedure uses classifier predictions to estimate the likelihood ratio. To compare performance of the different detection procedures, we use two different simulation scenarios. In the first scenario, we change the training sample size and the performance of the classifier (by changing the distribution of the data Xi and violating LDA assumptions). In the second scenario, we change the performance of the classifier and the suitability of the label shift assumption.
- Scenario 1: Data is generated as X|(Y = 0) ∼ N(μ0, Σ0) and X|(Y = 1) ∼ N(μ1, Σ1). In all simulations, π∞ = 0.4, π0 = 0.7, μ0 = [0, 0], μ1 = [1.5, 1.5], and Σ0 = I. Training data is simulated from the pre-change distribution, and used to train the LDA classifier, estimate the uLSIF likelihood ratio, and start up the CPM and kNN detection statistics. We consider training sample sizes m ∈ {200, 1000, 5000}, together with several choices of Σ1 that vary how strongly the LDA assumption Σ1 = Σ0 is violated. (A minimal data-generation sketch is given after this list.)
- Scenario 2: Pre-change data is generated as X|(Y = 0) ∼ N(μ∞,0, Σ0) and X|(Y = 1) ∼ N(μ∞,1, Σ1), while post-change data is generated as X|(Y = 0) ∼ N(μ0,0, Σ0) and X|(Y = 1) ∼ N(μ0,1, Σ1). In all simulations, π∞ = 0.4, π0 = 0.7, μ∞,0 = [0, 0], μ∞,1 = [1.5, 1.5], and Σ0 = I. Training data is simulated from the pre-change distribution, and used to train the LDA classifier, estimate the uLSIF likelihood ratio, and start up the CPM and kNN detection statistics. We consider the following pairs for μ0,0 and μ0,1: μ0,0 = [0.5, 0.5] and μ0,1 = [1, 1]; μ0,0 = [0.75, 0.75] and μ0,1 = [0.75, 0.75]; and μ0,0 = [1, 1] and μ0,1 = [0.5, 0.5].
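A minimal data-generation sketch for Scenario 1 (Scenario 2 differs only in also shifting the class means after the changepoint); `MASS::mvrnorm` draws the bivariate normal features, and the parameter values follow the description above.

```r
library(MASS)  # for mvrnorm()

# Scenario 1: bivariate normal classes with prior pi1 on Y = 1.
gen_label_shift <- function(n, pi1, mu0 = c(0, 0), mu1 = c(1.5, 1.5),
                            Sigma0 = diag(2), Sigma1 = diag(2)) {
  y <- rbinom(n, 1, pi1)
  x <- matrix(NA_real_, n, 2)
  x[y == 0, ] <- mvrnorm(sum(y == 0), mu0, Sigma0)
  x[y == 1, ] <- mvrnorm(sum(y == 1), mu1, Sigma1)
  data.frame(x1 = x[, 1], x2 = x[, 2], y = y)
}

train  <- gen_label_shift(1000, pi1 = 0.4)  # pre-change training data (pi_inf = 0.4)
stream <- gen_label_shift(500,  pi1 = 0.7)  # post-change stream (pi0 = 0.7)
```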
Case study: Dengue fever
Data.
Data comes from [3], who collected information on 5720 febrile patients aged 15 or younger in three Vietnamese hospitals. Of these patients, 30% had dengue. The authors recorded their true dengue status (using a gold-standard test), the results of an NS1 rapid antigen test, and a variety of physical measurements for classification with a logistic regression classifier. This dataset is anonymized and publicly available, and neither author of the present study was involved in [3], nor did we have any means of identifying any of the patients in these studies.
Classifier.
We use 1000 patients as training data for the classifier, and save the rest for evaluating our classifier and estimating changepoint detection performance. With the training set, we construct a logistic GAM classifier to predict true dengue status with the following covariates: vomiting (yes/no), skin bleeding (yes/no), BMI, age, temperature, white blood cell count, hematocrit, and platelet count. As in [3], the ROC curve has an AUC of approximately 0.8. The explanatory variables chosen here were included because they fit the label shift assumption in exploratory data analysis, and previous research [3, 49] demonstrates that adding additional variables to the model does not improve predictive performance or generalizability to new populations. The use of logistic regression (and variants) is also common in the dengue prediction literature [3, 4, 49–51], in which logistic regression methods have been found to be comparable to or better than other approaches.
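As a sketch of this type of classifier, the following fits a logistic GAM with the mgcv package; the simulated stand-in data and column names are hypothetical placeholders, not the variables of the released dataset.

```r
library(mgcv)  # for gam() with smooth terms

# Illustrative stand-in data with the same covariate types; the column names
# are hypothetical placeholders, not the variable names of the released data.
set.seed(1)
n <- 1000
train <- data.frame(
  vomiting = rbinom(n, 1, 0.3), skin_bleeding = rbinom(n, 1, 0.2),
  bmi = rnorm(n, 18, 3), age = runif(n, 1, 15),
  temperature = rnorm(n, 38.5, 1), wbc = rnorm(n, 7, 3),
  hematocrit = rnorm(n, 38, 5), platelet = rnorm(n, 200, 80)
)
train$dengue <- rbinom(n, 1, plogis(-0.5 - 0.2 * (train$wbc - 7)))

# Logistic GAM: binary covariates enter linearly, continuous covariates as smooths.
fit <- gam(dengue ~ vomiting + skin_bleeding + s(bmi) + s(age) + s(temperature) +
             s(wbc) + s(hematocrit) + s(platelet),
           family = binomial, data = train)

# Predicted probabilities g_hat(x), which feed the estimated likelihood ratio in Eq (6).
p_hat <- predict(fit, type = "response")
```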
Scenarios.
To assess change detection, we simulate a change in the prevalence of dengue by resampling the 4720 patients not used for training. As the group of patients in the study aims to represent the population of patients who would be tested for dengue, we take the sample proportion of 30% as our baseline dengue prevalence among patients who would be tested. The degree of change in this prevalence, when an outbreak occurs, depends on the magnitude of the outbreak and the baseline prevalence in the population. Magnitude of change varies; for example, Hanoi, Vietnam saw roughly a five-fold increase in 2009 and 2015 [52, 53], while Kaohsiung City, Taiwan saw a 15-fold increase in 2014 [7]. Baseline prevalence in the full population also varies depending on location: for example, [5] shows approximately 1 in 1 million for certain areas of Thailand, whereas [7] show roughly 1 in 10000 on average in Taiwan. For Vietnam, [52] report roughly 1 in 10000 to 1 in 1000 in Hanoi, with a peak of 384 per 100000 in 2009. For our purposes, we consider two label shift changes in prevalence:
- Abrupt change: We simulate an abrupt 5-fold increase, and take the baseline prevalence in the population to be roughly 1 in 10000. Applying Bayes' rule, this gives a post-change prevalence of about 68% in our study population, and so we simulate a change from 30% to 68% and assess our ability to detect this shift (the arithmetic is sketched after this list).
- Gradual change: When the change occurs, prevalence increases gradually, rather than abruptly. Here, prevalence in the study population changes smoothly from 30% to 68% over the course of 100 observations.
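For reference, a sketch of the Bayes' rule arithmetic behind the 30% to 68% figure: under label shift, the probability that a patient presents for testing given their true dengue status is unchanged, so the odds of dengue among tested patients scale by the ratio of population odds.

```r
# Post-change prevalence among tested patients implied by a five-fold outbreak,
# assuming test-seeking behaviour given true dengue status is unchanged
# (so the study-population odds scale by the population odds ratio).
pre_study_odds <- 0.30 / 0.70                 # baseline odds among tested patients
pop_pre  <- 1 / 10000                         # assumed baseline population prevalence
pop_post <- 5 * pop_pre                       # five-fold increase
odds_ratio <- (pop_post / (1 - pop_post)) / (pop_pre / (1 - pop_pre))
post_study_prev <- pre_study_odds * odds_ratio / (1 + pre_study_odds * odds_ratio)
round(post_study_prev, 2)                     # approximately 0.68
```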
Methods for comparison in dengue setting.
We compare the methods discussed above in the simulations to detect the change in dengue prevalence. The classifier CUSUM detection procedure is implemented using Eq (6) with the predicted probabilities from the dengue classifier described above. We also compare CUSUM with binarized predictions, using both a threshold of 0.5 and the threshold of 0.33, which maximizes sensitivity + specificity. The optimal CUSUM procedure uses the true dengue status, which is observable if gold-standard tests are available, and we also include CUSUM with binary predictions from the NS1 rapid antigen test, which again may not be available. The rapid test has a specificity of approximately 99% and a sensitivity of 70% [3], compared with a specificity and sensitivity of 82% and 70% for the binarized classifier at threshold 0.33. As in the simulations, we also compare CPM using the classifier predicted probabilities, and CPM with divergences. uLSIF was considered but failed to consistently estimate the likelihood ratio, while kNN was excluded because it performed worse than the other methods examined. Finally, as the post-change parameter is typically unknown, we include the mixing procedure described in Eq (11). We use Π0 = [0.6, 0.8], which corresponds to a 3.5-fold to 9-fold increase in prevalence.
For the abrupt change scenario, all methods are compared. For the gradual change, we compare the mixture CUSUM procedure to CPM with classifier predictions, as these two methods perform well at detecting an abrupt change and do not require knowledge of the post-change parameter, and we include optimal CUSUM for reference.
Results
Simulation results
Table 2 shows the results for Scenario 1, when the label shift assumption holds. We can see that when the LDA assumptions are met (specifically Σ1 = Σ0 = I), LDA performs very close to the optimal CUSUM procedure, as we would predict. Performance of the LDA detection procedure relative to the optimal CUSUM procedure declines as the assumption that Σ1 = Σ0 is violated, but is still better than the other nonparametric methods. This suggests that if the label shift assumption holds, the likelihood ratio estimate in Eq (6) is a good choice for detecting the change, even if the classifier is mis-specified. Detection with the uLSIF procedure improves with training sample size m, as it becomes easier to estimate the likelihood ratio function and variability in the likelihood ratio estimate decreases. CPM also performs better as the sample size increases, as training data is used to construct the detection statistic. While the kNN method makes no assumptions about the change or the distribution of data, the cost of this flexibility is a decrease in detection performance.
Table 3 shows the results for Scenario 2, when the label shift assumption is violated. When the label shift assumption is approximately true (μ0,0 = [0.5, 0.5] and μ0,1 = [1, 1]), we can see that LDA detection is comparable to uLSIF and CPM. However, the LDA procedure is more sensitive to large departures from the label shift assumption, for which methods with fewer assumptions perform better. Overall, CPM with classifier predictions performs well, as the classifier predictions are a useful summary of the data even when label shift doesn’t hold.
Case study results: Detecting a dengue outbreak
Abrupt change.
Fig 3 and Table 4 show the relationship between the average time to false alarm E∞[T(A)] and the average detection delay E0[T(A)] for each method (uLSIF is not shown in Fig 3 because the detection delays are too large). As expected, the true dengue status and the rapid antigen test give the best detection performance. The predicted probabilities outperform the binarized predictions, as binarization throws away information about the likelihood ratio. The two binarized predictions are close, but the optimal threshold, which maximizes sensitivity + specificity, performs better. Mixture CUSUM and CUSUM with the predicted probabilities perform equally well, likely because all π0 ∈ Π0 = [0.6, 0.8] provide similar results. While CPM performs worse than CUSUM with predicted probabilities, it still provides a competitive alternative that requires no assumptions on the post-change prevalence. uLSIF has difficulty estimating the likelihood ratio, and performs substantially worse than the other methods.
Left: Comparison of detection performance for CUSUM procedures using different detection procedures, for a change in dengue prevalence from π∞ = 0.3 to π0 = 0.68. For ease, the method labels for the plot are displayed in descending order of detection delay. Right: Comparison of detection performance when π0 changes gradually from 0.3 to 0.68.
Gradual change.
Fig 3 shows the relationship between the average time to false alarm E∞[T(A)] and the average detection delay E0[T(A)] for each method. Detection delays are longer for all methods under gradual change than under abrupt change, because the magnitude of change is initially smaller. However, each method can raise an alarm reasonably quickly. This is valuable because real changes in prevalence are expected to be continuous, rather than an abrupt switch from one prevalence to another. While the classic CUSUM procedure, and the nonparametric methods discussed in this paper, are designed to detect an abrupt change, Fig 3 demonstrates that these methods are sensitive to other types of change too.
Discussion
Many previous papers have proposed classifiers to help diagnose dengue fever. When these classifiers are applied sequentially over time, it is important to detect any change in the distribution of the data. First, distributional shifts can affect the validity of classifier predictions, and second, a change in distribution may suggest a problem like a disease outbreak. In this paper, we consider procedures for detecting label shift, which can occur when the prevalence of a disease changes over time, but the symptoms of the disease remain the same.
As we focus on detecting changes in classification data, it is natural to use the classifier predictions in our detection procedure. Here we propose a simple, nonparametric sequential changepoint detection method that uses the classifier predictions to approximate the true likelihood ratio (6). Our procedure requires no additional estimation or training, assuming only that a reasonable value of the post-change prevalence π0 can be specified. Furthermore, when this post-change parameter is unknown, we combine our nonparametric procedure with Lai’s mixture CUSUM approach [39], and mix over the unknown prevalence.
Performance of the detection procedure then depends directly on classifier performance. Through simulations, we illustrate that our proposed detection procedure outperforms other nonparametric methods when the label shift assumption holds, and still achieves comparable performance when the label shift assumption is violated. The same holds true when these methods are applied to real dengue classification data, in which we apply a classifier adapted from [3] to detect a simulated dengue outbreak. First, we see that improved classifier performance results in improved detection performance: if the gold-standard dengue test is unavailable, only the NS1 rapid antigen test (which has better specificity than the classifier from [3]) outperforms our proposed procedure. Second, other nonparametric procedures respond more slowly to the outbreak, because they leverage less information about a change in prevalence.
Due to limitations of the available data, the changes in dengue prevalence in this manuscript have been simulated to illustrate the proposed changepoint detection procedure. A valuable direction for future research would be to monitor features of dengue diagnosis over the course of population outbreaks and assess the validity of the label shift assumption used here. However, while the label shift assumption might not hold exactly in real data, [9] found that testing for label shift is still a useful way to find more general changes in distribution. This is supported by our simulation results, in which our label shift detection procedure still performs well under mild violations of the label shift assumption.
We also note that the classifier considered in the motivating example in this manuscript was trained on data from [3], which is specific to several hospitals in Vietnam. To assess the generalizability of dengue classifiers between different populations, [49] studied classifier performance across five different publicly available dengue datasets, and their results show that dengue predictions do not always generalize to new populations. Given the importance of classifier performance in the changepoint detection procedure presented here, we recommend that for now, classifiers be constructed on relevant training data from the population of interest.
Finally, we note that the classic changepoint detection methods discussed here assume a sequence of independent observations. The purpose of this paper is to show how classifier predictions can be incorporated into a changepoint detection procedure based on the likelihood ratio, and classifier predictions could also be used in other detection procedures which rely on the likelihood ratio. Independence may also be a reasonable approximation in many settings. For example, dengue fever is typically transmitted through mosquitoes after a period of about a week, and is not spread by close contact, respiratory droplets, or bodily fluids (unless blood is involved); there may therefore be much less dependence between cases.
Supporting information
S1 Data. Full data and code for all results in this manuscript.
Full code for the data analysis and simulations presented in this paper is available at https://github.com/ciaran-evans/label-shift-detection. The data used in the dengue case study was made publicly available by [3], and a copy is provided in the repository with the code.
https://doi.org/10.1371/journal.pone.0310194.s001
(ZIP)
Acknowledgments
The authors gratefully acknowledge Samuel Ackerman, Chris Genovese, Aaditya Ramdas, Alessandro Rinaldo, and Zack Lipton for helpful discussions and feedback.
References
- 1. WHO. Dengue and severe dengue; 2020. Available from: https://www.who.int/en/news-room/fact-sheets/detail/dengue-and-severe-dengue.
- 2. Tanner L, Schreiber M, Low JG, Ong A, Tolfvenstam T, Lai YL, et al. Decision tree algorithms predict the diagnosis and outcome of dengue fever in the early phase of illness. PLOS Neglected Tropical Diseases. 2008;2(3):e196. pmid:18335069
- 3. Tuan NM, Nhan HT, Chau NVV, Hung NT, Tuan HM, Van Tram T, et al. Sensitivity and specificity of a novel classifier for the early diagnosis of dengue. PLOS Neglected Tropical Diseases. 2015;9(4):e0003638. pmid:25836753
- 4. Ngim CF, Husain SMT, Hassan SS, Dhanoa A, Ahmad SAA, Mariapun J, et al. Rapid testing requires clinical evaluation for accurate diagnosis of dengue disease: A passive surveillance study in Southern Malaysia. PLOS Neglected Tropical Diseases. 2021;15(5):e0009445. pmid:34014983
- 5. Wiwanitkit V. An observation on correlation between rainfall and the prevalence of clinical cases of dengue in Thailand. Journal of Vector Borne Diseases. 2006;43(2):73. pmid:16967819
- 6. Garg A, Garg J, Rao Y, Upadhyay G, Sakhuja S. Prevalence of dengue among clinically suspected febrile episodes at a teaching hospital in North India. Journal of Infectious Diseases and Immunity. 2011;3(5):85–89.
- 7. Hsu JC, Hsieh CL, Lu CY. Trend and geographic analysis of the prevalence of dengue in Taiwan, 2010–2015. International Journal of Infectious Diseases. 2017;54:43–49. pmid:27865829
- 8. Lipton ZC, Wang YX, Smola A. Detecting and correcting for label shift with black box predictors. In: International Conference on Machine Learning. PMLR; 2018. p. 3122–3130.
- 9. Rabanser S, Günnemann S, Lipton Z. Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift. Advances in Neural Information Processing Systems 32. 2019.
- 10. Saerens M, Latinne P, Decaestecker C. Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure. Neural Computation. 2002;14(1):21–41. pmid:11747533
- 11. Storkey A. When training and test sets are different: characterizing learning transfer. Dataset Shift in Machine Learning. 2009; p. 3–28.
- 12. Ackerman S, Dube P, Farchi E. Sequential Drift Detection in Deep Learning Classifiers. arXiv preprint arXiv:2007.16109. 2020.
- 13. Azizzadenesheli K, Liu A, Yang F, Anandkumar A. Regularized learning for domain adaptation under label shifts. arXiv preprint arXiv:1903.09734. 2019.
- 14. Polunchenko AS, Tartakovsky AG. State-of-the-art in sequential change-point detection. Methodology and Computing in Applied Probability. 2012;14(3):649–684.
- 15. Madrid Padilla OH, Athey A, Reinhart A, Scott JG. Sequential nonparametric tests for a change in distribution: an application to detecting radiological anomalies. Journal of the American Statistical Association. 2019;114(526):514–528.
- 16. Ross GJ, Adams NM. Two nonparametric control charts for detecting arbitrary distribution changes. Journal of Quality Technology. 2012;44(2):102–116.
- 17. Chen H. Sequential change-point detection based on nearest neighbors. The Annals of Statistics. 2019;47(3):1381–1407.
- 18. Chu L, Chen H. Sequential change-point detection for high-dimensional and non-Euclidean data. IEEE Transactions on Signal Processing. 2022; 70:4498–4511.
- 19. Brodsky E, Darkhovsky BS. Nonparametric methods in change point problems. vol. 243. Springer Science & Business Media; 1993.
- 20. Brodsky E, Darkhovsky BS. Non-parametric statistical diagnosis: problems and methods. vol. 509. Springer Science & Business Media; 2000.
- 21. Tartakovsky AG, Polunchenko AS, Sokolov G. Efficient computer network anomaly detection by changepoint detection methods. IEEE Journal of Selected Topics in Signal Processing. 2012;7(1):4–11.
- 22. Tartakovsky AG, Rozovskii BL, Blažek RB, Kim H. Detection of intrusions in information systems by sequential change-point methods. Statistical Methodology. 2006;3(3):252–293.
- 23. Tartakovsky AG, Rozovskii BL, Blazek RB, Kim H. A novel approach to detection of intrusions in computer networks via adaptive sequential and batch-sequential change-point detection methods. IEEE Transactions on Signal Processing. 2006;54(9):3372–3382.
- 24. Bell C, Gordon L, Pollak M. An efficient nonparametric detection scheme and its application to surveillance of a Bernoulli process with unknown baseline. Lecture Notes-Monograph Series. 1994; p. 7–27.
- 25. Gordon L, Pollak M. An efficient sequential nonparametric scheme for detecting a change of distribution. The Annals of Statistics. 1994; p. 763–804.
- 26. Gordon L, Pollak M. A robust surveillance scheme for stochastically ordered alternatives. The Annals of Statistics. 1995; p. 1350–1375.
- 27. McDonald D. A CUSUM procedure based on sequential ranks. Naval Research Logistics. 1990;37(5):627–646.
- 28. Baron MI. Nonparametric adaptive change point estimation and on line detection. Sequential Analysis. 2000;19(1-2):1–23.
- 29. Nguyen X, Wainwright MJ, Jordan MI. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Transactions on Information Theory. 2010;56(11):5847–5861.
- 30. Kanamori T, Hido S, Sugiyama M. A least-squares approach to direct importance estimation. Journal of Machine Learning Research. 2009;10(Jul):1391–1445.
- 31. Kawahara Y, Sugiyama M. Change-point detection in time-series data by direct density-ratio estimation. In: Proceedings of the 2009 SIAM International Conference on Data Mining. SIAM; 2009. p. 389–400.
- 32. Sugiyama M, Suzuki T, Nakajima S, Kashima H, von Bünau P, Kawanabe M. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics. 2008;60(4):699–746.
- 33. Liu S, Yamada M, Collier N, Sugiyama M. Change-point detection in time-series data by relative density-ratio estimation. Neural Networks. 2013;43:72–83. pmid:23500502
- 34. Gretton A, Smola A, Huang J, Schmittfull M, Borgwardt K, Schölkopf B. Covariate shift by kernel mean matching. Dataset Shift in Machine Learning. 2009;3(4):5.
- 35. Yu Y, Szepesvári C. Analysis of kernel mean matching under covariate shift. arXiv preprint arXiv:1206.4650. 2012.
- 36. Lorden G. Procedures for reacting to a change in distribution. The Annals of Mathematical Statistics. 1971;42(6):1897–1908.
- 37. Moustakides GV. Optimal stopping times for detecting changes in distributions. The Annals of Statistics. 1986;14(4):1379–1387.
- 38. Tartakovsky AG, Pollak M, Polunchenko AS. Third-order Asymptotic Optimality of the Generalized Shiryaev–Roberts Changepoint Detection Procedures. Theory of Probability & Its Applications. 2012;56(3):457–484.
- 39. Lai TL. Information bounds and quick detection of parameter changes in stochastic systems. IEEE Transactions on Information Theory. 1998;44(7):2917–2929.
- 40. Siegmund D, Venkatraman E. Using the generalized likelihood ratio statistic for sequential detection of a change-point. The Annals of Statistics. 1995; p. 255–271.
- 41. Tartakovsky A, Nikiforov I, Basseville M. Sequential analysis: Hypothesis testing and changepoint detection. Chapman and Hall/CRC; 2014.
- 42. Unnikrishnan J, Veeravalli VV, Meyn SP. Minimax robust quickest change detection. IEEE Transactions on Information Theory. 2011;57(3):1604–1614.
- 43. Page ES. Continuous inspection schemes. Biometrika. 1954;41(1/2):100–115.
- 44. Kanamori T, Suzuki T, Sugiyama M. Statistical analysis of kernel-based least-squares density-ratio estimation. Machine Learning. 2012;86(3):335–367.
- 45. Makiyama K. densratio: Density Ratio Estimation; 2019. Available from: https://CRAN.R-project.org/package=densratio.
- 46. Ross GJ, Tasoulis DK, Adams NM. Nonparametric monitoring of data streams for changes in location and scale. Technometrics. 2011;53(4):379–389.
- 47. Ross GJ. Parametric and Nonparametric Sequential Change Detection in R: The cpm Package. Journal of Statistical Software. 2015;66(3):1–20.
- 48. Chen H, Chu L. gStream: Graph-Based Sequential Change-Point Detection for Streaming Data; 2019. Available from: https://CRAN.R-project.org/package=gStream.
- 49. Lu B, Li Y, Evans C. Assessing generalizability of a dengue classifier across multiple datasets. bioRxiv. 2024.
- 50. Cavailler P, Tarantola A, Leo YS, Lover AA, Rachline A, Duch M, et al. Early diagnosis of dengue disease severity in a resource-limited Asian country. BMC Infectious Diseases. 2016;16:1–8.
- 51. Ho TS, Weng TC, Wang JD, Han HC, Cheng HC, Yang CC, et al. Comparing machine learning with case-control models to identify confirmed dengue cases. PLOS Neglected Tropical Diseases. 2020;14(11):e0008843. pmid:33170848
- 52. Cuong HQ, Hien NT, Duong TN, Phong TV, Cam NN, Farrar J, et al. Quantifying the emergence of dengue in Hanoi, Vietnam: 1998–2009. PLOS Neglected Tropical Diseases. 2011;5(9):e1322. pmid:21980544
- 53. Cheng J, Bambrick H, Yakob L, Devine G, Frentiu FD, Thai PQ, et al. Heatwaves and dengue outbreaks in Hanoi, Vietnam: New evidence on early warning. PLOS Neglected Tropical Diseases. 2020;14(1):e0007997. pmid:31961869