Smooth operator: Modifying the Anhøj rules to improve runs analysis in statistical process control

Introduction The run chart is one form of statistical process control chart that is particularly useful for detecting persistent shifts in data over time. The Anhøj rules test for shifts by looking for unusually long runs (L) of data points on the same side of the process centre (mean or median) and unusually few crossings (C) of the centre depending on the number of available data points (N). Critical values for C and L have mainly been studied in isolation. But what is really of interest is the joint distribution of C and L, which has so far only been studied using simulated data series. We recently released an R package, crossrun that calculates exact values for the joint probabilities of C and L that allowed us to study the diagnostic properties of the Anhøj rules in detail and to suggest minor adjustments to improve their diagnostic value. Methods Based on the crossrun R package we calculated exact values for the joint distribution of C and L for N = 10–100. Furthermore, we developed two functions, bestbox() and cutbox() that automatically seek to adjust the critical values for C and L to balance between sensitivity and specificity requirements. Results Based on exact values for the joint distribution of C and L for N = 10–100 we present measures of the diagnostic value of the Anhøj rules. The best box and cut box procedures improved the diagnostic value of the Anhøj rules by keeping the specificity and sensitivity close to pre-specified target values. Conclusions Based on exact values for the joint distribution of longest run and number of crossings in random data series this study demonstrates that it is possible to obtain better diagnostic properties of run charts by making minor adjustment to the critical values for C and L.


Introduction
The run chart is one form of statistical process control chart that is particularly useful for detecting persistent shifts in data over time. The Anhøj rules test for shifts by looking for unusually long runs (L) of data points on the same side of the process centre (mean or median) and unusually few crossings (C) of the centre depending on the number of available data points (N). Critical values for C and L have mainly been studied in isolation. But what is really of interest is the joint distribution of C and L, which has so far only been studied using simulated data series. We recently released an R package, crossrun that calculates exact values for the joint probabilities of C and L that allowed us to study the diagnostic properties of the Anhøj rules in detail and to suggest minor adjustments to improve their diagnostic value.

Methods
Based on the crossrun R package we calculated exact values for the joint distribution of C and L for N = 10-100. Furthermore, we developed two functions, bestbox() and cutbox() that automatically seek to adjust the critical values for C and L to balance between sensitivity and specificity requirements.

Results
Based on exact values for the joint distribution of C and L for N = 10-100 we present measures of the diagnostic value of the Anhøj rules. The best box and cut box procedures improved the diagnostic value of the Anhøj rules by keeping the specificity and sensitivity close to pre-specified target values.

Introduction
Within statistical process control (SPC) runs analysis is being used to detect persistent shifts in process location over time [1]. Runs analysis deals with the natural limits of number of runs and run lengths in random processes. A run is a series of one or more consecutive elements of the same kind, for example heads and tails, diseased and non-diseased individuals, or numbers above or below a certain value. A run chart is a point-and-line chart showing data over time with the median as reference line (Fig 1). In a random process, the data points will be randomly distributed around the median, and the number and lengths of runs will be predictable within limits. All things being equal, if the process shifts, runs tend to become longer and fewer. Consequently, runs analysis may help detect shifts in process location. Process shifts are a kind of non-random variation in time series data that are of particular interest to quality control and improvement: If a process shifts, it may be the result of planned improvement or unwanted deterioration.
Several tests (or rules) based on the principles of runs analysis for detection of shifts exist. In previous papers we demonstrated, using simulated data series, that the currently best performing rules with respect to sensitivity and specificity to shifts in process location are two simple tests [1][2][3]: • Shifts test: one or more unusually long runs of data points on the same side of the centre line.
• Crossings test: the curve crosses the centre line unusually few times.
Collectively, we refer to these tests as the Anhøj rules, which are the default rules used for run and control chart analysis with the qicharts2 package for R [4]. For a thorough discussion of the practical use of run and control charts for quality improvement we refer to the qicharts2 package vignette.
Critical values for run length and number of crossings depend on the total number of data points in the chart, excluding data points that fall directly on the centre line. The number of crossings follows a binomial distribution, b(N-1, 0.5), where N is the number of data points and 0.5 the success probability. Thus, the lower prediction limit for number of crossings may, for example, be set to the lower 5th percentile of the corresponding cumulative binomial distribution [5]. However, no closed form expression exists for the distribution of longest runs. Consequently, the upper prediction limit for longest runs has traditionally been either a fixed value (usually 7 or 8) [6] or an approximate value depending on N as with the Anhøj rules: log 2 (N) + 3 rounded to the nearest integer [7]. Fig 1 has 20 data points, the curve crosses the centre line 9 times, and the longest run (points 3-6) contains 4 data points. In a random process with 20 data points, we should expect at least 6 crossings and the longest run should include no more than 7 data points. Thus, according to the Anhøj rules, Fig 1 shows random variation.
Each of the two tests has an overall specificity (true negative proportion) around 95%. The sensitivity (true positive proportion) of a test depends on the size of the shift (signal) relative to the random variation inherent in the process (noise). When applied together, the sensitivity increases, while the specificity decreases a bit and fluctuates around 92.5% (see red line in Fig  2).  Historically, runs tests have been studied in isolation. But what is really of interest because the rules are linked-when runs grow longer, crossings become fewer-is the properties of the joint distribution of number of crossings (C) and longest runs (L).
We recently released an R package, crossrun [8,9], that includes functions for calculating the joint probabilities of C and L in random data series of different lengths (N) and with and without shifts in process location expressed in standard deviation units (SD) . Fig 3 illustrates this for a run chart with N = 11 and SD = 0 (no shift). To avoid very small numbers, the probabilities are shown using the times representation, that is, the probabilities times 2 N-1 , which is 1024 for N = 11. The red box encloses the combinations of C and L that would indicate random variation according to the Anhøj rules (true negatives). The area outside the box represents combinations of C and L that would indicate non-random variation (false positives).
With the crossrun package it became feasible to calculate exact joint probabilities of C and L over a range of N and SD. And consequently, it became feasible to investigate the diagnostic properties of run charts using exact values for specificity and sensitivity rather than values based on time consuming, inaccurate, and complicated simulation studies.
As shown in Fig 2 the specificity of the Anhøj rules (red line) jumps up and down as N changes. This is a consequence of the discrete nature of the two tests-especially the shifts test. Although the specificity of the Anhøj rules does not decrease continuously as N increases, which is the case for other rules [2], we hypothesised that it would be possible to improve the diagnostic value further by smoothing the specificity using minor adjustments to C and L depending on N.
The aims of this study were to provide exact values for the diagnostic properties of the Anhøj rules and to suggest a "smoothing" procedure for improving the value of runs analysis.

Likelihood ratios to quantify the diagnostic value of runs rules
The value of diagnostic tests has traditionally been described using terms like sensitivity and specificity. These parameters express the probability of detecting the condition being tested for when it is present and not detecting it when it is absent: Specificity = P(no signal | no shift) = P(true negative) = 1 -P(false positive) Sensitivity = P(signal | shift) = P(true positive) = 1 -P(false negative) For example, the specificity of the Anhøj rules in a run chart with 11 data points may be calculated from Fig 3 as the proportion enclosed by the red box, which is 974 / 1024 = 0.9512. The sensitivity may be obtained from a similar matrix (not shown) including a shift as the proportion being outside the box. With a shift of 0.8 SD, the sensitivity is 0.3493 (Table 1).
However, we usually seek to answer the opposite question: what is the likelihood that a positive or negative test actually represents the condition being tested for? Likelihood ratios (LR) do this: Detailed explanations of likelihood ratios have been given previously [3,10]. As stated in [3], a likelihood ratio greater than 1 speaks in favour of the condition being tested for, and a likelihood ratio less than 1 speaks against the condition. As a rule of thumb, also presented in [3], a positive likelihood ratio (LR+) greater than 10 is described as strong evidence that the condition is present, and a negative likelihood ratio (LR-) smaller than 0.1 is described as strong evidence against the condition [10]. For example, if LR+ = 5 and LR-= 0.2, a positive test means that it is 5 times more likely that the condition is present than not present, and a negative test means that it is 5 times less likely that the condition is present than not present. Thus, as detailed in [3], likelihood ratios always occur in pairs and together constitute combined measures of the usefulness of a diagnostic test. Specifically, for our purpose, run charts are diagnostic tests for non-random variation in time series data [1,3].

Best box and cut box adjustments to improve the Anhøj rules
To fix some terms, we define a box as a rectangular region C � c, L � l that may be used to define random variation. The corner of the box is its upper right cell C = c, L = l. In Fig 3 the box C � 2, L � 6, marked with red, specifies the Anhøj rules for N = 11. The corner of this box is the cell C = 2, L = 6. Based on the crossrun package, which we described in detail in our previous article [9], we developed two functions, bestbox() and cutbox() that automatically seek to adjust the critical values for C and L to balance between sensitivity and specificity requirements. Specifically, the bestbox() function finds the box with highest sensitivity for a pre-determined shift (the target shift), among boxes with specificity � a pre-determined value (the target specificity). The cutbox() function subsequently cuts cells from the topmost horizontal and rightmost vertical borders of the best box, starting from the corner while keeping specificity � its target value, and the sensitivity for the target shift as large as possible. The result of cutbox() is not necessarily a box, but still a reasonable region for declaring random variation where the corner itself, possibly together with one or more of its neighbours downwards or to the left, may be removed from the best box.
In this study we used a target specificity of 0.925, which is close to the actual average specificity for the Anhøj rules for N = 10-100 and a target shift of 0.8. Fig 3 illustrates these principles for a run chart with 11 data points. Thus, for N = 11, the Anhøj rules would signal a shift if C < 2 or L > 6; best box would signal if C < 3 or L > 7; and cut box would signal if C < 3 or L > 7, and also when C = 3 and L = 7.
The following notation is introduced to describe the cut box rules (Table 1): In the rightmost vertical border of the best box (L = l) the part retained within the cut box is stated as C � Cbord. Similarly, in the topmost horizontal border of the best box (C = c) the part retained within the cut box is stated as L � Lbord. For N = 11, Cbord = 4 and Lbord = 6 (Fig 3  and Table 1), in which case only the corner is cut. If no cut is done, Cbord and Lbord are not specified, these are the cases in which the cut box is identical to the best box.

Results
We calculated the limits for the Anhøj, best box, and cut box rules together with their corresponding positive test proportions and likelihood ratios for N = 10-100 and SD = 0-3 (in 0.2 SD increments). The limits, specificities, and sensitivities (for SD = 0.8) are presented in Table 1. The R code to reproduce the full results set and the figures from this article is provided in the S1 File_crossrunbox.R. Note that to preserve numerical precision, the code stores the log of likelihood ratios. To get the actual likelihood values back, use exp(log-likelihood). Fig 2 illustrates the effect of the best box and cut box procedures on the specificity of the runs analysis. As expected, the variability in specificity with varying N is markedly reduced and kept above and closer to the specified target-more with cut box than with best box. N = number of data points in chart. C = lower limit for number of crossings, L = upper limit for longest run, for declaring random variation by the Anhøj and best box rules. Cbord and Lbord = Additional information for the cut box rules. When specified, parts of the border of the best box to retain to declare random variation.
When not specified, cut box is identical to best box (see text for details). Specificity = true negative proportion (no shift). Sensitivity = true positive proportion (shift = 0.8 SD). As expected and shown previously in our simulation studies, the power of the runs analysis increases with increasing N and SD [1][2][3]. The smoothing effect of best box and cut box appears to wear off as N and SD increases.  Table 1 Figs 6 and 7 compare the positive and negative likelihood ratios of the Anhøj rules to the box adjustments. The smoothing effect appear to be of practical value only for positive tests.

Discussion and conclusion
Based on procedures suggested in our previous paper [9], this study provides exact values for the diagnostic properties of the Anhøj rules for run charts with 10-100 data points including shifts up to 3 standard deviation units.
To our knowledge, and with the exception of our previous work, the properties of the joint distribution of number of crossings and longest runs in random data series have not been studied before. Furthermore, the study demonstrates that it is feasible to reduce the variability in run chart specificity with varying number of data points by using the best box and cut box adjustments of the Anhøj rules.
Most importantly, Figs 6 and 7 confirm our experience from years of practical use of runs analysis, that the Anhøj rules constitute a useful and robust method for detection of persistent shifts only slightly larger than 1 standard deviation units and with as little as 10-12 data points. This can be seen by the fact that LR+ > 10 for SD > 1 and N � 10. Although, the best box and cut box procedures will not change this, the box adjustments may improve the practical value of runs analysis by reducing sudden shifts in sensitivity and specificity when the number of available data points changes. Whether this holds true in practice remains to be confirmed.
The study has two important limitations. First, the calculations of box probabilities require that the joint distribution of the number of crossings and longest run is known. As shown in [9] this is the case when the process centre is fixed and known in advance, for example, the median from historical data. In practice the centre line is often determined from the actual data in the run chart, in which case the calculations of box probabilities do not apply. Preliminary studies suggest that this is mostly relevant for short data series. We plan to include a function in a future update of crossrun to calculate the box probabilities with empirical centre lines.
Second, the procedures have so far only been checked for up to 200 data points as detailed in [9]. Because of the iterative procedures and use of high precision numbers using functions from the Rmpfr R package [11] to calculate the joint distributions for varying N, the computations are time consuming, and for N > 100 the precision had to be increased. On a laptop with an Intel Core i5 processor and 8 GB RAM, it takes about one hour to complete https://doi.org/10.1371/journal.pone.0233920.g007 S1_crossrunbox.R for N = 10-100 and SD = 0-3, and the objects created consume over 6 GB of memory. We have no reason to believe that the procedures are not valid for higher N, but the application of the box procedures for larger N may be impractical at the moment.
Also, one should be aware that the value of the box procedures rely on the choice of target specificity and target shift values. Other target values will give different diagnostic properties. Preliminary studies suggest that increasing the target specificity to, say, 0.95 in fact increases the positive likelihood ratios a bit without affecting the negative likelihood ratios considerably. By supplying the R code, we encourage readers to adapt our findings to their own needs.
Regarding the practical application of the box adjustment of the Anhøj rules, we are in the process of testing a method argument for the qic() function from the qicharts2 package that allows the user to choose between "anhoej", "bestbox", and "cutbox" methods to identify nonrandom variation in run and control charts with up to 100 data points. This will allow us and others to quickly gain practical experience with box adjustments on real life data.
In conclusion, this study provides exact values for the diagnostic properties of the Anhøj rules for run charts with 10-100 data points including shifts up to 3 standard deviation units, and demonstrates that it is feasible to reduce the variability in run chart specificity from varying numbers of data points by using the best box and cut box adjustments of the Anhøj rules.