Process monitoring using inflated beta regression control chart

Luiz M. A. Lima-Filho; Tarciana Liberal Pereira; Tatiene C. Souza; Fábio M. Bayer

doi:10.1371/journal.pone.0236756

Abstract

This paper provides a general framework for controlling quality characteristics related to control variables and limited to the intervals (0, 1], [0, 1), or [0, 1]. The proposed control chart is based on the inflated beta regression model considering a reparametrization of the inflated beta distribution indexed by the response mean, which is useful for modeling fractions and proportions. The contribution of the paper is twofold. First, we extend the inflated beta regression model by allowing a regression structure for the precision parameter. We also present closed-form expressions for the score vector and Fisher’s information matrix. Second, based on the proposed regression model, we introduce a new model-based control chart. The control limits are obtained considering the estimates of the inflated beta regression model parameters. We conduct a Monte Carlo simulation study to evaluate the performance of the proposed regression model estimators, and the performance of the proposed control chart is evaluated in terms of run length distribution. Finally, we present and discuss an empirical application to show the applicability of the proposed regression control chart.

Citation: Lima-Filho LMA, Pereira TL, Souza TC, Bayer FM (2020) Process monitoring using inflated beta regression control chart. PLoS ONE 15(7): e0236756. https://doi.org/10.1371/journal.pone.0236756

Editor: Feng Chen, Tongji University, CHINA

Received: April 8, 2020; Accepted: July 11, 2020; Published: July 30, 2020

Copyright: © 2020 Lima-Filho et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are within the manuscript and its Supporting Information files.

Funding: The author(s) received no specific funding for this work.

Competing interests: The authors have declared that no competing interests exist.

1 Introduction

Standard control charts are applied directly to the output of a quality characteristic. However, the quality characteristic (process output) can be affected by external covariates (control variables), in which case the quantity to be controlled is a varying mean rather than a constant one. In these cases, the regression control chart [1] may be an effective statistical process control tool. Such a method is widely used when the quality of a process or product is better characterized by a functional relationship between the response variable and one or more explanatory variables [2].

The standard regression control chart is based on the linear regression model, where the variable of interest is assumed to be normally distributed. However, in practice, several of these variables may not follow a normal distribution, leading to poor Gaussian-based inferences. Thus, several studies have proposed non-Gaussian model-based control charts. [3] presented a model-based scheme for monitoring multiple gamma-distributed variables. By considering that robust methods can be effective in the presence of outlying observations, [4] explored the robust generalized linear model for a gamma-distributed response. [5] used deviance residuals for monitoring variables in a three-stage process assuming gamma, normal, and Poisson distributions.

Examples of a non-Gaussian process output are variables that assume values in the standard unit interval, such as fractions and proportions. In such instances, the usual regression control chart may be inappropriate since double bounded data are typically asymmetric and the Gaussian-based assumption is not suitable. In this sense, [6] proposed the beta regression control chart to monitor fractions and proportions related to control variables. The proposed control chart considers the beta regression model with varying dispersion [7], assuming that the mean and dispersion parameters of beta distributed variables are related to exogenous variables and modeled by regression structures. However, fractions and proportions may contain zeros and/or ones, leading to the unsuitable use of the beta distribution for data modeling [8].

Alternative regression models have been proposed to address the limitations of beta regression in the presence of zeros and/or ones. [9] presented a unit inflated beta model for modeling efficiency scores as a function of exogenous variables. [10] proposed a zero inflated beta model to analyze data in corporate capital structures. [8] introduced a general class of zero or one inflated beta regression models, which is a natural extension of the beta regression model [11] to model variables that assume values in (0, 1] or [0, 1). [12] proposed an inflated beta regression model based on a reparametrization of the inflated beta distribution. This model accommodates mixed random variable responses, with non-negligible probabilities of assuming zeros and/or ones and continuous values in the interval (0, 1) that follow a beta distribution. The inflated beta regression model introduced by [12] may be useful for developing model-based control charts for monitoring inflated beta distributed processes, as its parametrization is expressed in terms of the response variable mean. However, the model proposed by [12] does not consider a regression structure for the precision parameter. The monitoring of the mean and precision (or dispersion) is relevant to statistical process control [6, 13, 14]. In addition, incorrect modeling of the dispersion can generate a high number of false alarms or a loss of power to detect special causes [6]. Moreover, dispersion modeling is necessary in regression models in order to obtain accurate inferences about the parameters of the mean regression structure [15].

A control chart is a dynamic tool that operates in two phases, namely Phase I and Phase II. In practical situations, the in-control parameters are unknown and have to be estimated from a Phase I data set. Different Phase I data sets lead to different control chart performance. Thus, it is important to study the practitioner-to-practitioner variability due to parameter estimation. The aim of Phase I analysis is to estimate the parameters, while quick detection of an out-of-control state is conducted in Phase II [16]. The literature offers some studies related to Phase I and Phase II analyses in regression models. For example, [17] proposed Phase I profile monitoring schemes for binary responses that can be represented by logistic regression models. These authors developed several Hotelling T²-type Phase I control charts for monitoring the parameters of a logistic regression linking a binary response to one or more predictor variables. [18] developed control charts by integrating an exponentially weighted moving average scheme with a likelihood ratio test based on logistic regression models in a Phase II study. [19] proposed a new modeling and monitoring framework for Phase I analysis of multivariate profiles by incorporating the regression-adjustment technique into the functional principal components analysis. [20] proposed the monitoring of profiles using generalized linear models during Phase II in which the explanatory variables can follow a fixed design or an arbitrary random design.

In this context, this paper introduces the inflated beta regression control chart (IBRCC) with varying dispersion, useful for monitoring double bounded variables when zeros or ones appear in the data along with the presence of control variables. The process output may represent individual measures (e.g. efficiency score) or a ratio between continuous numbers (e.g. relative humidity). The contribution of the present paper is twofold. First, we extend the inflated beta regression model proposed by [12] by allowing a regression structure for the precision parameter. We also discuss likelihood inference of the model parameters. Second, we introduce the IBRCC based on the proposed inflated beta regression model with varying dispersion. Since in practice the parameters of the regression model are unknown, the proposed control chart is implemented in two phases. In Phase I, the parameters are estimated from an in-control sample, and in Phase II, we perform the monitoring scheme.

The remainder of the paper unfolds as follows. In Section 2, we describe the IBRCC and introduce the beta inflated mean regression model with varying dispersion. We also discuss likelihood inference and present the control limits estimation procedure. Section 3 presents a simulation study to evaluate (i) the estimators of the inflated beta regression model with varying dispersion and (ii) the performance of the proposed IBRCC and some competing control charts in the literature based on the run length (RL). In Section 4, we present and discuss an empirical application to show the applicability of the proposed IBRCC in real situations. Finally, some conclusions are presented in Section 5.

2 Inflated beta regression control chart

In this section, we introduce the IBRCC. First, in Subsection 2.1, we present the inflated beta regression model with varying dispersion. The model we propose in this work is an extension of the model proposed by [12], where the authors used a reparametrization of the inflated beta distribution indexed by the response mean. In Subsection 2.2, we present the model-based control limits for the proposed IBRCC. Then, in Subsubsection 2.2.1, we discuss the likelihood inference for the model parameters. Finally, in Subsubsection 2.2.2, we present the control limits estimation procedure.

2.1 Inflated beta regression model with varying dispersion

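The inflated beta density function is given by [12]

$$
f(y; \alpha_0, \alpha_1, \gamma, \phi) =
\begin{cases}
\alpha_0 (1 - \gamma), & y = 0, \\
c \, f(y; \mu, \phi), & y \in (0, 1), \\
\alpha_1 \gamma, & y = 1,
\end{cases}
\tag{1}
$$

where 0 < α₀ < 1, 0 < α₁ < 1, 0 < γ < 1, and ϕ > 0 are the distribution parameters, c = 1 − α₀(1 − γ) − α₁ γ, μ = γ(1 − α₁)/c, and f(y; μ, ϕ) is the beta density function given by [11]

$$
f(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1 - \mu)\phi)} \, y^{\mu\phi - 1} (1 - y)^{(1 - \mu)\phi - 1}, \quad 0 < y < 1,
$$

where 0 < μ < 1, ϕ > 0, and Γ(⋅) is the gamma function. Here, ϕ is a precision parameter (the inverse of the dispersion), E(y) = γ, and

$$
\mathrm{Var}(y) = \alpha_1 \gamma + \gamma (1 - \alpha_1) \frac{1 + \mu\phi}{1 + \phi} - \gamma^2 .
$$

Density (1) is said to be zero and one inflated beta. Note that P(y = 0) = α₀(1 − γ) and P(y = 1) = α₁ γ; thus, if α₀ = 0 and α₁ > 0, the distribution in (1) is called the one inflated beta distribution. Conversely, if α₀ > 0 and α₁ = 0, the distribution given in (1) is called the zero inflated beta distribution.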
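The mixture structure of density (1) can be illustrated with a minimal sampling sketch. It is not the authors' code (the paper reports using R); the function names, the seed, and the parameter values below are hypothetical choices made only for illustration. The sketch draws from the point masses at zero and one and from the beta component, and then checks empirically that the sample mean is close to γ and that the share of zeros is close to α₀(1 − γ).

```python
import random


def mixture_parts(alpha0, alpha1, gamma):
    """Return (P(y=0), P(y=1), c, mu) for the mean-parametrized inflated beta."""
    p0 = alpha0 * (1.0 - gamma)      # point mass at zero
    p1 = alpha1 * gamma              # point mass at one
    c = 1.0 - p0 - p1                # weight of the continuous beta component
    mu = gamma * (1.0 - alpha1) / c  # mean of the beta component
    return p0, p1, c, mu


def sample_inflated_beta(alpha0, alpha1, gamma, phi, rng):
    """Draw one observation from the zero and one inflated beta distribution."""
    p0, p1, _, mu = mixture_parts(alpha0, alpha1, gamma)
    u = rng.random()
    if u < p0:
        return 0.0
    if u < p0 + p1:
        return 1.0
    # Beta component with shape parameters mu*phi and (1 - mu)*phi.
    return rng.betavariate(mu * phi, (1.0 - mu) * phi)


# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(2020)
draws = [sample_inflated_beta(0.2, 0.1, 0.4, 15.0, rng) for _ in range(50000)]
sample_mean = sum(draws) / len(draws)
share_zeros = draws.count(0.0) / len(draws)
print(sample_mean, share_zeros)
```

With these illustrative values, E(y) = γ = 0.4 and P(y = 0) = 0.2 × 0.6 = 0.12, so the two printed quantities should be close to those numbers.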

Let y₁, …, y_n be independent random variables where each y_t has the density in (1) for t = 1, …, n. The inflated beta regression model with varying dispersion, which is an extension of the regression model proposed by [12], is given by the following structures for modeling the response y_t

$$
g_1(\alpha_{0t}) = x_{1t}^\top \omega, \tag{2}
$$

$$
g_2(\alpha_{1t}) = x_{2t}^\top \kappa, \tag{3}
$$

$$
g_3(\gamma_t) = x_{3t}^\top \beta, \tag{4}
$$

$$
g_4(\phi_t) = x_{4t}^\top \zeta, \tag{5}
$$

where ω ∈ ℝ^{k₁}, κ ∈ ℝ^{k₂}, β ∈ ℝ^{k₃}, and ζ ∈ ℝ^{k₄} are vectors of unknown regression parameters, x_1t, x_2t, x_3t, and x_4t are observations on k₁, k₂, k₃, and k₄ covariates, respectively, and g₁ : (0, 1) → ℝ, g₂ : (0, 1) → ℝ, g₃ : (0, 1) → ℝ, and g₄ : (0, ∞) → ℝ are real-valued link functions with continuous second derivatives. For g₁(⋅), g₂(⋅), and g₃(⋅), several different link functions can be used, such as logit, probit, log-log, complementary log-log, or Cauchy. For g₄(⋅), possible choices include the log and square root link functions. More details about link functions in the class of beta regression models can be found in [11] and [21].
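As a minimal sketch of how the four regression structures map covariates to distribution parameters, the snippet below assumes logit links for α_0t, α_1t, and γ_t and a log link for ϕ_t (two of the admissible choices listed above). The function names, covariate values, and coefficient values are hypothetical and serve only to illustrate the inverse-link step.

```python
import math


def inv_logit(eta):
    """Numerically stable inverse of the logit link."""
    if eta >= 0.0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def linear_predictor(covariates, coefficients):
    """Inner product between a covariate vector and a coefficient vector."""
    return sum(x * b for x, b in zip(covariates, coefficients))


def submodel_parameters(x1, x2, x3, x4, omega, kappa, beta, zeta):
    """Apply the inverse links to the four linear predictors (assumed logit/log links)."""
    alpha0 = inv_logit(linear_predictor(x1, omega))
    alpha1 = inv_logit(linear_predictor(x2, kappa))
    gamma = inv_logit(linear_predictor(x3, beta))
    phi = math.exp(linear_predictor(x4, zeta))
    return alpha0, alpha1, gamma, phi


# Hypothetical covariates (first entry is the intercept) and coefficients.
params = submodel_parameters(
    [1.0, 0.0], [1.0, 1.0], [1.0, 0.5], [1.0, 1.0],
    omega=[-2.0, 0.5], kappa=[-2.5, 0.3], beta=[0.0, 0.4], zeta=[2.0, 0.5],
)
print(params)
```

The logit link keeps α_0t, α_1t, and γ_t inside (0, 1), and the log link keeps ϕ_t positive, which is why no further constraint handling is needed in this sketch.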

Notably, the model proposed by [12] assumes that, for t = 1, …, n, ϕ_t = ϕ (constant precision). In the present study, the precision parameter is allowed to vary across observations, making the proposed model more general than the original inflated beta regression model. The assumption of non-constant dispersion is natural in several production processes [22–24]. In practical situations, it is important to monitor the dispersion of the process because an increase in dispersion may indicate process deterioration, while a reduction in dispersion means an improvement in process capability [25]. In addition, it is possible to consider control variables for modeling the parameters α₀ and α₁, which are related to the probabilities of zero and one, respectively.

2.2 Model-based control chart limits

The purpose of the IBRCC is to monitor double bounded processes that contain values equal to zero or one, considering that the mean, precision, and parameters related to the probabilities of zero and one (α₀ and α₁) of the quality characteristic of interest are affected by control variables. Let (1 − α) define the control region, where α is the type I error probability. The lower control limit (LCL), center line (CL), and upper control limit (UCL) of the proposed control chart are defined, respectively, by

$$
\mathrm{LCL}_t = F^{-1}(\alpha/2; \alpha_{0t}, \alpha_{1t}, \gamma_t, \phi_t), \qquad
\mathrm{CL}_t = \gamma_t, \qquad
\mathrm{UCL}_t = F^{-1}(1 - \alpha/2; \alpha_{0t}, \alpha_{1t}, \gamma_t, \phi_t),
$$

where F(⋅; α_0t, α_1t, γ_t, ϕ_t) is the inflated beta cumulative distribution function and F⁻¹(⋅) is the quantile function of the inflated beta variable. The parameters α_0t, α_1t, γ_t, and ϕ_t are functions of ω, κ, β, and ζ, respectively, and through (2), (3), (4), and (5) each of them is obtained by applying the inverse of the corresponding link function, g₁⁻¹(⋅), g₂⁻¹(⋅), g₃⁻¹(⋅), or g₄⁻¹(⋅), to its linear predictor. In practice, the model parameters are unknown and estimation methods are necessary to estimate the in-control limits. Thus, we consider the likelihood theory [26–28], which we discuss in the following subsection. We present results for the log-likelihood and score functions, which are extensions of those developed for the inflated beta regression model proposed by [12].
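For reference, the cumulative distribution and quantile functions used in the control limits can be written out explicitly. The expressions below are a sketch derived from the mixture form of the inflated beta distribution (point masses P(y = 0) = α₀(1 − γ) and P(y = 1) = α₁γ plus a beta component with weight c = 1 − α₀(1 − γ) − α₁γ and mean μ = γ(1 − α₁)/c); they are not quoted from the paper. The symbol B(⋅; μ, ϕ) for the beta cumulative distribution function is notation introduced here, and the quantile is taken as the usual generalized inverse.

```latex
F(y; \alpha_0, \alpha_1, \gamma, \phi) =
\begin{cases}
0, & y < 0, \\
\alpha_0 (1 - \gamma) + c \, B(y; \mu, \phi), & 0 \le y < 1, \\
1, & y \ge 1,
\end{cases}
\qquad
F^{-1}(p) = \inf\{ y : F(y) \ge p \} =
\begin{cases}
0, & p \le \alpha_0 (1 - \gamma), \\
B^{-1}\!\left( \dfrac{p - \alpha_0 (1 - \gamma)}{c}; \mu, \phi \right), & \alpha_0 (1 - \gamma) < p < 1 - \alpha_1 \gamma, \\
1, & p \ge 1 - \alpha_1 \gamma .
\end{cases}
```

Under this reading, a quantile equals zero or one exactly when the requested probability level falls inside one of the two point masses.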

2.2.1 Likelihood inference.

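We shall consider the maximum likelihood estimator (MLE) for the parameter vector θ = (ω^⊤, κ^⊤, β^⊤, ζ^⊤)^⊤. The log-likelihood function is given by

$$
\ell(\theta) = \sum_{t=1}^{n} \ell_t(\alpha_{0t}, \alpha_{1t}, \gamma_t, \phi_t), \tag{6}
$$

with

$$
\ell_t(\alpha_{0t}, \alpha_{1t}, \gamma_t, \phi_t) =
\mathbb{1}_{\{0\}}(y_t) \log[\alpha_{0t}(1 - \gamma_t)]
+ \mathbb{1}_{\{1\}}(y_t) \log(\alpha_{1t} \gamma_t)
+ [1 - \mathbb{1}_{\{0\}}(y_t) - \mathbb{1}_{\{1\}}(y_t)] \, [\log c_t + \log f(y_t; \mu_t, \phi_t)],
$$

where 𝟙_{0}(y) is an indicator function that equals 1 if y = 0 and 0 if y ∈ (0, 1], 𝟙_{1}(y) is an indicator function that equals 1 if y = 1 and 0 if y ∈ [0, 1), and in which α_0t, α_1t, γ_t, and ϕ_t are given by the regression structures in (2), (3), (4), and (5), respectively. Additionally, c_t = 1 − α_0t(1 − γ_t) − α_1t γ_t, μ_t = γ_t(1 − α_1t)/c_t, and

$$
\log f(y_t; \mu_t, \phi_t) = \log\Gamma(\phi_t) - \log\Gamma(\mu_t \phi_t) - \log\Gamma((1 - \mu_t)\phi_t)
+ (\mu_t \phi_t - 1) \log y_t + [(1 - \mu_t)\phi_t - 1] \log(1 - y_t).
$$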
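The log-likelihood contribution of a single observation can be sketched directly from the inflated beta density. The snippet below is an illustration rather than the authors' implementation: the function names are hypothetical, and the per-observation parameters are assumed to have been computed already from the regression structures. It uses `math.lgamma` for the log-gamma terms of the beta density.

```python
import math


def loglik_contribution(y, alpha0, alpha1, gamma, phi):
    """Log-likelihood of one observation under the inflated beta density."""
    if y == 0.0:
        return math.log(alpha0 * (1.0 - gamma))   # point mass at zero
    if y == 1.0:
        return math.log(alpha1 * gamma)           # point mass at one
    c = 1.0 - alpha0 * (1.0 - gamma) - alpha1 * gamma
    mu = gamma * (1.0 - alpha1) / c
    a, b = mu * phi, (1.0 - mu) * phi             # beta shape parameters
    log_beta_density = (
        math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
        + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y)
    )
    return math.log(c) + log_beta_density


def loglik(ys, params):
    """Sum the contributions; params holds one (alpha0, alpha1, gamma, phi) per y."""
    return sum(loglik_contribution(y, *p) for y, p in zip(ys, params))


# Hypothetical data and per-observation parameters, for illustration only.
ys = [0.0, 0.3, 1.0]
params = [(0.2, 0.2, 0.5, 2.0)] * 3
print(loglik(ys, params))
```

A function of this kind could be passed (with its sign flipped) to a general-purpose quasi-Newton optimizer once the parameters are expressed through the regression coefficients.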

By differentiating the log-likelihood function in (6) with respect to each element of the parameter vector θ, we obtain the score vector given by U(θ) = (U_ω(θ)^⊤, U_κ(θ)^⊤, U_β(θ)^⊤, U_ζ(θ)^⊤)^⊤. The MLE of θ is obtained by solving the non-linear system (U_ω(θ)^⊤, U_κ(θ)^⊤, U_β(θ)^⊤, U_ζ(θ)^⊤)^⊤ = 0, where 0 is a null vector of dimension (k₁ + k₂ + k₃ + k₄). The MLEs cannot be expressed in closed form; hence, the maximization of the log-likelihood function needs to be conducted numerically through a Newton or quasi-Newton algorithm. In this work, we used the quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [29] for computational implementation.

The Fisher information matrix (K(θ)), which is useful for large sample inferences, requires the expectations of second order derivatives of the log-likelihood function. The score vector U(θ) and Fisher’s information matrix can be found in the Appendix.

To test hypotheses on the parameter θ_j, j = 1, …, (k₁ + k₂ + k₃ + k₄), we consider the null hypothesis H₀: θ_j = θ_j^0 versus H₁: θ_j ≠ θ_j^0. The Wald test may be considered using the following z statistic [26]

$$
z = \frac{\hat{\theta}_j - \theta_j^0}{\mathrm{se}(\hat{\theta}_j)},
$$

where θ̂_j represents the MLE of θ_j and the standard error of θ̂_j is given by se(θ̂_j) = [K⁻¹(θ̂)]_jj^{1/2}, in which K⁻¹(θ̂) is the asymptotic variance and covariance matrix of θ̂. In large sample sizes and under H₀, the z statistic follows a standard normal distribution [26]. The test is performed by comparing the computed z statistic with the usual quantiles of the standard normal distribution.
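The Wald test described above reduces to a short computation once an estimate and its standard error are available. The sketch below uses a hypothetical helper name and hypothetical numbers; the two-sided p-value is obtained from the standard normal distribution through `math.erfc`, using the identity 2(1 − Φ(|z|)) = erfc(|z|/√2).

```python
import math


def wald_test(estimate, std_error, null_value=0.0):
    """Return the z statistic and the two-sided p-value of a Wald test."""
    z = (estimate - null_value) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical estimate and standard error, for illustration only.
z, p_value = wald_test(estimate=0.98, std_error=0.5)
print(z, p_value)
```

In this illustrative case z = 1.96, so the p-value is close to the conventional 5% threshold.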

2.2.2 Control limits estimation.

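After obtaining the MLE of θ, θ̂ = (ω̂^⊤, κ̂^⊤, β̂^⊤, ζ̂^⊤)^⊤, and considering an in-control process, the estimated control limits are given by

$$
\widehat{\mathrm{LCL}}_t = F^{-1}(\alpha/2; \hat{\alpha}_{0t}, \hat{\alpha}_{1t}, \hat{\gamma}_t, \hat{\phi}_t), \tag{7}
$$

$$
\widehat{\mathrm{UCL}}_t = F^{-1}(1 - \alpha/2; \hat{\alpha}_{0t}, \hat{\alpha}_{1t}, \hat{\gamma}_t, \hat{\phi}_t), \tag{8}
$$

where α̂_0t, α̂_1t, γ̂_t, and ϕ̂_t are obtained by applying the inverse link functions g₁⁻¹(⋅), g₂⁻¹(⋅), g₃⁻¹(⋅), and g₄⁻¹(⋅) to the linear predictors evaluated at ω̂, κ̂, β̂, and ζ̂, respectively. Thus, we propose the following algorithm to implement the IBRCC.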

1. Fit the inflated beta regression model with varying dispersion under Phase I and obtain the MLEs, namely ω̂, κ̂, β̂, and ζ̂.
2. Using the covariates in Phase II, estimate α_0t, α_1t, γ_t, and ϕ_t by applying the inverse link functions to the linear predictors evaluated at ω̂, κ̂, β̂, and ζ̂, which yields α̂_0t, α̂_1t, γ̂_t, and ϕ̂_t.
3. For a given type I error probability α and the estimates α̂_0t, α̂_1t, γ̂_t, and ϕ̂_t, compute the estimated control limits using (7) and (8).
4. Plot each data point y_t together with the estimated lower and upper control limits, for t = 1, …, n.

An observation y_t that falls outside the estimated control limits (below the estimated LCL or above the estimated UCL) is considered out-of-control.
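The limit computation and the out-of-control check can be sketched with the standard library only. This is an illustration under stated assumptions, not the authors' implementation: the beta cumulative distribution function is evaluated with a textbook continued-fraction expansion of the regularized incomplete beta function, the quantile is found by bisection, the function names are hypothetical, and the per-observation parameter values in the example are made up (in practice they would come from the Phase I fit applied to Phase II covariates).

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    c = 1.0
    d = 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        even = m * (b - m) * x / ((a + 2 * m - 1.0) * (a + 2 * m))
        odd = -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1.0))
        for coef in (even, odd):
            d = 1.0 + coef * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + coef / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def beta_cdf(x, a, b):
    """Regularized incomplete beta function, i.e. the Beta(a, b) cdf at x."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def inflated_beta_quantile(p, alpha0, alpha1, gamma, phi):
    """Generalized inverse of the inflated beta cdf at probability level p."""
    p0 = alpha0 * (1.0 - gamma)
    p1 = alpha1 * gamma
    c = 1.0 - p0 - p1
    mu = gamma * (1.0 - alpha1) / c
    if p <= p0:
        return 0.0            # level falls inside the point mass at zero
    if p >= 1.0 - p1:
        return 1.0            # level falls inside the point mass at one
    target = (p - p0) / c     # level rescaled to the beta component
    lo, hi = 0.0, 1.0
    for _ in range(200):      # bisection on the monotone beta cdf
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, mu * phi, (1.0 - mu) * phi) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def control_limits(alpha, alpha0, alpha1, gamma, phi):
    """Lower and upper control limits for type I error probability alpha."""
    lcl = inflated_beta_quantile(alpha / 2.0, alpha0, alpha1, gamma, phi)
    ucl = inflated_beta_quantile(1.0 - alpha / 2.0, alpha0, alpha1, gamma, phi)
    return lcl, ucl


def is_out_of_control(y, lcl, ucl):
    return y < lcl or y > ucl


# Hypothetical one-inflated case: alpha0 = 0, estimated alpha1, gamma, phi made up.
lcl, ucl = control_limits(0.01, alpha0=0.0, alpha1=0.1, gamma=0.8, phi=20.0)
print(lcl, ucl, is_out_of_control(1.0, lcl, ucl))
```

In this illustrative one-inflated case the probability of observing a one is 0.08, which exceeds α/2 = 0.005, so the upper limit equals one and an observation equal to one is not flagged.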

3 Simulation study

This section presents a Monte Carlo simulation study to evaluate the estimators of the introduced inflated beta regression model with varying dispersion and the performance of the IBRCC. The performance of the proposed control chart is compared with some alternatives in literature, namely: the usual linear regression control chart (RCC) [1], the beta regression control chart (BRCC) [6], and the inflated beta control chart (IBCC) [30]. Note that the RCC is a classical regression control chart that works under Gaussian assumptions. The other control charts are state-of-the-art alternatives, but the BRCC does not consider inflation in zeros and/or ones and the IBCC does not include covariates.

For data generation, we used regression structures of the form (2)–(5) with t = 1, …, n; in particular, the mean structure is logit(γ_t) = β₀ + β₁ x_t. The values of the three remaining covariates were obtained from a Bernoulli distribution with parameter p = 0.3, and x_t was generated from a uniform distribution in the interval (0, 1), thus considering discrete and continuous random variables. We considered 5,000 Monte Carlo replications and sample sizes n = 100, 200, and 500. According to [31] and [32], this number of replications is enough to obtain accurate results. All simulations were performed using the R programming language [33].

In the numerical evaluation, we considered several scenarios with different characteristics, namely: zero and one inflated beta regression model (Scenario 1), zero inflated beta regression model (Scenarios 2, 4, and 6), and one inflated beta regression model (Scenarios 3, 5, and 7). The parameter values are shown in Table 1. In Scenario 1, the mean is centered on the standard unit interval, γ ∈ [0.43, 0.57], and the average percentage of zeros and ones in the sample is approximately equal to 13% for both. Scenarios 2, 4, and 6 consider the mean close to zero, with γ ∈ [0.063, 0.154], [0.012, 0.039], and [0.039, 0.119], and average percentages of zeros in the sample equal to 9.3%, 3.4%, and 6.4%, respectively. For Scenarios 3, 5, and 7, the mean is close to one, with γ ∈ [0.881, 0.971], [0.668, 0.924], and [0.858, 0.952], and average percentages of ones in the sample equal to 8.3%, 2.6%, and 23.5%, respectively.

Table 1. Different scenarios considered in the simulation study.

https://doi.org/10.1371/journal.pone.0236756.t001

3.1 Point estimation evaluation

For the point estimation evaluation, we computed the mean, percentage relative bias (RB), and mean square error (MSE) for each estimator in all scenarios (see Table 1). For brevity, and because the results are similar across scenarios, we only present results for Scenarios 1, 2, and 7 (n = 100 and n = 500), as shown in Table 2. The results show that the mean of the estimators is close to the corresponding parameter values. The RB and MSE decrease when the sample size increases, indicating that the MLEs are consistent. For instance, for β₁ (γ submodel) in Scenario 1, the RB of the estimator is equal to 0.2257% for n = 100 and equal to −0.0548% for n = 500. Regarding MSE, considering ω₀ in Scenario 2 and n ∈ {100, 500}, the MSE is equal to 0.2291 and 0.0373, respectively. As in other studies related to beta regression [21, 34], it is noteworthy that the RB of the MLEs of the precision covariate parameters is greater than that of the MLEs of the parameters that model the mean response. For instance, considering ζ₁ (ϕ submodel) in Scenario 7, we have RB = −8.2035% for n = 100 and RB = −2.4411% for n = 500. Regarding parameters related to the probabilities of zeros and ones, the bias also decreases considerably as the sample size increases. For example, in Scenario 1 and n = 100, the estimator of ω₁ (α₀ submodel) yields RB = 16.9130% and the estimator of η₁ (α₁ submodel) yields RB = 14.8922%. For n = 500, the bias for the same estimators reduces to 4.5210% and 3.4431%, respectively.

Table 2. Monte Carlo simulation results of point estimation evaluation.

https://doi.org/10.1371/journal.pone.0236756.t002

In practice, the regression model relating the output and covariates is rarely known and the parameters have to be estimated. Our simulation results show that the MLEs in the proposed model perform well, presenting low MSE for the estimates in all situations. Hence, the proposed control chart may also perform well in practice. In the next section, we shall investigate the run length performance of the IBRCC with estimated parameters.

3.2 Control charts performance

This section presents a run length analysis to evaluate the performance of the considered control charts. When the process is in-control, the run length (RL) follows a geometric distribution with parameter α, which is the type I error probability [35]. The ability of a control chart to detect changes in the process is usually measured by the average run length (ARL), i.e., the average number of observations until the detection of an out-of-control point [36]. However, other measures can also be used for this purpose. We considered another location measure, the median (MRL), and a dispersion measure, the standard deviation (SDRL), of the RL distribution. Additionally, we computed the mean absolute percentage error (MAPE) for each measure for all evaluated control charts.

We compared the proposed IBRCC with the standard RCC [1] and the state-of-the-art charts, namely BRCC [6] and IBCC [30]. Since the BRCC does not consider values equal to zero or one, we replaced zeros by 0.0001 and ones by 0.9999 for its application. For all considered control charts, we examined two aspects of evaluation: in-control (ARL₀, MRL₀, SDRL₀) and out-of-control (ARL₁ = 1/(1 − β), where β is the type II error probability) [30, 35]. For brevity, we do not present the MRL and SDRL results for the out-of-control process. The control charts were evaluated in Scenarios 2 to 7 (Table 1), considering inflation in 0 or 1. Scenario 1 was not covered in this section because it does not reflect real statistical process control situations, since it allows perfectly nonconforming and perfectly conforming outcomes in the same process.

To ensure that ARL₁ is compared across control charts with the same ARL₀, we adjusted the chart limits to obtain ARL₀ equal to the specified nominal values of 100 and 370. This control chart calibration is suggested in the literature [30, 37–39]. After ARL₀ calibration, a δ change was induced in the mean and precision regression structures to generate out-of-control processes: for the mean, logit(γ_t) = δ + β₀ + β₁ x_t, and, analogously, δ was added to the linear predictor of the precision structure. By enabling the process to be out-of-control, we obtained the estimated ARL₁ for different values of δ. When δ = 0, the process is in-control and the ARL₀ can be evaluated.

The ARL₀, MRL₀, and SDRL₀ evaluation results are shown in Tables 3 and 4. For an in-control process with α values of 0.01 and 0.0027, the geometric distribution of the RL gives nominal ARL values equal to 100 and 370, nominal MRL values equal to 69.0 and 256.1, and nominal SDRL values equal to 99.5 and 369.5, respectively. The IBRCC showed better performance than the BRCC, RCC, and IBCC, reaching empirical values closer to the nominal levels in all evaluated scenarios. In Scenarios 2, 4, and 6, the IBRCC and IBCC obtained 0 as the lower control limit, thus no point fell below this limit. Similarly, in Scenarios 3, 5, and 7, the upper control limit of the mentioned charts was 1. The fact that these scenarios present 0 or 1 as control limits is related to the probabilities of occurrence of 0 or 1. That is, the IBRCC and IBCC will present zero as the lower control limit when P(y = 0) ≥ α/2 and one as the upper control limit when P(y = 1) ≥ α/2. The high probability of y assuming values equal to one or zero means that these values are not atypical (out of control) but usual occurrences of the process.
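The nominal in-control values quoted above follow from the geometric distribution of the run length. The sketch below uses a hypothetical helper name and the standard formulas ARL = 1/α, SDRL = √(1 − α)/α, and MRL = log(0.5)/log(1 − α), where the MRL is the continuous solution of (1 − α)^m = 0.5. One observation from the arithmetic, stated here as an assumption rather than as the authors' procedure: the quoted values for the second case are reproduced when α is taken as exactly 1/370 rather than the rounded 0.0027.

```python
import math


def nominal_run_length_measures(arl0):
    """Nominal ARL, MRL, and SDRL of a geometric run length with alpha = 1 / arl0."""
    alpha = 1.0 / arl0
    arl = 1.0 / alpha
    mrl = math.log(0.5) / math.log(1.0 - alpha)   # continuous median
    sdrl = math.sqrt(1.0 - alpha) / alpha
    return arl, mrl, sdrl


for arl0 in (100, 370):
    arl, mrl, sdrl = nominal_run_length_measures(arl0)
    print(arl0, round(mrl, 1), round(sdrl, 1))
```

Rounded to one decimal place, the printed MRL and SDRL values agree with the nominal values 69.0, 256.1, 99.5, and 369.5 given in the text.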

Table 3. Run length analysis to evaluate the IBRCC, BRCC, RCC, and IBCC with α = 0.01.

https://doi.org/10.1371/journal.pone.0236756.t003

Table 4. Run length analysis to evaluate the IBRCC, BRCC, RCC, and IBCC with α = 0.0027.

https://doi.org/10.1371/journal.pone.0236756.t004

Tables 3 and 4 also show the MAPE results. The proposed control chart has the lowest MAPE values. For example, considering α = 0.01, the corresponding ARL₀, and n = 200, the MAPE values obtained for the IBRCC, BRCC, RCC, and IBCC were 1.32, 83.82, 61.17, and 21.80, respectively. It is noteworthy that, for the IBRCC, the MAPE decreases considerably when the sample size increases.

Among the considered alternative control charts, the IBCC achieved better performance than the BRCC and RCC in all scenarios. In Scenario 2 (Table 4), for n = 200, the IBCC presented a false alarm every 411 samples on average, when, in fact, a false alarm was expected every 370 samples. In the same scenario, the BRCC and the RCC presented a false alarm approximately every 11 and 106 samples, respectively. These results show the importance of considering an accurate model to reduce false alarms. We also note that the BRCC obtained the worst performance. In Table 3, considering Scenario 3 and n = 100, the RCC and BRCC presented a false alarm approximately every 50 and 27 observations, respectively. It is important to note that BRCC performance worsened as the percentage of zeros or ones increased. Consistent with this, the IBCC also presented lower MAPE than the BRCC and RCC in all scenarios. Considering α = 0.0027, n = 500, and the MRL₀ measure, the MAPE values obtained for the IBRCC, BRCC, RCC, and IBCC were, respectively, 4.82, 95.15, 80.22, and 35.45.

Results of the ARL₁ evaluation are shown graphically in Figs 1 and 2. It was not possible to calibrate ARL₀ for the BRCC due to its poor in-control performance. Thus, the ARL₁ evaluation was carried out only for the IBRCC, RCC, and IBCC. It is noteworthy that when several control charts are compared in terms of ARL, the one that presents the lowest ARL₁ among those with the same ARL₀ outperforms its competitors [30]. By analyzing the ARL₁ results when the perturbation was introduced in the mean of the process (Fig 1), we observe that in Scenario 3 the IBRCC performs better than the RCC and IBCC, and in Scenario 2 the performance of the control charts is similar. We note that the IBRCC detects the out-of-control process more quickly. For example, in Scenario 3, with ARL₀ = 370, n = 100, and δ = −0.4, the IBRCC takes 176 samples on average to detect a change in the process, while the IBCC takes 186 and the RCC takes 192 to detect a change of the same magnitude. The simulation results showed similar behavior when a perturbation in the precision of the process occurs (Fig 2). The control charts detect process changes more quickly as the precision increases (dispersion decreases).

Fig 1. ARL₁ curves evaluation for the inflated beta regression control chart (solid line), regression control chart (dashed line), and inflated beta control chart (dotted line) when the mean is out-of-control.

https://doi.org/10.1371/journal.pone.0236756.g001

Fig 2. ARL₁ curves evaluation for the inflated beta regression control chart (solid line), regression control chart (dashed line), and inflated beta control chart (dotted line) when the precision is out-of-control.

https://doi.org/10.1371/journal.pone.0236756.g002

The simulation results show the need for a control chart based on an appropriate regression model, such as the IBRCC, when the variable of interest is restricted to the intervals [0, 1) or (0, 1]. The use of the linear regression-based control chart is inappropriate for data of this type since the support of the usual regression model is the whole real line. Interestingly, the BRCC proved to be less adequate in the presence of values equal to zero or one than the traditional RCC or the IBCC, which uses the inflated beta distribution but does not consider a regression structure. Since the BRCC does not accommodate values equal to zero or one, replacing zeros with 0.0001 and ones with 0.9999 induces an inflation at these values. That is, the probability mass at 0.0001 and/or 0.9999 exceeds what is allowed by the beta distribution, which is an absolutely continuous distribution. This affects the estimates of the parameters of the regression structures and, consequently, the estimates of the control limits are impaired.

4 Real data application

This section contains an empirical application in which the proposed control chart (IBRCC) and three other competing control charts are analyzed: the RCC, BRCC, and IBCC. The data evaluated in this section refer to the public administrative efficiency of the municipalities in the state of São Paulo, Brazil. The data are a subset of those analyzed by [40], who considered all Brazilian municipalities. The dataset we used contains 427 municipalities for the year 2000 and it is available at http://www.de.ufpb.br/~luiz/datasets/Dataset_plosone.txt. The covariates are from Secretaria do Tesouro Nacional (http://www.tesouro.fazenda.gov.br/), Instituto Brasileiro de Geografia e Estatística (IBGE) (https://www.ibge.gov.br/), and Instituto de Pesquisa Econômica Aplicada (IPEA) (https://www.ipea.gov.br/portal/), Brazil. The quality characteristic, y, was introduced by [40] and represents individual observations of an efficiency index, assuming values in (0, 1] and measuring how well mayors spend taxpayer money in order to provide taxpayers with public services. The efficiency index is equal to one when there is full efficiency. There are 32 units that are fully efficient (i.e., about 7.5% of the observations are equal to one). A brief description of the variables used in the analysis is presented in Table 5. Variables CONS, R2, and MT are dummies, i.e., they are equal to 0 or 1. The covariate CONS equals 1 if the municipality participates in inter-municipal consortia, the covariate R2 equals 1 whenever the municipality receives more than 10% of its tax revenue from royalties, and the covariate MT equals 1 whenever the municipality is a tourist destination; each of the three dummy covariates equals 0 otherwise. It is important to mention that 100 municipalities were randomly selected to estimate the model parameters (Phase I), while the remaining observations were used for monitoring (Phase II).

Table 5. Description of the variables for efficiency data.

https://doi.org/10.1371/journal.pone.0236756.t005

At the outset, the inflated (at one) beta mean regression model, the beta regression model substituting 0.9999 for 1, and a linear regression model were selected and fitted. We used the logit link for γ and α₁ and the log link for ϕ. For the beta regression, we considered the logit link for μ and the log link for ϕ. The maximum likelihood estimates of the model parameters are displayed in Table 6. All covariates were significant at the nominal level of 5%. In order to compare the fitted regression models, we considered the MAPE and MSE between the observed and fitted values. According to these criteria, the inflated beta regression model outperforms the other ones, with MAPE = 26.8835 and MSE = 0.0387, while the beta regression model obtains MAPE = 29.4101 and MSE = 0.0454, and the linear regression model achieves MAPE = 29.3617 and MSE = 0.0389.
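The two comparison criteria can be sketched as follows. The paper does not spell out the formulas, so the snippet assumes the usual definitions: MAPE as the mean of |y − ŷ|/|y| expressed as a percentage, and MSE as the mean of (y − ŷ)². The function names and the toy observed and fitted values are hypothetical.

```python
def mape(observed, fitted):
    """Mean absolute percentage error (assumed definition), in percent."""
    ratios = [abs(y - f) / abs(y) for y, f in zip(observed, fitted)]
    return 100.0 * sum(ratios) / len(ratios)


def mse(observed, fitted):
    """Mean square error between observed and fitted values."""
    squares = [(y - f) ** 2 for y, f in zip(observed, fitted)]
    return sum(squares) / len(squares)


# Toy values for illustration only (not the efficiency data).
observed = [0.5, 1.0, 0.8]
fitted = [0.4, 0.9, 1.0]
print(mape(observed, fitted), mse(observed, fitted))
```

Under this definition the MAPE is well defined for the efficiency index because the response takes values in (0, 1] and is therefore never zero.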

Table 6. Adjusted models for efficiency data.

https://doi.org/10.1371/journal.pone.0236756.t006

Table 7 presents some descriptive statistics of the estimated control limits. Note that the proposed control chart and the IBCC are the only ones that have an upper control limit that is constant and equal to one. In contrast, when the beta regression control chart is used, the control limits are restricted to the open interval (0, 1) and thus, in this case, fully efficient municipalities are considered out-of-control. In addition, we verify that, when using the standard RCC, the limits assume values below zero and above one, not being restricted to the interval (0, 1], where the data are distributed. Limits outside this interval have no practical interpretation and lead to a loss of power to detect out-of-control points.

Table 7. Descriptive statistics of the control limits for efficiency data: minimum (min), first quartile (Q_1/4), median, mean, third quartile (Q_3/4), and maximum (max).

https://doi.org/10.1371/journal.pone.0236756.t007

Fig 3 presents the control limits of the (a) IBRCC, (b) BRCC, (c) RCC, and (d) IBCC together with the observed efficiency values, considering ARL₀ = 100. Given that the efficiency index assumes values in (0, 1], the proposed model-based control chart (IBRCC) presents limits with a smaller range. Interestingly, the BRCC does not accommodate values equal to one: replacing them with 0.9999 induces an inflation at that value, so the BRCC is less adequate in the presence of values equal to one than the traditional RCC. The linear regression-based control chart is inappropriate for data of this type because the support of the usual regression model is the whole real line. Finally, the IBCC, which uses the inflated beta distribution but does not consider a regression structure, presents constant limits that are not appropriate when control variables (covariates) are available. It is worth mentioning that the IBRCC detected 7 out-of-control points, while the BRCC detected 36. Lastly, we carried out the RESET misspecification test [41], where the null hypothesis is that the fitted model is correctly specified and the alternative hypothesis is that there is model misspecification. We performed the test using the second power of the estimated mean linear predictor as the testing variable. We do not reject the null hypothesis at the 1% nominal level, which suggests that our model is correctly specified.
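The choice ARL₀ = 100 can be related to a per-observation false alarm probability through the standard Shewhart-type relation ARL₀ = 1/a, which holds when in-control points signal independently with the same probability a. The sketch below checks this relation by simulation. It is an illustration under those idealized assumptions (known parameters, independent signals), and the function names are hypothetical; for distributions with a point mass, the attained false alarm probability may differ from the nominal a.

```python
import random

def false_alarm_rate(arl0):
    """Nominal per-observation false alarm probability for a target ARL0."""
    return 1.0 / arl0

def simulate_run_length(a, rng):
    """Number of in-control observations up to and including the first signal."""
    n = 1
    while rng.random() >= a:  # no signal with probability 1 - a
        n += 1
    return n

def mean_run_length(a, reps, seed=0):
    """Monte Carlo estimate of the in-control average run length."""
    rng = random.Random(seed)
    return sum(simulate_run_length(a, rng) for _ in range(reps)) / reps

a = false_alarm_rate(100.0)

# Expected number of false alarms among the 327 Phase II observations,
# if the process were in control and the nominal rate were attained.
expected_false_alarms = 327 * a

print(a, expected_false_alarms, mean_run_length(a, reps=20000))
```

The run length is geometrically distributed in this setting, so its standard deviation is close to its mean; a large number of replications is needed for a stable estimate of the ARL.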


Fig 3. Plot of the control limits based on (a) inflated beta regression control chart, (b) beta regression control chart, (c) regression control chart, and (d) inflated beta control chart for monitoring the efficiency indexes for municipalities in the state of São Paulo, Brazil, considering ARL₀ = 100.

https://doi.org/10.1371/journal.pone.0236756.g003

5 Conclusions

In this paper, we proposed a new model-based control chart for controlling quality characteristics limited to the intervals (0, 1] or [0, 1) using the inflated beta regression model. For this purpose, we extended the inflated beta regression model proposed by [12] by allowing a regression structure for the precision parameter. In this way, it is possible to model the mean response, the data precision, and functions of the probability of a given observation assuming zero or one through a regression framework. Our simulation study showed that the relative bias and mean square error of the estimators decrease as the sample size increases. With regard to the sensitivity analysis in terms of run length (RL), the proposed IBRCC showed the best performance in all considered cases. In addition, the results indicated that it is better to ignore the explanatory variables and use the inflated beta control chart (IBCC) than to use a control chart based on an inappropriate regression model. We also considered an application to real data and highlighted the practical importance of the proposed chart when the response is distributed in unit intervals containing ones. Finally, we suggest using the inflated beta regression control chart to monitor output quality characteristics that are double bounded in unit intervals containing zeros or ones and that are better characterized by a functional relation between the response variable and one or more explanatory variables.

A Score function and Fisher’s information matrix

In this appendix, we obtain the score function and present a closed-form expression for Fisher’s information matrix for all parameters of the inflated beta regression model with varying dispersion. We assume that the observed values of the dependent fractional variable are sorted according to the 0, 1, and (0, 1)-values with n₀, n₁, and n − m terms, respectively, where m = n₀ + n₁. Furthermore, and , where ψ(⋅) is the digamma function.

Let

The score function for ω is given by where . Therefore,

The score function for κ is given by where . Thus,

For β, the score function is given by where and . Then we have

The score function for ζ is given by where . Therefore,

In matrix form, each term of the score vector is given by where , , , , , , M = diag{m₁, …, m_n}, , A₁ = diag{(1 − α₀₁), …, (1 − α_0n)}, A₂ = diag{(1 − α₁₁), …, (1 − α_1n)}, , , , , , , , , , is a n × k₁ matrix whose t-th row is , is a n × k₂ matrix whose t-th row is , X is a n × k₃ matrix whose t-th row is , is a n × k₄ matrix whose t-th row is , , , and .

The joint information matrix for the parameter vector θ = (ω^⊤, κ^⊤, β^⊤, ζ^⊤)^⊤ is given by where , , , , , , , , , , , , , , , f_t = −ψ′((1 − μ_t)ϕ_t), D = diag{d₁, …, d_n}, , C = diag{c₁, …, c_n}, and c_t = 1 − α_0t(1 − γ_t) − α_1t γ_t.

Supporting information

S1 Dataset.

https://doi.org/10.1371/journal.pone.0236756.s001

(TXT)

References

1. Mandel B. J., (1969). The regression control chart. Journal of Quality Technology 1 (1), 1–9.
2. Shu L., Tsung F., Tsui K.-L., (2004). Run-length performance of regression control charts with estimated parameters. Journal of Quality Technology 36 (3), 280–292.
3. Jearkpaporn D., Montgomery D., Runger G., Borror C., (2003). Process monitoring for correlated gamma-distributed data using generalized-linear-model-based control charts. Quality and Reliability Engineering International 19, 477–491.
4. Jearkpaporn D., Montgomery D., Runger G., Borror C., (2005). Model-based process monitoring using robust generalized linear models. International Journal of Production Research 43 (7), 1337–1354.
5. Jearkpaporn D., Borror C., Runger G., Montgomery D., (2007). Process monitoring for mean shifts for multiple stage processes. International Journal of Production Research 45, 5547–5570.
6. Bayer F., Tondolo C., Müller F., (2018). Beta regression control chart for monitoring fractions and proportions. Computers & Industrial Engineering 119 (1), 416–426.
7. Bayer F. M., Cribari-Neto F., (2017). Model selection criteria in beta regression with varying dispersion. Communications in Statistics - Simulation and Computation 46 (1), 729–746.
8. Ospina R., Ferrari S. L. P., (2012). A general class of zero-or-one inflated beta regression models. Computational Statistics and Data Analysis 56, 1609–1623.
9. Hoff A., (2007). Second stage DEA: comparison of approaches for modelling the DEA score. European Journal of Operational Research 181, 425–435.
10. Cook D. O., Kieschnick R., McCullough B. D., (2008). Regression analysis of proportions in finance with self selection. Journal of Empirical Finance 15 (5), 860–867.
11. Ferrari S., Cribari-Neto F., (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics 31 (7), 799–815.
12. Bayes C. L., Valdivieso L., (2016). A beta inflated mean regression model for fractional response variables. Journal of Applied Statistics 43 (10), 1814–1830.
13. Teyarachakul S., Chand S., Tang J., (2007). Estimating the limits for statistical process control charts: A direct method improving upon the bootstrap. European Journal of Operational Research 178, 472–481.
14. Capizzi G., Masarotto G., (2011). A least angle regression control chart for multidimensional data. Technometrics 53 (3), 285–296.
15. Smyth G. K., Verbyla A. P., (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10 (6), 695–709.
16. Rahimi S. B., Amiri A., Ghashghaei R., (2019). Simultaneous monitoring of mean vector and covariance matrix of multivariate simple linear profiles in the presence of within profile autocorrelation. Communications in Statistics - Simulation and Computation, accepted.
17. Yeh A. B., Huwang L., Li Y.-M., (2009). Profile monitoring for a binary response. IIE Transactions 41 (11), 931–941.
18. Shang Y., Tsung F., Zou C., (2011). Profile monitoring with binary data and random predictors. Journal of Quality Technology 43, 196–208.
19. Zhang J., Ren H., Yao R., Zou C., Wang Z., (2015). Phase I analysis of multivariate profiles based on regression adjustment. Computers & Industrial Engineering 85, 132–144.
20. Qi D., Wang Z., Zi X., Li Z., (2016). Phase II monitoring of generalized linear profiles using weighted likelihood ratio charts. Computers & Industrial Engineering 94, 178–187.
21. Canterle D. R., Bayer F. M., (2019). Variable dispersion beta regressions with parametric link functions. Statistical Papers 60, 1541–1567.
22. Gan F. F., (1995). Joint monitoring of process mean and variance using exponentially weighted moving average control charts. Technometrics 37 (4), 446–453.
23. Riaz M., (2008). Monitoring process variability using auxiliary information. Computational Statistics 23 (2), 253–276.
24. Sheu S.-H., Tai S.-H., Hsieh Y.-T., Lin T.-C., (2009). Monitoring process mean and variability with generally weighted moving average control charts. Computers & Industrial Engineering 57 (1), 401–407.
25. Huwang L., Huang C.-J., Wang Y.-H.T., (2010). New EWMA control charts for monitoring process dispersion. Computational Statistics & Data Analysis 54 (10), 2328–2342.
26. Pawitan Y., (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford Science publications.
27. Casella G., Berger R. L., (2002). Statistical Inference, 2nd Edition. Thomson Learning.
28. Chen F., Chen S., Ma X., (2018). Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. Journal of Safety Research 65, 153–159. pmid:29776524
29. Press W., Teukolsky S., Vetterling W., Flannery B., (1992). Numerical recipes in C: The art of scientific computing, 2nd Edition. Cambridge University Press.
30. Lima-Filho L. M. A., Pereira T. L., de Souza T. C., Bayer F. M., (2019). Inflated beta control chart for monitoring double bounded processes. Computers & Industrial Engineering 136, 265–276.
31. Boon Chong M., (2004). Performance Measures for the Shewhart Control Chart. Quality Engineering 16 (4), 585–590.
32. Schaffer J. R., Kim M.-J., (2007). Number of replications required in control chart Monte Carlo simulation studies. Communications in Statistics - Simulation and Computation 36 (5), 1075–1087.
33. R Core Team, (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
34. Simas A. B., Souza W. B., Rocha A. V., (2010). Improved estimators for a general class of beta regression models. Computational Statistics & Data Analysis 54 (2).
35. Ho L. L., Fernandes F. H., Bourguignon M., (2019). Control charts to monitor rates and proportions. Quality and Reliability Engineering International 35 (1), 74–83.
36. Montgomery D. C., (2009). Introduction to Statistical Quality Control, 6th Edition. John Wiley & Sons.
37. Moraes D. A. O., Oliveira F., Quinino R., Duczmal L., (2014). Self-oriented control charts for efficient monitoring of mean vectors. Computers & Industrial Engineering 75, 102–115.
38. Paroissin C., Penalva L., Pétrau A., Verdier G., (2016). New control chart for monitoring and classification of environmental data. Environmetrics 27, 182–193.
39. Lima-Filho L. M. A., Bayer F. M., (2019). Kumaraswamy control chart for monitoring double bounded environmental data. Communications in Statistics - Simulation and Computation, accepted.
40. Sousa M. C. S., Cribari-Neto F., Stosic B. D., (2005). Explaining DEA Technical Efficiency Scores in an Outlier Corrected Environment: The Case of Public Services in Brazilian Municipalities. Brazilian Review of Econometrics 25 (2), 287–313.
41. Pereira T. L., Cribari-Neto F., (2014). Detecting model misspecification in inflated beta regressions. Communications in Statistics - Simulation and Computation 43 (3), 631–656.

