Abstract
This paper provides a general framework for controlling quality characteristics related to control variables and limited to the intervals (0, 1], [0, 1), or [0, 1]. The proposed control chart is based on the inflated beta regression model considering a reparametrization of the inflated beta distribution indexed by the response mean, which is useful for modeling fractions and proportions. The contribution of the paper is twofold. First, we extend the inflated beta regression model by allowing a regression structure for the precision parameter. We also present closed-form expressions for the score vector and Fisher’s information matrix. Second, based on the proposed regression model, we introduce a new model-based control chart. The control limits are obtained considering the estimates of the inflated beta regression model parameters. We conduct a Monte Carlo simulation study to evaluate the performance of the proposed regression model estimators, and the performance of the proposed control chart is evaluated in terms of run length distribution. Finally, we present and discuss an empirical application to show the applicability of the proposed regression control chart.
Citation: Lima-Filho LMA, Pereira TL, Souza TC, Bayer FM (2020) Process monitoring using inflated beta regression control chart. PLoS ONE 15(7): e0236756. https://doi.org/10.1371/journal.pone.0236756
Editor: Feng Chen, Tongji University, CHINA
Received: April 8, 2020; Accepted: July 11, 2020; Published: July 30, 2020
Copyright: © 2020 Lima-Filho et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the manuscript and its Supporting Information files.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
Standard control charts are applied directly to the output of a quality characteristic. However, the quality characteristic (process output) can be affected by external covariates (control variables), in which case the quantity to be monitored is a varying mean rather than a constant one. In these cases, the regression control chart [1] may be an effective statistical process control tool. Such a method is widely used when the quality of a process or product is better characterized by a functional relationship between the response variable and one or more explanatory variables [2].
The standard regression control chart is based on the linear regression model, where the variable of interest is assumed to be normally distributed. However, in practice, several of these variables may not follow a normal distribution, leading to poor Gaussian-based inferences. Thus, several studies have proposed non-Gaussian model-based control charts. [3] presented a model-based scheme for monitoring multiple gamma-distributed variables. By considering that robust methods can be effective in the presence of outlying observations, [4] explored the robust generalized linear model for a gamma-distributed response. [5] used deviance residuals for monitoring variables in a three-stage process assuming gamma, normal, and Poisson distributions.
Examples of a non-Gaussian process output are variables that assume values in the standard unit interval, such as fractions and proportions. In such instances, the usual regression control chart may be inappropriate since double bounded data are typically asymmetric and the Gaussian-based assumption is not suitable. In this sense, [6] proposed the beta regression control chart to monitor fractions and proportions related to control variables. The proposed control chart considers the beta regression model with varying dispersion [7], assuming that the mean and dispersion parameters of beta distributed variables are related to exogenous variables and modeled by regression structures. However, fractions and proportions may contain zeros and/or ones, leading to the unsuitable use of the beta distribution for data modeling [8].
Alternative regression models have been proposed to mend beta regression flaws in the presence of zeros and/or ones. [9] presented a unit inflated beta model for modeling efficiency scores as a function of exogenous variables. [10] proposed a zero inflated beta model to analyze data in corporate capital structures. [8] introduced a general class of zero or one inflated beta regression models, which is a natural extension of the beta regression model [11] to model variables that assume values in (0, 1] or [0, 1). [12] proposed an inflated beta regression model based on a reparametrization of the inflated beta distribution. This model accommodates mixed random variable responses, with non-negligible probabilities of assuming zeros and/or ones and continuous values in the interval (0, 1) that follows a beta distribution. The inflated beta regression model introduced by [12] may be useful for developing model-based control charts for monitoring inflated beta distributed processes as it considers an interesting parametrization in terms of the response variable mean. However, the model proposed by [12] does not consider a regression structure for the precision parameter. The monitoring of the mean and precision (or dispersion) is relevant to the statistical process control [6, 13, 14]. In addition, incorrect modeling of the dispersion can generate a high number of false alarms or loss of detection power of special causes [6]. Moreover, dispersion modeling is necessary in regression models in order to obtain accurate inferences about the structure parameters of the mean regression [15].
A control chart is a dynamic tool that works under two different phases, namely Phase I and Phase II. In practical situations, the in-control parameters are unknown and have to be estimated from a Phase I data set. Different Phase I data sets lead to different control chart performance. Thus, it is important to study the practitioner-to-practitioner variability due to parameter estimation. The aim of Phase I analysis is to estimate the parameters, while quick detection of an out-of-control state is conducted in Phase II [16]. The literature offers some studies related to Phase I and Phase II analyses in regression models. For example, [17] proposed Phase I profile monitoring schemes for binary responses that can be represented by logistic regression models, developing several Hotelling T²-type Phase I control charts for monitoring the parameters of a logistic regression linking a binary response to one or more predictor variables. [18] developed control charts by integrating an exponentially weighted moving average scheme with a likelihood ratio test based on logistic regression models in a Phase II study. [19] proposed a new modeling and monitoring framework for Phase I analysis of multivariate profiles by incorporating the regression-adjustment technique into the functional principal components analysis. [20] proposed the monitoring of profiles using generalized linear models during Phase II in which the explanatory variables can be a fixed design or any random arbitrary design.
In this context, this paper introduces the inflated beta regression control chart (IBRCC) with varying dispersion, useful for monitoring double bounded variables when zeros or ones appear in the data along with the presence of control variables. The process output may represent individual measures (e.g. efficiency score) or a ratio between continuous numbers (e.g. relative humidity). The contribution of the present paper is twofold. First, we extend the inflated beta regression model proposed by [12] by allowing a regression structure for the precision parameter. We also discuss likelihood inference of the model parameters. Second, we introduce the IBRCC based on the proposed inflated beta regression model with varying dispersion. Since in practice the parameters of the regression model are unknown, the proposed control chart is implemented in two phases. In Phase I, the parameters are estimated from an in-control sample, and in Phase II, we perform the monitoring scheme.
The remainder of the paper is organized as follows. In Section 2, we describe the IBRCC and introduce the beta inflated mean regression model with varying dispersion. We also discuss likelihood inference and present the control limits estimation procedure. Section 3 presents a simulation study to evaluate (i) the estimators of the inflated beta regression model with varying dispersion and (ii) the performance of the proposed IBRCC and some competing control charts in the literature based on the run length (RL). In Section 4, we present and discuss an empirical application to show the applicability of the proposed IBRCC in real situations. Finally, some conclusions are presented in Section 5.
2 Inflated beta regression control chart
In this section, we introduce the IBRCC. First, in Subsection 2.1, we present the inflated beta regression model with varying dispersion. The model we propose in this work is an extension of the model proposed by [12], where the authors used a reparametrization of the inflated beta distribution indexed by the response mean. Then, in Subsection 2.2, we present the model-based control limits for the proposed IBRCC: in Subsubsection 2.2.1 we discuss the likelihood inference for the model parameters, and in Subsubsection 2.2.2 we present the control limits estimation procedure.
2.1 Inflated beta regression model with varying dispersion
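The inflated beta density function is given by [12]

$$
f(y; \alpha_0, \alpha_1, \gamma, \phi) =
\begin{cases}
\alpha_0 (1 - \gamma), & y = 0, \\
c \, b(y; \mu, \phi), & y \in (0, 1), \\
\alpha_1 \gamma, & y = 1,
\end{cases}
\tag{1}
$$

where 0 < α0 < 1, 0 < α1 < 1, 0 < γ < 1, and ϕ > 0 are the distribution parameters, c = 1 − α0(1 − γ) − α1 γ, μ = γ(1 − α1)/c, and b(y; μ, ϕ) is the beta density function given by [11]

$$
b(y; \mu, \phi) = \frac{\Gamma(\phi)}{\Gamma(\mu\phi)\,\Gamma((1 - \mu)\phi)} \, y^{\mu\phi - 1} (1 - y)^{(1 - \mu)\phi - 1}, \quad 0 < y < 1,
$$

where 0 < μ < 1, ϕ > 0, and Γ(⋅) is the gamma function. Here, ϕ is a precision parameter (the inverse of the dispersion), E(y) = γ, and

$$
\mathrm{Var}(y) = \alpha_1 \gamma + c \left[ \frac{\mu (1 - \mu)}{1 + \phi} + \mu^2 \right] - \gamma^2 .
$$

Density (1) is said to be zero and one inflated beta, i.e., y follows the inflated beta distribution with parameters α0, α1, γ, and ϕ. Note that P(y = 0) = α0(1 − γ) and P(y = 1) = α1 γ, thus if α0 = 0 and α1 > 0, the distribution in (1) is called one inflated beta distribution. Conversely, if α0 > 0 and α1 = 0, the distribution given in (1) is called zero inflated beta distribution.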
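The mixture structure of density (1) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names and the parameter values (α0 = 0.2, α1 = 0.1, γ = 0.6, ϕ = 20) are hypothetical, and sampling relies on Python's standard `random.betavariate`. It computes the point masses, the weight c, and the beta mean μ, and then confirms that the mixture mean P(y = 1) + cμ equals γ.

```python
import random

def inflated_beta_components(alpha0, alpha1, gamma):
    """Return (P(y=0), P(y=1), c, mu) for the mean-indexed inflated beta law."""
    p0 = alpha0 * (1.0 - gamma)       # point mass at zero
    p1 = alpha1 * gamma               # point mass at one
    c = 1.0 - p0 - p1                 # weight of the continuous beta part
    mu = gamma * (1.0 - alpha1) / c   # mean of the beta part
    return p0, p1, c, mu

def sample_inflated_beta(alpha0, alpha1, gamma, phi, rng):
    """Draw one value: 0 or 1 with the point masses, otherwise a beta variate."""
    p0, p1, c, mu = inflated_beta_components(alpha0, alpha1, gamma)
    u = rng.random()
    if u < p0:
        return 0.0
    if u < p0 + p1:
        return 1.0
    return rng.betavariate(mu * phi, (1.0 - mu) * phi)

# Hypothetical parameter values, chosen only for illustration.
p0, p1, c, mu = inflated_beta_components(0.2, 0.1, 0.6)
mean_from_mixture = p1 + c * mu       # E(y) = 1 * P(y=1) + c * mu
rng = random.Random(1)
draws = [sample_inflated_beta(0.2, 0.1, 0.6, 20.0, rng) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print(mean_from_mixture, sample_mean)
```

The analytical mean depends only on α1, γ, and c, which is why the parametrization is convenient for a chart whose center is the response mean.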
Let y1, …, yn be independent random variables where each yt has the density in (1) with parameters α0t, α1t, γt, and ϕt, for t = 1, …, n. The inflated beta regression model with varying dispersion, which is an extension of the regression model proposed by [12], is given by the following structures for modeling the response yt:

$$
g_1(\alpha_{0t}) = \mathbf{s}_t^\top \boldsymbol{\omega}, \tag{2}
$$

$$
g_2(\alpha_{1t}) = \mathbf{r}_t^\top \boldsymbol{\kappa}, \tag{3}
$$

$$
g_3(\gamma_t) = \mathbf{x}_t^\top \boldsymbol{\beta}, \tag{4}
$$

$$
g_4(\phi_t) = \mathbf{v}_t^\top \boldsymbol{\zeta}, \tag{5}
$$

where ω, κ, β, and ζ are vectors of unknown regression parameters of dimensions k1, k2, k3, and k4, respectively, writing s_t, r_t, x_t, and v_t for the corresponding vectors of observations on k1, k2, k3, and k4 covariates, and g1, g2, g3: (0, 1) → ℝ and g4: (0, ∞) → ℝ are real-valued link functions with continuous second derivatives. For g1(⋅), g2(⋅), and g3(⋅), several different link functions can be used, such as logit, probit, log-log, complementary log-log, or Cauchy. For g4(⋅), the choices are log or square root link functions, for example. More details about link functions on the class of beta regression models can be found in [11] and [21].
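To make the role of the link functions concrete, the sketch below maps four linear predictors to the parameters (α0t, α1t, γt, ϕt). It assumes logit links for g1, g2, and g3 and a log link for g4, which is one of the admissible choices listed above; the covariate names `s_t`, `r_t`, `x_t`, `v_t` and all numeric values are hypothetical.

```python
import math

def inv_logit(eta):
    """Inverse of the logit link: maps the real line to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(covariates, coefficients):
    """Inner product between a covariate vector and a parameter vector."""
    return sum(x * b for x, b in zip(covariates, coefficients))

def submodel_parameters(s_t, r_t, x_t, v_t, omega, kappa, beta, zeta):
    """Assumed links: logit for alpha0, alpha1, gamma; log for phi."""
    alpha0 = inv_logit(linear_predictor(s_t, omega))
    alpha1 = inv_logit(linear_predictor(r_t, kappa))
    gamma = inv_logit(linear_predictor(x_t, beta))
    phi = math.exp(linear_predictor(v_t, zeta))
    return alpha0, alpha1, gamma, phi

# Hypothetical covariates (first entry is the intercept) and coefficients.
params = submodel_parameters(
    s_t=[1.0, 0.0], r_t=[1.0, 1.0], x_t=[1.0, 0.5], v_t=[1.0, 0.0],
    omega=[-2.0, 0.5], kappa=[-1.5, 0.3], beta=[0.0, 1.0], zeta=[3.0, 0.5],
)
print(params)
```

Whatever values the linear predictors take, the inverse links keep α0t, α1t, and γt inside (0, 1) and ϕt positive, so no constraint has to be imposed on the regression parameters themselves.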
Notably, the model proposed by [12] assumes that, for t = 1, …, n, ϕt = ϕ (constant precision). In the present study, the precision parameter is allowed to vary across observations, making the proposed model more general than the original inflated beta regression model. The assumption of non-constant dispersion is natural in several production processes [22–24]. In practical situations, it is important to monitor the dispersion of the process because an increase in dispersion may indicate process deterioration, while a reduction in dispersion means an improvement in process capability [25]. In addition, it is possible to consider control variables for modeling the parameters α0 and α1, which are related to the probabilities of zero and one, respectively.
2.2 Model-based control chart limits
The purpose of the IBRCC is to monitor double bounded processes that contain values equal to zero or one, considering that the mean, precision, and parameters related to the probabilities of zero and one (α0 and α1) of the quality characteristic of interest are affected by control variables. Let (1 − α) be the control region, where α is the type I error probability. The lower control limit (LCL), center line (CL), and upper control limit (UCL) of the proposed control chart are defined, respectively, by

$$
LCL_t = F^{-1}(\alpha/2; \alpha_{0t}, \alpha_{1t}, \gamma_t, \phi_t), \qquad
CL_t = \gamma_t, \qquad
UCL_t = F^{-1}(1 - \alpha/2; \alpha_{0t}, \alpha_{1t}, \gamma_t, \phi_t),
$$

where F(⋅; α0t, α1t, γt, ϕt) is the inflated beta cumulative distribution function and F−1(⋅) is the quantile function of the inflated beta variable. The parameters α0t, α1t, γt, and ϕt are functions of ω, κ, β, and ζ, respectively: through (2), (3), (4), and (5), each parameter is obtained by applying the inverse link function to its linear predictor, for example $\gamma_t = g_3^{-1}(\mathbf{x}_t^\top \boldsymbol{\beta})$, and analogously for α0t, α1t, and ϕt with $g_1^{-1}$, $g_2^{-1}$, and $g_4^{-1}$. In practice, the model parameters are unknown and estimation methods are necessary to estimate the in-control limits. Thus, we consider the likelihood theory [26–28], which we discuss in the following subsection. We present results for the log-likelihood and score functions, which are extensions of those developed for the inflated beta regression model proposed by [12].
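Because the inflated beta distribution mixes point masses with a continuous part, its quantile function is piecewise. The following worked expression is a sketch under two assumptions: the generalized inverse $F^{-1}(p) = \inf\{y : F(y) \geq p\}$, and equal-tailed limits at α/2 and 1 − α/2. The symbols $P_{0t}$, $P_{1t}$, and $B^{-1}$ are shorthand introduced here for illustration.

```latex
\[
F^{-1}(p; \alpha_{0t}, \alpha_{1t}, \gamma_t, \phi_t) =
\begin{cases}
0, & p \le P_{0t}, \\[4pt]
B^{-1}\!\left( \dfrac{p - P_{0t}}{c_t}; \mu_t, \phi_t \right), & P_{0t} < p < 1 - P_{1t}, \\[8pt]
1, & p \ge 1 - P_{1t},
\end{cases}
\]
\[
P_{0t} = \alpha_{0t}(1 - \gamma_t), \qquad
P_{1t} = \alpha_{1t}\gamma_t, \qquad
c_t = 1 - P_{0t} - P_{1t}, \qquad
\mu_t = \frac{\gamma_t (1 - \alpha_{1t})}{c_t},
\]
```

Here $B^{-1}(\cdot; \mu_t, \phi_t)$ denotes the quantile function of the beta distribution with mean $\mu_t$ and precision $\phi_t$. Under these assumptions, the lower limit collapses to 0 whenever $P_{0t} \geq \alpha/2$ and the upper limit collapses to 1 whenever $P_{1t} \geq \alpha/2$.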
2.2.1 Likelihood inference.
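We shall consider the maximum likelihood estimator (MLE) for the parameter vector θ = (ω⊤, κ⊤, β⊤, ζ⊤)⊤. The log-likelihood function is given by

$$
\ell(\theta) = \sum_{t=1}^{n} \ell_t(\alpha_{0t}, \alpha_{1t}, \gamma_t, \phi_t), \tag{6}
$$

with

$$
\ell_t(\alpha_{0t}, \alpha_{1t}, \gamma_t, \phi_t) =
\mathbb{1}_{\{0\}}(y_t) \log[\alpha_{0t}(1 - \gamma_t)]
+ \mathbb{1}_{\{1\}}(y_t) \log(\alpha_{1t}\gamma_t)
+ \left[1 - \mathbb{1}_{\{0\}}(y_t) - \mathbb{1}_{\{1\}}(y_t)\right] \left[\log c_t + \log b(y_t; \mu_t, \phi_t)\right],
$$

where $\mathbb{1}_{\{0\}}(y)$ is an indicator function that equals 1 if y = 0 and 0 if y ∈ (0, 1], $\mathbb{1}_{\{1\}}(y)$ is an indicator function that equals 1 if y = 1 and 0 if y ∈ [0, 1), and

$$
\log b(y_t; \mu_t, \phi_t) = \log\Gamma(\phi_t) - \log\Gamma(\mu_t\phi_t) - \log\Gamma((1 - \mu_t)\phi_t) + (\mu_t\phi_t - 1)\log y_t + ((1 - \mu_t)\phi_t - 1)\log(1 - y_t)
$$

is the logarithm of the beta density, in which α0t, α1t, γt, and ϕt are given by the regression structures in (2), (3), (4), and (5), respectively. Additionally, $c_t = 1 - \alpha_{0t}(1 - \gamma_t) - \alpha_{1t}\gamma_t$ and $\mu_t = \gamma_t(1 - \alpha_{1t})/c_t$.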
By differentiating the log-likelihood function in (6) with respect to each element of the parameter vector θ, we obtain the score vector given by U(θ) = (Uω(θ)⊤, Uκ(θ)⊤, Uβ(θ)⊤, Uζ(θ)⊤)⊤. The MLE of θ is obtained by solving the non-linear system (Uω(θ)⊤, Uκ(θ)⊤, Uβ(θ)⊤, Uζ(θ)⊤)⊤ = 0, where 0 is a null vector of dimension (k1 + k2 + k3 + k4). The MLEs cannot be expressed in closed form, hence the log-likelihood function needs to be maximized numerically through a Newton or quasi-Newton algorithm. In this work, we used the quasi-Newton Broyden-Fletcher-Goldfarb-Shanno (BFGS) method [29] for computational implementation.
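The objective that a numerical optimizer would maximize can be sketched as follows. This is an illustrative evaluation of the log-likelihood obtained by taking the logarithm of density (1), written with the Python standard library; it is not the authors' R code, the function names are hypothetical, and the BFGS maximization itself is not shown.

```python
import math

def beta_logpdf(y, mu, phi):
    """Log of the beta density indexed by mean mu and precision phi."""
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(phi) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(y) + (b - 1.0) * math.log(1.0 - y))

def inflated_beta_loglik_t(y, alpha0, alpha1, gamma, phi):
    """Contribution of one observation to the log-likelihood."""
    if y == 0.0:
        return math.log(alpha0 * (1.0 - gamma))      # log P(y = 0)
    if y == 1.0:
        return math.log(alpha1 * gamma)              # log P(y = 1)
    c = 1.0 - alpha0 * (1.0 - gamma) - alpha1 * gamma
    mu = gamma * (1.0 - alpha1) / c
    return math.log(c) + beta_logpdf(y, mu, phi)     # continuous part

def inflated_beta_loglik(ys, params):
    """Sum over observations; params[t] = (alpha0_t, alpha1_t, gamma_t, phi_t)."""
    return sum(inflated_beta_loglik_t(y, *p) for y, p in zip(ys, params))

# Hypothetical data and parameter values, for illustration only.
ys = [0.0, 0.35, 0.8, 1.0]
params = [(0.2, 0.1, 0.6, 20.0)] * len(ys)
total = inflated_beta_loglik(ys, params)
print(total)
```

In a full implementation, the per-observation parameters would be recomputed from the regression structures at every candidate value of θ before the sum is evaluated.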
The Fisher information matrix (K(θ)), which is useful for large sample inferences, requires the expectations of second order derivatives of the log-likelihood function. The score vector U(θ) and Fisher’s information matrix can be found in the Appendix.
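To test hypotheses on the parameter θj, j = 1, …, (k1 + k2 + k3 + k4), we consider the null hypothesis $H_0: \theta_j = \theta_j^0$ versus $H_1: \theta_j \neq \theta_j^0$. The Wald test may be considered using the following z statistic [26]

$$
z = \frac{\hat{\theta}_j - \theta_j^0}{\mathrm{se}(\hat{\theta}_j)},
$$

where $\hat{\theta}_j$ represents the MLE of θj and the standard error of $\hat{\theta}_j$ is given by $\mathrm{se}(\hat{\theta}_j) = \sqrt{[K^{-1}(\hat{\theta})]_{jj}}$, in which $K^{-1}(\hat{\theta})$ is the asymptotic variance and covariance matrix of $\hat{\theta}$. In large sample sizes and under $H_0$, the z statistic follows a standard normal distribution [26]. The test is performed by comparing the computed z statistic with the usual quantiles of the standard normal distribution.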
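As a small illustration of the Wald test, the sketch below computes the z statistic and its two-sided p-value from the standard normal distribution. The function name and the numeric inputs are hypothetical; the standard error is taken as given, since obtaining it requires the inverse of the Fisher information matrix.

```python
import math

def wald_test(estimate, std_error, null_value=0.0):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = (estimate - null_value) / std_error
    # P(|Z| > |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Hypothetical estimate and standard error.
z, p_value = wald_test(estimate=0.84, std_error=0.30)
print(z, p_value)
```

A covariate would be retained at the 5% nominal level when the returned p-value falls below 0.05.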
2.2.2 Control limits estimation.
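After obtaining the MLE of θ, $\hat{\theta}$, and considering an in-control process, the estimated control limits are given by

$$
\widehat{LCL}_t = F^{-1}(\alpha/2; \hat{\alpha}_{0t}, \hat{\alpha}_{1t}, \hat{\gamma}_t, \hat{\phi}_t), \tag{7}
$$

$$
\widehat{UCL}_t = F^{-1}(1 - \alpha/2; \hat{\alpha}_{0t}, \hat{\alpha}_{1t}, \hat{\gamma}_t, \hat{\phi}_t), \tag{8}
$$

where $\hat{\alpha}_{0t}$, $\hat{\alpha}_{1t}$, $\hat{\gamma}_t$, and $\hat{\phi}_t$ are obtained by evaluating the inverse link functions of structures (2), (3), (4), and (5) at the MLEs, for example $\hat{\gamma}_t = g_3^{-1}(\mathbf{x}_t^\top \hat{\boldsymbol{\beta}})$. Thus, we propose the following algorithm to implement the IBRCC.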
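- Fit the inflated beta regression model with varying dispersion under Phase I and obtain the MLEs, namely $\hat{\boldsymbol{\omega}}$, $\hat{\boldsymbol{\kappa}}$, $\hat{\boldsymbol{\beta}}$, and $\hat{\boldsymbol{\zeta}}$.
- Using the covariates in Phase II, estimate α0t, α1t, γt, and ϕt by applying the inverse link functions to the estimated linear predictors, for example $\hat{\gamma}_t = g_3^{-1}(\mathbf{x}_t^\top \hat{\boldsymbol{\beta}})$, and analogously for $\hat{\alpha}_{0t}$, $\hat{\alpha}_{1t}$, and $\hat{\phi}_t$.
- For a given type I error probability α and the estimates $\hat{\alpha}_{0t}$, $\hat{\alpha}_{1t}$, $\hat{\gamma}_t$, and $\hat{\phi}_t$, compute the estimated control limits using (7) and (8).
- Plot each data point yt together with the estimated control limits $\widehat{LCL}_t$ and $\widehat{UCL}_t$, for t = 1, …, n.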
An observation yt that falls outside the interval delimited by the estimated control limits $\widehat{LCL}_t$ and $\widehat{UCL}_t$ is considered out-of-control.
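The last two steps of the algorithm can be sketched in code once the per-observation estimates are available. The example below is an illustration with the Python standard library, not the authors' R implementation. It assumes equal-tailed limits at α/2 and 1 − α/2, evaluates the beta distribution function with a textbook continued-fraction expansion of the regularized incomplete beta function, and inverts it by bisection. All function names and numeric values are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-14, tiny=1e-300):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    guard = lambda v: tiny if abs(v) < tiny else v
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c, d = 1.0, 1.0 / guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 / guard(1.0 + aa * d)
            c = guard(1.0 + aa / c)
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def beta_cdf(x, a, b):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def beta_quantile(p, a, b, tol=1e-10):
    """Invert beta_cdf by bisection on (0, 1)."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if beta_cdf(mid, a, b) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def inflated_beta_quantile(p, alpha0, alpha1, gamma, phi):
    """Quantile of the inflated beta law: point masses first, then the beta part."""
    p0, p1 = alpha0 * (1.0 - gamma), alpha1 * gamma
    c = 1.0 - p0 - p1
    if p <= p0:
        return 0.0
    if p >= 1.0 - p1:
        return 1.0
    mu = gamma * (1.0 - alpha1) / c
    return beta_quantile((p - p0) / c, mu * phi, (1.0 - mu) * phi)

def control_limits(alpha, alpha0, alpha1, gamma, phi):
    """Assumed equal-tailed limits for one Phase II observation."""
    lcl = inflated_beta_quantile(alpha / 2.0, alpha0, alpha1, gamma, phi)
    ucl = inflated_beta_quantile(1.0 - alpha / 2.0, alpha0, alpha1, gamma, phi)
    return lcl, ucl

def out_of_control(y, lcl, ucl):
    return y < lcl or y > ucl

# Hypothetical one-inflated process: no zeros, 27% probability of a one.
lcl, ucl = control_limits(alpha=0.01, alpha0=0.0, alpha1=0.3, gamma=0.9, phi=50.0)
print(lcl, ucl, out_of_control(1.0, lcl, ucl), out_of_control(0.3, lcl, ucl))
```

In this hypothetical setting the probability of a one is far above α/2, so the upper limit equals 1 and a fully efficient observation does not signal, whereas a very low value does.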
3 Simulation study
This section presents a Monte Carlo simulation study to evaluate the estimators of the introduced inflated beta regression model with varying dispersion and the performance of the IBRCC. The performance of the proposed control chart is compared with some alternatives in literature, namely: the usual linear regression control chart (RCC) [1], the beta regression control chart (BRCC) [6], and the inflated beta control chart (IBCC) [30]. Note that the RCC is a classical regression control chart that works under Gaussian assumptions. The other control charts are state-of-the-art alternatives, but the BRCC does not consider inflation in zeros and/or ones and the IBCC does not include covariates.
We generated the data from regression structures with an intercept and one covariate in each of the submodels for α0t, α1t, γt, and ϕt, with t = 1, …, n; for the mean submodel, logit(γt) = β0 + β1 xt. The covariates of the α0t, α1t, and ϕt submodels were obtained from a Bernoulli distribution with parameter p = 0.3, and xt was generated from a uniform distribution in the interval (0, 1), thus considering discrete and continuous random variables. We considered 5,000 Monte Carlo replications and sample sizes n = 100, 200, and 500. According to [31] and [32], this number of replications is enough to obtain accurate results. All simulations were performed using the R programming language [33].
In the numerical evaluation, we considered several scenarios with different characteristics, namely: zero and one inflated beta regression model (Scenario 1), zero inflated beta regression model (Scenarios 2, 4, and 6), and one inflated beta regression model (Scenarios 3, 5, and 7). The parameter values are shown in Table 1. In Scenario 1, the mean is centered on the standard unit interval, γ ∈ [0.43, 0.57], and the average percentage of zeros and ones in the sample is approximately equal to 13% for both. Scenarios 2, 4, and 6 consider the mean close to zero, with γ ∈ [0.063, 0.154], [0.012, 0.039], and [0.039, 0.119], and average percentages of zeros in the sample equal to 9.3%, 3.4%, and 6.4%, respectively. For Scenarios 3, 5, and 7, the mean is close to one, with γ ∈ [0.881, 0.971], [0.668, 0.924], and [0.858, 0.952], and average percentages of ones in the sample equal to 8.3%, 2.6%, and 23.5%, respectively.
3.1 Point estimation evaluation
For the point estimation evaluation, we computed the mean, percentage relative bias (RB), and mean square error (MSE) for each estimator in all scenarios (see Table 1). For brevity and given the similarity of the results, we only present results for Scenarios 1, 2, and 7 (n = 100 and n = 500), as shown in Table 2. The results show that the mean of the estimators is close to the corresponding parameter values. The RB and MSE decrease when the sample size increases, indicating that the MLEs are consistent. For instance, for β1 (γ submodel) in Scenario 1, the RB of the estimator is equal to 0.2257% for n = 100 and equal to −0.0548% for n = 500. Regarding MSE, considering ω0 in Scenario 2 and n ∈ {100, 500}, the MSE is equal to 0.2291 and 0.0373, respectively. As in other studies related to beta regression [21, 34], it is noteworthy that the RB of the MLEs of the precision submodel parameters is greater than that of the MLEs of the parameters that model the mean response. For instance, for the covariate parameter of the ϕ submodel in Scenario 7, we have RB = −8.2035% for n = 100 and RB = −2.4411% for n = 500. Regarding parameters related to the probabilities of zeros and ones, the bias also decreases considerably as the sample size increases. For example, in Scenario 1 and n = 100, the estimator of ω1 (α0 submodel) yields RB = 16.9130% and the estimator of κ1 (α1 submodel) yields RB = 14.8922%. For n = 500, the bias for the same estimators reduces to 4.5210% and 3.4431%, respectively.
In practice, the regression model relating the output and covariates is rarely known and the parameters have to be estimated. Our simulation results show that the MLE in the proposed model perform well, presenting low MSE for the estimates in all situations. This way, the proposed control chart may also present good performance in practice. In the next section, we shall investigate the run length performance of the IBRCC with estimated parameters.
3.2 Control charts performance
This section presents a run length analysis to evaluate the performance of the considered control charts. When the process is in-control, the run length (RL) follows a geometric distribution with parameter α, which is the type I error probability [35]. The ability of a control chart to detect changes in the process is usually measured by the average number of observations until the detection of an out-of-control point (ARL) [36]. However, other measures can also be used for this purpose. We also considered another location measure, the median (MRL), and a dispersion measure, the standard deviation (SDRL), of the RL distribution. Additionally, we computed the mean absolute percentage error (MAPE) for each measure for all evaluated control charts.
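The nominal in-control values of the three run length measures follow directly from the geometric distribution. The sketch below uses the closed forms ARL = 1/α, MRL = log(0.5)/log(1 − α) (the continuous approximation of the median), and SDRL = √(1 − α)/α; the function name is hypothetical.

```python
import math

def run_length_summary(alpha):
    """Nominal ARL, MRL, and SDRL of a geometric run length with parameter alpha."""
    arl = 1.0 / alpha
    mrl = math.log(0.5) / math.log(1.0 - alpha)   # continuous approximation of the median
    sdrl = math.sqrt(1.0 - alpha) / alpha
    return arl, mrl, sdrl

arl, mrl, sdrl = run_length_summary(0.01)
print(round(arl, 1), round(mrl, 1), round(sdrl, 1))

# For a nominal ARL of 370 the type I error probability is alpha = 1/370.
print(run_length_summary(1.0 / 370.0))
```

These expressions show why the MRL is well below the ARL: the geometric distribution is strongly right-skewed, so a few very long runs pull the average upward.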
We compared the proposed IBRCC with the standard RCC [1] and with the state-of-the-art charts, namely BRCC [6] and IBCC [30]. Since the BRCC does not consider values equal to zero or one, we replaced zeros by 0.0001 and ones by 0.9999 for its application. For all considered control charts, we examined two aspects of evaluation: in-control (ARL0 = 1/α, MRL0 = log(0.5)/log(1 − α), SDRL0 = √(1 − α)/α) and out-of-control (ARL1 = 1/(1 − β), where β is the type II error probability) [30, 35]. For brevity, we do not present the MRL and SDRL results for the out-of-control process. The control charts were evaluated in Scenarios 2 to 7 (Table 2), considering inflation in 0 or 1. Scenario 1 was not covered in this section because it does not reflect real statistical process control situations, since perfectly nonconforming and perfectly conforming units would then occur in the same process.
To ensure that ARL1 is compared among control charts with the same ARL0, we adjusted the chart limits to obtain ARL0 equal to the specified nominal values of 100 and 370. This control chart calibration is suggested in the literature [30, 37–39]. After ARL0 calibration, a change δ was induced in the mean and precision regression structures to generate out-of-control processes as follows: logit(γt) = δ + β0 + β1 xt for the mean, with δ added in the same way to the linear predictor of the precision structure. By enabling the process to be out-of-control, we obtained the estimated ARL1 for different values of δ. When δ = 0, the process is in-control and the ARL0 can be evaluated.
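The effect of the shift δ on the mean can be illustrated with a few lines of code. This is a sketch of the mean perturbation logit(γt) = δ + β0 + β1 xt only; the coefficient values and the function names are hypothetical and do not correspond to any scenario in Table 1.

```python
import math

def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))

def shifted_mean(x_t, beta0, beta1, delta=0.0):
    """Process mean under a shift delta in the logit-scale linear predictor."""
    return inv_logit(delta + beta0 + beta1 * x_t)

beta0, beta1 = 2.0, 1.0                               # hypothetical coefficients
in_control = shifted_mean(0.5, beta0, beta1)          # delta = 0
shifted = shifted_mean(0.5, beta0, beta1, delta=-0.4)
print(in_control, shifted)
```

Because the shift acts on the logit scale, the same δ moves the mean less when γt is already close to 0 or 1 than when it is near the center of the unit interval.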
The ARL0, MRL0, and SDRL0 evaluation results are shown in Tables 3 and 4. Considering an in-control process with α values of 0.01 and 0.0027, the geometric distribution of the RL yields nominal ARL values equal to 100 and 370, nominal MRL values equal to 69.0 and 256.1, and nominal SDRL values equal to 99.5 and 369.5, respectively. The IBRCC showed better performance than BRCC, RCC, and IBCC, reaching empirical values closer to the nominal levels in all evaluated scenarios. In Scenarios 2, 4, and 6, the IBRCC and IBCC obtained 0 as the lower control limit, thus no point exceeded this limit. Similarly, in Scenarios 3, 5, and 7, the upper control limit of the mentioned charts was 1. The fact that these scenarios present 0 or 1 as control limits is related to the value of the probabilities of occurrence of 0 or 1. That is, the IBRCC and IBCC will present zero as control limit when P(y = 0) = α0(1 − γ) exceeds α/2 and one as control limit when P(y = 1) = α1γ exceeds α/2. The high probability of Y assuming values equal to one or zero means that these values are not atypical (out of control) but usual occurrences of the process.
Tables 3 and 4 also show the MAPE results. The proposed control chart has the lowest MAPE values. For example, considering α = 0.01, the corresponding ARL0, and n = 200, the MAPE values obtained for the IBRCC, BRCC, RCC, and IBCC were 1.32, 83.82, 61.17, and 21.80, respectively. It is noteworthy that, for the IBRCC, the MAPE decreases considerably when the sample size increases.
Among the considered alternative control charts, the IBCC achieved better performance than the BRCC and RCC in all scenarios. In Scenario 2 (Table 4), for n = 200, the IBCC presented a false alarm every 411 samples on average, whereas one false alarm was expected every 370 samples. In the same scenario, the BRCC and the RCC presented a false alarm approximately every 11 and 106 samples, respectively. These results show the importance of considering an accurate model to reduce false alarms. We also note that the BRCC obtained the worst performance. In Table 3, considering Scenario 3 and n = 100, the RCC and BRCC presented a false alarm approximately every 50 and 27 observations, respectively. It is important to note that BRCC performance worsened as the percentage of zeros or ones increased. Confirming this fact, the IBCC also presented lower MAPE than BRCC and RCC in all scenarios. Considering α = 0.0027, n = 500, and the MRL0 measure, the MAPE values obtained for the IBRCC, BRCC, RCC, and IBCC were, respectively, 4.82, 95.15, 80.22, and 35.45.
Results of the ARL1 evaluation are shown graphically in Figs 1 and 2. It was not possible to correct ARL0 for the BRCC due to its poor in-control performance. Thus, ARL1 was evaluated only for the IBRCC, RCC, and IBCC. It is noteworthy that when several control charts are compared in terms of ARL, the one that presents the lowest ARL1 among those with the same ARL0 outperforms its competitors [30]. By analyzing the ARL1 results when the perturbation was introduced in the mean of the process (Fig 1), we observe that in Scenario 3 the IBRCC performs better than the RCC and IBCC, and in Scenario 2 the performance of the control charts is similar. We note that the IBRCC detects the out-of-control process more quickly. For example, in Scenario 3, with ARL0 = 370, n = 100, and δ = −0.4, the IBRCC takes 176 samples on average to detect a change in the process, while the IBCC takes 186 and the RCC takes 192 to detect a change of the same magnitude. The simulation results showed similar behavior when a perturbation in the precision of the process occurs (Fig 2). The control charts detect process changes more quickly as the precision increases (dispersion decreases).
The simulation results show the need for a control chart based on an appropriate regression model, such as the IBRCC, when the variable of interest is restricted to the intervals [0, 1) or (0, 1]. The use of the linear regression-based control chart is inappropriate for data of this type since the support of the usual regression model is the whole real line. Interestingly, the BRCC proved to be more inadequate in the presence of values equal to zero or one than the traditional RCC or the IBCC, which uses the inflated beta distribution but does not consider a regression structure. Since the BRCC does not accommodate values equal to zero or one, replacing zeros with 0.0001 and ones with 0.9999 induces an inflation in these values. That is, the probability mass at 0.0001 and/or 0.9999 exceeds what is allowed by the beta distribution, which is an absolutely continuous distribution. This affects the estimates of the parameters of the regression structures and, consequently, impairs the estimates of the control limits.
4 Real data application
This section contains an empirical application in which the proposed control chart (IBRCC) and three other competing control charts are analyzed: the RCC, BRCC, and IBCC. The data evaluated in this section refer to the public administrative efficiency of the municipalities in the state of São Paulo, Brazil. The data are a subset of those analyzed by [40], who considered all Brazilian municipalities. The dataset we used contains 427 municipalities for the year 2000 and it is available at http://www.de.ufpb.br/~luiz/datasets/Dataset_plosone.txt. The covariates are from Secretaria do Tesouro Nacional (http://www.tesouro.fazenda.gov.br/), Instituto Brasileiro de Geografia e Estatística (IBGE) (https://www.ibge.gov.br/), and Instituto de Pesquisa Econômica Aplicada (IPEA) (https://www.ipea.gov.br/portal/), Brazil. The quality characteristic, y, was introduced by [40] and represents individual observations of an efficiency index, assuming values in (0, 1] and measuring how well mayors spend taxpayers' money to provide public services. The efficiency index is equal to one when there is full efficiency. There are 32 units that are fully efficient (i.e., about 7.5% of the observations are equal to one). A brief description of the variables used in the analysis is presented in Table 5. Variables CONS, R2, and MT are dummies, i.e., they are equal to 0 or 1. The covariate CONS equals 1 if the municipality participates in inter-municipal consortia, R2 equals 1 if the municipality receives more than 10% of its tax revenue from royalties, and MT equals 1 if the municipality is a tourist destination; each dummy equals 0 otherwise. It is important to mention that 100 municipalities were randomly drawn to estimate the model parameters (Phase I), while the remaining observations were used for monitoring (Phase II).
At the outset, the inflated (at one) beta mean regression model, the beta regression model with ones replaced by 0.9999, and a linear regression model were selected and fitted. We used the logit link for γ and α1 and the log link for ϕ. For the beta regression, we considered the logit link for μ and the log link for ϕ. The maximum likelihood estimates of the model parameters are displayed in Table 6. All covariates were significant at the nominal level of 5%. In order to compare the fitted regression models, we considered the MAPE and MSE between the observed and fitted values. According to these criteria, the inflated beta regression model outperforms the other ones, with MAPE = 26.8835 and MSE = 0.0387, while the beta regression model obtains MAPE = 29.4101 and MSE = 0.0454, and the linear regression model achieves MAPE = 29.3617 and MSE = 0.0389.
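The two comparison criteria can be computed as in the sketch below. It assumes the usual definitions, MAPE as the mean of |y − ŷ|/|y| expressed as a percentage and MSE as the mean squared difference; the paper does not spell out its formulas, and the toy data are hypothetical. The MAPE is well defined here because the efficiency index takes values in (0, 1], so no observed value is zero.

```python
def mape(observed, fitted):
    """Mean absolute percentage error, in percent (observed values must be nonzero)."""
    return 100.0 * sum(abs((y - f) / y) for y, f in zip(observed, fitted)) / len(observed)

def mse(observed, fitted):
    """Mean squared error between observed and fitted values."""
    return sum((y - f) ** 2 for y, f in zip(observed, fitted)) / len(observed)

# Hypothetical observed and fitted efficiency values.
observed = [0.5, 1.0]
fitted = [0.4, 0.9]
print(mape(observed, fitted), mse(observed, fitted))
```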
Table 7 presents some descriptive statistics of the estimated control limits. Note that the proposed control chart and the IBCC are the only ones that have an upper control limit constant and equal to one. Differently, when beta regression control chart is used, the control limits were restricted to the open interval (0, 1) and thus, in this case, fully efficient municipalities are considered out-of-control. In addition, we verify that, by using the standard RCC, the limits assume values below zero and above one, not being restricted to the interval (0, 1], where the data are distributed. The interpretation of the limits, in this case, makes no practical sense and leads to loss of detection power of out-of-control points.
Fig 3 graphically presents the control limits of the (a) IBRCC, (b) BRCC, (c) RCC, and (d) IBCC together with the observed values of efficiency considering ARL0 = 100. Considering the fact that the efficiency index assumes values in (0, 1], the proposed model-based control chart (IBRCC) presents limits with a smaller range. Because the BRCC does not accommodate values equal to one, replacing them with 0.9999 induces an inflation at that value; as a result, the BRCC is less adequate in the presence of values equal to one than the traditional RCC. The use of the linear regression-based control chart is inappropriate for data of this type since the support of the usual regression model is the whole real line. Finally, the IBCC, which uses the inflated beta distribution but does not consider a regression structure, presents constant limits that are not appropriate in situations where we have control variables (covariates). It is worth mentioning that the IBRCC detected 7 out-of-control points, while the BRCC detected 36 out-of-control points. Lastly, we carried out the RESET misspecification test [41], where the null hypothesis is that the fitted model is correctly specified and the alternative hypothesis is that there is model misspecification. We performed the test using the second power of the estimated mean linear predictor as the testing variable. We do not reject the null hypothesis at the 1% nominal level, which suggests that our model is correctly specified.
5 Conclusions
In this paper, we proposed a new model-based control chart for controlling quality characteristics limited to the intervals (0, 1] or [0, 1) using the inflated beta regression model. For this purpose, we extended the inflated beta regression model proposed by [12] by allowing a regression structure for the precision parameter. In this way, it is possible to model the mean response, the data precision, and functions of the probability of a given observation assuming zero or one through a regression framework. Our simulation study showed that the relative bias and mean square error decrease when the sample size increases. With regard to the sensitivity analysis in terms of run length (RL), the proposed IBRCC showed the best performance in all considered cases. In addition, the results indicated that it is better to ignore the explanatory variables and use the inflated beta control chart (IBCC) than to use a control chart based on an inappropriate regression model. We also considered an application to real data and highlighted the practical importance of the proposed chart when the response is distributed in unit intervals containing ones. Finally, we suggest the use of the inflated beta regression control chart to monitor output quality characteristics that are double bounded in unit intervals containing zeros or ones and are better characterized by a functional relation between the response variable and one or more explanatory variables.
A Score function and Fisher’s information matrix
In this appendix we obtain the score function and present a closed-form expression for Fisher’s information matrix for all parameters of the inflated beta regression model with varying dispersion. We assume that the observed values of the dependent fractional variable are sorted according to the 0, 1, and (0, 1)-values with n0, n1, and n − m terms, respectively, where m = n0 + n1. Furthermore, and
, where ψ(⋅) is the digamma function.
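For orientation, in beta regression with mean μt and precision ϕt [11], the transformed response and its expectation are commonly written as below. This is a sketch that assumes the same notation carries over to the beta component of the inflated model; the symbols y*t and μ*t are taken from that literature and are not confirmed by the text above.

```latex
y_t^{*} = \log\!\left(\frac{y_t}{1-y_t}\right), \qquad
\mu_t^{*} = \psi(\mu_t\phi_t) - \psi\big((1-\mu_t)\phi_t\big), \qquad
\mathbb{E}\left(y_t^{*}\right) = \mu_t^{*}, \quad y_t \in (0,1).
```

The last identity holds for any beta-distributed yt with parameters μtϕt and (1 − μt)ϕt, which is why the difference y*t − μ*t typically appears in score functions of this type.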
The score function for ω is given by
where
. Therefore,
The score function for κ is given by
where
. Thus,
For β, the score function is given by
where
and
. Then we have
The score function for ζ is given by
where
. Therefore,
In matrix form, each term of the score vector is given by
where
,
,
,
,
,
, M = diag{m1, …, mn},
, A1 = diag{(1 − α01), …, (1 − α0n)}, A2 = diag{(1 − α11), …, (1 − α1n)},
,
,
,
,
,
,
,
,
,
is an n × k1 matrix whose t-th row is
,
is an n × k2 matrix whose t-th row is
, X is an n × k3 matrix whose t-th row is
,
is an n × k4 matrix whose t-th row is
,
,
, and
.
The joint information matrix for the parameter vector θ = (ω⊤, κ⊤, β⊤, ζ⊤)⊤ is given by
where
,
,
,
,
,
,
,
,
,
,
,
,
,
,
, ft = −ψ′((1 − μt)ϕt), D = diag{d1, …, dn},
, C = diag{c1, …, cn}, and ct = 1 − α0t(1 − γt) − α1t γt.
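The quantity ct = 1 − α0t(1 − γt) − α1tγt can be read as the probability that an observation falls in (0, 1). The sketch below illustrates this under an assumed mean parametrization in the spirit of [12], with P(y = 0) = α0(1 − γ), P(y = 1) = α1γ, and a beta component with mean μ = γ(1 − α1)/c and precision ϕ. These mappings, the function names, and the parameter values are assumptions made for illustration only.

```python
import random


def mixture_components(gamma, alpha0, alpha1):
    """Map (gamma, alpha0, alpha1) to mixture quantities (assumed parametrization)."""
    p0 = alpha0 * (1.0 - gamma)      # assumed P(y = 0)
    p1 = alpha1 * gamma              # assumed P(y = 1)
    c = 1.0 - p0 - p1                # c_t as defined in the appendix
    mu = gamma * (1.0 - alpha1) / c  # assumed mean of the beta component
    return p0, p1, c, mu


def sample_inflated_beta(gamma, alpha0, alpha1, phi, rng):
    """Draw one value from the zero-and-one inflated beta mixture."""
    p0, p1, c, mu = mixture_components(gamma, alpha0, alpha1)
    u = rng.random()
    if u < p0:
        return 0.0
    if u < p0 + p1:
        return 1.0
    # Beta component with shape parameters mu * phi and (1 - mu) * phi
    return rng.betavariate(mu * phi, (1.0 - mu) * phi)


# Hypothetical parameter values, chosen only to exercise the functions
gamma, alpha0, alpha1, phi = 0.8, 0.05, 0.1, 50.0
p0, p1, c, mu = mixture_components(gamma, alpha0, alpha1)

rng = random.Random(123)
draws = [sample_inflated_beta(gamma, alpha0, alpha1, phi, rng) for _ in range(50000)]
sample_mean = sum(draws) / len(draws)
print(round(c, 4), round(p1 + c * mu, 4), round(sample_mean, 3))
```

Under this assumed parametrization the overall mean p1 + c·μ equals γ exactly, which is what allows a regression structure to be placed directly on the response mean.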
References
- 1. Mandel B. J., (1969). The regression control chart. Journal of Quality Technology 1 (1), 1–9.
- 2. Shu L., Tsung F., Tsui K.-L., (2004). Run-length performance of regression control charts with estimated parameters. Journal of Quality Technology 36 (3), 280–292.
- 3. Jearkpaporn D., Montgomery D., Runger G., Borror C., (2003). Process monitoring for correlated gamma-distributed data using generalized-linear-model-based control charts. Quality and Reliability Engineering International 19, 477–491.
- 4. Jearkpaporn D., Montgomery D., Runger G., Borror C., (2005). Model-based process monitoring using robust generalized linear models. International Journal of Production Research 43 (7), 1337–1354.
- 5. Jearkpaporn D., Borror C., Runger G., Montgomery D., (2007). Process monitoring for mean shifts for multiple stage processes. International Journal of Production Research 45, 5547–5570.
- 6. Bayer F., Tondolo C., Müller F., (2018). Beta regression control chart for monitoring fractions and proportions. Computers & Industrial Engineering 119 (1), 416–426.
- 7. Bayer F. M., Cribari-Neto F., (2017). Model selection criteria in beta regression with varying dispersion. Communications in Statistics—Simulation and Computation 46 (1), 729–746.
- 8. Ospina R., Ferrari S. L. P., (2012). A general class of zero-or-one inflated beta regression models. Computational Statistics and Data Analysis 56, 1609–1623.
- 9. Hoff A., (2007). Second stage DEA: comparison of approaches for modelling the DEA score. European Journal of Operational Research 181, 425–435.
- 10. Cook D. O., Kieschnick R., McCullough B. D., (2008). Regression analysis of proportions in finance with self selection. Journal of Empirical Finance 15 (5), 860–867.
- 11. Ferrari S., Cribari-Neto F., (2004). Beta regression for modelling rates and proportions. Journal of Applied Statistics 31 (7), 799–815.
- 12. Bayes C. L., Valdivieso L., (2016). A beta inflated mean regression model for fractional response variables. Journal of Applied Statistics 43 (10), 1814–1830.
- 13. Teyarachakul S., Chand S., Tang J., (2007). Estimating the limits for statistical process control charts: A direct method improving upon the bootstrap. European Journal of Operational Research 178, 472–481.
- 14. Capizzi G., Masarotto G., (2011). A least angle regression control chart for multidimensional data. Technometrics 53 (3), 285–296.
- 15. Smyth G. K., Verbyla A. P., (1999). Adjusted likelihood methods for modelling dispersion in generalized linear models. Environmetrics 10 (6), 695–709.
- 16. Rahimi S. B., Amiri A., Ghashghaei R., (2019). Simultaneous monitoring of mean vector and covariance matrix of multivariate simple linear profiles in the presence of within profile autocorrelation. Communications in Statistics - Simulation and Computation Accepted.
- 17. Yeh A. B., Huwang L., Li Y.-M., (2009). Profile monitoring for a binary response. IIE Transactions 41 (11), 931–941.
- 18. Shang Y., Tsung F., Zou C., (2011). Profile monitoring with binary data and random predictors. Journal of Quality Technology 43, 196–208.
- 19. Zhang J., Ren H., Yao R., Zou C., Wang Z., (2015). Phase I analysis of multivariate profiles based on regression adjustment. Computers & Industrial Engineering 85, 132–144.
- 20. Qi D., Wang Z., Zi X., Li Z., (2016). Phase II monitoring of generalized linear profiles using weighted likelihood ratio charts. Computers & Industrial Engineering 94, 178–187.
- 21. Canterle D. R., Bayer F. M., (2019). Variable dispersion beta regressions with parametric link functions. Statistical Papers 60, 1541–1567.
- 22. Gan F. F., (1995). Joint monitoring of process mean and variance using exponentially weighted moving average control charts. Technometrics 37 (4), 446–453.
- 23. Riaz M., (2008). Monitoring process variability using auxiliary information. Computational Statistics 23 (2), 253–276.
- 24. Sheu S.-H., Tai S.-H., Hsieh Y.-T., Lin T.-C., (2009). Monitoring process mean and variability with generally weighted moving average control charts. Computers & Industrial Engineering 57 (1), 401–407.
- 25. Huwang L., Huang C.-J., Wang Y.-H.T., (2010). New EWMA control charts for monitoring process dispersion. Computational Statistics & Data Analysis 54 (10), 2328–2342.
- 26. Pawitan Y., (2001). In All Likelihood: Statistical Modelling and Inference Using Likelihood. Oxford Science publications.
- 27. Casella G., Berger R. L., (2002). Statistical Inference, 2nd Edition. Thomson Learning.
- 28. Chen F., Chen S., Ma X., (2018). Analysis of hourly crash likelihood using unbalanced panel data mixed logit model and real-time driving environmental big data. Journal of Safety Research 65, 153–159. pmid:29776524
- 29. Press W., Teukolsky S., Vetterling W., Flannery B., (1992). Numerical recipes in C: The art of scientific computing, 2nd Edition. Cambridge University Press.
- 30. Lima-Filho L. M. A., Pereira T. L., de Souza T. C., Bayer F. M., (2019). Inflated beta control chart for monitoring double bounded processes. Computers & Industrial Engineering 136, 265–276.
- 31. Boon Chong M., (2004). Performance Measures for the Shewhart Control Chart. Quality Engineering 16 (4), 585–590.
- 32. Schaffer J. R., Kim M.-J., (2007). Number of replications required in control chart Monte Carlo simulation studies. Communications in Statistics - Simulation and Computation 36 (5), 1075–1087.
- 33. R Core Team, (2019). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
- 34. Simas A. B., Souza W. B., Rocha A. V., (2010). Improved estimators for a general class of beta regression models. Computational Statistics & Data Analysis 54 (2).
- 35. Ho L. L., Fernandes F. H., Bourguignon M., (2019). Control charts to monitor rates and proportions. Quality and Reliability Engineering International 35 (1), 74–83.
- 36. Montgomery D. C., (2009). Introduction to Statistical Quality Control, 6th Edition. John Wiley & Sons.
- 37. Moraes D. A. O., Oliveira F., Quinino R., Duczmal L., (2014). Self-oriented control charts for efficient monitoring of mean vectors. Computers & Industrial Engineering 75, 102–115.
- 38. Paroissin C., Penalva L., Pétrau A., Verdier G., (2016). New control chart for monitoring and classification of environmental data. Environmetrics 27, 182–193.
- 39. Lima-Filho L. M. A., Bayer F. M., (2019). Kumaraswamy control chart for monitoring double bounded environmental data. Communications in Statistics - Simulation and Computation Accepted.
- 40. Sousa M. C. S., Cribari-Neto F., Stosic B. D., (2005). Explaining DEA Technical Efficiency Scores in an Outlier Corrected Environment: The Case of Public Services in Brazilian Municipalities. Brazilian Review of Econometrics 25 (2), 287–313.
- 41. Pereira T. L., Cribari-Neto F., (2014). Detecting model misspecification in inflated beta regressions. Communications in Statistics—Simulation and Computation 43 (3), 631–656.