
Bootstrap-based inferential improvements to the simplex nonlinear regression model

  • Alisson de Oliveira Silva,

    Roles Conceptualization, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Instituto Federal de Educação, Ciência e Tecnologia da Paraíba, João Pessoa, Brazil

  • Jonas Weverson de Ararújo Silva,

    Roles Formal analysis, Investigation, Methodology, Resources, Software, Validation, Visualization, Writing – review & editing

    Affiliation Centro de Ciências Agrárias, Departamento de Ciências Fundamentais e Sociais, Areia, Paraíba, Brazil

  • Patrícia L. Espinheira

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Resources, Software, Supervision, Validation, Visualization, Writing – review & editing

    patespipa@de.ufpe.br

    Affiliation Departamento de Estatística, Universidade Federal de Pernambuco, Recife, Pernambuco, Brazil

Abstract

In this paper we evaluate the performance of point and interval estimators based on the maximum likelihood (ML) method for the nonlinear simplex regression model. Inferences based on traditional maximum likelihood estimation have good asymptotic properties, but their performance in small samples may not be satisfactory. At the outset we consider maximum likelihood estimation of the parameters of the nonlinear simplex regression model, and then introduce a bootstrap-based bias correction for these estimators. We also develop percentile and bootstrap-t confidence intervals for those parameters as competitors to the traditional approximate confidence interval based on the asymptotic normality of the maximum likelihood estimators (MLEs). We then numerically evaluate the performance of these different methods for estimating the simplex regression model. The numerical evidence favors inference based on the bootstrap method, in particular the bootstrap-t interval, which proved decisive in an application to real data.

1 Introduction

Normal linear regression models are widely used in the most diverse areas of knowledge. Recently, several regression models have been proposed for doubly-constrained response variables, which assume continuous values in (a, b), where a and b are known and −∞ < a < b < ∞; such a support can easily be transformed to the unit interval.

In this context, where y ∈ (0, 1) (or y ∈ (a, b)), the normal linear model is inadequate: besides the possibility of fitted values smaller than 0 (or a) or larger than 1 (or b), the data generally present asymmetry and heteroscedasticity, violating the usual assumptions of that model. Thus, it seems more appropriate to consider models based on distributions naturally supported on (0, 1), such as the simplex regression model proposed by [1].

The simplex distribution was developed from the generalized inverse Gaussian distribution and is part of the class of dispersion models defined by [2], which extend the generalized linear models of [3]. Several studies have used this distribution. For example, [4] used it to analyze longitudinal data with a constant dispersion parameter, via generalized estimating equations. [5] modified this approach by allowing the dispersion parameter to vary across observations. Based on dispersion models, and using a Bayesian approach with Monte Carlo simulations, [6] evaluated the estimators of the parameters of the simplex model with variable dispersion.

Other approaches for modeling limited data are the beta [7], Kumaraswamy [8], Johnson SB [9], and unit gamma [10] regression models. Recently published papers point to possible advantages of the latter distribution over the beta distribution [11, 12]. Recently, [13] proposed the class of nonlinear simplex regression models, estimating the model parameters by maximum likelihood and deriving local influence quantities. The authors showed that, when the data are concentrated at the extremes of the standard unit interval, the maximum likelihood estimation process of the simplex model is more stable than that of the beta regression model. [14] presented the zero-and-one-inflated simplex distribution for modeling proportion data, introduced a new algorithm to compute maximum likelihood estimates of the parameters of the simplex distribution without covariates, and developed likelihood-based inference methods for the regression model based on this new distribution.

The study of the behavior of maximum likelihood estimators in small samples is an important area of research. These estimators can be biased when the sample size is small or even moderate. The bias is, in fact, a measure of average risk: the average risk incurred in replacing the true value of the parameter with a plausible estimated value. Bias can also be seen as how far the mean of an estimator lies from the true value of the parameter. Thus, it is desirable to obtain estimators with reduced bias in finite samples; as the sample size grows, the bias tends to zero. Several approaches to obtaining less biased estimators in small samples exist in the literature. Here, we adopt a bias correction obtained from the bootstrap method [15].

In statistical inference it is of fundamental importance to attach reliability to the point estimates of the model, and one way to do this is by constructing interval estimators of the parameters together with the probability that the intervals contain the true values of these parameters. Confidence intervals can be obtained under the assumption that the asymptotic distribution of the maximum likelihood estimators is normal, which may require large samples to ensure the validity of this approximation. In small samples, an alternative for constructing confidence intervals with good performance, with respect to both the coverage rate and the length of the interval, is the bootstrap method [15]. Specifically, we adopt two bootstrap-based confidence intervals, namely the percentile and bootstrap-t intervals. These two schemes typically have empirical coverage rates very close to the nominal ones [16].

Regarding the modeling of limited continuous data, several authors have already developed improvements to inference based on the maximum likelihood estimation method. [17] propose both the nonlinear beta regression model and improvements for the maximum likelihood estimators. [18] present corrections to the generalized likelihood ratio statistic (LR) based on [19] for the class of beta regression models, whereas [11] used the same strategy for the unit gamma distribution. [20] evaluate the impact of model misspecification on the empirical coverage of three bootstrap prediction intervals. [21] discuss test inference in small samples in the class of beta regression models; the authors consider the LR test and its bootstrap versions, and show that the standard LR test tends to be quite liberal in small samples and that bootstrap-based tests provide more reliable inference even when the sample size is very small.

In this paper our aim is twofold. At the outset, we develop bootstrap-based inferential improvements for the parameters that index the class of nonlinear simplex regression models proposed by [13], in which both the mean of the response variable and the dispersion parameter are related to covariates through nonlinear predictors. We then jointly evaluate the performance of the competing estimators, namely the MLEs and the bootstrap-based estimators introduced here.

We evaluate several aspects of interval estimation via Monte Carlo simulations. The bootstrap method proves to be an important estimation tool for nonlinear simplex regression, since it allows us to circumvent several finite-sample inferential problems of the MLEs. Finally, we present an application to real data from the Chemistry Department of the National University of Colombia.

2 Nonlinear simplex regression model

In the literature there are several discrete and continuous distributions that belong to the class of dispersion models, among them the normal, inverse Gaussian, gamma, von Mises, Poisson, binomial, and negative binomial distributions. In particular, a random variable y follows the simplex distribution, denoted S(μ, σ²), with parameters 0 < μ < 1 and σ² > 0, if its density takes the form

p(y; μ, σ²) = {2πσ²[y(1 − y)]³}^(−1/2) exp{−d(y; μ)/(2σ²)}, 0 < y < 1, (1)

where the deviance component d(y; μ) is given by

d(y; μ) = (y − μ)² / [y(1 − y)μ²(1 − μ)²].

The variance function for the simplex distribution is expressed as V(μ) = μ³(1 − μ)³. The mean and variance of this distribution are given, respectively, by E(y) = μ and

Var(y) = μ(1 − μ) − √(1/(2σ²)) exp{1/(2σ²μ²(1 − μ)²)} Γ(1/2, 1/(2σ²μ²(1 − μ)²)),

where Γ(a, b) corresponds to the incomplete gamma function, defined by Γ(a, b) = ∫_b^∞ t^(a−1) e^(−t) dt. For more details on these properties, see [2]. The simplex distribution is quite flexible for modeling data in the continuous interval (0, 1), showing different shapes according to the values of the parameters that index the distribution: for example, the J shape for S(0.9, 36), the U shape for S(0.5, 121) and the inverse J shape for S(0.1, 36), in addition to the common left-asymmetric, right-asymmetric and symmetric shapes. Also, unlike the beta distribution, the simplex model is very useful for accommodating data with bimodal distributions, for example S(0.5, 20).
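The density above is easy to evaluate numerically. The following Python fragment (illustrative only; the function names are ours and Python is used merely for exposition) implements the simplex density from (1) and checks by a Riemann sum that it integrates to one:

```python
import math

def simplex_deviance(y, mu):
    """Deviance component d(y; mu) = (y - mu)^2 / [y(1 - y) mu^2 (1 - mu)^2]."""
    return (y - mu) ** 2 / (y * (1 - y) * mu ** 2 * (1 - mu) ** 2)

def simplex_pdf(y, mu, sigma2):
    """Density of S(mu, sigma2) on (0, 1), following Eq. (1)."""
    norm = math.sqrt(2 * math.pi * sigma2 * (y * (1 - y)) ** 3)
    return math.exp(-simplex_deviance(y, mu) / (2 * sigma2)) / norm

# Sanity check: the density should integrate to 1 over (0, 1) (Riemann sum).
n_grid = 200_000
h = 1.0 / (n_grid + 1)
area = sum(simplex_pdf((i + 1) * h, 0.5, 4.0) for i in range(n_grid)) * h
```

Near the endpoints the exponential term dominates the (y(1 − y))^(−3/2) factor, so the density vanishes smoothly and the numerical integral is stable.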

Let y1, …, yn be independent random variables, where each yt, t = 1, …, n, follows a simplex distribution whose probability density function is given by (1), with mean μt and dispersion parameter σ²t. The nonlinear simplex regression model proposed by [13] is defined by (1) and the systematic components

g(μt) = f1(xt; β) = ηt and h(σ²t) = f2(zt; γ) = ζt, (2)

where β = (β1, …, βk) and γ = (γ1, …, γq) are unknown regression parameter vectors, with k + q < n, η = (η1, …, ηn) and ζ = (ζ1, …, ζn) are nonlinear predictors, and xt and zt are, respectively, vectors of k1 and q1 observations of known covariates, which may coincide fully or partially, such that k1 ≤ k and q1 ≤ q.

In linear models k1 = k and q1 = q, and therefore xt and zt are, respectively, the t-th rows of the matrices X and Z, for t = 1, …, n; linear models are thus a particular case of nonlinear models. When there is nonlinearity in the parameters, at least one of the derivatives ∂f1(⋅; β)/∂βj, j = 1, …, k, depends on β = (β1, …, βk), and at least one of the derivatives ∂f2(⋅; γ)/∂γl, l = 1, …, q, depends on γ = (γ1, …, γq). For linear simplex models, these derivatives depend only on the covariates x1, …, xk and z1, …, zq, respectively, and the derivative matrices reduce to X and Z.

Moreover, in (2) the link functions g(⋅) and h(⋅) are strictly monotone and at least twice differentiable. Different link functions can be chosen for g and h. For example, for μ we can use the logit function g(μ) = log{μ/(1 − μ)}, the probit function g(μ) = Φ−1(μ), where Φ(⋅) denotes the standard normal distribution function, the log-log function g(μ) = −log{−log(μ)} and the complementary log-log function g(μ) = log{−log(1 − μ)}, among others. Since σ² > 0, we can use the log function h(σ²) = log(σ²) or the identity function h(σ²) = σ². However, one should check whether the estimates resulting from the likelihood maximization take positive values. If the identity link function is indeed appropriate, negative estimates of σ²t, t = 1, …, n, shall not occur, and the diagnostic analysis shall corroborate the model's goodness of fit to the data under such a link function. For more details, see [3, 22]. Finally, in (2) we have that g(μt) = ηt and h(σ²t) = ζt, t = 1, …, n, are the mean and dispersion submodels, respectively.
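As a concrete illustration of these choices, here is a minimal Python sketch of the link functions mentioned above (the names are ours; Python is used for illustration only):

```python
import math
from statistics import NormalDist

# Mean-submodel links g(mu) = eta, mapping (0, 1) onto the real line
logit = lambda mu: math.log(mu / (1 - mu))
probit = lambda mu: NormalDist().inv_cdf(mu)
cloglog = lambda mu: math.log(-math.log(1 - mu))   # complementary log-log

# The inverse logit maps a real-valued predictor back into (0, 1)
inv_logit = lambda eta: 1 / (1 + math.exp(-eta))

# Dispersion-submodel link h(sigma2) = zeta; the log link keeps sigma2 > 0
log_link = math.log
```

The log link for the dispersion submodel guarantees positive fitted dispersions automatically, which is why it is the safer default compared with the identity link discussed above.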

To provide the quantities related to estimation by the maximum likelihood procedure we consider the general case, with nonlinearity in the parameters. We emphasize that f1(⋅) and f2(⋅) are differentiable functions with well-defined Jacobian matrices. Based on (1), the logarithm of the likelihood function is given by ℓ(β, γ) = Σ_{t=1}^n ℓt(μt, σ²t), in which

ℓt(μt, σ²t) = −(1/2) log(2π) − (1/2) log σ²t − (3/2) log{yt(1 − yt)} − d(yt; μt)/(2σ²t).

The components of the score vector (Uβ(β, γ), Uγ(β, γ)) are expressed in terms of the derivative matrices of the nonlinear predictors, of dimensions n × k and n × q, respectively; y = (y1, …, yn), μ = (μ1, …, μn) and a = (a1, …, an) are n × 1 vectors, and U = diag{u1, …, un} is a diagonal matrix whose t-th component is defined below. Moreover, (3)

To obtain the Fisher information matrix for the parameter vectors β and γ, we use results from [4, 5, 23], among them Var[d(y; μ)] = 2(σ²)². The Fisher information matrix for the parameter vector θ = (β, γ), here denoted K(β, γ), is a block-diagonal matrix with two blocks of submatrices, Kββ and Kγγ, where W = diag{w1, …, wn} and D = diag{d1, …, dn}. Since K(β, γ) is block diagonal, the vectors β and γ are globally orthogonal [24], so that their MLEs are asymptotically independent. For large samples and under regularity conditions, the approximate distribution of the MLEs is given by (4). To measure the degree of non-constant dispersion, we define λ = max{σ²t}/min{σ²t}, t = 1, …, n. Note that the greater λ is, the further the varying-dispersion simplex regression model is from the model with fixed dispersion, since under constant dispersion σ²1 = ⋯ = σ²n and hence λ = 1. Furthermore, λ measures how the growth of the response variances affects the estimation process of the model: for the dispersion to vary appreciably across observations, the largest dispersions must increase, so the larger λ is, the larger the response variances are in real problems. In our simulations we control the value of the maximum variance, because we want to evaluate the properties of the estimators when the variance does not explode but only grows slightly. When working with real data, large values of the estimated λ, i.e., λ ≫ 1, are common (in particular when n is large).
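The λ measure is straightforward to compute once the fitted dispersions are available. A minimal sketch, assuming a log-link dispersion submodel (the γ values match one of the paper's simulation settings, but the code and the grid of covariate values are ours):

```python
import math

def lambda_measure(sigma2):
    """Degree of non-constant dispersion: lambda = max(sigma2_t) / min(sigma2_t)."""
    return max(sigma2) / min(sigma2)

# Dispersion submodel with log link: sigma2_t = exp(gamma1 + gamma2 * z_t).
gamma1, gamma2 = -1.3, -1.6
z = [i / 10 for i in range(11)]                    # z_t on a grid over [0, 1]
sigma2 = [math.exp(gamma1 + gamma2 * zt) for zt in z]
lam = lambda_measure(sigma2)
# Under the log link, lambda = exp(|gamma2| * (max z - min z)) = exp(1.6) here.
```

Note that under a log link λ depends only on γ2 and the range of the covariate, not on the intercept γ1; a constant-dispersion model (γ2 = 0) gives λ = 1.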

We still need to discuss the variances of the responses further. The first part of the expression in (4) implies that the estimator vector is asymptotically unbiased. Thus, as the sample size increases, it becomes approximately unbiased and its bias approaches zero. In theory this holds only as n approaches infinity, that is, asymptotically. In practice, the better the approximation in (4), which depends on the distribution, the faster the bias goes to zero; this can already occur for sample sizes such as n = 40 or 50.

However, this behavior holds mostly for the mean-submodel estimator, owing to its relationship with the mean estimator, which is theoretically unbiased (exactly, and not just approximately, in typical cases). The dispersion-submodel estimator, on the other hand, is related to the variance estimator, which is theoretically biased under most distributions. Thus, its bias is expected to converge to zero slowly, requiring large sample sizes.

This discussion reveals that we should pay closer attention to how the corrections act on the dispersion-submodel estimates. Note that a biased estimator of the dispersion parameters induces biased estimates of the response variances. As a consequence, hypothesis tests and confidence intervals may perform poorly and lead to misleading conclusions about the model, such as the exclusion of important covariates.

3 Point estimation of the model parameters

The maximum likelihood estimators of β and γ are obtained by Fisher's iterative scoring process, with the initial guess proposed in [13], and are usually biased when the sample size is small or even moderate, particularly those of the dispersion submodel. Nevertheless, the estimator's bias can be corrected, and one possibility is to use resampling methods, i.e., schemes that use repeated sampling within the same sample to compute estimates. The bootstrap method is one of the most widely used resampling methods and gives very satisfactory results for model estimation. In this paper we adopt the parametric bootstrap, which, in the regression context, assumes that the probability distribution of the response variable is known and indexed by unknown parameters [15]. The steps for performing this method, both for bias correction of the MLEs and for obtaining confidence intervals, are described in Algorithms 1, 2 and 3.

Algorithm 1: Parametric bootstrap method

1: Suppose that y = (y1, …, yn) is a random sample such that each yt, t = 1, …, n, follows a distribution F supposedly known and indexed by parameter vector θ;

2: From the original sample, obtain the estimate of θ;

3: Generate B bootstrap samples of size n, namely y*b, b = 1, …, B, from the distribution F indexed by the estimate obtained in step 2;

4: For each bootstrap sample compute ;

5: Repeat steps 3 and 4 a large number B of times, thus obtaining the B bootstrap estimates;

6: Use the estimates, b = 1, …, B, to compute the desired quantities regarding the distribution of y, for instance: mean, variance, confidence intervals, etc.

Once an estimate of the estimator's bias is obtained, we can construct bias-corrected point estimators. Using the steps of the bootstrap method presented in Algorithm 1, a bootstrap estimate of the bias can be obtained as B̂ = θ̄*(⋅) − θ̂, where θ̄*(⋅) = (1/B) Σ_{b=1}^B θ̂*b; i.e., the expected value of the estimator is approximated by the arithmetic mean of the bootstrap estimates of θ. Thus, we obtain an estimator corrected up to second order by bootstrap [15, 25]: θ̃ = θ̂ − B̂ = 2θ̂ − θ̄*(⋅). This estimator has the same asymptotic properties as the usual MLE and presents lower bias in small samples [16]. A detailed discussion of the bootstrap second-order bias correction and its relation to the analytic correction can be found in [26].
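The correction θ̃ = 2θ̂ − θ̄*(⋅) can be sketched on a toy case outside the simplex model: the ML estimator of a normal variance, which is biased by the factor (n − 1)/n. The code below (Python, illustrative only; names and settings are ours) applies Algorithm 1 with a parametric resampling step:

```python
import math
import random

def mle_variance(x):
    """ML estimator of a normal variance: biased by the factor (n - 1) / n."""
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) ** 2 for xi in x) / n

def bootstrap_bias_corrected(x, B=2000, seed=42):
    """Parametric bootstrap bias correction: theta_tilde = 2*theta_hat - mean(theta*_b).

    Returns the corrected estimate and the bootstrap bias estimate.
    """
    rng = random.Random(seed)
    n = len(x)
    theta_hat = mle_variance(x)
    mu_hat = sum(x) / n
    sd_hat = math.sqrt(theta_hat)
    # Steps 3-5 of Algorithm 1: resample from the fitted model, re-estimate B times.
    boot = [mle_variance([rng.gauss(mu_hat, sd_hat) for _ in range(n)])
            for _ in range(B)]
    boot_mean = sum(boot) / B
    bias_hat = boot_mean - theta_hat
    return 2 * theta_hat - boot_mean, bias_hat

rng = random.Random(1)
sample = [rng.gauss(0.0, 1.0) for _ in range(20)]
corrected, bias_hat = bootstrap_bias_corrected(sample)
# bias_hat is negative here, so the corrected estimate exceeds the MLE.
```

For this toy case the expected bootstrap bias is approximately −θ̂/n, so the corrected estimate is close to θ̂(n + 1)/n, largely removing the downward bias of the MLE.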

4 Interval estimation of the model parameters

A set constructed from a point estimator, together with a probability that the set contains the true value of the parameter, defines an interval estimator. The general form of an approximate confidence interval (CI) for θ is [l1, l2], where l1 and l2 (l1 < l2) are the lower and upper bounds of the confidence interval, respectively, and 1 − α is the confidence level, which converges to the coverage probability. We emphasize that l1 and l2 are quantiles of a distribution indexed by the parameter θ. If we assume that this distribution is known, it is possible to construct exact confidence intervals. However, deriving the exact analytical distribution of a random variable is typically highly challenging.

Fortunately, there are diverse approaches to building approximate confidence intervals. The most widely used is the asymptotic confidence interval, which assumes asymptotic normality of the MLEs. According to (4), for the simplex model in large samples the distribution of the MLE is approximately normal with mean θ = (β, γ) and variance-covariance matrix given by (4). More precisely, the inverse of Kββ evaluated at the estimates is the k × k variance-covariance matrix of the estimator of β, and the inverse of Kγγ evaluated at the estimates is the q × q variance-covariance matrix of the estimator of γ.

Consider βi and γj, with i = 1, …, k and j = 1, …, q, the i-th and j-th components of the vectors β and γ, respectively, and take the i-th and j-th elements of the main diagonals of the matrices above as the corresponding estimated variances. It then follows that the intervals [estimate ± quantile × standard error] have confidence level approximately equal to 1 − α for βi and γj, respectively, where the quantile is that of the standard normal distribution at level 1 − α/2. These MLE-based intervals may require large samples for the coverage to be close to the nominal one; in small samples, they can have large coverage errors [15, 25].
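A minimal sketch of this asymptotic (Wald-type) interval, assuming a point estimate and its standard error are available (the numeric values are hypothetical, chosen only for illustration):

```python
from statistics import NormalDist

def wald_ci(theta_hat, se, alpha=0.05):
    """Asymptotic (ML-Ia) interval: theta_hat +/- z_{1 - alpha/2} * se."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return theta_hat - z * se, theta_hat + z * se

# Hypothetical point estimate and standard error, for illustration only.
lo, hi = wald_ci(1.2, 0.3)
```

With α = 0.05 the normal quantile is z ≈ 1.96, so the interval is the familiar estimate ± 1.96 standard errors.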

A workaround for improving confidence intervals in small samples, without analytical complexities, is the bootstrap method. This approach typically provides confidence intervals whose coverage levels are close to the true coverage probability. Here, we discuss two bootstrap-based strategies, namely the percentile and bootstrap-t intervals.

The percentile bootstrap confidence interval, which we denote by ‘Bootp’ [16], is built from a finite number B of bootstrap replicates of the estimators of the parameters of interest. Furthermore, it enjoys the property of invariance under monotonic transformations. Let F(θ) be the distribution of the response variable, assumed known and indexed by the parameter vector θ, and let F̂* be the empirical distribution function obtained from the B bootstrap replicas. We can construct the percentile confidence interval, with approximate coverage level 1 − α, by computing the α/2 and 1 − α/2 quantiles of F̂*. The expressions of the percentile bootstrap confidence intervals for the parameters of the nonlinear simplex regression model are given in (5), for i = 1, …, k and j = 1, …, q, the i-th and j-th components of the vectors β and γ. The percentile interval is not necessarily symmetric about the point estimates. Its construction ensures that improper values of the parameter of interest are not included in the confidence interval. The steps for its construction are described in Algorithm 2.

Algorithm 2: Bootstrap confidence interval—Percentile

1: Generate B bootstrap samples based on , for b = 1, …, B;

2: Let y = (y1, …, yn) be the original sample. Then the respective bootstrap estimate of θ is computed for each bootstrap sample, b = 1, …, B;

3: The B bootstrap replicas must be ordered.

4: The lower and upper limits of the percentile interval are provided by the ordered replicas of positions B × (α/2) and B × (1 − α/2), respectively, assuming that B × (α/2) and B × (1 − α/2) are integers and 0 < α < 1.

 4.1: If B × (α/2) and B × (1 − α/2) are not integers, we can use the following procedure:

 4.1.1: Assuming 0 < α < 1, let p = [(B + 1)α/2] be the largest integer less than or equal to the number (B + 1)α/2; then, we define the lower and upper bounds of the percentile interval by the p-th and (B + 1 − p)-th ordered elements of the B bootstrap replicas of , respectively.
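Algorithm 2 can be sketched as follows (Python, illustrative only; the replicas below are stand-ins for actual bootstrap estimates):

```python
def percentile_ci(boot, alpha=0.05):
    """Percentile bootstrap interval via the (B + 1) rule of Algorithm 2."""
    B = len(boot)
    s = sorted(boot)
    p = int((B + 1) * alpha / 2)   # largest integer <= (B + 1) * alpha / 2
    return s[p - 1], s[B - p]      # p-th and (B + 1 - p)-th ordered replicas

# With B = 999 and alpha = 0.05: p = 25, so the bounds are the 25th and
# 975th ordered bootstrap estimates.
boot = list(range(1, 1000))        # stand-in for the bootstrap replicas
lo, hi = percentile_ci(boot)
```

Because the bounds are order statistics of the replicates themselves, the interval automatically stays inside the parameter space, which is the "improper values" guarantee mentioned above.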

The bootstrap-t confidence interval, here called ‘Boott’ [16], is a pivot-based method for constructing confidence intervals that parallels the traditional Student's t interval. It is based on the bootstrap estimate of the distribution of T = (θ̂ − θ)/se(θ̂), where se(θ̂) is the standard error of θ̂. The construction of the bootstrap-t confidence interval is given by Algorithm 3.

Algorithm 3: Bootstrap confidence interval—Bootstrap-t

1: Generate B bootstrap samples from ;

2: For each bootstrap sample, compute T*b, b = 1, 2, …, B, standardizing the difference between the bootstrap estimate of θ for the sample y*b and the estimate θ̂ from the original sample y by the standard error computed on the bootstrap sample y*b. Note that the standard error is a known function of the parameter estimates;

3: The α/2 and 1 − α/2 percentiles of the T*b are estimated as follows.

Thus, the bootstrap-t confidence interval is obtained from these estimated percentiles. The required quantiles can be computed as follows:

1. Sort the B bootstrap replicas T*b;

2. The quantiles and are, respectively, the replicas corresponding to the integer parts of B × (α/2) and B × (1 − α/2);

2.1. If B × (α/2) and B × (1 − α/2) are not integers, we can use the following procedure:

Assuming 0 < α < 1, let k = [(B + 1)α/2] be the largest integer less than or equal to (B + 1)α/2. Then the bootstrap quantiles are given, respectively, by the k-th and (B + 1 − k)-th ordered elements of T*b. Therefore, the bootstrap-t intervals for the parameters of the nonlinear simplex regression model are given by the resulting expressions, with i = 1, …, k and j = 1, …, q, the i-th and j-th components of the vectors β and γ.
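The bootstrap-t construction can be sketched as below (Python, illustrative only; `boot_pairs` is a hypothetical container of bootstrap estimates and their standard errors):

```python
def boott_ci(theta_hat, se_hat, boot_pairs, alpha=0.05):
    """Bootstrap-t interval in the spirit of Algorithm 3.

    boot_pairs: list of (bootstrap estimate, bootstrap standard error).
    """
    B = len(boot_pairs)
    t_stats = sorted((tb - theta_hat) / sb for tb, sb in boot_pairs)
    k = int((B + 1) * alpha / 2)              # same (B + 1) rule as Algorithm 2
    t_lo, t_hi = t_stats[k - 1], t_stats[B - k]
    # Note the inversion: the upper t-quantile gives the lower bound.
    return theta_hat - t_hi * se_hat, theta_hat - t_lo * se_hat
```

The inversion of the quantiles is the defining feature of the bootstrap-t (pivotal) interval: it lets the interval adapt to asymmetry in the sampling distribution of the studentized statistic.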

According to [16], bootstrap-t intervals outperform the asymptotic interval, displaying empirical coverages closer to the nominal levels, but they can be erratic in actual practice. Percentile intervals are more stable, but display less satisfactory coverage performance. An outstanding discussion of bootstrap-based confidence intervals can be found in [27]. In what follows we evaluate the finite-sample performance of the confidence intervals introduced in this section.

5 Numerical results on point estimation

In this section we present the results of Monte Carlo simulations carried out to evaluate the performance of the maximum likelihood estimators of the nonlinear simplex regression model and of their bootstrap versions in small samples. In what follows we assume the nonlinear simplex regression model: (6) where g(⋅) and h(⋅) are the logit and log link functions, respectively. The covariate values were generated from the uniform distribution and were kept fixed over the Monte Carlo replications. Three different scenarios were considered for the mean response, namely: μt ∈ (0.02, 0.32) with β = (−2.4, 1.2, −1.5, −1.7); μt ∈ (0.19, 0.86) with β = (−1.7, −1.8, 1.2, −1.3); and μt ∈ (0.78, 0.98) with β = (2.1, −1.5, −1.6, −1.2). Furthermore, concerning the degree of non-constant dispersion, we report results for λ ≈ 12 with γ = (−1.3, −1.6); λ ≈ 45 with γ = (−1.3, −2.1); and λ ≈ 128 with γ = (−1.3, −2.4). The sample sizes were n = 40, 80 and 120. For the last two cases we initially generated n = 40 covariate observations and replicated them twice and three times, respectively, to obtain the sample sizes n = 80 and n = 120. This ensures that the intensity of non-constant dispersion is the same for all sample sizes. The numbers of Monte Carlo and bootstrap replications were R = 10000 and B = 500, respectively. The parameter estimates in (6) were obtained by maximizing the log-likelihood function using Fisher's scoring method.
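The covariate-replication device used to keep λ fixed across sample sizes can be sketched as follows (Python, illustrative only; the uniform draws and γ values mirror the simulation design, but the code is ours):

```python
import math
import random

rng = random.Random(2022)
z40 = [rng.uniform(0.0, 1.0) for _ in range(40)]   # base covariate draws, n = 40
z80 = z40 * 2                                      # replicated twice,  n = 80
z120 = z40 * 3                                     # replicated thrice, n = 120

def lam(z, gamma1=-1.3, gamma2=-1.6):
    """Degree of non-constant dispersion lambda = max(sigma2_t) / min(sigma2_t),
    under the log link sigma2_t = exp(gamma1 + gamma2 * z_t)."""
    s2 = [math.exp(gamma1 + gamma2 * zt) for zt in z]
    return max(s2) / min(s2)

# Replication leaves max(z) and min(z) unchanged, hence lambda is unchanged.
```

Since λ depends only on the extremes of the covariate values, tiling the same 40 draws leaves the degree of non-constant dispersion identical for n = 40, 80 and 120, as the design requires.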

For each Monte Carlo replicate, after computing the maximum likelihood estimates of the model parameters, B bootstrap replicate estimates were generated. At the end of the bootstrap loop, the quantities of interest are computed: the bias-corrected bootstrap estimates and the percentile and bootstrap-t confidence intervals. Finally, outside the bootstrap loop, the asymptotic intervals of the parameters are computed based on the quantiles of the standard normal distribution.

To evaluate the performance of the point estimators, the relative bias and the square root of the mean squared error were calculated for each sample size. Additionally, we introduce a measure suggested during the review of the article, which we call the Unified Quadratic Bias (UQB). In Tables 1–3 we consider, respectively, the scenarios μt ∈ (0.02, 0.32) (μt ≈ 0), μt ∈ (0.19, 0.86) (μt ≈ 0.5) and μt ∈ (0.78, 0.98) (μt ≈ 1), t = 1, …, n. These tables report the relative biases and the square roots of the mean squared errors (RMSEs) of the parameter estimators for n = 40, 80 and 120 and λ ≈ 12, 45 and 128. We observe that, in absolute value, the relative biases of the bootstrap-corrected estimators are smaller than those of the maximum likelihood estimators, evidencing the efficacy of the bootstrap bias correction.

Table 1. Relative biases and root mean square errors of the Maximum Likelihood Estimators (MLEs-asymptotic) and bootstrap corrected MLEs of the model parameters: and , t = 1, …, n, β = (−2.4, 1.2, −1.5, −1.7), μt ∈ (0.02, 0.32), t = 1, …, n.

https://doi.org/10.1371/journal.pone.0272512.t001

Table 2. Relative biases and root mean square errors of the Maximum Likelihood Estimators (MLEs) and bootstrap corrected MLEs of the model parameters: and , β = (−1.7, −1.8, 1.2, −1.3), μt ∈ (0.19, 0.86), t = 1, …, n.

https://doi.org/10.1371/journal.pone.0272512.t002

Table 3. Relative biases and root mean square errors of the Maximum Likelihood Estimators (MLEs-asymptotic) and bootstrap corrected MLEs of the model parameters: and , β = (2.1, −1.5, −1.6, −1.2), μt ∈ (0.78, 0.98), t = 1, …, n.

https://doi.org/10.1371/journal.pone.0272512.t003

For example, the relative bias estimate of the bootstrap-corrected estimator (BOOT) is equal to 0.0003, while that of the MLE (MLE-asymptotic) is 0.001. For μt ∈ (0.19, 0.86), n = 120 and λ ≈ 12, the estimated bias is equal to 0.001 for one estimator and < 0.0001 for its corrected version. In fact, the high performance of the bootstrap correction when μt ∈ (0.19, 0.86) is noteworthy, since its estimators exhibit lower biases than the asymptotic MLEs for all model parameters, for the different levels of non-constant dispersion and all sample sizes. In all scenarios considered, the RMSEs of the estimators decrease as the sample size increases.

As expected, the asymptotic MLEs of the parameters of the dispersion submodel tend to be more biased than those of the mean submodel, especially regarding γ1. For instance, for μt ∈ (0.02, 0.32), n = 120 and λ ≈ 45, the relative bias estimate of the dispersion-submodel estimator is equal to 0.043, while that of its corrected version is < 0.0001. More striking are the biases that drop from (0.136, −0.010) to (−0.001, 0.001) after the bootstrap correction, respectively, when n = 40 and λ ≈ 45.

For the dispersion-submodel parameters in particular, the bootstrap correction provides a substantial reduction of the estimated bias. This is important since correct estimation of the dispersion submodel parameters directly affects the estimates of the response variances, which, once corrected, produce Z-tests that lead to more reliable decisions. Even so, the corrections were also effective for the mean-submodel estimators, i = 1, 2, 3, 4: the goal is for the bias values to be as close to zero as possible, and for these parameters the estimated bias became at times < 0.0001 after correction, i.e., the goal was achieved.

It is important to note that the estimated biases of the usual and corrected maximum likelihood estimators are notably smaller when the mean response is close to the upper limit of the unit interval than in the two other scenarios considered (Tables 1 and 3). The Unified Quadratic Bias measure makes the effectiveness of the proposed bias correction even more evident. Considering μt ∈ (0.78, 0.98) and n = 40, for λ ≈ 12, 45 and 128, the values of the UQB for the original MLEs are equal to 0.132, 0.137 and 0.138, whereas for the corrected versions these values become 0.005, 0.001 and 0.002, respectively (Table 3).

6 Numerical results on confidence intervals

Concerning interval estimation, we compute the empirical coverage of the intervals (%), obtained from the relative frequency with which the intervals contained the true value of the parameter. The lower and upper bounds were also estimated (by averaging at the end of the Monte Carlo process), which allowed us to estimate the average length of the intervals and the left and right non-coverage rates. The left rate is incremented whenever the upper limit of the interval is less than the true value of the parameter, and the right rate whenever the lower limit is greater than the true value.

In what follows we report the results of the Monte Carlo simulations on interval estimation. We consider the nominal levels 0.90 and 0.95, corresponding to Tables 4 and 5, respectively. These tables display the coverage rates of the competing interval estimators: the asymptotic ML interval approximation (ML-Ia), bootstrap-t (Boott), and percentile (Bootp), for the model parameters in (6).

Table 4. Coverage rates of the interval estimators: ML-Ia, Boott and Bootp for θ, the model parameters: and , t = 1, …, n, 1 − α = 0.90.

https://doi.org/10.1371/journal.pone.0272512.t004

Table 5. Coverage rates of the interval estimators: ML-Ia, Boott and Bootp for θ, the model parameters: and , t = 1, …, n, 1 − α = 0.95.

https://doi.org/10.1371/journal.pone.0272512.t005

Regarding coverage rates, the interval that performs best is the bootstrapt, with empirical coverage substantially closer to the nominal levels for all model parameters. The asymptotic confidence interval displayed considerable undercoverage, and the percentile confidence interval Bootp overall outperforms the ML-Ia; only for γ1 does the Bootp display a poor performance. For instance, for n = 40, μt ∈ (0.19, 0.86), 1 − α = 0.90 and all values of λ, the Bootp coverage rates for this parameter are approximately equal to 0.66, whereas those of the ML-Ia and Boott are approximately equal to 0.80 and 0.90, respectively. This behavior is similar in all scenarios for the mean response. The conclusions about coverage rates are quite similar whatever the nominal level. To exemplify, for 1 − α = 0.95 and the same settings as for 1 − α = 0.90, the ML-Ia, Boott and Bootp coverage rates are around 0.87, 0.95 and 0.74, respectively (Table 5). Now consider n = 120, 1 − α = 0.95, λ = 45, μt ∈ (0.19, 0.86): for β3, those values are equal to 0.933, 0.942 and 0.934 (Table 8). That is, even when the sample size increases, the empirical coverage of the Boott interval remains the closest to the nominal level.

Our interest hereafter lies in evaluating some interval properties only for the nonlinearity parameters of the mean and dispersion submodels, namely β2 and γ2. Tables 6 through 9 present the mean lower (Lower) and upper (Upper) bounds, mean lengths (Size), the empirical coverage probability (Coverage), as well as the left and right non-coverage rates (%) of the interval estimators of these parameters. The last two quantities evaluate the balance of the interval; perfect balance occurs when the two percentages are identical.

Table 6. Lower and upper bounds, size, empirical coverage (Coverage) and percentages of Lower (%Left) and upper (%Right) non-coverage of the ML-Ia, Boott and Bootp intervals for β2, in the model: and , t = 1, …, n, μt ∈ (0.02, 0.32), β2 = 1.2, n = 40.

https://doi.org/10.1371/journal.pone.0272512.t006

Tables 6 through 8 present the results for β2 with n = 40, n = 80 and n = 120, only for μt ∈ (0.02, 0.32), t = 1, …, n, where β2 = 1.2. Note that if β2 = 1 the mean submodel becomes linear. For n = 40, λ ≈ 12 and λ ≈ 45, only the Boott interval, and only at 1 − α = 0.99, includes this possibility.

Table 7. Lower and upper bounds, size, empirical coverage (Coverage) and percentages of Lower (%Left) and upper (%Right) non-coverage of the ML-Ia, Boott and Bootp intervals for β2, in the model: and , t = 1, …, n, μt ∈ (0.02, 0.32), β2 = 1.2, n = 80.

https://doi.org/10.1371/journal.pone.0272512.t007

Table 8. Lower and upper bounds, size, empirical coverage (Coverage) and percentages of lower (%Left) and upper (%Right) non-coverage of the ML-Ia, Boott and Bootp intervals for β2, in the model: and , t = 1, …, n, μt ∈ (0.02, 0.32), β2 = 1.2, n = 120.

https://doi.org/10.1371/journal.pone.0272512.t008

Its empirical coverage probability is the closest to the true one; however, its mean interval length is longer than those of the other two intervals, which allows the inclusion of β2 = 1.0, i.e., linearity (Table 6). This behavior of the Boott interval persists for n = 80 (Table 7). When λ ≈ 128, all intervals include the possibility of linearity (β2 = 1.0) when n = 40 (Table 6), for all nominal confidence levels. This result is interesting, as it shows how the intensity of non-constant dispersion negatively affects the performance of the three confidence intervals considered.

As the sample size increases the problem is attenuated. For instance, when n = 80, 1 − α = 0.90 and λ ≈ 128, only the bootstrapt interval considers the possibility of linearity, namely: ML-Ia = (1.006, 1.401), Boott = (0.993, 1.410) and Bootp = (1.010, 1.406). Nevertheless, if we rounded to a single decimal place those intervals would become ML-Ia = (1.0, 1.4), Boott = (1.0, 1.4) and Bootp = (1.0, 1.4), and would therefore be effectively equivalent. We should also point out that the average lengths (Size) of all intervals decrease as the sample size increases.

We now highlight how considerably the Boott interval outperforms its competitors in accuracy and balance for β2. We fix 1 − α = 0.95 and consider n = 40 and the three values of λ. For each estimator we report the set consisting of the empirical coverage and the left and right non-coverage rates, expressed as {(⋅), [⋅%][⋅%]}. For λ ≈ 12, the sets for the ML-Ia, Boott and Bootp estimators are {(0.919), [3.61%][4.49%]}, {(0.950), [2.40%][2.61%]} and {(0.923), [3.20%][4.47%]}. For λ ≈ 45 those sets become {(0.919), [2.65%][3.21%]}, {(0.950), [2.55%][2.67%]} and {(0.922), [2.97%][3.41%]}. Finally, when λ ≈ 128 the respective sets are {(0.917), [3.94%][4.40%]}, {(0.949), [2.66%][2.39%]} and {(0.916), [3.98%][4.48%]}.

Table 9 presents the simulation results for β2 (μt ≈ 0.5 and μt ≈ 1) when n = 40, λ ≈ 128 and β2 equal to −1.8 and −1.5, respectively. We note that for the three nominal levels and the different scenarios, the asymptotic confidence interval has the shortest average length. We also note that the bootstrapt confidence interval presents the best empirical coverage and balance properties, followed by the Bootp interval, whose values were very similar to those of the ML-Ia interval.

Table 9. Lower and upper bounds, size, empirical coverage (Coverage) and percentages of lower (%Left) and upper (%Right) non-coverage of the ML-Ia, Boott and Bootp intervals. For β2 in the model: and , t = 1, …, n, n = 40, λ ≈ 128.

https://doi.org/10.1371/journal.pone.0272512.t009

Figs 1 and 2 contain histograms constructed from the 10000 maximum likelihood estimates of the parameters β2 and γ2, respectively, for n = 40, λ ≈ 128 and the different scenarios for μt, t = 1, …, n. The distinct lines represent the different confidence intervals under evaluation, and their lengths correspond to the respective average lengths. The values below and above the vertical lines are the non-coverage rates, i.e., the percentages of replicates in which the true value of the parameter was smaller than the lower limit of the interval (below) or larger than the upper limit of the interval (above).

Fig 1. Interval estimation for β2, n = 40, λ ≈ 128.

(a) β2 = 1.2, (b) β2 = −1.8 and (c) β2 = −1.5.

https://doi.org/10.1371/journal.pone.0272512.g001

Fig 2. Interval estimation for γ2, n = 40. γ2 = −2.4.

https://doi.org/10.1371/journal.pone.0272512.g002

These graphics were designed according to [28]. They show that, for the different μt scenarios, the analyzed intervals are approximately symmetric around the true value of β2. We further note that for μt ∈ (0.02, 0.32) the intervals were better balanced than in the scenarios where μt ∈ (0.19, 0.86) and μt ∈ (0.78, 0.98). Overall, the bootstrapt confidence interval stands out as the best balanced. Fig 2 shows that only the asymptotic confidence interval is approximately symmetric around the true value of γ2.

The bootstrapt confidence interval is slightly asymmetric around γ2, while the bootstrap percentile confidence interval exhibits very strong asymmetry, especially at the nominal 99% level. We also observe that the asymptotic confidence interval exhibits strong imbalance, as the right non-coverage rates (%Right) are markedly higher than the left ones (%Left). The bootstrapt and percentile confidence intervals, however, are approximately balanced for all nominal levels and scenarios. Therefore, based on the results presented, we suggest using the bootstrapt confidence interval, which showed the best coverage and balance performances.

7 Application: Fluid Catalytic Cracking Data (FCC)

In this application the data come from the Chemistry Department of the National University of Colombia [29] and concern a process involving the volume and quality of gasoline produced in a refinery. Fluid catalytic cracking (FCC) is used to convert high molecular weight hydrocarbons into small molecules of higher commercial value by contacting them with a catalyst. This process is often described as the heart of the refinery, as it allows production to be tailored toward high-demand and especially high-profit products [29]. The process catalyst consists of fine, easily fluidizable particles of 10 to 150 microns, with zeolite Y as the main component [29]. Another important substance in the catalysis process is vanadium. This chemical element is known to participate in catalyst destruction, reducing the active surface, selectivity and crystallinity of zeolite Y, especially in the presence of steam. Every 1000 ppm of vanadium in the catalyst is known to reduce gasoline yield by about 2.3%. The process also depends on the temperature, which must be close to 720 °C [29]. The data set consists of 28 observations.

Aiming to fit a model to these data, [13] chose the following candidate covariates: steam (x2), temperature (x3) and vanadium concentration (x4). Moreover, the authors defined a linear predictor relating these covariates to unknown parameters. However, the residual analysis highlighted the possibility that the predictor is nonlinear in some of the parameters. To build the nonlinear model, the authors followed several steps that are carefully detailed in their article. The chosen model uses probit and logarithmic link functions for the mean and dispersion submodels, respectively, and was defined as follows: , with t = 1, …, 28. Hereafter we shall refer to this model as ‘Model-I’. We emphasize that this simplex model outperformed a competing beta model [13]. Table 10 displays the maximum likelihood estimates, the bootstrap bias-corrected estimates, their respective standard errors (SE), and the p-values associated with the Z-tests for the significance of the model parameters. The ML estimates and their bootstrap-corrected versions are quite similar with regard to the parameters of the mean submodel, whereas the corrected estimates have lower standard errors than the ML estimates, with the sole exception of the parameter β1. Concerning the parameters of the dispersion submodel, we notice that the maximum likelihood estimates and their corrected versions present slightly different values, while their respective standard errors are quite similar.

Table 10. Maximum likelihood estimates , bootstrap bias-corrected estimates (), standard errors (SE) and p-values associated with Z-tests for the parameters of the [13].

Fluid Catalytic Cracking Data (FCC).

https://doi.org/10.1371/journal.pone.0272512.t010

Table 11 reports the interval estimates of the Model-I parameters at nominal levels of 90%, 95% and 99%. Considering the three estimation methods, ML-Ia, Boott and Bootp, we notice that the interval estimates for β1, β2, β4 and β5 are quite similar, whereas for the β3 parameter the bootstrapt estimates display lengths substantially longer than those of its competitors, at all nominal levels. For example, for 1 − α = 0.95, the ML-Ia, Boott and Bootp interval estimates are, respectively, (−35.445; −20.240), (−40.929; −9.544) and (−34.745; −20.474). Another feature of the bootstrapt interval estimator is that some of its estimates include the value zero for the parameters. This occurs for β2 at the 99% level, (−0.466; 0.142), and for β4 at the 95% and 99% levels. Nevertheless, β2 = 0 would imply both the exclusion of steam, a covariate recognized as important to the process, and the assumption of a linear predictor for the mean submodel. The most important information revealed by the figures in Table 11, though, is that only the bootstrapt interval considers the possibility that γ1 and γ2 are simultaneously equal to zero, at both the 95% and 99% levels. Bootp reaches this conclusion only at the 99% level, whereas the ML-Ia estimator admits only γ1 = 0, and only when 1 − α = 0.99.

Table 11. ML-Ia, Boott and Bootp interval estimates for the parameters of the Model ‘Model-I’.

Fluid catalytic cracking (FCC) data.

https://doi.org/10.1371/journal.pone.0272512.t011

Therefore, we shall evaluate a nonlinear simplex model with constant dispersion. Among the competing models, the one that presented the best goodness of fit uses complementary log-log and logarithmic link functions for the mean and dispersion submodels, as described in the following: and , with t = 1, …, 28.

In what follows we shall refer to this model as ‘Model-II’ and provide some quantities for its parameters, namely the maximum likelihood estimates, their bootstrap-corrected versions ‘(⋅)’ and the standard errors of the ML estimates ‘[⋅]’. Thus, β1: 0.9112 (0.9051) [0.1081], β2: −1.4166 (−1.5062) [0.0016], β3: −26.1208 (−25.7744) [3.6902], β4: −0.2615 (−0.2571) [0.0635], β5: −0.3366 (−0.3431) [0.0607] and γ1: 0.0455 (0.2782) [0.2673]. For Model-II it is possible to notice differences between the ML estimates and their bootstrap-corrected versions; for example, the estimate of β2 is −1.42 while its corrected version becomes −1.51. A further important issue regarding Model-II is that the correct modeling of the dispersion considerably reduced the standard errors of the estimates: the SE of the β2 estimate was 0.023 for Model-I and becomes 0.0016 for Model-II. Table 12 reports the interval estimates of the Model-II parameters at nominal levels of 90%, 95% and 99%. It is noteworthy that the correct dispersion modeling improves the performance of the interval estimators. The accuracy of the Boott and ML-Ia interval estimators for the β2 parameter is especially noteworthy. For the ML-Ia, Boott and Bootp estimators, respectively, the interval estimates are (−1.419; −1.413), (−1.419; −1.414) and (−1.904; −0.757) at 99%; (−1.419; −1.413), (−1.419; −1.413) and (−2.032; −0.631) at 95%; and (−1.420; −1.412), (−1.4212; −1.4116) and (−2.3049; −0.3229) at 90%. Here, we note that the Bootp displays a poor performance. It should be recalled that β2 is a parameter associated with the nonlinearity of the model, as is β3. In fact, once the dispersion was assumed constant, the Boott scheme provided intervals for β3 with considerably shorter lengths than those obtained under Model-I.

Table 12. ML-Ia, Boott and Bootp interval estimates for the parameters of Model-II.

Fluid catalytic cracking (FCC) data.

https://doi.org/10.1371/journal.pone.0272512.t012

One area of research on which we have been working intensively concerns model selection criteria for nonlinear models. The criterion proposed by [7] for the beta regression model, defined as the square of the correlation between g(y) and , has proven quite effective in assessing the goodness of fit of models to data in the different applications we have performed on nonlinear models. The corrected version was proposed by [30] and is defined as . Models I and II display measures equal to {0.6506, 0.5508} and {0.6818, 0.6095}, respectively. Thus, the choice of Model II is adequate, and this model was inferred based on the bootstrapt interval estimator.
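The uncorrected criterion of [7] — the squared correlation between the link-transformed response and the fitted predictor — can be sketched in a few lines. Assumptions to note: the logit link and the toy numbers below are ours, for illustration only (Model-I uses a probit link and Model-II a complementary log-log link), and the penalized correction of [30] is not shown here.

```python
import numpy as np

def pseudo_r2(y, mu_hat, link):
    """Squared sample correlation between g(y) and g(mu_hat)."""
    gy, gmu = link(y), link(mu_hat)
    return float(np.corrcoef(gy, gmu)[0, 1] ** 2)

logit = lambda u: np.log(u / (1.0 - u))   # illustrative link choice

y = np.array([0.10, 0.25, 0.40, 0.55, 0.70, 0.85])        # observed fractions
mu_hat = np.array([0.12, 0.22, 0.43, 0.50, 0.72, 0.80])   # hypothetical fit
r2 = pseudo_r2(y, mu_hat, logit)
```

A perfect fit gives a value of 1, and the corrected version of [30] additionally penalizes the number of estimated parameters, which is why it orders Models I and II more sharply than the raw measure.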

8 Conclusion

In this paper we evaluate the point and interval estimation for the parameters indexing the nonlinear simplex regression model [13] in small samples. Additionally, we propose inferential improvements based on the bootstrap method.

MLEs can often be biased when the sample size is small or even moderate. Thus, we compared the point estimation performance of the MLEs of the model parameters with that of their corrected versions obtained through a bootstrap scheme. The results of the Monte Carlo simulations showed that, in general, the corrected estimators presented lower biases than the maximum likelihood estimators, evidencing the efficacy of the bootstrap scheme in bias correction. The MLEs of the parameters of the dispersion submodel are strongly biased, and the bootstrap-corrected estimator provides a substantial reduction of these biases, reinforcing the importance of the proposed scheme for bias correction in the nonlinear simplex regression model.

Usually the asymptotic confidence intervals based on MLEs require large samples for the coverage rates to be close to the nominal ones. An alternative for constructing adequate confidence intervals in small samples is the bootstrap method. Thus, we considered three competing interval estimators, namely the asymptotic ML, percentile and bootstrapt intervals. Regarding coverage rates, the bootstrapt confidence interval outperformed its two competitors in every simulation scenario. Furthermore, in almost all experiments it was the best-balanced interval.

The price of this outperformance is that the bootstrapt interval is typically longer than its competitors. Overall, however, the bootstrapt interval proved to be the most appropriate interval estimator for nonlinear simplex regression, not only in the simulation results: it was also decisive in the application. In a scenario with only n = 28 observations, it was able to point out the misspecification of the dispersion submodel, which led to a new and better-fitted model.

References

  1. Barndorff-Nielsen OE, Jørgensen B. Some parametric models on the simplex. Journal of Multivariate Analysis. 1991;39(1):106–116.
  2. Jørgensen B. The theory of dispersion models. London: Chapman and Hall; 1997.
  3. McCullagh P, Nelder JA. Generalized linear models. London: Chapman and Hall; 1989.
  4. Song PXK, Tan M. Marginal models for longitudinal continuous proportional data. Biometrics. 2000;56(2):496–502. pmid:10877309
  5. Song PXK, Qiu Z, Tan M. Modelling heterogeneous dispersion in marginal models for longitudinal proportional data. Biometrical Journal: Journal of Mathematical Methods in Biosciences. 2004;46(5):540–553.
  6. López FO. A bayesian approach to parameter estimation in simplex regression model: a comparison with beta regression. Revista Colombiana de Estadística. 2013;36(1):1–21.
  7. Ferrari S, Cribari-Neto F. Beta regression for modelling rates and proportions. Journal of Applied Statistics. 2004;31(7):799–815.
  8. Mitnik PA, Baek S. The Kumaraswamy distribution: median-dispersion re-parameterizations for regression modeling and simulation-based estimation. Statistical Papers. 2013;54(1):177–192.
  9. Lemonte AJ, Bazan JL. New class of Johnson SB distributions and its associated regression model for rates and proportions. Biometrical Journal. 2016;58:727–746. pmid:26659998
  10. Mousa AM, El-Sheikh AA, Abdel-Fattah MA. A gamma regression for bounded continuous variables. Advances and Applications in Statistics. 2016;49(4):305–326.
  11. Guedes AC, Cribari-Neto F, Espinheira PL. Modified likelihood ratio tests for unit gamma regressions. Journal of Applied Statistics. 2020; p. 1–25. pmid:35707584
  12. Rocha SS, Espinheira LP, Cribari-Neto F. Residual and local influence analyses for unit gamma regressions. Statistica Neerlandica. 2021;75(2):137–160.
  13. Espinheira PL, Silva AO. Residual and influence analysis to a general class of simplex regression. Test. 2020; p. 1–30.
  14. Liu P, Yuen KC, Wu LC, Tian GL, Li T. Zero-one-inflated simplex regression models for the analysis of continuous proportion data. Statistics and Its Interface. 2020;13(2):193–208.
  15. Efron B. Bootstrap methods: another look at the jackknife. Annals of Statistics. 1979;7(1):1–26.
  16. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall; 1994.
  17. Simas AB, Barreto-Souza W, Rocha AV. Improved estimators for a general class of beta regression models. Computational Statistics & Data Analysis. 2010;54(2):348–366.
  18. Ferrari SLP, Pinheiro EC. Improved likelihood inference in beta regression. Journal of Statistical Computation and Simulation. 2011;81(4):431–443.
  19. Skovgaard IM. Likelihood asymptotics. Scandinavian Journal of Statistics. 2001;28(1):3–32.
  20. Cribari-Neto F, Lima FP. Resampling-based prediction intervals in beta regressions under correct and incorrect model specification. Communications in Statistics-Simulation and Computation. 2019; p. 1–19.
  21. Lima FP, Cribari-Neto F. Bootstrap-based testing inference in beta regressions. Brazilian Journal of Probability and Statistics. 2020;34(1):18–34.
  22. Atkinson AC. Plots, transformations and regression: an introduction to graphical methods of diagnostic regression analysis. New York: Oxford University Press; 1985.
  23. Silva FCd. Teste de diagnóstico baseado em influência local aplicado ao modelo de regressão simplex. Universidade Federal de Pernambuco; 2016.
  24. Cox DR, Reid N. Parameter orthogonality and approximate conditional inference. Journal of the Royal Statistical Society: Series B (Methodological). 1987;49(1):1–18.
  25. Davison AC, Hinkley DV. Bootstrap methods and their application. New York: Cambridge University Press; 1997.
  26. Ferrari SL, Cribari-Neto F. On bootstrap and analytical bias corrections. Economics Letters. 1998;58(1):7–15.
  27. Hall P. Theoretical comparison of bootstrap confidence intervals. The Annals of Statistics. 1988; p. 927–953.
  28. Ospina R, Cribari-Neto F, Vasconcellos KL. Improved point and interval estimation for a beta regression model. Computational Statistics & Data Analysis. 2006;51(2):960–981.
  29. Salazar SMG. Contribuicion al estudio de la reaccion de decomposición de la Zeolita Y em presencia de vapor de agua y vanadio. Universidad Nacional de Colombia; 2005.
  30. Bayer FM, Cribari-Neto F. Model selection criteria in beta regression with varying dispersion. Communications in Statistics-Simulation and Computation. 2017;46(1):729–746.