New bounded unit Weibull model: Applications with quantile regression

  • Laxmi Prasad Sapkota ,

    Roles Conceptualization, Investigation, Project administration, Formal analysis, Methodology, Software, Resources, Writing – original draft, Writing – review & editing

    laxmisapkota75@gmail.com

    Affiliations Department of Statistics, Tribhuvan University, Tribhuvan Multiple Campus, Tansen, Nepal, Department of Mathematical and Physical Sciences, Miami University, Hamilton, Ohio, United States of America

  • Nirajan Bam,

    Roles Formal analysis, Methodology, Software, Writing – original draft, Writing – review & editing

    Affiliation Department of Mathematical and Physical Sciences, Miami University, Hamilton, Ohio, United States of America

  • Vijay Kumar

    Roles Methodology, Software, Supervision, Writing – review & editing

    Affiliations Department of Mathematical and Physical Sciences, Miami University, Hamilton, Ohio, United States of America, Department of Mathematics and Statistics, DDU Gorakhpur University, Gorakhpur, Uttar Pradesh, India

Abstract

In practical scenarios, data measurements like ratios and proportions often fall within the 0 to 1 range, posing unique modeling challenges. While beta and Kumaraswamy distributions are widely used, alternative models often yield better performance, though no clear consensus exists. This paper introduces a new bounded probability distribution based on a transformation of the Weibull distribution, with properties such as moments, entropies, and a quantile function. Additionally, we have developed the sequential probability ratio test (SPRT) for the proposed model. The maximum likelihood estimation method was employed to estimate the model parameters. A Monte Carlo simulation was conducted to evaluate the performance of parameter estimation for the model. Finally, we formulated a quantile regression model and applied it to data sets related to risk assessment and educational attainment, demonstrating its superior performance over alternative regression models. These results highlight the importance of our contributions to enhancing the statistical toolkit for analyzing bounded variables across different scientific fields.

Introduction

In many real-world applications, data are restricted to a bounded interval, often (0, 1), representing proportions, ratios, or fractions. The bounded nature of such data presents unique modeling challenges. The beta distribution and the Kumaraswamy distribution are frequently used when outcomes are measured on the unit interval. However, the absence of a closed-form expression for the cumulative distribution function (CDF) of the beta distribution complicates the analysis of statistical properties that depend on the quantile function. Additionally, the beta distribution is not well-suited for sparse data scenarios [1]. To address these limitations, Kumaraswamy introduced the Kumaraswamy distribution as an alternative to the beta distribution. However, the Kumaraswamy distribution lacks closed-form expressions for moments such as the mean and variance, which complicates the analytical study of moment-based properties [2,3]. Consequently, there is growing research interest in new distributions as alternatives to the beta and Kumaraswamy distributions.

In recent years, several distributions have been introduced as alternatives to the beta distribution and its associated regression models. Examples include the unit-Birnbaum-Saunders distribution [4], the two-parameter unit Gaussian distribution [5], the unit Johnson SU distribution [6], the two-parameter log-cosine-power unit distribution [7], the unit-Lindley distribution [8], the unit-gamma distribution [9], and the unit-exponentiated Lomax distribution [10]. Some of these distributions, such as the unit-exponentiated Lomax distribution, have more than two parameters, which increases model complexity, while others, like the one-parameter unit-Lindley distribution, offer less flexibility in controlling the precision of the distribution. More importantly, none of these distributions has been equipped with a sequential probability ratio test (SPRT).

In applied research, researchers often require regression models where the response variable is bounded within the interval (0, 1). Early work on bounded regression, particularly in the context of police behavior, was conducted by Johns and Gates (1993) [11]. Subsequently, Paolino (2001) introduced a simplified version of beta regression, providing a more practical approach for such applications [12]. However, applying a regression framework to bounded data gained significant attention following the introduction of beta regression by Ferrari and Cribari-Neto (2004) [13]. The key distinction between Paolino’s and Ferrari’s work is that Paolino models both the conditional mean and the precision parameter as outcomes, while Ferrari and Cribari-Neto treat the precision parameter as a nuisance. Although beta regression has gained popularity for its flexibility in modeling continuous data bounded between 0 and 1, it is highly sensitive to outliers. Some conditional mean-based bounded regression models that have been proposed as alternatives to beta regression include unit Lindley regression [8], log-weighted exponential regression [14], and log-Bilal regression [15]. When the response variable exhibits skewness or contains outliers, traditional mean-based regression models may not be adequate to infer the relationship between the outcome variable and predictors, as the mean is highly sensitive to outliers. Consequently, median-based regression is suitable in cases where the response variable shows skewness or includes outliers, as the median is robust against outliers (see [16,17]).

Quantile regression [17] was first presented by Koenker and Bassett as a generalized linear framework to model the conditional quantiles of a response variable [18]. Its robustness to outliers has led to a surge in its popularity within both applied and theoretical statistics over the past few decades. Recently, there has been growing interest in using quantile regression to model unit-bounded response variables along with sets of covariates. This extension is driven by two main reasons. First, some distributions do not have a closed form for their mean but do have closed forms for their quantiles, so working with the quantile function simplifies the incorporation of a regression structure. Second, replacing the conditional mean with the conditional median is one of the most effective ways to reduce the influence of outliers, and quantile regression makes this possible.

Given the difficulties in fitting the beta distribution in engineering and hydrology applications, Kumaraswamy introduced the Kumaraswamy distribution as an alternative [1]. Nevertheless, its use was initially restricted due to the lack of closed-form moments. In 2013, Mitnik and Baek enhanced the Kumaraswamy distribution by reparametrizing it for quantile regression, which increased interest in Kumaraswamy regression [3]. Since then, quantile regression in bounded-area distributions has become more popular. For example, recent advancements in unit interval quantile regressions include the unit-Weibull distribution and its quantile regression [19], the two-parameter Burr XII distribution and its quantile regression [20], the log-log unit distribution and its quantile regression [21], the half-normal unit distribution and its quantile regression [22], the log-logistic unit distribution and its quantile regression [23], unit Burr XII quantile regression model [24], and the unit chain distribution and its quantile regression [25].

Although various regression models are discussed in the literature, some are designed to model the conditional mean, and others are designed to model the conditional median of bounded response variables. There is no agreement on a viable alternative to the beta and Kumaraswamy regression. While quantile-based regression is generally more robust to outliers than conditional mean-based regression, the widely used Kumaraswamy quantile regression remains susceptible to extreme values, particularly when the distribution exhibits unimodal behavior [26]. To address this limitation, [26] introduced the three-parameter Kumaraswamy Rectangular distribution, which enhances robustness. However, the inclusion of an additional parameter increases model complexity, making estimation and interpretation more challenging. Several other three-parameter probability distributions defined on the unit interval, along with their associated quantile regression models—such as the unit-exponentiated Lomax distribution [10] and the bounded exponentiated Weibull distribution [27]—further increase the complexity of statistical modeling.

Several unit-interval bounded probability distributions, along with their corresponding regression and quantile regression models, have been discussed above, each with its own strengths and limitations. However, none of them has been equipped with an SPRT. The SPRT has immediate applications in various applied fields. It allows researchers to stop data collection early once sufficient evidence has been gathered to make a decision, thereby saving both time and resources [28]. Additionally, the SPRT requires considerably smaller sample sizes than traditional fixed-sample hypothesis tests, making it more efficient in terms of both cost and data collection effort [29]. Despite its importance in applied fields, the SPRT has not yet been introduced in the context of unit-interval bounded probability distributions. This gap underscores the need for a new, flexible probability distribution that supports a quantile regression model resilient to extreme values in bounded response variables and also comes with an SPRT, thereby enhancing its practical applicability and efficiency. Our proposed model and its quantile regression offer several improvements over existing models. Thus, the primary objective of this study is to introduce a new bounded probability distribution and its associated quantile regression model and to present essential properties, including the SPRT.

The main advantage of the new bounded Weibull distribution lies in its flexibility, including unimodal and bathtub-shaped forms of the probability density function (PDF) and hazard rate function, together with several key properties not found in other unit-interval bounded distributions. For instance, the new bounded Weibull distribution has only two parameters. It provides closed-form expressions for the CDF and quantile function, and its PDF can be expressed as a linear combination of beta densities. The closed-form quantile function enables us to introduce a new bounded quantile regression model as a viable alternative to the widely used beta [13] and Kumaraswamy [3] regression models. Furthermore, the inclusion of the SPRT, absent in other bounded data models, is a distinctive feature of our proposed approach. The SPRT enables real-time hypothesis testing, allowing decisions to be made during data collection instead of waiting for the complete data set, thus enhancing efficiency and practicality [28].

After introducing the PDF, CDF, quantile function, and several essential properties of the new bounded unit-Weibull distribution, we proceeded to validate it and its associated quantile regression through simulation studies. Furthermore, we assessed its applicability by applying it to two real datasets. One data set originates from a managerial cost-effectiveness study conducted by [30]. This study measures the variable firm cost that evaluates cost effectiveness, representing the total property and casualty premiums along with uninsured losses as a percentage. The variable is bounded within the range of 0 and 1 and exhibits outliers (see Fig 1). Another data set pertains to student achievement. Data on student achievement and associated predictor variables for OECD countries and other nations are available on the OECD website. Numerous studies have analyzed educational attainment in OECD countries using various regression models [31–33]. Outliers may manifest in both datasets. Traditional regression approaches may violate critical assumptions, and attempts to rectify these assumptions may infringe upon the bounded nature of the response variable. Although beta regression may offer advantages over classical regression methods, it typically models the mean as the response variable, rendering it susceptible to outliers and potentially less robust in such scenarios. In addition, Kumaraswamy quantile regression is also not robust to extreme values, particularly when the data distribution exhibits unimodal behavior [26]. Therefore, we used our proposed quantile regression model in both aforementioned data scenarios and compared its performance to beta regression, Kumaraswamy quantile regression, and several other similar quantile regressions.

Fig 1. Different curves of PDF and CDF of NBUW distribution.

https://doi.org/10.1371/journal.pone.0323888.g001

The paper is organized as follows. In Section New Bounded Unit Weibull (NBUW) model, we formulate a new unit distribution. Section Some properties of NBUW model examines some properties of the new model. In Section NBUW quantile regression model, we present the new parameterized model and its quantile regression model. We conducted an extensive Monte Carlo simulation study to evaluate the performance of the emphasized estimators in Section Simulation. Then, we present the application of the regression model using two real-world data examples to illustrate the usefulness of our model in modeling real-world data in Section Application. Finally, conclusions are given in Section Conclusion.

New Bounded Unit Weibull (NBUW) model

The unit Weibull distribution was initially introduced by modifying the PDF and CDF of the Weibull distribution through the transformation Y = e^X [34]. In contrast, this research uses a different transformation technique as outlined in [19], where the transformation is applied to transform the Weibull to Weibull (0,1] distribution. Here, Y follows a Weibull distribution with its corresponding PDF and CDF.

(1)(2)

By transforming Eqs (1) and (2), we have defined the new bounded Weibull model whose CDF and PDF are presented as follows.

(3)(4)

Various shapes of the CDF and PDF of the NBUW distribution are displayed in Fig 1. It is observed that the PDF can exhibit bathtub, right-skewed, and unimodal shapes.

Quantile function: The quantile function Qq of the NBUW model can be presented as

(5)

Hazard rate function (HRF): To model the quality or survival rate of an item or object in fields such as engineering or environmental studies, the HRF can be utilized. The HRF for the NBUW distribution is given by:

(6)

Some properties of NBUW model

Linear form of NBUW distribution

Using the binomial and exponential expansions of the PDF defined in Eq (4), we can express it as a linear combination of densities of the beta distribution (first kind) as

(7)

where h(x) is the PDF of the beta distribution (first kind) with parameters and and .

Moments of NBUW distribution

The moment of about the origin can be calculated as

(8)

where $B(a,b)=\int_0^1 x^{a-1}(1-x)^{b-1}\,dx$ is the beta function. Using Eq (8), we can compute the mean and variance of the NBUW distribution as follows.

and the second moment is

Hence, the variance of X of the NBUW can be calculated as

Using the above expressions for mean and variance, we have presented the graphical view of the mean and variance for different values of the shape parameter in Fig 2.

Fig 2. Mean and variance plots of NBUW model for different values of the shape parameter .

https://doi.org/10.1371/journal.pone.0323888.g002

Moment Generating Function (MGF)

Using the expression for moments Eq (8), we can derive the MGF of the NBUW model as

Entropy

Entropy is an important statistical tool for quantifying the randomness associated with an event or a system. Among the many types of entropy, we present two for the NBUW model, as follows.

Renyi entropy

The Renyi entropy can be defined as

(9)

For this we first need to find , where f(x) is the PDF defined in Eq (4) and can be obtained as

(10)

where Now substituting the Eq (10) in Eq (9), we get

(11)

q-entropy

Similarly, q-entropy for the NBUW distribution can be calculated using the Renyi entropy (11) as

Sequential Probability Ratio Test (SPRT)

Let X have a where unknown shape parameter where and is known scale parameter (say). Now we have the SPRT for testing the following hypothesis. Null hypothesis against Alternative hypothesis Then the sequential likelihood ratio statistic is calculated as where

Hence, the SPRT statistic can be computed as

For the SPRT test, we can decide the acceptance or rejection of H0 as follows.

  1. Accept H0 if
  2. Reject H0 if
  3. Continue sampling to examine an additional observation if

Here, A and B are boundary points and can be calculated as
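The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses Wald's usual boundary approximations, A = (1 − β)/α and B = β/(1 − α), which may differ in form from the expressions used in the paper, and it replaces the NBUW density with a stand-in density f(x; θ) = θ x^(θ−1) on (0, 1) so that the log-likelihood ratio is available in closed form. The function names `log_lr` and `sprt` are hypothetical.

```python
import math


def log_lr(x, theta0, theta1):
    """Log-likelihood ratio log f(x; theta1) - log f(x; theta0) for the
    stand-in density f(x; theta) = theta * x**(theta - 1) on (0, 1).
    (Assumption: this is NOT the NBUW density.)"""
    return math.log(theta1 / theta0) + (theta1 - theta0) * math.log(x)


def sprt(stream, theta0, theta1, alpha=0.05, beta=0.10):
    """Run Wald's SPRT on an iterable of observations.

    Returns (decision, number_of_observations_used)."""
    log_a = math.log((1 - beta) / alpha)   # upper boundary: reject H0
    log_b = math.log(beta / (1 - alpha))   # lower boundary: accept H0
    s = 0.0                                # cumulative log-likelihood ratio
    n = 0
    for x in stream:
        n += 1
        s += log_lr(x, theta0, theta1)
        if s >= log_a:
            return "reject H0", n
        if s <= log_b:
            return "accept H0", n
    # Data ran out before either boundary was crossed.
    return "continue sampling", n


# Hypothetical data: values near 1 favour the larger shape theta1 = 2.
decision, n_used = sprt([0.99] * 100, theta0=1.0, theta1=2.0)
```

Working on the log scale turns the product of likelihood ratios into a running sum, which avoids numerical underflow when many observations are examined.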

Average Sample Number (ASN) function

If the same SPRT is carried out repeatedly many times, then we shall get different values of the random variable N, the number of observations needed to decide on the SPRT. The average value of N to reach a decision is called ASN, and it is denoted by E(N). Since ASN depends upon the parameter to be tested, it is also called the ASN function and is denoted by for this study. Consider an SPRT of strength and boundary points (A,B) for testing against Let be a sequence of i.i.d. observations sampled from for the SPRT, then we can obtain the ASN as

where is the operating characteristic (OC) function of SPRT and it can be obtained under H0 as

The ASN required to terminate the SPRT under H0 is given by

provided that Similarly, under H1 is given by

provided that
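To make the ASN concrete, the sketch below compares Wald's approximation with a Monte Carlo estimate of E(N). It is an illustration under assumptions, not the paper's computation: the stand-in density f(x; θ) = θ x^(θ−1) on (0, 1) is used in place of the NBUW density, the boundaries are Wald's approximations A = (1 − β)/α and B = β/(1 − α), and the OC function is taken as 1 − α under H0 and β under H1. The names `wald_asn` and `simulated_asn` are hypothetical.

```python
import math
import random


def wald_asn(theta, theta0, theta1, alpha, beta, oc):
    """Wald's approximation E(N) = [L log B + (1 - L) log A] / E(Z),
    where L is the OC value at theta and Z is the per-observation
    log-likelihood ratio. For the stand-in density, E[log X] = -1/theta."""
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    mean_z = math.log(theta1 / theta0) - (theta1 - theta0) / theta
    return (oc * log_b + (1 - oc) * log_a) / mean_z


def simulated_asn(theta, theta0, theta1, alpha, beta, reps=2000, seed=0):
    """Average stopping time of the SPRT over `reps` simulated runs."""
    rng = random.Random(seed)
    log_a = math.log((1 - beta) / alpha)
    log_b = math.log(beta / (1 - alpha))
    total = 0
    for _ in range(reps):
        s, n = 0.0, 0
        while log_b < s < log_a:
            # If X has density theta * x**(theta - 1), then -log X ~ Exp(rate=theta).
            log_x = -rng.expovariate(theta)
            s += math.log(theta1 / theta0) + (theta1 - theta0) * log_x
            n += 1
        total += n
    return total / reps


alpha, beta = 0.05, 0.10
asn_h0_wald = wald_asn(1.0, 1.0, 2.0, alpha, beta, oc=1 - alpha)
asn_h1_wald = wald_asn(2.0, 1.0, 2.0, alpha, beta, oc=beta)
asn_h0_sim = simulated_asn(1.0, 1.0, 2.0, alpha, beta)
asn_h1_sim = simulated_asn(2.0, 1.0, 2.0, alpha, beta)
```

The simulated values are generally somewhat larger than Wald's approximation, because the approximation ignores the amount by which the cumulative sum overshoots a boundary at the stopping time.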

Simulation study

In this section, we employed Monte Carlo simulations to examine the finite-sample properties of Maximum Likelihood Estimates (MLEs) and asymptotic confidence intervals for parameters in the NBUW distribution. We evaluated the simulations by estimating bias (B), mean squared error (M), and confidence interval (CI) for the estimated parameters. We chose sample sizes of , encompassing various combinations of the parameters and . A fixed number of Monte Carlo replications were performed, with N = 1000. All simulations were carried out using R software, utilizing the maxLik package [35] with the BFGS algorithm to compute the MLEs of and . The summarized results of these simulations are presented in Tables 1, 2, 3, 4, 5, and 6.
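The structure of such a Monte Carlo study can be sketched as follows. The example is written in Python rather than R, and it is an assumption-laden simplification: a one-parameter stand-in density f(x; θ) = θ x^(θ−1) on (0, 1) replaces the NBUW density, so the MLE has a closed form and no BFGS optimizer is needed. Only the bookkeeping (bias, MSE, and coverage of the asymptotic confidence interval over replications) mirrors the design described above. The function name `simulate_mle` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_mle(theta, n, reps=1000, seed=0, level=0.95):
    """Return (bias, mse, coverage) of the MLE of theta over `reps` samples
    of size n from the stand-in density theta * x**(theta - 1)."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # normal critical value
    estimates = []
    covered = 0
    for _ in range(reps):
        # -log X ~ Exp(rate=theta), so the sufficient statistic can be drawn directly.
        sum_neg_log = sum(rng.expovariate(theta) for _ in range(n))
        theta_hat = n / sum_neg_log            # closed-form MLE
        se = theta_hat / math.sqrt(n)          # from Fisher information n / theta**2
        if theta_hat - z * se <= theta <= theta_hat + z * se:
            covered += 1
        estimates.append(theta_hat)
    bias = fmean(estimates) - theta
    mse = fmean([(e - theta) ** 2 for e in estimates])
    return bias, mse, covered / reps


# Hypothetical design: one true value, increasing sample sizes.
results = {n: simulate_mle(theta=2.0, n=n) for n in (25, 50, 100, 200)}
```

For a model without a closed-form MLE, the line computing `theta_hat` would be replaced by a numerical maximization of the log-likelihood, and the standard error would come from the observed information matrix.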

Table 1. Bias, MSE, and CI for MLE estimates .

https://doi.org/10.1371/journal.pone.0323888.t001

Table 2. Bias, MSE, and CI for MLE estimates .

https://doi.org/10.1371/journal.pone.0323888.t002

Table 3. Bias, MSE, and CI for MLE estimates .

https://doi.org/10.1371/journal.pone.0323888.t003

Table 4. Bias, MSE, and CI for MLE estimates .

https://doi.org/10.1371/journal.pone.0323888.t004

Table 5. Bias, MSE, and CI for MLE estimates .

https://doi.org/10.1371/journal.pone.0323888.t005

Table 6. Bias, MSE, and CI for MLE estimates .

https://doi.org/10.1371/journal.pone.0323888.t006

NBUW distribution based on its quantiles

Let and solve it for shape parameter we get

Now we have redefined the PDF and CDF of the NBUW model by re-parameterizing, and can be expressed as

(12)(13)

where q represents the quantile parameter, which is assumed to be known, , , and We have displayed the various shapes of Eq (12) in Fig 3. The quantile function based on the re-parameterized CDF is given by

Fig 3. Different shapes of PDF of the re-parameterized NBUW model.

https://doi.org/10.1371/journal.pone.0323888.g003

(14)

where u has a uniform distribution over the interval (0,1). This quantile function is used to generate random samples from the NBUW distribution for specified quantiles.
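Sampling through a closed-form quantile function is the inverse-transform method: draw u from a uniform distribution on (0, 1) and return Q(u). The sketch below illustrates the idea with the Kumaraswamy distribution as a stand-in, since its CDF F(x) = 1 − (1 − x^a)^b and quantile function are standard; the NBUW quantile function of Eq (14) would take the place of `kumaraswamy_quantile`. All function names are hypothetical.

```python
import random


def kumaraswamy_cdf(x, a, b):
    """CDF of the Kumaraswamy(a, b) distribution on (0, 1)."""
    return 1.0 - (1.0 - x ** a) ** b


def kumaraswamy_quantile(u, a, b):
    """Inverse of the CDF above: the value x with F(x) = u."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def sample(n, a, b, seed=0):
    """Draw n values by inverse-transform sampling."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        u = rng.random()
        if u > 0.0:          # skip the boundary value u = 0
            draws.append(kumaraswamy_quantile(u, a, b))
    return draws


draws = sample(5000, a=2.0, b=3.0)
empirical_median = sorted(draws)[len(draws) // 2]
theoretical_median = kumaraswamy_quantile(0.5, 2.0, 3.0)
```

A quick sanity check for any such sampler is that the empirical median of a large sample is close to Q(0.5), and that F(Q(u)) returns u.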

NBUW quantile regression model

Regression analysis is one of the most widely used statistical tools for understanding the relationship between a response variable and a set of covariates. The choice of regression model, however, often depends on the nature of the response variable. When the response variable is continuous and bounded within the unit interval, beta regression is a commonly employed approach as it models the conditional mean of the response variable effectively.

However, in situations where the response variable exhibits skewness or the presence of outliers, methods based on conditional means can be less robust. In such cases, quantile regression offers a more flexible alternative by directly modeling conditional quantiles of the response variable. Among unit interval quantile regression methods, the Kumaraswamy quantile regression model [3] is a popular choice. There is a constant need to improve existing models, either by refining their structure or introducing alternative approaches.

In this study, we propose a novel bounded unit Weibull quantile regression model that enhances the flexibility and applicability of unit interval quantile regression methods. Based on the density function described in Eq (12), the quantile regression model is outlined below.

Let such that yi is an observation of for with unknown parameters and . The regression model based on quantile is defined as;

(15)

where is the ith design vector of predictors corresponding to the vector of regression coefficient . The function serves as the link function utilized to connect the linear predictors to the conditional quantile of the outcome variable. Further elaboration can be found in [36]. For example, when the parameter q = 0.5, quantile regression establishes the relationship between predictors and the median of the outcome variable. The logit link function is utilized to link the linear predictors to the conditional quantile of the response variable.
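With a logit link, the conditional quantile is recovered from the linear predictor through the inverse logit, exp(η)/(1 + exp(η)), which always lies in (0, 1). The short sketch below shows this mapping; the function names and the coefficient values are hypothetical and chosen only for illustration.

```python
import math


def inv_logit(eta):
    """Numerically stable inverse of the logit link."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def logit(mu):
    """Logit link: maps a quantile in (0, 1) to the real line."""
    return math.log(mu / (1.0 - mu))


def conditional_quantile(x, beta):
    """Conditional quantile for design vector x (first entry 1 for the
    intercept) and coefficient vector beta, under a logit link."""
    eta = sum(b * xj for b, xj in zip(beta, x))
    return inv_logit(eta)


# Hypothetical coefficients: intercept 0.2 and slope -0.4.
mu = conditional_quantile([1.0, 0.5], [0.2, -0.4])
```

Because the inverse logit is monotone, a positive coefficient raises the modeled quantile and a negative coefficient lowers it, which is how the signs of the estimates are read in the applications.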

Parameter estimation

The parameters of the NBUW quantile regression model are estimated using the maximum likelihood method. After simplifying the regression equation presented in Eq (15), the expression in Eq (16) is obtained:

(16)

The log-likelihood function, used to estimate the unknown parameters of the NBUW quantile regression model, is defined as follows.

(17)

where the unknown parameter vector is denoted as . To find the MLEs of , denoted , we maximize the log-likelihood function with respect to . Because Eq (17) is nonlinear in the model parameters, it is maximized numerically using software such as Matlab, R, Mathematica, or Python.

In particular, when q = 0.5, it represents modeling the conditional median. For maximization of Eq (17), we employ the maxLik package [35] of the R software. This package not only maximizes the likelihood function but also provides numerically computed asymptotic standard errors using the observed information matrix.

Residual analysis

Residual analysis is required to evaluate the fit of the model. We use randomized quantile residuals for the NBUW quantile regression; detailed information on these residuals can be found in [37]. The randomized quantile residuals are obtained from the following relation:

where represents the CDF of the re-parameterized NBUW model (Eq (13)), is the quantile function of the standard normal distribution, and is the estimated value of the quantile of the response variable. For a properly fitted model, the randomized quantile residuals should follow a standard normal distribution.

Cox-Snell residuals are another form of residuals derived from the following relationship:

Detailed information on Cox-Snell residuals can be found in [38]. When the model fits the data appropriately, these residuals should follow an exponential distribution with a scale parameter of 1.0.
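Both residual types are simple transformations of the fitted CDF evaluated at each observation. The sketch below assumes the usual definitions, r = Φ⁻¹(F(y)) for the quantile residual of a continuous response and e = −log(1 − F(y)) for the Cox-Snell residual, and uses the Kumaraswamy CDF as a stand-in for the re-parameterized NBUW CDF. The function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev

_STD_NORMAL = NormalDist()
_EPS = 1e-12  # keeps probabilities strictly inside (0, 1)


def stand_in_cdf(y, a, b):
    """Kumaraswamy CDF, used here in place of the fitted NBUW CDF."""
    return 1.0 - (1.0 - y ** a) ** b


def quantile_residual(y, a, b):
    """Quantile residual: standard normal quantile of the fitted CDF value."""
    p = min(max(stand_in_cdf(y, a, b), _EPS), 1.0 - _EPS)
    return _STD_NORMAL.inv_cdf(p)


def cox_snell_residual(y, a, b):
    """Cox-Snell residual: -log of the fitted survival function."""
    p = min(stand_in_cdf(y, a, b), 1.0 - _EPS)
    return -math.log1p(-p)


# Simulate data from the stand-in model itself, so the fit is "correct".
rng = random.Random(0)
a, b = 2.0, 3.0
ys = [(1.0 - (1.0 - rng.random()) ** (1.0 / b)) ** (1.0 / a) for _ in range(4000)]
qres = [quantile_residual(y, a, b) for y in ys]
cres = [cox_snell_residual(y, a, b) for y in ys]
```

When the model is correctly specified, as in this simulated check, the quantile residuals have mean near 0 and standard deviation near 1, and the Cox-Snell residuals have mean near 1. For a continuous response no randomization is actually required; the randomization only matters for discrete responses.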

Simulation

This section presents simulation studies employing five distinct simulation schemes to assess the parameter recovery of the NBUW quantile regression model. For each scheme, we report the relative percentage bias (RB) and the root mean square error (RMSE). All simulations were performed in R using the optim function. Comparable results were obtained using the maxLik function. Below is the NBUW quantile regression model whose parameters were recovered using simulation:

(18)

where true value of parameters and , and shape parameter are given below;

  1. Simulation scheme 1: , and
  2. Simulation scheme 2: , and
  3. Simulation scheme 3: , and
  4. Simulation scheme 4: , and
  5. Simulation scheme 5: , and

For each scheme, covariates were generated from a normal distribution with varying sample sizes: , and 300, respectively. For each scheme, the Monte Carlo simulation was repeated 10,000 times. Each simulation scheme was repeated for and q = 0.9.
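The two summary measures can be computed from the replicated estimates as sketched below. The formulas are the conventional ones and are assumed here, since the text does not spell them out: RB = 100 × (mean estimate − true value) / true value, and RMSE = square root of the mean squared deviation from the true value. The function names and the example estimates are hypothetical.

```python
import math
from statistics import fmean


def relative_bias(estimates, true_value):
    """Relative percentage bias of a list of replicated estimates."""
    return 100.0 * (fmean(estimates) - true_value) / true_value


def rmse(estimates, true_value):
    """Root mean square error of a list of replicated estimates."""
    return math.sqrt(fmean([(e - true_value) ** 2 for e in estimates]))


# Hypothetical estimates of a coefficient whose true value is 2.0.
estimates = [1.9, 2.1, 2.0, 2.2]
rb = relative_bias(estimates, 2.0)
err = rmse(estimates, 2.0)
```

Relative bias is scale-free, which makes it easier to compare recovery across coefficients of different magnitude, whereas RMSE combines bias and sampling variability in the units of the parameter.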

The simulation results for q = 0.5 are presented in Table 7. Relative absolute bias and RMSE are presented in Figs 4, 5, and 6. Relative bias and RMSE decrease as the sample size increases. For a sample size of 300, both bias and RMSE are close to 0, indicating good recovery of the parameters. Although the shape parameter exhibits a large relative bias for small sample sizes, the bias reduces significantly as the sample size increases. Likewise, the simulation results for q = 0.1 and q = 0.9 are also presented in Table 7 and in Figs 7, 8, 9, 10, 11, and 12, respectively. All results consistently demonstrate satisfactory recovery of the parameters at quantiles q = 0.1 and q = 0.9, highlighting the model’s robustness and reliability.

Fig 4. Relative absolute bias of intercept and predictor for q = 0.5.

https://doi.org/10.1371/journal.pone.0323888.g004

Fig 5. RMSE of regression coefficients of intercept and predictor for q = 0.5.

https://doi.org/10.1371/journal.pone.0323888.g005

Fig 6. Relative absolute bias and RMSE of estimated shape parameter for q = 0.5.

https://doi.org/10.1371/journal.pone.0323888.g006

Fig 7. Relative absolute bias of intercept and predictor for q = 0.1.

https://doi.org/10.1371/journal.pone.0323888.g007

Fig 8. RMSE of regression coefficients of intercept and predictor for q = 0.1.

https://doi.org/10.1371/journal.pone.0323888.g008

Fig 9. Relative absolute bias and RMSE of estimated shape parameter for q = 0.1.

https://doi.org/10.1371/journal.pone.0323888.g009

Fig 10. Relative absolute bias of intercept and predictor for q = 0.9.

https://doi.org/10.1371/journal.pone.0323888.g010

Fig 11. RMSE of regression coefficients of intercept and predictor for q = 0.9.

https://doi.org/10.1371/journal.pone.0323888.g011

Application

Educational attainment data

The educational attainment data, sourced from OECD.stats as cited by [21], encompass educational attainment percentages from 35 OECD countries, supplemented by four non-OECD countries. The dependent variable, the percentage of educational attainment, falls within the continuous range of 0 to 1. Educational attainment refers to the proportion of adults aged 25 to 64 years who hold at least an upper secondary degree. In this analysis, the covariates under consideration are life satisfaction and homicide rates. The distribution of the response variable, student attainment, is left-skewed with noticeable outliers (refer to Fig 13). Consequently, quantile regression, such as the model presented herein, is suitable for modeling such data. Therefore, the relationship between student attainment and the predictors, namely life satisfaction and homicide rates, was explored using the proposed new bounded unit Weibull quantile regression model.

We compared three popular regression models with the NBUW quantile regression model. These models include beta regression [13], Kumaraswamy quantile regression [3], and unit-Weibull quantile regression [39]. In this applied example, the following regression model was used.

(19)

where is the median for Kumaraswamy, unit-Weibull, and NBUW, and the mean for the beta regression model. z1 is life satisfaction, and z2 is homicide. Parameter estimation was conducted using the MLE method. The unitquantreg package in R [40] was used to estimate the parameters of the Kumaraswamy and unit-Weibull regression models. For the beta regression model, parameters were estimated with the betareg package [41]. The parameters of the NBUW regression model were estimated with custom code based on the maxLik function from the maxLik package in R. Model adequacy was assessed with QQ plots of randomized quantile residuals and the Kolmogorov-Smirnov test. Additionally, the models were compared using the AIC and BIC.
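For reference, both information criteria are simple functions of the maximized log-likelihood: AIC = 2k − 2ℓ and BIC = k ln(n) − 2ℓ, where k is the number of estimated parameters and n is the sample size, and smaller values indicate a better trade-off between fit and complexity. The sketch below shows the computation; the numbers used are hypothetical and are not taken from the fitted models.

```python
import math


def aic(loglik, k):
    """Akaike information criterion for maximized log-likelihood `loglik`
    and k estimated parameters."""
    return 2.0 * k - 2.0 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion for a sample of size n."""
    return k * math.log(n) - 2.0 * loglik


# Hypothetical values: log-likelihood 10.0, 3 parameters, 50 observations.
example_aic = aic(10.0, k=3)
example_bic = bic(10.0, k=3, n=50)
```

Because densities on the unit interval can exceed 1, the maximized log-likelihood may be positive, which is why negative AIC and BIC values are common for bounded-response models.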

The results of the parameter estimation are presented in Table 8. The NBUW regression model exhibited superior performance based on the AIC and BIC metrics compared to the Kumaraswamy and beta regression models. In addition, the AIC and BIC values of the Unit-Weibull and NBUW models were found to be comparable. The K-S test results affirmed satisfactory model fit across all models. Fig 14 illustrates a QQ plot of quantile residuals. It also confirms the superior performance of the NBUW quantile regression.

thumbnail
Table 8. Parameter estimation and model comparison statistics.

https://doi.org/10.1371/journal.pone.0323888.t008

Furthermore, the proposed quantile regression model was evaluated against the unit logistic, unit Johnson, and unit Burr XII models. The AIC and BIC values for the unit Burr XII model were -66.17 and -59.31, respectively; for the unit logistic quantile regression model they were -64.34 and -57.49; and for the unit Johnson quantile regression model they were -65.54 and -58.68. In contrast, our proposed model yielded AIC and BIC values of -66.36 and -59.5, respectively. These values favor the NBUW model over these existing models, although the margin over the unit Burr XII model is small.

The results of the NBUW quantile regression analysis indicate that the percentage of student achievement is significantly associated with both life satisfaction and homicide. The results further show that student attainment is positively associated with life satisfaction and negatively associated with homicide. The results of both the UW and NBUW quantile regressions exhibit consistency. However, these findings diverge from those of the beta and Kumaraswamy regressions, where only homicide demonstrates a statistically significant negative relationship with student attainment. It is intriguing to observe that the fitting of the regression model, the direction of the association, and the significance of the predictor variables with the response variable all depend on the choice of distribution. In the current data scenario, the superior fit of NBUW quantile regression over other competing models is confirmed by metrics such as AIC, BIC, and the QQ plot of randomized residuals.

Hence, it is crucial to select the distribution that best fits the data at hand. In the presence of outliers, opting for quantile regression is prudent. Furthermore, our proposed model proves to be a favorable option for practitioners dealing with bounded response variables containing outliers.

Risk assessment data

The data used in this analysis originate from the managerial cost-effectiveness investigation conducted by Schmit and Roth [30]. The data set can also be accessed through the personal web page of Prof. E. Frees at https://sites.google.com/a/wisc.edu/jed-frees/. The data were gathered to evaluate the cost-effectiveness of a company's management philosophy for mitigating its exposure to property losses and accidents, while accounting for the unique characteristics of the company. The data set consists of seven variables, with firm cost serving as the response variable and the remaining variables as covariates. Each variable is described below based on the information provided by the source:

  • Firm cost (y): Measures the effectiveness of the firm’s risk management in terms of cost. It includes the total property and casualty premiums as well as uninsured losses, expressed as a percentage.
  • Assume (z1): The per-occurrence retention amount as a percentage of total assets.
  • Cap (z2): Indicates ownership of a captive insurance entity by the firm.
  • Size log (z3): Natural logarithm of total assets.
  • Indcost (z4): The firm’s industrial risk measure.
  • Central (z5): A measure of the significance of local managers in determining the retained risk amount.
  • Soph (z6): The degree of importance placed on utilizing analytical tools.

The response variable, firm cost, exhibits right-skewness along with the presence of outliers (see Fig 15). Consequently, our proposed quantile regression model is particularly suited to handle such data characteristics. To investigate the relationship between firm costs and the specified predictor variables, we compared three widely used regression models, beta, Kumaraswamy, and unit-Weibull, with the NBUW quantile regression model, utilizing risk assessment data. The formulation of the quantile regression model is as follows:

$$g(\mu_i) = \beta_0 + \beta_1 z_{i1} + \beta_2 z_{i2} + \beta_3 z_{i3} + \beta_4 z_{i4} + \beta_5 z_{i5} + \beta_6 z_{i6}, \tag{20}$$

where $g(\cdot)$ denotes the link function, $\mu_i$ represents the median for Kumaraswamy, unit-Weibull, and NBUW, and the mean for the beta regression model, and $\beta_0, \beta_1, \ldots, \beta_6$ are regression coefficients. The model parameters were estimated using a method similar to that outlined in the educational attainment data analysis. Model performance was assessed with half-normal plots of quantile residuals. Goodness of fit was evaluated using the Kolmogorov-Smirnov test, and model comparison was conducted using the AIC and BIC.
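The idea behind a median-parameterized regression on (0, 1) can be sketched in Python. Because the NBUW density is not reproduced in this section, the sketch uses the Kumaraswamy distribution, one of the competing models, as a stand-in. The logit link, the function names, and the argument layout are assumptions made for illustration and are not the authors' implementation.

```python
import math


def logistic(eta):
    """Inverse logit link (an assumed link): maps a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def kw_cdf(y, a, b):
    """Kumaraswamy CDF F(y) = 1 - (1 - y^a)^b for 0 < y < 1."""
    return 1.0 - (1.0 - y ** a) ** b


def kw_shape_a(mu, b, tau=0.5):
    """Solve F(mu) = tau for the shape a, so that mu is the tau-th quantile."""
    return math.log(1.0 - (1.0 - tau) ** (1.0 / b)) / math.log(mu)


def kw_logpdf(y, a, b):
    """Log of the Kumaraswamy density a*b*y^(a-1)*(1 - y^a)^(b-1)."""
    return (math.log(a) + math.log(b)
            + (a - 1.0) * math.log(y)
            + (b - 1.0) * math.log1p(-(y ** a)))


def neg_loglik(beta, b, rows, ys, tau=0.5):
    """Negative log-likelihood of the quantile regression.

    rows: covariate vectors, each starting with 1.0 for the intercept.
    ys:   responses in (0, 1).
    """
    total = 0.0
    for z, y in zip(rows, ys):
        eta = sum(bj * zj for bj, zj in zip(beta, z))
        mu = logistic(eta)             # conditional tau-th quantile
        a = kw_shape_a(mu, b, tau)     # shape implied by that quantile
        total -= kw_logpdf(y, a, b)
    return total
```

Minimizing `neg_loglik` over `beta` and `b` with a numerical optimizer would give maximum likelihood estimates; setting `tau=0.5` corresponds to the median regression described above.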

The parameter estimation results are detailed in Table 9. Based on the AIC and BIC criteria, the NBUW regression model outperformed the commonly used Kumaraswamy and beta regression models, as well as the unit-Weibull regression model. K-S test results indicated satisfactory model fits for all models except the beta regression model. The half-normal plot of quantile residuals, depicted in Fig 16, illustrates that the NBUW regression exhibits greater robustness to outliers compared to the mean-based beta regression and other competing models.
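The residual diagnostics mentioned above can be sketched as follows. For a continuous response, a quantile residual is the standard normal quantile of the fitted CDF evaluated at the observation, and a half-normal plot compares the ordered absolute residuals with half-normal quantiles. The Kumaraswamy CDF is again used as a stand-in for the NBUW CDF, and the plotting positions (i - 0.5)/n are one common choice rather than the one confirmed by the paper.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def kw_cdf(y, a, b):
    """Kumaraswamy CDF, used here only as a stand-in fitted model."""
    return 1.0 - (1.0 - y ** a) ** b


def quantile_residual(y, a, b):
    """Quantile residual: Phi^{-1}(F(y)); approx. N(0, 1) if the model fits."""
    return _STD_NORMAL.inv_cdf(kw_cdf(y, a, b))


def half_normal_points(residuals):
    """Pair half-normal theoretical quantiles with ordered |residuals|.

    For Z ~ N(0, 1), P(|Z| <= q) = p gives q = Phi^{-1}((1 + p) / 2).
    """
    n = len(residuals)
    ordered = sorted(abs(r) for r in residuals)
    theoretical = [
        _STD_NORMAL.inv_cdf((1.0 + (i - 0.5) / n) / 2.0)
        for i in range(1, n + 1)
    ]
    return list(zip(theoretical, ordered))
```

Points that lie far above the trend of the plotted pairs flag observations the fitted model accommodates poorly, which is how outlier sensitivity is judged in a half-normal plot.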

Fig 12. Relative absolute bias and RMSE of estimated shape parameter for q = 0.9.

https://doi.org/10.1371/journal.pone.0323888.g012

Fig 13. Boxplot of student attainment proportions.

https://doi.org/10.1371/journal.pone.0323888.g013

Fig 14. QQ Plot of randomized quantile residuals.

https://doi.org/10.1371/journal.pone.0323888.g014

Fig 16. Half normal plot of randomized quantile residuals.

https://doi.org/10.1371/journal.pone.0323888.g016

Table 9. Parameter estimation and model comparison statistics of risk assessment data.

https://doi.org/10.1371/journal.pone.0323888.t009

Furthermore, we compared our proposed quantile regression model with the unit logistic, unit Johnson, and unit Burr XII quantile regression models. The AIC and BIC values were -87.02 and -68.69 for the unit Burr XII model, -222.67 and -204.34 for the unit logistic model, and -200.18 and -181.85 for the unit Johnson model. In contrast, our proposed model yielded AIC and BIC values of -223.69 and -206.36, respectively.
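Since AIC = 2k - 2ℓ and BIC = k ln(n) - 2ℓ for a model with k estimated parameters, maximized log-likelihood ℓ, and n observations, the gap BIC - AIC equals k(ln n - 2) and does not depend on ℓ. The sketch below applies this identity to the values quoted above as a simple consistency check. It assumes, without confirmation from the text, that all four models are fitted to the same observations; the helper name `implied_n` and the value of k passed to it are hypothetical.

```python
import math

# (AIC, BIC) pairs as reported in the text for the risk assessment data.
reported = {
    "unit Burr XII": (-87.02, -68.69),
    "unit logistic": (-222.67, -204.34),
    "unit Johnson": (-200.18, -181.85),
    "NBUW": (-223.69, -206.36),
}

# BIC - AIC = k * (ln n - 2): identical for models sharing k and n.
gaps = {name: round(bic - aic, 2) for name, (aic, bic) in reported.items()}


def implied_n(gap, k):
    """Sample size implied by a BIC - AIC gap for an assumed parameter count k."""
    return math.exp(gap / k + 2.0)


for name, gap in gaps.items():
    print(f"{name:14s} BIC - AIC = {gap:5.2f}")
```

The three competing models share a gap of 18.33, while the NBUW values give 17.33. A difference of this kind can arise from a different number of estimated parameters or from a transcription slip in one reported value; the text does not say which applies.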

According to the NBUW quantile regression, only the logarithm of firm size and the firm’s industrial risk measure are statistically significant. The logarithm of firm size is negatively associated, while the industrial risk measure is positively associated, with the cost effectiveness of the firm’s risk management. The remaining variables are not statistically significant. The UW, NBUW, and beta regression models agree on these conclusions and differ only in goodness of fit. In the Kumaraswamy quantile regression, however, only the logarithm of firm size is significant. This confirms the importance of selecting an appropriate distribution when analyzing the relationship between a bounded response variable and its predictors. Several other studies have reached similar conclusions [19,21]. Furthermore, when outliers are present, as in the risk assessment data, median-based regressions prove to be more robust. Based on the AIC, BIC, and residual plots, the proposed NBUW distribution and its associated quantile regression provide the best fit among the models considered.

Conclusion

In summary, this research introduces a new bounded unit Weibull distribution designed specifically for the (0,1) domain, which plays a critical role in characterizing phenomena in various applied sciences. Our exploration has unveiled several intriguing properties, including different moments and their generating function, entropies, and a linear form of the beta distribution. In addition, we have devised the SPRT and ASN for the proposed model and established a new quantile regression model. A simulation study was carried out to evaluate the parameter recovery of the proposed new bounded unit Weibull distribution and the associated quantile regression model, which yielded satisfactory results for all parameters.

The proposed quantile regression model was also applied to two real datasets: one on student attainment and the other on risk assessment. For the risk assessment dataset, our proposed model outperformed the well-known beta and Kumaraswamy regression models, as well as the UW regression model, based on two information criteria (AIC and BIC) and residual plots. The NBUW quantile regression model also achieved lower AIC and BIC values than the unit logistic, unit Johnson, and unit Burr XII quantile regression models. Similarly, for the educational attainment dataset, the NBUW regression model outperformed the beta and Kumaraswamy regression models and performed comparably to the UW quantile regression model, and it again achieved lower AIC and BIC values than the unit logistic, unit Johnson, and unit Burr XII quantile regression models. Based on these findings, when outliers are present in the dataset, as in the risk assessment data, median-based regression proves to be robust to them, and NBUW quantile regression could be a favorable model choice for practitioners in such situations. Although our proposed model is not applicable when zeros are present in the dataset, it could represent a superior option for researchers dealing with bounded data in the range between 0 and 1, as demonstrated in the applications presented in this paper. These findings underscore the importance of our contributions to the advancement of the statistical toolkit for analyzing bounded variables in various scientific disciplines.

Limitations

Although the NBUW model has several advantages over the existing models, it is not free from limitations. Below are some of the limitations and potential directions for future research:

  1. In real-world applications, practitioners often require a model that can accommodate zeros. The NBUW model lacks this feature; therefore, future research should focus on adapting it to handle zeros, including excess zeros.
  2. The NBUW distribution is univariate; therefore, future research should focus on extending it to multivariate settings.
  3. NBUW quantile regression requires independent responses and is not capable of modeling correlated outcomes. Therefore, future research should focus on extending it to mixed-effect models to properly accommodate correlated responses.

References

  1. Kumaraswamy P. A generalized probability density function for double-bounded random processes. J Hydrol. 1980;46(1–2):79–88.
  2. Koenker R, Chernozhukov V, He X, Peng L. Handbook of quantile regression. 1st ed. New York: Chapman and Hall/CRC. 2017.
  3. Mitnik PA, Baek S. The Kumaraswamy distribution: median-dispersion re-parameterizations for regression modeling and simulation-based estimation. Stat Papers. 2012;54(1):177–92.
  4. Mazucheli J, Menezes AFB, Dey S. The unit-Birnbaum-Saunders distribution with applications. Chilean J Statist. 2018;9(1):47–57.
  5. Ghitany ME, Mazucheli J, Menezes AFB, Alqallaf F. The unit-inverse Gaussian distribution: a new alternative to two-parameter distributions on the unit interval. Commun Statist - Theory Methods. 2018;48(14):3423–38.
  6. Gündüz S, Korkmaz MÇ. A new unit distribution based on the unbounded Johnson distribution rule: the unit Johnson SU distribution. Pak J Stat Oper Res. 2020;16(3):471–90.
  7. Nasiru S, Chesneau C, Ocloo SK. The log-cosine-power unit distribution: a new unit distribution for proportion data analysis. Decis Analyt J. 2024;10:100397.
  8. Mazucheli J, Menezes AFB, Chakraborty S. On the one parameter unit-Lindley distribution and its associated regression model for proportion data. J Appl Statist. 2018;46(4):700–14.
  9. Mazucheli J, Menezes AFB, Dey S. Improved maximum-likelihood estimators for the parameters of the unit-gamma distribution. Commun Statist - Theory Methods. 2018;47(15):3767–78.
  10. Fayomi A, Hassan AS, Almetwally EM. Inference and quantile regression for the unit-exponentiated Lomax distribution. PLoS One. 2023;18(7):e0288635. pmid:37463159
  11. Brehm J, Gates S. Donut shops and speed traps: evaluating models of supervision on police behavior. Am J Politic Sci. 1993;37(2):555.
  12. Paolino P. Maximum likelihood estimation of models with beta-distributed dependent variables. Polit Anal. 2001;9(4):325–46.
  13. Ferrari S, Cribari-Neto F. Beta regression for modelling rates and proportions. J Appl Statist. 2004;31(7):799–815.
  14. Altun E. The log-weighted exponential regression model: alternative to the beta regression model. Commun Statist - Theory Methods. 2019;50(10):2306–21.
  15. Altun E, El-Morshedy M, Eliwa MS. A new regression model for bounded response variable: an alternative to the beta and unit-Lindley regression models. PLoS One. 2021;16(1):e0245627. pmid:33481884
  16. Mazucheli J, Alves B, Menezes AFB, Leiva V. An overview on parametric quantile regression models and their computational implementation with applications to biomedical problems including COVID-19 data. Comput Methods Programs Biomed. 2022;221:106816. pmid:35580528
  17. Koenker R, Bassett G. Regression quantiles. Econometrica. 1978;46(1):33.
  18. Buchinsky M. Recent advances in quantile regression models: a practical guideline for empirical research. J Hum Resour. 1998;33(1):88.
  19. Mazucheli J, Menezes AFB, Fernandes LB, de Oliveira RP, Ghitany ME. The unit-Weibull distribution as an alternative to the Kumaraswamy distribution for the modeling of quantiles conditional on covariates. J Appl Stat. 2019;47(6):954–74. pmid:35706917
  20. Korkmaz MÇ, Chesneau C. On the unit Burr-XII distribution with the quantile regression modeling and applications. Comp Appl Math. 2021;40(1).
  21. Korkmaz MÇ, Korkmaz ZS. The unit log-log distribution: a new unit distribution with alternative quantile regression modeling and educational measurements applications. J Appl Stat. 2021;50(4):889–908. pmid:36925910
  22. Bakouch HS, Nik AS, Asgharzadeh A, Salinas HS. A flexible probability model for proportion data: unit-half-normal distribution. Commun Statist: Case Stud Data Anal Appl. 2021;7(2):271–88.
  23. Ribeiro-Reis LD. Unit log-logistic distribution and unit log-logistic regression model. J Indian Soc Probab Stat. 2021;22(2):375–88.
  24. Ribeiro TF, Peña-Ramírez FA, Guerra RR, Cordeiro GM. Another unit Burr XII quantile regression model based on the different reparameterization applied to dropout in Brazilian undergraduate courses. PLoS One. 2022;17(11):e0276695. pmid:36327245
  25. Korkmaz MÇ, Altun E, Chesneau C, Yousof HM. On the Unit-Chen distribution with associated quantile regression and applications. Mathematica Slovaca. 2022;72(3):765–86.
  26. Castro M, Azevedo C, Nobre J. A robust quantile regression for bounded variables based on the Kumaraswamy Rectangular distribution. Stat Comput. 2024;34(2):74.
  27. Bashir S, Masood B, Al-Essa LA, Sanaullah A, Saleem I. Properties, quantile regression, and application of bounded exponentiated Weibull distribution to COVID-19 data of mortality and survival rates. Sci Rep. 2024;14(1):14353. pmid:38906935
  28. Rasay H, Alinezhad E. Developing an adaptable sequential probability ratio test applicable for lifetime analysis of different continuous distributions. Qual Technol Quant Manag. 2022;19(4):511–30.
  29. Stefan AM, Schönbrodt FD, Evans NJ, Wagenmakers E-J. Efficiency in sequential testing: comparing the sequential probability ratio test and the sequential Bayes factor test. Behav Res Methods. 2022;54(6):3100–17. pmid:35233752
  30. Schmit JT, Roth K. Cost effectiveness of risk management practices. J Risk Insur. 1990;57(3):455.
  31. Hanushek EA, Woessmann L. How much do educational outcomes matter in OECD countries? Econ Policy. 2011;26(67):427–91.
  32. Korkmaz MÇ, Chesneau C, Korkmaz ZS. Transmuted unit Rayleigh quantile regression model: alternative to beta and Kumaraswamy quantile regression models. UPB Sci Bull Ser A: Appl Math Phys. 2021;83:149–58.
  33. Korkmaz MÇ, Chesneau C, Korkmaz ZS. A new alternative quantile regression model for the bounded response with educational measurements applications of OECD countries. J Appl Stat. 2021;50(1):131–54. pmid:36530782
  34. Mazucheli J, Menezes AFB, Ghitany ME. The unit-Weibull distribution and associated inference. J Appl Statist. 2018;13:1–22.
  35. Henningsen A, Toomet O. maxLik: a package for maximum likelihood estimation in R. Comput Stat. 2010;26(3):443–58.
  36. Yu K, Lu Z, Stander J. Quantile regression: applications and current research areas. J Royal Statistical Soc D. 2003;52(3):331–50.
  37. Dunn PK, Smyth GK. Randomized quantile residuals. J Comput Graph Statist. 1996;5(3):236–44.
  38. Cox DR, Snell EJ. A general definition of residuals. J Roy Statist Soc Ser B: Statist Methodol. 1968;30(2):248–65.
  39. Mazucheli J, Bapat SR, Menezes AFB. A new one-parameter unit-Lindley distribution. Chilean J Statist. 2020;11(1).
  40. Menezes AFB, Mazucheli J. unitquantreg: parametric quantile regression models for bounded data. 2021. https://andrmenezes.github.io/unitquantreg/
  41. Zeileis A, Cribari-Neto F, Gruen B, Kosmidis I, Simas AB, Rocha AV. betareg: beta regression. 2020.