
Modeling physics data with the generalized Marshall-Olkin Kumaraswamy distribution

  • Selim Gündüz ,

    Contributed equally to this work with: Selim Gündüz, Egemen Ozkan

    Roles Data curation, Funding acquisition, Formal analysis, Investigation, Methodology, Resources, Writing – original draft, Writing – review & editing

    Affiliation Department of Business Administration, Faculty of Business, Adana Alparslan Türkeş Science and Technology University, Adana, Türkiye

  • Egemen Ozkan ,

    Contributed equally to this work with: Selim Gündüz, Egemen Ozkan

    Roles Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    eozkan@yildiz.edu.tr

    Affiliation Department of Statistics, Yildiz Technical University, İstanbul, Türkiye

  • Kadir Karakaya

    Roles Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Project administration, Resources, Software, Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing

    Affiliation Department of Statistics, Selcuk University, Konya, Türkiye

Abstract

In this paper, a new distribution defined on a bounded interval is introduced, and its main properties, such as moments and the Lorenz and Bonferroni curves, are examined. The unknown parameters of the proposed distribution are estimated using several techniques, including maximum likelihood, least squares, weighted least squares, Anderson–Darling, Cramér–von Mises, maximum product spacing, right-tail Anderson–Darling, minimum spacing absolute distance, and minimum spacing absolute-log distance methods. The performance of these estimation methods is evaluated through Monte Carlo simulations under different parameter scenarios. Additionally, a new quantile regression model based on the proposed distribution is developed, offering greater flexibility for modeling bounded dependent variables. The capability of the proposed distribution to represent various hazard rate shapes, such as inverted-bathtub, bathtub, increasing, decreasing, constant, and increasing–decreasing–increasing, demonstrates its applicability and flexibility in real data analyses, particularly in cases where traditional models may underperform. Four different real-data applications from the fields of medicine, politics, physics, and education are presented to demonstrate that the proposed model can be used as a strong alternative to the well-known Beta and Kumaraswamy distributions in modeling bounded data. The study provides a robust statistical tool for the analysis of bounded data, with potential applications in datasets related to medicine, politics, physics, and educational sciences.

1 Introduction

Statistical data analysis is fundamental to empirical research and applied sciences, driving progress across a wide range of fields such as econometrics, biomedical engineering, and environmental sustainability. The increasing complexity and size of datasets in these fields have driven the development of statistical methods for extracting reliable and interpretable insights. Lifetime distributions such as the exponentiated exponential [1], cubic rank transmuted generalized Gompertz [2], transmuted lower record type power function [3], Gumbel Marshall-Olkin Lomax [4], extended odd Weibull exponential [5], Odd-Lindley half logistic [6], Marshall-Olkin kappa [7], Kumaraswamy generalized kappa [8], and exponentiated additive Weibull [9] have provided flexibility in modeling lifetime data. Despite these methodological advances, a significant gap persists in the statistical literature: there is a lack of distributions specifically designed for bounded data, which are commonly encountered in domains such as clinical trial results, financial ratios, mortality ratios, and ecological measurements. Although the beta and Kumaraswamy [10] distributions are widely used for modeling bounded data, they may not always be flexible enough to model complex data structures. In this context, new unit distribution models are derived from existing lifetime distributions through various transformation methods. Some of the recently proposed unit interval distributions based on existing distributions are as follows. The unit Weibull (UW) [11], unit Burr XII (UBXII) [12], unit log-log [13], unit Birnbaum-Saunders [14], unit inverse Gaussian [15], log-weighted exponential [16], and unit Chen [17] distributions are derived by using the transformation Y = exp(−X), and the unit Lindley (UL) [18] distribution is obtained by the transformation Y = X/(1 + X). The unit improved second-degree Lindley [19], logit Slash [20], arcsecant hyperbolic normal [21], and transmuted unit Rayleigh (UR) [22] distributions are among the unit distributions obtained through other transformation methods.

In addition to generating new unit distributions, another important aspect of bounded data modeling is regression analysis. Regression analysis is a fundamental method that examines the relationship between the dependent variable and the independent variables, allowing for inference and predictive modeling beyond descriptive statistics. However, traditional regression models often struggle with data confined within specific intervals because standard distributions such as Gaussian do not adequately handle boundary constraints. This limitation highlights the essential requirement for new bounded probability distributions and corresponding regression frameworks. Accordingly, new regression approaches have been developed to model bounded data. These approaches provide a more flexible modeling framework compared to classical regression methods by considering the fact that the dependent variable is within the (0,1) interval. In this regard, the beta [23], Kumaraswamy (KW) [24], UBXII [12], log-extended exponential geometric [25], unit Burr-Hatke [26], and unit log-log [13] regression models are introduced as alternatives to classical regression models.

This study aims to propose a new and flexible distribution as an alternative to the well-known bounded models, such as the KW, beta, UL, UBXII, UW, and UR distributions. Although the existing unit distributions in the literature are used to model various types of bounded data, they are generally limited to representing only particular hazard rate shapes. In contrast, the proposed distribution provides substantial flexibility in modeling various datasets by representing diverse hazard rate shapes, including inverted-bathtub, bathtub, increasing, decreasing, constant, and increasing–decreasing–increasing. In addition, a quantile regression model based on the proposed distribution is developed. This regression model provides flexible modeling of dependent variables defined on a bounded interval and extends data analysis by incorporating covariate effects. Finally, the efficiency of the proposed new distribution, compared to existing bounded distributions, is demonstrated through real data applications from the fields of political science, physics, medicine, and educational sciences. The findings indicate that the developed model provides a robust and innovative alternative to the existing bounded distributions and regression models in the literature.

The rest of the paper is structured as follows: In Sect 2, the generalized Marshall-Olkin Kumaraswamy (GMOKW) distribution is introduced, and its distributional properties, including moments, Lorenz, and Bonferroni curves, are investigated. Sect 3 provides an overview of nine different estimation methods. In Sect 4, a Monte Carlo simulation study is conducted to evaluate the performance of the estimators. In Sect 5, the new regression model is defined, and the unknown regression parameters are estimated via maximum likelihood methodology. In Sect 6, real data applications are presented to compare the proposed distribution and regression model with existing models by using selection criteria. In the final section, the study is concluded and key findings are summarized.

2 The generalized Marshall-Olkin Kumaraswamy distribution

In this section, the new bounded distribution is introduced. First, the generalized Marshall-Olkin family introduced by [27] is described. The cumulative distribution function (cdf) and the probability density function (pdf) of the generalized Marshall-Olkin distribution family are given, respectively, as

(1)

and

(2)

where the two parameters of the family are shape parameters and the two functions on the right-hand side denote the pdf and cdf of the baseline distribution, respectively. Assume that the random variable Y follows the KW distribution proposed by [10], with cdf and pdf given by

(3)

and

(4)

where both parameters are shape parameters and 0 < y < 1. By substituting the KW cdf and pdf in Eqs (3) and (4) into Eqs (1) and (2), we obtain the GMOKW distribution. The cdf and pdf of the new distribution can be written as follows:

(5)

and

(6)

where all parameters are positive and 0 < y < 1. In Fig 1, the pdf plots for different combinations of parameter values of the GMOKW are presented. Some shapes show a unimodal structure, initially increasing and then decreasing, while others demonstrate a monotonically decreasing or J-shaped increasing structure. Thus, the pdf plots show that the GMOKW distribution takes flexible shapes for different combinations of parameters.
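As a rough illustration of how a baseline cdf and pdf are substituted into a Marshall-Olkin-type family, the following Python sketch builds a GMOKW-type cdf and pdf from the KW baseline. Eqs (1) and (2) are not reproduced in this text, so the sketch assumes one form of the generalized Marshall-Olkin family found in the literature, with survival function [αS(y) / (1 − (1 − α)S(y))]^θ, where S is the baseline survival function. The function names and the parameter names alpha, theta, a, and b are hypothetical and need not match the notation of the paper.

```python
def kw_cdf(y, a, b):
    """Kumaraswamy cdf on (0, 1): 1 - (1 - y**a)**b."""
    return 1.0 - (1.0 - y ** a) ** b


def kw_pdf(y, a, b):
    """Kumaraswamy pdf: a*b*y**(a-1)*(1 - y**a)**(b-1)."""
    return a * b * y ** (a - 1.0) * (1.0 - y ** a) ** (b - 1.0)


def gmo_kw_cdf(y, alpha, theta, a, b):
    """Assumed generalized Marshall-Olkin cdf with a KW baseline."""
    s = 1.0 - kw_cdf(y, a, b)                      # baseline survival S(y)
    ratio = alpha * s / (1.0 - (1.0 - alpha) * s)  # Marshall-Olkin tilt
    return 1.0 - ratio ** theta


def gmo_kw_pdf(y, alpha, theta, a, b):
    """Derivative of gmo_kw_cdf with respect to y (same assumed form)."""
    s = 1.0 - kw_cdf(y, a, b)
    denom = 1.0 - (1.0 - alpha) * s
    return (theta * alpha ** theta * kw_pdf(y, a, b)
            * s ** (theta - 1.0) / denom ** (theta + 1.0))
```

With alpha = 1 and theta = 1 the assumed form reduces to the KW baseline, which gives a quick sanity check of the substitution.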

The hazard rate function (hrf) of the new distribution is h(y) = f(y) / [1 − F(y)], where f(y) and F(y) are the pdf and cdf given in Eqs (6) and (5), respectively.

Fig 2 provides hrf plots for different combinations of parameters. Since the GMOKW distribution has hrfs with the shape of an upside-down bathtub, bathtub-shaped, increasing, decreasing, constant, and increasing-decreasing-increasing as depicted in Fig 2 for different parameter values, it can be regarded as a flexible distribution for modeling.
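The hrf is the ratio of the pdf to the survival function, so it can be evaluated on a grid for any unit distribution with a known pdf and cdf. The sketch below does this for the KW baseline only, as a stand-in, because the GMOKW pdf and cdf of Eqs (5) and (6) are not reproduced in this text; the function names are hypothetical.

```python
def kw_cdf(y, a, b):
    """Kumaraswamy cdf on (0, 1)."""
    return 1.0 - (1.0 - y ** a) ** b


def kw_pdf(y, a, b):
    """Kumaraswamy pdf on (0, 1)."""
    return a * b * y ** (a - 1.0) * (1.0 - y ** a) ** (b - 1.0)


def kw_hazard(y, a, b):
    """Hazard rate h(y) = f(y) / (1 - F(y)) for the KW stand-in."""
    return kw_pdf(y, a, b) / (1.0 - kw_cdf(y, a, b))


# Evaluate the hrf on an interior grid; a = b = 1 is the uniform case,
# whose hazard 1 / (1 - y) is increasing.
grid = [i / 10 for i in range(1, 10)]
uniform_hrf = [kw_hazard(y, 1.0, 1.0) for y in grid]
```

Replacing kw_pdf and kw_cdf with implementations of Eqs (6) and (5) would give the corresponding GMOKW curve.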

2.1 Series expansion and analytical representation for GMOKW distribution

For any y and α satisfying , the pdf of the GMOKW can be expressed as following series expansion:

where , , and are pdf and cdf of KW distribution, as given in Eqs (3) and (4).

2.2 Moments

If the random variable Y has a GMOKW distribution, the rth moment is obtained by,

where Then rth moment is easily computed by

where Γ(·) denotes the gamma function. The skewness (S) and kurtosis (K) coefficients for the GMOKW distribution are obtained from the first four moments as follows:

where

Table 1 provides the variance, kurtosis, and skewness coefficients for various and γ values. According to Table 1, it is clearly seen that as the and γ values increase, the expected value also increases. For small parameter values, the distribution exhibits right skewness and leptokurtosis, whereas larger γ values result in a transition towards left skewness and platykurtosis. For larger values, the skewness and kurtosis of the distribution increases. These results demonstrate that the GMOKW distribution can model different data structures through its shape parameters. This indicates that the distribution has the capability to be applied to a wide range of real-world datasets. Additionally, the pdf plots in Fig 1 are consistent with these results, and the graphical evidence supports the ability of the distribution to model different types of datasets.
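The step from the first four raw moments to the variance, S, and K can be sketched as follows. The GMOKW moment formula is not reproduced in this text, so the example uses the KW baseline, whose rth raw moment b·B(1 + r/a, b) is a standard result, purely as a stand-in. The sketch also assumes that K is the non-excess kurtosis (equal to 3 for a normal distribution); the function names are hypothetical.

```python
import math


def kw_raw_moment(r, a, b):
    """E[Y**r] for KW(a, b): b * B(1 + r/a, b), written with gamma functions."""
    return b * math.gamma(1.0 + r / a) * math.gamma(b) / math.gamma(1.0 + r / a + b)


def shape_summary(m1, m2, m3, m4):
    """Mean, variance, skewness S, and kurtosis K from the first four raw moments."""
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3                       # third central moment
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4  # fourth central moment
    return {"mean": m1, "variance": var, "S": mu3 / var ** 1.5, "K": mu4 / var ** 2}


moments = [kw_raw_moment(r, 2.0, 3.0) for r in (1, 2, 3, 4)]
summary = shape_summary(*moments)
```

For the uniform case (raw moments 1/2, 1/3, 1/4, 1/5) the same function returns variance 1/12, S = 0, and K = 1.8, which is a convenient check.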

Table 1. Expected value, variance, S, and K for different parameter values.

https://doi.org/10.1371/journal.pone.0329568.t001

The rth incomplete moment of the GMOKW distribution is given by

where B_x(a, b) = ∫_0^x t^(a−1) (1 − t)^(b−1) dt is the incomplete beta function. Detailed information about the incomplete beta function can be found in [28].

2.3 Lorenz and Bonferroni curve

The Lorenz and Bonferroni curves are used to analyze data in economics, medical sciences, business, and reliability. Accordingly, the Lorenz and Bonferroni curves for the GMOKW distribution are given, respectively, by

and

where Q(p) denotes the quantile function given in Eq (7) and μ is the first moment.
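The two curves can be approximated numerically from a pdf and a quantile function. The sketch below assumes the usual definitions, L(p) = (1/μ) ∫ y f(y) dy over (0, Q(p)) and B(p) = L(p)/p, and uses the KW baseline as a stand-in because the GMOKW expressions are not reproduced in this text. The midpoint rule and the step count are illustrative choices, and the function names are hypothetical.

```python
def kw_pdf(y, a, b):
    """Kumaraswamy pdf on (0, 1)."""
    return a * b * y ** (a - 1.0) * (1.0 - y ** a) ** (b - 1.0)


def kw_quantile(p, a, b):
    """Kumaraswamy quantile function, the inverse of 1 - (1 - y**a)**b."""
    return (1.0 - (1.0 - p) ** (1.0 / b)) ** (1.0 / a)


def partial_mean(upper, a, b, steps=20000):
    """Midpoint-rule approximation of the integral of y*f(y) over (0, upper)."""
    h = upper / steps
    return h * sum((k + 0.5) * h * kw_pdf((k + 0.5) * h, a, b) for k in range(steps))


def lorenz(p, a, b):
    """Lorenz curve L(p): share of the mean carried by values below Q(p)."""
    mu = partial_mean(1.0, a, b)
    return partial_mean(kw_quantile(p, a, b), a, b) / mu


def bonferroni(p, a, b):
    """Bonferroni curve B(p) = L(p) / p."""
    return lorenz(p, a, b) / p
```

For the uniform case (a = b = 1) this gives L(p) = p² and B(p) = p, so lorenz(0.5, 1.0, 1.0) should be close to 0.25.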

2.4 Quantile function

The quantile function for the GMOKW distribution can be obtained by

(7)

where . This function is also used in simulation for generating data from the GMOKW distribution.
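Generating data from a quantile function is the inverse-transform method: draw u from the uniform distribution on (0, 1) and return Q(u). The sketch below shows the idea with the KW quantile function as a stand-in, because Eq (7) is not reproduced in this text; the function names and the seed argument are hypothetical.

```python
import random


def kw_cdf(y, a, b):
    """Kumaraswamy cdf on (0, 1)."""
    return 1.0 - (1.0 - y ** a) ** b


def kw_quantile(u, a, b):
    """Kumaraswamy quantile function, the inverse of kw_cdf."""
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def sample_kw(n, a, b, seed=None):
    """Inverse-transform sampling: apply the quantile function to uniform draws."""
    rng = random.Random(seed)
    return [kw_quantile(rng.random(), a, b) for _ in range(n)]


sample = sample_kw(1000, 2.0, 3.0, seed=42)
```

The same pattern applies to any distribution with a closed-form quantile function, which is why a closed form such as Eq (7) is convenient for simulation.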

3 Parameter estimation

In this section, nine estimators are used to estimate the parameters of the GMOKW distribution, including maximum likelihood, least squares, weighted least squares, Anderson–Darling, Cramér–von Mises, maximum product spacing, right-tail Anderson–Darling, minimum spacing absolute distance, and minimum spacing absolute-log distance. Let Y1, Y2, ..., Yn denote a sample of size n taken from the GMOKW distribution and Y(1) ≤ Y(2) ≤ ... ≤ Y(n) represent the order statistics corresponding to this sample. Moreover, let y(i) denote the observed value of the order statistic Y(i) for i = 1, 2, ..., n. Based on this information, the estimators for the GMOKW distribution are derived as follows.

3.1 Method of maximum likelihood

The likelihood function for the GMOKW distribution is given by

(8)

By taking the logarithm of Eq (8), the log-likelihood function is obtained as follows.

(9)

The maximum likelihood estimators (MLEs) of the parameters are obtained by solving the following system of equations.
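To make the maximization step concrete, the sketch below evaluates a log-likelihood and maximizes it over a coarse grid. It uses the two-parameter KW baseline as a stand-in, since Eqs (8) and (9) are not reproduced in this text, and a grid search is used only for transparency; it is not the optimizer used in the paper. All names are hypothetical.

```python
import math
import random


def kw_logpdf(y, a, b):
    """Log of the KW pdf a*b*y**(a-1)*(1 - y**a)**(b-1) for 0 < y < 1."""
    return (math.log(a) + math.log(b)
            + (a - 1.0) * math.log(y)
            + (b - 1.0) * math.log1p(-(y ** a)))


def log_likelihood(sample, a, b):
    """Sum of log-densities over the sample."""
    return sum(kw_logpdf(y, a, b) for y in sample)


def grid_mle(sample, grid):
    """Return the (a, b) pair on the grid with the largest log-likelihood."""
    candidates = [(a, b) for a in grid for b in grid]
    return max(candidates, key=lambda ab: log_likelihood(sample, ab[0], ab[1]))


# Simulated KW(a=2, b=3) data via the quantile function (illustrative only).
rng = random.Random(1)
sample = [(1.0 - (1.0 - rng.random()) ** (1.0 / 3.0)) ** 0.5 for _ in range(500)]
grid = [0.5 + 0.25 * k for k in range(23)]  # 0.5, 0.75, ..., 6.0
a_hat, b_hat = grid_mle(sample, grid)
```

In practice a gradient-based or simplex optimizer replaces the grid, but the objective being maximized is the same.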

3.2 Method of least squares

This method estimates the parameters of a mathematical model by minimizing the sum of squared differences between the observed data and the predicted outcomes. The least squares estimators (LSEs) of the parameters are obtained by minimizing the following function:

(10)

The function in Eq (10) is minimized by taking its partial derivatives with respect to each parameter, setting them equal to zero, and solving the resulting system of equations:

where .

3.3 Method of weighted least squares

The weighted least squares estimator (WLSE) is a minimization-based estimator introduced as an alternative to the LSE. The fundamental purpose of the WLSE is to assign a different weight to each observation, rather than giving equal weights to all observations as in the LSE, in order to estimate the parameters of a probability model. The WLSEs of the parameters of the GMOKW distribution are obtained by minimizing the following function.

(11)

where . The function given in Eq (11) is minimized by solving the system of equations obtained by taking the partial derivatives with respect to the parameters and equating them to zero. The partial derivatives of the function with respect to the parameters are given by:

where is defined.
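The LSE and WLSE objectives can be sketched for a generic cdf as follows. Eqs (10) and (11) are not reproduced in this text, so the sketch assumes the forms commonly used in the literature: squared distances between F(y(i)) and i/(n + 1), with WLSE weights (n + 1)²(n + 2) / [i(n − i + 1)]. The KW cdf is only a stand-in for Eq (5), and the function names are hypothetical.

```python
def kw_cdf(y, a, b):
    """Kumaraswamy cdf, used here as a stand-in model cdf."""
    return 1.0 - (1.0 - y ** a) ** b


def lse_objective(sample, cdf):
    """Sum of squared gaps between F(y_(i)) and the plotting position i/(n+1)."""
    ys = sorted(sample)
    n = len(ys)
    return sum((cdf(y) - i / (n + 1)) ** 2 for i, y in enumerate(ys, start=1))


def wlse_objective(sample, cdf):
    """Same gaps, each weighted by the inverse variance of F(Y_(i))."""
    ys = sorted(sample)
    n = len(ys)
    return sum(
        (n + 1) ** 2 * (n + 2) / (i * (n - i + 1)) * (cdf(y) - i / (n + 1)) ** 2
        for i, y in enumerate(ys, start=1)
    )
```

Both functions take the model cdf as an argument, so an estimator is obtained by minimizing them over the parameters of that cdf with any numerical optimizer.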

3.4 Method of Anderson-Darling

The Anderson-Darling estimator (ADE) is a method introduced to provide more accurate estimators in the presence of deviations from the assumed model. Accordingly, the ADE of the parameter of the GMOKW distribution is obtained by minimizing the following function.

(12)

where is defined. Therefore, the first derivatives of the function presented in Eq (12) are obtained with respect to the unknown parameters and equated to zero, resulting in the following system of nonlinear equations.

3.5 Method of Cramer-von Mises

The Cramér–von Mises estimator (CvME) is a minimization-based estimator designed to determine the parameter values that give the best fit between the observed data and the theoretical distribution. Accordingly, the CvME for the GMOKW distribution is obtained by minimizing the following function:

(13)

The function in Eq (13) is minimized by taking its partial derivatives with respect to the parameters, setting them equal to zero, and solving the resulting system of equations. The partial derivatives of the function with respect to the parameters are given by
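The Anderson-Darling and Cramér–von Mises objectives can be sketched in the same generic way. Eqs (12) and (13) are not reproduced in this text, so the code assumes the standard forms of the two statistics; the KW cdf is a stand-in for Eq (5) and the function names are hypothetical.

```python
import math


def kw_cdf(y, a, b):
    """Kumaraswamy cdf, used here as a stand-in model cdf."""
    return 1.0 - (1.0 - y ** a) ** b


def ad_objective(sample, cdf):
    """Anderson-Darling statistic: -n - (1/n) * sum of (2i-1)-weighted log terms."""
    ys = sorted(sample)
    n = len(ys)
    total = sum(
        (2 * i - 1) * (math.log(cdf(ys[i - 1])) + math.log(1.0 - cdf(ys[n - i])))
        for i in range(1, n + 1)
    )
    return -n - total / n


def cvm_objective(sample, cdf):
    """Cramer-von Mises statistic: 1/(12n) + sum (F(y_(i)) - (2i-1)/(2n))**2."""
    ys = sorted(sample)
    n = len(ys)
    return 1.0 / (12 * n) + sum(
        (cdf(y) - (2 * i - 1) / (2 * n)) ** 2 for i, y in enumerate(ys, start=1)
    )
```

The CvM objective reaches its lower bound 1/(12n) when the fitted cdf maps the ordered data exactly onto (2i − 1)/(2n), which is a useful check on an implementation.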

3.6 Method of maximum product spacing

The maximum product spacing estimator (MPSE) was proposed by [29] as an alternative to the MLE. It is based on maximizing the product of the differences between the cdf values at consecutive ordered observations. Accordingly, the MPSEs of the parameters of the GMOKW distribution are obtained by maximizing the following function.

where, by convention, the cdf is taken to be 0 below the smallest observation and 1 above the largest, and F(.) denotes the cdf given in Eq (5). Accordingly, the MPSEs of the parameters are obtained by solving the following equations.

The terms , , , and are respectively defined as follows.

(14)(15)(16)(17)
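The maximum product spacing objective can be sketched as the average log-spacing of the fitted cdf over the ordered sample. The paper's expression is not reproduced in this text, so the code assumes the usual form, with spacings D_i = F(y(i)) − F(y(i−1)) and the boundary conventions F = 0 and F = 1 at the two ends. The KW cdf is a stand-in for Eq (5) and the names are hypothetical.

```python
import math


def kw_cdf(y, a, b):
    """Kumaraswamy cdf, used here as a stand-in model cdf."""
    return 1.0 - (1.0 - y ** a) ** b


def spacings(sample, cdf):
    """The n+1 spacings of the fitted cdf, padded with 0 and 1 at the ends."""
    values = [0.0] + [cdf(y) for y in sorted(sample)] + [1.0]
    return [hi - lo for lo, hi in zip(values, values[1:])]


def mps_objective(sample, cdf):
    """Mean log-spacing; the MPSE maximizes this over the model parameters."""
    d = spacings(sample, cdf)
    return sum(math.log(x) for x in d) / len(d)
```

The objective is bounded above by log(1/(n + 1)), attained when all spacings are equal, so values close to that bound indicate a good fit.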

3.7 Method of right tail Anderson-Darling

This estimation method aims to minimize the Anderson-Darling statistic in the right tail to estimate the parameters of a specific distribution. Accordingly, the right tail Anderson-Darling estimator (RTADE) for the GMOKW distribution is obtained by minimizing the following objective function.

(18)

where F(.) represents the cdf defined in Eq (5). The RTADEs of the four model parameters are obtained by solving the following system of equations.

where and are given in Eqs (14)–(17).

3.8 Method of minimum spacing absolute distance

The minimum spacing absolute distance estimator (MSADE) is obtained by minimizing the absolute difference between the ordered observations. Thus, the MSADE for the GMOKW distribution is obtained by minimizing the following function.

(19)

where F(.) denotes the cdf given in Eq (5). The MSADEs of the four model parameters are obtained by minimizing the function in Eq (19) with respect to these parameters. This requires solving the following system of equations:

where and demonstrate the sign function defined as

Moreover the functions and are given in Eqs (14)–(17).

3.9 Method of minimum spacing absolute-log distance

The minimum spacing absolute-log distance estimator (MSALDE) is obtained by minimizing the absolute logarithm difference between the ordered observations. Thus, the MSALDE for the GMOKW distribution is obtained by minimizing the following function:

(20)

where F(.) denotes the cdf given in Eq (5). The MSALDEs of the four model parameters are obtained by minimizing the function in Eq (20) with respect to these parameters.

where and demonstrate the sign function defined as

Moreover, the functions and are given in Eqs (14)–(17).
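The last three objectives can be sketched together. Eqs (18) to (20) are not reproduced in this text, so the code assumes the forms usually given in the literature: the right-tail Anderson-Darling statistic n/2 − 2ΣF(y(i)) − (1/n)Σ(2i − 1) log(1 − F(y(n+1−i))), and spacing distances Σ|D_i − 1/(n + 1)| and Σ|log D_i − log(1/(n + 1))|. The KW cdf is a stand-in for Eq (5) and the names are hypothetical.

```python
import math


def kw_cdf(y, a, b):
    """Kumaraswamy cdf, used here as a stand-in model cdf."""
    return 1.0 - (1.0 - y ** a) ** b


def spacings(sample, cdf):
    """The n+1 spacings of the fitted cdf, padded with 0 and 1 at the ends."""
    values = [0.0] + [cdf(y) for y in sorted(sample)] + [1.0]
    return [hi - lo for lo, hi in zip(values, values[1:])]


def rtad_objective(sample, cdf):
    """Right-tail Anderson-Darling statistic (assumed standard form)."""
    ys = sorted(sample)
    n = len(ys)
    tail = sum((2 * i - 1) * math.log(1.0 - cdf(ys[n - i])) for i in range(1, n + 1))
    return n / 2.0 - 2.0 * sum(cdf(y) for y in ys) - tail / n


def msad_objective(sample, cdf):
    """Sum of absolute deviations of the spacings from 1/(n+1)."""
    d = spacings(sample, cdf)
    return sum(abs(x - 1.0 / len(d)) for x in d)


def msald_objective(sample, cdf):
    """Sum of absolute deviations of the log-spacings from log(1/(n+1))."""
    d = spacings(sample, cdf)
    return sum(abs(math.log(x) - math.log(1.0 / len(d))) for x in d)
```

Because the two spacing distances involve absolute values, they are not differentiable everywhere, which is one reason derivative-free optimizers are often convenient for them.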

Since the systems of equations derived from all minimization and maximization processes involve nonlinear expressions, the parameter estimators cannot be obtained in closed form. Therefore, numerical optimization algorithms such as BFGS, Nelder–Mead, CG, and L-BFGS-B, as implemented in the R software, are used to obtain the parameter estimators.

4 Simulation study

In this section, a Monte Carlo simulation study is conducted to evaluate the performance of the parameter estimators for the GMOKW distribution based on the MLE, LSE, WLSE, ADE, CvME, MPSE, RTADE, MSADE, and MSALDE methods. The simulation study is carried out in the following steps:

  • The simulation study is conducted with N = 1000 replications, and three sets of parameter values (Cases 1–3) are considered.
  • Random samples are generated from the GMOKW distribution for the specified parameter values, with sample sizes including n = 50, 100, 250, and 1000.
  • Estimates of each parameter are obtained from each generated sample.
  • The performance of the estimators is evaluated in terms of bias and mean squared error (MSE), computed from the estimated values and the true parameter value. The bias is the average difference between the estimates and the true value over the N replications, and the MSE is the average squared difference.
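The steps above can be sketched as a small Monte Carlo loop. To keep the example self-contained it uses a deliberately simple stand-in: the KW distribution with a known first shape parameter a, for which the MLE of b has the closed form −n / Σ log(1 − y_i^a). The number of replications (300 rather than 1000), the sample sizes, and all names are illustrative assumptions, not the settings of the paper.

```python
import math
import random


def sample_kw(n, a, b, rng):
    """Inverse-transform sample of size n from KW(a, b)."""
    return [(1.0 - (1.0 - rng.random()) ** (1.0 / b)) ** (1.0 / a) for _ in range(n)]


def mle_b_given_a(sample, a):
    """Closed-form MLE of b when a is known: -n / sum(log(1 - y**a))."""
    return -len(sample) / sum(math.log1p(-(y ** a)) for y in sample)


def bias_and_mse(estimates, true_value):
    """Average error and average squared error over the replications."""
    n_rep = len(estimates)
    bias = sum(e - true_value for e in estimates) / n_rep
    mse = sum((e - true_value) ** 2 for e in estimates) / n_rep
    return bias, mse


def simulate(n, a=2.0, b=3.0, n_rep=300, seed=0):
    """Repeat: generate a sample, estimate b, then summarize bias and MSE."""
    rng = random.Random(seed)
    estimates = [mle_b_given_a(sample_kw(n, a, b, rng), a) for _ in range(n_rep)]
    return bias_and_mse(estimates, b)


results = {n: simulate(n) for n in (20, 200)}
```

In this toy setting the MSE at n = 200 is far smaller than at n = 20, the same qualitative pattern that a simulation study of this kind looks for as the sample size grows.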

The Monte Carlo simulation steps for parameter estimation are summarized in Table 2. Simulation results are reported in Tables 3–5, and their graphical representations are given in Figs 3–8. The simulation results presented in Tables 3–5 can be summarized as follows.

Fig 8. Graphical representation of MSE values in Table 4.

https://doi.org/10.1371/journal.pone.0329568.g008

Table 2. Monte Carlo simulation steps for parameter estimation.

https://doi.org/10.1371/journal.pone.0329568.t002

Table 3. The bias and MSEs of all estimators for unknown parameters in Case 1.

https://doi.org/10.1371/journal.pone.0329568.t003

Table 4. The bias and MSEs of all estimators for unknown parameters in Case 2.

https://doi.org/10.1371/journal.pone.0329568.t004

Table 5. The bias and MSEs of all estimators for unknown parameters in Case 3.

https://doi.org/10.1371/journal.pone.0329568.t005

  • Based on the results presented in Table 3 and the corresponding graphical representations, the bias and MSE values decrease for all parameter estimation methods as the sample size increases. For smaller sample sizes, the higher bias and MSE values obtained by the LSE and RTADE methods indicate that these methods are unreliable in small samples. On the other hand, the MPSE and MLE methods outperform the other estimation methods, with lower bias and MSE values for both small and large sample sizes. The WLSE, ADE, and CvME methods provide results comparable to these methods, whereas the LSE method shows higher bias and MSE values.
  • According to Table 4 and the corresponding graphical representations, both bias and MSE values decrease as the sample size increases for all estimators. For small sample sizes (n = 50, 100, 250), the MLE and MPSE estimators generally provide the best results with the lowest bias and MSE values, while the WLSE method provides competitive estimates. In addition, the MLE and MPSE clearly outperform the other estimators in terms of both bias and MSE as the sample size increases. On the other hand, the MSALDE and RTADE methods exhibit higher bias and MSE values than the others and therefore tend to underperform for most sample sizes.
  • When examining Table 5 and Figs 7 and 8, it is observed that the RTADE and MSALDE methods provide unstable results in terms of bias and MSE for small sample sizes. In contrast, the MPSE and MLE methods provide more reliable estimates with lower error values than the other methods. As the sample size increases, the differences among the methods decrease, and the MLE, MPSE, and WLSE methods in particular demonstrate more stable and efficient performance than the other estimation methods.

5 A novel regression analysis

The quantile regression model was proposed by [30]. This model provides an appropriate alternative to classical regression, as it does not require any assumptions about the error terms and can be applied when the assumptions of ordinary regression are violated. This method is used to model the conditional quantiles of the response variable as a function of explanatory variables, and is particularly preferred for obtaining robust estimates in datasets containing outliers or skewed distributions. When the response variable is defined on a bounded interval, regression models based on unit distributions can be used to model its conditional mean or quantiles; among these, the beta regression model is one of the most commonly used approaches in the literature. However, when the response variable exhibits a skewed distribution, the use of the beta regression model may not be appropriate. In this case, using the median instead of the mean provides a more suitable alternative. By reparameterizing unit distributions based on the quantile function, several new regression models have been developed to model the conditional quantiles of variables bounded within the (0,1) interval. In this context, the beta [23], KW [24], UBXII [12], unit log-log [13], and log-extended exponential geometric [25] quantile regression models have been proposed.

In this section, a novel regression analysis based on the GMOKW distribution is introduced as an alternative to the existing regression models. The quantile function presented in Eq (7) is utilized to derive this novel regression model. Re-parameterization of the pdf and cdf of the GMOKW can be accomplished by employing the quantile function. Let and then

(21)

is obtained and the in Eq (21) will be denoted as . The cdf and pdf of the re-parametrized distribution are derived, respectively, by

(22)

and

(23)

where in Eq (21), and denotes the quantile regression parameter. The p is chosen from and can take values of 0.25, 0.5, or 0.75. In the remainder of the manuscript, the random variable T will be denoted by . After defining the QGMOKW, the new regression model incorporating the pdf of the QGMOKW from Eq (23) can be introduced. Consider for and and are unknown parameters, and p is selected as 0.5. The proposed quantile regression model is formulated as follows:

where are the unknown regression parameter vectors and and known ith vector of the covariates and g is a link function. The following logit-link function is utilized since the QGMOKW is defined within the interval (0,1):

(24)

The is obtained as using Eq (24)

(25)
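The role of the logit link can be sketched as follows: the linear predictor is mapped into (0, 1) by the inverse logit, so the modeled conditional quantile always stays inside the support of the response. Eqs (24) and (25) are not reproduced in this text, so the code assumes the standard logit and its inverse; the covariate and coefficient values in the usage lines are made-up numbers for illustration only, and the names are hypothetical.

```python
import math


def logit(mu):
    """Logit link g(mu) = log(mu / (1 - mu)) for 0 < mu < 1."""
    return math.log(mu / (1.0 - mu))


def inverse_logit(eta):
    """Inverse link: maps any real linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def conditional_quantile(x, beta):
    """Modeled quantile for covariate vector x (first entry 1 for the intercept)."""
    eta = sum(b * v for b, v in zip(beta, x))
    return inverse_logit(eta)


# Hypothetical coefficients and covariates, for illustration only.
beta = [0.5, -1.0, 0.2]
x = [1.0, 0.3, 2.0]
mu = conditional_quantile(x, beta)
```

Here the linear predictor is 0.5 − 0.3 + 0.4 = 0.6, and the inverse logit turns it into a value strictly between 0 and 1.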

5.1 Estimation for regression model parameters

Let be a random sample from the distribution. The corresponding log-likelihood function is expressed as

(26)

where The ML of , say , is obtained by maximizing in Eq (26). Under certain regularity conditions, the asymptotic distribution of follows a multivariate normal distribution, where J denotes the expected information matrix. In practice, the observed information matrix is often used instead of J.

The log-likelihood function in Eq (26) is maximized numerically to obtain the ML estimates of Φ. We employed a numerical optimization algorithm, available in the R optim function, for this purpose. The standard errors (SEs) of the parameters, reported in Table 13, are obtained from the observed information matrix. This matrix (J) is numerically approximated by the negative of the Hessian matrix of the log-likelihood evaluated at the ML estimates. The variance-covariance matrix is then computed as the inverse of J, and the SEs are the square roots of the diagonal elements of this matrix.
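The computation of SEs from the observed information matrix can be sketched without any external library. The code below approximates the Hessian of a negative log-likelihood by central finite differences, inverts it by Gauss-Jordan elimination, and takes square roots of the diagonal. It is a generic sketch, not the R routine used in the paper; the step size h and all names are illustrative assumptions.

```python
import math


def numerical_hessian(f, theta, h=1e-4):
    """Central finite-difference Hessian of f at theta."""
    k = len(theta)
    hess = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            def shifted(di, dj):
                t = list(theta)
                t[i] += di * h
                t[j] += dj * h
                return f(t)
            hess[i][j] = (shifted(1, 1) - shifted(1, -1)
                          - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * h * h)
    return hess


def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def standard_errors(neg_loglik, theta_hat):
    """SEs as square roots of the diagonal of the inverse observed information."""
    info = numerical_hessian(neg_loglik, theta_hat)  # Hessian of -loglik
    cov = invert(info)
    return [math.sqrt(cov[i][i]) for i in range(len(theta_hat))]
```

For a quadratic negative log-likelihood such as 0.5·[(t0 − 1)²/0.04 + (t1 + 2)²/0.25], the function returns SEs close to 0.2 and 0.5, matching the known curvature.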

6 Real-life data analyses

In this section, four real-life data analyses are carried out to demonstrate the applicability of the proposed distribution and regression model.

6.1 Application of new distribution without covariates

In this subsection, three real datasets are analyzed to demonstrate the applicability of the GMOKW distribution in real-world scenarios. The GMOKW is compared with several competing distributions such as Beta, KW [10], UW [11], UL [18], UBXII [12], and UR [31].

The pdfs for these competing distributions are provided in Table 6. The MLE method is used to estimate all the distribution parameters for the real datasets. The fit of the distributions is evaluated with the Kolmogorov-Smirnov (KS), Anderson-Darling (AD), and Cramér–von Mises (CvM) statistics and their corresponding p-values, as well as the Akaike Information Criterion (AIC), Bayesian Information Criterion (BIC), consistent Akaike Information Criterion (CAIC), Hannan-Quinn Information Criterion (HQIC), and the negative log-likelihood value. In terms of these selection criteria, the model with the lowest KS, AD, and CvM test statistics and the lowest negative log-likelihood, AIC, BIC, CAIC, and HQIC values is regarded as the best-fitting model for a dataset. Conversely, for the p-values of the KS, AD, and CvM tests, the model with the highest value provides the best fit to the data.
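The selection criteria can be computed from the maximized log-likelihood, the number of parameters k, and the sample size n. The sketch below assumes the usual definitions of AIC, BIC, and HQIC, and assumes that CAIC denotes the consistent AIC, k(ln n + 1) − 2ℓ; the paper's exact definitions are not reproduced in this text. A KS statistic for a fitted cdf is included for completeness. All names are hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, BIC, CAIC (assumed consistent-AIC form), and HQIC."""
    return {
        "AIC": 2 * k - 2 * loglik,
        "BIC": k * math.log(n) - 2 * loglik,
        "CAIC": k * (math.log(n) + 1) - 2 * loglik,
        "HQIC": 2 * k * math.log(math.log(n)) - 2 * loglik,
    }


def ks_statistic(sample, cdf):
    """Largest gap between the empirical cdf and the fitted cdf."""
    ys = sorted(sample)
    n = len(ys)
    upper = max(i / n - cdf(y) for i, y in enumerate(ys, start=1))
    lower = max(cdf(y) - (i - 1) / n for i, y in enumerate(ys, start=1))
    return max(upper, lower)


# Hypothetical fit: log-likelihood -100 with 4 parameters and 50 observations.
ic = information_criteria(-100.0, 4, 50)
```

All four criteria share the term −2ℓ and differ only in the penalty on k, which is why they can rank the same set of models differently.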

The first dataset comprises the share of Electoral College votes secured by the winning candidate in U.S. presidential elections from 1824 to 2016 [32]. In Table 7, the MLEs of the fitted distributions and their corresponding SEs are reported for dataset I, while Table 8 presents the model selection criteria and relevant statistics. Moreover, the cdf plots for the fitted distributions are shown in Fig 9. As seen from Table 8, the GMOKW distribution provides a better fit to the data than the other distributions in terms of several criteria. This result is also supported by the comparison of the empirical and fitted cdf plots shown in Fig 9. Therefore, the GMOKW distribution can be considered the model that best represents the empirical distribution function.

Table 8. The modeling results for all distributions on dataset I.

https://doi.org/10.1371/journal.pone.0329568.t008

The second dataset comprises the mortality rates in the United Kingdom over a 60-day period [33]. In Table 9, the MLEs of the fitted distributions and their corresponding SEs are reported for dataset II, while Table 10 presents the model selection criteria and relevant statistics. Moreover, the cdf plots for the fitted distributions are demonstrated in Fig 10. According to Table 10, the GMOKW model provides the best fit to the data for all selection criteria except the BIC when compared with the competing models. This finding, supported by the empirical and fitted CDF plots presented in Fig 10, indicates that the GMOKW distribution best represents the empirical distribution function.

Table 10. The modeling results for all distributions on dataset II.

https://doi.org/10.1371/journal.pone.0329568.t010

The third dataset comes from the Foulum region of Denmark and was acquired through radar imagery collected by the EMISAR sensor. This dataset contains 101 observations; each data point corresponds to the squared modulus of complex-valued radar returns, which represent backscatter intensity under horizontal polarization. The dataset was previously studied in [34] and [35]. In Table 11, the MLEs of the fitted distributions and their corresponding SEs are reported for dataset III, while Table 12 presents the model selection criteria and relevant statistics. Moreover, the cdf plots for the fitted distributions are shown in Fig 11. As shown in Table 12, the GMOKW model yields the lowest information criterion values and the lowest test statistics compared with the competing models.

Table 11. The MLEs and corresponding SEs for dataset III.

https://doi.org/10.1371/journal.pone.0329568.t011

Table 12. The modeling results for all distributions on dataset III.

https://doi.org/10.1371/journal.pone.0329568.t012

6.2 Application of new regression model with covariates

In this subsection, a real data analysis is performed to evaluate the usability and superiority of the new regression model. The dataset consists of the percentage of educational attainment in OECD countries, along with components such as voter turnout percentage, homicide rate, and life satisfaction. The data is accessible at https://stats.oecd.org/, and more detailed information can be found in [22].

For comparison purposes, the beta regression (BR), Kumaraswamy regression (KR), and log-extended exponential geometric regression (LEEG) models [25] are considered. This application aims to examine the relationship between the percentage of education level values of OECD countries (variable y) and voter participation percentage (variable x1), homicide rate (variable x2), and life satisfaction (variable x3).

For all models, the ML estimates, their SEs, and model selection criteria including the AIC are calculated. The results are presented in Table 13. Table 13 shows that the QGMOKW model has the best modeling capacity and can be considered an alternative to the beta and Kumaraswamy regression models in the literature.

Table 13. Data analysis results for regression real data.

https://doi.org/10.1371/journal.pone.0329568.t013

For the QGMOKW model, the intercept is estimated at 0.9455 with a p-value of 0.6761, which is not statistically significant at the 0.05 level. Under the logit link in Eq (24) with p = 0.5, an intercept of 0.9455 corresponds to a fitted conditional median of exp(0.9455)/(1 + exp(0.9455)) ≈ 0.72 when all predictors are zero, but this estimate is not reliable due to its insignificance. The first slope coefficient is estimated at -2.5397 with a p-value of 0.0516, which is very close to the 0.05 significance level but does not meet the threshold. This suggests a possible negative relationship between the corresponding predictor and the response variable; however, the statistical evidence remains weak. In contrast, the second slope coefficient is estimated at -0.0609 with a p-value of 0.0147, indicating statistical significance at the 0.05 level. This suggests that a one-unit increase in the corresponding predictor is associated with a decrease of 0.0609 in the logit of the conditional median of the response variable. The third slope coefficient is estimated at 0.3695 with a p-value of 0.9664, making it highly insignificant. This implies that there is no evidence that the corresponding predictor has a meaningful effect on the response variable.

Furthermore, the parameter α is estimated at 0.9853 with a p-value of 0.5898, indicating that it is not statistically significant. Similarly, λ is estimated at 0.5787 with a p-value of 0.6604, showing no significant effect. The parameter γ is estimated at 3.3220 with a p-value of 0.9891, making it highly insignificant. Thus, it does not contribute meaningfully to the model.

Table 14. The first dataset: Electoral College vote shares in U.S. presidential elections (1824-2016).

https://doi.org/10.1371/journal.pone.0329568.t014

Overall, the only statistically significant parameter at the 0.05 level is β2, which suggests that its corresponding predictor has a meaningful negative effect on the response variable. The other parameters, namely β0, β1, β3, α, λ, and γ, do not show strong evidence of influence in the model.
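The significance screening described above can be reproduced with a short script. The sketch below uses only the estimates and p-values quoted in the text for the QGMOKW model. The labels `beta0` to `beta3` are assumed names for the intercept and the three regression coefficients, taken in the order in which they are discussed, and `alpha`, `lambda`, and `gamma` stand for α, λ, and γ.

```python
ALPHA_LEVEL = 0.05  # significance level used in the text

# (estimate, p-value) pairs as quoted in the text for the QGMOKW model.
# beta0..beta3 are assumed labels for the intercept and the three slopes.
reported = {
    "beta0": (0.9455, 0.6761),
    "beta1": (-2.5397, 0.0516),
    "beta2": (-0.0609, 0.0147),
    "beta3": (0.3695, 0.9664),
    "alpha": (0.9853, 0.5898),
    "lambda": (0.5787, 0.6604),
    "gamma": (3.3220, 0.9891),
}


def is_significant(p_value: float, level: float = ALPHA_LEVEL) -> bool:
    """Return True when the p-value falls strictly below the chosen level."""
    return p_value < level


flags = {name: is_significant(p) for name, (_, p) in reported.items()}
significant_names = [name for name, flag in flags.items() if flag]

for name, (estimate, p_value) in reported.items():
    print(f"{name:>6}  estimate={estimate:8.4f}  p={p_value:.4f}  "
          f"significant={flags[name]}")
print("Significant at the 0.05 level:", significant_names)
```

Note that the coefficient with p-value 0.0516 is flagged as not significant because the comparison is strict; it would be retained under a 0.10 level.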

7 Conclusion

In this study, a new unit distribution referred to as the GMOKW distribution is introduced. Some mathematical properties of the proposed distribution, such as moments, the quantile function, and the Lorenz and Bonferroni curves, are investigated. Nine different estimators are considered for the unknown parameters of the model, and the performance of these estimators is evaluated through a Monte Carlo simulation study. According to the Monte Carlo simulation results, the MLE and MPSE methods outperform the other estimation methods for both small and large sample sizes. Moreover, a GMOKW-based quantile regression model is proposed to model the conditional quantiles of the response variable as a function of explanatory variables. To evaluate the applicability of the proposed distribution and regression model, four real data analyses are conducted. Three datasets are used to evaluate the performance of the proposed distribution, whereas one dataset is used to implement the GMOKW-based quantile regression model. The results obtained from the real data analyses indicate that the GMOKW distribution can be used as an alternative to other bounded distributions in modeling datasets. In addition, the GMOKW-based quantile regression model exhibits high flexibility and efficiency in modeling the association between bounded response variables and covariates.
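The logic of such a Monte Carlo evaluation can be illustrated with a simplified stand-in. The sketch below does not implement the GMOKW distribution or any of the nine estimators studied here. Instead, it assumes the two-parameter Kumaraswamy distribution with CDF F(x) = 1 - (1 - x^a)^b, treats the shape a as known, and uses the closed-form MLE of b, namely -n / Σ log(1 - x_i^a). The function names, parameter values, sample sizes, and number of replications are all illustrative assumptions.

```python
import numpy as np


def sample_kumaraswamy(a, b, size, rng):
    """Inverse-transform sampling from F(x) = 1 - (1 - x**a)**b on (0, 1)."""
    u = rng.uniform(size=size)
    return (1.0 - (1.0 - u) ** (1.0 / b)) ** (1.0 / a)


def mle_b_given_a(x, a):
    """Closed-form MLE of b when a is known; one estimate per row of x."""
    n = x.shape[-1]
    return -n / np.log1p(-(x ** a)).sum(axis=-1)


def monte_carlo(a, b, n, reps, seed=0):
    """Return (bias, MSE) of the MLE of b over `reps` samples of size n."""
    rng = np.random.default_rng(seed)
    x = sample_kumaraswamy(a, b, size=(reps, n), rng=rng)
    estimates = mle_b_given_a(x, a)
    bias = estimates.mean() - b
    mse = np.mean((estimates - b) ** 2)
    return bias, mse


# Illustrative scenario: a = 2, b = 2, three sample sizes, 2000 replications.
results = {n: monte_carlo(a=2.0, b=2.0, n=n, reps=2000) for n in (20, 50, 200)}
for n, (bias, mse) in results.items():
    print(f"n={n:4d}  bias={bias:+.4f}  MSE={mse:.4f}")
```

In this simplified setting the bias and MSE shrink as n grows, which is the kind of behavior a simulation study uses to compare competing estimators under different parameter scenarios.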

In light of these results, the proposed distribution appears applicable to modeling bounded datasets from various fields such as medicine, education, engineering, physics, economics, and actuarial science. The GMOKW model provides a more accurate and flexible framework for representing bounded data such as proportions, success probabilities, and risk levels. In terms of fitting performance and interpretability, the new distribution and its corresponding regression model can be used as a powerful and flexible method for analyzing real-world bounded datasets across various disciplines, providing a strong alternative to well-known unit distributions such as the KW, Beta, UW, and UBXII distributions. Future studies may focus on extending the proposed model to different censoring schemes and developing more efficient estimation techniques.

Appendix

The datasets used for the real-life data analysis in Sect 6 are presented below.

Table 15. The second dataset: Mortality rates in the United Kingdom over a 60-day period.

https://doi.org/10.1371/journal.pone.0329568.t015

Table 16. The third dataset: Radar imagery collected by the EMISAR sensor in the Foulum region.

https://doi.org/10.1371/journal.pone.0329568.t016

References

  1. Gupta RC, Gupta PL, Gupta RD. Modeling failure time data by Lehman alternatives. Communications in Statistics - Theory and Methods. 1998;27(4):887–904.
  2. Taniş C, Saraçoğlu B. Cubic rank transmuted generalized Gompertz distribution: properties and applications. J Appl Stat. 2022;50(1):195–213. pmid:36530779
  3. Taniş C. Transmuted lower record type power function distribution. J Sci Arts. 2021;21(4):951–60.
  4. Nwezza EE, Ugwuowo FI. The Marshall-Olkin Gumbel-Lomax distribution: properties and applications. Heliyon. 2020;6(3):e03569. pmid:32195394
  5. Afify AZ, Mohamed OA. A new three-parameter exponential distribution with variable shapes for the hazard rate: estimation and applications. Mathematics. 2020;8(1):135.
  6. Eliwa MS, Altun E, Alhussain ZA, Ahmed EA, Salah MM, Ahmed HH, et al. A new one-parameter lifetime distribution and its regression model with applications. PLoS One. 2021;16(2):e0246969. pmid:33606720
  7. Javed M, Nawaz T, Irfan M. The Marshall-Olkin Kappa distribution: properties and applications. Journal of King Saud University - Science. 2019;31(4):684–91.
  8. Nawaz T, Hussain S, Ahmad T, Naz F, Abid M. Kumaraswamy generalized Kappa distribution with application to stream flow data. Journal of King Saud University - Science. 2020;32(1):172–82.
  9. Ahmad AE-BA, Ghazal MGM. Exponentiated additive Weibull distribution. Reliability Engineering & System Safety. 2020;193:106663.
  10. Kumaraswamy P. A generalized probability density function for double-bounded random processes. Journal of Hydrology. 1980;46(1–2):79–88.
  11. Mazucheli J, Menezes AFB, Ghitany ME. The unit-Weibull distribution and associated inference. J Appl Probab Stat. 2018;13(2):1–22.
  12. Korkmaz MÇ, Chesneau C. On the unit Burr-XII distribution with the quantile regression modeling and applications. Comp Appl Math. 2021;40(1).
  13. Korkmaz MÇ, Korkmaz ZS. The unit log-log distribution: a new unit distribution with alternative quantile regression modeling and educational measurements applications. J Appl Stat. 2021;50(4):889–908. pmid:36925910
  14. Mazucheli J, Leiva V, Alves B, Menezes AFB. A new quantile regression for modeling bounded data under a unit Birnbaum–Saunders distribution with applications in medicine and politics. Symmetry. 2021;13(4):682.
  15. Ghitany ME, Mazucheli J, Menezes AFB, Alqallaf F. The unit-inverse Gaussian distribution: A new alternative to two-parameter distributions on the unit interval. Communications in Statistics - Theory and Methods. 2018;48(14):3423–38.
  16. Altun E. The log-weighted exponential regression model: alternative to the beta regression model. Communications in Statistics - Theory and Methods. 2019;50(10):2306–21.
  17. Korkmaz MÇ, Altun E, Chesneau C, Yousof HM. On the unit-Chen distribution with associated quantile regression and applications. Mathematica Slovaca. 2022;72(3):765–86.
  18. Mazucheli J, Menezes AFB, Chakraborty S. On the one parameter unit-Lindley distribution and its associated regression model for proportion data. Journal of Applied Statistics. 2018;46(4):700–14.
  19. Altun E, Cordeiro GM. The unit-improved second-degree Lindley distribution: inference and regression modeling. Comput Stat. 2019;35(1):259–79.
  20. Korkmaz MÇ. A new heavy-tailed distribution defined on the bounded interval: the logit slash distribution and its application. J Appl Stat. 2019;47(12):2097–119. pmid:35706840
  21. Korkmaz MÇ, Chesneau C, Korkmaz ZS. On the arcsecant hyperbolic normal distribution. Properties, quantile regression modeling and applications. Symmetry. 2021;13(1):117.
  22. Korkmaz MÇ, Chesneau C, Korkmaz ZS. Transmuted unit Rayleigh quantile regression model: alternative to beta and Kumaraswamy quantile regression models. Univ Politeh Buchar Sci Bull Ser Appl Math Phys. 2021;83(3):149–58.
  23. Ferrari S, Cribari-Neto F. Beta regression for modelling rates and proportions. Journal of Applied Statistics. 2004;31(7):799–815.
  24. Mitnik PA, Baek S. The Kumaraswamy distribution: median-dispersion re-parameterizations for regression modeling and simulation-based estimation. Stat Papers. 2012;54(1):177–92.
  25. Jodrá P, Jiménez-Gamero MD. A quantile regression model for bounded responses based on the exponential-geometric distribution. Revstat - Statistical Journal. 2020;18(4):415–36.
  26. Sağlam Ş, Karakaya K. Unit Burr-Hatke distribution with a new quantile regression model. J Sci Arts. 2022;22(3):663–76.
  27. Chesneau C, Karakaya K, Bakouch HS, Kuş C. An alternative to the Marshall-Olkin family of distributions: bootstrap, regression and applications. Commun Appl Math Comput. 2022;4(4):1229–57.
  28. Gradshteyn IS, Ryzhik IM. Table of integrals, series, and products. 7th ed. Academic Press; 2014.
  29. Cheng RCH, Amin NAK. Estimating parameters in continuous univariate distributions with a shifted origin. Journal of the Royal Statistical Society Series B: Statistical Methodology. 1983;45(3):394–403.
  30. Koenker R, Bassett G. Regression quantiles. Econometrica. 1978;46(1):33.
  31. Bantan RAR, Chesneau C, Jamal F, Elgarhy M, Tahir MH, Ali A, et al. Some new facts about the unit-Rayleigh distribution with applications. Mathematics. 2020;8(11):1954.
  32. Nadarajah S, Kebe M. The confluent hypergeometric beta distribution. Mathematics. 2023;11(9):2169.
  33. Almetwally EM. The odd Weibull inverse Topp-Leone distribution with applications to COVID-19 data. Ann Data Sci. 2022;9(1):121–40. pmid:38624798
  34. Alizadeh M, Cordeiro GM, Nascimento ADC, Lima M do CS, Ortega EMM. Odd-Burr generalized family of distributions with some applications. Journal of Statistical Computation and Simulation. 2016;87(2):367–89.
  35. El-Bar AMTA, Athar H, Kayid M, Sayed RM, Balogun OS, Felifel AM. The cosine-sine model: dual generalized order statistics, characterization, and estimation methods with applications to physics and radiation. Journal of Radiation Research and Applied Sciences. 2025;18(2):101324.