Abstract
This paper introduces a novel two-parameter lifetime model, the modified Kies Topp–Leone (MKTL) distribution, which is more flexible than well-known conventional distributions. Compared with existing distributions, the new model offers an unusually varied collection of probability functions. Its density and hazard rate functions take a range of shapes, which shows that the model can accommodate several kinds of data. Several statistical properties are derived. The parameters of the MKTL model are estimated by maximum likelihood and by a Bayesian approach. Within the reliability analysis framework, we compare the traditional reliability function with the fuzzy reliability function. A comprehensive Monte Carlo simulation study is conducted to assess the precision of these estimators. In a real-data application the proposed model outperforms competing models, and it may be chosen as an enhanced model for COVID-19 data and other data sets with similar features.
Citation: El-Sherpieny E-SA, Almetwally EM, Muse AH, Hussam E (2023) Data analysis for COVID-19 deaths using a novel statistical model: Simulation and fuzzy application. PLoS ONE 18(4): e0283618. https://doi.org/10.1371/journal.pone.0283618
Editor: Qichun Zhang, University of Bradford, UNITED KINGDOM
Received: November 12, 2022; Accepted: March 13, 2023; Published: April 10, 2023
Copyright: © 2023 El-Sherpieny et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: All relevant data are within the paper.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
The modeling of lifetime distributions has received considerable attention for decades. Because modeling events such as pandemics is so pertinent, interest in lifetime distributions has grown steadily over time.
Researchers in distribution theory usually make a model more flexible in one of two ways: by adding a new parameter to the distribution of interest or by constructing a new family of distributions.
Such modeling is of interest in many fields, including manufacturing, engineering, reliability, and health research; see Anake et al. [1] for further reading.
This article presents a statistical model for COVID-19 deaths in the Kingdom of Saudi Arabia. For related work, see Kumar [2], Khakharia et al. [3], Wang [4], Lalmuanawma et al. [5], and Bullock et al. [6].
Hassan et al. [7] presented the cumulative distribution function (CDF) and probability density function (PDF) of the Topp-Leone (TL) distribution with shape parameter β > 0, given respectively by
(1)
and,
(2)
Kumar and Dharmaja [8] investigated the properties of the exponentiated reduced Kies distribution; see Dey et al. [9] for more on the modified Kies distribution. Al-Babtain et al. [10] introduced a new family of distributions based on the modified Kies (MK) distribution. If G(x; Δ) denotes the CDF of a baseline distribution with parameter vector Δ, the CDF of the MK family is
(3)
where Θ = (θ, Δ) is the full parameter vector and Δ is the parameter vector of the baseline G(x).
The PDF corresponding to Eq (3) is
(4)
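For reference, Eqs (1)–(4) can be written out under the usual definitions. The sketch below assumes the standard Topp-Leone form on the unit interval and the modified Kies family of [10]; it is a reconstruction from those definitions and should be checked against the published displays.

```latex
\begin{align}
G(x;\beta) &= \left[x(2-x)\right]^{\beta}, \qquad 0 < x < 1,\ \beta > 0, \tag{1}\\
g(x;\beta) &= 2\beta\,(1-x)\left[x(2-x)\right]^{\beta-1}, \tag{2}\\
F(x;\Theta) &= 1-\exp\left\{-\left[\frac{G(x;\Delta)}{1-G(x;\Delta)}\right]^{\theta}\right\}, \tag{3}\\
f(x;\Theta) &= \frac{\theta\,g(x;\Delta)\,G(x;\Delta)^{\theta-1}}{\left[1-G(x;\Delta)\right]^{\theta+1}}
\exp\left\{-\left[\frac{G(x;\Delta)}{1-G(x;\Delta)}\right]^{\theta}\right\}. \tag{4}
\end{align}
```

Eq (4) follows from Eq (3) by differentiating with respect to x, using g = dG/dx.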
This article derives the two-parameter modified Kies-Topp-Leone (MKTL) distribution, which has several useful features. Its PDF is very versatile: it may be negatively skewed, positively skewed, or symmetric. Its hazard rate may be decreasing, increasing, bathtub-shaped, or reverse-J-shaped. A further advantage is that the distribution has a closed-form CDF and is simple to work with.
In light of these merits, the distribution is a good candidate for use in various fields, such as biological studies, reliability, and financial statistics, among others. One real-data application is discussed, and the fitted results indicate that the new distribution is a strong competitor to many common distributions with a comparable number of scale and shape parameters. Examples include the Type II Power Topp-Leone inverse exponential distribution (Bantan et al. [11]), the distributions in [12], [13], and [14], and the modified Kies exponential distribution (Al-Babtain et al. [10]).
In future research, we intend to develop a bivariate modified Kies inverted Topp-Leone model based on copulas; see [15–17] for more information.
The remainder of this study is organized as follows. Section 2 constructs the MKTL distribution. Section 3 discusses some of its mathematical properties. Section 4 presents methods for estimating its parameters. Section 5 derives the fuzzy reliability. Section 6 reports simulation results for the MKTL distribution. Section 7 presents the analysis of real-world data. Section 8 summarizes the paper and gives its conclusions.
2 MKTL distribution
This section presents the equations that define the proposed distribution. Substituting Eq (1) into Eq (3) gives the CDF of the MKTL distribution:
(5)
and the corresponding PDF is
(6)
where Θ = (θ, β) is the parameter vector.
The survival function of the MKTL distribution is
(7)
The hazard rate (HR) function of the MKTL distribution is
(8)
Figs 1 and 2 show several plots of the MKTL distribution for selected values of the parameters θ and β. Fig 2 shows that the HR function of the MKTL distribution may be increasing, decreasing, or bathtub-shaped. The TL distribution is a poor model for data and phenomena that exhibit bathtub-shaped, increasing, and decreasing failure rates. This is one advantage of the MKTL distribution over the TL distribution, and it makes the MKTL distribution better suited than the TL distribution to analyzing lifetime data.
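The functions of this section can be sketched in a few lines of Python. The sketch assumes the standard TL form G(x) = [x(2 − x)]^β on (0, 1) substituted into the MK family CDF 1 − exp{−[G/(1 − G)]^θ}; the function names and the parameter values are illustrative only and are not taken from the paper.

```python
import math


def tl_cdf(x, beta):
    """Topp-Leone CDF on (0, 1), assumed form G(x) = [x(2 - x)]**beta."""
    return (x * (2.0 - x)) ** beta


def mktl_z(x, theta, beta):
    """Odds-type term z(x) = [G(x) / (1 - G(x))]**theta of the MK family."""
    g = tl_cdf(x, beta)
    return (g / (1.0 - g)) ** theta


def mktl_cdf(x, theta, beta):
    """Assumed MKTL CDF: F(x) = 1 - exp(-z(x))."""
    return 1.0 - math.exp(-mktl_z(x, theta, beta))


def mktl_sf(x, theta, beta):
    """Survival function S(x) = 1 - F(x) = exp(-z(x))."""
    return math.exp(-mktl_z(x, theta, beta))


def mktl_hazard(x, theta, beta):
    """Hazard rate h(x) = f(x) / S(x); the exponential term cancels."""
    u = x * (2.0 - x)
    return (2.0 * theta * beta * (1.0 - x) * u ** (beta * theta - 1.0)
            / (1.0 - u ** beta) ** (theta + 1.0))


def mktl_pdf(x, theta, beta):
    """Density f(x) = h(x) * S(x)."""
    return mktl_hazard(x, theta, beta) * mktl_sf(x, theta, beta)


# Illustrative parameter values (not estimates from the paper).
theta, beta = 1.5, 2.0
for x in (0.1, 0.3, 0.5, 0.7):
    print(x, mktl_pdf(x, theta, beta), mktl_hazard(x, theta, beta))
```

Evaluating `mktl_hazard` on a grid for several (θ, β) pairs is one way to explore the hazard shapes discussed above.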
3 Statistical properties of the proposed MKTL distribution
3.1 A general expansion of the MKTL density
This section gives a linear representation of the MK family and then shows how it yields a useful linear representation of the MKTL density. The density of the MK family can be written as the following mixture:
(9)
Using the PDF and CDF of the TL distribution, the last equation can be rewritten for the MKTL distribution as
(10)
Each term in Eq (10) is a TL density with parameter β[α(j + 1) + k].
3.2 Quantile for the MKTL distribution
The quantile function of the MKTL distribution is obtained by inverting Eq (5). For U in (0, 1), x = F⁻¹(U; Θ) is given by
(11)
Eq (11) readily gives the three quartiles and can also be used to generate random data.
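As an illustration of inverse-transform sampling with the quantile function, the sketch below inverts the CDF under the assumed form F(x) = 1 − exp{−[G/(1 − G)]^θ} with G(x) = [x(2 − x)]^β. Setting w = [−ln(1 − p)]^(1/θ) gives G = w/(1 + w), then x(2 − x) = G^(1/β), and finally x = 1 − sqrt(1 − x(2 − x)). The function names and parameter values are hypothetical.

```python
import math
import random


def mktl_quantile(p, theta, beta):
    """Quantile of the assumed MKTL CDF for a probability p in [0, 1)."""
    w = (-math.log1p(-p)) ** (1.0 / theta)   # w = [G / (1 - G)] at the quantile
    u = (w / (1.0 + w)) ** (1.0 / beta)      # u = x(2 - x)
    return 1.0 - math.sqrt(1.0 - u)          # root of x(2 - x) = u in (0, 1)


def mktl_sample(n, theta, beta, seed=None):
    """Draw n values by applying the quantile function to uniform variates."""
    rng = random.Random(seed)
    return [mktl_quantile(rng.random(), theta, beta) for _ in range(n)]


# Illustrative parameter values (not estimates from the paper).
theta, beta = 1.5, 2.0
quartiles = [mktl_quantile(p, theta, beta) for p in (0.25, 0.50, 0.75)]
print(quartiles)
print(mktl_sample(5, theta, beta, seed=1))
```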
3.3 Moments for the MKTL distribution
Eq (10) can be used to obtain the rth moment of the variable X:
(12)
where
.
4 Various approaches for estimation
We estimate the parameters of the MKTL distribution using both non-Bayesian and Bayesian techniques: maximum likelihood estimation (MLE) and Bayesian estimation under the squared error loss function (SELF).
4.1 Maximum likelihood estimators
Let x1, …, xn be a random sample from the MKTL distribution with parameters θ and β. The likelihood function is
(13)
The corresponding log-likelihood function is
(14)
The first partial derivatives of Eq (14) with respect to θ and β are
(15)
and
(16)
Setting the first derivatives of Eq (14) with respect to θ and β to zero yields the MLEs of the parameters. The resulting equations are solved numerically: the log-likelihood is maximized with the Newton-Raphson method as implemented in R.
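The numerical maximization can be illustrated with a short, dependency-free sketch. It is not the Newton-Raphson routine used in the paper: it swaps in a simple derivative-free coordinate pattern search on (log θ, log β), and it assumes the MKTL density f(x) = 2θβ(1 − x)u^(βθ − 1)(1 − u^β)^(−(θ+1)) exp{−[u^β/(1 − u^β)]^θ} with u = x(2 − x). The function names, starting values, and tuning constants are hypothetical.

```python
import math


def mktl_loglik(theta, beta, data):
    """Log-likelihood of the assumed MKTL density; -inf outside the valid region."""
    if theta <= 0.0 or beta <= 0.0:
        return -math.inf
    total = len(data) * math.log(2.0 * theta * beta)
    try:
        for x in data:
            u = x * (2.0 - x)
            ub = u ** beta
            if not 0.0 < ub < 1.0:
                return -math.inf
            total += (math.log(1.0 - x) + (beta * theta - 1.0) * math.log(u)
                      - (theta + 1.0) * math.log(1.0 - ub)
                      - (ub / (1.0 - ub)) ** theta)
    except (OverflowError, ValueError):
        return -math.inf
    return total


def mktl_mle(data, start=(1.0, 1.0), step=0.5, tol=1e-7, max_iter=5000):
    """Coordinate pattern search on (log theta, log beta); returns the best point found."""
    def objective(p):
        try:
            return mktl_loglik(math.exp(p[0]), math.exp(p[1]), data)
        except OverflowError:
            return -math.inf

    point = [math.log(start[0]), math.log(start[1])]
    best = objective(point)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in (0, 1):
            for delta in (step, -step):
                cand = list(point)
                cand[i] += delta
                value = objective(cand)
                if value > best:          # keep any move that raises the likelihood
                    point, best, improved = cand, value, True
        if not improved:                  # no axis move helps: refine the step
            step /= 2.0
    return math.exp(point[0]), math.exp(point[1]), best


# COVID-19 mortality-rate data listed in Section 7 of the paper.
data = [0.2113, 0.2683, 0.2487, 0.2674, 0.1716, 0.2666, 0.2091, 0.2278, 0.1706,
        0.2271, 0.1890, 0.2077, 0.2452, 0.1319, 0.2259, 0.1504, 0.1879, 0.1689,
        0.2063, 0.2249, 0.1686, 0.1310, 0.1497, 0.1309, 0.1495, 0.1121, 0.1120]
theta_hat, beta_hat, loglik = mktl_mle(data)
print(theta_hat, beta_hat, loglik)
```

Searching on the log scale keeps both parameters positive without explicit constraints. A gradient-based method such as Newton-Raphson would typically converge in fewer steps when the derivatives in Eqs (15) and (16) are available.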
4.2 Bayesian estimation method
Bayesian estimation is one of the most important estimation techniques. It treats the parameters as random variables with a prior distribution. We assume that both θ and β have gamma priors, given respectively by
(17)
and
(18)
Because the two priors are assumed to be independent, the joint prior of θ and β is
(19)
The estimates and variance-covariance matrix from the MLE method can be used to elicit suitable values for the hyper-parameters of the independent joint prior. After the mean and variance of the gamma priors are equated, the estimated hyper-parameters may be stated as follows,
where B is the number of replications.
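One common way to carry out this moment matching is sketched below. It is an assumption about the form of the elicitation, not a reproduction of the paper's display: each gamma prior is taken to have shape a_j and rate b_j, so its mean is a_j/b_j and its variance is a_j/b_j², and these are equated to the sample mean and variance of B replicated MLEs. The symbols ϑ_1 = θ, ϑ_2 = β, a_j, and b_j are notation introduced here for illustration.

```latex
\begin{align*}
\bar{\vartheta}_j &= \frac{1}{B}\sum_{i=1}^{B}\hat{\vartheta}_j^{(i)}, &
s_j^{2} &= \frac{1}{B-1}\sum_{i=1}^{B}\left(\hat{\vartheta}_j^{(i)}-\bar{\vartheta}_j\right)^{2}, \\
a_j &= \frac{\bar{\vartheta}_j^{\,2}}{s_j^{2}}, &
b_j &= \frac{\bar{\vartheta}_j}{s_j^{2}}, \qquad j = 1, 2.
\end{align*}
```

Solving a_j/b_j = mean and a_j/b_j² = variance for a_j and b_j gives the two expressions on the second line.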
The posterior distribution is constructed in the following way
(20)
Let u(Θ) = u(θ, β) denote a function of the parameters. Under the squared error loss function, the Bayesian estimate of u(Θ) is given by
(21)
The SELF estimate of u(θ, β) in Eq (21) involves integrals that cannot be evaluated in closed form. Because such integrals are notoriously difficult to evaluate analytically, we used Mathematica 12 to apply an approximation method that has proved very helpful for integrals of this type.
Accordingly, the Markov chain Monte Carlo (MCMC) approach was used to approximate the integrals. The Metropolis-Hastings (MH) algorithm, often implemented as a random-walk sampler, is a key part of MCMC. Much like acceptance-rejection sampling, the MH algorithm draws a candidate value from a proposal distribution at each step and then accepts or rejects it. Random samples are generated with the MH algorithm from the conditional posterior densities of the MKTL parameters, which are
(22)
and
(23)
For further details, see Chen and Shao [18]. The Bayesian counterpart of a confidence interval (CI) is the credible interval; a common choice is the highest posterior density (HPD) interval. We computed HPD intervals for the unknown parameters from the samples drawn by the MH algorithm described above, using the widely applied method of [18].
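The sampling scheme can be sketched as follows. This is a simplified random-walk MH sampler that updates (θ, β) jointly rather than through the separate conditional posteriors in Eqs (22) and (23). It assumes the MKTL density with u = x(2 − x) and independent gamma priors with shape a and rate b. The hyper-parameter values, proposal scales, and function names are hypothetical. The SELF estimate is taken as the posterior mean, and the HPD interval is computed as the shortest interval covering the requested share of the sorted draws, in the spirit of [18].

```python
import math
import random


def log_posterior(theta, beta, data, a1, b1, a2, b2):
    """Unnormalized log-posterior: assumed MKTL log-likelihood plus gamma log-priors."""
    if theta <= 0.0 or beta <= 0.0:
        return -math.inf
    total = len(data) * math.log(2.0 * theta * beta)
    try:
        for x in data:
            u = x * (2.0 - x)
            ub = u ** beta
            if not 0.0 < ub < 1.0:
                return -math.inf
            total += (math.log(1.0 - x) + (beta * theta - 1.0) * math.log(u)
                      - (theta + 1.0) * math.log(1.0 - ub)
                      - (ub / (1.0 - ub)) ** theta)
    except (OverflowError, ValueError):
        return -math.inf
    # Gamma(shape a, rate b) priors, up to additive constants.
    total += (a1 - 1.0) * math.log(theta) - b1 * theta
    total += (a2 - 1.0) * math.log(beta) - b2 * beta
    return total


def mh_sample(data, n_iter=5000, burn_in=1000, start=(1.0, 1.0),
              scale=(0.5, 0.1), priors=(1.0, 1.0, 1.0, 1.0), seed=0):
    """Random-walk MH with a symmetric Gaussian proposal on (theta, beta)."""
    rng = random.Random(seed)
    theta, beta = start
    current = log_posterior(theta, beta, data, *priors)
    draws, accepted = [], 0
    for it in range(n_iter):
        cand_t = theta + rng.gauss(0.0, scale[0])
        cand_b = beta + rng.gauss(0.0, scale[1])
        cand = log_posterior(cand_t, cand_b, data, *priors)
        # Accept with probability min(1, posterior ratio); proposal is symmetric.
        if math.log(1.0 - rng.random()) < cand - current:
            theta, beta, current = cand_t, cand_b, cand
            accepted += 1
        if it >= burn_in:
            draws.append((theta, beta))
    return draws, accepted / n_iter


def hpd_interval(samples, level=0.95):
    """Shortest interval spanning about `level` of the sorted draws."""
    s = sorted(samples)
    m = len(s)
    k = min(max(1, int(math.floor(level * m))), m - 1)
    width, j = min((s[i + k] - s[i], i) for i in range(m - k))
    return s[j], s[j + k]


# COVID-19 mortality-rate data listed in Section 7 of the paper.
data = [0.2113, 0.2683, 0.2487, 0.2674, 0.1716, 0.2666, 0.2091, 0.2278, 0.1706,
        0.2271, 0.1890, 0.2077, 0.2452, 0.1319, 0.2259, 0.1504, 0.1879, 0.1689,
        0.2063, 0.2249, 0.1686, 0.1310, 0.1497, 0.1309, 0.1495, 0.1121, 0.1120]
draws, rate = mh_sample(data)
theta_draws = [t for t, _ in draws]
beta_draws = [b for _, b in draws]
print("acceptance rate:", rate)
print("SELF estimates:", sum(theta_draws) / len(draws), sum(beta_draws) / len(draws))
print("95% HPD for theta:", hpd_interval(theta_draws))
```

In practice the proposal scales would be tuned to the data, and trace plots would be inspected for convergence before the draws are used.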
5 Fuzzy reliability
Reliability and HR functions can be viewed as probabilities that describe a lifetime, and their scope has recently expanded through the incorporation of fuzzy components. In conventional models of system reliability, the failure probabilities of the components are represented as exact numerical values. In practice, however, this precision does not hold, because system parameter values obtained by experiment or estimation are all subject to some degree of error.
Fuzzy set theory makes it possible to describe the real world in a way that is both realistic and practical. Fuzziness should therefore be taken into account when analyzing the behavior of a system or assessing its reliability.
To generate fuzzy parameters, membership functions are first used to characterize the degree of fuzziness in the lifetime information or system parameters, and the resulting characterization is then used to produce the fuzzy parameters.
5.1 Concepts and relations in fuzzy set theory
Besides the choice of probability distribution, the type of data used to estimate the parameters strongly influences the accuracy of the results, so the data type must be specified. Fuzzy data are one such type and are among the more recent and important developments in statistics, because many real-world events do not have sharp boundaries. Suppose that T is a continuous random variable representing the failure time of a system (or component). Sabry et al. [19] studied inference for the fuzzy reliability model of the inverse Rayleigh distribution. Tolba et al. [20] discussed fuzzy statistical inference for stress-strength reliability using the inverse Lomax lifetime distribution. Mohamed et al. [21] obtained fuzzy inference for the reliability analysis of the Type II half logistic Weibull distribution. Meriem et al. [22] derived statistical inference and fuzzy reliability for the power XLindley distribution. The fuzzy reliability can be calculated from the fuzzy probability formula (see Chen and Pham [23]):
(24)
where μ(x) is a membership function; for further reading, see [24].
Suppose that μ(x) is
(25)
For a given γ-cut, γ ∈ [0, 1], the lifetime x(γ) associated with μ(x) can be computed using fuzzy-number arithmetic as follows (Chen and Pham [23]):
then
(26)
The fuzzy reliability can therefore be determined for every value of γ. Based on this definition, the fuzzy reliability of the MKTL distribution is given by the following equations:
If γ = 0
(27)
If 0 < γ < 1
(28)
where x(γ) = t1 + γ(t2 − t1), and if γ = 1
(29)
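The γ-cut calculation can be sketched numerically. The sketch assumes the form commonly used in the fuzzy reliability literature cited above, in which the fuzzy reliability at a γ-cut is F(x(γ)) − F(t1) with x(γ) = t1 + γ(t2 − t1), so that it equals 0 at γ = 0 and F(t2) − F(t1) at γ = 1. It also assumes the MKTL CDF 1 − exp{−[G/(1 − G)]^θ} with G(x) = [x(2 − x)]^β. The parameter values are placeholders, not the estimates in Table 6.

```python
import math


def mktl_cdf(x, theta, beta):
    """Assumed MKTL CDF with Topp-Leone baseline G(x) = [x(2 - x)]**beta."""
    g = (x * (2.0 - x)) ** beta
    return 1.0 - math.exp(-(g / (1.0 - g)) ** theta)


def fuzzy_reliability(gamma, t1, t2, theta, beta):
    """Fuzzy reliability at a gamma-cut: F(x(gamma)) - F(t1)."""
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    x_gamma = t1 + gamma * (t2 - t1)   # lifetime associated with the gamma-cut
    return mktl_cdf(x_gamma, theta, beta) - mktl_cdf(t1, theta, beta)


# t1 and t2 as in Section 7; theta and beta are placeholder values.
t1, t2 = 0.12, 0.25
theta, beta = 5.0, 0.65
for gamma in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(gamma, fuzzy_reliability(gamma, t1, t2, theta, beta))
```

Because the CDF is increasing in x and x(γ) is increasing in γ, the computed fuzzy reliability increases with γ, which matches the behavior reported for the application.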
6 Simulation results
Here, we estimate the MKTL parameters using the two approaches and compare their performance in a simulation study. For the parameters θ = (0.5, 1.5, 3) and β, we consider the sample sizes n = 30, 50, and 100. We generate N = 5,000 random samples from the MKTL distribution. For each estimate, we calculate the average bias (Abias), mean squared error (MSE), lower and upper limits of the confidence interval (CI), and coverage probability (CP).
The Abias, MSE, and CI are used to compare the efficiency of the estimators; estimators with smaller MSE values are preferred. The simulation results were obtained using R. Tables 1–4 report the Abias, MSE, and CI for the MLE and Bayesian estimates. As the sample size grows, the average estimate produced by either method moves closer to the true parameter values.
6.1 Observation and concluding remarks concerning the simulation and its results
- As the sample size increases, the mean estimates of the parameters move closer to their true values, and the MSE decreases.
- All estimators perform well: they give very small MSEs and mean estimates close to the true parameter values.
- In terms of MSE and mean estimates, the differences between the estimators are relatively modest.
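A scaled-down version of such a Monte Carlo study can be sketched as follows. It covers only the maximum likelihood part, uses far fewer replications than the N = 5,000 of the paper, and replaces Newton-Raphson with a derivative-free coordinate search started at the true values. It assumes the MKTL form with baseline G(x) = [x(2 − x)]^β; the function names and the parameter values in the usage lines are hypothetical.

```python
import math
import random


def mktl_quantile(p, theta, beta):
    """Quantile of the assumed MKTL CDF, used for inverse-transform sampling."""
    w = (-math.log1p(-p)) ** (1.0 / theta)
    u = (w / (1.0 + w)) ** (1.0 / beta)
    return 1.0 - math.sqrt(1.0 - u)


def mktl_loglik(theta, beta, data):
    """Log-likelihood of the assumed MKTL density; -inf where it is undefined."""
    try:
        total = len(data) * math.log(2.0 * theta * beta)
        for x in data:
            u = x * (2.0 - x)
            ub = u ** beta
            if not 0.0 < ub < 1.0:
                return -math.inf
            total += (math.log(1.0 - x) + (beta * theta - 1.0) * math.log(u)
                      - (theta + 1.0) * math.log(1.0 - ub)
                      - (ub / (1.0 - ub)) ** theta)
        return total
    except (OverflowError, ValueError):
        return -math.inf


def mktl_mle(data, start, step=0.5, tol=1e-6, max_iter=2000):
    """Coordinate pattern search on (log theta, log beta)."""
    point = [math.log(start[0]), math.log(start[1])]
    best = mktl_loglik(start[0], start[1], data)
    for _ in range(max_iter):
        if step <= tol:
            break
        improved = False
        for i in (0, 1):
            for delta in (step, -step):
                cand = list(point)
                cand[i] += delta
                value = mktl_loglik(math.exp(cand[0]), math.exp(cand[1]), data)
                if value > best:
                    point, best, improved = cand, value, True
        if not improved:
            step /= 2.0
    return math.exp(point[0]), math.exp(point[1])


def simulate(theta, beta, n, reps, seed=0):
    """Average bias and MSE of the MLEs over `reps` simulated samples of size n."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        sample = [mktl_quantile(rng.random(), theta, beta) for _ in range(n)]
        estimates.append(mktl_mle(sample, start=(theta, beta)))
    summary = {}
    for name, true, idx in (("theta", theta, 0), ("beta", beta, 1)):
        errors = [est[idx] - true for est in estimates]
        summary[name] = {"bias": sum(errors) / reps,
                         "mse": sum(e * e for e in errors) / reps}
    return summary


# Illustrative settings only; the paper uses N = 5,000 replications.
for n in (30, 50, 100):
    print(n, simulate(theta=1.5, beta=2.0, n=n, reps=50, seed=1))
```

With enough replications, the printed bias and MSE would be expected to shrink as n grows, in line with the remarks above.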
7 Real data analysis
This section demonstrates the applicability of the MKTL distribution to a real data set. The MKTL distribution is compared with the following competing models: Kumaraswamy, beta, Gompertz Lomax (GL) (Oguntunde et al. [13]), Topp-Leone generalized exponential (TLGE), Type II Power Topp-Leone inverse exponential (TIIPTLIE), modified Kies exponential (MKEx), and alpha power inverted TL (APITL).
Table 5 reports the Cramér-von Mises (W), Anderson-Darling (A), and Kolmogorov-Smirnov (KS) statistics, together with the KS P-value, for all models fitted to the real data set. The table also contains the MLEs and standard errors (SE) of the parameters of the considered models. Table 5 shows that, among all models fitted to the COVID-19 data, the MKTL distribution has the largest P-value and the smallest KS, W, and A values. Fig 3 shows the empirical fit, histogram, and PP-plot of the MKTL distribution for the data under investigation.
The data are COVID-19 data for Saudi Arabia covering 27 days, from 4 August 2021 to 30 August 2021; see https://covid19.who.int/.
These data consist of mortality rates. The data are as follows: 0.2113, 0.2683, 0.2487, 0.2674, 0.1716, 0.2666, 0.2091, 0.2278, 0.1706, 0.2271, 0.1890, 0.2077, 0.2452, 0.1319, 0.2259, 0.1504, 0.1879, 0.1689, 0.2063, 0.2249, 0.1686, 0.1310, 0.1497, 0.1309, 0.1495, 0.1121, and 0.1120. Results are presented in Table 5 and Fig 3. Fig 5 shows the fuzzy reliability for t1 = 0.12 and t2 = 0.25 at different values of the γ-cut, together with the boxplot of the data being analyzed. The fuzzy reliability increases as the γ-cut increases. Table 6 shows the MLE and Bayesian parameter estimates of the MKTL distribution. Fig 4 displays convergence plots of the Markov chain Monte Carlo method for the parameter estimates of the MKTL distribution.
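The KS distance used in this comparison can be computed directly from the data once a fitted CDF is available. The sketch below assumes the MKTL CDF 1 − exp{−[G/(1 − G)]^θ} with G(x) = [x(2 − x)]^β. The parameter values are placeholders chosen for illustration, not the estimates reported in Table 5, so the printed distance is not comparable with the published one.

```python
import math


def mktl_cdf(x, theta, beta):
    """Assumed MKTL CDF with Topp-Leone baseline G(x) = [x(2 - x)]**beta."""
    g = (x * (2.0 - x)) ** beta
    return 1.0 - math.exp(-(g / (1.0 - g)) ** theta)


def ks_distance(data, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Compare the model CDF with the empirical CDF just after and just before x.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# COVID-19 mortality-rate data listed in this section.
data = [0.2113, 0.2683, 0.2487, 0.2674, 0.1716, 0.2666, 0.2091, 0.2278, 0.1706,
        0.2271, 0.1890, 0.2077, 0.2452, 0.1319, 0.2259, 0.1504, 0.1879, 0.1689,
        0.2063, 0.2249, 0.1686, 0.1310, 0.1497, 0.1309, 0.1495, 0.1121, 0.1120]

theta, beta = 5.0, 0.65   # placeholder values, not fitted estimates
print(ks_distance(data, lambda x: mktl_cdf(x, theta, beta)))
```

The same `ks_distance` function accepts any competing model's CDF, which is how the models would be ranked on this criterion.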
7.1 Conclusions and suggestions regarding the application
- For this data set, the MKTL distribution yields the largest P-value and the smallest W*, A*, and KS distances.
- From Fig 3, we may conclude that the MKTL distribution is the best-fitting model for this data set.
- Fig 5 shows the fuzzy reliability for t1 = 0.12 and t2 = 0.25 at different values of the γ-cut: as γ increases, the fuzzy reliability increases. The boxplot shows that the data contain no outliers.
- Table 5 shows that the TIIPTLIE, TLGE, Kumaraswamy, beta, GL, APITL, and MKEx distributions fit this data set well, but the proposed model fits best.
- Table 6 indicates that Bayesian estimation is the most suitable estimation approach for these data.
8 Summary
This work introduces a novel two-parameter model, the modified Kies Topp-Leone (MKTL) distribution. For lifetime data, the MKTL distribution offers greater flexibility than more standard distributions. We present its survival function, hazard function, linear representation, quantiles, and moments, and derive a fuzzy reliability measure. The parameters are estimated by MLE and Bayesian methods, and our results indicate that the Bayesian approach performs better than the MLE. To evaluate the model, we present the estimation techniques, the fuzzy reliability results, and simulations. In an application to COVID-19 mortality rates, the proposed model fits the data better than the TIIPTLIE, TLGE, Kumaraswamy, beta, GL, APITL, and MKEx distributions.
References
- 1. Anake T.A.; Oguntunde P.E.; Odetunmibi O.A. On a fractional beta- distribution. Int. J. Math. Comput. 2015, 26, 26–34.
- 2. Kumar S. Monitoring Novel Corona Virus (COVID-19) Infections in India by Cluster Analysis. Ann. Data Sci. 2020, 7, 417–425.
- 3. Khakharia A.; Shah V.; Jain S.; Shah J.; Tiwari A.; Daphal P.; et al. Outbreak Prediction of COVID-19 for Dense and Populated Countries Using Machine Learning. Ann. Data Sci. 2020, 8, 1–19.
- 4. Wang Y.z.J. A call for caution in extrapolating chest CT sensitivity for COVID-19 derived from hospital data to patients among general population. Quant. Imaging Med. Surg. 2020, 10, 798. pmid:32269938
- 5. Lalmuanawma S.; Hussain J.; Chhakchhuak L. Applications of machine learning and artificial intelligence for Covid-19 (SARS-CoV-2) pandemic: A review. Chaos Solitons Fractals 2020, 139, 110059. pmid:32834612
- 6. Bullock J.; Pham K.H.; Lam C.S.N.; Luengo-Oroz M. Mapping the landscape of artificial intelligence applications against COVID-19. arXiv 2020, arXiv:2003.11336.
- 7. Hassan A.S.; Elgarhy M.; Ragab R. Statistical Properties and Estimation of Inverted Topp-Leone Distribution. J. Stat. Appl. Probab. 2020, forthcoming.
- 8. Kumar C.S.; Dharmaja S.H.S. The exponentiated reduced Kies distribution: Properties and applications. Commun. Stat.-Theory Methods 2017, 46, 8778–8790.
- 9. Dey S.; Nassar M.; Kumar D. Moments and estimation of reduced Kies distribution based on progressive type-II right censored order statistics. Hacet. J. Math. Stat. 2019, 48, 332–350.
- 10. Al-Babtain A.A.; Shakhatreh M.K.; Nassar M.; Afify A.Z. A New Modified Kies Family: Properties, Estimation Under Complete and type-II Censored Samples, and Engineering Applications. Mathematics 2020, 8, 1345.
- 11. Bantan R. A., Jamal F., Chesneau C., & Elgarhy M. (2020). Type II Power Topp-Leone generated family of distributions with statistical inference and applications. Symmetry, 12(1), 75.
- 12. Kunjiratanachot N., Bodhisuwan W., & Volodin A. (2018). The Topp-Leone generalized exponential power series distribution with applications. J. Probab. Stat. Sci, 16(2), 197–208.
- 13. Oguntunde P. E., Khaleel M. A., Ahmed M. T., Adejumo A. O., & Odetunmibi O. A. (2017). A new generalization of the Lomax distribution with increasing, decreasing, and constant failure rate. Modelling and Simulation in Engineering, 2017.
- 14. Ibrahim G. M., Hassan A. S., Almetwally E. M., & Almongy H. M. Parameter Estimation of Alpha Power Inverted Topp-Leone Distribution with Applications. Intelligent Automation & Soft Computing, 29(2), 353–371.
- 15. Almetwally E. M., & Meraou M. A. (2022). Application of Environmental Data with New Extension of Nadarajah-Haghighi Distribution. Computational Journal of Mathematical and Statistical Sciences, 1(1), 26–41.
- 16. Almetwally E.M.; Muhammed H.Z. On a bivariate Fréchet distribution. J. Stat Appl Probab. 2020 9, 1–21.
- 17. Chesneau C. (2023). On New Three-and Two-Dimensional Ratio-Power Copulas. Computational Journal of Mathematical and Statistical Sciences, 2(1), 106–122.
- 18. Chen M. H., & Shao Q. M. (1999). Monte Carlo estimation of Bayesian credible and HPD intervals. Journal of Computational and Graphical Statistics, 8(1), 69–92.
- 19. Sabry M. A., Almetwally E. M., Alamri O. A., Yusuf M., Almongy H. M., et al. (2021). Inference of fuzzy reliability model for inverse Rayleigh distribution. Aims Math, 6(9), 9770–9785.
- 20. Tolba A. H., Ramadan D. A., Almetwally E. M., Jawa T. M., & Sayed-Ahmed N. (2022). Statistical inference for stress-strength reliability using inverse Lomax lifetime distribution with mechanical engineering applications. Thermal Science, 26(1), 303–326.
- 21. Mohamed R. A., Tolba A. H., Almetwally E. M., & Ramadan D. A. (2022). Inference of Reliability Analysis for Type II Half Logistic Weibull Distribution with Application of Bladder Cancer. Axioms, 11(8), 386.
- 22. Meriem B., Gemeay A. M., Almetwally E. M., Halim Z., Alshawarbeh E., et al. (2022). The power xlindley distribution: Statistical inference, fuzzy reliability, and covid-19 application. Journal of Function Spaces, 2022, 1–21.
- 23. Chen G. and Pham T. T. (2000). Introduction to fuzzy sets, fuzzy logic, and fuzzy control systems. CRC Press.
- 24. Riad Fathy H., et al. Fuzzy reliability analysis of the covid-19 mortality rate using a new modified Kies Kumaraswamy model. Journal of Mathematics 2022 (2022).