Abstract
Since the spread of the COVID-19 pandemic in early 2020, modeling its related factors has become essential, and this has motivated the formulation of new families of statistical distributions. In the present paper we are interested in modeling the vaccination rate in some African countries. The recorded data in these countries show low vaccination rates, which will affect the spread of new active cases and increase the mortality rate. A new four-parameter extension of the inverted Nadarajah-Haghighi distribution is considered, obtained by combining the inverted Nadarajah-Haghighi distribution with the odd Lomax-G family. The proposed distribution is called the odd Lomax inverted Nadarajah-Haghighi (OLINH) distribution. It has several attractive statistical properties, such as a simple linear representation of the density function and flexibility of the hazard rate curve and of the odds ratio of failure, in addition to other properties related to the quantile function, the rth moment, the moment generating function, Rényi entropy, and order statistics. We address the problem of parameter estimation from the frequentist and Bayesian approaches, and compare the performance of the two estimation methods using simulation analysis and numerical techniques. Finally, different goodness-of-fit measures are used for modeling the COVID-19 vaccination rate; they indicate that the OLINH distribution fits these data better than other competitive distributions.
Citation: Almongy HM, Almetwally EM, Haj Ahmad H, H. Al-nefaie A (2022) Modeling of COVID-19 vaccination rate using odd Lomax inverted Nadarajah-Haghighi distribution. PLoS ONE 17(10): e0276181. https://doi.org/10.1371/journal.pone.0276181
Editor: Alessandro Barbiero, Universita degli Studi di Milano, ITALY
Received: November 28, 2021; Accepted: September 30, 2022; Published: October 21, 2022
Copyright: © 2022 Almongy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Data Availability: Data are within the following Supporting Information file: https://covid19.who.int/who-data/vaccination-data.csv.
Funding: The author(s) received no specific funding for this work.
Competing interests: The authors have declared that no competing interests exist.
1 Introduction
The amount of data available for analysis keeps growing, requiring new statistical distributions that enable us to describe every phenomenon under study. Modeling real-life observations using probability distributions is one of the most essential tasks that statisticians must handle. Many scientific fields, for example medicine, engineering, and finance, require statistical models to describe the trend and to predict the future behaviour of their data. Many lifetime models have therefore been employed in the literature to describe various forms of survival data. Because the quality of a statistical analysis depends strongly on the flexibility and characteristics of the model used, significant efforts are focused on constructing new statistical models. Still, there is a persistent need to create new models or formulate new extensions to achieve a better fit to real lifetime data.
Tahir et al. [1] proposed the inverted Nadarajah-Haghighi (INH) distribution, which is a new inverted model with decreasing and unimodal (right-skewed) density, and with decreasing and upside-down bathtub (UBT) hazard rate shapes. They addressed several statistical features of the INH distribution and used various frequentist approaches to estimate the model's parameters. They demonstrated the suitability of the INH distribution on real-life data sets, and found that the INH model provided a better fit than other well-known lifetime models such as the inverted exponential, the inverted gamma, the inverted Weibull and the inverted Lindley, among others.
Several researchers have addressed the applications of inverted distributions; one can refer to Folks and Chhikara [2], Rosaiah and Kantam [3], De Gusmao et al. [4], Joshi and Kumar [5], Almetwally [6], Ibrahim and Almetwally [7], Ramos et al. [8], Almetwally [9], Hassan et al. [10], and Basheer et al. [11], among others. Some generalizations of the INH distribution have been introduced in the literature. For example, the Marshall-Olkin INH distribution was studied by Raffiq et al. [12], and Toumaj et al. [13] proposed the transmuted INH distribution. Elshahhat and Rastogi [14] discussed parameter estimation of lifetime for the INH distribution with Type-II progressively censored samples. There is still room for new generalizations and extensions of the INH distribution; the extension proposed here is shown to be superior to the original INH distribution and to other competitive models, especially for modeling the COVID-19 vaccination rate.
Let x be a random variable that follows the inverted Nadarajah-Haghighi (INH) distribution with parameters δ, θ > 0. The CDF and pdf are as follows:
(1)
and,
(2)
where Θ = (δ, θ) is a parameter vector of INH distribution.
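For readers who want to experiment numerically, the INH distribution can be coded in a few lines. The sketch below is not taken from the paper: it assumes the CDF has the form G(x) = exp{1 − (1 + θ/x)^δ} for x > 0, with δ acting as a shape and θ as a scale parameter (the roles of δ and θ are an assumption), and the function names `inh_cdf` and `inh_pdf` are illustrative.

```python
import math

def inh_cdf(x, delta, theta):
    """INH CDF under the assumed form exp(1 - (1 + theta/x)**delta)."""
    if x <= 0.0:
        return 0.0
    return math.exp(1.0 - (1.0 + theta / x) ** delta)

def inh_pdf(x, delta, theta):
    """INH pdf, i.e. the derivative of inh_cdf with respect to x."""
    if x <= 0.0:
        return 0.0
    u = 1.0 + theta / x
    return delta * theta / x**2 * u ** (delta - 1.0) * math.exp(1.0 - u**delta)

# With delta = 1 the assumed form reduces to the inverted exponential CDF exp(-theta/x).
print(inh_cdf(1.0, delta=1.0, theta=1.0), math.exp(-1.0))
```

Under this assumed form, setting δ = 1 recovers the inverted exponential distribution, which is a quick sanity check on the parameterization.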
In this work we introduce a new four-parameter extension of the INH distribution, namely the odd Lomax INH (OLINH) distribution, based on the odd Lomax-G (OL-G) family introduced by Cordeiro et al. [15]. Adding more parameters to the original distribution makes it more flexible and more reliable for modeling some real-life data.
Let g(x; Θ) be the pdf of a baseline model with parameter vector Θ; then the CDF of the OL-G family is given by:
(3)
where Ω = (α, β, Θ) is a vector of parameters of OL-G family. The pdf of (3) is defined by
(4)
where α, β > 0 are shape parameters. A random variable with pdf (4) is denoted by X ∼ OL-G(Ω). Cordeiro et al. [15] introduced new four-parameter extensions of the Weibull, Lomax, log-logistic, and log-Lindley distributions, called the OL-Weibull, OL-Lomax, OL-log-logistic, and OL-log-Lindley distributions, respectively. The odd Lomax-exponential distribution was introduced by Ogunsanya et al. [16]. Yakura et al. [17] introduced the odd Lomax-Kumaraswamy distribution. Marzouk et al. [18] obtained a generalized odd Lomax family of distributions with applications. The extended odd Lomax family of distributions was described by Abubakari et al. [19].
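The construction of the OL-G family can be illustrated with a small, generic sketch. It assumes that the OL-G CDF is a Lomax(α, β) CDF evaluated at the baseline odds G/(1 − G), that is F = 1 − β^α {β + G/(1 − G)}^(−α), with the pdf obtained by differentiation. The function names and the exponential baseline used in the example are illustrative choices, not part of the paper.

```python
import math

def odd_lomax_g_cdf(G, alpha, beta):
    """OL-G CDF evaluated at a baseline CDF value G in [0, 1] (assumed form)."""
    if G <= 0.0:
        return 0.0
    if G >= 1.0:
        return 1.0
    odds = G / (1.0 - G)                       # baseline odds G / (1 - G)
    return 1.0 - (1.0 + odds / beta) ** (-alpha)

def odd_lomax_g_pdf(G, g, alpha, beta):
    """OL-G pdf from the baseline CDF value G (0 < G < 1) and baseline pdf value g."""
    odds = G / (1.0 - G)
    return (alpha * beta**alpha * g / (1.0 - G) ** 2
            * (beta + odds) ** (-(alpha + 1.0)))

# Example with a unit-rate exponential baseline (an arbitrary choice for illustration).
x = 0.7
G, g = 1.0 - math.exp(-x), math.exp(-x)
print(odd_lomax_g_cdf(G, alpha=2.0, beta=1.5), odd_lomax_g_pdf(G, g, alpha=2.0, beta=1.5))
```

Passing baseline values rather than a baseline function keeps the sketch independent of any particular baseline model; any continuous CDF and pdf pair can be plugged in.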
The main aim of this work is to study the statistical properties of the new extended model and to investigate point and interval estimation for its four unknown parameters. Two estimation methods are considered: maximum likelihood and Bayesian estimation. To verify the efficiency of the proposed estimation methods and to study how the estimators perform for various sample sizes and parameter values, a simulation study is carried out in R. A real data example emphasizes the suitability of the OLINH model over the INH and other competitive models with two and three parameters. The rest of this article is organized as follows. The OLINH distribution is defined in Section 2. In Section 3, some statistical properties of the OLINH distribution are obtained. Section 4 studies two methods of estimation. To judge the efficiency of these estimation methods, a simulation study is performed in Section 5. An application to COVID-19 vaccination rate data from 46 different African countries is considered in Section 6 for illustrative purposes. Finally, conclusions are provided in Section 7.
2 OLINH distribution
Consider the OL-G family with the INH distribution as the baseline; this generates the four-parameter OLINH distribution. By substituting the INH model's CDF and pdf, Eqs (1) and (2), into the OL-G family, Eqs (3) and (4), the OLINH distribution's CDF and pdf are obtained as:
(5)
and
(6)
respectively, where x > 0, α, β, δ, θ > 0. A random variable with pdf (6) is denoted by X ∼OLINH(α, β, δ, θ). The hazard rate function (hrf) of the OLINH distribution is given by
The odds ratio of failure (ORF) of the OLINH distribution is obtained by
Figs 1 and 2 show the shapes of the OLINH distribution's pdf and hrf, respectively, for different parameter values. The density of the OLINH distribution can be right-skewed or reversed-J shaped. The hrf of the OLINH distribution has some interesting shapes, such as decreasing and upside-down bathtub. This variety of hrf shapes makes the model appealing for many kinds of lifetime data, such as those arising in biomedical and biological studies, reliability analysis, physical engineering, and survival analysis.
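The shapes described above can be explored numerically with a short sketch. It assumes the INH form G(x) = exp{1 − (1 + θ/x)^δ} and the OL-G form F = 1 − β^α {β + G/(1 − G)}^(−α); under these assumptions the hazard rate simplifies to α g / [(1 − G)² (β + G/(1 − G))]. The function name `olinh_functions` and the parameter values are hypothetical and are not the values used for Figs 1 and 2.

```python
import math

def olinh_functions(x, alpha, beta, delta, theta):
    """Return (cdf, pdf, hrf) of the OLINH distribution at x > 0 (assumed forms)."""
    u = 1.0 + theta / x
    G = math.exp(1.0 - u**delta)                       # baseline INH CDF
    g = delta * theta / x**2 * u ** (delta - 1.0) * G  # baseline INH pdf
    odds = G / (1.0 - G)
    cdf = 1.0 - (1.0 + odds / beta) ** (-alpha)
    pdf = (alpha * beta**alpha * g / (1.0 - G) ** 2
           * (beta + odds) ** (-(alpha + 1.0)))
    hrf = alpha * g / ((1.0 - G) ** 2 * (beta + odds))  # equals pdf / (1 - cdf)
    return cdf, pdf, hrf

# Tabulate the hazard rate on a coarse grid for one hypothetical parameter set.
for x in (0.1, 0.5, 1.0, 2.0, 5.0, 10.0):
    cdf, pdf, hrf = olinh_functions(x, alpha=2.0, beta=1.5, delta=1.2, theta=0.7)
    print(f"x={x:5.1f}  pdf={pdf:.4f}  hrf={hrf:.4f}")
```

Scanning the printed hazard values over a finer grid and several parameter sets is a simple way to see whether a given configuration produces a decreasing or an upside-down bathtub hazard.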
3 Statistical characteristics of OLINH distribution
In this section, we derive some statistical characteristics of the OLINH distribution, such as the linear representation of its pdf, the quantile function, the moments, the moment generating function (MGF), the Rényi entropy, and the density of order statistics.
3.1 Linear representation
According to Cordeiro et al. [15] the linear representation for the density of the OL-G family is given by
(7)
where
. The linear representation for the CDF of the OL-G family is as follows
(8)
Using Eq (7), the linear representation of the OLINH pdf can be written as
(9)
Eq (9) is a linear combination of exponentiated INH densities with power (k + j + 1). Using Eq (8), we obtain the linear representation of the CDF of the OLINH distribution
(10)
The linear representations of the pdf and CDF of the OLINH distribution are valuable when deriving the moments, the moment generating function, the Rényi entropy, and the density of order statistics.
3.2 Quantile for the OLINH distribution
The quantile function of a distribution is an important measure of location, and it is commonly used to generate random samples in simulation analysis. Let x_q = Q(q) = F⁻¹(q; Ω) for 0 < q < 1; for the OLINH distribution the quantile function is obtained by inverting Eq (5) to get:
(11)
In particular, the three quartiles Q1, Q2, and Q3 are obtained by setting q = 0.25, 0.5, and 0.75, respectively, in Eq (11). Skewness and kurtosis measures can also be obtained from this equation; see Fig 3.
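The inversion described above can be sketched as follows. The code assumes the INH form G(x) = exp{1 − (1 + θ/x)^δ} and the OL-G form F = 1 − β^α {β + G/(1 − G)}^(−α), so the closed-form quantile it implements is derived from those assumptions and may differ in notation from Eq (11). The function names are illustrative, and Bowley's quartile-based skewness is used only as one possible quantile-based measure; the paper does not state which measures underlie Fig 3.

```python
import math
import random

def olinh_cdf(x, alpha, beta, delta, theta):
    G = math.exp(1.0 - (1.0 + theta / x) ** delta)
    return 1.0 - (1.0 + G / (1.0 - G) / beta) ** (-alpha)

def olinh_quantile(q, alpha, beta, delta, theta):
    """Invert the assumed OLINH CDF for 0 < q < 1."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    odds = beta * math.expm1(-math.log1p(-q) / alpha)   # beta * ((1-q)**(-1/alpha) - 1)
    t = math.log1p(1.0 / odds)                          # -log G(x), since G = odds/(1+odds)
    return theta / math.expm1(math.log1p(t) / delta)    # theta / ((1+t)**(1/delta) - 1)

def olinh_sample(n, alpha, beta, delta, theta, seed=None):
    """Inverse-transform sampling: push uniform draws through the quantile function."""
    rng = random.Random(seed)
    draws = []
    while len(draws) < n:
        q = rng.random()
        if q > 0.0:                                     # q = 0 falls outside the support
            draws.append(olinh_quantile(q, alpha, beta, delta, theta))
    return draws

pars = dict(alpha=2.0, beta=1.5, delta=1.2, theta=0.7)  # hypothetical values
q1, q2, q3 = (olinh_quantile(q, **pars) for q in (0.25, 0.5, 0.75))
bowley = (q3 - 2.0 * q2 + q1) / (q3 - q1)               # quartile-based skewness
print(q1, q2, q3, bowley, olinh_sample(3, seed=1, **pars))
```

The `expm1` and `log1p` calls are a numerical convenience: they keep the inversion accurate when q is very close to 0 or 1, where the naive power expressions lose precision.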
3.3 Moments for the OLINH distribution
Let x be a random variable following the OLINH distribution. The rth moment of x follows from Eq (9) by using a power series expansion and some algebraic manipulation, which gives
where
The ordinary moments are useful in evaluating skewness and kurtosis values; see Fig 3. The rth incomplete moment of the OLINH distribution is expressed as
where
is the lower incomplete gamma function. The incomplete moment is useful in finding the Bonferroni and Lorenz curves, the mean residual life, the mean waiting time, and other measures.
The moment generating function of OLINH distribution is given by
(12)
3.4 Rényi entropy
Rényi entropy is an extension of Shannon entropy. The Rényi entropy of order ζ, with ζ > 0 and ζ ≠ 1, is defined as
Using the OLINH density from Eq (6) and applying a power series expansion, integration, and some algebraic simplification, the Rényi entropy can be written as
(13)
where
Fig 4 shows the Rényi entropy of the OLINH model for some parameter values and different values of ζ; the figure shows that the Rényi entropy decreases as ζ increases. Rényi entropy has many applications; for more information see [20–23].
3.5 Order statistics
Let x1, …, xn be a random sample of size n drawn from a continuous pdf f(x), and let x1:n < x2:n < … < xn:n be the corresponding order statistics. If the random sample follows the OLINH distribution, then from Eqs (8) and (9) the pdf of the kth order statistic xk:n is given by
where ht+u+1(x) is the exponentiated INH density with power t + u + 1 and
The above equation shows that the pdf of the OLINH order statistics can be represented as a linear combination of exponentiated INH densities; hence many statistical properties of the order statistics can be derived easily from the properties of ht+u+1(x).
4 Estimation methods
The problem of estimating the OLINH distribution's parameters is studied in this section using two methods: maximum likelihood estimation (MLE) and Bayesian estimation based on the squared error loss function.
4.1 Maximum likelihood estimation
Let X1, …, Xn be a random sample from OLINH distribution with parameters α, β, δ and θ. Then the log-likelihood for the OLINH is provided by
(14)
To maximize the log-likelihood equation, we need to take the partial derivatives of l(Ω) with respect to the model parameters α, β, δ and θ and equate them to zero, hence we obtain the following system of nonlinear equations:
(15)
(16)
(17)
and
(18)
where
and
It is possible to obtain the MLE of α (α̂) explicitly from Eq (15), hence
where β̂, δ̂ and θ̂ are the MLEs of β, δ and θ respectively. They are obtained numerically by solving the above system using techniques such as the Newton-Raphson method; R packages are used for that purpose.
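The role of the explicit solution for α can be illustrated numerically. The sketch below assumes the INH form G(x) = exp{1 − (1 + θ/x)^δ} and the OL-G form F = 1 − β^α {β + G/(1 − G)}^(−α). Under these assumptions, setting the derivative of the log-likelihood with respect to α to zero gives α̂ = n / Σ ln(1 + odds_i/β), where odds_i = G(x_i)/(1 − G(x_i)); this expression is derived here from the assumed forms and may differ in notation from the paper's. The function names and the toy data are hypothetical.

```python
import math

def _terms(x, delta, theta):
    """Per-observation pieces: log g(x), 1 - G(x), and the odds G/(1 - G)."""
    u = 1.0 + theta / x
    log_G = 1.0 - u**delta
    log_g = (math.log(delta * theta) - 2.0 * math.log(x)
             + (delta - 1.0) * math.log(u) + log_G)
    one_minus_G = -math.expm1(log_G)
    return log_g, one_minus_G, math.exp(log_G) / one_minus_G

def olinh_loglik(alpha, beta, delta, theta, data):
    """Log-likelihood of the OLINH model under the assumed forms."""
    total = len(data) * (math.log(alpha) + alpha * math.log(beta))
    for x in data:
        log_g, one_minus_G, odds = _terms(x, delta, theta)
        total += (log_g - 2.0 * math.log(one_minus_G)
                  - (alpha + 1.0) * math.log(beta + odds))
    return total

def profile_alpha(beta, delta, theta, data):
    """Closed-form maximiser over alpha for fixed (beta, delta, theta)."""
    s = sum(math.log1p(_terms(x, delta, theta)[2] / beta) for x in data)
    return len(data) / s

toy = [0.5, 1.2, 2.3, 4.8, 10.0]          # hypothetical observations
a_hat = profile_alpha(1.5, 1.2, 0.7, toy)
print(a_hat, olinh_loglik(a_hat, 1.5, 1.2, 0.7, toy))
```

In practice this reduces the numerical search from four dimensions to three: an optimiser only needs to vary β, δ and θ, with α replaced by `profile_alpha` at each step.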
4.2 Bayesian estimation
The Bayesian approach treats the parameters as random variables with a given prior distribution. The ability to incorporate prior knowledge makes the Bayesian method very useful in survival analysis, where one of the main problems is the limited availability of data. We take gamma distributions as priors, so that the parameters α, β, δ and θ have the prior distributions Gamma(μ1, ν1), Gamma(μ2, ν2), Gamma(μ3, ν3), and Gamma(μ4, ν4), respectively. Assuming the priors are independent, the joint prior density function can be written as follows:
(19)
The joint posterior density function of Ω is calculated using the likelihood function and joint prior function, and is given by
(20)
Based on the squared error loss function, the Bayes estimator of Ω is the posterior mean:
(21)
The integrals in Eq (21) cannot be evaluated in closed form. As a result, we use the Markov chain Monte Carlo (MCMC) method to approximate them numerically. The most widely used MCMC algorithms are the Metropolis-Hastings (MH) algorithm and Gibbs sampling. The MH algorithm, like acceptance-rejection sampling, assumes that at each iteration a candidate value can be generated from a proposal distribution. We apply MH within Gibbs sampling to generate random samples from the conditional posterior densities of the OLINH parameters. The posterior conditional distributions are as follows:
(22)
(23)
(24)
and
(25)
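The MH-within-Gibbs scheme described above can be sketched in a few dozen lines. This is an illustration under stated assumptions, not the authors' R implementation: it assumes the INH form G(x) = exp{1 − (1 + θ/x)^δ}, the OL-G form F = 1 − β^α {β + G/(1 − G)}^(−α), gamma priors parameterized by shape μ and rate ν, unit hyperparameters, and a normal random-walk proposal with a hypothetical step size. All function names and the toy data are illustrative.

```python
import math
import random

def log_posterior(params, data, prior_shape, prior_rate):
    """Unnormalised log posterior: OLINH log-likelihood plus gamma log-prior kernels."""
    alpha, beta, delta, theta = params
    if min(params) <= 0.0:
        return -math.inf                       # outside the parameter space
    try:
        total = len(data) * (math.log(alpha) + alpha * math.log(beta))
        for x in data:
            u = 1.0 + theta / x
            log_G = 1.0 - u**delta
            one_minus_G = -math.expm1(log_G)
            odds = math.exp(log_G) / one_minus_G
            total += (math.log(delta * theta) - 2.0 * math.log(x)
                      + (delta - 1.0) * math.log(u) + log_G
                      - 2.0 * math.log(one_minus_G)
                      - (alpha + 1.0) * math.log(beta + odds))
    except (OverflowError, ValueError, ZeroDivisionError):
        return -math.inf                       # treat numerical failures as zero density
    for p, shape, rate in zip(params, prior_shape, prior_rate):
        total += (shape - 1.0) * math.log(p) - rate * p
    return total

def mh_within_gibbs(data, n_iter=2000, burn_in=500, step=0.2, seed=0,
                    prior_shape=(1.0, 1.0, 1.0, 1.0), prior_rate=(1.0, 1.0, 1.0, 1.0)):
    """Update one parameter at a time with a symmetric random-walk MH step."""
    rng = random.Random(seed)
    current = [1.0, 1.0, 1.0, 1.0]             # arbitrary starting values
    current_lp = log_posterior(current, data, prior_shape, prior_rate)
    chain = []
    for _ in range(n_iter):
        for j in range(4):
            proposal = list(current)
            proposal[j] += rng.gauss(0.0, step)
            prop_lp = log_posterior(proposal, data, prior_shape, prior_rate)
            if math.log(1.0 - rng.random()) < prop_lp - current_lp:
                current, current_lp = proposal, prop_lp
        chain.append(tuple(current))
    kept = chain[burn_in:]
    # Posterior means approximate the Bayes estimates under squared error loss.
    means = [sum(draw[j] for draw in kept) / len(kept) for j in range(4)]
    return means, kept

toy = [0.3, 0.9, 1.5, 2.4, 4.9, 7.6, 12.5, 18.6]   # hypothetical observations
estimates, draws = mh_within_gibbs(toy, n_iter=1000, burn_in=300, seed=1)
print(estimates)
```

Because the proposal is symmetric and proposals with a non-positive component receive zero posterior density, the acceptance ratio reduces to the ratio of posterior densities. A real analysis would tune the step size and check convergence, for example with trace plots like those the paper reports.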
5 Simulation
In this section, a Monte Carlo simulation is used to compare the two estimation methods: MLE and Bayesian estimation under the squared error loss function. The simulation analysis uses the MCMC method for estimating the parameters of the OLINH lifetime distribution and is run in R with 5000 iterations. Random samples are generated from the OLINH distribution, where x represents the OLINH lifetime, for various true parameter values and sample sizes n = 30, 80, and 150.
Asymptotic confidence intervals based on the MLE and Bayesian credible intervals are obtained; the highest posterior density interval (HPDI) is used for the credible intervals. The best estimation method is the one that minimizes the estimator's relative bias (RB), mean squared error (MSE), and length of the confidence interval (L.CI).
,
and L.CI(Ω) = Upper(Ω)−Lower(Ω)
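The three comparison criteria can be computed from replicated estimates as in the sketch below. It assumes the usual definitions, RB = (mean of the estimates − true value) / true value and MSE = mean of the squared deviations from the true value; the paper's exact definitions may differ, for example by using absolute values. The function names and the numbers in the example are hypothetical.

```python
def relative_bias(estimates, true_value):
    """Relative bias: (average estimate - true value) / true value."""
    mean_estimate = sum(estimates) / len(estimates)
    return (mean_estimate - true_value) / true_value

def mean_squared_error(estimates, true_value):
    """Average squared deviation of the estimates from the true value."""
    return sum((e - true_value) ** 2 for e in estimates) / len(estimates)

def interval_length(lower, upper):
    """Length of a confidence or credible interval, L.CI = Upper - Lower."""
    return upper - lower

# Hypothetical estimates of one parameter over three simulation replications.
replicated = [1.1, 0.9, 1.2]
print(relative_bias(replicated, 1.0),
      mean_squared_error(replicated, 1.0),
      interval_length(0.6, 1.5))
```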
Tables 1–3 summarize the simulation results of the methods discussed in this paper for point and interval estimation. The RB, MSE, and L.CI values are used to make the essential comparisons between various point and interval estimating methods. The following conclusions are summarized from these tables:
- The RB, MSE, and L.CI decrease as n increases for actual parameters of the OLINH distribution.
- Bayesian estimation performs best among the methods considered, with respect to the RB, MSE, and L.CI criteria.
- Credible interval of Bayesian estimation by HPDI is the shortest CI of parameters of OLINH distribution.
- For fixed α, β, δ, and sample size, the RB, MSE, and L.CI increase as θ increases.
- For fixed α, δ, θ, and sample size, the RB, MSE, and L.CI increase as β increases, in almost all cases.
- For fixed β, δ, θ, and sample size, the RB, MSE, and L.CI increase as α increases, in almost all cases.
6 Analysis of COVID-19 vaccination
COVID-19 vaccination rate data from 46 different African countries are considered, and some statistical measures are summarized in Table 4. Our goal is to model these rates with the OLINH distribution in order to describe their trend and to predict future values of the vaccination rate. For that purpose some goodness-of-fit measures are used, and a comparison between our model and other competitive models is presented in Table 5. The goodness-of-fit measures are: the Kolmogorov-Smirnov statistic (KSS) with its P-value (KSP-value), the Cramér-von Mises statistic (CVMS), the Anderson-Darling statistic (ADS), the Akaike information criterion (AICS), the Bayesian information criterion (BICS), the Hannan-Quinn information criterion (HQICS), and the consistent AIC (CAICS).
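The information criteria listed above are simple functions of the maximized log-likelihood, the number of parameters k, and the sample size n. The sketch below uses the standard textbook definitions; for CAIC it uses the consistent AIC, −2ℓ + k(ln n + 1), which is an assumption, since some papers use the same abbreviation for the small-sample corrected AIC. The function name and the log-likelihood value in the example are hypothetical and are not taken from Table 5.

```python
import math

def information_criteria(loglik, k, n):
    """Return AIC, BIC, HQIC and consistent AIC for a fitted model."""
    minus_two_ll = -2.0 * loglik
    return {
        "AIC": minus_two_ll + 2.0 * k,
        "BIC": minus_two_ll + k * math.log(n),
        "HQIC": minus_two_ll + 2.0 * k * math.log(math.log(n)),
        "CAIC": minus_two_ll + k * (math.log(n) + 1.0),   # consistent AIC (assumed variant)
    }

# Hypothetical log-likelihood for a four-parameter model fitted to n = 46 observations.
print(information_criteria(loglik=-100.0, k=4, n=46))
```

For all four criteria, smaller values indicate a better trade-off between fit and model complexity, which is how the comparison in Table 5 is read.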
The data cover the following 46 African countries: Saint Helena, Nigeria, Seychelles, Democratic Republic of the Congo, Mali, Malawi, Madagascar, Mauritius, South Sudan, Equatorial Guinea, Burkina Faso, Mauritania, Botswana, Cabo Verde, Ethiopia, Guinea-Bissau, Côte d'Ivoire, Liberia, Algeria, Mozambique, Chad, Gambia, Kenya, Comoros, Guinea, Central African Republic, Congo, Eswatini, Namibia, Benin, Niger, Uganda, United Republic of Tanzania, South Africa, Senegal, Angola, Cameroon, Zambia, Ghana, Rwanda, Zimbabwe, Sierra Leone, Lesotho, Togo, Sao Tome and Principe, and Gabon.
The data represent the number of persons fully vaccinated per 100 population, sorted in ascending order: 0.042, 0.205, 0.285, 0.319, 0.464, 0.550, 0.889, 0.895, 0.939, 0.986, 1.000, 1.088, 1.212, 1.244, 1.450, 1.593, 1.844, 2.039, 2.157, 2.167, 2.334, 2.440, 2.657, 3.685, 3.879, 4.493, 4.800, 4.944, 5.155, 5.674, 7.602, 10.004, 12.238, 12.520, 12.553, 13.063, 15.105, 15.229, 15.629, 15.848, 18.641, 18.940, 29.885, 58.162, 61.838, and 72.286.
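As an illustration of how a goodness-of-fit distance is computed on these data, the sketch below evaluates the Kolmogorov-Smirnov statistic for a candidate CDF. The OLINH CDF is written under the assumed forms G(x) = exp{1 − (1 + θ/x)^δ} and F = 1 − β^α {β + G/(1 − G)}^(−α). The parameter values are placeholders chosen for illustration only; they are not the fitted estimates from the paper, so the printed distance is not comparable with Table 5.

```python
import math

data = [0.042, 0.205, 0.285, 0.319, 0.464, 0.550, 0.889, 0.895, 0.939, 0.986,
        1.000, 1.088, 1.212, 1.244, 1.450, 1.593, 1.844, 2.039, 2.157, 2.167,
        2.334, 2.440, 2.657, 3.685, 3.879, 4.493, 4.800, 4.944, 5.155, 5.674,
        7.602, 10.004, 12.238, 12.520, 12.553, 13.063, 15.105, 15.229, 15.629,
        15.848, 18.641, 18.940, 29.885, 58.162, 61.838, 72.286]

def olinh_cdf(x, alpha, beta, delta, theta):
    """OLINH CDF under the assumed INH and OL-G forms."""
    G = math.exp(1.0 - (1.0 + theta / x) ** delta)
    return 1.0 - (1.0 + G / (1.0 - G) / beta) ** (-alpha)

def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF of the sample and a candidate CDF."""
    xs = sorted(sample)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs, start=1):
        F = cdf(x)
        distance = max(distance, i / n - F, F - (i - 1) / n)
    return distance

placeholder = dict(alpha=1.0, beta=1.0, delta=1.0, theta=1.0)   # NOT fitted values
print(len(data), ks_statistic(data, lambda x: olinh_cdf(x, **placeholder)))
```

Replacing the placeholder parameters with estimates from an optimiser or an MCMC run would give the distance for the fitted model.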
Table 5 shows that the OLINH distribution has the smallest values of all information criteria among the fitted distributions. The competing distributions are: the extended odd Weibull inverse Nadarajah-Haghighi (EOWINH) (Almetwally [24]), exponential Lomax (EL) (El-Bassiouny et al. [25]), Kumaraswamy Weibull (KW) (Cordeiro et al. [26]), Kumaraswamy inverted Topp-Leone (KITL) (Hassan et al. [10]), odd Weibull inverse Topp-Leone (OWITL) (Almetwally [27]), new exponential-X Fréchet (NEF) (Alzeley et al. [28]), modified Kies INH (MKINH), and Weibull Lomax (WL) (Tahir et al. [29]). We therefore conclude that the OLINH distribution provides the best fit to the COVID-19 vaccination rate data set.
Fig 5 shows the estimated OLINH CDF and pdf for the COVID-19 vaccination data. Fig 6 shows the P-P plot and Q-Q plot of the fitted OLINH distribution; both plots indicate that our distribution is a good fit for the actual data. Fig 7 shows the box plot, the TTT plot, and the estimated hazard together with the empirical hazard. Fig 8 shows the estimated CDFs of the different models together with the empirical CDF. Fig 9 shows the estimated pdfs of the different models over the histogram of the data. According to Table 6, the Bayesian method is the best estimation method for the OLINH distribution. Fig 10 shows that the parameter estimates attain the maximum of the OLINH log-likelihood. Figs 11 and 12 depict the history plots, the estimated marginal posterior densities, and the MCMC convergence for α, β, δ and θ. Fig 13 shows the estimated survival and hazard rate functions obtained by the MLE and Bayesian estimation methods.
7 Conclusion
A new extension of the INH and Lomax distributions, called the OLINH distribution, is formulated in this paper. We studied its statistical properties and obtained a linear representation of its pdf; the quantile function, moments, moment generating function, and Rényi entropy were also obtained. Point estimation of the unknown OLINH parameters α, β, δ, and θ was considered using the MLE and Bayesian estimation methods. Interval estimation of these parameters was considered using the asymptotic approximation of the MLE and Bayesian credible intervals. To compare the performance of the estimation methods, a Monte Carlo simulation analysis was carried out in R. The COVID-19 vaccination rate data were also analyzed, and the OLINH distribution was shown to fit these data better than other competitive distributions. Bayesian estimation performed better than the MLE for estimating the parameters of the OLINH distribution.
Acknowledgments
The authors are grateful to the anonymous referee for a careful checking of the details and for helpful comments that improved this paper.
References
- 1. Tahir MH, Cordeiro GM, Ali S, Dey S, Manzoor A. The inverted Nadarajah–Haghighi distribution: estimation methods and applications. Journal of Statistical Computation and Simulation. 2018 Sep 22;88(14):2775–98.
- 2. Folks JL, Chhikara RS. The inverse Gaussian distribution and its statistical application—a review. Journal of the Royal Statistical Society: Series B (Methodological). 1978 Jul;40(3):263–75.
- 3. Rosaiah K, Kantam RR. Acceptance sampling based on the inverse Rayleigh distribution. Stochastics and Quality Control, 2005, 20(2):277–286.
- 4. De Gusmao FR, Ortega EM, Cordeiro GM. The generalized inverse Weibull distribution. Statistical Papers. 2011 Aug;52(3):591–619.
- 5. Joshi RK, Kumar VI. Lindley inverse Weibull distribution: Theory and Applications. Bull. Math. & Stat. Res. 2020;8(3):32–46.
- 6. Almetwally EM. Extended odd Weibull inverse Rayleigh distribution with application on carbon fibres. Math. Sci. Lett. 2021;10(1):5–14.
- 7. Almetwally E. The new extension of inverse Weibull distribution with applications of medicine data. Scientific Journal for Financial and Commercial Studies and Researches (SJFCSR). 2021 Jan; 2(1):576–597.
- 8. Ramos PL, Mota AL, Ferreira PH, Ramos E, Tomazella VL, Louzada F. Bayesian analysis of the inverse generalized gamma distribution using objective priors. Journal of Statistical Computation and Simulation. 2021 Mar 4;91(4):786–816.
- 9. Almetwally EM, Alharbi R, Alnagar D, Hafez EH. A new inverted topp-leone distribution: applications to the COVID-19 mortality rate in two different countries. Axioms. 2021 Feb 26;10(1):25.
- 10. Hassan AS, Almetwally EM, Ibrahim GM. Kumaraswamy inverted Topp–Leone distribution with applications to COVID-19 data. Computers, Materials & Continua. 2021;68(1):337–358.
- 11. Basheer AM, Almetwally EM, Okasha HM. Marshall-olkin alpha power inverse Weibull distribution: non bayesian and bayesian estimations. Journal of Statistics Applications & Probability. 2021;10(2):327–45.
- 12. Raffiq G, Dar IS, Haq MA, Ramos E. The Marshall–Olkin inverted Nadarajah–Haghighi distribution: estimation and applications. Annals of Data Science. 2020 Jun 8:1–6.
- 13. Toumaj A, MirMostafaee SM, Hamedani G. The transmuted inverted Nadarajah-Haghighi distribution with an application to lifetime data. Pakistan Journal of Statistics and Operation Research. 2021; 17(2), 451–466.
- 14. Elshahhat A, Rastogi MK. Estimation of parameters of life for an inverted Nadarajah–Haghighi distribution from Type-II progressively censored samples. Journal of the Indian Society for Probability and Statistics. 2021 Jun;22(1):113–54.
- 15. Cordeiro GM, Afify AZ, Ortega EM, Suzuki AK, Mead ME. The odd Lomax generator of distributions: Properties, estimation and applications. Journal of Computational and Applied Mathematics. 2019 Feb 1;347:222–37.
- 16. Ogunsanya AS, Sanni OO, Yahya WB. Exploring some properties of odd Lomax-exponential distribution. Annals of Statistical Theory and Applications (ASTA). 2019 May;1:21–30.
- 17. Yakura BS, Sule AA, Dewu MM, Manju KA, Mohammed FB. Odd Lomax-Kumaraswamy Distribution: Its Properties and Applications. Journal of Scientific Research and Reports. 2020;26(4):45–60.
- 18. Marzouk W, Jamal F, Ahmed AE. The Generalized Odd Lomax Generated Family of Distributions with Applications. Gazi University Journal of Science. 2019 Apr 1;32(2):737–55.
- 19. Abubakari AG, Kandza-Tadi CC, Dimmua RR. Extended Odd Lomax Family of Distributions: Properties and Applications. Statistica. 2020;80(3):331–54.
- 20. Renner R, Wolf S. Smooth Rényi entropy and applications. In International Symposium on Information Theory, 2004. ISIT 2004. Proceedings. 2004 Jun 27 (p. 233). IEEE.
- 21. Popescu TD, Aiordachioaie D. Signal segmentation in time-frequency plane using Renyi entropy-application in seismic signal processing. In 2013 Conference on Control and Fault-Tolerant Systems (SysTol) 2013 Oct 9 (pp. 312–317). IEEE.
- 22. Hughes MS, Marsh JN, Arbeit JM, Neumann RG, Fuhrhop RW, Wallace KD, et al. Application of Renyi entropy for ultrasonic molecular imaging. The Journal of the Acoustical Society of America. 2009 May;125(5):3141–3145. pmid:19425656
- 23. Liu F, Gao X, Zhao J, Deng Y. Generalized belief entropy and its application in identifying conflict evidence. IEEE Access. 2019 Sep 4;7:126625–126633.
- 24. Almetwally EM. Extended odd Weibull inverse Nadarajah-Haghighi distribution with application on COVID-19 in Saudi Arabia. Mathematical Sciences Letters. 2021;10(3):1–15.
- 25. El-Bassiouny AH, Abdo NF, Shahen HS. Exponential lomax distribution. International Journal of Computer Applications. 2015 Jan 1;121(13):24–29.
- 26. Cordeiro GM, Ortega EM, Nadarajah S. The Kumaraswamy Weibull distribution with application to failure data. Journal of the Franklin Institute. 2010 Oct 1;347(8):1399–1429.
- 27. Almetwally EM. The odd Weibull inverse topp–leone distribution with applications to COVID-19 data. Annals of Data Science. 2022 Feb;9(1):121–140.
- 28. Alzeley O, Almetwally EM, Gemeay AM, Alshanbari HM, Hafez EH, Abu-Moussa MH. Statistical inference under censored data for the new exponential-X Fréchet distribution: Simulation and application to leukemia data. Computational Intelligence and Neuroscience. 2021 Aug 29;2021. pmid:34497637
- 29. Tahir MH, Cordeiro GM, Mansoor M, Zubair M. The Weibull-Lomax distribution: properties and applications. Hacettepe Journal of Mathematics and Statistics. 2015 Apr 1;44(2):455–474.